Gaema AI

26 posts

Gaema AI

@GaemaAI

Computing Architecture and AI Research

USA Katılım Nisan 2026

34 Takip Edilen6 Takipçiler

Gaema AI@GaemaAI·4h

Christmas came early from @tenstorrent

English

Gaema AI@GaemaAI·4h

@KuittinenPetri @loktar00 @HealthRanger Memory hierarchy throughput is the only thing that matters.

English

Petri Kuittinen@KuittinenPetri·9h

AMD Ryzen™ AI Max+ 395 user here and you are sadly correct. It is not a good option for dense models. It can of course run them, but they will be very slow. MoE is much better option for these computers. And you can pretty much forget running video and image models as they will be slow (better get Nvidia DGX Spark or Nvidia RTX 6000 for those). And I would recommend Beelink GTR 9 Pro over that GMKtec EVO-X2 Mini PC. The price difference is now $1000 for identical setup though. It is steep price to pay for improved thermals and power handling, but for me those things matter. I run my computer on 24h7. On desktop use only +35 C, 4k gaming + AI load both on at same time, +65 C. That's crazy! My Nvidia + Intel laptop would probably burn down the house if I would try to game + do agentic AI coding same time, I have reached +103 C with it (yes, it gets burning hot to touch as well).

English

169

HealthRanger@HealthRanger·13h

If you want to run local inference with Qwen 3.6-27b or other excellent medium-sized models without buying huge, bulky, expensive workstations and NVIDIA GPUs, I've found that the GMKtec EVO-X2 Mini PC (based on AMD Ryzen with 128GB of unified RAM) is very, very good. It's small, quiet and uses very little electricity. It runs LM Studio, Ollama or other inference software, and it's fast enough with Qwen models to make it practical and usable. I've had one running for about 30 days now, non-stop, with zero issues, running inference 24/7. It has enough RAM to run even 120 billion parameter models. In my mini data center, I have this replacing bulkier, more power-hungry workstations. Only downside? It doesn't handle the common image generation models, nor video generation. But for text-based inference, it's solid, and it works with all the common text models like Qwen. Expect to pay around $3300 for this unit right now. That price will probably rise soon due to RAM shortages, resulting from the over-investment bubble into AI data centers.

English

6.1K

Gaema AI@GaemaAI·7h

Compute can be reduced to n-dimensional jigsaw puzzle.

English

Gaema AI@GaemaAI·8h

@xyster B70 should be capable of 100+ tokens per seconds fully optimized.

English

Steve💙🇨🇦@xyster·17h

May the 4th B70 be worth the hassle.. So far I'm really struggling to get 27B to get much benefit from quad cards, so I may shift gears and just try to run a larger model instead.

English

2.8K

Gaema AI@GaemaAI·8h

@loktar00 @AIatAMD @IntelAI @NVIDIAAI Architecture and stack quirks feel infinite. We have dozens of vendor stacks problems to jump through.

English

Loktar 🇺🇸@loktar00·12h

@GaemaAI @AIatAMD @IntelAI @NVIDIAAI vendor agnostic engine noice!

Português

Gaema AI@GaemaAI·2d

Engine stack now fully supports @AIatAMD RDNA2, @IntelAI Iris Xe/Xe2, @NVIDIAAI Blackwell sm_120(a) with native TQ/DF support on Gemma 4 and Qwen 3.6 Dense and MoE variants. Performance figures coming soon!

English

Gaema AI@GaemaAI·12h

TQ below int8 is😆

Indonesia

Gaema AI@GaemaAI·1d

Nvidia is pushing all of the traffic on the entire internet through one wafer. At some point you run out of data.

Dustin@r0ck3t23

Ilya Sutskever just told the AI industry why scaling is finished. One word built it. One word is about to break it. Sutskever: “Scaling is just one word, but it’s such a powerful word because it informs people what to do.” For five years, that single word replaced an entire research culture. Nobody needed breakthroughs. They needed bigger checks. Sutskever: “If you mix some compute with some data into a neural net of a certain size, you will get results, and you will know that it will be better if you just scale the recipe up.” That’s not science. That’s a recipe. Sutskever: “Companies love this because it gives you a very low risk way of investing your resources.” The most transformative technology in human history ran on the same logic used to franchise a restaurant chain. More locations. More ingredients. Same recipe. Predictable returns. You didn’t need researchers who could see around corners. You needed accountants who could approve purchase orders. But recipes expire. Sutskever: “At some point though, pre-training will run out of data. The data is very clearly finite.” Five years of infrastructure. Five years of hiring. Five years of investor decks. All built on top of something temporary. Sutskever: “I don’t think that’s true.” The co-founder of OpenAI. The mind behind the breakthroughs that made this entire era possible. Saying more money won’t solve it. Sutskever: “In some sense we are back to the age of research.” Most of the companies racing to build AGI were never research companies. They were scaling companies. They hired for execution. Not discovery. They optimized for throughput. Not insight. The talent pipelines. The investor pitches. The board decks. All built around one assumption. That the recipe would never expire. It’s expiring. And the companies that spent five years perfecting the art of spending money are about to discover something. The next era demands what capital can’t purchase. An original idea.

English

Gaema AI@GaemaAI·2d

Gaema Engine beats cuBLAS by up to 89% for LLM shapes and formats on @NVIDIAAI RTX 5090 Blackwell, pushing throughput past 4TB/s or more than double the memory bandwidth.

English

Gaema AI@GaemaAI·2d

Vulkan Compute is a thing of beauty.

English

Gaema AI@GaemaAI·2d

Intelligence now has a price. Galaxy servers amortized is ~$100K per person. Model capability will scale up with price. This is the new economic baseline for poverty.

Tenstorrent@tenstorrent

Tenstorrent Galaxy Blackhole superclusters are deployed at scale with customers @aiand_ , @Cirrascale, @VirtuFinancial, and Turiyam across a broad range of use cases from neoclouds to financial to sovereign AI.

English

Gaema AI@GaemaAI·3d

When the @IntelAI B70 uses more power at idle than a Threadripper and RTX 5090 and there's no documentation to fix it in Linux...

English

Gaema AI@GaemaAI·3d

👀

Qwen@Alibaba_Qwen

Today we’re releasing Qwen-Scope 🔭, an open suite of sparse autoencoders for the Qwen model family. It turns SAE features into practical tools： 🎯 Inference — Steer model outputs by directly manipulating internal features, no prompt engineering needed 📂 Data — Classify & synthesize targeted data with minimal seed examples, boosting long-tail capabilities 🏋️ Training — Trace code-switching & repetitive generation back to their source, fix them at the root 📊 Evaluation — Analyze feature activation patterns to select smarter benchmarks and cut redundancy We hope the community uses Qwen-Scope to uncover new mechanisms inside Qwen models and build applications beyond what we explored.Excited to see what you build! 🚀 🔗🔗 Blog: qwen.ai/blog?id=qwen-s… HuggingFace: huggingface.co/collections/Qw… ModelScope: modelscope.cn/collections/Qw… Technical Report: …anwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwe…

ART

Gaema AI@GaemaAI·5d

@AnushElangovan We developed our own stack with some secret sauce since AMD/Intel stacks showed most computation going to scalar units instead of the systolic arrays. We have two dozen 32GB cards so we haven't tested any models past 35B. Halos would be for that. We can't DM without a follow?

English

119

Anush Elangovan@AnushElangovan·5d

@GaemaAI DM me the address and what you want to do with it

English

277

Anush Elangovan@AnushElangovan·5d

Tell us how we can make ROCm better. We are listening and we will grind it down.

Aaryaman "Jam" Vasishta@adyaman

@AnushElangovan @wild_zones @AIatAMD Build commands shared here reddit.com/r/ROCm/comment… much simpler than before.

English

108

8.6K

Gaema AI retweetledi

𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠@IanCutress·5d

News from @Intel and @SoftBank SAIMEMORY from @VLSI_2026 Paper T17.5 First demo of HB3DM ➡️ 9 layer, 3 micron per stack ➡️ 1 logic + 8 DRAM layers, ➡️ 13.7k TSVs/layer with hybrid bonding ➡️ 1.125 GB/layer, so 10 GB per stack ➡️ 0.25 Tb/sec/mm2 bandwidth ➡️ 171 mm2 die, so 10 GB at 5.3 TB/sec/stack VLSI is held June 14-18 in Honolulu.

English

303

48.1K

Gaema AI@GaemaAI·5d

@TeksEdge @intel We are heavily invested in @Intel stock. Semi is the future, not the past.

English

117

David Hendrickson@TeksEdge·5d

@GaemaAI 🤦‍♂️ @Intel, please throw us a bone! It's like they are trying to go out of business.

English

388

David Hendrickson@TeksEdge·5d

🚀 Major Step Forward for Intel AI Arc Pro Intel released OpenVINO 2026.1 with a native llama.cpp backend so now is fully optimized for the Arc Pro B70 (32GB). 🔥 What this means • Significantly faster GGUF inference on Intel GPUs • Much better memory efficiency for 20B–70B models • Strong single-GPU performance for large local LLMs • Makes Arc Pro B70 a genuinely competitive option for local AI Intel’s edge & workstation AI strategy just got a lot more serious. Thank you for the software focus. Link in ALT

English

104

11.5K

Gaema AI@GaemaAI·5d

@AdinaYakup Nvidia is trying hard to keep the moat with CUDA tiles but the floodgates are open.

English

258

Adina Yakup@AdinaYakup·5d

TileLang is an interesting one 👀 In about a year, it went from a new research project to a high performance kernel language across major accelerators ✨Jan 2025: Open sourced ✨Feb 2025: v0.1.0 ✨Mar 2025: MLA decoding in ~80 lines of Python, matching FlashMLA on H100 ✨Apr 2025: AMD MI300X support, matching hand tuned assembly ✨Sep 2025: - Huawei Ascend backend added - DeepSeek-V3.2-Exp adopts TileLang for key kernels ✨Apr 2026: -DeepSeek releases TileKernels (LLM kernel library) -DeepSeek V4 built on TileLang kernels -Qwen releases FlashQLA on top of TileLang TileLang makes high performance GPU kernel easier and offering a viable path beyond CUDA.

English

136

14.1K

Gaema AI@GaemaAI·5d

@Lei_Wang_1999

Python Trending 🇺🇦@pythontrending

tilelang - Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels github.com/tile-ai/tilela…

QAM

Keşfet

@tenstorrent @KuittinenPetri @loktar00 @HealthRanger @xyster @AIatAMD @IntelAI @NVIDIAAI