Gaema AI

26 posts

@GaemaAI

Computing Architecture and AI Research

USA · Joined April 2026
34 Following · 6 Followers
Petri Kuittinen
Petri Kuittinen@KuittinenPetri·
AMD Ryzen™ AI Max+ 395 user here, and you are sadly correct. It is not a good option for dense models. It can of course run them, but they will be very slow. MoE is a much better option for these computers. And you can pretty much forget running video and image models, as they will be slow (better to get an Nvidia DGX Spark or Nvidia RTX 6000 for those). I would also recommend the Beelink GTR 9 Pro over that GMKtec EVO-X2 Mini PC. The price difference is now $1000 for an identical setup, though. That is a steep price to pay for improved thermals and power handling, but for me those things matter: I run my computer 24/7. On desktop use it sits at only +35 °C; with 4K gaming and an AI load both running at the same time, +65 °C. That's crazy! My Nvidia + Intel laptop would probably burn down the house if I tried to game and do agentic AI coding at the same time; I have reached +103 °C with it (yes, it gets burning hot to the touch as well).
English
1
1
4
169
HealthRanger
HealthRanger@HealthRanger·
If you want to run local inference with Qwen 3.6-27b or other excellent medium-sized models without buying huge, bulky, expensive workstations and NVIDIA GPUs, I've found that the GMKtec EVO-X2 Mini PC (based on AMD Ryzen with 128GB of unified RAM) is very, very good. It's small, quiet and uses very little electricity. It runs LM Studio, Ollama or other inference software, and it's fast enough with Qwen models to make it practical and usable. I've had one running for about 30 days now, non-stop, with zero issues, running inference 24/7. It has enough RAM to run even 120 billion parameter models. In my mini data center, I have this replacing bulkier, more power-hungry workstations. Only downside? It doesn't handle the common image generation models, nor video generation. But for text-based inference, it's solid, and it works with all the common text models like Qwen. Expect to pay around $3300 for this unit right now. That price will probably rise soon due to RAM shortages resulting from the over-investment bubble in AI data centers.
English
16
4
95
6.1K
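Whether a unified-memory mini PC is "fast enough" for text inference mostly comes down to memory bandwidth: single-stream decode reads every active weight once per token, so tokens/s is roughly bandwidth divided by bytes of active weights. A minimal sketch of that estimate (the bandwidth, model sizes, and quantization figures below are illustrative assumptions, not measurements of this machine):

```python
# Rough estimate of decode speed for a bandwidth-bound LLM.
# Decoding one token reads every active weight once, so:
#   tokens/s ≈ usable memory bandwidth / bytes of active weights.
# All figures below are illustrative assumptions, not measurements.

def decode_tokens_per_sec(bandwidth_gbs, active_params_b, bytes_per_param):
    """bandwidth_gbs: usable GB/s; active_params_b: billions of params
    read per token (all of them for dense, active experts only for MoE)."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token

# Assumed ~200 GB/s usable bandwidth on a unified-memory system,
# with weights at roughly 0.55 bytes/param (~4-bit quantization).
dense_27b = decode_tokens_per_sec(200, 27, 0.55)
moe_120b  = decode_tokens_per_sec(200, 5.1, 0.55)  # assume ~5B active params

print(f"dense 27B: {dense_27b:.1f} tok/s")
print(f"MoE 120B (~5B active): {moe_120b:.1f} tok/s")
```

The same arithmetic explains the reply above: a 120B MoE with a few billion active parameters decodes several times faster than a dense 27B on identical hardware, because far fewer bytes are touched per token.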
Gaema AI
Gaema AI@GaemaAI·
Compute can be reduced to an n-dimensional jigsaw puzzle.
English
0
0
0
7
Gaema AI
Gaema AI@GaemaAI·
@xyster B70 should be capable of 100+ tokens per second fully optimized.
English
0
0
0
31
Steve💙🇨🇦
Steve💙🇨🇦@xyster·
May the 4th B70 be worth the hassle.. So far I'm really struggling to get a 27B model to benefit much from quad cards, so I may shift gears and just try to run a larger model instead.
English
15
1
35
2.8K
Gaema AI
Gaema AI@GaemaAI·
Engine stack now fully supports @AIatAMD RDNA2, @IntelAI Iris Xe/Xe2, @NVIDIAAI Blackwell sm_120(a) with native TQ/DF support on Gemma 4 and Qwen 3.6 Dense and MoE variants. Performance figures coming soon!
English
1
0
0
34
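Reading "TQ" in the post above as ternary quantization (weights constrained to {-1, 0, +1} with a per-group scale) is an assumption; llama.cpp's TQ1_0/TQ2_0 types follow that idea with packed storage. A minimal round-trip sketch of the technique:

```python
import numpy as np

# Minimal ternary quantization (TQ) sketch: snap each weight in a
# group to {-1, 0, +1} times a per-group scale. This is a generic
# illustration of ternary quantization, not any engine's exact format.

def tq_quantize(w, group=32):
    w = w.reshape(-1, group)
    scale = np.abs(w).mean(axis=1, keepdims=True)   # per-group scale
    # Threshold at half the scale: small weights snap to zero.
    q = np.where(np.abs(w) > 0.5 * scale, np.sign(w), 0.0)
    return q, scale

def tq_dequantize(q, scale):
    return (q * scale).ravel()

rng = np.random.default_rng(1)
w = rng.standard_normal(1024).astype(np.float32)
q, s = tq_quantize(w)
w_hat = tq_dequantize(q, s)
print("unique levels:", np.unique(q))
print("reconstruction MSE:", float(np.mean((w - w_hat) ** 2)))
```

At under 2 bits of information per weight, the reconstruction error is large relative to int8, which is presumably the joke in the post.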
Gaema AI
Gaema AI@GaemaAI·
TQ below int8 is😆
Indonesia
0
0
0
6
Gaema AI
Gaema AI@GaemaAI·
Nvidia is pushing all of the traffic on the entire internet through one wafer. At some point you run out of data.
Dustin@r0ck3t23

Ilya Sutskever just told the AI industry why scaling is finished. One word built it. One word is about to break it.

Sutskever: “Scaling is just one word, but it’s such a powerful word because it informs people what to do.”

For five years, that single word replaced an entire research culture. Nobody needed breakthroughs. They needed bigger checks.

Sutskever: “If you mix some compute with some data into a neural net of a certain size, you will get results, and you will know that it will be better if you just scale the recipe up.”

That’s not science. That’s a recipe.

Sutskever: “Companies love this because it gives you a very low risk way of investing your resources.”

The most transformative technology in human history ran on the same logic used to franchise a restaurant chain. More locations. More ingredients. Same recipe. Predictable returns. You didn’t need researchers who could see around corners. You needed accountants who could approve purchase orders.

But recipes expire.

Sutskever: “At some point though, pre-training will run out of data. The data is very clearly finite.”

Five years of infrastructure. Five years of hiring. Five years of investor decks. All built on top of something temporary.

Sutskever: “I don’t think that’s true.”

The co-founder of OpenAI. The mind behind the breakthroughs that made this entire era possible. Saying more money won’t solve it.

Sutskever: “In some sense we are back to the age of research.”

Most of the companies racing to build AGI were never research companies. They were scaling companies. They hired for execution. Not discovery. They optimized for throughput. Not insight. The talent pipelines. The investor pitches. The board decks. All built around one assumption. That the recipe would never expire.

It’s expiring. And the companies that spent five years perfecting the art of spending money are about to discover something. The next era demands what capital can’t purchase. An original idea.

English
0
0
0
30
Gaema AI
Gaema AI@GaemaAI·
Gaema Engine beats cuBLAS by up to 89% on LLM shapes and formats on @NVIDIAAI RTX 5090 Blackwell, pushing effective throughput past 4 TB/s, more than double the card's memory bandwidth.
English
0
0
0
30
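A figure above a card's DRAM bandwidth only makes sense as *effective* bandwidth: the bytes the operands and output occupy, divided by kernel time, where on-chip reuse (L2/SRAM) lets the number exceed what DRAM can physically deliver. A sketch of that metric (the shape and kernel time below are illustrative assumptions, not the engine's published numbers):

```python
# "Effective bandwidth" of a matmul kernel: bytes occupied by A, B,
# and C, divided by kernel time. Cache reuse of operand tiles is how
# the figure can exceed DRAM bandwidth on a ~1.8 TB/s card.
# The shape and timing below are illustrative assumptions.

def effective_bandwidth_tbs(m, n, k, bytes_per_el, kernel_ms):
    # A is m x k, B is k x n, C is m x n; count each matrix once.
    total_bytes = (m * k + k * n + m * n) * bytes_per_el
    return total_bytes / (kernel_ms * 1e-3) / 1e12

# A batched LLM shape: [64 x 8192] x [8192 x 8192] in fp16,
# with an assumed 0.03 ms kernel time.
bw = effective_bandwidth_tbs(64, 8192, 8192, 2, kernel_ms=0.03)
print(f"{bw:.2f} TB/s effective")
```

Note the asymmetry: the 64 MB weight matrix dominates the byte count, so a kernel that keeps activation tiles resident while streaming weights once can post effective numbers well above the DRAM ceiling.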
Gaema AI
Gaema AI@GaemaAI·
Vulkan Compute is a thing of beauty.
English
0
0
0
11
Gaema AI
Gaema AI@GaemaAI·
When the @IntelAI B70 uses more power at idle than a Threadripper and RTX 5090 and there's no documentation to fix it in Linux...
English
0
0
0
28
Gaema AI
Gaema AI@GaemaAI·
@AnushElangovan We developed our own stack with some secret sauce since AMD/Intel stacks showed most computation going to scalar units instead of the systolic arrays. We have two dozen 32GB cards so we haven't tested any models past 35B. Halos would be for that. We can't DM without a follow?
English
0
0
0
119
Gaema AI reposted
𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠
News from @Intel and @SoftBank SAIMEMORY from @VLSI_2026 Paper T17.5
First demo of HB3DM
➡️ 9 layers, 3 micron per stack
➡️ 1 logic + 8 DRAM layers
➡️ 13.7k TSVs/layer with hybrid bonding
➡️ 1.125 GB/layer, so 10 GB per stack
➡️ 0.25 Tb/sec/mm2 bandwidth
➡️ 171 mm2 die, so 10 GB at 5.3 TB/sec/stack
VLSI is held June 14-18 in Honolulu.
English
8
57
303
48.1K
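The per-stack figures in the HB3DM post cross-check arithmetically: 0.25 Tb/s/mm² over the 171 mm² die gives ~42.8 Tb/s, i.e. ~5.3 TB/s per stack as quoted. (1.125 GB × 8 DRAM layers comes to 9 GB, slightly under the quoted 10 GB, so the per-layer figure is presumably rounded.)

```python
# Cross-checking the HB3DM per-stack figures from the VLSI 2026 post:
# bandwidth density x die area should reproduce the quoted per-stack
# bandwidth, and per-layer capacity x DRAM layer count the capacity.

area_mm2 = 171
density_tb_per_mm2 = 0.25                      # Tb/sec/mm2 (bits)
bandwidth_tb = area_mm2 * density_tb_per_mm2   # 42.75 Tb/s
bandwidth_TB = bandwidth_tb / 8                # ~5.34 TB/s (bytes)
print(f"{bandwidth_tb:.2f} Tb/s = {bandwidth_TB:.2f} TB/s per stack")

# Capacity: 1.125 GB per DRAM layer, 8 DRAM layers (+1 logic layer).
capacity_gb = 1.125 * 8
print(f"{capacity_gb} GB of DRAM per stack")
```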
David Hendrickson
David Hendrickson@TeksEdge·
🚀 Major Step Forward for Intel AI Arc Pro
Intel released OpenVINO 2026.1 with a native llama.cpp backend, so it is now fully optimized for the Arc Pro B70 (32GB).
🔥 What this means
• Significantly faster GGUF inference on Intel GPUs
• Much better memory efficiency for 20B–70B models
• Strong single-GPU performance for large local LLMs
• Makes Arc Pro B70 a genuinely competitive option for local AI
Intel’s edge & workstation AI strategy just got a lot more serious. Thank you for the software focus. Link in ALT
English
13
15
104
11.5K
Gaema AI
Gaema AI@GaemaAI·
@AdinaYakup Nvidia is trying hard to keep the moat with CUDA tiles but the floodgates are open.
English
0
0
0
258
Adina Yakup
Adina Yakup@AdinaYakup·
TileLang is an interesting one 👀 In about a year, it went from a new research project to a high-performance kernel language across major accelerators
✨ Jan 2025: Open sourced
✨ Feb 2025: v0.1.0
✨ Mar 2025: MLA decoding in ~80 lines of Python, matching FlashMLA on H100
✨ Apr 2025: AMD MI300X support, matching hand-tuned assembly
✨ Sep 2025:
- Huawei Ascend backend added
- DeepSeek-V3.2-Exp adopts TileLang for key kernels
✨ Apr 2026:
- DeepSeek releases TileKernels (LLM kernel library)
- DeepSeek V4 built on TileLang kernels
- Qwen releases FlashQLA on top of TileLang
TileLang makes writing high-performance GPU kernels easier, offering a viable path beyond CUDA.
English
2
20
136
14.1K