Aaryaman "Jam" Vasishta

614 posts

@adyaman

ML @ AMD. Former ML+3D Engineer @ Stability AI. Ex-AMD Research Engineer, RT & Neural Rendering. 2021 Graduate, Computer Graphics Group @ University of Tokyo.

Tokyo, Japan · Joined February 2009
989 Following · 545 Followers
Pinned Tweet
Aaryaman "Jam" Vasishta@adyaman·
Super happy to see this :) all those late nights fixing and getting ROCm + PyTorch running on Strix Halo with @scottttw on TheRock have finally paid off 🥲 x.com/AnushElangovan… It's now ready and widely available. This is just the beginning, and we're just getting started!
Anush Elangovan@AnushElangovan

Local [Superintelligence + Supercomputing] + Signed by @LisaSu 💻 🚀🚀🚀
Ryzen AI Max+ PRO 395 (Strix Halo) with ROCm
* runs GPT-OSS locally
* runs Battlefield 6 like a desktop
* 16x Zen5 cores for builds
@sama @gdb Sarah As promised please tag me if you run into any issues 🤙

mike64_t@mike64_t·
wine might actually be a better win32 implementation than win32 itself, because last time I checked, DirectX 9 on Windows 11 is so broken that you need dgVoodoo2 just to get UI rendering in old games
Aaryaman "Jam" Vasishta retweeted
Anush Elangovan@AnushElangovan·
Speed is the moat. MI455X is fast, and it's the fastest execution I have seen for the bring-up of a complex GPU platform. MI455X is right on target for shipments in 2H2026, irrespective of what @SemiAnalysis_ says. Rev your engines, because speed is coming.
Aaryaman "Jam" Vasishta retweeted
Anush Elangovan@AnushElangovan·
The pace of OSS models suddenly seems to be picking up. We got you covered Day 0 on AMD GPUs
Andy Luo@linluo77

✅ Day 0 support of MiniMax-M2.5 on AMD GPU
2 MI300X GPUs are all you need, instead of 4 Hopper GPUs, to run it in full context.

uv pip install vllm --extra-index-url lnkd.in/gJdnn3kJ

VLLM_ROCM_USE_AITER=1 vllm serve MiniMaxAI/MiniMax-M2.5 \
    --tensor-parallel-size 2 \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --enable-auto-tool-choice \
    --trust-remote-code

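Once `vllm serve` is up, it exposes an OpenAI-compatible HTTP API (http://localhost:8000 by default). A minimal sketch of the request body you would send to it; the helper name and prompt here are illustrative, not from the tweet:

```python
import json

def build_chat_request(model, prompt, max_tokens=256):
    """JSON body for POST /v1/chat/completions on a vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Model name matches the `vllm serve` command above.
payload = build_chat_request("MiniMaxAI/MiniMax-M2.5", "Summarize ROCm in one line.")
print(json.dumps(payload, indent=2))
```

You would POST this body to http://localhost:8000/v1/chat/completions with curl or the `openai` Python client pointed at that base URL.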
Aaryaman "Jam" Vasishta retweeted
Lei Zhang@LeiLMx·
I wrote a new blog post on Triton bespoke layouts — the traditional blocked / shared / MMA layout mechanisms still widely used inside the Triton compiler and now exposed via Gluon for more intuitive control. This builds on my earlier posts on linear layouts, and together they aim to provide a fairly complete mental model of how Triton represents and reasons about layouts. lei.chat/posts/triton-b…
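For readers new to the topic, the core idea behind a blocked layout can be sketched in plain Python. This is a toy model, not Triton's actual machinery; the function name and shapes are illustrative:

```python
# Toy model of a "blocked" layout: distribute a 1-D tensor over threads
# so each thread owns one contiguous block of elements. Triton's real
# layouts generalize this to multiple dimensions, warps, and MMA tiles.
def blocked_layout(size, num_threads):
    assert size % num_threads == 0, "assume an even split for simplicity"
    block = size // num_threads  # elements owned by each thread
    return {t: list(range(t * block, (t + 1) * block)) for t in range(num_threads)}

# 16 elements over 4 threads: thread 0 owns [0..3], thread 1 owns [4..7], ...
print(blocked_layout(16, 4))
```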
Aaryaman "Jam" Vasishta retweeted
Anush Elangovan@AnushElangovan·
Try it: medium.com/@bkpaine1/i-us…
Aaryaman "Jam" Vasishta retweeted
Lei Zhang@LeiLMx·
It’s 2026. One of my goals this year is to more consistently write down what I learnt working on GPU software—compilers, runtimes, and performance—so others can benefit (and yes, future AI training material too!). I publish these notes at lei.chat. Many folks have told me they found them useful, so I’m trying to do this more often than once a year. 🙂
Ashwini Vaishnaw@AshwiniVaishnaw·
Met NVIDIA team and discussed development of sovereign GPUs and manufacturing of edge devices like DGX Spark in Bharat. ✅ This device delivers up to 1 petaFLOP performance with secure inferencing for models up to 200 billion parameters. ✅ This compact GPU doesn't require Internet. Suitable for railways, shipping, healthcare, education and remote applications.
Ashwini Vaishnaw tweet media
Aaryaman "Jam" Vasishta retweeted
the tiny corp@__tinygrad__·
loving the density and the double wide
the tiny corp tweet media
Aaryaman "Jam" Vasishta retweeted
Anush Elangovan@AnushElangovan·
I was able to almost one-shot this PR github.com/ROCm/rocm-libr… to remove Boost from rocFFT with Opus. It did everything, including benchmarking before/after, and ensured all test results stayed the same.
Aaryaman "Jam" Vasishta
teaching it to set up the proper PowerShell environment (e.g., using the x64 Native Tools Command Prompt for VS 2022) and the relevant environment variables. Overall it was quite enjoyable to use Claude with Cursor to get these ports running as a POC, and I'm hopeful this motivates more such ports.
Aaryaman "Jam" Vasishta
So I had to gently remind it of RDNA3 ISA details like the max LDS size, and also teach it to use hipcc -s to dump metrics like VGPR/SGPR spills and scratch usage, in order to help it optimize the kernel as much as it could. After that, it was just a matter of (3/n)
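To make the LDS point concrete: the LDS a workgroup allocates caps how many workgroups can be resident at once. A back-of-the-envelope sketch; the 64 KiB pool size and the per-workgroup figure are illustrative assumptions, not taken from the thread:

```python
# LDS-limited residency: with a fixed LDS pool, a kernel that allocates
# lds_per_wg_bytes per workgroup can keep at most pool // per-workgroup
# workgroups resident (assuming LDS, not VGPRs, is the limiter).
def lds_limited_workgroups(lds_pool_bytes, lds_per_wg_bytes):
    assert lds_per_wg_bytes > 0, "kernel must actually use LDS for this limit"
    return lds_pool_bytes // lds_per_wg_bytes

# 64 KiB pool, 20 KiB per workgroup -> 3 resident workgroups
print(lds_limited_workgroups(64 * 1024, 20 * 1024))
```

This is the kind of arithmetic the hipcc resource-usage metrics (LDS, VGPR/SGPR, scratch) feed into when tuning occupancy.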
Aaryaman "Jam" Vasishta retweeted
Anush Elangovan@AnushElangovan·
⚡ CODE FOR HARDWARE CHALLENGE ⚡ It's the Holidays, so we have 20 Strix Halo 128GB Laptops looking for new owners. Want one? The Deal: Fix 10 bugs in the @PyTorch or @vllm_project ROCm backlog. Start Squashing: 🐛 PyTorch: github.com/orgs/pytorch/p… 🐛 vLLM: github.com/orgs/vllm-proj… Post your 10 merged PRs in the messages below. Claim your Strix Halo 128GB Laptop. Let's build. 💻🚀 <> No Purchase Necessary. While Supplies Last. Promotion ends when prize supply is exhausted. Open only to legal residents of the 50 United States (D.C.) and Canada. Must be 18 to enter and win. Void where prohibited. Sponsor: @AMD 2485 Augustine Drive, Santa Clara, CA, 95054
Anush Elangovan tweet media