Aaryaman "Jam" Vasishta

614 posts

Aaryaman "Jam" Vasishta

@adyaman

ML @ AMD Former ML+3D Engineer @ Stability AI Ex. AMD Research Engineer, RT & Neural Rendering 2021 Graduate, Computer Graphics Group @ University of Tokyo.

Tokyo, Japan Katılım Şubat 2009

989 Takip Edilen545 Takipçiler

Sabitlenmiş Tweet

Aaryaman "Jam" Vasishta@adyaman·22 Eki

Super happy to see this :) all those late nights fixing and getting ROCm + PyTorch running on Strix Halo with @scottttw on TheRock have finally paid off 🥲 x.com/AnushElangovan… It's now ready and widely available. This is just the beginning, and we're just getting started!

Anush Elangovan@AnushElangovan

Local [Superintelligence + Supercomputing] + Signed by @LisaSu 💻 🚀🚀🚀 Ryzen AI Max+ PRO 395 (Strix Halo) with ROCm * runs GPT-OSS locally * runs Battlefied 6 like a desktop * 16x Zen5 cores for builds @sama @gdb Sarah As promised please tag me if you run into any issues 🤙

English

562

Aaryaman "Jam" Vasishta@adyaman·5d

@xennygrimmato_ @mike64_t Agreed. I tend to find older games running smoother on wine than on windows 11

English

Vaibhav Tulsyan@xennygrimmato_·5d

@mike64_t @adyaman wdyt?

Polski

mike64_t@mike64_t·25 Mar

wine might actually be a better win32 implementation than win32 itself because last time I checked DirectX 9 on Windows 11 is so broken that you need dgVoodoo2 just to have ui rendering in old games

English

109

5.8K

Aaryaman "Jam" Vasishta retweetledi

the tiny corp@__tinygrad__·3 Mar

And now it's documented. For people serious about performance, this is a real AMD advantage. Thanks @AnushElangovan for pushing this through! Code is here: github.com/ROCm/rocm-syst…

English

367

33.6K

Aaryaman "Jam" Vasishta retweetledi

Anush Elangovan@AnushElangovan·17 Şub

BS. Right on target for H2 2026.

Daniel Romero@HyperTechInvest

SemiAnalysis: Engineering samples and low-volume production of $AMD's first rack-scale MI455X system will begin in H2 2026. Due to manufacturing delays, mass production ramp, and the first production tokens generated on the MI455X will not occur until Q2 2027

English

421

67.3K

Aaryaman "Jam" Vasishta retweetledi

Anush Elangovan@AnushElangovan·17 Şub

Speed is the moat. MI455X is both fast, and is the fastest I have seen in execution for bringup of a complex GPU platform. MI455X is right on target for shipments in 2H2026 - Irrespective of what @SemiAnalysis_ says - rev your engines because speed is coming.

English

585

163K

Aaryaman "Jam" Vasishta retweetledi

Anush Elangovan@AnushElangovan·14 Şub

The pace of OSS models suddenly seems to be picking up. We got you covered Day 0 on AMD GPUs

Andy Luo@linluo77

✅ Day 0 support of MiniMax-2.5 on AMD GPU 2 MI300X GPUs are all you need, instead of 4 Hopper GPUs, to run it in full context. uv pip install vllm --extra-index-url lnkd.in/gJdnn3kJ VLLM_ROCM_USE_AITER=1 vllm serve MiniMaxAI/MiniMax-M2.5 \ --tensor-parallel-size 2 \ --tool-call-parser minimax_m2 \ --reasoning-parser minimax_m2_append_think \ --enable-auto-tool-choice \ --trust-remote-code

English

Aaryaman "Jam" Vasishta retweetledi

Lei Zhang@LeiLMx·1 Şub

I wrote a new blog post on Triton bespoke layouts — the traditional blocked / shared / MMA layout mechanisms still widely used inside the Triton compiler and now exposed via Gluon for more intuitive control. This builds on my earlier posts on linear layouts, and together they aim to provide a fairly complete mental model of how Triton represents and reasons about layouts. lei.chat/posts/triton-b…

English

184

15.7K

Aaryaman "Jam" Vasishta retweetledi

Anush Elangovan@AnushElangovan·2 Şub

ROCm 7.2 Perf improvements.. Keep the feedback coming and we will keep pushing.. reddit.com/r/ROCm/comment…

English

4.7K

Aaryaman "Jam" Vasishta retweetledi

Anush Elangovan@AnushElangovan·22 Oca

The future of GPU programming is agentic. reddit.com/r/AMD_Stock/co…

English

386

195.7K

Aaryaman "Jam" Vasishta retweetledi

Anush Elangovan@AnushElangovan·13 Oca

Try it: @bkpaine1/i-use-amd-strix-halo-for-ai-video-inference-and-lora-daily-you-can-too-8b359b97e08c?postPublishedType=repub" target="_blank" rel="nofollow noopener">medium.com/@bkpaine1/i-us…

English

3.9K

Aaryaman "Jam" Vasishta retweetledi

Lei Zhang@LeiLMx·12 Oca

It’s 2026. One of my goals this year is to more consistently write down what I learnt working on GPU software—compilers, runtimes, and performance—so others can benefit (and yes, future AI training material too!). I publish these notes at lei.chat. Many folks have told me they found them useful, so I’m trying to do this more often than once a year. 🙂

English

179

25.2K

Aaryaman "Jam" Vasishta@adyaman·8 Oca

@AshwiniVaishnaw Do consider using AMD’s AI box as well. It’ll be more accessible to a wider audience due to it supporting windows and x86 applications out of the box. amd.com/en/products/pr…

English

253

Ashwini Vaishnaw@AshwiniVaishnaw·8 Oca

Met NVIDIA team and discussed development of sovereign GPUs and manufacturing of edge devices like DGX Spark in Bharat. ✅ This device delivers upto 1 petaFLOP performance with secure inferencing for models upto 200 billion parameters. ✅ This compact GPU doesn’t require Internet. Suitable for railways, shipping, healthcare, education and remote applications.

English

428

2.4K

72.8K

Aaryaman "Jam" Vasishta retweetledi

the tiny corp@__tinygrad__·7 Oca

loving the density and the double wide

English

349

18.7K

Aaryaman "Jam" Vasishta@adyaman·6 Oca

@HotAisle @AnushElangovan This has already been fixed as of rocWMMA 2.1.0. See #issuecomment-3339616481" target="_blank" rel="nofollow noopener">github.com/ROCm/rocWMMA/i… To use it, as you mentioned earlier, you can build using TheRock.

English

Hot Aisle@HotAisle·5 Oca

It is 2026 and I'm still seeing things like this. @AnushElangovan

English

1.9K

Aaryaman "Jam" Vasishta retweetledi

Anush Elangovan@AnushElangovan·1 Oca

I was able to almost one-shot this PR github.com/ROCm/rocm-libr… to remove boost from rocFFT with Opus. It did everything including benchmarking before / after and ensured all tests were similar.

English

2.1K

Aaryaman "Jam" Vasishta@adyaman·1 Oca

With this, I was able to generate a 480p video using TurboDiffusion (Wan2.1 1.3b quantized) in about 2-3 minutes on a strix halo (framework desktop).. all on Windows!

Aaryaman "Jam" Vasishta@adyaman

I have an initial POC of TurboDiffusion, SpargeAttn, triton-windows, all running on AMD Radeon, with assistance from Claude 4.5 Opus w/ cursor: github.com/thu-ml/TurboDi… github.com/thu-ml/SpargeA… github.com/woct0rdho/trit… (1/n)

English

1.1K

Aaryaman "Jam" Vasishta@adyaman·1 Oca

teaching it to setup the proper powershell environment (usage of x64 tools cmd for vs2022 for example) and relevant environment variables. Overall it was quite enjoyable to use Claude w / Cursor to get these ports running as a POC, and I'm hopeful this motivates more such ports.

English

260

Aaryaman "Jam" Vasishta@adyaman·1 Oca

So I had to gently remind it of the RDNA3 ISA bits like max. LDS size, and also teach it to use hipcc -s to dump metrics like VGPR/SGPR spills and scratch usage in order to help it optimize the kernel as much as it could. After that, it was just a matter of (3/n)

English

307

Aaryaman "Jam" Vasishta@adyaman·1 Oca

English

9.8K

Aaryaman "Jam" Vasishta retweetledi

Anush Elangovan@AnushElangovan·18 Ara

⚡ CODE FOR HARDWARE CHALLENGE ⚡ Its the Holidays so we have 20 Strix Halo 128GB Laptops looking for new owners. Want one? The Deal: Fix 10 bugs in the @PyTorch or @vllm_project ROCm backlog. Start Squashing:🐛 PyTorch: github.com/orgs/pytorch/p…🐛 vLLM: github.com/orgs/vllm-proj… Post your 10 merged PRs in the messages below. Claim your Strix Halo 128GB Laptop. Let’s build. 💻🚀 <> No Purchase Necessary. While Supplies Last. Promotion ends when prize supply is exhausted. Open only to legal residents of the 50 United States (D.C.) and Canada. Must be 18 to enter and win. Void where prohibited. Sponsor: @AMD 2485 Augustine Drive, Santa Clara, CA, 95054

English

155

54.7K

Keşfet

@xennygrimmato_ @mike64_t @AnushElangovan @SemiAnalysis_ @AshwiniVaishnaw @HotAisle @elonmusk @BarackObama