Jonghyun Choe

42 posts

@jonghyunc_

incoming inference @NVIDIA trt-llm | ml systems @CarnegieMellon

Joined March 2024
263 Following · 95 Followers
Jonghyun Choe retweeted
Yixin Dong @yi_xin_dong
Introducing XGrammar-2: structured generation for complex agent harnesses. Strict tool-calling formats. Built-in DeepSeek-V4 and Qwen-3.6 support. Up to 80x speedup over XGrammar. Ready-to-use integrations with vLLM, SGLang, TensorRT-LLM, and more! ⚡
From Claude Code to OpenClaw, agents are defining more complex harnesses. XGrammar-2 ensures LLMs always interact with them in the right way. Built in collaboration with DeepSeek, Databricks, and leading frontier AI labs to bring XGrammar-2 into the latest models and products.
🧩 Structural Tag: one unified abstraction to describe any format your agent needs
🚀 Scales to 500+ strictly typed tools for complex agent harnesses
🌐 Native APIs in Python, C++, Rust, and JS, running everywhere from cloud to edge
🛠️ Integrated with vLLM, SGLang, TensorRT-LLM, and more
Excited to see what agent builders create with it!
Blog: blog.mlc.ai/2026/05/04/xgr…
GitHub: github.com/mlc-ai/xgrammar
[image]
6 replies · 43 retweets · 98 likes · 15.3K views
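For context on what structured generation means here: at each decoding step, a grammar engine of this kind computes which vocabulary tokens can legally extend the output so far and masks out the rest before sampling. A minimal sketch of that idea, with a hypothetical character-level vocabulary and toy DFA rather than the XGrammar-2 API:

```python
import math
import random

# Toy vocabulary and a tiny DFA that only accepts strings like {"aa...a"}.
# Hypothetical grammar for illustration; NOT the XGrammar-2 Structural Tag API.
VOCAB = ["{", "}", '"', "a", "<eos>"]
TRANSITIONS = {
    0: {"{": 1},          # start: must open a brace
    1: {'"': 2},          # then an opening quote
    2: {"a": 3},          # at least one 'a'
    3: {"a": 3, '"': 4},  # more 'a's, or close the quote
    4: {"}": 5},          # close the brace
    5: {"<eos>": 5},      # accepting state: only end-of-sequence allowed
}

def constrained_argmax(logits, state):
    """Mask tokens the grammar forbids in `state`, then pick the best one."""
    masked = [
        logit if VOCAB[i] in TRANSITIONS[state] else -math.inf
        for i, logit in enumerate(logits)
    ]
    return max(range(len(VOCAB)), key=lambda i: masked[i])

random.seed(0)
state, out = 0, []
while True:
    logits = [random.gauss(0, 1) for _ in VOCAB]  # stand-in for model logits
    token = constrained_argmax(logits, state)
    if VOCAB[token] == "<eos>":
        break
    out.append(VOCAB[token])
    state = TRANSITIONS[state][VOCAB[token]]
print("".join(out))  # always grammar-valid, e.g. {"aa"}
```

Sampling proceeds over the masked logits, so the model can never emit a token that breaks the format; production engines do the same thing with compiled grammars and vocabulary-sized token bitmasks.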
Jonghyun Choe @jonghyunc_
@junupark_ Yeah, more or less. I think it is a matter of timing and of how the remaining problems with data and reward hacking get solved. One caveat: LLMs perform better with an easier DSL interface like Triton but are still terrible at CUDA, which is needed for full performance.
0 replies · 0 retweets · 1 like · 38 views
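The Triton point is easier to see in code: a complete vector-add kernel in the DSL fits in a few lines because block mapping, masking, and memory coalescing are handled by the compiler, whereas the CUDA equivalent needs explicit thread indexing and host-side plumbing. A standard Triton example (assumes the triton package and a CUDA GPU):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized chunk of the vectors.
    offsets = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements              # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)           # one program per 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
assert torch.allclose(add(x, y), x + y)
```

Squeezing full performance out of kernels like attention is a different story, which is the CUDA gap the reply describes.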
Junu Park @junupark_
@jonghyunc_ I personally think AI-generated kernels will be the future (with some more techniques of RL & synthetic data etc.), but I still think it gives leverage to the MLSys people who can actually understand the generated result
1 reply · 0 retweets · 1 like · 210 views
Jonghyun Choe @jonghyunc_
A younger friend, worried about AI-generated coding, asked about my plans in the MLSys space. On one side there are efforts to automate optimizations and kernels through compilers and AI; on the other, kernel engineers do manual optimizations to squeeze out the last drop of performance. With inference demand from coding agents exploding and the new Rubin GPU architecture coming out, there is always an incentive to chase maximum performance, and I don't think the demand for strong kernel engineers will go away within the next 2-3 years. But AI-generated kernels are improving quickly, and ignoring that trend would be a mistake too.

My plan is to build expertise in the kernel / compiler / scheduling layer and in agentic systems at the same time, so I can operate in both roles. Currently I am working on manual kernel performance and AI-generated kernels in the lab, and this summer I will be working on automatic inference optimization at Nvidia. Given how quickly the field is changing, it's better to stay flexible than to commit to one fixed plan for the whole timeline.
3 replies · 0 retweets · 20 likes · 1.7K views
Adam Mainz @MainzOnX
I’m excited to officially announce I have joined the torchTPU team @Google! My years at @metaai taught me so much about GPU programming and ML / ML systems. From the start to the very last moment, @PyTorch was a big part of my journey, especially on the Triton team, where I worked directly on my favorite backend for torch. The move from GPU to TPU is an exciting one, and I will still very much be involved in GPU land, either outside of work or within it as the ecosystem grows. Where I’m able to contribute to open source, you can expect a lot more TPU content as I learn. Very excited for my Google journey, and excited to help expand PyTorch as my career grows! Dropping the torchTPU announcement post from Tuesday in the comments too
[image]
37 replies · 6 retweets · 316 likes · 63.6K views
Jonghyun Choe @jonghyunc_
@MainzOnX haha thanks! excited for the summer and getting hands-on
0 replies · 0 retweets · 1 like · 25 views
Adam Mainz @MainzOnX
@jonghyunc_ Thanks, and good luck at NVIDIA with your internship! Just saw it on your profile
1 reply · 0 retweets · 1 like · 194 views
Adam Mainz @MainzOnX
Goodbye unemployment! New job starting today
[GIF]
7 replies · 0 retweets · 21 likes · 892 views
Jonghyun Choe retweeted
Tianqi Chen @tqchenml
We had a blast of a day at the CMU Catalyst Research Summit, bringing together 120+ attendees around the future of agentic AI and systems across the full stack, from applications down to compilers and kernels
Zhihao Jia @JiaZhihao

Excited to see our inaugural CMU Catalyst Research Summit bring together 120+ attendees! A full day of discussions on the future of agentic AI systems, multi-modal AI, and ML compilation—with amazing energy from both academia and industry. Co-organized with @tqchenml @BeidiChen @Tim_Dettmers — this is just the beginning 🚀

0 replies · 11 retweets · 53 likes · 6.3K views
Jonghyun Choe @jonghyunc_
@MainzOnX would love to hear how HW/SW co-design is done at Meta and how it affects kernel design, e.g. how collaboration with the accelerator, compiler, and inference engine teams works. The general kernel optimization workflow (profiling, DSL selection, key tradeoffs) would be super helpful too!
1 reply · 0 retweets · 10 likes · 342 views
Adam Mainz @MainzOnX
Thinking about writing blog posts / articles here again. Any topics people want? ML inference, kernel perf, cool projects from Meta, etc.?
20 replies · 6 retweets · 111 likes · 17.5K views
Jonghyun Choe retweeted
Hieu Pham @hyhieu226
Inference compute will be the dominant workload.
Awni Hannun @awnihannun

Inference compute is on track to be a massive computational workload by the end of this decade. I think it will be much bigger than training (especially if you consider RL rollouts / inference needs for training). And it's still an open playing field in terms of the hardware, the platforms, and the models. It's also increasingly clear that people are willing to pay a premium for reduced latency.

On the hardware side there are several interesting directions to keep an eye on:
- SRAM-style setups seem promising (GPT Spark on Cerebras, Groq acquisition by Nvidia)
- Disaggregated systems (prefill on one machine / processor, generation on a different one) probably make a lot of sense. The computational characteristics of prefill vs decode are so different that specializing at the hardware level will yield efficiency gains
- I also wouldn't discount more exotic technology like the Taalas chip / near-memory computing / etc. While they are still pretty far out from large-scale deployment, the economic pressure for efficiency gains could be a catalyst

On the algorithm / architecture side:
- Pretty much every major open-weights model has at least one optimization which makes it faster for inference, whether it be MoE, SSM (or other hybrid variety), or sliding-window or sparse attention. There are more differences here than there were a year ago, and it will be interesting to see where we converge.
- Will diffusion models unify the prefill / decode split?
- Still believe there are big gains to be had in further co-design of model to hardware and workload

I also don't think we will have a one-size-fits-all solution in the future:
- Cloud-based models may look very different than edge-optimized models
- Models may be more and more co-designed for the hardware they are deployed on
- There will be at least one knob which trades off latency and power efficiency / cost.

5 replies · 7 retweets · 149 likes · 21.8K views
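The prefill/decode contrast supports a quick roofline estimate. A sketch with assumed, illustrative numbers (a 70B-parameter dense model in FP8, ~1 PFLOP/s of compute, 3 TB/s of HBM bandwidth; none of these are measurements):

```python
# Back-of-envelope roofline for the prefill/decode split discussed above.
# All hardware and model numbers are assumptions chosen for illustration.
PARAMS = 70e9      # dense model parameters
BYTES = 70e9       # weight bytes at 1 byte/param (FP8)
FLOPS = 1000e12    # assumed peak FLOP/s
BW = 3e12          # assumed HBM bytes/s

def step_time(batch_tokens: int) -> tuple[float, float]:
    """Lower bounds for one forward pass over `batch_tokens` tokens."""
    compute = 2 * PARAMS * batch_tokens / FLOPS  # ~2 FLOPs per param per token
    memory = BYTES / BW                          # weights read once per pass
    return compute, memory

for name, toks in [("prefill (4096-token prompt)", 4096),
                   ("decode (1 token, batch 1)", 1)]:
    c, m = step_time(toks)
    bound = "compute-bound" if c > m else "memory-bound"
    print(f"{name}: compute {c*1e3:.2f} ms vs memory {m*1e3:.2f} ms -> {bound}")
```

Under these assumptions a long prefill is bounded by compute while batch-1 decode is bounded by weight bandwidth, which is the specialization opportunity the thread points to.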
Jonghyun Choe retweeted
Saksham @sgdescent
Started an ML sys reading group with friends @SCSatCMU. Systolic arrays are so cool!
[image]
5 replies · 5 retweets · 99 likes · 5.1K views
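For context on systolic arrays: operands stream through a grid of multiply-accumulate cells with a one-cycle skew per row and column, so each pair of inputs meets exactly one cell at exactly one cycle and no cell ever fetches from global memory. A toy cycle-level simulation of an output-stationary array (illustrative only, not modeling any particular chip):

```python
# Tiny cycle-level simulation of an output-stationary systolic array computing
# C = A @ B. PE(i, j) holds one accumulator; A streams from the left and B
# from the top, each skewed one cycle per row/column, so operand a[i][k]
# meets b[k][j] at PE(i, j) exactly at cycle t = i + j + k.
import random

def systolic_matmul(A, B):
    m, k_dim, n = len(A), len(A[0]), len(B[0])
    C = [[0] * n for _ in range(m)]
    last_cycle = m + n + k_dim - 3          # cycle of the final MAC
    for t in range(last_cycle + 1):
        for i in range(m):
            for j in range(n):
                k = t - i - j               # which operand pair arrives now
                if 0 <= k < k_dim:
                    C[i][j] += A[i][k] * B[k][j]
    return C

random.seed(0)
A = [[random.randint(0, 9) for _ in range(3)] for _ in range(2)]
B = [[random.randint(0, 9) for _ in range(4)] for _ in range(3)]
ref = [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(4)]
       for i in range(2)]
assert systolic_matmul(A, B) == ref        # matches a plain matmul
```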
Jonghyun Choe retweeted
NVIDIA AI Developer @NVIDIAAIDev
From PyTorch to production-ready engines in days, not weeks 💡 With TensorRT-LLM's new AutoDeploy feature, developers can automate the heavy lifting of inference optimization and reduce deployment time. Dive into architecture, performance numbers, and examples 👉 nvda.ws/4qwFEhe
[image]
13 replies · 6 retweets · 93 likes · 4.5K views
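For orientation, a sketch of the high-level TensorRT-LLM Python flow that AutoDeploy plugs into; the model name is a placeholder, and AutoDeploy's own entry point and options are covered in the linked post, not shown here:

```python
# Minimal sketch of TensorRT-LLM's high-level LLM API: point it at a
# Hugging Face checkpoint and let the library handle engine building and
# optimization. Model name is a placeholder; AutoDeploy-specific
# configuration is described in the linked post.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # any HF checkpoint
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What does disaggregated serving mean?"], params):
    print(output.outputs[0].text)
```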
Jonghyun Choe retweeted
Logan Kilpatrick @OfficialLoganK
the world rewards audacity, not potential
238 replies · 883 retweets · 7K likes · 417.3K views
Jonghyun Choe retweeted
Ion Stoica @istoica05
@ACMSIGOPS @ai4research_ucb AI isn’t replacing systems researchers; it’s shifting the job from hand-crafting algorithms to defining problems and verifiers.
1 reply · 15 retweets · 81 likes · 3.8K views
Jonghyun Choe retweeted
Zihao Ye @ye_combinator
🚀 MLSys 2026 Contest - @nvidia Track is LIVE! Registration is now open for the FlashInfer-Bench Challenge! Submit high-performance GPU kernels for cutting-edge LLM architectures on NVIDIA Blackwell GPUs.
Three tracks:
* MoE (Mixture of Experts)
* DSA (DeepSeek Sparse Attention)
* GDN (Gated Delta Net)
Human experts AND AI agents welcome — evaluated separately. Let's see who builds the best kernels! 🤖
🎁 Prizes: winners take home NVIDIA GPUs and are invited to present at MLSys 2026.
⚡ First 50 teams to register get free GPU credits from @modal - huge thanks for the sponsorship @charles_irl!
Whether you're a kernel wizard or building autonomous coding agents, we want to see what you've got.
🔗 Contest details: mlsys26.flashinfer.ai
See you at MLSys 2026! 🔥
4 replies · 57 retweets · 296 likes · 74K views
Mingyi Lu @mingyilu123
My Thanksgiving gift this year is the official @lmsysorg account finally reaching 10K followers.

Running an account is so much harder than I expected. I used to only read and never write. A tweet takes just a few seconds to read and scroll past, and half the time I didn’t even remember to tap Like. After doing it myself, I realized how much work sits behind those few seconds of attention. Sometimes it’s hours, sometimes it’s days of preparation.

Respect to everyone creating content. It takes real effort and heart. Thank you to all the people who keep putting great work out there and making the online world more interesting.
[image]
5 replies · 1 retweet · 38 likes · 16.4K views
Jonghyun Choe @jonghyunc_
Fun meeting the xAI team in New York
[3 images]
0 replies · 0 retweets · 3 likes · 203 views
Jonghyun Choe retweeted
Chris Park @chrisparkX
our first xAI NYC Tech Event 🔥 more than 600 engineering students from top universities across the country showed up, all excited to learn about xAI and visibly motivated to join the team and contribute to xAI and X. great work by the entire team, who took time out of their weekend to connect and inspire the next generation of eng talent 👏
[4 images]
65 replies · 85 retweets · 1K likes · 80.1K views