Jonghyun Choe

42 posts

@jonghyunc_

incoming inference @NVIDIA trt-llm | ml systems @CarnegieMellon

Joined March 2024
263 Following · 95 Followers
Jonghyun Choe retweeted
Yixin Dong @yi_xin_dong
Introducing XGrammar-2: structured generation for complex agent harnesses. Strict tool-calling formats. Built-in DeepSeek-V4 and Qwen-3.6 support. Up to 80x speedup over XGrammar. Ready-to-use integrations with vLLM, SGLang, TensorRT-LLM, and more! ⚡
From Claude Code to OpenClaw, agents are defining more complex harnesses. XGrammar-2 ensures LLMs always interact with them in the right way. Built in collaboration with DeepSeek, Databricks, and leading frontier AI labs to bring XGrammar-2 into the latest models and products.
🧩 Structural Tag: one unified abstraction to describe any format your agent needs
🚀 Scales to 500+ strictly typed tools for complex agent harnesses
🌐 Native APIs in Python, C++, Rust, and JS, running everywhere from cloud to edge
🛠️ Integrated with vLLM, SGLang, TensorRT-LLM, and more
Excited to see what agent builders create with it!
Blog: blog.mlc.ai/2026/05/04/xgr…
GitHub: github.com/mlc-ai/xgrammar
[image]
6 replies · 43 retweets · 98 likes · 15.3K views
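For context on what structured generation means here: at each decoding step, a grammar engine of this kind computes which vocabulary tokens can legally extend the output so far and masks out the rest before sampling. A minimal sketch of that idea, with a hypothetical character-level vocabulary and toy DFA rather than the XGrammar-2 API:

```python
import math
import random

# Toy vocabulary and a tiny DFA that only accepts strings like {"aa...a"}.
# Hypothetical grammar for illustration; NOT the XGrammar-2 Structural Tag API.
VOCAB = ["{", "}", '"', "a", "<eos>"]
TRANSITIONS = {
    0: {"{": 1},          # start: must open a brace
    1: {'"': 2},          # then an opening quote
    2: {"a": 3},          # at least one 'a'
    3: {"a": 3, '"': 4},  # more 'a's, or close the quote
    4: {"}": 5},          # close the brace
    5: {"<eos>": 5},      # accepting state: only end-of-sequence allowed
}

def constrained_argmax(logits, state):
    """Mask tokens the grammar forbids in `state`, then pick the best one."""
    masked = [
        logit if VOCAB[i] in TRANSITIONS[state] else -math.inf
        for i, logit in enumerate(logits)
    ]
    return max(range(len(VOCAB)), key=lambda i: masked[i])

random.seed(0)
state, out = 0, []
while True:
    logits = [random.gauss(0, 1) for _ in VOCAB]  # stand-in for model logits
    token = constrained_argmax(logits, state)
    if VOCAB[token] == "<eos>":
        break
    out.append(VOCAB[token])
    state = TRANSITIONS[state][VOCAB[token]]
print("".join(out))  # always grammar-valid, e.g. {"aa"}
```

Sampling proceeds over the masked logits, so the model can never emit a token that breaks the format; production engines do the same thing with compiled grammars and vocabulary-sized token bitmasks.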
Jonghyun Choe @jonghyunc_
@junupark_ Yeah, more or less. I think it is a matter of timing and of how the remaining problems with data and reward hacking get solved. One caveat: LLMs perform better with an easier DSL interface like Triton but are still terrible at CUDA, which is needed for full performance.
0 replies · 0 retweets · 1 like · 38 views
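The Triton point is easier to see in code: a complete vector-add kernel in the DSL fits in a few lines because block mapping, masking, and memory coalescing are handled by the compiler, whereas the CUDA equivalent needs explicit thread indexing and host-side plumbing. A standard Triton example (assumes the triton package and a CUDA GPU):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized chunk of the vectors.
    offsets = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements              # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)           # one program per 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
assert torch.allclose(add(x, y), x + y)
```

Squeezing full performance out of kernels like attention is a different story, which is the CUDA gap the reply describes.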
Junu Park @junupark_
@jonghyunc_ I personally think AI-generated kernels will be the future (with some more techniques of RL & synthetic data etc.), but I still think it gives leverage to the MLSys people who can actually understand the generated result
1 reply · 0 retweets · 1 like · 210 views
Jonghyun Choe @jonghyunc_
A younger friend, worried about AI-generated coding, asked about my plans in the MLSys space. On one side there are efforts to automate optimizations and kernels through compilers and AI; on the other, kernel engineers do manual optimizations to squeeze out the last drop of performance. With inference demand from coding agents exploding and the new Rubin GPU architecture coming out, there is always an incentive to chase maximum performance, and I don't think the demand for strong kernel engineers will go away within the next 2-3 years. But AI-generated kernels are improving quickly, and ignoring that trend would be a mistake too.

My plan is to build expertise in the kernel / compiler / scheduling layer and in agentic systems at the same time, so I can operate in both roles. Currently I am working on manual kernel performance and AI-generated kernels in the lab, and this summer I will be working on automatic inference optimization at Nvidia. Given how quickly the field is changing, it's better to stay flexible than to commit to one fixed plan for the whole timeline.
3 replies · 0 retweets · 20 likes · 1.7K views
Adam Mainz @MainzOnX
I’m excited to officially announce I have joined the torchTPU team @Google! My years at @metaai taught me so much about GPU programming and ML / ML systems. From the start to the very last moment, @PyTorch was a big part of my journey, especially on the Triton team, where I worked directly on my favorite backend for torch. The move from GPU to TPU is an exciting one, and I will still very much be involved in GPU land, either outside of work or within it as the ecosystem grows. Where I’m able to contribute to open source, you can expect a lot more TPU content as I learn. Very excited for my Google journey, and excited to help expand PyTorch as my career grows! Dropping the torchTPU announcement post from Tuesday in the comments too
[image]
37 replies · 6 retweets · 316 likes · 63.6K views
Jonghyun Choe @jonghyunc_
@MainzOnX haha thanks! excited for the summer and getting hands-on
0 replies · 0 retweets · 1 like · 25 views
Adam Mainz @MainzOnX
@jonghyunc_ Thanks, and good luck at NVIDIA with your internship! Just saw it on your profile
1 reply · 0 retweets · 1 like · 194 views
Adam Mainz @MainzOnX
Goodbye unemployment! New job starting today
[GIF]
7 replies · 0 retweets · 21 likes · 892 views
Jonghyun Choe retweeted
Tianqi Chen @tqchenml
We had a blast of a day at the CMU Catalyst Research Summit, bringing together 120+ attendees around the future of agentic AI and systems across the full stack, from applications down to compilers and kernels
Zhihao Jia @JiaZhihao

Excited to see our inaugural CMU Catalyst Research Summit bring together 120+ attendees! A full day of discussions on the future of agentic AI systems, multi-modal AI, and ML compilation—with amazing energy from both academia and industry. Co-organized with @tqchenml @BeidiChen @Tim_Dettmers — this is just the beginning 🚀

0 replies · 11 retweets · 53 likes · 6.3K views
Jonghyun Choe @jonghyunc_
@MainzOnX would love to hear how HW/SW co-design is done at Meta and how it affects kernel design, e.g. how collaboration with the accelerator, compiler, and inference engine teams works. The general kernel optimization workflow (profiling, DSL selection, key tradeoffs) would be super helpful too!
1 reply · 0 retweets · 10 likes · 342 views
Adam Mainz @MainzOnX
Thinking about writing blog posts / articles here again. Any topics people want? ML inference, kernel perf, cool projects from Meta, etc.?
20 replies · 6 retweets · 111 likes · 17.5K views
Jonghyun Choe retweeted
Hieu Pham @hyhieu226
Inference compute will be the dominant workload.
Awni Hannun @awnihannun

Inference compute is on track to be a massive computational workload by the end of this decade. I think it will be much bigger than training (especially if you consider RL rollouts / inference needs for training). And it's still an open playing field in terms of the hardware, the platforms, and the models. It's also increasingly clear that people are willing to pay a premium for reduced latency.

On the hardware side there are several interesting directions to keep an eye on:
- SRAM-style setups seem promising (GPT Spark on Cerebras, Groq acquisition by Nvidia)
- Disaggregated systems (prefill on one machine / processor, generation on a different one) probably make a lot of sense. The computational characteristics of prefill vs decode are so different that specializing at the hardware level will yield efficiency gains
- I also wouldn't discount more exotic technology like the Taalas chip / near-memory computing / etc. While they are still pretty far out from large-scale deployment, the economic pressure for efficiency gains could be a catalyst

On the algorithm / architecture side:
- Pretty much every major open-weights model has at least one optimization which makes it faster for inference, whether it be MoE, SSM (or other hybrid variety), or sliding-window or sparse attention. There are more differences here than there were a year ago, and it will be interesting to see where we converge.
- Will diffusion models unify the prefill / decode split?
- Still believe there are big gains to be had in further co-design of model to hardware and workload

I also don't think we will have a one-size-fits-all solution in the future:
- Cloud-based models may look very different than edge-optimized models
- Models may be more and more co-designed for the hardware they are deployed on
- There will be at least one knob which trades off latency and power efficiency / cost.

5 replies · 7 retweets · 149 likes · 21.8K views
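The prefill/decode contrast supports a quick roofline estimate. A sketch with assumed, illustrative numbers (a 70B-parameter dense model in FP8, ~1 PFLOP/s of compute, 3 TB/s of HBM bandwidth; none of these are measurements):

```python
# Back-of-envelope roofline for the prefill/decode split discussed above.
# All hardware and model numbers are assumptions chosen for illustration.
PARAMS = 70e9      # dense model parameters
BYTES = 70e9       # weight bytes at 1 byte/param (FP8)
FLOPS = 1000e12    # assumed peak FLOP/s
BW = 3e12          # assumed HBM bytes/s

def step_time(batch_tokens: int) -> tuple[float, float]:
    """Lower bounds for one forward pass over `batch_tokens` tokens."""
    compute = 2 * PARAMS * batch_tokens / FLOPS  # ~2 FLOPs per param per token
    memory = BYTES / BW                          # weights read once per pass
    return compute, memory

for name, toks in [("prefill (4096-token prompt)", 4096),
                   ("decode (1 token, batch 1)", 1)]:
    c, m = step_time(toks)
    bound = "compute-bound" if c > m else "memory-bound"
    print(f"{name}: compute {c*1e3:.2f} ms vs memory {m*1e3:.2f} ms -> {bound}")
```

Under these assumptions a long prefill is bounded by compute while batch-1 decode is bounded by weight bandwidth, which is the specialization opportunity the thread points to.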
Jonghyun Choe retweeted
Saksham @sgdescent
Started an ML sys reading group with friends @SCSatCMU. Systolic arrays are so cool!
[image]
5 replies · 5 retweets · 99 likes · 5.1K views
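For context on systolic arrays: operands stream through a grid of multiply-accumulate cells with a one-cycle skew per row and column, so each pair of inputs meets exactly one cell at exactly one cycle and no cell ever fetches from global memory. A toy cycle-level simulation of an output-stationary array (illustrative only, not modeling any particular chip):

```python
# Tiny cycle-level simulation of an output-stationary systolic array computing
# C = A @ B. PE(i, j) holds one accumulator; A streams from the left and B
# from the top, each skewed one cycle per row/column, so operand a[i][k]
# meets b[k][j] at PE(i, j) exactly at cycle t = i + j + k.
import random

def systolic_matmul(A, B):
    m, k_dim, n = len(A), len(A[0]), len(B[0])
    C = [[0] * n for _ in range(m)]
    last_cycle = m + n + k_dim - 3          # cycle of the final MAC
    for t in range(last_cycle + 1):
        for i in range(m):
            for j in range(n):
                k = t - i - j               # which operand pair arrives now
                if 0 <= k < k_dim:
                    C[i][j] += A[i][k] * B[k][j]
    return C

random.seed(0)
A = [[random.randint(0, 9) for _ in range(3)] for _ in range(2)]
B = [[random.randint(0, 9) for _ in range(4)] for _ in range(3)]
ref = [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(4)]
       for i in range(2)]
assert systolic_matmul(A, B) == ref        # matches a plain matmul
```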
Jonghyun Choe retweeted
NVIDIA AI Developer @NVIDIAAIDev
From PyTorch to production-ready engines in days, not weeks 💡 With TensorRT-LLM's new AutoDeploy feature, developers can automate the heavy lifting of inference optimization and reduce deployment time. Dive into architecture, performance numbers, and examples 👉 nvda.ws/4qwFEhe
[image]
13 replies · 6 retweets · 93 likes · 4.5K views
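For orientation, a sketch of the high-level TensorRT-LLM Python flow that AutoDeploy plugs into; the model name is a placeholder, and AutoDeploy's own entry point and options are covered in the linked post, not shown here:

```python
# Minimal sketch of TensorRT-LLM's high-level LLM API: point it at a
# Hugging Face checkpoint and let the library handle engine building and
# optimization. Model name is a placeholder; AutoDeploy-specific
# configuration is described in the linked post.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # any HF checkpoint
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What does disaggregated serving mean?"], params):
    print(output.outputs[0].text)
```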
Jonghyun Choe retweeted
Logan Kilpatrick @OfficialLoganK
the world rewards audacity, not potential
238 replies · 883 retweets · 7K likes · 417.3K views
Jonghyun Choe retweeted
Ion Stoica @istoica05
@ACMSIGOPS @ai4research_ucb AI isn’t replacing systems researchers; it’s shifting the job from hand-crafting algorithms to defining problems and verifiers.
1 reply · 15 retweets · 81 likes · 3.8K views
Jonghyun Choe retweeted
Zihao Ye @ye_combinator
🚀 MLSys 2026 Contest - @nvidia Track is LIVE! Registration is now open for the FlashInfer-Bench Challenge! Submit high-performance GPU kernels for cutting-edge LLM architectures on NVIDIA Blackwell GPUs.
Three tracks:
* MoE (Mixture of Experts)
* DSA (DeepSeek Sparse Attention)
* GDN (Gated Delta Net)
Human experts AND AI agents welcome — evaluated separately. Let's see who builds the best kernels! 🤖
🎁 Prizes: winners take home NVIDIA GPUs and are invited to present at MLSys 2026.
⚡ First 50 teams to register get free GPU credits from @modal - huge thanks for the sponsorship @charles_irl!
Whether you're a kernel wizard or building autonomous coding agents, we want to see what you've got.
🔗 Contest details: mlsys26.flashinfer.ai
See you at MLSys 2026! 🔥
4 replies · 57 retweets · 296 likes · 74K views
Mingyi Lu @mingyilu123
My Thanksgiving gift this year is the official @lmsysorg account finally reaching 10K followers.

Running an account is so much harder than I expected. I used to only read and never write. A tweet takes just a few seconds to read and scroll past, and half the time I didn’t even remember to tap Like. After doing it myself, I realized how much work sits behind those few seconds of attention. Sometimes it’s hours, sometimes it’s days of preparation.

Respect to everyone creating content. It takes real effort and heart. Thank you to all the people who keep putting great work out there and making the online world more interesting.
[image]
5 replies · 1 retweet · 38 likes · 16.4K views
Jonghyun Choe @jonghyunc_
Fun meeting the xAI team in New York
[3 images]
0 replies · 0 retweets · 3 likes · 203 views
Jonghyun Choe retweeted
Chris Park @chrisparkX
our first xAI NYC Tech Event 🔥 more than 600 engineering students from top universities across the country showed up, all excited to learn about xAI and visibly motivated to join the team and contribute to xAI and X. great work by the entire team, who took time out of their weekend to connect and inspire the next generation of eng talent 👏
[4 images]
65 replies · 85 retweets · 1K likes · 80.1K views