Senthilkumar Gopal

1.2K posts

@sengopal

❤️ to code and solve new problems every day @NVIDIAAI for planet-scale distributed LLM inference | @GeorgiaTech | Opinions only my own.

California, USA · Joined May 2009

134 Following · 234 Followers

Senthilkumar Gopal retweeted

Chris Fregly@cfregly·5d

Join me today at 9am PT for an awesome set of talks on performance highlights from @nvidia GTC 2026 and AI Inference, including disaggregated prefill-decode with NVIDIA GPUs and RadixAttention meetup.com/ai-performance…

Senthilkumar Gopal retweeted

NVIDIA Newsroom@nvidianewsroom·17 Mar

.@togethercompute is bringing its inference research with NVIDIA Dynamo 1.0 to deliver an accelerated, cost-effective inference stack for production AI workloads.

Senthilkumar Gopal retweeted

LMSYS Org@lmsysorg·4 Mar

Excited to share our latest collaboration blog with @NVIDIA on how SGLang unlocks massive inference performance gains on GB300 NVL72 (Blackwell Ultra) vs H200 in InferenceXv2!

Results:
1️⃣ 25× throughput on GB300 NVL72 vs H200 @ 50 TPS/user
2️⃣ 8× performance gain on GB200 NVL72 in under 4 months
3️⃣ 4× TPS/User improvement in high interactivity regime on GB200 NVL72

Key techniques include:
🧠 NVFP4 GEMM optimizations tailored for MoE reasoning models
🔄 Computation–communication overlap tuned specifically for NVL72
🚀 Deep integration with NVIDIA Dynamo for disaggregated inference

Huge thanks to the @NVIDIAAIDev and SGLang teams for making this happen 🙌

Senthilkumar Gopal retweeted

NVIDIA AI Developer@NVIDIAAIDev·19 Feb

🧵 NVIDIA Dynamo v0.9.0 is live and it's probably our biggest infrastructure upgrade yet. Highlights this time include:
✅ Sneak preview of FlashIndexer
✅ Expanded multi-modal support
✅ Removed NATS & ETCD

And bonus. . . @meituan (Chinese Doordash + LLM builders) recently dropped an OSS inference engine built on @sgl_project + Dynamo. 👇

Senthilkumar Gopal retweeted

NVIDIA AI Developer@NVIDIAAIDev·11 Feb

A new playlist dropped: 16 sessions on NVIDIA Dynamo covering large-scale inference in production, including disaggregated serving, KV-aware routing, MoE, and multimodal workloads. Watch on demand ▶️ nvidia.com/en-us/on-deman… Featuring presenters from @baseten, @GoogleAI, @haoailab, @_llm_d_ , @Microsoft, @Pinterest, @PrimeIntellect, @sgl_project, @tensormesh, and @vllm_project. 🙌

Senthilkumar Gopal retweeted

NVIDIA AI Developer@NVIDIAAIDev·6 Feb

🙌 Thank you, @baseten, for being an engaged and impactful contributor in the NVIDIA Dynamo ecosystem. By running Dynamo at scale and sharing learnings from real customer workloads, you’ve helped strengthen the project for the broader community. We’ve already seen the benefits (faster TTFT, lower per-token latency, and higher throughput on long-context workloads), along with valuable contributions across Dynamo, TensorRT LLM, and your open-sourced Suffix Automaton–based MTP accelerator. Learn more 👇

Baseten@baseten

Thanks @NVIDIAAI for inviting us to Dynamo Day! We're active users of Dynamo, iterating on it in production for performance gains like 50% lower TTFT and 34% lower TPOT, and regularly shipping our work back to the community. Read some of our highlights from Dynamo Day and working with NVIDIA Dynamo here: baseten.co/blog/nvidia-dy…

Senthilkumar Gopal retweeted

Vijay Janapa Reddi@profvjreddi·16 Dec

Today I’m sharing Tiny🔥Torch—an educational framework for ML systems, built from scratch. You don’t just train models, you build tensors, autograd, optimizers, and data loaders, and see how design choices affect memory, performance, and efficiency. If you use @PyTorch or @TensorFlow, this helps learners see what’s really happening under the hood. Too many students learn how to use ML frameworks, but never how to build one. Tiny🔥Torch is about closing that gap. Early, open, and still evolving, looking for fellow educators and learners. Ideas and help welcome 🙏 mlsysbook.ai/tinytorch/intr…

Senthilkumar Gopal@sengopal·28 Oct

I just realized that the volume of work done by creatives, as described by @AdamMGrant in #Originals is due to "Agency" 😃

Andrej Karpathy@karpathy

Agency > Intelligence I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are we educating for agency? Are you acting as if you had 10X agency? Grok explanation is ~close: “Agency, as a personality trait, refers to an individual's capacity to take initiative, make decisions, and exert control over their actions and environment. It’s about being proactive rather than reactive—someone with high agency doesn’t just let life happen to them; they shape it. Think of it as a blend of self-efficacy, determination, and a sense of ownership over one’s path. People with strong agency tend to set goals and pursue them with confidence, even in the face of obstacles. They’re the type to say, “I’ll figure it out,” and then actually do it. On the flip side, someone low in agency might feel more like a passenger in their own life, waiting for external forces—like luck, other people, or circumstances—to dictate what happens next. It’s not quite the same as assertiveness or ambition, though it can overlap. Agency is quieter, more internal—it’s the belief that you *can* act, paired with the will to follow through. Psychologists often tie it to concepts like locus of control: high-agency folks lean toward an internal locus, feeling they steer their fate, while low-agency folks might lean external, seeing life as something that happens *to* them.”

Senthilkumar Gopal retweeted

vLLM@vllm_project·10 Oct

🤩 Check out this blog from @awscloud about scaling Rufus, which is powered by @vllm_project on Inferentia and Trainium! Serving 3 million tokens a minute! "Within each container, an NVIDIA Triton Inference Server with a Python backend is used running vLLM with the Neuron SDK. vLLM is a memory-efficient inference and serving engine that is optimized for high throughput." "These choices allowed Rufus to scale up over 80,000 Trainium and Inferentia chips across three Regions serving an average of 3 million tokens a minute while maintaining P99 less than 1 second latency to the first response for Prime Day customers." aws.amazon.com/blogs/machine-…

Senthilkumar Gopal@sengopal·18 Apr

Ah.. but the distribution of potential answers is known.. 😀 It doesn't matter if it is random numbers or picking a response to an open-ended question. With copious amounts of data, the central limit theorem always wins..

Andrej Karpathy@karpathy

Consider being a labeler for an LLM. The prompt is “give me a random number between 1 and 10”. What SFT & RM labels do you contribute? What does this do to the network when trained on? In a subtle way this problem is present in every prompt that does not have a single unique answer.

Senthilkumar Gopal retweeted

Woosuk Kwon@woosuk_k·2 Mar

The new vLLM release includes some optimizations for Gemma and Mixtral, and finally supports 8-bit GPTQ. Please give it a try!

Simon Mo@simon_mo_

vLLM v0.3.3 is released with Starcoder2 @BigCodeProject and Inferentia @awscloud support. I'm also excited about the addition of guided decoding* (JSON, regex) in server leveraging @OutlinesOSS. *experimental, the schema takes some time to compile but will be cached.

Senthilkumar Gopal@sengopal·2 Dec

And even more integrations are coming in #NeuronSDK to make it simpler to onboard any model quickly :) Check it out awsdocs-neuron.readthedocs-hosted.com/en/latest/libr… #Trainium

Tristan Hume@trishume

I think TPU and Trainium optimization is even more fun than GPU optimization. The architectures are simpler and more like a puzzle, and our performance analysis tools are better than GPU ones. See Trainium's trace of every instruction: awsdocs-neuron.readthedocs-hosted.com/en/latest/tool…

Senthilkumar Gopal@sengopal·24 Nov

An excellent, thought-provoking lecture about LLMs as a central processing OS - youtu.be/zjkBMFhNj_g?t=… from @karpathy. Calm in the eye of the OpenAI storm :)

Senthilkumar Gopal@sengopal·5 May

Wow. They are extremely lucky to have @isbellHFh. He is one of the main reasons we all took OMSCS :)

Michael Littman@mlittmancs

So proud of my friend Charles Isbell @isbellHFh , who will be U. Wisconsin's next provost, starting this Fall. For those not steeped in academic terminology, that's basically the school's CEO (one step below president). news.wisc.edu/charles-lee-is…

Senthilkumar Gopal@sengopal·1 Apr

Excited to write about #embeddings and #deeplearning usage to power visual discovery for ecommerce @eBay @eBayNewsroom tech.ebayinc.com/engineering/ho… #MachineLearning #DeepLearning #ComputerVision

Senthilkumar Gopal@sengopal·11 Feb

@BarstoolGT @isbellHFh is the GOAT.. from all of us OMSCS grads.. the ML and RL courses with @mlittmancs are brilliant!!

Barstool Georgia Tech@BarstoolGT·9 Feb

Who’s the best GT Professor you’ve ever had? ⬇️

Senthilkumar Gopal@sengopal·25 Oct

This is happening today ☺️ @AIDevWorld Join me in our discussion about scaling embedding models. embed.emamo.com/event/api-worl… #MachineLearning #embeddings #scale #Mlinproduction

Senthilkumar Gopal retweeted

Jeremy Howard@jeremyphoward·16 Sep

Big news: we're launching a new course in <4 weeks. "From Deep Learning Foundations to Stable Diffusion". Bigger news: for this course, we're teaming up with @StabilityAI! AFAIK, this is the 1st course that covers every method used in Stable Diffusion. fast.ai/posts/part2-20…

Senthilkumar Gopal retweeted

eBay Tech@ebaytech·29 Aug

Our Coded Coupons tool gives sellers flexibility and control over how they offer discounts to customers. ebayinc.to/3ADrd1G

Senthilkumar Gopal@sengopal·21 Aug

@karpathy @blakecrouch1 - Recursion, Dark Matter and recently Upgrade. Excellent novels with a great scientific foundation, intertwined with human psychology and paced wonderfully.

Andrej Karpathy@karpathy·8 Jul

Enumerated and sorted some sci-fi I've read over time karpathy.ai/books.html seeking more favorites!
