Fred Oh

4.2K posts

@fredo_ai

Seasoned product marketing professional specializing in GPU Acceleration, AI/ML, Data Analytics/Integration, Data Storage, and IoT GTM Strategy and Execution

Santa Clara, CA · Joined September 2010

244 Following · 1.1K Followers

Fred Oh retweeted

NVIDIA AI@NVIDIAAI·Apr 25

Happy Friday! We just put DeepSeek-V4-Pro up on build.nvidia.com. It’s the world’s largest open source model at 1.6T parameters, and you can run it for free on NVIDIA Blackwell GPUs. Try the NVIDIA NIM API → build.nvidia.com/deepseek-ai/de…

294

2.6K

200.4K

Fred Oh retweeted

NVIDIA AI Developer@NVIDIAAIDev·Apr 11

Introducing DWDP = “Distributed Weight Data Parallelism” for MoE LLM inference using NVL72-based clusters from #NVIDIAResearch. Keep data parallel, offload MoE experts across GPUs, fetch on demand with P2P copies, and drop layer‑wise collectives to accelerate your parallel LLM inference projects. Shows gain of ~9–11% higher output TPS/GPU on DeepSeek‑R1 at similar TPS/user, with bigger wins under imbalanced traffic. 📗 arxiv.org/abs/2604.01621

137

9.9K

Fred Oh retweeted

Inferact@inferact·Mar 10

We are thrilled to announce that @nvidia is the latest investor in @inferact. We look forward to continuing the momentum driven by our deep collaboration: (1) Engineering velocity: a significant uptick in @nvidia pull requests to the @vllm_project repo. (2) Product synergy: close integration with NVIDIA Dynamo, ModelOpt, Nemotron, and more products! It’s an exciting time for the growth and development of vLLM, the world's AI inference engine!

29.8K

Fred Oh retweeted

Jon Hernandez@JonhernandezIA·Jan 7

📁 Jensen Huang, CEO of Nvidia, says data centers are no longer supercomputers, they are AI factories. They don’t produce answers, they produce tokens. And those factories exist for one obsession: train faster, arrive sooner, lead first. The jump from Blackwell to Rubin delivers a 4x speed increase. One month of training instead of four changes the entire race.

344

30.1K

Fred Oh retweeted

NVIDIA AI Developer@NVIDIAAIDev·Sep 29

🎊Congrats to @lmsysorg for advancing DeepSeek V3/R1 inference. ⚡️On NVIDIA GB200 NVL72, they’re achieving 26k input tokens/s and 13k output tokens/s per GPU — a nearly 4× / 5× speedup vs H100. They achieved this with NVFP4 MoE, FP8 attention, scaling-down expert parallelism by offloading weights via a 900GB/s bidirectional CPU-GPU interface, and compute-communication overlap. This work highlights how collaborative AI platforms with active communities speed learning and build stronger, high-performance infrastructure through shared expertise.👇

LMSYS Org@lmsysorg

🚀 Follow-up to our last breakthrough on DeepSeek V3/R1 inference! On NVIDIA GB200 NVL72, SGLang now achieves 26k input tokens/s and 13k output tokens/s per GPU with FP8 attention + NVFP4 MoE - that’s a 3.8× / 4.8× speedup vs H100 settings. See the details in the 🧵 (1/4)

198

21.5K

Fred Oh@fredo_ai·Jun 2

✨ NVIDIA DALI just got even better. New features make it easier to integrate with #PyTorch, boost video processing with smarter selective decoding, and optimize memory for flexible CPU ↔️ GPU data flows. Read more in our tech blog ➡️ nvda.ws/4dEH58s

Fred Oh retweeted

NVIDIA AI Infrastructure@NVIDIAAIInfra·Apr 18

Scientists and engineers are using NVIDIA #CUDA-X libraries powered by the NVIDIA GB200 and GH200 Superchips to solve the world's toughest challenges. Read our #GTC25 blog to dive into these incredible scientific use cases. ⤵️ nvda.ws/4lCfjgn

1.4K

Fred Oh retweeted

NVIDIA HPC Developer@NVIDIAHPCDev·Apr 4

Discover how #CUDA-X empowers you -- as developers, researchers, and engineers -- to drive scientific discovery, transform #AI, and redefine computation to make real-time digital twins truly possible. Watch the demo 📹 youtube.com/watch?v=CyP0Su… #GTC25

1.1K

Fred Oh@fredo_ai·Mar 11

For 16 years running, NVIDIA technologies have powered every Academy Award-nominated film for Best #VFX. This year, NVIDIA researchers are honored for their groundbreaking work in simulation, denoising, and rendering. #GTC25 bit.ly/4hn3ibf

Fred Oh@fredo_ai·Feb 12

@Samsnart @NVIDIAAIDev @songhan_mit Hi Samsnart, have you tried posting on our Developer Forum to get experts to assist? forums.developer.nvidia.com/tags/c/acceler…

NVIDIA AI Developer@NVIDIAAIDev·Feb 12

Excited to build a high-performance AI workstation. 🙌 @songhan_mit just finished a brand-new system featuring the GeForce RTX 5090. Here are some key learnings they shared about configuration: nvda.ws/40XUAtM

4.3K

Fred Oh@fredo_ai·Feb 3

New #CUDA Toolkit 12.8 delivers NVIDIA Blackwell support with CUDA Graphs, accelerated Math and #Python libraries, compiler, and Nsight developer tools enhancements. Learn More➡️ bit.ly/4hKCnHb

Fred Oh retweeted

NVIDIA HPC Developer@NVIDIAHPCDev·Nov 21

Explore a tutorial on implementing high-performance #CUDA math operations in #Python, interoperable with #PyTorch and #CuPy, using nvmath-python. Learn how to fuse epilog operations with matrix multiplication with nvmath-python. ➡️ nvda.ws/3CNxspo #SC24

2.2K

Fred Oh retweeted

Casey Aylward@caseyaylward·Sep 26

14/ We had so much fun bringing this special community together IRL for an amazing day of technical talks/hacking. Big thanks to everyone who turned out and our planning committee (here’s an “after” photo taken at midnight). Full videos of the talks to follow soon #cudamodeirl.

2.1K

Fred Oh retweeted

Casey Aylward@caseyaylward·Sep 26

13/ We had 40+ teams submit projects for judging and 10 projects demoed for the entire group. The top five won prizes. The top three were also awarded 4080s signed by Jensen Huang :).

12.1K

Fred Oh retweeted

Casey Aylward@caseyaylward·Sep 26

12/ You may recognize Wen-mei from his famous textbook "Programming Massively Parallel Processors." He signed textbooks in the evening… and even someone’s iPad!

1.5K

Fred Oh retweeted

Casey Aylward@caseyaylward·Sep 26

11/ Last but not least Wen-mei Hwu from @nvidia shared how to pick a hard problem to work on for a decade.

Fred Oh retweeted

Casey Aylward@caseyaylward·Sep 26

10/ @Tim_Dettmers spoke on how open source can win over closed source. Fun fact: he was the original author of the phrase “CUDA mode.”

1.1K

Fred Oh retweeted

Casey Aylward@caseyaylward·Sep 26

9/ @eqhylxx then kicked off our evening talks where she spoke on designing performant speculative decoding solutions in @vLLM.

4.6K

Fred Oh retweeted

Casey Aylward@caseyaylward·Sep 26

8/ Then the hacking began! Teams formed and paired up for hours of coding, guided by our session leads. Huge thanks to our generous sponsors: @anyscalecompute, @FAL, @LambdaAPI, @modal_labs, @nebiusai, @Oracle, @PrimeIntellect, and @togethercompute.

1.1K

Fred Oh retweeted

Casey Aylward@caseyaylward·Sep 26

6/ @karpathy wrapped up our morning talks with the story of how he built llm.c and gave the audience some great hack ideas. We had the core llm.c devs there too which was awesome.

1.1K
