NVIDIA AI

13K posts

@NVIDIAAI

Teaching your AI new tricks.

Santa Clara, CA · Joined June 2016
878 Following · 310.4K Followers
Pinned Tweet
NVIDIA AI @NVIDIAAI
Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.
199
463
3.5K
1.2M
acc-mu3n @AcceleratedMu3n
(It was only for a moment, but) we're up to four units now 👏
8
6
160
23.4K
NVIDIA AI @NVIDIAAI
Congrats to the @MiniMax_AI team on the release of MiniMax M3, a long-context multimodal model for text, image, and video reasoning. 🙌 Try it today with our free GPU-accelerated endpoint on build.nvidia.com. Details: nvda.ws/4v4BWhD
MiniMax (official) @MiniMax_AI

MiniMax M3, Open-Weight, Now On Hugging Face, with only ~428B parameters and ~23B activated parameters. Weights: huggingface.co/MiniMaxAI/Mini… MiniMax Sparse Attention: huggingface.co/papers/2606.13…

51
116
1.3K
134.2K
Inception @_inception_ai
The fastest reasoning LLM is now in production on Baseten. Mercury 2 is a diffusion LLM, so it generates tokens in parallel and hits 1,000+ tokens/sec on @NVIDIAAI GPUs, speeds that used to require specialized hardware. @augmentcode is already using Mercury 2, cutting cost 90% and latency 82%. Proud to partner with the @baseten team to bring dLLMs to production.
Baseten @baseten

We are excited to announce that we have partnered with @_inception_ai to make Mercury 2 available on Baseten. This makes us the first inference platform to bring Inception’s diffusion LLM to production. Inception’s dLLM architecture fixes the bottlenecks of sequential token generation and can deliver 1,000+ tokens/sec on standard NVIDIA GPUs. Early users like @augmentcode have seen impressive results, such as an 82% reduction in latency and 90% cost savings, while maintaining high quality.

5
11
113
12K
Matthew @slashreboot
A peek behind the curtain. I really need to dust my home lab...
5
1
32
1.9K
David Nix @david_nix
Absolute monster of a GPU. Pictures online don't do it justice.
15
1
84
8K
Kamlesh Naidu @knaidu78
@NVIDIAAI Running OpenWebUI, searXNG, hermes agent and local RAG with Qwen3.6-35B-A3B.
1
0
2
122
NVIDIA AI @NVIDIAAI
One open model. 350,000+ motion clips. 15,000 FPS. MotionBricks from NVIDIA Research runs real-time character animation at scale, without hand-crafted transitions or fine-tuning. And yes, it works for robotics too. #SIGGRAPH2026 paper, demos + code: nvlabs.github.io/motionbricks
36
129
1.1K
98.9K
NVIDIA AI @NVIDIAAI
Shoutout to Caleb for putting together a great deep dive on Nemotron 3 🙌 Check it out.
Caleb Eom @calebfoundry

Nemotron 3 Full Breakdown

With the help of Joey Conway from @NVIDIAAI, getting into the specifics around why Nemotron 3 is kind of a big deal.

Biggest headline with Nemotron is: Hybrid Mamba Transformer, Latent MoE, and MTP.

Hybrid Mamba Transformer essentially attacks right at the attention mechanism to make the overhead sub-quadratic, but unlike quantizing the KV cache or swapping out attention heads, NVIDIA chose Mamba-2.

Latent MoE helps further optimize on sparsity by down-projecting the dimensions, so you're doing less math and less memory movement between HBM and SRAM. You're saving a ton, and NVIDIA made a conscious choice to add more experts given the surplus.

Finally, MTP, or multi-token prediction, where the model can see future tokens to be more expressive in training, with the option to use it for speculative decoding during inference.

Oh, also the model adopts the new OpenMDW 1.1 License.

13
27
322
29.3K
NVIDIA AI @NVIDIAAI
@Soveryn_AI Nice! Keep us posted once the Spark lands on your doorstep.
1
0
2
142
Soveryn Intelligence @Soveryn_AI
@NVIDIAAI Waiting for my Spark to arrive. I am tying it to my main server with 3 GPUs with 144 GB of VRAM. Should be epic.
1
0
1
173
Tim Rocktäschel @_rockt
Excited to show results of the first steps towards automated AI research at @Recursive_SI. The same general system achieved state of the art on @NVIDIAAI's SOL-ExecBench GPU Kernel Optimization, nanoGPT Speedrun, and @karpathy's NanoChat autoresearch benchmarks.
6
12
133
9K