Pruthviraj P
@spidernvdev
817 posts

Deep Learning SDE @nvidia · ex-nobody cuda guy
California, USA · Joined January 2017
188 Following · 301 Followers

Pinned Tweet
Pruthviraj P @spidernvdev ·
what’s missing from this setup?
[image]
2 replies · 0 reposts · 7 likes · 742 views
Pruthviraj P @spidernvdev ·
simple mental model: large model training is like moving a huge machine across multiple rooms. each gpu holds part of the work. but the hard part is not only splitting it. the hard part is making all parts communicate fast enough.
1 reply · 0 reposts · 0 likes · 10 views
Pruthviraj P @spidernvdev ·
morning from the green matrix. green matrix note 01: why big models need parallelism. a 70b parameter model needs ~140gb just for fp16 weights. that is before activations, gradients, optimizer states, and training data. one gpu is not enough.
[image]
1 reply · 0 reposts · 2 likes · 22 views
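The ~140gb figure above is straightforward arithmetic: parameter count times bytes per parameter. A minimal sketch; the optimizer overhead shown is a common mixed-precision Adam assumption, not a figure from the tweet:

```python
# Back-of-envelope memory for model weights, as in the note above.
# Assumptions: 70B parameters, fp16 (2 bytes per parameter); excludes
# activations, gradients, and training data.
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(70e9))  # fp16 weights alone: 140.0 GB

# Mixed-precision Adam roughly adds fp32 master weights plus two fp32
# moments (4 + 4 + 4 bytes/param) on top of the fp16 copy:
print(weight_memory_gb(70e9, 2 + 12))  # ~980 GB before activations
```

Either number is far beyond a single gpu, which is the point of the note.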
Xbotter @Xbotter ·
@spidernvdev Indeed, products only need imagination, but engineers have a ton of other stuff to handle.
1 reply · 0 reposts · 1 like · 55 views
Pruthviraj P @spidernvdev ·
most ai advice online is about prompts. but if you want to build real ai products, you also need to understand: latency, memory, cost, evaluation, deployment. prompting helps. systems knowledge compounds.
1 reply · 0 reposts · 3 likes · 148 views
Sam Altman @sama ·
we're starting rollout of GPT-5.5-Cyber, a frontier cybersecurity model, to critical cyber defenders in the next few days. we will work with the entire ecosystem and the government to figure out trusted access for cyber; we want to rapidly help secure companies/infrastructure.
885 replies · 583 reposts · 9.5K likes · 608.1K views
Thefitdoc @Siddurp2 ·
I don't know who needs to hear this but, your physique makes the 1st impression wherever you go. Build your body 🙂
9 replies · 0 reposts · 29 likes · 458 views
Sam Altman @sama ·
GPT-5.5 is going to have a party for itself. it chose 5/5 at 5:55 pm for the date and time. if you'd like to come, let us know here: luma.com/5.5 codex will help the team pick people from the replies. 5.5 had some good ideas/requests for the party, which we'll do.
1.7K replies · 328 reposts · 5.2K likes · 553K views
Pruthviraj P @spidernvdev ·
one thing i underestimated early in ai: the model is only one layer. behind every good ai product there is data loading, preprocessing, gpu memory, serving, monitoring, and debugging. that full stack is where engineering gets interesting.
0 replies · 0 reposts · 6 likes · 204 views
Pruthviraj P @spidernvdev ·
@seunosewa @hiarun02 It can, but total cost isn’t just per-token, it’s also how many passes it takes. Fewer retries can offset higher pricing.
0 replies · 0 reposts · 0 likes · 41 views
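The point in the reply above can be made concrete. A minimal sketch; the prices and token counts below are hypothetical, chosen only for illustration, not real model pricing:

```python
# Effective task cost = per-Mtok price x tokens per pass x number of passes.
# All figures here are made up to illustrate the trade-off.
def task_cost(price_per_mtok: float, tokens_per_pass: int, passes: int) -> float:
    """Total cost in dollars for one task."""
    return price_per_mtok * (tokens_per_pass / 1e6) * passes

# A cheaper model that needs 4 tries can cost more per task than a
# pricier model that succeeds in 1.
cheap_model  = task_cost(price_per_mtok=3.0,  tokens_per_pass=50_000, passes=4)
strong_model = task_cost(price_per_mtok=10.0, tokens_per_pass=50_000, passes=1)
print(round(cheap_model, 2), round(strong_model, 2))  # 0.6 0.5
```

With these numbers the model that is over 3x pricier per token is still cheaper per completed task.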
Arun @hiarun02 ·
Anyone cancelled Claude Code for Codex yet? Feels like devs are switching not because Codex is better, but because it’s cheaper to actually get work done. What’s your experience?
422 replies · 6 reposts · 691 likes · 58.2K views
Pruthviraj P @spidernvdev ·
anthropic adding claude connectors for tools like blender, adobe, autodesk, and splice is the right direction. the next ai jump may not be only smarter models. it may be models that understand the tools people already use.
0 replies · 0 reposts · 1 like · 226 views
Pruthviraj P @spidernvdev ·
sometimes i wonder if i'm learning fast enough or just keeping up with noise
0 replies · 0 reposts · 1 like · 167 views
Unkonfined @unkonfined ·
I want 1M followers.
1.7K replies · 383 reposts · 2.1K likes · 88.3K views
Pruthviraj P @spidernvdev ·
@PengmingWang this took me a while to learn: you can have a great model and terrible evals, and you won't know until something breaks in production
0 replies · 0 reposts · 1 like · 54 views
Pengming Wang @PengmingWang ·
My updated take on "The 'it' in AI models is the dataset." is that the 'it' is the evals you're using. I'd rather have bad scores on great benchmarks than good scores on bad benchmarks.
2 replies · 1 repost · 19 likes · 1.3K views
Pruthviraj P @spidernvdev ·
@nvidia blackwell costs 2x more than hopper on paper, but delivers 35x lower cost per million tokens in practice. that gap is what needs to be understood.
0 replies · 0 reposts · 0 likes · 197 views
Pruthviraj P @spidernvdev ·
groot-x has 24,000 simulated teleoperation runs across humanoids and manipulators. mixing this synthetic data with real data improved model performance by 40%. n1.7 now runs with a 3.3B param backbone on ~6GB vram. open source physical ai is moving faster than anyone expected.

Quoting NVIDIA Robotics @NVIDIARobotics:
The Physical AI Robotics GR00T‑X Embodiment Sim dataset has surpassed 10 million downloads on @HuggingFace. 🥳 A huge shoutout to the global research and developer community exploring the future of embodied AI and robotics with this open dataset — you made this milestone possible. 📥 Try it on Hugging Face 👉 nvda.ws/3Qv64Ul

0 replies · 0 reposts · 1 like · 288 views
Ivan Fioravanti ᯅ @ivanfioravanti ·
Look at the amazing performance boost after applying a Metal kernel as suggested by @kernelpool 🤩 Night & day in prefill, which is now much faster! Gonna publish an 8bit comparison test soon, and then testing this super model with coding soon!
[images]

Quoting Ivan Fioravanti ᯅ @ivanfioravanti:
MLX Ling-2.6-flash support added! 💪 Here is my (preliminary, because I bet @angeloskath will improve performance) context benchmark for the 4bit version running on M3 Ultra (cooking a new version). I created the PR with the amazing transformer_to_mlx skill by @pcuenq and Opus 4.7. A few iterations and it seems 😂 to be working 100%! Ultra fast model created by @TheInclusionAI. Can't wait to test it with a code harness!

Raw results: Ling-2.6-flash-mlx-4bit MLX Benchmark Results
Hardware: Apple M3 Ultra, 512.0GB RAM, 32 CPU cores, 80 GPU cores

Context | pp  | tg (t/s) | mem    | kv
0.5k    | 632 | 79       | 59.4GB | 0.03GB
1k      | 676 | 79       | 59.9GB | 0.04GB
2k      | 693 | 79       | 61.1GB | 0.04GB
4k      | 704 | 79       | 61.1GB | 0.05GB
8k      | 708 | 78       | 61.2GB | 0.07GB
16k     | 700 | 77       | 61.9GB | 0.11GB
32k     | 678 | 74       | 64.5GB | 0.18GB
64k     | 637 | 70       | 69.5GB | 0.33GB
128k    | 564 | 63       | 79.6GB | 0.64GB

Total generated tokens: 1135
Batch TPS: b1 78, b2 123, b4 164, b8 218, b16 307, b32 418
Batch KV:  b1 0.04GB, b2 0.08GB, b4 0.16GB, b8 0.32GB, b16 0.63GB, b32 1.26GB

1 reply · 1 repost · 22 likes · 2.9K views