pranav @_pranavnt

1.2K posts

robot learning @uwcse • prev @morph_labs @atlasfellow

sf / seattle · Joined January 2021
969 Following · 1.4K Followers
pranav retweeted
Runtime @RuntimeBRT
🚨 Bengaluru-based @Airbound_Aero has conducted 700 flights for Narayana Health since January 2026 with a zero failure rate.
abinaya @abinayaaaa
for PI day, i'm thrilled to share that I'll be joining @physical_int in a few weeks to work on accelerating robot learning research through fast, observable runtimes! i started playing with robots relatively late - my sophomore year of college - and have learned almost entirely through personal projects. it's a dream come true to collaborate with the brilliant folks at PI to make better robot software. stay tuned for more!!!
pranav retweeted
Jesse Zhang @Jesse_Y_Zhang
A reward model that works, zero-shot, across robots, tasks, and scenes? Introducing Robometer: Scaling general-purpose robotic reward models with 1M+ trajectories. Enables zero-shot: online/offline/model-based RL, data retrieval + IL, automatic failure detection, and more! 🧵 (1/12)
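A minimal sketch of what "zero-shot reward" could look like in an online RL loop: a frozen, pretrained reward model scores each transition in place of the environment's reward. The `RewardModel` interface, the `score()` signature, and all names here are hypothetical illustrations, not Robometer's actual API.

```python
import numpy as np

class RewardModel:
    """Stand-in for a frozen, general-purpose robotic reward model."""
    def score(self, observation: np.ndarray, goal: str) -> float:
        # A real model would embed the observation and the goal text and
        # return predicted task progress; this placeholder returns 0.0.
        return 0.0

def collect_episode(env, policy, reward_model, goal, horizon=200):
    """Roll out one episode, relabeling rewards with the frozen model."""
    obs = env.reset()
    trajectory = []
    for _ in range(horizon):
        action = policy(obs)
        next_obs, _, done, _ = env.step(action)      # env reward is ignored
        reward = reward_model.score(next_obs, goal)  # zero-shot reward
        trajectory.append((obs, action, reward, next_obs))
        obs = next_obs
        if done:
            break
    return trajectory
```

The same scored trajectories could also serve the tweet's other listed uses: ranking logged data for retrieval and imitation, or flagging episodes whose predicted progress never rises as failures.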
pranav retweeted
TBPN @tbpn
Standard Intelligence's @devanshpandey responds to @tszzl's tweet that "text is the universal interface," and explains why their new foundation model is trained on video:

"At some point in the arbitrarily long future, if we only use text models, we could force most things to be text. But I think there are just a lot of things that are much more native when done from a computer-use [perspective]."

"GUIs are designed for humans to use. We have this massive long tail of things on the internet that are entirely undoable by LLMs."

"For example, when I do ML engineering most of my time is spent doing the grunt work of engineering. It's a lot of looking at graphs, analyzing, and comparing loss curves. You can do this in text, but it's a much larger pain than doing it in the native interface."

"There's a reason humans don't interact with a computer purely through text, it would kind of suck."
roon @tszzl

text is the universal interface

pranav retweeted
galen @G413N
computer use is too important to relegate to post-training. this has been many months in the making; I'm super proud of what we've achieved as a team and excited to scale!
Standard Intelligence @si_pbc

Computer use models shouldn't learn from screenshots. We built a new foundation model that learns from video like humans do. FDM-1 can construct a gear in Blender, find software bugs, and even drive a real car through San Francisco using arrow keys.

pranav retweeted
Standard Intelligence @si_pbc
Computer use models shouldn't learn from screenshots. We built a new foundation model that learns from video like humans do. FDM-1 can construct a gear in Blender, find software bugs, and even drive a real car through San Francisco using arrow keys.
pranav retweeted
Standard Intelligence @si_pbc
We’ve made two main advances: the ability to train on our 11M+ hour computer-action dataset, and to understand long-context video. Our video encoder can fit nearly two hours of 30 FPS, high-resolution video into a 1M-token context window, ~50x more efficient than existing SOTA.
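The context claim is easy to sanity-check. A back-of-the-envelope sketch, taking "nearly two hours," 30 FPS, and the 1M-token window at face value from the tweet:

```python
# Assumptions: exactly 2 hours of video at 30 FPS into a 1M-token window.
frames = 2 * 3600 * 30                    # 216,000 frames
tokens_per_frame = 1_000_000 / frames     # ~4.6 tokens per frame
baseline = 50 * tokens_per_frame          # implied by the ~50x claim
print(f"{frames:,} frames -> ~{tokens_per_frame:.1f} tokens/frame "
      f"(implied baseline: ~{baseline:.0f} tokens/frame)")
```

So the encoder would spend roughly 4-5 tokens per high-resolution frame, against an implied baseline of ~230 tokens per frame.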
pranav retweeted
gavin leech (Non-Reasoning) @g_leech_
improve AI discourse about 5% just by renaming evals accurately:
Humanity's Last Exam: PubQuizFromHell
MATH: RemedialMath
FrontierMath: QuarterFrontierMath
SWE-Bench: DjangoBench
MMLU Virology: NoiseBench
Terminal Bench 2: NoiseBench
METR HCAST: GreenfieldCodeGigworkBench
pranav retweeted
Seattle Seahawks @Seahawks
SUPER BOWL LX CHAMPIONS ‼️
pranav retweeted
NFL @NFL
SEAHAWKS ARE THE CHAMPIONS!
pranav retweeted
Seattle Seahawks @Seahawks
Came out to play.
pranav retweeted
Kushal Thaman @kushal1t
I spent a bunch of time a year ago thinking about the data wall. A blackpill at the time for me was when I realized that the total stock of natural text data is depleting much faster than Chinchilla's infamous 20-tokens-per-param compute-optimal ratio suggested. Here is a naive BOTEC from back then:

Famously, Chinchilla showed that using about 20 tokens per param was compute optimal, measured at 6*10^23 FLOPs. It turns out that even though MoEs are more compute-efficient than dense models, training them compute-optimally needs a lot more data! In fact, at 1:32 (97%) sparsity an MoE uses ~6x more tokens per active param (see [1]). The Llama 3 405B report measured 40 tokens per param to be optimal with their data at 4*10^25 FLOPs. And for a 1:32-sparse MoE model such as DeepSeek v3, this suggests 240 tokens per param could well end up being optimal!

At this ratio, things break down. A 4*10^27 FLOPs model (a pretraining run that might be planned, e.g., for 2026) will need 400T tokens. A 5*10^28 FLOPs model would require O(1400T) tokens. These are insane numbers, and they only get worse into the 2030s! The totally unfiltered Common Crawl is about 240T tokens. People have been offsetting this to some extent by training for multiple epochs, i.e. repeating the same data, a la "Scaling Data-Constrained Language Models" by Muennighoff et al. (2023).

Of course, this is a naive BOTEC, and I'm happy to dive into more details, e.g. how much compute might be put to other uses, such as long-horizon RLVR, which could well require a lot of those 5*10^28 FLOPs. But we are casually talking about hundreds of trillions to over a quadrillion tokens as compute-optimal! It makes one question whether these numbers are actually necessary for the kind of capability gains we want. We are working on this question at @flappyairplanes, and we're excited to be advised by @karpathy.

I will end here with this @ilyasut quote from the @dwarkesh_sp episode with him:

"The data is very clearly finite. What do you do next? Either you do some kind of souped-up pre-training, a different recipe from the one you’ve done before, or you’re doing RL, or maybe something else. But now that compute is big, compute is now very big, in some sense we are back to the age of research. [...] Up until 2020, from 2012 to 2020, it was the age of research. Now, from 2020 to 2025, it was the age of scaling—maybe plus or minus, let’s add error bars to those years—because people say, “This is amazing. You’ve got to scale more. Keep scaling.” The one word: scaling. But now the scale is so big. Is the belief really, “Oh, it’s so big, but if you had 100x more, everything would be so different?” It would be different, for sure. But is the belief that if you just 100x the scale, everything would be transformed? I don’t think that’s true. So it’s back to the age of research again, just with big computers."

[1] arXiv:2501.12370
Andrej Karpathy @karpathy

A conventional narrative you might come across is that AI is too far along for a new, research-focused startup to outcompete and outexecute the incumbents of AI. This is exactly the sentiment I listened to often when OpenAI started ("how could the few of you possibly compete with Google?") and 1) it was very wrong, and then 2) it was very wrong again with a whole other round of startups who are now challenging OpenAI in turn, and imo it still continues to be wrong today.

Scaling and locally improving what works will continue to create incredible advances, but with so much progress unlocked so quickly, with so much dust thrown up in the air in the process, and with still a large gap between frontier LLMs and the example proof of the magic of a mind running on 20 watts, the probability of research breakthroughs that yield closer to 10X improvements (instead of 10%) imo still feels very high - plenty high to continue to bet on and look for.

The tricky part ofc is creating the conditions where such breakthroughs may be discovered. I think such an environment comes together rarely, but @bfspector & @amspector100 are brilliant, with a (rare) full-stack understanding of LLMs from top (math/algorithms) to bottom (megakernels/related), and they have a great eye for talent; I think they will be able to build something very special. Congrats on the launch and I look forward to what you come up with!

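The tweet's headline token counts follow from the standard dense-transformer approximation C ≈ 6·N·D with D = r·N, where r is the tokens-per-param ratio. A minimal sketch reproducing them (the constant 6 and the ratio r = 240 are the tweet's assumptions, not new measurements):

```python
import math

def optimal_tokens(compute_flops, tokens_per_param, flops_per_pt=6):
    """Solve C = 6*N*D with D = r*N for D, the compute-optimal token count."""
    n_params = math.sqrt(compute_flops / (flops_per_pt * tokens_per_param))
    return tokens_per_param * n_params

for c in (4e27, 5e28):
    d = optimal_tokens(c, 240)  # 1:32-sparse MoE ratio from the tweet
    print(f"C = {c:.0e} FLOPs -> ~{d / 1e12:,.0f}T tokens")
# ~400T tokens at 4e27 FLOPs and ~1,400T at 5e28, against a ~240T-token
# unfiltered Common Crawl.
```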
pranav retweeted
aidan @aidanmantine
There might be fast takeoff at SFO, but people are forgetting about it in AI. We're building Flapping Airplanes to train models radically differently and fly over the data wall. We can’t wait to show you what we’ve been working on soon.
Flapping Airplanes @flappyairplanes

Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.

pranav retweeted
Ethan Shen @ethnlshn
Today, we release SERA-32B, an approach to coding agents that matches Devstral 2 for just $9,000. It is fully open-source, and you can easily train your own model, at 26x the efficiency of using RL. Paper: allenai.org/papers/opencod… Here’s how 🧵
Ai2 @allen_ai

Introducing Ai2 Open Coding Agents—starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, & it works with Claude Code out of the box. 🧵
