Pete Florence

601 posts


@peteflorence

Co-Founder & CEO @GeneralistAI

San Francisco, CA · Joined May 2012
371 Following · 5.2K Followers
Kevin Zakka@kevin_zakka·
Impending graduation is making me really sad, PhD has been so lovely 😭
8 replies · 0 reposts · 126 likes · 6.2K views
Fabian Kerj@Fabian13Kerj·
Meeting the @GeneralistAI team has been the highlight so far. They are doing such incredible things.
[image]
2 replies · 0 reposts · 5 likes · 202 views
Pete Florence@peteflorence·
As more of that experience goes into training, performance improves. The learning loop compounds, and systems get better the more they’re used. We’re starting to see the early signs of a pretraining era for robotics – and what that unlocks. Great discussion with folks bringing AI to their industries @FabianHedin @anna_shedletsky @maithra_raghu @matanSF.
1 reply · 0 reposts · 4 likes · 412 views
Pete Florence@peteflorence·
Spoke at #NVIDIAGTC about how AI is changing everyday work. 1 thing I keep coming back to: AI needs to get out, to spend time in the physical world, to “touch grass.” More in 🧵
[image]
4 replies · 6 reposts · 36 likes · 3K views
Pete Florence@peteflorence·
At the booth live at #NVIDIAGTC! The hockey sticks are out, and the models are rolling. At Generalist we’re excited to be here as Preferred Model Partner with @Universal_Robot, the world’s #1 cobot manufacturer by volume!
Generalist@GeneralistAI

Happening now at #NVIDIAGTC: Generalist’s GEN-0 model autonomously packing phones on @Universal_Robot arms in our first public demo. To move robotics beyond the lab, systems need to operate in real time on industrial hardware. See the demo below, and stop by booth #1840 👇🤖

3 replies · 8 reposts · 51 likes · 6.8K views
Pete Florence retweeted
Chris Paxton@chris_j_paxton·
Live demo from Generalist AI at GTC -- inserting a phone into a case. Next to it you can try doing the task yourself with a Universal Robots leader/follower setup. It's harder than the robot makes it look! Seeing demos like this live and in person always hits different
7 replies · 13 reposts · 147 likes · 9.7K views
Pete Florence retweeted
Generalist@GeneralistAI·
Happening now at #NVIDIAGTC: Generalist’s GEN-0 model autonomously packing phones on @Universal_Robot arms in our first public demo. To move robotics beyond the lab, systems need to operate in real time on industrial hardware. See the demo below, and stop by booth #1840 👇🤖
11 replies · 30 reposts · 195 likes · 30.8K views
Pete Florence retweeted
Generalist@GeneralistAI·
Physical commonsense is everywhere, yet hard to pin down. It’s the reactive, closed-loop intelligence behind interaction—an intuition of physics learned from a lifetime of sensorimotor experience, compiled into reflex and muscle memory. What's been elusive in robotics may, for the first time, emerge from scaling. More from our co-founder and chief scientist @andyzengineer 👇
Andy Zeng@andyzengineer

The dark matter of robotics is “physical commonsense.” Those tiny corrections, subtle recoveries that your hands do (and rarely notice). It’s everywhere, and yet—hard to pin down. Second nature to us, but hard for machines. They’re starting to emerge in our foundation models👇

1 reply · 4 reposts · 69 likes · 12.3K views
Pete Florence@peteflorence·
Ask and ye shall receive, Exhibit A: x.com/xiao_ted/statu…
Ted Xiao@xiao_ted

Compelling advances in scaling laws for robotics from @GeneralistAI! Scaling laws are without a doubt one of the key components that enabled the rapid hyperscaling of language model pre-training over the past years. Establishing predictive scaling laws would be a watershed moment for general-purpose robotics, and the recent analysis from Generalist is some of the most promising I've seen so far. But there are some nuances I'd like to raise; think of these ideas as an open bounty for high-impact future work 🚀

#1 Language modeling has found that scaling laws measured on metrics like training loss [1] or monorepo perplexity [2] provide accurate approximations of observed downstream evaluation performance. However, an open secret in robotics is that offline metrics, such as training or validation loss, exact-match token accuracy, or open-loop offline action MSE, have been notoriously uncorrelated with real-world closed-loop performance! Reliable offline evaluation metrics have been a holy grail for robotics that remains unsolved [3]. Oftentimes, model checkpoints with higher validation loss may actually result in better end-to-end performance in the real world; this makes checkpoint selection, model iteration, and power-law analysis extremely difficult. This has many reasons, but an intuitive illustration is that robot policies often have very jagged learned behavior: if model A fits the training distribution extremely well but is brittle and sometimes makes rare but catastrophic errors, it may exhibit much lower action MSE than a model B which is slightly worse on all states but never makes unrecoverable errors. There are many such considerations when comparing offline metrics with online closed-loop rollouts: compounding errors, model brittleness, and in-distribution vs. out-of-distribution generalization performance.

But even offline closed-loop evaluation is difficult, and leveraging simulation [4] or world models [5] for evaluation remains an open research problem. Generalist proposed scaling laws across dataset size and model scale, measured against validation loss during a post-training phase on target tasks. While this is an interesting result at a larger scale than previous work ([6], [7], [8]), it is not as convincing as I would like: even if the proposed power laws are true, it is unclear whether they are meaningful power laws, because the correlation between validation loss and real-world performance has been highly dependent on the policy type, model class, task complexity, and deployment situation.

#2 Scaling laws in language modeling have been immensely useful during pretraining laddering (where ideas are explored at smaller scales and used to make critical decisions for larger hero runs) because of their predictive power. The trends and slopes of metrics like compute efficiency, measured on specific domains like code or Wikipedia, oftentimes accurately predict how general model intelligence will improve on tasks as varied as MMLU or GPQA. However, in robotics, scaling-law analysis is overfit to a single embodiment on a single set of tasks; it does not translate to a universal predictive scaling law that applies to other robots or other scenarios! A derived scaling law for the relationship between model size, FLOPs, and hours of demonstrations needed to solve Task A on an ALOHA may not tell you much at all about the scaling law for Task B on a humanoid. In robotics, scaling laws are often only backwards-looking: if you were to redo the same project under the exact same requirements, you could have saved X amount of time by collecting less data or training a smaller model. But it may not tell you anything about the next task or next environment you deploy in.

Overall, I am excited and impressed by the scale of Generalist's immense data-collection operations, the gorgeously smooth and performant model behaviors, and the scientifically rigorous push to make real progress on scaling laws for robotics. The team has been cooking! In the future, I look forward to extensions of this scaling analysis (from Generalist or the community!) for (A) moving beyond validation loss to more trustworthy performance measurements and (B) showing predictive power of truly universal scaling laws as opposed to backwards-looking, task- and embodiment-specific scaling laws. These are hard problems, and I look forward to seeing progress on this front!

1 reply · 0 reposts · 4 likes · 824 views
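For readers who want to poke at this themselves: the power laws Ted is discussing are typically fit by linear regression in log-log space. A minimal sketch in Python, using synthetic numbers rather than anyone's real data:

```python
import math

def fit_power_law(data_sizes, losses):
    """Fit loss = a * D**(-alpha) by least squares in log-log space.

    A power law is a straight line in log-log coordinates, so an
    ordinary linear regression on (log D, log loss) recovers the
    exponent alpha (negative slope) and the prefactor a (intercept).
    """
    xs = [math.log(d) for d in data_sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - slope * mx)
    return a, -slope  # alpha = -slope

# Synthetic check: data generated from loss = 2.0 * D^(-0.3) exactly,
# so the fit should recover a = 2.0, alpha = 0.3.
sizes = [1e3, 1e4, 1e5, 1e6]
losses = [2.0 * d ** -0.3 for d in sizes]
a, alpha = fit_power_law(sizes, losses)
```

Ted's caveat applies unchanged: a clean fit like this says nothing by itself about whether validation loss tracks closed-loop performance.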
Pete Florence@peteflorence·
Today we're releasing extra evals of GEN-0. Before, we showed pretraining scaling laws with "validation loss" on the y-axis, as is common for LLM scaling. But for LLMs, and especially for robotics, the golden metric is how models do in closed-loop evals. Here are those numbers!
Generalist@GeneralistAI

More pretraining improves GEN-0 real-robot performance (via blind A/B evals with closed-loop rollouts). Improvements are significant in the low-data regime, but the best models thrive with both pretraining and ample post-training. See blog addendum: generalistai.com/blog/nov-04-20…

3 replies · 0 reposts · 13 likes · 2K views
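The "blind A/B evals with closed-loop rollouts" mentioned here boil down to comparing success rates between two models on matched rollouts. A minimal sketch of the usual significance check, a two-proportion z-test; the counts below are hypothetical, not Generalist's:

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-test: is model B's success rate higher than A's?

    Uses the pooled-proportion normal approximation; |z| > 1.96 is
    roughly significant at the 5% level.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical rollout counts: a base model at 30/50 successes vs. a
# more-pretrained model at 42/50 on the same blind task set.
z = two_proportion_z(30, 50, 42, 50)
```

With real robot evals the blindness matters as much as the statistics: operators shouldn't know which checkpoint is driving the arm.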
Pete Florence retweeted
Generalist@GeneralistAI·
More pretraining improves GEN-0 real-robot performance (via blind A/B evals with closed-loop rollouts). Improvements are significant in the low-data regime, but the best models thrive with both pretraining and ample post-training. See blog addendum: generalistai.com/blog/nov-04-20…
[image]
5 replies · 28 reposts · 187 likes · 79.4K views
Pete Florence retweeted
Andy Zeng@andyzengineer·
Took time to make sure we did this right with results we could trust: blind A/B evals 📊 on success rates. Still feels crazy to me that data from random unrelated activities around the world yields strong pretraining transfer. Yet here we are. It's quite the beast. Stay tuned.
Generalist@GeneralistAI

More pretraining improves GEN-0 real-robot performance (via blind A/B evals with closed-loop rollouts). Improvements are significant in the low-data regime, but the best models thrive with both pretraining and ample post-training. See blog addendum: generalistai.com/blog/nov-04-20…

2 replies · 11 reposts · 99 likes · 15.4K views
Ashish Vaswani@ashVaswani·
We are beyond thrilled to share our first flagship models, the Rnj-1 base and instruct 8B-parameter models. Rnj-1 is the culmination of 10 months of hard work by a phenomenal team, dedicated to advancing American SOTA OSS AI. Lots of wins with Rnj-1:
1. SWE-bench performance close to GPT-4o.
2. Tool use outperforming all comparable open-source models.
3. Mathematical reasoning (AIME’25) nearly on par with GPT-OSS MoE 20B.
….
Essential AI@essential_ai

Today, we’re excited to introduce Rnj-1, @essential_ai's first open model; a world-class 8B base + instruct pair, built with scientific rigor, intentional design, and a belief that the advancement and equitable distribution of AI depend on building in the open. We bring American open-source at par with the best in the world.

103 replies · 173 reposts · 1.8K likes · 603K views
Pete Florence@peteflorence·
@kunlei15 Nice post! Large-scale infra realities also make multi-step a good fit. Also, very nice work with RL-100.
1 reply · 0 reposts · 4 likes · 557 views
Kun Lei@kunlei15·
RL feels messy, but a two-axis view—data source (on/off/offline) × update schedule (one-step/multi-step/iterative)—brings order. I wrote a post unifying them with shared equations. lei-kun.github.io/blogs/rl.html Robotic FMs (e.g., GEN-0, pi_0.5) grow via a data flywheel. Best fit: multi-step updates—conservative yet exploratory—then switch to iterative RL to surpass/align human ceilings.
[image]
11 replies · 56 reposts · 473 likes · 27.7K views
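Kun's two-axis grid is easy to write down explicitly. A tiny sketch enumerating it (axis labels copied from the tweet; nothing here is from his actual post or code):

```python
from itertools import product

# The two axes from the post: where the data comes from, and how
# often the policy that collects it gets updated.
DATA_SOURCES = ("on-policy", "off-policy", "offline")
UPDATE_SCHEDULES = ("one-step", "multi-step", "iterative")

def taxonomy():
    """Enumerate the 3x3 grid; every RL method occupies one cell."""
    return [(d, s) for d, s in product(DATA_SOURCES, UPDATE_SCHEDULES)]

cells = taxonomy()
# The flywheel progression Kun suggests for robotic FMs would be a
# path through this grid: multi-step updates first, then iterative RL.
```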
Pete Florence@peteflorence·
Even if some of the brain is remote, you want as competent a brain as you can get locally. In 2023 with PaLM-E we had a large model running in the cloud on TPUs (at 1 Hz) that planned over long horizons, and a smaller local model (at 5 Hz). That was a System 1 / System 2-type system which passed language as the embedding from S2 to S1. It showed the possibility of running "huge" models in the cloud. For latency and reliability reasons, you probably want all core sensorimotor decision-making local, but can access the internet for certain functions/info/very-long-horizon planning. Tesla/Waymo cars today work like this too.
1 reply · 0 reposts · 2 likes · 78 views
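The two-rate System 2 / System 1 split Pete describes can be sketched as a toy simulation: a slow planner rewrites a language instruction once per period, while a fast policy acts on the latest instruction every tick. Rates and names below are illustrative, not the PaLM-E implementation:

```python
def run_ticks(total_ticks, s2_period=5):
    """Simulate a slow System-2 planner feeding a fast System-1 policy.

    Ticks are in the fast loop's clock: with PaLM-E-style rates
    (1 Hz planner, 5 Hz policy) the planner replans once every 5
    policy steps. The interface between the two is a plain language
    string, matching the S2-to-S1 handoff described above.
    """
    plan = "idle"
    log = []
    for t in range(total_ticks):
        if t % s2_period == 0:
            plan = f"subgoal-{t // s2_period}"  # slow loop: replan
        log.append((t, plan))                   # fast loop: act on plan
    return log

log = run_ticks(10)
```

In a real system the two loops run asynchronously and the fast loop must keep acting even when the slow loop stalls, which is one reason to keep core sensorimotor decision-making local.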
Eugene Mironov@helper2424·
I have one more question: scaling should work, true, but what about edge device specification limitations? Inference with 10B++ models could be challenging on current devices. Yeah, optimizations, distillation, quantization, etc., could help, but with heavy model scaling, there's no way devices will catch up to model sizes. Do you see VLAs (or other VLM-based models—I don’t know what you build) as fully API-based (like remote inference first) or edge devices first?
1 reply · 0 reposts · 0 likes · 117 views
Pete Florence@peteflorence·
In LLMs, pretraining scaling is one of the essential ingredients. In robotics, we hadn't yet seen evidence we can predictably scale in a fully general-purpose way, as in language pretraining. We've learned a ton along the way. Excited we shared some of that this week!
Generalist@GeneralistAI

Introducing GEN-0, our latest 10B+ foundation model for robots
⏱️ built on Harmonic Reasoning, a new architecture that can think & act seamlessly
📈 strong scaling laws: more pretraining & model size = better
🌍 unprecedented corpus of 270,000+ hrs of dexterous data
Read more 👇

7 replies · 2 reposts · 38 likes · 4.6K views