

Benoit Vandevivere
@benvdv
Entrepreneur. 🇪🇺 eu/acc. @euacchq Speaker and advisor on disruption, innovation. Strong believer in innovation for positive change and impact.


Today, we are emerging from stealth and launching PrismML, an AI lab with Caltech origins centered on building the most concentrated form of intelligence. At PrismML, we believe the next major leaps in AI will be driven by order-of-magnitude improvements in intelligence density, not just sheer parameter count.

Our first proof point is Bonsai 8B, a 1-bit-weight model that fits in 1.15 GB of memory and delivers over 10x the intelligence density of its full-precision counterparts. It is 14x smaller, 8x faster, and 5x more energy efficient on edge hardware while remaining competitive with other models in its parameter class. We are open-sourcing the model under the Apache 2.0 license, along with Bonsai 4B and 1.7B models.

When advanced models become small, fast, and efficient enough to run locally, the design space for AI changes immediately. We believe in a future of on-device agents, real-time robotics, offline intelligence, and entirely new products that were previously impossible. We are excited to share our vision with you and to keep pushing the frontier of intelligence to the edge.
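As a quick sanity check on those numbers (my own back-of-the-envelope arithmetic; everything below is an assumption except the quoted 1.15 GB and 14x figures), here is where a ~1 GB footprint for 8B parameters at 1 bit per weight comes from:

```python
# Back-of-the-envelope weight-memory arithmetic (illustrative assumptions:
# all 8B parameters quantized to 1 bit; the remaining ~0.15 GB of the
# quoted 1.15 GB would come from parts usually kept at higher precision,
# e.g. embeddings and norms).

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory needed to store the weights alone, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

n = 8e9  # 8B parameters
print(f"FP16 : {weight_memory_gb(n, 16):.2f} GB")  # 16.00 GB
print(f"1-bit: {weight_memory_gb(n, 1):.2f} GB")   # 1.00 GB
```

The quoted 14x size reduction is consistent with comparing the full 1.15 GB footprint against a ~16 GB FP16 baseline (16 / 1.15 ≈ 13.9).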



Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
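The blog explains how TurboQuant actually works; purely to illustrate the general idea of KV-cache compression, here is a minimal sketch of per-channel low-bit quantization in NumPy. This is a generic textbook scheme, not TurboQuant's algorithm, and every name and parameter in it is my own assumption:

```python
import numpy as np

def quantize_kv(x: np.ndarray, bits: int = 4):
    """Symmetric per-channel quantization of a KV-cache slice.
    x: (seq_len, n_heads, head_dim) float32 tensor."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    scale = np.abs(x).max(axis=0, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # guard empty channels
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

kv = np.random.randn(1024, 8, 64).astype(np.float32)  # toy cache slice
q, s = quantize_kv(kv)
print(f"mean abs error: {np.abs(kv - dequantize_kv(q, s)).mean():.4f}")
# Packed two per byte, 4-bit values take 4x less memory than FP16;
# reaching 6x+ with zero accuracy loss is what makes TurboQuant notable.
```

Per-channel scales are the standard trick for keeping outlier channels from dominating the quantization error; whatever TurboQuant does beyond this to hit 6x with no accuracy loss is exactly what the blog covers.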



With EU Inc., we are making it drastically easier to start and grow a business all across Europe ↓ twitter.com/i/broadcasts/1…

What we said to the EU Commission when building EU Inc is simple. Look at the world's best: the fastest, most affordable, most digital company law. Make it as fast or faster, at the same price. Make Europe number one. In 2 days we'll find out how ambitious the @EU_Commission is.




I had early access to Gemini 3 Flash (ty @GoogleDeepMind) and it shocked my vibe test, as I walked in with 2.5 Pro/Flash expectations. Looking at the evals now, it all makes sense.

The latent story here, from my POV, is that Google has cannibalized a big chunk of 3.0 Pro use cases (besides smoking the competition). The fact that Google pushed this out shortly after 3.0 makes me think they already know future 3.x Pro will have stellar performance. That's something to look forward to.

Flash is now the best agentic model hands down (Tau2, MCP Atlas, SWE-bench Verified) for its price point. The lower scores on HLE and GPQA Diamond relative to Pro mean it is not as knowledgeable as Pro, which makes sense. The choice is clear: Flash 3.0 should be the de facto agentic model unless you are in a knowledge-heavy domain. But I suspect that even there, with sufficient context management, you can get good value out of Flash 3.0.

Gemini LLMs have been a black swan for a big chunk of 2025. I doubt any outsider could've predicted total Pareto-frontier domination by the Gemini franchise by EOY. Congratulations to the Google/DeepMind teams for this exciting program execution!




Google DeepMind co-founder Shane Legg sits down to talk about superintelligence. tl;dr: Yes, we will reach superintelligence.

A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at an estimated $4.5k/task. Today, we've verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. That represents a ~390x efficiency improvement in one year.
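The ~390x figure falls straight out of the cost-per-task ratio (a quick arithmetic check, treating the two scores as roughly comparable):

```python
# Quick check of the quoted ~390x efficiency gain: the ratio of estimated
# cost per task, with accuracy roughly held constant (88% -> 90.5%).
cost_2024 = 4500.00  # est. $/task, o3 (High) preview on ARC-AGI-1
cost_2025 = 11.64    # $/task, GPT-5.2 Pro (X-High)
print(f"{cost_2024 / cost_2025:.0f}x")  # ~387x, i.e. roughly 390x
```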


Announcing the ARC Prize 2025 Top Score & Paper Award winners. The Grand Prize remains unclaimed. Our analysis of AGI progress marks 2025 as the year of the refinement loop.

Poetiq has officially shattered the ARC-AGI-2 SOTA 🚀 @arcprize has verified our results:
- 54% accuracy – the first to break the 50% barrier!
- $30.57/problem – less than half the cost of the previous best!
We are now #1 on the leaderboard for ARC-AGI-2!


We also appear on the ARC-AGI-2 leaderboard. Not the best score, but clearly on the Pareto frontier, at a much lower cost than the top scores.


ARC Prize 2025 Paper Award Winners
1st / "Less is More: Recursive Reasoning with Tiny Networks" (TRM) / A. Jolicoeur-Martineau / $50k
2nd / "Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI" (SOAR) / J. Pourcel et al. / $20k
3rd / "ARC-AGI Without Pretraining" / I. Liao et al. / $5k

I’m really happy to share that we’re launching UMA. Together with @RemiCadene, @alibert_s, @therobotstudio, and an exceptional founding team, we’re building general-purpose mobile and humanoid robots. If you want to be part of this adventure, reach out at uma.bot

Throughout my career, I have been obsessed with scalable learning and data-acquisition methods that require little to no labels. Back in 2005 with @ylecun, we were self-supervising our “deep” 2-layer network to do long-range vision using short-range stereo information, running live onboard our robot. However, because our deep model was so slow, the robot would crash constantly, so I designed a decoupled fast & far architecture for robust navigation, allowing fast control to coexist with slow, long-horizon thinking, much like systems 1 & 2 in modern humanoids.

My PhD focused on making deep learning work for computer vision, including unsupervised feature learning with @koraykv, writing and open-sourcing a C++ deep learning library with @soumithchintala, and open-sourcing one of the first deep learning vision systems.

I came back to robotics at @Google Brain and @GoogleDeepMind, where I pushed for entirely label-free methods on real robots. In 2017, @coreylynch and I managed to make our robot imitate human motion by co-training self-supervision across sim and real domains jointly, without any labels. With @imkelvinxu and @svlevine, we showed that unsupervised visual reward learning could be used for RL in the real world. In 2020, Corey and I developed the first manipulation VLA, trained with very few language labels thanks to self-supervision on play data (playing is an efficient way to demonstrate and practice a broad set of skills, and is essential for human development).

I was never satisfied with the status quo of top-down data collection, where researchers decide on a few tasks to collect data for. Instead, I believed we should let the data speak: tasks should be discovered automatically, bottom-up (scalable and general), from cheap and continuous data collection, with a sprinkle of more expensive data and labels. In 2022, I explored long-horizon reasoning for robotics using scalable automatic labeling augmentations for VQA tasks, and studied the economics of different data collection schemes. Most recently, I developed approaches to scalably discover laws of robotics from real data (images, hospital reports, sci-fi literature) in a broad, bottom-up fashion, which improved robot behavior over top-down approaches like Asimov’s laws.

All these experiences nourished my vision for UMA. As Chief Scientist, I’m incredibly excited to put everything together, and so grateful that I get to contribute to this incredible moment in human history.

Picture: Yann supporting UMA as an advisor and investor, with the team in Paris a couple of weeks ago.
