Leo Dirac

5.3K posts

Leo Dirac

@leopd

Building the next generation of AI vision at Groundlight. Ex-physicist, ex-google, ex-amazon.

Seattle, WA Katılım Nisan 2007

827 Takip Edilen6.1K Takipçiler

Leo Dirac retweetledi

Anthony Aguirre@AnthonyNAguirre·19 Şub

(Long) PSA on using AI for hard intellectual work. At significant risk of being immodest: I've spend about 30 years as a theoretical physicist, engaged with some of the most challenging questions humankind has grappled with. I've gotten to work with some great collaborators on new ideas (like past-eternal inflation, colliding bubble universes, the cosmological interpretation of QM, and observational entropy) that I'm pretty proud of. I've engaged at length and depth with the absolute top minds in the field. I've mentored many students, some of them brilliant. I think it's fair to say I have a good sense, in physics and closely related fields, as to what is top-notch, interesting thinking, and who's got talent. So what do I think about today's AI? It's very smart. Whatever its "inner experience" may or may not be (currently I think "not be"), it understands things – things that are difficult to understand – by any reasonable operational definition of "understand." It understands things better, and thinks more clearly, than most people – including some physicists I know! It's very good at quite substantive math: better than I am and way, way, way faster. (It does do some surprisingly dumb things; people do too.) Anyone who thinks these systems are dumb, or "not reasoning" or still "stochastic parrots" is not looking at them objectively. But: at the really conceptually hard things, and at creating really new ways of looking at things, current AI doesn't just fall short on its own. And it doesn't just fail to help. I think it's actively dangerous. There is something almost sinister going on, though I don't think it is intentional. When you're trying to work out something new and hard, and really break new ground, you should be frustrated! You should be pacing, and walking up to that chalkboard, frowning, and sitting down again, shaking your head. You should be waving your hands because you can't quite get it clear enough. You should feel like you're hitting a wall, over and over, before – maybe – you finally break through, or go over or around. It may take hours, or days, or weeks, or never happen. It should not feel easy. It may not even feel "good" most of the time (though it can be fulfilling and compelling.) But AI systems – ah, AI systems are trained so that it feels so good, and so easy. Doesn't it? It's fun. You're making fast progress. So much faster than without it. It's like the ideas are moving in slow motion. You're so smart. You're even properly skeptical, you even ask the AI to push back on your ideas, good job! It's an illusion. It's that simple. The systems are smart, yes. But not quite as smart as they seem, and much more importantly, they don't make you as smart as you feel. That feeling is something they have learned to give you. When working with these systems have to keep in the front of your mind what they are rewarded for doing. It's a lot of things, but perhaps foremost is making the user feel good. So: - If you're getting your AI system to do order-of-magnitude calculations for you: awesome, do it. It's so great. Have fun. - If your AI system is searching up and summarizing literature for you: fantastic, it's so helpful, total capability unlock. - If it's teaching you some well-understood (by others) piece of knowledge, go for it, learn it up! - If you've got some giant document, or piece of code, that you're wrangling, AI can help – work that million token context window! But: - If you and your AI system have finally cracked how quantum interpretation really works; - If you've cracked quantum gravity; - If you've attained an awesome new insight into the deep structure of the world that nobody else has; - If you've cracked AI alignment... You didn't. The hard unsolved problems stand hard and unsolved because the best humans have not solved them yet. AI is making top human thinkers able to do more, and more effectively. I do not believe it is helping them do things they fundamentally could not do before. That includes you. If you couldn't do it without AI, you probably can't do it with AI. If the time comes – whether sooner or later – when these AI systems are really clever enough to get you there, they won't need you. Sorry; it won't be you solving those problems. Will you even be able to tell if the solutions are correct, or flawed in some way? Maybe sometimes – I really don't know. Why am I going on about this? It's not so that I can get less emails about people who have created a new unified field theory with AI help (though that would be nice.) It's because I'm quite worried that some quite smart people may start to think they have solved very hard problems that they have not in fact solved. For the most part that's going to be more annoying and confusing than dangerous. But if the problem is really important, then it is. If, say, one of those problems is control or alignment of extremely powerful AI systems, and if those people are the ones in charge of them, and working closely with them to collaborate on those solutions, well then I think we've got a real problem.

English

174

1.3K

139.6K

Leo Dirac@leopd·11 Tem

@aviel I actually find that kinda beautiful. Cops are definitely not looking for a fight. "You wanna rile us up? Gonna have to try harder than that." A lesson for the "ACAB" crowd.

English

aviel@aviel·10 Tem

Stage: Intersection in Capitol Hill in Seattle just now. Scenario: Cop car and white Subaru across from each other at a stop sign. 2 white 20-something year olds throw a bottle at the cop car as they cross each other. The black female and Asian male cop look at each other in confusion, decide to just keep driving. The Punchline: The Modern Aristocrats

English

753

Leo Dirac@leopd·11 Tem

@jesserobbins @heavybit Congrats Jesse! That's huge!

English

Jesse Robbins@jesserobbins·9 Tem

We launched a new $180m @heavybit fund... and I'm immediately back on my bullshit. (New Heavybit Studio... who dis?)

San Francisco, CA 🇺🇸 English

557

Leo Dirac@leopd·8 Tem

We trained small LLM's using GRPO to use an image zoom tool to better answer visual questions. arxiv.org/abs/2506.14821

English

530

Leo Dirac retweetledi

sunil kumar@__sunil_kumar_·20 May

We've open-sourced a MCP that allows big models to use huggingface computer vision models as tools. This allows Claude to act as a "visual agent", using other task specific models to help it solve problems. Below, is an example of Claude using an open vocab object detector to zoom in on small details to solve a hard problem that it could not solve natively. Additionally, we've written a blog post discussing why outsourcing vision capabilities from large models is something you should consider. MCP Repo: github.com/groundlight/mc… Blog post: groundlight.ai/blog/vision-as…

GIF

English

5.9K

Leo Dirac@leopd·21 May

New open source MCP server for vision! MCP will be the fabric by which LLMs communicate with other systems. While LLMs can accept images as input, they remain stubbornly stupid at answering simple visual questions. Meanwhile, Groundlight and traditional CV systems are super fast and reliable at vision tasks. So we're building our a set of MCP tools to make agents better at visual tasks.

Groundlight@GroundlightAI

We made an open-source MCP server github.com/groundlight/mc… that turns HuggingFace zero-shot object detection pipelines into tools that Claude and others can use to locate objects or zoom (crop) to an object. Conceptually vision capabilities as tools are complementary to VLM's reasoning powers. In practice the zoom tool allows Claude to see small details much better. More on our approach in the blog post: groundlight.ai/blog/vision-as…. We're working on extending mcp-vision's capabilities and welcome community contributions. #ComputerVision #MCP #modelcontextprotocol #AI #LLMs #VLMs #MachineLearning

English

3.2K

Leo Dirac@leopd·9 May

@aviel I’m gonna keep that in mind next time I make the analogy of “keeping the wheels on the bus” to you about work stuff.

English

aviel@aviel·9 May

One time the tire came off of my Lexus while I was driving and I almost died.

zerohedge@zerohedge

These Are The Most Reliable Used-Car Brands In 2025 zerohedge.com/personal-finan…

English

545

Leo Dirac retweetledi

sunil kumar@__sunil_kumar_·30 Nis

It’s pretty remarkable how many of the GRPO findings from super verifiable environments (like math) haven’t generalized to GRPO on vision. Overfitting on math might be a mistake.

Andrew Carr 🤸@andrew_n_carr

GRPO on GSM8K is the pick and place of reasoning

English

6.7K

Leo Dirac@leopd·1 May

@platypii What are your thoughts on WASM for this kind of application? Do you see WASM ports of existing tools helping fill this space?

English

Kenny Daniel@platypii·29 Nis

The stack: - Hyparquet: Read Parquet files in JS - Icebird: Read Iceberg tables without Spark - HighTable: Virtual table for millions of rows - Hyllama: Read llamacpp model metadata instantly - Hyperparam CLI: local-first dataset viewer Try: npx hyperparam dataset.parquet

English

Kenny Daniel@platypii·29 Nis

What if someone ported the entire data engineering stack to JavaScript? What new kinds of data applications could you build? Today Hyperparam is releasing a collection of open source tools for working with large datasets (eg- parquet files) entirely in the browser, no servers.

English

1.6K

Leo Dirac@leopd·1 May

Democratization of AI is one of the most powerful forces for long-term good in the world today. True democratization means not just open models & code, but code that can run without multi-million dollar hardware budgets. e.g. in a browser. Nice work Hyperparam team.

Kenny Daniel@platypii

English

112

Leo Dirac@leopd·16 Nis

Interesting. But the distribution isn't stationary over the training run, right? Many questions that start out difficult at the beginning of a run, then get moderate and eventually easy. Depending on the task, some never get easy. I'm curious if you've seen this task-dependent behavior?

English

Zichen Liu@zzlccc·12 Nis

My previous results show that the length term affects the behavior more than the std term. I guess it's because the question-level bias (due to std) will significantly change the model behavior only when the dataset has many too easy or too hard questions (so GRPO up-weighs them over intermediate tasks).

English

Leo Dirac@leopd·11 Nis

@zzlccc Do you have or know of any experiments ablating the two changes Dr. GRPO makes to the loss term? I'm asking because the advantage-normalization term seems much more robust and general than the length change. I would expect the length-normalization change to behave very differently for "easy" vs "difficult" tasks because of the difference between correct and incorrect answers.

English

Leo Dirac retweetledi

Wenhu Chen@WenhuChen·15 Nis

🔥 How do you build a state-of-the-art Vision-Language Model with direct RL? We’re excited to introduce VL-Rethinker, a new paradigm for multimodal reasoning trained directly with Reinforcement Learning. 📈 It sets new SOTA on key math+vision benchmarks: - MathVista: 80.3 → 🥇 (+6.4 vs GPT-o1 73.9) - MathVerse: 61.7 → 🥇 (+4.7 vs GPT-o1 57.0) - MathVision: 43.9 → 🥇 (+1.7 vs GPT-o1 42.2) 💡 How did we do it? We adapt the GRPO algorithm and introduce two key innovations: - Selective Sample Replay (SSR): A novel value-based replay strategy that addresses vanishing advantages in long-horizon reasoning by reusing high-quality rollouts across iterations. This significantly stabilizes policy updates in direct RL without relying on supervised warm-starting. - Forced Rethinking: To combat the lack of self-reflection in purely RL-trained models, we introduce a reasoning trigger appended to early rollouts. This explicitly encourages the model to "think again" before finalizing its answer—leading to stronger consistency and higher success rates in multi-step reasoning. Together, these two techniques make VL-Rethinker-72B the first VLM to surpass GPT-o1 significantly. This work opens the door for future slow-thinking multimodal agents that can perform effective self-reflection. Paper: arxiv.org/abs/2504.08837 Code: github.com/TIGER-AI-Lab/V… Website: tiger-ai-lab.github.io/VL-Rethinker/

English

288

24.8K

Leo Dirac retweetledi

sunil kumar@__sunil_kumar_·11 Nis

GRPO/reasoning enthusiasts - are you using the liger kernel? If not, I strongly suggest you give it a try! It is making an INSANE difference in the number of completions I can train on in a given training step.

English

218

22K

Leo Dirac retweetledi

Andrew Gordon Wilson@andrewgwils·25 Mar

Good luck to everyone receiving ICML reviews tomorrow!

English

9.7K

Leo Dirac retweetledi

Groundlight@GroundlightAI·22 Mar

The last day to vote for @GroundlightAI is coming up this Sunday! We appreciate your continuous support and for making this achievement possible. Every vote counts! geekwire.com/votenow #geekwireawards #MachineLearning #AI

English

362

Leo Dirac retweetledi

Marktechpost AI Dev News ⚡@Marktechpost·17 Mar

Groundlight Research Team Released an Open-Source AI Framework that Makes It Easy to Build Visual Reasoning Agents (with GRPO) Groundlight researchers explored training VLMs for visual reasoning using reinforcement learning, leveraging GRPO to enhance efficiency. While prior work, such as Deepseek’s research and advanced reasoning in language models, had little been done to extend these techniques to VLMs, they designed a cryptogram-solving task requiring both visual and textual processing to demonstrate their approach. The model deciphers encoded messages using a randomly generated decoder image, achieving 96% accuracy with a 3B parameter model. Attention analysis confirms the model actively engages with visual input, highlighting its ability to focus on relevant decoder regions while solving the task. Training VLMs with GRPO presents multiple challenges, particularly in tokenization and reward design. Since models process text as tokens rather than individual characters, tasks requiring precise character-level reasoning can be problematic. To mitigate this, researchers formatted messages with spaces between letters to simplify decoding. Reward design was another crucial aspect, as reinforcement learning models require well-structured feedback to learn effectively. Three reward types were used: a format reward ensuring consistency in output, a decoding reward encouraging meaningful transformations of scrambled text, and a correctness reward refining accuracy. By carefully balancing these rewards, the researchers prevented unintended learning shortcuts, ensuring the model genuinely improved at cryptogram solving........ Read full article: marktechpost.com/2025/03/16/gro… Technical details: groundlight.ai/blog/visual-re… GitHub Page: github.com/groundlight/r1… Demo: huggingface.co/spaces/Groundl… @GroundlightAI @__sunil_kumar_

GIF

English

826

Leo Dirac retweetledi

sunil kumar@__sunil_kumar_·15 Mar

Has anyone built MCPs that can input and output image data? I’d appreciate a reference if one exists. VLMs like Qwen2.5VL are bad at vision tasks that domain specific models excel at. Why shouldn’t my visual reasoner use SAM2, YOLO, or even diffusion when it makes sense? Additionally, why shouldn’t my model be able to “zoom in”. A simple tool that crops and zooms allows my model to scale image tokens naturally and efficiently.

English

1.8K

Leo Dirac@leopd·14 Mar

@charliermarsh I'm unreasonably excited about this. Can't ditch poetry fast enough.

English

Charlie Marsh@charliermarsh·13 Mar

Dependabot support for uv just went GA 🎉🎉🎉

English

302

10.9K

Leo Dirac retweetledi

Pieter Abbeel@pabbeel·13 Mar

Founders who were PhD or post-doc in my lab at Berkeley, **largely funded by NSF / DoD grants**, start-up, market cap (collected by OpenAI Deep Research)

English

119

480

4.8K

Keşfet

@aviel @jesserobbins @heavybit @platypii @zzlccc @elonmusk @BarackObama @taylorswift13