Brandon

158 posts

@brandonzhao

Partner @twosmallfishvc | Previous, ML @wattpad

Joined April 2009

778 Following 116 Followers

Brandon retweeted

Abhishek Gupta@abhishekunique7·26 Mar

Excited to share the project that has surprised me the most in the last year! Large-scale RL in simulation, no demos and no reward engineering can solve dynamic, dexterous and contact rich tasks. The learned behaviors are reactive, forceful and use the environment for recovery in ways that are extremely challenging to bake in or teleoperate! You can play with the policies yourself to see: weirdlabuw.github.io/omnireset/ And, the learned behavior transfers to real world robots from RGB camera inputs! So what’s the trick - using simulator resets carefully! Let’s unpack (1/10)

614

81.8K

Brandon retweeted

Yixuan Wang@YXWangBot·5 Mar

1/ World models are getting popular in robotics 🤖✨ But there’s a big problem: most are slow and break physical consistency over long horizons. 2/ Today we’re releasing Interactive World Simulator: An action-conditioned world model that supports stable long-horizon interaction. 3/ Key result: ✅ 10+ minutes of interactive prediction ✅ 15 FPS ✅ on a single RTX 4090🔥 4/ Why this matters: it unlocks two critical robotics applications: 🚀 Scalable data generation for policy training 🧪 Faithful policy evaluation 5/ You can play with our world model NOW at yixuanwang.me/interactive_wo…. NO git clone, NO pip install, NO python. Just click and play! NOTE ⚠️ ALL videos here are generated purely by our model in pixel space! They are **NOT** from a real camera More details coming 👇 (1/9) #Robotics #AI #MachineLearning #WorldModels #RobotLearning #ImitationLearning

500

123.8K

Brandon retweeted

Glen Berseth@GlenBerseth·7 Aug

@RL_Conference will be in Montréal next year at @UMontreal. We are looking forward to welcoming you all! Bienvenue!

105

13.5K

Brandon retweeted

clem 🤗@ClementDelangue·3 Aug

Every tech company can and should train their own deepseek R1, Llama or GPT5, just like every tech company writes their own code (and AI is no more than software 2.0). This is why we're releasing the Ultra-Scale Playbook. 200 pages to master: - 5D parallelism (DP, TP, PP, EP, FSDP) - ZeRO - Flash Attention - Compute/communication overlap and bottlenecks All with accessible theory intros and 4,000+ scaling experiments.

270

2.1K

169.8K

Brandon retweeted

Omar Khattab@lateinteraction·11 May

DSPy's biggest strength is also the reason it can admittedly be hard to wrap your head around it.

It's basically saying: LLMs & their methods will continue to improve but not equally in every axis, so:
- What's the smallest set of fundamental abstractions that allow you to build downstream AI software that is "future-proof" and rides the tide of progress?
- Equivalently, what are the right algorithmic problems that researchers should focus on to enable as much progress as possible for AI software?

But this is necessarily complex, in the sense that the answer has to be composed of a few things, not one concept only. (Though if you had to understand one concept only, the fundamental glue is DSPy Signatures.)

It's actually only a handful of bets, though, not too many. I've been tweeting them non-stop since late 2022, but I've never collected them in one place. All of these have proven beyond a doubt to have been the right bets so far for 2.5 years, and I think they'll stay the right bets for the next 3 years at least.

1) Information Flow is the single most key aspect of good AI software. As foundation models improve, the bottleneck becomes basically whether you can actually (1) ask them the right question and (2) provide them with all the necessary context to address it. Since 2022, DSPy addressed this in two directions: (i) free-form control flow ("Compound AI Systems" / LM programs) and (ii) Signatures. Prompts have been a massive distraction here, with people thinking they need to find the magical keyword to talk to LLMs. From 2022, DSPy put the focus on *Signatures* (back then called Templates) which force you to break down LM interactions into *structured and named* input fields and *structured and named* output fields. Getting simply those fields right was (and has been) a lot more important than "engineering" the "right prompt". That's the point of Signatures. (We know it's hard for people to force themselves to define their signatures so carefully, but if you can't do that, your system is going to be bad.)

2) Interactions with LLMs should be Functional and Structured. Again, prompts are bad. People are misled from their chat interaction with LLMs to think that LLMs should take "strings", hence the magical status of "prompts". But actually, you should define a functional contract. What are the things you will give to the function? What is the function supposed to do with them? What is it then supposed to give you back? This is again Signatures. It's (i) structured *inputs*, (ii) structured *outputs*, and (iii) instructions. You've got to decouple these three things, which until DSP (2022) and really until very recently with mainstream structured outputs, were just meshed together into "prompts". This bears repeating: your programmatic LLM interactions need to be functions, not strings. Why? Because there are many concerns that are actually not part of the LLM behavior that you'd otherwise need to handle ad-hoc when working with strings:
- How do you format the *inputs* to your LLM into a string?
- How do you separate *instructions* and *inputs* (data)?
- How do you *specify* the output format (string) that your LLM should produce so you can parse it?
- How do you layer on top of this the inference strategy, like CoT or ReAct, without entirely rewriting your prompt?

Signatures solve this. They ask you to *just* specify the input fields, output fields, and task instruction. The rest are the job of Modules and Optimizers, which instantiate Signatures.

3) Inference Strategies should be Polymorphic Modules. This sounds scary but the point is that all the cool general-purpose prompting techniques or inference-scaling strategies should be Modules, like the layers in DNN frameworks like PyTorch. Modules are generic functions, which in this case take *any* Signature, and instantiate *its* behavior generically into a well-defined strategy. This means that we can talk about "CoT" or "ReAct" without actually committing at all to the specific task (Signature) you want to apply them to. This is a huge deal, which again only exists in DSPy. One key thing that Modules do is that they define *parameters*. What part(s) of the Module are fixed and which parts can be learned? For example, in CoT, the specific string that asks the model to think step by step could be learned. Or the few-shot examples of thinking step by step should be learnable. In ReAct, demonstrations of good trajectories should be learnable.

4) Specification of your AI software behavior should be decoupled from learning paradigms. Before DSPy, every time a new ML paradigm came by, we re-wrote our AI software. Oh, we moved from LSTMs to Transformers? Or we moved from fine-tuning BERT to ICL with GPT-3? Entirely new system. DSPy says: if you write signatures and instantiate Modules, the Modules actually know exactly what about them can be optimized: the LM underneath, the instructions in the prompt, the demonstrations, etc. The learning paradigms (RL, prompt optimization, program transformations that respect the signature) should be layered on top, with the same frontend / language for expressing the programmatic behavior. This means that the *same programs* you wrote in 2023 in DSPy can now be optimized with dspy.GRPO, the way they could be optimized with dspy.MIPROv2, the way they were optimized with dspy.BootstrapFS before that. The second half of this piece is Downstream Alignment or compile-time scaling. Basically, no matter how good LLMs get, they might not perfectly align with your downstream task, especially when your information flow requires multiple modules and multiple LLM interactions. You need to "compile" towards a metric "late", i.e. after the system is fully defined, no matter how RLHF'ed your models are.

5) Natural Language Optimization is a powerful paradigm of learning. We've said this for years, like with the BetterTogether optimizer paper, but you need both *fine-tuning* and *coarse-tuning* at a higher level in natural language. The analogy I use all the time is riding a bike: it's very hard to learn to ride a bike without practice (fine-tuning), but it's extremely inefficient to learn *to avoid riding the bike on the sidewalk* from rewards; you want to understand and learn this rule in natural language to adhere ASAP. This is the source of DSPy's focus on prompt optimizers as a foundational piece here; it's often far superior in sample efficiency to doing policy gradient RL if your problem has the right information flow structure.

That's it. That's the set of core bets DSPy has made since 2022/2023 until today. Compiling Declarative AI Functions into LM Calls, with Signatures, Modules, and Optimizers.

1) Information Flow is the single most key aspect of good AI software.
2) Interactions with LLMs should be Functional and Structured.
3) Inference Strategies should be Polymorphic Modules.
4) Specification of your AI software behavior should be decoupled from learning paradigms.
5) Natural Language Optimization is a powerful paradigm of learning.
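The split between a functional contract (bet 2) and interchangeable inference strategies (bet 3) can be sketched in plain Python. This is an illustrative toy and not DSPy's API: the `Signature`, `Predict`, and `ChainOfThought` classes and the `render`, `parse`, and `fake_lm` helpers are hypothetical definitions local to this snippet, and `fake_lm` is a canned stand-in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type for "something that maps a prompt string to a reply string".
LM = Callable[[str], str]


@dataclass(frozen=True)
class Signature:
    """A functional contract: instructions plus named input and output fields."""
    instructions: str
    inputs: List[str]
    outputs: List[str]


def render(sig: Signature, values: Dict[str, str], fields: List[str]) -> str:
    """Turn structured inputs into a prompt; the caller never writes this string."""
    lines = [sig.instructions, ""]
    lines += [f"{name}: {values[name]}" for name in sig.inputs]
    lines += ["", "Respond with one line per field: " + ", ".join(fields)]
    return "\n".join(lines)


def parse(text: str, fields: List[str]) -> Dict[str, str]:
    """Read 'field: value' lines back into a dict and check the contract."""
    found: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in fields:
            found[key.strip()] = value.strip()
    missing = [name for name in fields if name not in found]
    if missing:
        raise ValueError(f"missing output fields: {missing}")
    return found


class Predict:
    """Strategy 1: ask for the signature's output fields directly."""
    def __init__(self, sig: Signature, lm: LM) -> None:
        self.sig, self.lm = sig, lm

    def __call__(self, **values: str) -> Dict[str, str]:
        prompt = render(self.sig, values, self.sig.outputs)
        return parse(self.lm(prompt), self.sig.outputs)


class ChainOfThought:
    """Strategy 2: same signature, with a reasoning field requested first."""
    def __init__(self, sig: Signature, lm: LM) -> None:
        self.sig, self.lm = sig, lm

    def __call__(self, **values: str) -> Dict[str, str]:
        fields = ["reasoning"] + self.sig.outputs
        prompt = render(self.sig, values, fields)
        return parse(self.lm(prompt), fields)


def fake_lm(prompt: str) -> str:
    """Canned reply so the sketch runs offline; a real LM call would go here."""
    reply = []
    if "reasoning" in prompt.splitlines()[-1]:
        reply.append("reasoning: Paris is the capital of France.")
    reply.append("answer: Paris")
    return "\n".join(reply)


# One signature, two strategies: the task definition is written once.
qa = Signature("Answer the question briefly.", ["question"], ["answer"])
direct = Predict(qa, fake_lm)(question="What is the capital of France?")
stepwise = ChainOfThought(qa, fake_lm)(question="What is the capital of France?")
print(direct)
print(stepwise)
```

The point of the design is that `qa` never mentions a prompt string or a strategy, so swapping `Predict` for `ChainOfThought` changes how the model is asked without touching the task definition.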

DSPy@DSPyOSS

Is this guy talking about DSPy?

147

987

336.9K

Brandon retweeted

Karl Pertsch@KarlPertsch·20 Jun

We’re releasing the RoboArena today!🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵

426

111.2K

Brandon retweeted

Hugo Larochelle@hugo_larochelle·26 Apr

Today is my last day at Google. I started over 8 years ago, with a mandate to build a team doing bleeding edge AI research from Montreal, in what would be the first big tech AI research lab in the city. These years led to countless amazing scientific contributions from my team, to several initiatives nurturing the Montreal AI ecosystem, and to many new invaluable friendships across the globe at Google. It is with a heavy heart that I say goodbye, but I know I’m leaving behind an exceptionally strong Google DeepMind group in Montreal whose best accomplishments are still ahead. There are too many people to thank, but I can’t pass on thanking Samy Bengio and @JeffDean who first believed in me and the opportunity of building a research lab in Montreal. I’m still working on determining the details of my next chapter, but certainly it will be grounded in my continuing motivation to leverage and make the most out of our enormous and talented local AI ecosystem.

1.4K

159.6K

Brandon retweeted

Deedy@deedydas·19 Apr

Rich Sutton just published his most important essay on AI since The Bitter Lesson: "Welcome to the Era of Experience" Sutton and his advisee Silver argue that the “era of human data,” dominated by supervised pre‑training and RL‑from‑human‑feedback, has hit diminishing returns; the future will belong to agents that — act continuously in real or simulated worlds, — generate and label their own training data through interaction — optimise rewards grounded in the environment rather than in human preference alone, and — refine their world‑models and plans over lifelong streams of experience.

265

1.9K

420.4K

Brandon retweeted

ViggleAI@ViggleAI·9 Apr

Viggle’s Mic 2.0 lets you turn an image into a fully animated character, with synced voice and motion, all in one take. Achieve: 1. perfect lip sync from text or audio 2. full body expression 3. control how your character talks and moves simultaneously Examples & tutorials: (1/7)

148

17.4K

Brandon retweeted

Ideogram@ideogram_ai·26 Mar

Meet Ideogram 3.0 — stunning realism, creative designs, and consistent styles, all in one powerful model. And it's blazingly fast. Now available to all Ideogram users for free.

126

267

1.4K

287.7K

Brandon retweeted

steve.prophet@nilslice·23 Feb

@martin_casado @hypersoren @ankrgyl sort of both… prompts are programs, and the missing libc / runtime is tool calling. the “AI runtime” is where this comes together. docs.mcp.run/blog/2025/02/1…

3.8K

Brandon retweeted

Jaden Clark@jadenvclark·12 Feb

How can we leverage human video data to train generalist robot policies? 🤖 Enter RAD: Reasoning through Action-Free Data, a new way to train robot policies using both robot and human video data via action reasoning. rad-generalization.github.io

102

27.5K

Brandon retweeted

Jiao Sun@sunjiao123sun_·28 Jan

I read the DeepSeek-R1 paper the day it came out, and I don’t think GRPO is the key to its success. Instead, here’s what truly matters (ranked by importance): 1. Iterative RL and SFT 2. A hybrid reward model—mixing rule-based RM and neural RM for deterministic tasks 3. High-quality synthetic data, with human post-processing only when necessary 4. Evaluation with 64 inference samples These open exciting opportunities for PhD students with limited compute to explore further. I might tweet some potential research projects inspired by DeepSeek-R1 later. Beyond the technical aspects, what I appreciate even more: 1/ Openness: without it, people won’t follow. 2/ **Exceptional writing**: Strong storytelling: from proof of concept to a more complex process demonstrating full potential. Clear, easy-to-follow methodology. Final note: Heroes admire each other, while losers resent each other. Let’s stay competitive and grateful!
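Point 4 in the tweet above does not say how the 64 samples are used. One common reading, assumed here, is that pass@1 for each problem is estimated as the fraction of k = 64 sampled responses that are correct, then averaged over problems. The function names and the toy correctness data below are hypothetical.

```python
from typing import Sequence


def pass_at_1(sample_correct: Sequence[bool]) -> float:
    """Estimate pass@1 for one problem as the share of correct samples."""
    if not sample_correct:
        raise ValueError("need at least one sample")
    return sum(sample_correct) / len(sample_correct)


def benchmark_pass_at_1(per_problem: Sequence[Sequence[bool]]) -> float:
    """Average the per-problem estimates over the whole benchmark."""
    if not per_problem:
        raise ValueError("need at least one problem")
    return sum(pass_at_1(samples) for samples in per_problem) / len(per_problem)


# Toy data (made up): 3 problems, 64 sampled responses each.
k = 64
results = [
    [True] * 48 + [False] * 16,  # solved in 48 of 64 samples
    [True] * k,                  # always solved
    [False] * k,                 # never solved
]
score = benchmark_pass_at_1(results)
print(f"pass@1 estimate over {len(results)} problems: {score:.4f}")
```

Averaging over many samples reduces the variance that a single sampled response per problem would introduce, which matters most on small benchmarks.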

444

3.1K

420K

Brandon retweeted

Peiyi Wang@sybilhyz·28 Jan

Last year, I joined DeepSeek with no RL experience. While conducting Mathshepherd and DeepSeekMath research, I independently derived this unified formula to understand various training methods. It felt like an "aha moment", though I later realized it was PG.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

If you can only read one DeepSeek paper in your life, read DeepSeek Math. Everything else is either ≈obvious in hindsight or clever optimization. DeepSeek Math is a tour de force of data engineering, general DL LLM methodology, RL, and just beautiful. Just 22 pages.

536

6.5K

1.2M

Brandon retweeted

Eric Migicovsky@ericmigi·27 Jan

We’re bringing Pebble back → rePebble.com

349

739

4.4K

703.1K

Brandon retweeted

Andrew Ng@AndrewYNg·24 Jan

Discussion at Davos with @Yoshua_Bengio, @YejinChoinka, @JonathanRoss321, @Thom_Wolf moderated by @nxthompson. We share excitement for the future of AI, the science to be done, and the many things yet to be built. Take a look!

World Economic Forum@wef

The Dawn of Artificial General Intelligence? with @Thom_Wolf (@HuggingFace), @Yoshua_Bengio, Yejin Choi (@Stanford) @AndrewYNg (@DeepLearningAI), Jonathan Ross (@GroqInc), @nxthompson (@theatlantic) #WEF25 x.com/i/broadcasts/1…

269

75.8K

Brandon retweeted

Douwe Kiela@douwekiela·2 Jan

I’m really sad that my dear friend @FelixHill84 is no longer with us. He had many friends and colleagues all over the world - to try to ensure we reach them, his family have asked to share this webpage for the celebration of his life: pp.events/felix

109

728

300.8K

Brandon retweeted

Sam Altman@sama·27 Dec

it is hard to overstate how much alec radford has contributed to the field, and how much of everyone's current progress traces back to his work. i believe he is a genius at the level of einstein, and also he is one of my favorite people ever--hard to imagine a nicer, warmer, or more thoughtful person. he is extremely modest and isn't as well-known as he should be, though i'm certain he doesn't care. super grateful for everything he has done for openai, and so looking forward to collaborating with him as an independent researcher--i think it will fit his style really well.

294

378

8.4K

Brandon retweeted

Vincent Weisser@vincentweisser·14 Dec

.@ilyasut full talk at neurips 2024 "pre-training as we know it will end" and what comes next is superintelligence: agentic, reasons, understands and is self aware

736

3.3K

790.9K

Brandon retweeted

Ideogram@ideogram_ai·22 Oct

Today, we’re introducing Ideogram Canvas, an infinite creative board for organizing, generating, editing, and combining images. Bring your face or brand visuals to Ideogram Canvas and use industry-leading Magic Fill and Extend to blend them with creative, AI-generated content.

124

391

2.6K

520.5K
