Jean Michel A. Sarr

1.1K posts


@jmamathsarr

PhD in CS. Engineer working on LLM systems and data. Building small tools for founder/operator workflows. My opinions, my own.

Accra, Ghana · Joined April 2012
346 Following · 502 Followers

Pinned Tweet
Jean Michel A. Sarr @jmamathsarr
RLHF can't scale. Here's why 🧵 I just published a 4-part research series digging into its fundamental limits and mapping the synthetic alignment methods taking over. Starting an n-day daily thread walking through the evidence, one insight at a time. Join me? Day 1/n: full roadmap jmamath.github.io/blog/synthetic…
1 · 0 · 1 · 329
Yi Tay @YiTayML
2025 recap / highlights:
- IMO gold medal 🏅
- contributions to Gemini 2.5/3 and Deep Think (via research and sometimes captaining)
- hosting @JeffDean, @quocleix, @benoitschilling, @denny_zhou in Singapore
- meeting and chatting with the legend @leehsienloong at the Istana
- organizing the GDM Singapore Gemini event with @divy93t (thevar was a big highlight man!)
- hanging out at MTV Gradient Canopy multiple times and reuniting with many friends in person (@m__dehghani @vqctran and more!)
- visiting the DeepMind London office and having Dishoom there
- playing pickleball with an ultra legendary person whom I will not name haha
- came to the Bay Area a few times and hung out with many non-Google friends
- chilling at ICLR in SG and hanging out with @_jasonwei at lau pat sa. also bringing some GDM-ers to have durian
- started hiring and building out the GDM Singapore Gemini team
- made new non-AI SWE friends in the Google SG office
- picked up badminton again and started to play intensely / regularly
- lost about 6-7 kg from the start of the year (20 kg since the start of my entire healthmaxxing journey in Aug 2024)
- won the local internal mixed doubles badminton competition in Google Singapore 😎
- HRV (heart rate variability) almost doubled (higher is better)
- resting heart rate decreased by almost 20 BPM! (lower is better)
- spent a ton of quality time with family
- saw my daughter grow up from 1 year old to 2 years old 😃

was a great year! now on to an even more amazing 2026!! 🥳
7 · 8 · 258 · 28.2K
Nikita Bier @nikitabier
In 2 words, what is your purpose in life?
11.4K · 427 · 7.7K · 4.3M
Jean Michel A. Sarr @jmamathsarr
In preference learning, who judges quality and how those judgments update the policy are two distinct decisions that people often mix together.
• Human-written principles (e.g. Constitutional AI) provide an interpretable judging mechanism, where explicit rules guide the model in labeling responses before those labels are used to train a reward model.
• Expert model judges such as GPT-4 generate preference labels that can either train a reward model for RL or feed directly into DPO for policy optimization.
• Self-judgment lets the model prefer one response over the other, either by relying on emergent judging ability or by leveraging explicit judge training, which has been shown to outperform the emergent approach.
• Hybrid methods combine multiple sources of judgments, such as Constitutional AI mixing AI-labeled harmlessness with human-labeled helpfulness to balance safety and utility.
Decoupling who judges from how a judgment is applied gives you orthogonal control knobs over two fundamentally different parts of the system.
Have you found any other paradigm? jmamath.github.io/blog/synthetic…#factor-5-preference-signal-source-and-training-implementation [12/n]
0 · 0 · 0 · 28
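The decoupling above can be made concrete with DPO: whichever judge produces the labels (human principles, GPT-4, or the policy itself), the update mechanism only ever sees a (chosen, rejected) pair plus log-probabilities. A minimal sketch of the per-pair DPO loss, assuming sequence log-probs are already computed; the function name and inputs are illustrative, not from the post:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, from sequence log-probs.

    pi_*  : log-prob of each response under the policy being trained
    ref_* : log-prob of the same response under the frozen reference model
    """
    # Implicit reward margin: how much more the policy prefers `chosen`
    # over `rejected`, relative to the reference model.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid of the margin: the loss shrinks as the policy
    # widens the gap in favour of the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The judge only decides which response is `chosen`; the update above
# is identical regardless of where that judgment came from.
print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

At a margin of zero the loss is log 2; it decreases monotonically as the policy separates the pair in the judged direction, which is exactly the "how judgments update the policy" knob.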
Jean Michel A. Sarr @jmamathsarr
In some cases, you want to refine responses to generate natural preference pairs. How do you do that? You can:
- Use heuristics (e.g., bigger models produce better responses).
- In an online setting where you continue training the reward model on new data, sample preference pairs from the current reward model.
- With DPO, use the policy itself as a reward model to directly rank its own answers.
- Use Constitutional AI: generate a critique of a bad response, then apply a human-written constitution to revise it.
- Use self-play: let the model engage with itself in multi-turn conversation and select the latest refined answer.
- Use tree search: generate multiple responses, select the best, critique it, and generate improved ones until satisfied.
Have you used any of these methods before? jmamath.github.io/blog/synthetic…#factor-4-response-refinement-and-filtering [11/n]
1 · 0 · 1 · 43
Jason Phang @zhansheng
I'll be at @NeurIPSConf in San Diego! No agenda just vibes, ping me if you want to chat!
10 · 2 · 113 · 12.2K
Jean Michel A. Sarr @jmamathsarr
I just arrived in San Diego for NeurIPS! I'm excited to discuss the latest research in synthetic data, alignment, and more, and to meet new folks and discover the local food!
0 · 0 · 0 · 77
Siddharth Mishra-Sharma
I'll be at NeurIPS next week. Our team at Anthropic works across the stack on Claude's scientific capabilities, and we’re hiring research engineers: job-boards.greenhouse.io/anthropic/jobs…. If this resonates, and in particular if you have a strong infra background, I’d love to chat! I’d also love to chat with prospective students interested in working with me at @BU_CDS, and more broadly learn about what folks are working on in AI for science. Whether you’re excited about science in the age of scaling or the age of research, get in touch!
15 · 20 · 459 · 55.2K
Jean Michel A. Sarr @jmamathsarr
@sarahcat21 Very interesting. I believe startups could also focus on lower-priority research currently done by incumbents. For example, synthetic data generation tools could be a meaningful piece of AI infrastructure, despite moves by incumbents like Nvidia acquiring Gretel.
0 · 0 · 0 · 146
Sarah Catanzaro @sarahcat21
My AI investment thesis is that AI application startups will need to solve research and engineering problems that the labs are not currently focused on, thereby accumulating more technical defensibility. At times, their objectives may even diverge; we already see this in creative industries, where post-training alignment impedes the ability of models to produce diverse outputs. It will be hard to survive, since the app companies will also need to define compelling workflows and user experiences, but with the right team and support, some (but not all) will make it.
Yishan @yishan

My AI investment thesis is that every AI application startup is likely to be crushed by rapid expansion of the foundational model providers. App functionality will be added to the foundational models' offerings, because the big players aren't slow incumbents (it is wrong to apply the analogy of "fast startup, slow incumbent" here), they are just big. Far more so than with any other prior new technology, there is a massive and fast-moving wave that obsoletes every new app almost as fast as it can be invented. There is almost no time to build a company and scale it.

There are two ways AI application startup founders can make money:
- Make a flash-in-the-pan app that generates a ton of cash and bank the cash (my estimate is that you have about 12-18 months of cashflow generation)
- Make a good enough app that you get acquired by one of the big players for sufficient equity

The situation is highly unstable: we don't know if it's going to crash or go to the moon, but both scenarios make it very unlikely that any AI application startup will independently become a generational supercompany (baseline odds are low to begin with). The best odds are finding an application niche in a highly specialized field with extremely unique and specific data barriers, ideally ones relating to real atoms (hardware or world-related) data and not software/finance.

30 · 28 · 363 · 88.4K
Kevin Patrick Murphy @sirbayes
Hi @karpathy. I loved your interview! However, you said there is no work on LLM self-play. Not true. See e.g. "Spiral" from @natashajaques et al. (agent-v-agent) and "Absolute Zero Reasoner" from @_AndrewZhao et al. (agent-v-env). Probably others.
Andrej Karpathy @karpathy

My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too. First of all, yes, I know, and I'm sorry that I speak so fast :). It's to my detriment because sometimes my speaking thread out-executes my thinking thread, so I think I botched a few explanations due to that, and sometimes I was also nervous that I was going too far on a tangent or too deep into something relatively spurious. Anyway, a few notes/pointers:

AGI timelines. My comments on AGI timelines look to be the most trending part of the early response. The "decade of agents" is a reference to this earlier tweet x.com/karpathy/statu… Basically my AI timelines are about 5-10X pessimistic w.r.t. what you'll find at your neighborhood SF AI house party or on your twitter timeline, but still quite optimistic w.r.t. a rising tide of AI deniers and skeptics. The apparent conflict is not one: imo we simultaneously 1) saw a huge amount of progress in recent years with LLMs while 2) there is still a lot of work remaining (grunt work, integration work, sensors and actuators to the physical world, societal work, safety and security work (jailbreaks, poisoning, etc.)), and also research to get done, before we have an entity that you'd prefer to hire over a person for an arbitrary job in the world. I think that overall, 10 years should otherwise be a very bullish timeline for AGI; it's only in contrast to present hype that it doesn't feel that way.

Animals vs Ghosts. My earlier writeup on Sutton's podcast: x.com/karpathy/statu… I am suspicious that there is a single simple algorithm you can let loose on the world that learns everything from scratch. If someone builds such a thing, I will be wrong and it will be the most incredible breakthrough in AI. In my mind, animals are not an example of this at all: they come prepackaged with a ton of intelligence by evolution, and the learning they do is quite minimal overall (example: a zebra at birth). Putting our engineering hats on, we're not going to redo evolution. But with LLMs we have stumbled on an alternative approach to "prepackage" a ton of intelligence in a neural network: not by evolution, but by predicting the next token over the internet. This approach leads to a different kind of entity in the intelligence space, distinct from animals and more like ghosts or spirits. But we can (and should) make them more animal-like over time, and in some ways that's what a lot of frontier work is about.

On RL. I've critiqued RL a few times already, e.g. x.com/karpathy/statu… First, you're "sucking supervision through a straw", so I think the signal/flop is very bad. RL is also very noisy because a completion might have lots of errors that get encouraged (if you happen to stumble onto the right answer), and conversely brilliant insight tokens that get discouraged (if you happen to screw up later). Process supervision and LLM judges have issues too. I think we'll see alternative learning paradigms. I am long "agentic interaction" but short "reinforcement learning" x.com/karpathy/statu… I've seen a number of papers pop up recently that are imo barking up the right tree, along the lines of what I called "system prompt learning" x.com/karpathy/statu… , but I think there is also a gap between ideas on arxiv and an actual, at-scale implementation at an LLM frontier lab that works in a general way. I am overall quite optimistic that we'll see good progress on this dimension of remaining work quite soon; e.g. I'd even say ChatGPT memory and so on are primordial deployed examples of new learning paradigms.

Cognitive core. My earlier post on the "cognitive core": x.com/karpathy/statu… , the idea of stripping down LLMs, of making it harder for them to memorize, or actively stripping away their memory, to make them better at generalization. Otherwise they lean too hard on what they've memorized. Humans can't memorize so easily, which now looks more like a feature than a bug by contrast. Maybe the inability to memorize is a kind of regularization. Also my post from a while back on how the trend in model size is "backwards" and why "the models have to first get larger before they can get smaller" x.com/karpathy/statu…

Time travel to Yann LeCun 1989. This is the post that I did a very hasty/bad job of describing on the pod: x.com/karpathy/statu… Basically: how much could you improve Yann LeCun's results with the knowledge of 33 years of algorithmic progress? How constrained were the results by each of algorithms, data, and compute? A case study thereof.

nanochat. My end-to-end implementation of the ChatGPT training/inference pipeline (the bare essentials): x.com/karpathy/statu…

On LLM agents. My critique of the industry is more about overshooting the tooling w.r.t. present capability. I live in what I view as an intermediate world where I want to collaborate with LLMs and where our pros/cons are matched up. The industry lives in a future where fully autonomous entities collaborate in parallel to write all the code and humans are useless. For example, I don't want an agent that goes off for 20 minutes and comes back with 1,000 lines of code. I certainly don't feel ready to supervise a team of 10 of them. I'd like to go in chunks that I can keep in my head, where an LLM explains the code it is writing. I'd like it to prove to me that what it did is correct; I want it to pull the API docs and show me that it used things correctly. I want it to make fewer assumptions and ask/collaborate with me when it's not sure about something. I want to learn along the way and become a better programmer, not just get served mountains of code that I'm told works. I just think the tools should be more realistic w.r.t. their capability and how they fit into the industry today, and I fear that if this isn't done well we might end up with mountains of slop accumulating across software, and an increase in vulnerabilities, security breaches, and so on. x.com/karpathy/statu…

Job automation. How the radiologists are doing great x.com/karpathy/statu… and what jobs are more susceptible to automation and why.

Physics. Children should learn physics in early education not because they go on to do physics, but because it is the subject that best boots up a brain. Physicists are the intellectual embryonic stem cells x.com/karpathy/statu… I have a longer post that has been half-written in my drafts for ~a year, which I hope to finish soon.

Thanks again Dwarkesh for having me over!

13 · 12 · 442 · 126.7K
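Karpathy's "sucking supervision through a straw" point can be seen in miniature: outcome-only RL broadcasts one scalar reward to every token of a trajectory, while process supervision scores steps individually. A toy sketch; both functions are illustrative stand-ins, not from any library:

```python
def outcome_advantages(num_tokens, reward):
    """Outcome-only RL: a single scalar reward for the whole completion
    is broadcast to every token, so a flawed intermediate step inside a
    trajectory that happens to end correctly still gets reinforced.
    """
    return [reward] * num_tokens

def process_advantages(step_rewards):
    """Process supervision: each step is judged on its own, so a bad
    step can be discouraged even when the final answer is right.
    """
    return list(step_rewards)

# A 4-step solution whose step 2 is wrong but whose final answer is correct:
print(outcome_advantages(4, reward=1.0))          # every step reinforced equally
print(process_advantages([1.0, -1.0, 1.0, 1.0]))  # the flawed step is penalized
```

The outcome version is the "straw": many tokens, one bit of supervision, and the noise Karpathy describes is exactly the errors that ride along on a lucky correct answer.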