We just posted a blog + paper on a simple but effective approach to model honesty called "Confessions". TL;DR: normal RL training rewards for high performance on a task. Confession training is a separate phase that rewards only for honesty. Tests look promising! More:

OpenAI@OpenAI

In a new proof-of-concept study, we’ve trained a GPT-5 Thinking variant to admit whether the model followed instructions. This “confessions” method surfaces hidden failures—guessing, shortcuts, rule-breaking—even when the final answer looks correct. openai.com/index/how-conf…

Jason Yosinski retweeted

Jasmine Wang@j_asminewang·1 Dec

Today, OpenAI is launching a new Alignment Research blog: a space for publishing more of our work on alignment and safety more frequently, and for a technical audience. alignment.openai.com

Jason Yosinski@jasonyo·22 Oct

@oh_that_hat @hugo_larochelle I still remember seeing those plots in group meeting showing way better than chance performance at iter 0 and thinking "yay, another bug to track down".

hattie@oh_that_hat·22 Oct

@jasonyo @hugo_larochelle I peaked at those egg key diagrams!

Jason Yosinski retweeted

ML Collective@ml_collective·22 Aug

A little while ago, many of you gave generously to support a number of MLC-Nigeria researchers in attending Deep Learning Indaba #DLI2025. Here's the crew 👇 that attended; from what we hear it was a bustle of talks, posters, mentorship, and sparks of collaboration!

Mardiyyah@hemhemoh

Today we say goodbye to @DeepIndaba after six inspiring days in Kigali rich with keynotes, tutorials, workshops, mentorship circles, and insightful posters that kept us learning non-stop. Some of us were only able to make it down to #DLI2025 because of your generous support.

Jason Yosinski retweeted

Mardiyyah@hemhemoh·22 Aug

Rosanne Liu@savvyRL

The opportunity gap in AI is more striking than ever. We talk way too much about those receiving $100M or whatever for their jobs, but not enough about those asking for <$1k to present their work. For the 3rd year in a row, @ml_collective is raising funds to support @DeepIndaba attendees.

Jason Yosinski@jasonyo·14 Jul

Help send a bunch of researchers to DL Indaba this year! For less than one H100 we can send 25 people!

Rosanne Liu@savvyRL

Jason Yosinski@jasonyo·12 Jul

@mrcslws Wow can't believe that tweet was from 2012!

Marcus Lewis@mrcslws·12 Jul

Hat tip to Coding Noir x.com/jasonyo/status…

Jason Yosinski@jasonyo

#CodingNoir in 3 easy steps: 1. Open in one tab: bit.ly/LhZlTI 2. Second tab: bit.ly/JhIFYy 3. Let gritty coding commence.

Marcus Lewis@mrcslws·12 Jul

My new favorite hidden iOS feature: Background Sounds. Add it to your Control Center, then listen to Daft Punk + rain.

Jason Yosinski retweeted

ML Collective@ml_collective·6 Jun

Starting in 1 hour: @thebasepoint presents Anthropic's "Biology of a Large Language Model" work at the DLCT reading group. Paper: transformer-circuits.pub/2025/attributi… Come for the chain of thought, stay for the rabbits and habbits. Zoom info below 👇

Jason Yosinski@jasonyo·6 Jun

@MFarajtabar Nice work! Any of you authors want to present this at the @ml_collective Friday reading group? mlcollective.org/dlct/ @ParshinShojaee @i_mirzadeh

Mehrdad Farajtabar@MFarajtabar·5 Jun

🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching? The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks, but we found their fundamental limitations are more severe than expected. In our latest work, we compared each “thinking” LRM with its “non-thinking” LLM twin. Unlike most prior works that only measure the final performance, we analyzed their actual reasoning traces—looking inside their long "thoughts". Our analysis reveals several interesting results ⬇️ 📄 machinelearning.apple.com/research/illus… Work led by @ParshinShojaee and @i_mirzadeh, and with @KeivanAlizadeh2, @mchorton1991, Samy Bengio.

Jason Yosinski@jasonyo·28 May

Starting in 30 min!

ML Collective@ml_collective

Next Research Jam is in 14 hours, tomorrow morning at 8am PT. Stop by this virtual lab meeting to hear research ideas and updates on projects in progress! Zoom info at mlcollective.org/events/researc…

Jason Yosinski@jasonyo·28 May

Next MLC Research Jam is tomorrow; sharing two ideas myself to mix things up :)

ML Collective@ml_collective

Jason Yosinski@jasonyo·16 May

Starting in 15 min!

ML Collective@ml_collective

This week at Deep Learning: Classics and Trends we're kicking off a new five part mini-series on LLM Interpretability. Up first: @thesubhashk shows how LLMs represent numbers on a helix and use it to add! Join Friday at 10am PT, zoom here: mlcollective.org/dlct/

Jason Yosinski@jasonyo·14 May

I am sitting here watching my HF smolagent slowly reason about and click on Captcha squares one at a time 🙈. Is this general AI?

Jason Yosinski@jasonyo·13 May

@scychan_brains Love this simple/sticky mental model!

Stephanie Chan@scychan_brains·4 May

When we have low confidence (imposter syndrome), we are often afraid to take action in the world. Specifically, I mean actions that have external impact. Because of this fear of taking action, we then get very little feedback (positive or negative), which reduces learning and growth. Crucially, we also get very few successes! This lack of growth and success further decreases our confidence, and the cycle continues… 1/

Stephanie Chan@scychan_brains·4 May

Some years ago, I got trapped in a Massive Trough of Imposter Syndrome. It took more than a year to dig myself out of it, but the following framework really helped me. It feels a bit vulnerable to share, but I hope it might help a few others too! A short thread 🧵🙂

Jason Yosinski@jasonyo·13 May

@NotACyborgYet @AttarZiv @insightpartners @PraveenAkkiraju @JonahWaldman2 Congrats, Tom and team!

Tom Bishop@NotACyborgYet·12 May

Exciting news: we raised a $20M Series A at Glass Imaging to massively improve image quality using AI. @AttarZiv and I are excited to have @insightpartners @PraveenAkkiraju and @JonahWaldman2 join us for the next phase of our growth!

GLASS Imaging@GlassImaging

Proud to see the incredible progress at @GlassImaging – $20M in new funding, led by @insightpartners, and the addition of @PraveenAkkiraju and @JonahWaldman to our board. Here’s to pushing the limits of what cameras can do!

Jason Yosinski retweeted

ML Collective@ml_collective·3 May

Tomorrow at 10am PT we'll have our next MLC OpenClubHouse, our 25th 🎉! Stop by to hang out, catch up with friends, and chat about ML or anything else. We'll meet in the MLC Discord #openclubhouse channel: discord.gg/6Za9MBr4?event…
