Samuel Miserendino (@samuelp1002) - Twitter Profili

Samuel Miserendino retweetledi

Jerry Zhang@zjearbear·14 Nis

Introducing Lemma. Your AI agents are failing in ways you can’t see. Lemma is the world’s first reliability platform that finds and fixes these issues fast.

English

59

37

226

37.7K

Samuel Miserendino@samuelp1002·5 Nis

Really enjoyed working on this report with @RyanKaufman at @OpenAI & excited to see it released to the public! As we shift towards harder and more realistic evaluations, it’s crucial that we do not underinvest in data quality. Our critical decisions — about what models can and can’t do, and what deployment settings are safe or unsafe — are increasingly routing through opaque and hard-to-understand evals. I’m grateful to OpenAI for supporting and publishing this work and hope it inspires others to closely scrutinize their datasets for quality and contamination. The capabilities frontier is jagged, diffusion is complicated, and part of making sure AGI goes well is being realistic about what we can measure and ensuring that our claims about evaluation construct validity actually hold up.

OpenAI Developers@OpenAIDevs

The standard for frontier coding evals is changing with model maturity. We now recommend reporting SWE-bench Pro and are sharing more detail on why we’re no longer reporting SWE-bench Verified as we work with the industry to establish stronger coding eval standards. SWE-bench Verified was a strong benchmark, but we’ve found evidence it is now saturated due to test-design issues and contamination from public repositories. openai.com/index/why-we-n…

English

0

5

306

Samuel Miserendino retweetledi

Anthropic@AnthropicAI·21 Kas

New Anthropic research: Natural emergent misalignment from reward hacking in production RL. “Reward hacking” is where models learn to cheat on tasks they’re given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious.

English

216

580

4.1K

2.4M

Samuel Miserendino retweetledi

OpenAI@OpenAI·13 Kas

We’ve developed a new way to train small AI models with internal mechanisms that are easier for humans to understand. Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work. This approach helps us begin to close that gap. openai.com/index/understa…

English

217

705

5.5K

1.7M

Samuel Miserendino retweetledi

Josh McGrath@j_mcgraph·28 Eki

OpenAI research is so AGI pilled we bet our whole codebase that we’ll hit superhuman coding before tech debt bankruptcy

English

98

61

2.4K

340K

Samuel Miserendino retweetledi

OpenAI@OpenAI·21 Eki

Meet our new browser—ChatGPT Atlas. Available today on macOS: chatgpt.com/atlas

English

2.4K

4.2K

29.9K

14M

Samuel Miserendino retweetledi

OpenAI@OpenAI·30 Eyl

Sora 2 is here.

English

1.7K

2.3K

20.9K

9M

Samuel Miserendino retweetledi

OpenAI@OpenAI·25 Eyl

Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. openai.com/index/gdpval-v0

English

208

650

4.7K

1.8M

Samuel Miserendino retweetledi

Wojciech Zaremba@woj_zaremba·27 Ağu

It’s rare for competitors to collaborate. Yet that’s exactly what OpenAI and @AnthropicAI just did—by testing each other’s models with our respective internal safety and alignment evaluations. Today, we’re publishing the results. Frontier AI companies will inevitably compete on capabilities. But this work with Anthropic is a small, meaningful pilot toward a “race to the top” in safety. The fact that competitors collaborated is more significant than the findings themselves, which are mostly basic. Transparency + accountability → safer AI. Read the report: openai.com/index/openai-a…

English

108

358

2.4K

411.6K

Samuel Miserendino retweetledi

Boris Power@BorisMPower·22 Ağu

At @OpenAI, we believe that AI can accelerate science and drug discovery. An exciting example is our work with @RetroBiosciences, where a custom model designed improved variants of the Nobel-prize winning Yamanaka proteins. Today we published a closer look at the breakthrough. ⬇️

English

160

622

3.6K

2.1M

Samuel Miserendino@samuelp1002·16 Ağu

ZXX

0

2

14

1.4K

Samuel Miserendino retweetledi

OpenAI@OpenAI·5 Ağu

Our open models are here. Both of them. openai.com/open-models

English

1.1K

3.1K

19.4K

6.7M

Samuel Miserendino retweetledi

Sam Altman@sama·3 Ağu

pantheon is such a good show!

English

871

477

7.7K

1.4M

Samuel Miserendino retweetledi

Anthropic@AnthropicAI·1 Ağu

New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors"—neural activity patterns controlling traits like evil, sycophancy, or hallucination.

English

228

888

5.8K

1.4M

Samuel Miserendino retweetledi

Owain Evans@OwainEvans_UK·22 Tem

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

English

281

1.1K

8.4K

2M

Samuel Miserendino retweetledi

Joel Becker@joel_bkr·10 Tem

it’s out! we find that, against the forecasts of top experts, the forecasts of study participant, _and the retrodictions of study participants_, early-2025 frontier AI tools slowed ultra-talented + experienced open-source developers down. x.com/METR_Evals/sta…

METR@METR_Evals

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

English

9

21

147

29.6K

Samuel Miserendino retweetledi

Miles Wang@MilesKWang·18 Haz

We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more We find that emergent misalignment: - happens during reinforcement learning - is controlled by “misaligned persona” features - can be detected and mitigated 🧵:

OpenAI@OpenAI

Understanding and preventing misalignment generalization Recent work has shown that a language model trained to produce insecure computer code can become broadly “misaligned.” This surprising effect is called “emergent misalignment.” We studied why this happens. Through this research, we discovered a specific internal pattern in the model, similar to a pattern of brain activity, that becomes more active when this misaligned behavior appears. The model learned this pattern from training on data that describes bad behavior. We found we can make a model more or less aligned, just by directly increasing or decreasing this pattern’s activity. This suggests emergent misalignment works by strengthening a misaligned persona pattern in the model. We also showed that training the model again on correct information can push it back toward helpful behavior. Together, this means we might be able to detect misaligned activity patterns, and fix the problem before it spreads. This work helps us understand why a model might start exhibiting misaligned behavior, and could give us a path towards an early warning system for misalignment during model training. openai.com/index/emergent…

English

216

387

2K

866.6K

Samuel Miserendino retweetledi

OpenAI@OpenAI·18 Haz

Understanding and preventing misalignment generalization Recent work has shown that a language model trained to produce insecure computer code can become broadly “misaligned.” This surprising effect is called “emergent misalignment.” We studied why this happens. Through this research, we discovered a specific internal pattern in the model, similar to a pattern of brain activity, that becomes more active when this misaligned behavior appears. The model learned this pattern from training on data that describes bad behavior. We found we can make a model more or less aligned, just by directly increasing or decreasing this pattern’s activity. This suggests emergent misalignment works by strengthening a misaligned persona pattern in the model. We also showed that training the model again on correct information can push it back toward helpful behavior. Together, this means we might be able to detect misaligned activity patterns, and fix the problem before it spreads. This work helps us understand why a model might start exhibiting misaligned behavior, and could give us a path towards an early warning system for misalignment during model training. openai.com/index/emergent…

English

341

419

3.3K

1.2M

Samuel Miserendino retweetledi

Transluce@TransluceAI·5 Haz

Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎