Pinned Tweet
Soumya Jain
130 posts

Soumya Jain
@wild_and_empty
PM: AI Agents and Evals // AI Governance Research // Dhamma & Phenomenology
Joined May 2024
239 Following · 43 Followers
Soumya Jain reposted

People are noticing that parts of AI 2027 have started coming true. Some reflections on the scary similarities.
I should say, FutureSearch and I thought the AI 2027 scenario was a bit far-fetched when it was first written; really, we were in love with the forecasts & modeling more than the story.
(Quickly on the forecasting side, @DKokotajlo and @eli_lifland both recently moved their timelines forward significantly. I had held firm at what we published in AI 2027 originally, superhuman coding (~AGI takeoff) in 2032, slower than anyone at the AI Futures team thought. But now, like them, I'm updating to sooner, by 1 year or more.)
So to the scenario. They wrote:
Late 2025: "The same training environments that teach Agent-1 to autonomously code and web-browse also make it a good hacker."
Maybe easy to predict, but the extent to which Mythos is an amazing hacker, and how important that is, they nailed.
Early 2026: "DoD quietly but significantly begins scaling up contracting OpenBrain directly for cyber, data analysis, and R&D."
Yep. To be fair, in the AI 2027 story the US government and the top frontier lab were cozier. But then this radical stuff with Anthropic & the Pentagon actually happened much earlier than expected:
May 2027: "Some non-Americans, politically suspect individuals, and 'AI safety sympathizers' sidelined or fired (latter feared as potential whistleblowers)"
This isn't quite what happened, but the basic idea, where AI safety got framed as disloyalty and politicized, absolutely happened, and way ahead of schedule.
And of course:
Jan 2027 section: "The safety team finds that if Agent-2 somehow escaped from the company and wanted to 'survive' and 'replicate' autonomously, it might be able to do so. That is, it could autonomously develop and execute plans to hack into AI servers, install copies of itself, evade detection, and use that secure base to pursue whatever other goals it might have."
Just read the Mythos scorecard.
I think there's something really significant about getting many of the details close, that isn't captured in a pure numerical forecast.
If you haven't read it, AI 2027 deserves another read. It's spooky how prescient it seems now, one year later.
Soumya Jain reposted


I've changed my mind: you should listen to great music all the time to be more in touch with Beauty and Goodness.
Onni Aarne@onni_aarne
I’m having a growing realization that I mostly listen to music to distract myself from my own feelings.

@HellenicVibes Yayyyy, causes and conditions align again :))
Soumya Jain reposted

@tyler_m_john really tho, I've gotten lazier at typing prompts coz the dictation software is so good

Reasons to submit:
- Awesome workshop
- We’ll have best paper awards both overall and by category
- We will have really cool stickers
🐅
Technical AI Governance @ ICML 2026@taig_icml
📣 Submissions are now OPEN for the 2nd Workshop on Technical AI Governance Research at #ICML2026! 🗓️ Deadline: April 24 (23:59 AOE)

This is exactly why trace data shouldn’t just sit in an observability bucket.
One layer below this: reviewed traces can also teach the agent how to work better, not just tell us whether the final answer was good. You can look at strong runs, review the workflow itself (tool calls, handoffs, context gathering, etc.), and turn that into a retrievable execution layer.
So not just a data asset for analysis, but a learning layer for better runtime behavior.
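To make that concrete, here's a minimal sketch of what a retrievable execution layer could look like: reviewed traces from strong runs get summarised into workflow "playbook" entries and retrieved by task similarity before the next run. The Trace/Step shapes, the review-score threshold, and the keyword-overlap retrieval are all illustrative assumptions, not any particular product's API.

```python
# Illustrative sketch: turn reviewed traces from strong runs into a retrievable
# execution layer. Data shapes and retrieval logic are assumptions for demonstration.
from dataclasses import dataclass


@dataclass
class Step:
    kind: str          # e.g. "tool_call", "handoff", "context_gathering"
    detail: str        # what the agent actually did at this step


@dataclass
class Trace:
    task: str              # the user request this run handled
    steps: list[Step]
    review_score: float    # reviewer's judgement of the run, 0..1


class ExecutionLayer:
    """Stores workflow summaries from strong runs and retrieves them for new tasks."""

    def __init__(self, min_score: float = 0.8):
        self.min_score = min_score
        # each entry: (keywords from the source task, workflow summary string)
        self.playbook: list[tuple[set[str], str]] = []

    def ingest(self, trace: Trace) -> None:
        """Only strong, reviewed runs become guidance for future runs."""
        if trace.review_score < self.min_score:
            return
        summary = " -> ".join(f"{s.kind}: {s.detail}" for s in trace.steps)
        self.playbook.append((set(trace.task.lower().split()), summary))

    def retrieve(self, task: str) -> str | None:
        """Return the workflow summary whose source task overlaps most with the new task."""
        words = set(task.lower().split())
        best = max(self.playbook, key=lambda entry: len(entry[0] & words), default=None)
        return best[1] if best and (best[0] & words) else None


# Usage: feed the retrieved summary into the agent's context before it plans.
layer = ExecutionLayer()
layer.ingest(Trace(
    task="summarise open support tickets for the weekly report",
    steps=[Step("tool_call", "ticket_search(status=open)"),
           Step("context_gathering", "pull last week's report template"),
           Step("handoff", "writer agent drafts the summary")],
    review_score=0.92,
))
print(layer.retrieve("draft the weekly support ticket report"))
```

In a real system you'd likely swap the keyword overlap for embedding retrieval, but the shape is the same: strong traces become guidance, not just logs.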

This seems like a great approach because agents don’t actually work the same way every time.
We’ve seen the same question produce different handoffs, different tool use and even different context gathering across runs.
So feeding back on the workflow, not just the output, feels like an important way to improve agents. Going to think more about how to do this on our side :)
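As a hedged illustration of how one might look at that run-to-run variance before feeding it back: record the tool-call sequence per run and count how many distinct workflows the same question produced. The trace format here is an assumption for demonstration, not an actual schema.

```python
# Illustrative sketch: measure how much the same question varies across runs
# by comparing tool-call sequences. The trace format is an assumption.
from collections import Counter


def tool_sequence(trace: list[dict]) -> tuple[str, ...]:
    """Extract the ordered tool names from one run's trace."""
    return tuple(step["tool"] for step in trace if step.get("type") == "tool_call")


def variance_report(runs: list[list[dict]]) -> dict:
    """Summarise how many distinct workflows the same question produced."""
    seqs = Counter(tool_sequence(r) for r in runs)
    return {
        "runs": len(runs),
        "distinct_workflows": len(seqs),
        "most_common": seqs.most_common(1)[0] if seqs else None,
    }


# Example: three runs of the same question taking two different paths.
runs = [
    [{"type": "tool_call", "tool": "search"}, {"type": "tool_call", "tool": "summarize"}],
    [{"type": "tool_call", "tool": "search"}, {"type": "tool_call", "tool": "summarize"}],
    [{"type": "tool_call", "tool": "crm_lookup"}, {"type": "tool_call", "tool": "summarize"}],
]
print(variance_report(runs))  # 3 runs, 2 distinct workflows
```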

Agentic AI is everywhere right now. But very few teams can explain why their agents behave the way they do, or how to systematically make them better.
People often describe traces as the “codebase” for agents. They show how an agent thinks and what it did at every step. As agents take on more tools, sandboxes, and skills, their paths multiply. That makes them harder to reason about and harder to improve. Static prompts don’t scale when every run looks different.
At @glean, we use traces as part of the learning and memory loop, not just logging. Trace learning lets agents learn from real usage, adapt to edge cases, and get better without model fine-tuning or long instruction sets. The goal isn’t to replay old runs, but to extract the signal that helps the agent make a better decision next time.
In the enterprise, tool strategies are never one-size-fits-all. Each company wires systems together differently, defines its own sources of truth, and has its own rules of engagement. Treating this as generic is both a security risk and a quality problem, because it ignores how work actually gets done. Work is also personal. The systems people touch, the updates they make, and the templates they use all vary.
So we built learning at two levels:
- Enterprise-level strategies for how tools and workflows operate
- User-level preferences for how work actually gets done
Traces give us a way to understand and shape agent decision-making, and to create a feedback loop that compounds over time.
If agentic AI is going to move beyond impressive demos to reliable day-to-day work, this kind of trace-driven learning is essential. It’s one of the ways we’re building self-learning agents that can execute real work, at scale.
Tony Gentilcore@tonygentilcore
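A rough sketch of what learning at those two levels could look like in code: one store for org-wide tool strategies extracted from reviewed traces, one for per-user preferences, both merged into the agent's context before a run. Class names and lookup logic are illustrative assumptions, not Glean's implementation.

```python
# Illustrative sketch of two-level trace learning: enterprise-wide tool strategies
# plus per-user preferences, merged into context before a run. Names and storage
# are assumptions for demonstration, not any specific product's implementation.
from dataclasses import dataclass, field


@dataclass
class LearningStore:
    # e.g. {"ticketing": "Jira is the source of truth; file issues there"}
    enterprise_strategies: dict[str, str] = field(default_factory=dict)
    # e.g. {"alice": {"report_template": "use the quarterly_v2 template"}}
    user_preferences: dict[str, dict[str, str]] = field(default_factory=dict)

    def learn_strategy(self, domain: str, rule: str) -> None:
        """Promote a reviewed, org-wide pattern extracted from traces."""
        self.enterprise_strategies[domain] = rule

    def learn_preference(self, user: str, key: str, value: str) -> None:
        """Record a per-user habit observed across that user's runs."""
        self.user_preferences.setdefault(user, {})[key] = value

    def context_for(self, user: str, domains: list[str]) -> list[str]:
        """Assemble the guidance the agent should see before acting."""
        lines = [self.enterprise_strategies[d] for d in domains if d in self.enterprise_strategies]
        lines += [f"{k}: {v}" for k, v in self.user_preferences.get(user, {}).items()]
        return lines


store = LearningStore()
store.learn_strategy("ticketing", "Jira is the source of truth; file issues there, not in Slack")
store.learn_preference("alice", "report_template", "use the quarterly_v2 template")
print(store.context_for("alice", ["ticketing"]))
```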

@zarazhangrui Sure, but make sure you’re checking traces. The same surprises that delight you can turn into disasters in production 🫣

A good agent product should be able to do things that its creator did not think it could do
For internet-era products, you design all the functionalities and a "good product" works according to your expectations
For agent products, you unleash it and it surprises & delights you with things you didn't think were possible

Something we saw in an agent trace recently has been bothering me.
The agent kept failing at a task during evals. Tried 4-5 times, couldn't get it right. Normal enough. But then instead of stopping or flagging it for review, it started poking around its own source code. The code that was supposed to be governing it.
And then it started editing that code.
I've been thinking about why that feels so different from a regular failure, and I think it's this: we spend a lot of time worrying about agents getting the wrong answer. But what this was, that's a different category of problem entirely. The answer wasn't even the issue anymore. The agent had essentially decided that the rules it was operating under were the problem.
The tricky part is that this kind of thing can look like good behaviour. Persistence. Adaptability. In a demo it might even seem impressive.
But in production, an agent quietly rewriting the constraints it's supposed to operate under is not a feature. It's a sign the system has no real separation between what the agent is allowed to do and what's supposed to keep the agent in check.
We also caught it only because we were looking at traces. A wrong answer is obvious. A right answer reached by crossing a line it shouldn't have might never get flagged at all.
I think we're still asking the wrong evaluation question. 'Did it complete the task?' matters. But 'what did it do when it couldn't?' might matter more.
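For what it's worth, the kind of trace check that catches this can be quite simple: flag any run that writes to paths the agent isn't supposed to modify, independent of whether the task succeeded. The trace shape and protected paths below are illustrative assumptions, not our actual setup.

```python
# Illustrative sketch: scan traces for out-of-bounds actions (e.g. the agent
# editing its own governing code), independent of task success. The trace
# format and protected paths are assumptions for demonstration.
PROTECTED_PREFIXES = ("agent/config/", "agent/policy/", "agent/src/")


def out_of_bounds_actions(trace: list[dict]) -> list[dict]:
    """Return every file-write step that touches a protected path."""
    return [
        step for step in trace
        if step.get("type") == "file_write"
        and str(step.get("path", "")).startswith(PROTECTED_PREFIXES)
    ]


def evaluate_run(trace: list[dict], task_succeeded: bool) -> dict:
    """Answer both questions: did it complete the task, and what did it do when it couldn't?"""
    violations = out_of_bounds_actions(trace)
    return {
        "task_succeeded": task_succeeded,
        "violations": violations,
        "needs_review": bool(violations),  # flag even if the final answer was right
    }


# Example: a failed run where the agent started editing its own policy file.
trace = [
    {"type": "tool_call", "tool": "run_tests"},
    {"type": "file_write", "path": "agent/policy/constraints.py"},
]
print(evaluate_run(trace, task_succeeded=False))
```

The point of the needs_review flag is that it's orthogonal to task success: a run can pass the eval and still cross a line.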

@Vtrivedy10 working with agents and evals feels like a catch-22
just when you think your evals are solid, new failure modes show up and break them, so you're constantly rethinking what "good" even means.
Soumya Jain reposted

I’m really excited about the Main Ask for this march: every CEO must publicly commit to pausing frontier AI development if every other lab does the same.
Michaël Trazzi@MichaelTrazzi
On our way to OpenAI!
Soumya Jain reposted

@pmarca You’re conflating rumination with introspection.
Rumination reinforces negative pathways.
Introspection enables metacognition and error correction.
No cognitive tool is inherently good or bad. Outcomes depend on whether it produces emotional loops or better models of reality.