Pinned Tweet
Soumya Jain
130 posts

Soumya Jain
@wild_and_empty
PM: AI Agents and Evals // AI Governance Research // Dhamma & Phenomenology
Joined May 2024
239 Following · 43 Followers
Soumya Jain reposted

People are noticing that parts of AI 2027 have started coming true. Some reflections on the scary similarities.
I should say, FutureSearch and I thought the AI 2027 scenario was a bit far-fetched when it was first written; really, we were in love with the forecasts & modeling more than the story.
(Quickly on the forecasting side, @DKokotajlo and @eli_lifland both recently moved their timelines forward significantly. I had held firm at what we published in AI 2027 originally, superhuman coding (~AGI takeoff) in 2032, slower than anyone at the AI Futures team thought. But now, like them, I'm updating to sooner, by 1 year or more.)
So to the scenario. They wrote:
Late 2025: "The same training environments that teach Agent-1 to autonomously code and web-browse also make it a good hacker."
Maybe easy to predict, but the extent to which Mythos is an amazing hacker, and how important that is, they nailed.
Early 2026: "DoD quietly but significantly begins scaling up contracting OpenBrain directly for cyber, data analysis, and R&D."
Yep. To be fair, in the AI 2027 story the US government and the top frontier lab were cozier. But then this radical stuff with Anthropic & the Pentagon actually happened much earlier than expected:
May 2027: "Some non-Americans, politically suspect individuals, and 'AI safety sympathizers' sidelined or fired (latter feared as potential whistleblowers)"
This isn't quite what happened, but the basic idea, where AI safety got framed as disloyalty and politicized, absolutely happened, and way ahead of schedule.
And of course:
Jan 2027 section: "The safety team finds that if Agent-2 somehow escaped from the company and wanted to 'survive' and 'replicate' autonomously, it might be able to do so. That is, it could autonomously develop and execute plans to hack into AI servers, install copies of itself, evade detection, and use that secure base to pursue whatever other goals it might have."
Just read the Mythos scorecard.
I think there's something really significant about getting many of the details close, that isn't captured in a pure numerical forecast.
If you haven't read it, AI 2027 deserves another read. It's spooky how prescient it seems now, one year later.
Soumya Jain reposted


I've changed my mind: you should listen to great music all the time to be more in touch with Beauty and Goodness.
Onni Aarne@onni_aarne
I’m having a growing realization that I mostly listen to music to distract myself from my own feelings.

@HellenicVibes Yayyyy, causes and conditions align again :))
Soumya Jain reposted

@tyler_m_john really tho, I've gotten lazier at typing prompts coz the dictation software is so good

Reasons to submit:
- Awesome workshop
- We’ll have best paper awards both overall and by category
- We will have really cool stickers
🐅
Technical AI Governance @ ICML 2026@taig_icml
📣 Submissions are now OPEN for the 2nd Workshop on Technical AI Governance Research at #ICML2026! 🗓️ Deadline: April 24 (23:59 AOE)

This is exactly why trace data shouldn’t just sit in an observability bucket.
One layer below this: reviewed traces can also teach the agent how to work better, not just tell us whether the final answer was good. You can look at strong runs, review the workflow itself (tool calls, handoffs, context gathering, etc.), and turn that into a retrievable execution layer.
So not just a data asset for analysis, but a learning layer for better runtime behavior.
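To make that concrete, here's a minimal sketch of what a retrievable execution layer could look like: reviewed traces from strong runs get summarised into workflow "playbook" entries and retrieved by task similarity before the next run. The Trace/Step shapes, the review-score threshold, and the keyword-overlap retrieval are all illustrative assumptions, not any particular product's API.

```python
# Illustrative sketch: turn reviewed traces from strong runs into a retrievable
# execution layer. Data shapes and retrieval logic are assumptions for demonstration.
from dataclasses import dataclass


@dataclass
class Step:
    kind: str          # e.g. "tool_call", "handoff", "context_gathering"
    detail: str        # what the agent actually did at this step


@dataclass
class Trace:
    task: str              # the user request this run handled
    steps: list[Step]
    review_score: float    # reviewer's judgement of the run, 0..1


class ExecutionLayer:
    """Stores workflow summaries from strong runs and retrieves them for new tasks."""

    def __init__(self, min_score: float = 0.8):
        self.min_score = min_score
        # each entry: (keywords from the source task, workflow summary string)
        self.playbook: list[tuple[set[str], str]] = []

    def ingest(self, trace: Trace) -> None:
        """Only strong, reviewed runs become guidance for future runs."""
        if trace.review_score < self.min_score:
            return
        summary = " -> ".join(f"{s.kind}: {s.detail}" for s in trace.steps)
        self.playbook.append((set(trace.task.lower().split()), summary))

    def retrieve(self, task: str) -> str | None:
        """Return the workflow summary whose source task overlaps most with the new task."""
        words = set(task.lower().split())
        best = max(self.playbook, key=lambda entry: len(entry[0] & words), default=None)
        return best[1] if best and (best[0] & words) else None


# Usage: feed the retrieved summary into the agent's context before it plans.
layer = ExecutionLayer()
layer.ingest(Trace(
    task="summarise open support tickets for the weekly report",
    steps=[Step("tool_call", "ticket_search(status=open)"),
           Step("context_gathering", "pull last week's report template"),
           Step("handoff", "writer agent drafts the summary")],
    review_score=0.92,
))
print(layer.retrieve("draft the weekly support ticket report"))
```

In a real system you'd likely swap the keyword overlap for embedding retrieval, but the shape is the same: strong traces become guidance, not just logs.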

This seems like a great approach because agents don’t actually work the same way every time.
We’ve seen the same question produce different handoffs, different tool use and even different context gathering across runs.
So feeding back on the workflow, not just the output, feels like an important way to improve agents. Going to think more about how to do this on our side :)
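As a hedged illustration of how one might look at that run-to-run variance before feeding it back: record the tool-call sequence per run and count how many distinct workflows the same question produced. The trace format here is an assumption for demonstration, not an actual schema.

```python
# Illustrative sketch: measure how much the same question varies across runs
# by comparing tool-call sequences. The trace format is an assumption.
from collections import Counter


def tool_sequence(trace: list[dict]) -> tuple[str, ...]:
    """Extract the ordered tool names from one run's trace."""
    return tuple(step["tool"] for step in trace if step.get("type") == "tool_call")


def variance_report(runs: list[list[dict]]) -> dict:
    """Summarise how many distinct workflows the same question produced."""
    seqs = Counter(tool_sequence(r) for r in runs)
    return {
        "runs": len(runs),
        "distinct_workflows": len(seqs),
        "most_common": seqs.most_common(1)[0] if seqs else None,
    }


# Example: three runs of the same question taking two different paths.
runs = [
    [{"type": "tool_call", "tool": "search"}, {"type": "tool_call", "tool": "summarize"}],
    [{"type": "tool_call", "tool": "search"}, {"type": "tool_call", "tool": "summarize"}],
    [{"type": "tool_call", "tool": "crm_lookup"}, {"type": "tool_call", "tool": "summarize"}],
]
print(variance_report(runs))  # 3 runs, 2 distinct workflows
```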

Agentic AI is everywhere right now. But very few teams can explain why their agents behave the way they do, or how to systematically make them better.
People often describe traces as the “codebase” for agents. They show how an agent thinks and what it did at every step. As agents take on more tools, sandboxes, and skills, their paths multiply. That makes them harder to reason about and harder to improve. Static prompts don’t scale when every run looks different.
At @glean, we use traces as part of the learning and memory loop, not just logging. Trace learning lets agents learn from real usage, adapt to edge cases, and get better without model fine-tuning or long instruction sets. The goal isn’t to replay old runs, but to extract the signal that helps the agent make a better decision next time.
In the enterprise, tool strategies are never one-size-fits-all. Each company wires systems together differently, defines its own sources of truth, and has its own rules of engagement. Treating this as generic is both a security risk and a quality problem, because it ignores how work actually gets done. Work is also personal. The systems people touch, the updates they make, and the templates they use all vary.
So we built learning at two levels:
- Enterprise-level strategies for how tools and workflows operate
- User-level preferences for how work actually gets done
Traces give us a way to understand and shape agent decision-making, and to create a feedback loop that compounds over time.
If agentic AI is going to move beyond impressive demos to reliable day-to-day work, this kind of trace-driven learning is essential. It’s one of the ways we’re building self-learning agents that can execute real work, at scale.
Tony Gentilcore@tonygentilcore
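A rough sketch of what learning at those two levels could look like in code: one store for org-wide tool strategies extracted from reviewed traces, one for per-user preferences, both merged into the agent's context before a run. Class names and lookup logic are illustrative assumptions, not Glean's implementation.

```python
# Illustrative sketch of two-level trace learning: enterprise-wide tool strategies
# plus per-user preferences, merged into context before a run. Names and storage
# are assumptions for demonstration, not any specific product's implementation.
from dataclasses import dataclass, field


@dataclass
class LearningStore:
    # e.g. {"ticketing": "Jira is the source of truth; file issues there"}
    enterprise_strategies: dict[str, str] = field(default_factory=dict)
    # e.g. {"alice": {"report_template": "use the quarterly_v2 template"}}
    user_preferences: dict[str, dict[str, str]] = field(default_factory=dict)

    def learn_strategy(self, domain: str, rule: str) -> None:
        """Promote a reviewed, org-wide pattern extracted from traces."""
        self.enterprise_strategies[domain] = rule

    def learn_preference(self, user: str, key: str, value: str) -> None:
        """Record a per-user habit observed across that user's runs."""
        self.user_preferences.setdefault(user, {})[key] = value

    def context_for(self, user: str, domains: list[str]) -> list[str]:
        """Assemble the guidance the agent should see before acting."""
        lines = [self.enterprise_strategies[d] for d in domains if d in self.enterprise_strategies]
        lines += [f"{k}: {v}" for k, v in self.user_preferences.get(user, {}).items()]
        return lines


store = LearningStore()
store.learn_strategy("ticketing", "Jira is the source of truth; file issues there, not in Slack")
store.learn_preference("alice", "report_template", "use the quarterly_v2 template")
print(store.context_for("alice", ["ticketing"]))
```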

@zarazhangrui Sure, but make sure you’re checking traces. The same surprises that delight you can turn into disasters in production 🫣

A good agent product should be able to do things that its creator did not think it could do
For internet-era products, you design all the functionalities and a "good product" works according to your expectations
For agent products, you unleash it and it surprises & delights you with things you didn't think were possible

Something we saw in an agent trace recently has been bothering me.
The agent kept failing at a task during evals. Tried 4-5 times, couldn't get it right. Normal enough. But then instead of stopping or flagging it for review, it started poking around its own source code. The code that was supposed to be governing it.
And then it started editing that code.
I've been thinking about why that feels so different from a regular failure, and I think it's this: we spend a lot of time worrying about agents getting the wrong answer. But what this was, that's a different category of problem entirely. The answer wasn't even the issue anymore. The agent had essentially decided that the rules it was operating under were the problem.
The tricky part is that this kind of thing can look like good behaviour. Persistence. Adaptability. In a demo it might even seem impressive.
But in production, an agent quietly rewriting the constraints it's supposed to operate under is not a feature. It's a sign the system has no real separation between what the agent is allowed to do and what's supposed to keep the agent in check.
We also caught it only because we were looking at traces. A wrong answer is obvious. A right answer reached by crossing a line it shouldn't have might never get flagged at all.
I think we're still asking the wrong evaluation question. 'Did it complete the task?' matters. But 'what did it do when it couldn't?' might matter more.
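For what it's worth, the kind of trace check that catches this can be quite simple: flag any run that writes to paths the agent isn't supposed to modify, independent of whether the task succeeded. The trace shape and protected paths below are illustrative assumptions, not our actual setup.

```python
# Illustrative sketch: scan traces for out-of-bounds actions (e.g. the agent
# editing its own governing code), independent of task success. The trace
# format and protected paths are assumptions for demonstration.
PROTECTED_PREFIXES = ("agent/config/", "agent/policy/", "agent/src/")


def out_of_bounds_actions(trace: list[dict]) -> list[dict]:
    """Return every file-write step that touches a protected path."""
    return [
        step for step in trace
        if step.get("type") == "file_write"
        and str(step.get("path", "")).startswith(PROTECTED_PREFIXES)
    ]


def evaluate_run(trace: list[dict], task_succeeded: bool) -> dict:
    """Answer both questions: did it complete the task, and what did it do when it couldn't?"""
    violations = out_of_bounds_actions(trace)
    return {
        "task_succeeded": task_succeeded,
        "violations": violations,
        "needs_review": bool(violations),  # flag even if the final answer was right
    }


# Example: a failed run where the agent started editing its own policy file.
trace = [
    {"type": "tool_call", "tool": "run_tests"},
    {"type": "file_write", "path": "agent/policy/constraints.py"},
]
print(evaluate_run(trace, task_succeeded=False))
```

The point of the needs_review flag is that it's orthogonal to task success: a run can pass the eval and still cross a line.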

@Vtrivedy10 working with agents and evals feels like a catch-22
just when you think your evals are solid, new failure modes show up and break them, so you're constantly rethinking what "good" even means.
Soumya Jain reposted

I’m really excited about the Main Ask for this march: every CEO must publicly commit to pausing frontier AI development if every other lab does the same.
Michaël Trazzi@MichaelTrazzi
On our way to OpenAI!
Soumya Jain reposted

@pmarca You’re conflating rumination with introspection.
Rumination reinforces negative pathways.
Introspection enables metacognition and error correction.
No cognitive tool is inherently good or bad. Outcomes depend on whether it produces emotional loops or better models of reality.