Jainit Purohit

687 posts

@mjainit

CTO @ Terrabase, AI agents, eval harnesses, decision infrastructure Meditation | Dhamma | phenomenology

Joined January 2010

1.3K Following · 294 Followers

Pinned Tweet

Jainit Purohit@mjainit·30 Dec

May you find release with boundless freedom, With formless expansion stretching in all directions, Boundless, immeasurable, limitless, with goodwill in its core, and filled with joy, reaching far, evermore.

Jainit Purohit@mjainit·17 Apr

@thsottiaux Codex has been great, thanks! One issue though on Mac: in long threads, scrolling up sometimes jumps to random positions much higher in the thread. It breaks context and you have to scroll back down again. Pretty frustrating, would be great to see this fixed.

Tibo@thsottiaux·16 Apr

Codex Compute efficient ✅ Always up, never down ✅ Best at hardcore engineering ✅ Crazy good app, first to escape the terminal ✅

Jainit Purohit retweeted

Harrison Chase@hwchase17·11 Apr

x.com/i/article/2042…

Jainit Purohit retweeted

Soumya Jain@wild_and_empty·2 Apr

This is exactly why trace data shouldn’t just sit in an observability bucket. One layer below this is -- reviewed traces can also teach the agent how to work better, not just tell us whether the final answer was good. You can look at strong runs, review the workflow itself, tool calls, handoffs, context gathering, etc and turn that into a retrievable execution layer. So not just a data asset for analysis, but a learning layer for better runtime behavior.

Jainit Purohit@mjainit·22 Mar

@manthanguptaa same fear cycle every year. first "software engineering is solved", now "agents running companies". wonder how many actually survive production and real users. most teams still fighting evals and edge cases. every line shipped is future maintenance debt.

Manthan Gupta@manthanguptaa·22 Mar

AI Twitter lately feels like pure fear mongering. If you are not spinning up 20 agents in parallel or pushing 200 commits a day, you are “falling behind.” If your agents aren't earning you a million then you will be poor. Most of it feels a lot like noise right now

Jainit Purohit@mjainit·21 Mar

@pmarca You’re conflating rumination with introspection. Rumination reinforces negative pathways. Introspection enables metacognition and error correction. No cognitive tool is inherently good or bad. Outcomes depend on whether it produces emotional loops or better models of reality.

Marc Andreessen 🇺🇸@pmarca·21 Mar

My big conclusion from this week: Introspection causes emotional disorders.

Jainit Purohit@mjainit·21 Mar

@elonmusk You’re conflating rumination with introspection. Rumination reinforces negative pathways. Introspection enables metacognition and error correction. No cognitive tool is inherently good or bad. Outcomes depend on whether it produces emotional loops or better models of reality.

Elon Musk@elonmusk·21 Mar

Reinforcing negative neural pathways via therapy or introspection is a recipe for misery. Don’t cut a rut in the road.

Marc Andreessen 🇺🇸@pmarca

My big conclusion from this week: Introspection causes emotional disorders.

Jainit Purohit@mjainit·18 Mar

yep, this is basically the exact workflow I use right now. I intended to automate the entire self-improvement loop but apart from obvious overfitting issues, you start seeing unknown behaviors and unnecessary layers of abstraction creeping in. The last remaining stabilizing step I had to add was a human-in-the-loop.

very interesting you mentioned human-in-the-loop "today" because it probably won’t be required in the future.

part of why karpathy’s autoresearch works well is because it is optimizing a single clean objective: validation bits per byte. whereas an agent harness is much messier. you end up having to measure and tune multiple things at once like trajectory correctness, tool call success rate, end outcome quality, handoff efficiency, context groundedness, and a bunch of other interacting metrics.

Viv@Vtrivedy10·17 Mar

exciting avenues where evals/specs become the base language to build agents:
- start with a base harness, pretty barebones
- specify a goal to your agent. build up exactly what you mean with the agent
- map your crafted goal to specs/evals with the agent. Together you think really hard about “what do I want the agent behavior to be”
- agent loops and adjusts the harness until a threshold of evals pass
- human in the loop today for cheating/overfitting

Evals are a great language to specify behavior

Every row in your Eval dataset is a little vector that shifts the agent definition towards behavior to make that Eval pass
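
The "agent loops and adjusts the harness until a threshold of evals pass" step in the post above could be sketched roughly as follows. This is only an illustration under stated assumptions, not anything from LangSmith or deepagents: the harness is modeled as a plain dict of settings, and `EvalCase`, `pass_rate`, `tune_until_threshold`, and `propose` are hypothetical names invented here. The loop keeps a proposed harness only when it does not lower the eval pass rate, and it does nothing about cheating or overfitting, which is the part the post leaves to a human reviewer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical: a harness is modeled as a flat bag of tunable integer settings.
Harness = Dict[str, int]


@dataclass
class EvalCase:
    """One row of an eval dataset: a named check on harness behavior."""
    name: str
    check: Callable[[Harness], bool]  # True when the desired behavior is observed


def pass_rate(harness: Harness, evals: List[EvalCase]) -> float:
    """Fraction of eval cases that pass for the given harness."""
    if not evals:
        return 0.0
    return sum(1 for case in evals if case.check(harness)) / len(evals)


def tune_until_threshold(
    harness: Harness,
    evals: List[EvalCase],
    propose: Callable[[Harness], Harness],
    threshold: float = 0.8,
    max_rounds: int = 10,
) -> Tuple[Harness, float]:
    """Adjust the harness until the pass rate reaches the threshold.

    `propose` stands in for the agent suggesting a harness tweak.
    A candidate is accepted when it does not regress the pass rate.
    """
    current = dict(harness)
    best = pass_rate(current, evals)
    for _ in range(max_rounds):
        if best >= threshold:
            break
        candidate = propose(current)
        score = pass_rate(candidate, evals)
        if score >= best:  # accept ties so the search can cross plateaus
            current, best = candidate, score
    return current, best


# Toy usage: the "agent" just raises max_steps by one each round.
evals = [
    EvalCase("can finish short task", lambda h: h["max_steps"] >= 3),
    EvalCase("can finish long task", lambda h: h["max_steps"] >= 5),
    EvalCase("retries failed tool calls", lambda h: h["retries"] >= 1),
]
tuned, score = tune_until_threshold(
    {"max_steps": 1, "retries": 1},
    evals,
    propose=lambda h: {**h, "max_steps": h["max_steps"] + 1},
    threshold=1.0,
)
print(tuned, score)
```

In a real setup the checks would run the agent and grade its traces rather than inspect a config value, and a single pass rate would likely be replaced by several interacting metrics.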

Jainit Purohit retweeted

Hamel Husain@HamelHusain·9 Dec

Made this video to explain evals

Jainit Purohit@mjainit·12 Mar

agents are commodity
edge = context quality × harness design × eval loops
compounding comes from faster: trace → failure → tweak → re-run cycles

Jainit Purohit@mjainit·12 Mar

@Vtrivedy10 are you guys thinking about trace —> eval —> harness improvement loops in langsmith? feels like missing infra for fast harness iteration

Jainit Purohit@mjainit·12 Mar

@Vtrivedy10 agreed. big unlock now is trace-driven iteration
not just measuring runs, but learning from traces what and how to tweak in harness to hill climb fast
building with deepagents around this. measure, eval, tweak, repeat for long-horizon data tasks

Jainit Purohit@mjainit·11 Mar

another solid piece by @vtrivedy10
been a fan since he coined haas
agent performance is now as much a harness problem as a model problem
same model, different harness, wildly different outcomes
we’re still early. a lot of the alpha is still here

Viv@Vtrivedy10

x.com/i/article/2031…

Jainit Purohit retweeted

Alex Mordvintsev@zzznah·4 Mar

Every time you read or listen to something, you're running untrusted code on your wetware with no sandbox. Reading is code execution. Text is the oldest exploit. Choose your inputs carefully. Including this post.

Jainit Purohit retweeted

Viv@Vtrivedy10·17 Feb

x.com/i/article/2022…

Jainit Purohit retweeted

ThePrimeagen@ThePrimeagen·15 Feb

Before AI, when you made bad decisions you could feel how bad they were and often could correct course after a couple weeks. Slowing down often helped make decisions I am happy about for months or years. Now our brick laying meme is the norm

Jainit Purohit retweeted

ThePrimeagen@ThePrimeagen·15 Feb

I hate these "coding isn't the hard part" tweets I have been a part of and seen several companies not just struggling with "the right decision" but the culmination of their past technical decisions. AI won't magically make this go away. Lines of Code is still a liability and producing it faster doesn't change or reduce it, if anything it increases liability. Room temperature Twitter take strikes yet again

Jainit Purohit@mjainit·4 Feb

LangChain is moving so fast that upgrading to the latest version and aligning my harness with new features has become a routine every few days!!! 🚀

LangChain JS@LangChain_JS

🚀 deepagents @1.6.2 is out!
• Skills and memory now properly restore from StateBackend checkpoints
• Fixed infinite loop when agents read large files
• Removed unnecessary REMOVE_ALL_MESSAGES operations in PatchToolCallsMiddleware (fewer message mutations during tool call handling)
Upgrade: npm i @langchain/deepagents@latest
github.com/langchain-ai/d…

Jainit Purohit retweeted

Harrison Chase@hwchase17·28 Jan

🧵 Context Management for DeepAgents We wrote an in depth blog on how we do context management in DeepAgents, our open source agent harness

Mason Daugherty@masondrxy

x.com/i/article/2015…

Jainit Purohit retweeted

LangChain@LangChain·14 Jan

📊 New blog: Choosing the right multi-agent architecture
Start with a single agent. But when you need multi-agent capabilities, pick the right pattern:
👥 Subagents - Centralized orchestration for multiple domains
💡 Skills - Progressive disclosure, load capabilities on-demand
🔄 Handoffs - Sequential workflows with state transitions
🧭 Router - Parallel dispatch across specialized agents
Includes performance benchmarks, decision framework, and code examples.
📖 Read the full guide: blog.langchain.com/choosing-the-r…

Jainit Purohit retweeted

Soumya Jain@wild_and_empty·9 Jan

Goal for this lifetime: Fierce in discipline, soft in heart, empty of self, precise in craft.
