Pinned Tweet
Jainit Purohit
687 posts

Jainit Purohit
@mjainit
CTO @ Terrabase | AI agents, eval harnesses, decision infrastructure | Meditation | Dhamma | phenomenology
Joined January 2010
1.3K Following · 294 Followers

@thsottiaux Codex has been great, thanks! One issue though on Mac: in long threads, scrolling up sometimes jumps to random positions much higher in the thread. It breaks context and you have to scroll back down again. Pretty frustrating, would be great to see this fixed.
Jainit Purohit reposted

This is exactly why trace data shouldn’t just sit in an observability bucket.
One layer below this: reviewed traces can also teach the agent how to work better, not just tell us whether the final answer was good. You can look at strong runs, review the workflow itself (tool calls, handoffs, context gathering, etc.) and turn that into a retrievable execution layer.
So not just a data asset for analysis, but a learning layer for better runtime behavior.
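A minimal sketch of what such a "retrievable execution layer" could look like (all names and the word-overlap similarity are illustrative assumptions, not any real library's API): reviewed traces of strong runs are indexed, then retrieved at runtime to seed a new run with a known-good workflow.

```python
from dataclasses import dataclass, field

@dataclass
class ReviewedTrace:
    task: str          # what the run accomplished
    steps: list[str]   # tool calls / handoffs, in order
    score: float       # reviewer-assigned quality score

@dataclass
class ExecutionLayer:
    traces: list[ReviewedTrace] = field(default_factory=list)

    def add(self, trace: ReviewedTrace) -> None:
        self.traces.append(trace)

    def retrieve(self, task: str) -> list[str]:
        """Return the step plan of the best trace for a new task,
        ranked by word overlap (a toy stand-in for embedding search),
        then by reviewer score."""
        words = set(task.lower().split())
        best = max(
            self.traces,
            key=lambda t: (len(words & set(t.task.lower().split())), t.score),
            default=None,
        )
        return best.steps if best else []

layer = ExecutionLayer()
layer.add(ReviewedTrace("summarize sales data",
                        ["load_csv", "aggregate", "write_summary"], 0.9))
layer.add(ReviewedTrace("answer support ticket",
                        ["search_kb", "draft_reply"], 0.8))

print(layer.retrieve("summarize quarterly sales data"))
# → ['load_csv', 'aggregate', 'write_summary']
```

In a real system the retrieval step would use embeddings over trace summaries, but the shape is the same: traces become a queryable asset that shapes runtime behavior.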

@manthanguptaa same fear cycle every year.
first "software engineering is solved", now "agents running companies".
wonder how many actually survive production and real users
most teams still fighting evals and edge cases. every line shipped is future maintenance debt.

@pmarca You’re conflating rumination with introspection.
Rumination reinforces negative pathways.
Introspection enables metacognition and error correction.
No cognitive tool is inherently good or bad. Outcomes depend on whether it produces emotional loops or better models of reality.

@elonmusk You’re conflating rumination with introspection.
Rumination reinforces negative pathways.
Introspection enables metacognition and error correction.
No cognitive tool is inherently good or bad. Outcomes depend on whether it produces emotional loops or better models of reality.

Reinforcing negative neural pathways via therapy or introspection is a recipe for misery. Don’t cut a rut in the road.
Marc Andreessen 🇺🇸@pmarca
My big conclusion from this week: Introspection causes emotional disorders.

yep, this is basically the exact workflow I use right now.
I tried to automate the entire self-improvement loop, but apart from obvious overfitting issues, you start seeing unknown behaviors and unnecessary layers of abstraction creeping in. The last remaining stabilizing step I had to add was a human-in-the-loop.
very interesting you mentioned human-in-the-loop "today" because it probably won’t be required in the future.
part of why karpathy’s autoresearch works well is because it is optimizing a single clean objective: validation bits per byte.
whereas an agent harness is much messier. you end up having to measure and tune multiple things at once like trajectory correctness, tool call success rate, end outcome quality, handoff efficiency, context groundedness, and a bunch of other interacting metrics.
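One way to picture that messiness is a composite score over the run: where autoresearch optimizes a single number, a harness has to weigh several interacting metrics at once. A toy sketch (metric names and equal weights are illustrative assumptions):

```python
def harness_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-run metrics, each expected in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[k] * metrics.get(k, 0.0) for k in weights) / total

run = {
    "trajectory_correctness": 0.8,
    "tool_call_success_rate": 0.95,
    "outcome_quality": 0.7,
    "handoff_efficiency": 0.6,
    "context_groundedness": 0.9,
}
weights = {k: 1.0 for k in run}  # equal weights to start; tuning these is itself part of the problem

print(round(harness_score(run, weights), 3))  # → 0.79
```

The catch the tweet points at: any fixed weighting is a modeling choice, and the metrics interact, so a single scalar objective is never as clean as validation bits per byte.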

exciting avenues where evals/specs become the base language to build agents:
- start with a base harness, pretty barebones
- specify a goal to your agent. build up exactly what you mean with the agent
- map your crafted goal to specs/evals with the agent. Together you think really hard about “what do I want the agent behavior to be”
- agent loops and adjusts the harness until a threshold of evals pass
- human in the loop today for cheating/overfitting
Evals are a great language to specify behavior
Every row in your eval dataset is a little vector that shifts the agent definition toward the behavior that makes that eval pass
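The loop in the bullets above can be sketched like this (every function here is a hypothetical placeholder, not real tooling): the agent keeps tweaking the harness until a pass-rate threshold is met, with a human gate against cheating and overfitting.

```python
def run_evals(harness: dict) -> float:
    # Placeholder: pretend pass rate improves as tweaks accumulate.
    return min(1.0, 0.4 + 0.1 * harness["tweaks"])

def propose_tweak(harness: dict) -> dict:
    # Placeholder for the agent adjusting prompts, tools, or structure.
    return {**harness, "tweaks": harness["tweaks"] + 1}

def human_approves(candidate: dict) -> bool:
    return True  # today: a person checks for cheating/overfitting

def improve(threshold: float = 0.9, max_iters: int = 20) -> dict:
    harness = {"tweaks": 0}  # the barebones base harness
    for _ in range(max_iters):
        if run_evals(harness) >= threshold:
            break  # enough evals pass; stop iterating
        candidate = propose_tweak(harness)
        if human_approves(candidate):
            harness = candidate
    return harness

print(improve())  # → {'tweaks': 5}
```

The interesting part is that the eval set, not the harness code, is the durable spec: each row nudges what "passing" means.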
Jainit Purohit reposted

@Vtrivedy10 are you guys thinking about trace —> eval —> harness improvement loops in langsmith? feels like missing infra for fast harness iteration

@Vtrivedy10 agreed. big unlock now is trace-driven iteration
not just measuring runs, but learning from traces what and how to tweak in harness to hill climb fast
building with deepagents around this. measure, eval, tweak, repeat for long-horizon data tasks

another solid piece by @vtrivedy10
been a fan since he coined haas
agent performance is now as much a harness problem as a model problem
same model, different harness, wildly different outcomes
we’re still early. a lot of the alpha is still here
Viv@Vtrivedy10
Jainit Purohit reposted

I hate these "coding isn't the hard part" tweets
I have been part of, and seen, several companies struggling not just with "the right decision" but with the accumulation of their past technical decisions.
AI won't magically make this go away. Lines of code are still a liability, and producing them faster doesn't reduce that; if anything, it increases it.
Room temperature Twitter take strikes yet again

LangChain is moving so fast that upgrading to the latest version and aligning my harness with new features has become a routine every few days!!! 🚀
LangChain JS@LangChain_JS
🚀 deepagents@1.6.2 is out! • Skills and memory now properly restore from StateBackend checkpoints • Fixed infinite loop when agents read large files • Removed unnecessary REMOVE_ALL_MESSAGES operations in PatchToolCallsMiddleware — fewer message mutations during tool call handling Upgrade: npm i @langchain/deepagents@latest github.com/langchain-ai/d…
Jainit Purohit reposted

🧵 Context Management for DeepAgents
We wrote an in-depth blog on how we do context management in DeepAgents, our open-source agent harness
Mason Daugherty@masondrxy
Jainit Purohit reposted

📊 New blog: Choosing the right multi-agent architecture
Start with a single agent. But when you need multi-agent capabilities, pick the right pattern:
👥 Subagents - Centralized orchestration for multiple domains
💡 Skills - Progressive disclosure, load capabilities on-demand
🔄 Handoffs - Sequential workflows with state transitions
🧭 Router - Parallel dispatch across specialized agents
Includes performance benchmarks, decision framework, and code examples.
📖 Read the full guide: blog.langchain.com/choosing-the-r…
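To make the Router pattern from the list concrete, here's a toy sketch (this is not LangChain code; the agent functions and the keyword classifier are illustrative assumptions): a router classifies an incoming request and dispatches it to a specialized agent.

```python
from typing import Callable

def billing_agent(msg: str) -> str:
    return f"billing: handled '{msg}'"

def tech_agent(msg: str) -> str:
    return f"tech: handled '{msg}'"

# Registry of specialized agents the router can dispatch to.
AGENTS: dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "tech": tech_agent,
}

def route(msg: str) -> str:
    """Keyword routing as a toy stand-in for an LLM classifier."""
    topic = "billing" if "invoice" in msg.lower() else "tech"
    return AGENTS[topic](msg)

print(route("My invoice is wrong"))  # → billing: handled 'My invoice is wrong'
print(route("The app crashes"))      # → tech: handled 'The app crashes'
```

In a real router the classification step runs in parallel-friendly dispatch and the agents share only the state they need, which is what distinguishes it from sequential handoffs.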
Jainit Purohit reposted
