Devansh Jain
116 posts

Devansh Jain
@devanshrjain
agent artist @letta_ai | @LTIatCMU @bitspilaniindia
San Francisco, CA · Joined July 2015
823 Following · 199 Followers

Everyone please welcome @xtinatong, one of the wonderful engineers at @Letta_AI
Devansh Jain retweeted

@swyx We built a cowork replica powered by our stateful agents: github.com/letta-ai/letta…

@adityaag @Letta_AI's stateful agents enable this since memory and context are managed server-side: github.com/letta-ai/letta…
Devansh Jain retweeted

Introducing Context Repositories: git-tracked files for storing agent context
Agents can now write scripts and spawn memory subagents to programmatically restructure prior context and learn in token-space.
letta.com/blog/context-r…

There seem to be two dominant strategies for agent memory:
1. Single compact memory file (CLAUDE.md, MEMORY.md, etc.): user + agent-owned and updated during runs, always loaded at session-start
2. Full session logs with semantic search: backend system where all memory files are chunked and embedded as vectors, then either dynamically RAGed or searched via an explicit CLI/skill
The problem is that #1 is fast, simple, and stable, but is either horribly incomplete or horribly bloated, whereas #2 can handle memory over larger datasets, but is slower, brittle, less secure, requires constant re-indexing, and the results can still be kinda shit without some very intelligent chunking strategies (which are often datatype-specific).
Both of these seem...wrong to me. Maybe this is fine for coding applications where a lot of the on-the-job capability is already stored in Claude's weights (syntax, schema, process etc.). In that case, maybe #1 + grep/glob is fine?
But as a business user, I feel like I have a LOT more context and memories that I need Claude to remember--content that it hasn't been RLed to death on and which is very costly to grep every time it needs to retrieve it (and it still needs to remember to retrieve it). And sometimes those memories are updating in weird ways (client A has XYZ needs in one week, but XY+AB the next; I don't want Z to show back up again or AB to be missed).
I also want to know what Claude knows. I don't want an opaque vector database; I want files that I can read that are structured in ways I can understand (ideally with links to documents).
This leads me to a growing belief that the management of memory itself is a two agent task: consciousness and subconsciousness. The conscious agent that actually takes action is responsible for writing down what's important, mostly to facilitate its own cross-compaction memory. The subconscious agent is responsible for synthesizing all of that afterwards (or perhaps even simultaneously?), and organizing memories into useful memory structures (Obsidian-style links anyone?). You can then search or RAG over those memory docs.
Obviously, this is super costly, doesn't even solve file retrieval (a memory of what something was does not necessarily tell you what/where it is now), and is probably "insufficiently bitter-lesson-pilled". But idk. There's a reason we have a subconscious, and a lesson I learned in consulting is that making sense of what matters in order to take an action is a very separate activity from organizing what matters for future reference. It is hard to build a plane while flying it, but harder still to build the training manual. The latter requires putting yourself in the mind of the uninitiated, to intuit how they might be confused--it almost requires clearing your mind of the very tacit knowledge and instinct that allows you to perform the former tasks. Different activity, different mindset, different agent.
alex fazio@alxfazio
> active recall

it's just updating the claude.md until it turns into a useless 6k line context rot

never trust your clankers to manage their own memory
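The consciousness/subconsciousness split from the thread above can be sketched in a few lines. This is a toy illustration, not Letta's implementation: the `record_note`/`synthesize` names, the `topic: fact` note format, and the deterministic "latest fact wins" rule are all placeholders for what would really be an LLM-driven synthesis pass.

```python
from pathlib import Path

def record_note(log_dir: Path, session: str, note: str) -> None:
    """Conscious agent: append a raw note to its own session log,
    mostly to survive its next context compaction."""
    log_dir.mkdir(parents=True, exist_ok=True)
    with open(log_dir / f"{session}.md", "a") as f:
        f.write(note.rstrip() + "\n")

def synthesize(log_dir: Path, memory_dir: Path) -> dict[str, str]:
    """Subconscious agent: reorganize raw session logs into per-topic
    memory files. Notes look like "client-a: needs X, Y"; a later note
    on the same topic replaces the earlier one, so stale facts don't
    resurface. A real system would use an LLM here instead of string
    matching."""
    latest: dict[str, str] = {}
    # Sorted by filename so later sessions overwrite earlier facts.
    for log in sorted(log_dir.glob("*.md")):
        for line in log.read_text().splitlines():
            if ":" in line:
                topic, fact = line.split(":", 1)
                latest[topic.strip()] = fact.strip()
    memory_dir.mkdir(parents=True, exist_ok=True)
    for topic, fact in latest.items():
        # Obsidian-style [[link]] so memory files cross-reference.
        (memory_dir / f"{topic}.md").write_text(f"[[{topic}]]: {fact}\n")
    return latest
```

In this sketch, a week-2 note for a client replaces the week-1 fact in the topic file, so the stale item never resurfaces, while the raw session logs remain as plain, readable files on disk.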
Devansh Jain retweeted

Introducing LettaBot 👾: A proactive personal assistant with perpetual memory -- living on your computer
- import skills from @openclaw hub & @vercel
- message from @telegram @signalapp @SlackHQ @WhatsApp
- built on (and with) Letta Code
github.com/letta-ai/letta…
Devansh Jain retweeted

Claude Skills might be the new MCP - but does it work outside of @AnthropicAI?
Find out with the "Skills Suite" in Context-Bench, our benchmark for Agentic Context Engineering
GPT-5 and GLM 4.6 excel at skill-use, but smaller models (e.g. GPT-5-mini) struggle
Devansh Jain retweeted

While Sonnet-4.5 remains a popular choice among developers, our benchmarks show it underperforms GPT-5 on SRE-related tasks when both are run with default parameters.
However, using the @notdiamond_ai prompt adaptation platform, Sonnet-4.5 achieved up to a 2x performance improvement in some test cases, effectively closing the gap between the models.
Learn more about our findings or run our benchmark with a single command using @GroqInc's OpenBench.
Devansh Jain retweeted

(Thu Oct 9, 11:00am–1:00pm) Poster Session 5
𝐏𝐨𝐬𝐭𝐞𝐫 #𝟏𝟑: PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages; w/ amazing @kpriyanshu256, @devanshrjain
PolyGuard is among the SOTA multilingual safety moderation tools, and we release a comprehensive multilingual evaluation suite and training data! (4/N)
Data & Model: huggingface.co/collections/To…

Liwei Jiang@liweijianglw
Although I can’t attend #COLM2025 in person this year, my 𝐀𝐁𝐒𝐎𝐋𝐔𝐓𝐄𝐋𝐘 𝐈𝐍𝐂𝐑𝐄𝐃𝐈𝐁𝐋𝐄 collaborators and co-organizers are running some exciting sessions. Be sure to check them out! (1/N)
Devansh Jain retweeted

Day 3 (Thu Oct 9), 11:00am–1:00pm, Poster Session 5
Poster #13: PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages — led by @kpriyanshu256, @devanshrjain
Poster #74: Fluid Language Model Benchmarking — led by @vjhofmann
Devansh Jain retweeted