Devansh Jain
116 posts

Devansh Jain
@devanshrjain
agent artist @letta_ai | @LTIatCMU @bitspilaniindia
San Francisco, CA · Joined July 2015
823 Following · 199 Followers

Everyone please welcome @xtinatong, one of the wonderful engineers at @Letta_AI
Devansh Jain retweeted

@swyx We built a cowork replica powered by our stateful agents: github.com/letta-ai/letta…

@adityaag @Letta_AI's stateful agents enable this since memory and context are managed server-side: github.com/letta-ai/letta…
Devansh Jain retweeted

Introducing Context Repositories: git-tracked files for storing agent context
Agents can now write scripts and spawn memory subagents to programmatically restructure prior context and learn in token-space.
letta.com/blog/context-r…

There seem to be two dominant strategies for agent memory:
1. Single compact memory file (CLAUDE.md, MEMORY.md, etc.): user + agent-owned and updated during runs, always loaded at session-start
2. Full session logs with semantic search: backend system where all memory files are chunked and embedded as vectors, then either dynamically RAGed or searched via an explicit CLI/skill
The problem is that #1 is fast, simple, and stable, but is either horribly incomplete or horribly bloated, whereas #2 can handle memory over larger datasets, but is slower, brittle, less secure, requires constant re-indexing, and the results can still be kinda shit without some very intelligent chunking strategies (which are often datatype-specific).
Both of these seem...wrong to me. Maybe this is fine for coding applications where a lot of the on-the-job capability is already stored in Claude's weights (syntax, schema, process etc.). In that case, maybe #1 + grep/glob is fine?
But as a business user, I feel like I have a LOT more context and memories that I need Claude to remember--content that it hasn't been RLed to death on and which is very costly to grep every time it needs to retrieve it (and it still needs to remember to retrieve it). And sometimes those memories are updating in weird ways (client A has XYZ needs in one week, but XY+AB the next; I don't want Z to show back up again or AB to be missed).
I also want to know what Claude knows. I don't want an opaque vector database; I want files that I can read that are structured in ways I can understand (ideally with links to documents).
This leads me to a growing belief that the management of memory itself is a two agent task: consciousness and subconsciousness. The conscious agent that actually takes action is responsible for writing down what's important, mostly to facilitate its own cross-compaction memory. The subconscious agent is responsible for synthesizing all of that afterwards (or perhaps even simultaneously?), and organizing memories into useful memory structures (Obsidian-style links anyone?). You can then search or RAG over those memory docs.
Obviously, this is super costly, doesn't even solve file retrieval (a memory of what something was does not necessarily tell you what/where it is now), and is probably "insufficiently bitter-lesson-pilled". But idk. There's a reason we have a subconscious, and a lesson I learned in consulting is that making sense of what matters in order to take an action is a very separate activity from organizing what matters for future reference. It is hard to build a plane while flying it, but harder still to build the training manual. The latter requires putting yourself in the mind of the uninitiated, to intuit how they might be confused--it almost requires clearing your mind of the very tacit knowledge and instinct that allows you to perform the former tasks. Different activity, different mindset, different agent.
alex fazio@alxfazio
> active recall

it's just updating the claude.md until it turns into a useless 6k line context rot

never trust your clankers to manage their own memory
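The consciousness/subconsciousness split from the thread above can be sketched in a few lines. This is a toy illustration, not Letta's implementation: the `record_note`/`synthesize` names, the `topic: fact` note format, and the deterministic "latest fact wins" rule are all placeholders for what would really be an LLM-driven synthesis pass.

```python
from pathlib import Path

def record_note(log_dir: Path, session: str, note: str) -> None:
    """Conscious agent: append a raw note to its own session log,
    mostly to survive its next context compaction."""
    log_dir.mkdir(parents=True, exist_ok=True)
    with open(log_dir / f"{session}.md", "a") as f:
        f.write(note.rstrip() + "\n")

def synthesize(log_dir: Path, memory_dir: Path) -> dict[str, str]:
    """Subconscious agent: reorganize raw session logs into per-topic
    memory files. Notes look like "client-a: needs X, Y"; a later note
    on the same topic replaces the earlier one, so stale facts don't
    resurface. A real system would use an LLM here instead of string
    matching."""
    latest: dict[str, str] = {}
    # Sorted by filename so later sessions overwrite earlier facts.
    for log in sorted(log_dir.glob("*.md")):
        for line in log.read_text().splitlines():
            if ":" in line:
                topic, fact = line.split(":", 1)
                latest[topic.strip()] = fact.strip()
    memory_dir.mkdir(parents=True, exist_ok=True)
    for topic, fact in latest.items():
        # Obsidian-style [[link]] so memory files cross-reference.
        (memory_dir / f"{topic}.md").write_text(f"[[{topic}]]: {fact}\n")
    return latest
```

In this sketch, a week-2 note for a client replaces the week-1 fact in the topic file, so the stale item never resurfaces, while the raw session logs remain as plain, readable files on disk.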
Devansh Jain retweeted

Introducing LettaBot 👾: A proactive personal assistant with perpetual memory -- living on your computer
- import skills from @openclaw hub & @vercel
- message from @telegram @signalapp @SlackHQ @WhatsApp
- built on (and with) Letta Code
github.com/letta-ai/letta…
Devansh Jain retweeted

Claude Skills might be the new MCP - but does it work outside of @AnthropicAI?
Find out with the "Skills Suite" in Context-Bench, our benchmark for Agentic Context Engineering
GPT-5 and GLM 4.6 excel at skill-use, but smaller models (e.g. GPT-5-mini) struggle
Devansh Jain retweeted

While Sonnet-4.5 remains a popular choice among developers, our benchmarks show it underperforms GPT-5 on SRE-related tasks when both are run with default parameters.
However, using the @notdiamond_ai prompt adaptation platform, Sonnet-4.5 achieved up to a 2x performance improvement in some test cases, effectively closing the gap between the models.
Learn more about our findings or run our benchmark with a single command using @GroqInc's OpenBench.
Devansh Jain retweeted

(Thu Oct 9, 11:00am–1:00pm) Poster Session 5
𝐏𝐨𝐬𝐭𝐞𝐫 #𝟏𝟑: PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages; w/ amazing @kpriyanshu256, @devanshrjain
PolyGuard is among the SOTA multilingual safety moderation tools, and we release a comprehensive multilingual evaluation suite and training data! (4/N)
Data & Model: huggingface.co/collections/To…

Liwei Jiang@liweijianglw
Although I can’t attend #COLM2025 in person this year, my 𝐀𝐁𝐒𝐎𝐋𝐔𝐓𝐄𝐋𝐘 𝐈𝐍𝐂𝐑𝐄𝐃𝐈𝐁𝐋𝐄 collaborators and co-organizers are running some exciting sessions. Be sure to check them out! (1/N)
Devansh Jain retweeted

Day 3 (Thu Oct 9), 11:00am–1:00pm, Poster Session 5
Poster #13: PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages — led by @kpriyanshu256, @devanshrjain
Poster #74: Fluid Language Model Benchmarking — led by @vjhofmann
Devansh Jain retweeted