Devansh Jain

116 posts

@devanshrjain

agent artist @letta_ai | @LTIatCMU @bitspilaniindia

San Francisco, CA · Joined July 2015
823 Following · 199 Followers
swyx @swyx
ok are there any open source Claude Cowork clones because I can no longer function without a cowork pls recommend or i will build
Aditya Agarwal @adityaag
Product Idea: There should be a shared "Claude Code" instance that can be passed around different people. It's dumb to have the software repo as an intermediate step.
Letta @Letta_AI
🏰 The memory palace: view your agent's context and memory evolution in Letta Code. Run `/memory` and enter "o" to open the palace locally in your browser.
Devansh Jain reposted
Letta @Letta_AI
Introducing Context Repositories: git-tracked files for storing agent context. Agents can now write scripts and spawn memory subagents to programmatically restructure prior context and learn in token-space. letta.com/blog/context-r…
corsaren @corsaren
There seem to be two dominant strategies for agent memory:

1. Single compact memory file (CLAUDE.md, MEMORY.md, etc.): user- and agent-owned, updated during runs, always loaded at session start.
2. Full session logs with semantic search: a backend system where all memory files are chunked, vectorized, and embedded, and either dynamically RAGed or searched as an explicit CLI/skill.

The problem is that #1 is fast, simple, and stable, but is either horribly incomplete or horribly bloated, whereas #2 can handle memory over larger datasets, but is slower, brittle, less secure, requires constant re-indexing, and the results can still be kinda shit without some very intelligent chunking strats (which are often datatype-specific).

Both of these seem... wrong to me. Maybe this is fine for coding applications, where a lot of the on-the-job capability is already stored in Claude's weights (syntax, schema, process, etc.). In that case, maybe #1 + grep/glob is fine?

But as a business user, I feel like I have a LOT more context and memories that I need Claude to remember--content that it hasn't been RLed to death on and which is very costly to grep every time it needs to retrieve it (and it still needs to remember to retrieve it). And sometimes those memories update in weird ways (client A has XYZ needs in one week, but XY+AB the next; I don't want Z to show back up again or AB to be missed).

I also want to know what Claude knows. I don't want an opaque vector database; I want files that I can read, structured in ways I can understand (ideally with links to documents).

This leads me to a growing belief that the management of memory itself is a two-agent task: consciousness and subconsciousness. The conscious agent that actually takes action is responsible for writing down what's important, mostly to facilitate its own cross-compaction memory. The subconscious agent is responsible for synthesizing all of that afterwards (or perhaps even simultaneously?), and for organizing memories into useful memory structures (Obsidian-style links, anyone?). You can then search or RAG over those memory docs.

Obviously, this is super costly, doesn't even solve file retrieval (a memory of what something was does not necessarily tell you what/where it is now), and is probably "insufficiently bitter-lesson-pilled". But idk. There's a reason we have a subconscious, and a lesson I learned in consulting is that making sense of what matters in order to take an action is a very separate activity from organizing what matters for future reference.

It is hard to build a plane while flying it, but harder still to build the training manual. The latter requires putting yourself in the mind of the uninitiated, to intuit how they might be confused--it almost requires clearing your mind of the very tacit knowledge and instinct that allows you to perform the former task. Different activity, different mindset, different agent.
alex fazio @alxfazio

> active recall
> it's just updating the claude.md until it turns into a useless 6k-line context rot

never trust your clankers to manage their own memory

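The conscious/subconscious split described in the thread above can be illustrated with a toy sketch. Everything here is hypothetical (no real agent framework or API): a "conscious" pass appends raw notes during a run, and a "subconscious" pass consolidates them afterwards into a compact, readable CLAUDE.md-style memory, keeping only the latest fact per topic so stale entries (client A's old "Z" need) don't resurface.

```python
# Toy sketch (all names hypothetical) of a two-agent memory split.

def conscious_note(run_log: list[str], note: str) -> None:
    """During the run: fast, append-only -- just write down what mattered."""
    run_log.append(note)

def subconscious_consolidate(run_log: list[str]) -> str:
    """After the run: restructure. The newest note on a topic supersedes
    older ones, so superseded needs don't show back up."""
    latest: dict[str, str] = {}  # dicts preserve insertion order
    for note in run_log:
        topic, _, fact = note.partition(": ")
        latest[topic] = fact     # overwriting keeps the topic's first-seen position
    return "\n".join(f"- {t}: {f}" for t, f in latest.items())

run_log: list[str] = []
conscious_note(run_log, "client A: needs X, Y, Z")
conscious_note(run_log, "deploy: use staging first")
conscious_note(run_log, "client A: needs X, Y, AB")  # supersedes the Z note
memory_md = subconscious_consolidate(run_log)
print(memory_md)
# - client A: needs X, Y, AB
# - deploy: use staging first
```

The point of the toy is the separation: the writing pass never pauses to organize, and the organizing pass never takes action -- two activities, two "agents". A real version would do the consolidation with a second LLM call rather than a dict.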
Devansh Jain reposted
Letta @Letta_AI
We're releasing Letta Code, a memory-first coding agent
- open source (Apache 2.0)
- model agnostic
- portable agent learning and memory
Devansh Jain reposted
Letta @Letta_AI
Today we're excited to release our technical report on "Skill Learning". Agents can use skills *learned* from past trajectories and feedback to improve performance on future tasks (as shown in our evaluation with Terminal Bench).
Devansh Jain reposted
Sarah Wooders @sarahwooders
Claude Skills might be the new MCP - but does it work outside of @AnthropicAI? Find out with the "Skills Suite" in Context-Bench, our benchmark for Agentic Context Engineering. GPT-5 and GLM 4.6 excel at skill use, but smaller models (e.g. GPT-5-mini) struggle.
Devansh Jain reposted
Letta @Letta_AI
Last week we launched Context-Bench, a new leaderboard that measures how good AI models are at Agentic Context Engineering. This week, we're expanding Context-Bench with a new addition: Context-Bench Skills.
Devansh Jain reposted
Letta @Letta_AI
What if we evaluated agents less like isolated code snippets, and more like humans - where behavior depends on the environment and lived experiences? 🧪 Introducing 𝗟𝗲𝘁𝘁𝗮 𝗘𝘃𝗮𝗹𝘀: a fully open source evaluation framework for stateful agents
Devansh Jain reposted
Kshitish Ghate @GhateKshitish
🚨New paper: Reward Models (RMs) are used to align LLMs, but can they be steered toward user-specific value/style preferences? With EVALUESTEER, we find even the best RMs we tested exhibit their own value/style biases, and are unable to align with a user >25% of the time. 🧵
Devansh Jain reposted
Rootly @rootlyhq
While Sonnet-4.5 remains a popular choice among developers, our benchmarks show it underperforms GPT-5 on SRE-related tasks when both are run with default parameters. However, using the @notdiamond_ai prompt adaptation platform, Sonnet-4.5 achieved up to a 2x performance improvement in some test cases, effectively closing the gap between the models. Learn more about our findings or run our benchmark with a single command using @GroqInc's OpenBench.
Devansh Jain reposted
Liwei Jiang @liweijianglw
(Thu Oct 9, 11:00am–1:00pm) Poster Session 5, Poster #13: PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages; w/ amazing @kpriyanshu256, @devanshrjain. PolyGuard is among the SOTA multilingual safety moderation tools + we release a comprehensive multilingual evaluation suite and training data! (4/N) Data & Model: huggingface.co/collections/To…
Liwei Jiang @liweijianglw

Although I can’t attend #COLM2025 in person this year, my 𝐀𝐁𝐒𝐎𝐋𝐔𝐓𝐄𝐋𝐘 𝐈𝐍𝐂𝐑𝐄𝐃𝐈𝐁𝐋𝐄 collaborators and co-organizers are running some exciting sessions. Be sure to check them out! (1/N)

Devansh Jain reposted
Maarten Sap (he/him) @MaartenSap
Day 3 (Thu Oct 9), 11:00am–1:00pm, Poster Session 5 Poster #13: PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages — led by @kpriyanshu256, @devanshrjain Poster #74: Fluid Language Model Benchmarking — led by @vjhofmann
Devansh Jain reposted
Andy Liu @uilydna
🚨New Paper: LLM developers aim to align models with values like helpfulness or harmlessness. But when these conflict, which values do models choose to support? We introduce ConflictScope, a fully-automated evaluation pipeline that reveals how models rank values under conflict.
Vipul Gupta @vipul_1011
Life update: completed my PhD. From never wanting to get a master's degree to here, it's been a memorable journey. Time flies; 4 years went by quite fast.