Riya


had a blast on the pod (first one, kinda nervous 😅), big shoutout to @himanshustwts. just felt like us riffing on agent engineering, open source & research ❤️

some fun highlights:
- working backwards from the model’s capabilities/flaws and building systems (a harness) around them to accomplish tasks
- Traces! they’re our signal for continual learning and self-improving agents. they capture agent errors + inefficiencies. pointing compute at understanding traces helps us update our agents and generate evals and training environments so the model doesn’t make these mistakes in the future. this becomes a moat for many teams
- it’s ok to engineer a good harness with opinionated built-in skills, prompts, and execution patterns like task decomposition. if you just want to solve your task with the intelligence we have today, go do that and adjust as new models are released
- open harnesses and open models are an on-ramp for teams to own their intelligence. traces are an improvement signal, and we need to use compute to understand them at scale
- I think I need to update my priors on how quickly RLMs and computer use went from interesting to pretty usable, and more research looks like it’s coming. really happy to see it, and a note to self to retry the tools in these spaces more often!
- AI is super broad and we’re all figuring it out. doing stuff that interests you and telling ppl about it is a great way to grow yourself and meet other great builders

all of Himanshu’s pod episodes are awesome, everyone should check them out :)
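a rough sketch of the traces point, i.e. turning agent errors into eval cases. everything here is hypothetical and just for illustration: the `Trace`, `Step`, and `EvalCase` types and the `traces_to_evals` helper are made-up names, and a real pipeline would likely point a model at the traces instead of reading a simple `ok` flag.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One action the agent took (hypothetical shape)."""
    tool: str
    ok: bool          # assumed: the harness records whether the step succeeded
    note: str = ""    # assumed: free-text description of what went wrong


@dataclass
class Trace:
    """Everything the agent did while attempting one task."""
    task: str
    steps: list[Step] = field(default_factory=list)


@dataclass
class EvalCase:
    """A regression check derived from a past mistake."""
    task: str
    avoid: str


def traces_to_evals(traces: list[Trace]) -> list[EvalCase]:
    """Emit one eval case per failed step, so the same mistake can be tested for later."""
    evals = []
    for trace in traces:
        for step in trace.steps:
            if not step.ok:
                evals.append(EvalCase(task=trace.task, avoid=f"{step.tool}: {step.note}"))
    return evals


if __name__ == "__main__":
    traces = [
        Trace("book flight", [Step("search", True), Step("pay", False, "card declined, no retry")]),
        Trace("summarize doc", [Step("read", True)]),
    ]
    for case in traces_to_evals(traces):
        print(case)
```

the same failed steps could also seed training environments. the sketch only covers the eval side.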

Harness, Memory, Context Fragments, & the Bitter Lesson

this is a work in progress mental dump on interesting intersections between how we use and design a harness, implications for memory being accumulated over long timescales, and the search bitter lesson we can’t escape

this is v30+, HTML diagrams help me iteratively refine + chat to roughly “see” and alter the mental model

Harnesses & Context Fragments:
a very important job of the harness is to efficiently & correctly route data within its boundaries into the context window boundary for computation to happen

the context window is a precious artifact. Harnesses make decisions on how to populate, manage, edit, and organize it so agents can do work. Each loaded object can be thought of as a Context Fragment and represents an explicit decision by the user and harness designer about what a model needs to do work at any given time.

many ideas on externalizing objects + loading them into the context window were pioneered and are very well described by @a1zhang with RLMs

Experiential Memory:
we’re in the very early days of deploying agents, and agents produce massive amounts of data in every interaction they have. this is akin to humans doing things and remembering the things they did. however, agent memory has a massive advantage: it can be accumulated across all agents, which are easily forked and duplicated (unlike humans). @dwarkesh_sp does a good job talking about this massive benefit of artificial systems

memory can be treated as an externalized object. the harness is tasked with doing good contextualized retrieval, which means pulling in the right data from accumulated memories across all agent interactions

Search & The Bitter Lesson:
As we deploy agents in our world over timescales of years, there is going to be hyper-exponential growth in the amount of data produced by those agents. We should want to:
1. Own that data for ourselves. Open ecosystems are important here
2. Use that data

This means that we’ll have to search over, distill, and organize massive amounts of data. Our brain is exceptional at doing this: both contextually using prior experience and, with enough intentional practice, mostly committing the right stuff to memory. Our current infrastructure systems and algorithms will be put to the test and will often break as we get used to this new data regime

some open questions:
- how do we efficiently distill experiences (Traces) into higher level memory primitives that capture the important parts? How do we do this over ultra long time horizons?
- How much of the future is Search just-in-time vs Search that gets integrated into model weights?
- How do we make models much better at self-managing their context window? How do we reduce error rates in recursively allowing agents to operate over external objects?

i’ll be expanding on, altering, and adjusting these mental models, but these feel like an important subset to me for the future of designing agents practically
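a minimal sketch of the two harness jobs described above: contextualized retrieval from externalized memory, then deciding which Context Fragments make it into the window. the names (`ContextFragment`, `retrieve`, `build_context`), the word-overlap scoring, the word-count token estimate, and the priority numbers are all assumptions for illustration. this is not how RLMs or any particular harness does it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextFragment:
    """One object the harness may load into the context window (hypothetical type)."""
    name: str
    text: str
    priority: float  # assumed: higher means more important for the current task

    @property
    def cost(self) -> int:
        # Crude stand-in for a token count: whitespace-separated words.
        return len(self.text.split())


def retrieve(memories: list[ContextFragment], query: str, k: int = 2) -> list[ContextFragment]:
    """Return up to k memories sharing the most words with the query (toy retrieval)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(m.text.lower().split())), m) for m in memories]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep insertion order
    return [m for _, m in hits[:k]]


def build_context(fragments: list[ContextFragment], budget: int) -> list[ContextFragment]:
    """Greedily load the highest-priority fragments that still fit in the budget."""
    chosen: list[ContextFragment] = []
    used = 0
    for frag in sorted(fragments, key=lambda f: f.priority, reverse=True):
        if used + frag.cost <= budget:
            chosen.append(frag)
            used += frag.cost
    return chosen


if __name__ == "__main__":
    memories = [
        ContextFragment("trace-1", "deploy failed because the config file was missing", 0.6),
        ContextFragment("trace-2", "user prefers short answers", 0.2),
        ContextFragment("trace-3", "deploy succeeded after adding the config file", 0.5),
    ]
    task = ContextFragment("task", "fix the failing deploy", 1.0)
    recalled = retrieve(memories, "why did the deploy fail config")
    window = build_context([task] + recalled, budget=12)
    print([f.name for f in window])
```

greedy-by-priority is the simplest packing policy that makes the "explicit decision" visible. the open questions above (distillation, self-managed context) are exactly where this toy version stops.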








