Pinned Tweet
Samuel Agbede
798 posts

Samuel Agbede
@AgbedeSamuelD
Developer Advocate @Redis | Learning Agentic AI in public | 🇳🇬🏴
Glasgow, United Kingdom. Joined October 2022
116 Following, 334 Followers

@peaceitimi @theFCshow_ Completely agree. Dale Carnegie has that quote “to be interesting, be interested”

If you’ve ever wondered how I’m able to ask thoughtful questions as a host on @theFCshow_, here’s your answer.
And this goes beyond interviews. It’s the same skill that makes someone a better communicator in everyday life.
What are some tips that have helped you become a better communicator? Share with me👇

@richmondalake Followed some of the posts Richmond! Was inspired by the consistency 🙌🏾💯

I was reading about vector retrieval the other week and came across cross-encoders. I found them genuinely interesting, and they explain why reranking matters so much.
I have been working with embeddings for a while. Embed your documents, store the vectors, compare them at query time with cosine similarity, send the closest results to the LLM. Most people building RAG applications today are doing exactly this. It is the default approach and it makes sense that it's the default. It is fast, it scales, and it works well enough most of the time.
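A minimal sketch of that default pipeline, assuming the documents have already been embedded. The three-dimensional vectors and the document ids below are made up for illustration; a real embedding model would return vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs closest to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy stand-ins for stored document embeddings (hypothetical values).
DOC_VECS = {
    "pasta_review": [0.9, 0.1, 0.3],
    "slow_service": [0.2, 0.9, 0.1],
    "parking_info": [0.0, 0.1, 0.9],
}

# Toy stand-in for the embedded query.
results = top_k([1.0, 0.0, 0.2], DOC_VECS, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Note that each document vector is fixed before any query exists, so the only query-time work is the similarity comparison.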
But there is something underneath it that is worth understanding.
When you embed a document, you get one vector back. One fixed point in vector space. That vector has to represent the document for every possible query that might ever come in, before it knows what anyone is going to ask.
A quick example is this: say we've got a restaurant review. "The pasta was incredible but the service was painfully slow." One query asks for good Italian food. Another asks for restaurants with bad service. Same sentence, completely different relevance depending on who is asking. The embedding already committed to a position before either query arrived.
What I found even more interesting is the negation problem. "I am having a good time" and "I am NOT having a good time" tend to score as highly similar in many bi-encoders. Almost the same tokens. Opposite meaning. The architecture does not notice.
The reason this happens is architectural. Embedding models are bi-encoders. The document is encoded independently of the query. They never see each other until the similarity function. That separation is what makes them fast enough to use at scale. It is also what makes them contextually blind.
Cross-encoders work differently. Instead of embedding a single string, they accept two inputs together, the query and the document, and return a single score for how relevant the document is to that query. Behind the scenes they use the same attention mechanism that LLMs use, so every query token can attend to every document token. Because of this, the meaning that gets encoded for a document actually changes based on the question being asked.
The downside is speed. Nothing can be precomputed: every query needs a full forward pass for each query-document pair, so you cannot run a cross-encoder over a large document store at query time.
The pattern that works is: use a bi-encoder to retrieve the top 20 candidates quickly, then use a cross-encoder to rerank them, then send the top 5 (for instance) to the LLM.
The bi-encoder gives you recall. The cross-encoder gives you precision. You need both.
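The two-stage pattern can be sketched as below. This is only an illustration of the control flow: `overlap_score` and `negation_aware_score` are hypothetical toy scorers standing in for a bi-encoder and a cross-encoder, not real models, and the function and parameter names are assumptions of this sketch.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]

def retrieve_then_rerank(
    query: str,
    docs: List[str],
    fast_score: Scorer,
    slow_score: Scorer,
    retrieve_k: int = 20,
    final_k: int = 5,
) -> List[Tuple[str, float]]:
    # Stage 1: the cheap scorer runs over every document (recall).
    candidates = sorted(docs, key=lambda d: fast_score(query, d), reverse=True)[:retrieve_k]
    # Stage 2: the expensive scorer runs only on the shortlist (precision).
    reranked = sorted(
        ((d, slow_score(query, d)) for d in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return reranked[:final_k]

def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a bi-encoder: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))

def negation_aware_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: sees both texts together and
    penalises a mismatch in negation."""
    query_negated = "not" in query.lower().split()
    doc_negated = "not" in doc.lower().split()
    base = overlap_score(query, doc)
    return base if query_negated == doc_negated else base - 10.0

DOCS = [
    "i am having a good time",
    "i am not having a good time",
    "the weather is cold today",
]

top = retrieve_then_rerank(
    "having a good time", DOCS, overlap_score, negation_aware_score,
    retrieve_k=2, final_k=1,
)
print(top)
```

In this toy run the first stage cannot tell the two "good time" sentences apart, and the second stage separates them because it scores the query and document together. With real models, the two scorer arguments are where a bi-encoder similarity and a cross-encoder pair score would plug in.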
If you want to try it, cross-encoder/ms-marco-MiniLM-L-6-v2 from sentence-transformers is a good starting point. Drop it between retrieval and generation.
I am still learning about this and would love to know if anyone has run into retrieval quality issues that turned out to be architecture problems rather than chunking or prompting problems.

Samuel Agbede retweeted

Announcing Redis Iris. The context layer agents have been missing. Redis is already the most used database for agent data
Today we’re making it easier than ever with Iris.
redis.io/iris/

@parzival1213 Interesting, thanks Lakshman. I'll check it out!

@AgbedeSamuelD This is exactly why the browser layer needs more than raw HTML. Accessibility state, DOM context, screenshots, action history, and review points each catch different failures. FSB is built around that mix for real Chrome sessions. github.com/LakshmanTurlap…

I was building a browser agent the other month when I noticed something.
The tool (Playwright MCP) my agent called when it wanted to navigate a website didn’t return HTML. It returned an accessibility tree.
This was interesting because I’d have assumed we should return HTML. It’s the source of truth for a webpage. But then I thought about what my agent actually needed to do. It didn’t need to render anything. It simply needed to read content and interact with elements. For that job, HTML is full of noise. Styling, layout, structure that only makes sense if you’re a browser.
An accessibility tree strips all of that away.
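To make the contrast concrete, here is a rough sketch that reduces a small HTML snippet to a flat list of (role, name) pairs. It is not how Playwright MCP or any browser builds its accessibility tree; the tag-to-role mapping and the `OutlineParser` class are assumptions made up for this illustration, and real accessibility trees are nested and far richer.

```python
from html.parser import HTMLParser

# Hypothetical mapping from a few HTML tags to accessibility-style roles.
ROLE_BY_TAG = {"a": "link", "button": "button", "h1": "heading", "input": "textbox"}

class OutlineParser(HTMLParser):
    """Collects (role, name) pairs and ignores styling and layout markup."""

    def __init__(self):
        super().__init__()
        self.nodes = []
        self._open_role = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        role = ROLE_BY_TAG.get(tag)
        if role is None:
            return  # divs, spans, and other layout tags carry no role here
        if tag == "input":
            # Void element: its name comes from an attribute, not inner text.
            attr_map = dict(attrs)
            name = attr_map.get("aria-label") or attr_map.get("placeholder") or ""
            self.nodes.append((role, name))
            return
        self._open_role = role
        self._text = []

    def handle_data(self, data):
        if self._open_role is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open_role is not None and ROLE_BY_TAG.get(tag) == self._open_role:
            name = " ".join("".join(self._text).split())
            self.nodes.append((self._open_role, name))
            self._open_role = None

PAGE = """
<div class="hero" style="padding: 40px; color: #333">
  <h1 class="title-xl">Book a table</h1>
  <input type="text" placeholder="Party size" class="form-control">
  <button class="btn btn-primary"><span>Reserve</span></button>
  <a href="/menu" class="nav-link">See the menu</a>
</div>
"""

parser = OutlineParser()
parser.feed(PAGE)
outline = parser.nodes
for role, name in outline:
    print(role, "-", name)
```

The classes, inline styles, and wrapper elements all disappear; what remains is what an agent can read or act on.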
That made me think about data formats differently. Every format carries assumptions about who’s on the receiving end.
A PDF was designed for print and human eyes. Columns, headers, layout instructions for a renderer. An API response was designed for a frontend, deeply nested, full of display metadata that a UI component needs and an agent doesn’t.
In a sense, every format encodes its intended receiver. The layout assumptions in a PDF are a fingerprint of the human eye. The nesting in an API response is a fingerprint of a UI component.
When we pipe these into an LLM without thinking, we’re not just sending information. We’re sending information wrapped in expectations built for someone else.
I wonder how often we do this blindly. Reaching for the “default” format rather than asking whether it actually serves the agent.
Going forward, these are the questions I ask before wiring any data source into an agent: whose world was this format designed for, and what does my agent need the format to look like?

What if memory extraction, management and retrieval were the function of one unified layer instead of several LLM tool-calls?
This was the main point of a recent talk I gave at @DevoxxUK.
When you build an agentic app that depends on memory, a common approach is to define a tool per source. For instance, a search_slack tool first. Then another one for Notion. Then Drive, then meeting transcripts etc. Each new source brings its own retrieval tool, and the agent's tool surface grows one source at a time.
Over time, the LLM starts moving from a reasoning engine to a router. Plus, each tool definition eats valuable context. The other day, I read an interesting paper summary about tool selection degrading with scale, especially when the tool descriptions are very similar. All of this points in one direction - an LLM shouldn't have to sift through many tools when it wants retrieval.
Here's the interesting idea. What if memory search and retrieval were the function of one unified layer instead? The LLM issues a single query. Behind the scenes, the layer handles retrieval across sources, deduplication, ranking, and provenance. With this, the source becomes metadata on a retrieved memory, not something the LLM uses for routing. This makes your LLM 𝘀𝗼𝘂𝗿𝗰𝗲-𝗮𝗴𝗻𝗼𝘀𝘁𝗶𝗰 𝗯𝘂𝘁 𝗻𝗼𝘁 𝘀𝗼𝘂𝗿𝗰𝗲-𝗯𝗹𝗶𝗻𝗱.
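A minimal sketch of the single-query idea, under stated assumptions: `Memory`, `MemoryLayer`, and the two adapter functions are hypothetical names invented here, the scores are hard-coded, and this is not the API of any particular memory product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Memory:
    text: str
    source: str   # provenance travels with the memory as metadata
    score: float

# Hypothetical adapter interface: one search function per source.
SourceSearch = Callable[[str], List[Memory]]

class MemoryLayer:
    """One search surface over many sources."""

    def __init__(self, sources: Dict[str, SourceSearch]):
        self._sources = sources

    def search(self, query: str, limit: int = 5) -> List[Memory]:
        best: Dict[str, Memory] = {}
        for search in self._sources.values():
            for memory in search(query):
                # Deduplicate on normalised text, keeping the higher score.
                key = " ".join(memory.text.lower().split())
                if key not in best or memory.score > best[key].score:
                    best[key] = memory
        ranked = sorted(best.values(), key=lambda m: m.score, reverse=True)
        return ranked[:limit]

def search_slack(query: str) -> List[Memory]:
    # Toy adapter with canned results.
    return [
        Memory("Launch moved to Friday", "slack", 0.82),
        Memory("Launch checklist has 12 items", "slack", 0.60),
    ]

def search_notion(query: str) -> List[Memory]:
    # Toy adapter with canned results, including a duplicate of a Slack memory.
    return [
        Memory("launch moved to friday", "notion", 0.91),
        Memory("Office plants need watering", "notion", 0.05),
    ]

layer = MemoryLayer({"slack": search_slack, "notion": search_notion})
hits = layer.search("when is the launch?", limit=2)
for hit in hits:
    print(hit.source, hit.score, hit.text)
```

The caller issues one query and never chooses a source, yet every result still says where it came from.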
And retrieval isn't the only problem to solve when working with memory. There are two other primary challenges: 𝗺𝗲𝗺𝗼𝗿𝘆 𝗲𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻 𝗱𝘂𝗿𝗶𝗻𝗴 𝘂𝘀𝗲𝗿 𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁𝗶𝗼𝗻 𝗮𝗻𝗱 𝗺𝗲𝗺𝗼𝗿𝘆 𝗹𝗶𝗳𝗲𝗰𝘆𝗰𝗹𝗲 𝗺𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁.
A unified layer can potentially handle all three challenges. Extraction happens asynchronously in the background. Lifecycle - data expiry, deduplication, consolidation - becomes a property of the layer rather than something the agent has to reason about. Retrieval becomes a single trusted surface.
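As a rough illustration of lifecycle living inside the layer, the sketch below covers only two of the pieces mentioned above, expiry and deduplication on write. `ExpiringMemoryStore` and `FakeClock` are hypothetical names for this example, and the injectable clock exists only so the behaviour can be shown without waiting.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

class ExpiringMemoryStore:
    """Stores memories with a time-to-live and collapses duplicates on write."""

    def __init__(self, default_ttl: float, clock: Callable[[], float] = time.monotonic):
        self._default_ttl = default_ttl
        self._clock = clock
        self._items: Dict[str, Tuple[str, float]] = {}  # key -> (text, expires_at)

    @staticmethod
    def _key(text: str) -> str:
        return " ".join(text.lower().split())

    def remember(self, text: str, ttl: Optional[float] = None) -> None:
        key = self._key(text)
        expires_at = self._clock() + (self._default_ttl if ttl is None else ttl)
        # A duplicate keeps the first wording and refreshes the expiry
        # instead of adding a second copy.
        original_text = self._items.get(key, (text, 0.0))[0]
        self._items[key] = (original_text, expires_at)

    def recall(self) -> List[str]:
        now = self._clock()
        # Expiry is enforced by the store, so callers never see stale memories.
        self._items = {k: v for k, v in self._items.items() if v[1] > now}
        return [text for text, _ in self._items.values()]

class FakeClock:
    """Manually advanced clock for the demonstration."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

clock = FakeClock()
store = ExpiringMemoryStore(default_ttl=60, clock=clock)
store.remember("User prefers dark mode")
store.remember("user prefers  dark mode")   # duplicate: refreshes, no second copy
store.remember("Meeting at 3pm", ttl=10)    # short-lived memory

clock.now = 30                              # 30 seconds later
remaining = store.recall()
print(remaining)
```

The agent only ever calls `remember` and `recall`; it does not reason about what has expired or what is a duplicate.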
There are still a lot of ideas to think through properly - permissions across sources, how aggressive consolidation should be, and when fresh source data should override stored memory. But I like this architectural direction :).
I also had the pleasure of redelivering "Anatomy of Memory in Humans & AI Agents" with my colleague Raphael De Lio (@RaphaelDeLio). Interesting to see how much overlap there is between how human memory is organised and what actually works for agents.
Had a great time at Devoxx. Looking forward to future talks. Massive thanks to the incredible Redis team at the booth! Big thanks to Ricardo Ferreira who spent hours preparing this talk and gave me great tips, which were super useful.
Photo credits: Ebuka Mordi and Dimitris Doutsiopoulos


I’ve done a bit of research on this recently and I agree, there’s not enough discourse on memory design.
I've been playing with the one-layer memory design pattern and I quite like it. It says it’s not the job of the LLM to figure out what tools to call to get what memory. The LLM should interact with just one layer which does all the figuring out: managing what gets extracted (based on your config/custom app), managing the lifecycle of memory (e.g. TTL) and handling retrieval (what gets retrieved and what doesn’t).
It makes for clearer domain boundaries, and debugging no longer means setting breakpoints across several tools.
A colleague at Redis worked on an implementation of this pattern called the Agent Memory Server (github.com/redis/agent-me…).
Happy to talk through it with you Angie if needed. Just lmk

The more I work with agents, the more I'm convinced that "just give it more context" can't be the whole answer.
I'm not seeing enough discourse about memory. More specifically, memory design... like what gets stored, what gets retrieved, what gets summarized, what triggers the agent to look things up again.
I'll be spending time with @oracledevelopers soon, getting hands-on with agentic memory patterns. Very excited to get into the weeds!
Samuel Agbede retweeted

We just published research on retrieval tradeoff that should get the attention of anyone building RAG or agentic systems. Read more here from @TechJournalist in @VentureBeat: venturebeat.com/data/rag-preci…

Recently, I have felt quite behind.
In the AI space, it seems that every week something new is happening: a new framework released, another new model tested, benchmark results for some memory tool I just heard of making the news. To make matters worse, it can be difficult to sift through the noise and hype.
What is real and what is not?
A few weeks ago, I asked some colleagues about where they go to keep their knowledge up to date. Where do they get high-signal, low-noise information?
Here are the top resources they mentioned. I’ve also added one I’ve found helpful (the second one):
1. news.smol.ai: this website gives a day-by-day breakdown of the most important releases and news, covering everything from new model releases to company news to new agents.
2. The AI Engineer YouTube Channel (youtube.com/@aiDotEngineer): They run the AI Engineer conferences, which I’ve been following virtually. Angie Jones (@techgirl1908) tells me they’re “refreshingly void of hype”!
3. Blog pages of the major LLM providers (Anthropic, OpenAI etc).
4. Simon Willison’s blog (simonwillison.net). Simon writes about everything from “detailed notes on changes between Opus 4.6 and 4.7” to notes on AI/LLM predictions. I've not read his blog thoroughly but from a skim, it looks interesting. Looking forward to digging deep into it.
It might not be surprising but I’ve noticed the most signal tends to come from builders - people who actually use the tool, or built the tool they’re talking about. They are people who can speak to where it breaks, the limitations it has etc.
As a result, I’ve decided to get more into “building” before speaking. This way, I can speak more from experience and grounding.
Anyway, keen to learn from you. How are you staying “current” in this space?

lol turns out there’s an entire paper on something similar. Interesting!
I’m so back to school 😂

Samuel Agbede @AgbedeSamuelD
I’ve been disturbing Claude about why we need separate vectors for “key” and “value” in the transformer architecture paper. Couldn’t a single vector encode both what “advertises” a token and its value? 🤔 Genuinely trying to understand this so if anyone knows, please help me out

@techgirl1908 thought as much! looking forward to attending one someday :)



