Mustafa

4K posts

@mustafa_2vec

Agents @agnoagi | Polyagentamorous

Miami, FL · Joined April 2013
2K Following · 1.2K Followers
Pinned Tweet
Mustafa@mustafa_2vec·
For my recent followers, here's a bit about me:
- Currently working on AI X Agents🤖
- Finished grad school and figuring out future plans🤔
- Moved to the US from India to attend grad school in sunny Miami (FIU)
- Had 2 internships and worked 3 full-time SWE jobs, all while in college!!!
- Lived in Pune and had the best time!
- Worked on 5 projects from my bedroom while still in college and scaled them to thousands of users
- Built a localized Covid cases tracker that went viral
- Love playing soccer and tennis🎾
- Gym freak🤸

I live by "Talk is cheap. Show me the code." DMs open📥. Feel free to reach out!
[image]
Mustafa retweeted
Ashpreet Bedi@ashpreetbedi·
Sharing my learnings on human-in-the-loop / approval flows for agents. Whether you implement them today or not, these concepts are critical to understand and will shape how you design your system.

There are three types of actions an agent can take:
1. No approval needed
2. User approval needed
3. Admin approval needed

And then separately, a related concept is tracing vs audit logs. Let's walk through each case:

1) No approval needed

This is where 99% of agent actions live. Reading a file. Searching a knowledge base. Querying a database. Looking up calendar events. Summarizing a document. Running a web search.

Think about how you use Claude Code or Cursor. Most of what they do (reading files, searching code, generating diffs) requires no approval. The agent just does it. This is the default mode, and it should stay the default. If you're putting approval gates on read-only operations, you're slowing your agent down for no reason.

2) User approval needed

This is where most coding agents live today. Claude Code asks "Can I run this command?" before executing shell commands. Cursor asks before applying file changes. An email agent should confirm before sending. A calendar agent should confirm before creating events with external attendees.

This covers things like: executing shell commands, sending messages, writing files, making API calls with side effects, creating or modifying records. Anything where the agent is doing something that changes state in the real world.

3) Admin approval needed

This is the one almost nobody has built yet. And it's the one that matters most for production agents in enterprises. User approval means the person using the agent confirms. Admin approval means a designated administrator (who may not be the user) reviews and approves the action before it executes.

Examples: issuing a refund. Approving a purchase order. Modifying production infrastructure. Deleting data. Granting access permissions. Any action where the business needs an authorized human (with specific role permissions) to sign off, not just the person who asked.

This is where audit logs become essential (more on this below). When an admin approves a $10K refund triggered by an agent, that approval (who approved it, when, and the full context of the request) needs to be retained for the life of your product.

In Agno, this is a built-in primitive. You decorate a tool with @approval and the action gets routed to an approval system with a full audit trail. Most frameworks don't have this.

On Tracing vs Audit Logs

Traces are observability logs. They record the flow of execution: model calls, tool calls, latency, token usage. Traces are for debugging and optimization. Most third-party observability tools will recommend deleting traces after 90-120 days because they cost money to store.

Audit logs are accountability data. They record actions the agent took that were approved by a human. An admin approved a refund. A user confirmed a data deletion. A manager signed off on a purchase order. These are business transactions, and they need to be retained for the life of your product. You should be able to go back seven months and see: what action was proposed, who approved it, and what happened.

Side note: you should always own your traces. Your traces are your training data. If you're sending them to a third-party observability platform, you're handing over the most valuable data you have for improving your agents. Own your database. Own your traces.

The distinction matters because if you design your system treating traces and audit logs as the same thing, you'll either delete audit logs too early (compliance risk) or retain all traces forever (cost explosion). They need different storage, different retention policies, and different access patterns.

TLDR: Three types of agent actions: no approval (most actions, keep it fast), user approval (commands, sends, writes), admin approval (financial, destructive, privileged). Traces are for debugging and should be owned, not outsourced. Audit logs are for accountability and should be retained forever. Design for both from day one.
Mustafa retweeted
Ashpreet Bedi@ashpreetbedi·
Building a Personal Knowledge Agent

I've been using a personal knowledge agent called Pal (Personal Agent that Learns). It runs locally, talks to me over Slack, and tries to get better over time (still tuning this part). I posted about it a few weeks ago and wanted to share some key design decisions.

The goal is that I feed it raw data (URLs, papers, notes, meeting context, tidbits about people) and it organizes everything into two layers: a compiled wiki for text-heavy knowledge (concepts, summaries, research), and a SQL database for structured data (notes, people, projects, decisions). It should connect to my email, calendar, and Slack. Here are some details:

1) Markdown + SQL

Markdown is great until you need to query across dimensions. "Everything related to Project X from the last two weeks across all sources." "Prep me for my meeting with Sarah" (pull her notes, recent emails, project context, calendar history). This is relational data, not document retrieval. SQL handles this well.

2) Navigation over Search

The key insight behind Pal is navigation over search. Each data source keeps its native query interface. Databases get SQL. Email gets queried by sender and date. Files get navigated by directory structure. The wiki gets navigated by its index. No flattening everything into one vector store. The agent picks the right source for the right question through a metadata routing layer, not through embedding similarity.

3) Structured data (SQL)

When I say "save a note: met with sarah from acme, she's interested in a partnership", Pal creates a row in a notes table, tags it with ['sarah', 'acme', 'partnership'], and links it to sarah's entry in a people table. When I later ask "what do I know about sarah?" it queries across notes, people, projects, emails, and calendar.

Tags are the cross-table connector. A note about a meeting with sarah about Project X gets tagged so it shows up in both contexts. The agent owns the schema. It creates tables on demand. Notes, people, projects, decisions all emerged from natural conversation. "Save a note" creates a note. "Track this project" creates a project. The schema grows with usage.

4) Knowledge base (Wiki)

The other half is a compiled knowledge base for things that need depth. Research, technical concepts, reference material.

4.1) Ingest: I feed it URLs, papers, articles, meeting notes. It fetches the content, converts it to clean markdown, and saves it to a raw/ directory with YAML frontmatter (title, source, date, tags). A manifest tracks what's been ingested and what's been compiled.

4.2) Compile: A dedicated Compiler agent reads uncompiled raw files and produces structured wiki articles. It breaks each source into concept articles, writes summaries, cross-links related concepts, and maintains a master index. Compilation is incremental. Only new files get processed, never the whole wiki. New information enriches existing articles rather than replacing them, and every claim links back to the raw source.

4.3) Query: The wiki index is designed to fit in one LLM call (~100 articles). When I ask a knowledge question, the agent reads the index first, picks relevant articles, then falls back to raw sources and live tools. I expected to need vector search for this. Turns out an auto-maintained index with brief summaries works surprisingly well at this scale. The LLM navigates it like a table of contents.

5) Learnings

I'm still working on this part. The pieces are there but it's not where I want it yet. Because Pal is a team of agents, they all share a common learning store. Every time a retrieval strategy works, it gets saved. Every time I correct the agent, that correction gets saved with highest priority. Over time, the agent should route to the right source faster and give better answers without me tuning anything.

6) Architecture

Pal is a team of five specialist agents. Navigator is the workhorse. Researcher gathers sources from the web. Compiler turns raw into wiki. Linter checks quality. Syncer pushes everything to GitHub. Pull the wiki locally, read it in your IDE of choice, push changes back.

7) Scheduled tasks

Eight scheduled tasks run themselves between conversations: daily briefings, wiki compilation, inbox digests, weekly reviews, wiki linting, context re-indexing, and git sync. Results post to Slack.

TLDR: raw data from any source gets ingested and organized into two layers: a compiled wiki for knowledge depth and SQL tables for structured breadth. Five context systems get navigated (not searched) to answer questions. A learning loop compounds every interaction. The wiki is just markdown backed by git.
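The tags-as-cross-table-connector idea from the "Structured data" section above can be sketched with SQLite. The schema, table names, and functions here are invented for illustration, not Pal's actual implementation:

```python
import json
import sqlite3

# Hypothetical sketch of tag-linked notes (table and function names invented).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE people (name TEXT PRIMARY KEY, org TEXT);
CREATE TABLE notes  (id INTEGER PRIMARY KEY, body TEXT, tags TEXT);
""")

def save_note(body: str, tags: list[str]) -> None:
    db.execute("INSERT INTO notes (body, tags) VALUES (?, ?)",
               (body, json.dumps(tags)))

def notes_about(tag: str) -> list[str]:
    # Tags are the cross-table connector: one tag surfaces a note in every
    # context it belongs to (a person, a company, a project).
    rows = db.execute("SELECT body, tags FROM notes").fetchall()
    return [body for body, tags in rows if tag in json.loads(tags)]

db.execute("INSERT INTO people VALUES ('sarah', 'acme')")
save_note("met with sarah from acme, partnership interest",
          ["sarah", "acme", "partnership"])
save_note("project x kickoff with sarah", ["sarah", "project-x"])

sarah_notes = notes_about("sarah")        # both notes surface
project_notes = notes_about("project-x")  # only the kickoff note
```

Because a note carries several tags, it appears under the person, the company, and the project without being duplicated, which is what makes "prep me for my meeting with Sarah" a single query.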
Yohei@yoheinakajima·
agents can do everything*, which causes a UX problem: your users have vastly different experiences, some better and some worse.

thinking like an RPG can help here. when you start an RPG game, you start with one weapon, and you learn to fight with it. as you progress, you pick up and learn new weapons. and by the final boss, you can wield all the weapons to defeat him. very rewarding.

if you'd started the game with all the weapons, it would be too overwhelming. you wouldn't know which weapon to learn first, and you'd probably stop.

this isn't a new idea, but i've used a few tools recently that reminded me of this latter experience.

*not everything of course, but you get the point
Mustafa retweeted
Ashpreet Bedi@ashpreetbedi·
If you're looking for an open-source version of this, check out Pal (Personal Agent that Learns) github.com/agno-agi/pal Same idea: ingest sources, compile a structured wiki, navigate across it. Pal also connects to gmail, slack, and self-maintains a SQL database so the knowledge is my whole work context. Excited to see more of these in the wild!
Andrej Karpathy@karpathy

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I use directly (in a web UI), but more often I hand off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
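The "auto-maintained index with brief summaries" pattern described above can be sketched without any vector search. This is a hypothetical sketch, assuming (as an invented convention) that each article's first non-heading line serves as its summary:

```python
import tempfile
from pathlib import Path

def build_index(wiki_dir: Path) -> str:
    """One line per article: its name plus its first body line as a summary."""
    lines = []
    for page in sorted(wiki_dir.glob("*.md")):
        body = (l for l in page.read_text().splitlines()
                if l.strip() and not l.startswith("#"))
        lines.append(f"- {page.stem}: {next(body, '').strip()}")
    return "\n".join(lines)

# Tiny sample wiki: two articles, each a heading plus a one-line summary.
wiki = Path(tempfile.mkdtemp())
(wiki / "rag.md").write_text("# RAG\nRetrieval augmented generation.\n")
(wiki / "transformers.md").write_text(
    "# Transformers\nAttention-based sequence models.\n")

index = build_index(wiki)
# The agent reads this index first, picks relevant articles, and loads only
# those files into context: navigation by table of contents, not embeddings.
```

At ~100 articles the whole index fits comfortably in one model call, which is why this works without RAG at this scale.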

𝚟𝚒𝚎 ⟢@viemccoy·
if you are making an agent harness and want to be in a signal group chat, please let me know! I'll select pretty heavily for power users since I really want a place to discuss Excalibur with people who are doing something similar.
Mustafa@mustafa_2vec·
Slack is where the coding happens🫡
Mustafa@mustafa_2vec·
@neural_avb Share results? I'm sure it'd be the worst investment ever
AVB@neural_avb·
People are officially automating harness optimization
> Define task and boundaries
> Agent updates code, runs evals
> Logs all traces and results from ALL expts directly into the filesystem. No discards.
> Keeps pushing the pareto frontier

The Model + The Harness = The Product
AVB@neural_avb

x.com/i/article/2039…
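The "keeps pushing the pareto frontier" step above can be sketched as a dominance filter over logged experiment results. The field names and example runs here are hypothetical:

```python
# Hypothetical sketch: log every experiment's (cost, score), keep only
# harness configs that no other config dominates.
def pareto_frontier(results: list[dict]) -> list[dict]:
    def dominates(b: dict, a: dict) -> bool:
        # b dominates a if b is no worse on both axes and strictly better on one.
        return (b["cost"] <= a["cost"] and b["score"] >= a["score"]
                and (b["cost"] < a["cost"] or b["score"] > a["score"]))
    return [r for r in results
            if not any(dominates(other, r) for other in results)]

runs = [
    {"name": "baseline",   "cost": 1.0, "score": 0.70},
    {"name": "more-tools", "cost": 2.0, "score": 0.85},
    {"name": "bloated",    "cost": 3.0, "score": 0.80},  # dominated by more-tools
]
frontier = pareto_frontier(runs)  # baseline and more-tools survive
```

Keeping every result on disk ("no discards") is what makes this possible: the frontier can only be computed against the full history of experiments.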

Mustafa@mustafa_2vec·
@jaltma Sell to Anthropic lol
Jack Altman@jaltma·
Sam, did the offer to buy Uncapped get lost in the mail?
Saai Arora@SaaiArora·
big day. I just raised a $10M seed round to solve one of humanity's hardest problems.

for the past few months I've been pushing the boundaries of what's possible in AI categorization: determining whether something is a hot dog or not a hot dog.

we've built a state-of-the-art model that can classify any object in real time with unprecedented accuracy. it's simple: you upload an image and we tell you whether it's a hot dog or not a hot dog.

backed by some incredible investors who also agree this needs to exist. excited to scale the team, expand into adjacent categories, and push the boundaries of binary food intelligence.

dm if you want access
Mustafa@mustafa_2vec·
@davj Let claude code handle your Slack lol
Mustafa@mustafa_2vec·
@p0 @Opendoor Isn't this just an agent doing a lot of tool calls?
Parallel Web Systems
Every transaction @Opendoor makes requires HOA research: Does this property have an HOA? Who manages it? Any active lawsuits? Before Parallel, each property took ~10 minutes of manual investigation. Now it takes ~2.
[images]
Ihar Mahaniok@mahaniok·
Is there a good, AI-first, modern (2026) email client? Superhuman was good in 2024. But it feels extremely obsolete today.
Mustafa@mustafa_2vec·
@gitlawb Can I use Opus 4.6 with 1M context?
GitLawb@gitlawb·
We forked the leaked Claude Code source and made it work with ANY LLM: GPT, DeepSeek, Gemini, Llama, MiniMax. Open source. The name is OpenCode
Frederik Jacques@thenerd_be·
Today is my first day at Anthropic. Super excited! I shipped my first change today: added source maps so debugging is easier. Can't wait to show you all what I've been working on! cc: @AnthropicAI
[image]