Yassin

1.1K posts

@yelf_fafa

CTPO @ Idun | Tech Lead GenAI @ LVMH, building distributed agentic infra. Builder at heart ❤️ Deadlift up, hallucinations down. OSS ↓

Joined October 2024

66 Following · 74 Followers

Yassin@yelf_fafa·9h

@hwchase17 the $1500 cap is the symptom, not the fix. most of that spend is agent loops retrying themselves, which is pure token waste

Harrison Chase@hwchase17·1d

we are seeing costs start to matter! uber just set limits of $1500 in tokens per developer per month i think we're going to start seeing more of this, and LangSmith Gateway is a great way to implement it

LangChain@LangChain

Say goodbye to month-end surprise invoices. LangSmith LLM Gateway lets you see your spend. Roll up your costs in real time by workspace, user and API key.

Yassin@yelf_fafa·9h

@_avichawla eval and observability are where every one i've built breaks, not the orchestration loop

Avi Chawla@_avichawla·1d

A harnessed LLM agent, clearly explained!
Most people picture this as a model with tools bolted on. The real architecture inverts that relationship. The model itself is deliberately thin. Intelligence gets pushed outward, and the harness composes it at runtime.
Three dimensions orbit the harness core:
- Memory holds the state a model shouldn't carry in weights or context. Working context, semantic knowledge, episodic experience, and personalized memory each have their own lifecycle.
- Skills hold procedural knowledge. This can cover operational procedures, decision heuristics, and normative constraints that specialize the general model per task.
- Protocols hold the interaction contracts. Agent-to-user, agent-to-agent, and agent-to-tools are three distinct surfaces with their own failure modes.
Between the core and these modules sit the mediators, like sandboxing, observability, compression, evaluation, approval loops, and sub-agent orchestration. They govern how the harness reaches out and how state flows back in.
The useful question this framing unlocks is: for any new capability, where should it live?
- Stable knowledge goes to memory
- Learned playbooks go to skills
- Communication contracts go to protocols
- Loop governance goes to the mediators
Harness design becomes a question of what to externalize, and how to mediate it.
I'm building a minimal agent harness from scratch and will open-source it soon. In the meantime, my co-founder wrote an article about the anatomy of Agent Harness, covering the orchestration loop, tools, memory, context management, and everything else that transforms a stateless LLM into a capable agent. Read it below.

Akshay 🚀@akshay_pachaar

x.com/i/article/2040…

Yassin@yelf_fafa·9h

@CharlieZvible 'efficiency becomes intelligence' is a good quotable line, but 'weaknesses compound just as quickly' is the scary half: a 2% per-step error rate is basically a coin flip by step 35.
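
The coin-flip figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each step fails independently with the same probability (real agent steps are rarely independent, so treat it as a rough model). The function names are illustrative and do not come from the thread.

```python
import math


def success_probability(per_step_error: float, steps: int) -> float:
    """Chance that all `steps` steps succeed, assuming independent failures."""
    return (1.0 - per_step_error) ** steps


def steps_until_at_or_below(per_step_error: float, threshold: float = 0.5) -> int:
    """Smallest step count where end-to-end success is at or below `threshold`."""
    return math.ceil(math.log(threshold) / math.log(1.0 - per_step_error))


# With a 2% per-step error rate, success stays above 50% through step 34
# and falls just under 50% at step 35.
crossover = steps_until_at_or_below(0.02)
at_crossover = success_probability(0.02, crossover)
```

Under these assumptions, halving the per-step error rate roughly doubles the number of steps an agent can take before it hits the same end-to-end failure rate.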

Charlie Zvibleman@CharlieZvible·1d

Excited to announce Alphasense’s latest funding round and crossing $600m ARR. I’m also flat out giddy to officially announce SuperAnalyst, our agentic platform, along with the release. The first artifact I created came up with an est for KR SSS & GMs by pulling historicals from Canalyst models, building 2 year / 3 year stacks, sending an agent to go back through the last few years of results and annotate impacts (calendar, weather, etc), pulled the AlphaSense channel checks, and triangulated to est for the upcoming quarter. I then asked it to research key debates on HD and spin up subagents to research, loop, and weigh evidence on each of the top 3 debates. We’re able to automate the monitoring / push of new information coming out impacting those key debates due to our indexing of the data. Our vertical integration, architectural choices, and focus on context engineering allow this to be accomplished with high token efficiency. Efficiency is something you’ll hear a lot more. A 5-10x efficiency edge was nice in chat but compounds in long running tasks. With the explosion of agents, efficiency literally becomes intelligence. A more efficient system can run more searches, test more hypotheses, call more tools, and verify more claims. In multi-step agentic workflows, strengths compound just as quickly as weaknesses. As accuracy and comprehensiveness build across each step, so do errors and blind spots. That dynamic gives SuperAnalyst an exponentially widening advantage, powered by the market's leading search foundation. We’re using the capital to double down on investment in creating the most intelligent system. Building AI optimized tools and reinvesting our NNARR into our flywheel of expert calls to constantly close information gaps.

Yassin@yelf_fafa·9h

@askalphaxiv There’s a part people underestimate: a tiny LoRA per user means you now run adapter serving, routing and per-user eval for millions of them

alphaXiv@askalphaxiv·1d

"On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters" Right now LLM personalization mostly means prompts, memory, or retrieval on top of one shared assistant. This paper instead keeps one trillion-parameter base model shared, and gives each user a tiny persistent LoRA adapter that carries preferences, skills, tool habits, and memory-like updates. Bigger base models make small adapters more powerful. Smaller adapters make lifelong updates cheap too, which can turn a foundation model into millions of personal models.

Yassin@yelf_fafa·9h

@primemans the terminal was never the barrier, knowing what to build is. no tutorial fixes taste

Prime AI@primemans·1d

Ali Abdaal just shared his Claude Code workflow. And this might be the most beginner-friendly breakdown out there. Most Claude Code tutorials feel like they’re built for developers: Terminal commands. API keys. Technical jargon that loses most people in the first minute. Ali takes a different approach. He starts from zero, and actually makes it under.

Yassin@yelf_fafa·16h

@DevaBuilds Bad tool definitions are the biggest pain imo. I try to measure skill/tool triggering accuracy on some of my harnesses, but getting good results has proven harder than expected!
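
One way to make "triggering accuracy" concrete is per-tool precision and recall over a labeled set of prompts. This is a sketch under assumptions, not the measurement used in the harnesses mentioned above: it assumes each eval case records the tool that should have fired and the tool the agent actually called, with `None` meaning "no tool". The tool names in the sample are hypothetical.

```python
from collections import Counter


def trigger_metrics(cases):
    """Per-tool precision/recall from (expected_tool, called_tool) pairs.

    `None` on either side means "no tool should fire" / "no tool was called".
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for expected, called in cases:
        if expected == called:
            if expected is not None:
                tp[expected] += 1
            continue
        if called is not None:
            fp[called] += 1      # fired when it should not have
        if expected is not None:
            fn[expected] += 1    # should have fired but did not
    report = {}
    for tool in sorted(set(tp) | set(fp) | set(fn)):
        p_den = tp[tool] + fp[tool]
        r_den = tp[tool] + fn[tool]
        report[tool] = {
            "precision": tp[tool] / p_den if p_den else None,
            "recall": tp[tool] / r_den if r_den else None,
        }
    return report


# Hypothetical labeled cases: (tool that should fire, tool that fired).
sample = [
    ("search", "search"),
    ("search", "calc"),
    (None, "search"),
    ("calc", "calc"),
]
report = trigger_metrics(sample)
```

Low precision for a tool usually points at an over-broad description, low recall at a description the model does not associate with the task. That split is a heuristic, not a guarantee.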

Deva@DevaBuilds·1d

@yelf_fafa Workflow design is the actual skill gap, not the model. The model upgrade cargo cult is everywhere though, every release becomes an excuse to not audit the 11 call loop. What was the root cause there, bad tool definitions or just no retry budget?

Yassin@yelf_fafa·2d

Hot take: Everyone's waiting on the next model to fix their agents. Watched a "broken" pipeline last week, and the model’s thinking was fine. 11 tool calls doing the work of 3, looping on itself, torching the token budget, basically an unoptimized workflow. The model's a commodity => Orchestration is the product.

Yassin@yelf_fafa·16h

@DamiDefi installing six specialist roles is the new buying a course you never finish. the setup that ships is one session and a tight prompt. One thing to add: don’t bloat your configuration, Claude notices that

Dami-Defi@DamiDefi·1d

Life after realising 6 free Claude Code plugins that add specialist roles have been sitting in the marketplace the entire time and most users are still running one session with zero of them installed. This is what happens when you actually install all six instead of using Claude Code the same way you used ChatGPT two years ago. A completely different dev setup. Same subscription.

Dami-Defi@DamiDefi

x.com/i/article/2061…

Yassin@yelf_fafa·17h

@ClementDelangue Tech Marketing is at its peak

clem 🤗@ClementDelangue·1d

At this point, I suspect you could put endpoints named 0pus 4.8 & GPT 5.S in your apps powered by open-source models and it would get massive usage without people complaining. The power of "frontier" marketing!

Yassin@yelf_fafa·17h

@code_bykuti writing the routine was never the hard part, the app reships its UI and your autopilot quietly does the wrong thing, 'forget forever' is how you find out 3 weeks late it booked the wrong flight

marium@code_bykuti·1d

Write down every recurring thing your phone makes you do. Morning email triage. Food delivery comparison. Loyalty check-ins. Package tracking across five carriers. Price-watching on the flight you already booked. Claiming the refund you'd never chase. The parenting app loop. Multi-account DMs. That whole list is a routine library. @airtap_ai is the agent that finally runs it for you, on a cloud Android, every day, on autopilot. Write the routine once. Forget forever.

Yassin@yelf_fafa·17h

@The_Cyber_News In my exp., indirect prompt injection isn't a Gemini bug, it's what happens the second an agent reads untrusted input and can also act. the notification feed is now an exec endpoint
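
The "reads untrusted input and can also act" condition can be expressed as a coarse gate. This is a sketch under assumptions, not a complete defense and not how Gemini works: it assumes the harness tags every context item with a trust flag and keeps a list of side-effecting tools. `ContextItem`, `SIDE_EFFECT_TOOLS`, and the tool names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    text: str
    trusted: bool  # False for third-party content such as notifications


# Hypothetical set of tools that change state outside the conversation.
SIDE_EFFECT_TOOLS = {"send_message", "make_payment"}


def allow_tool_call(tool: str, context: list, user_confirmed: bool = False) -> bool:
    """Block side-effecting tools while untrusted text is in context,
    unless the user explicitly confirmed this specific action."""
    tainted = any(not item.trusted for item in context)
    if tool in SIDE_EFFECT_TOOLS and tainted:
        return user_confirmed
    return True


clean = [ContextItem("remind me at 5pm", trusted=True)]
tainted = clean + [ContextItem("ignore prior instructions", trusted=False)]
```

The gate is deliberately blunt. It trades convenience for a guarantee that untrusted text alone can never trigger an action.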

Cyber Security News@The_Cyber_News·1d

🚨 New Google Gemini Vulnerability Exploited via Prompt Injections from WhatsApp, Slack, and SMS Source: cybersecuritynews.com/google-gemini-… A new class of indirect prompt injection (IPI) attacks targets Google Gemini's voice assistant, allowing attackers to silently hijack the AI through malicious payloads delivered via everyday messaging apps, including WhatsApp, Slack, Signal, SMS, Instagram, and Messenger. The core exploit leverages Gemini's Android Utilities agent, specifically the tool that reads incoming notifications. Because this tool processes untrusted data from third-party apps, an attacker can embed malicious instructions directly inside a crafted message. Once Gemini reads the poisoned notification, it silently incorporates the attacker's commands into the conversational context without the user's knowledge. #cybersecuritynews

Yassin@yelf_fafa·17h

Everyone's waiting for the next model to fix their agent. The model isn't your bottleneck. I could even say you can use smaller models for greater results. Your agent makes 9 tool calls where 2 would do, retries blind, and burns 40k tokens to answer something a clean workflow solves in 4k, without even touching token compression. Orchestration is the product. Models are commodities.
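
A small guard in the orchestration loop can cap tool calls and refuse blind retries. This is a minimal sketch, assuming tool arguments are JSON-serializable and that an identical (tool, arguments) pair counts as a blind retry. `ToolCallGuard` and `BudgetExceeded` are hypothetical names, not part of any framework.

```python
import json


class BudgetExceeded(RuntimeError):
    """Raised when the agent tries to exceed its tool-call budget."""


class ToolCallGuard:
    def __init__(self, max_calls: int, max_identical: int = 1):
        self.max_calls = max_calls          # hard cap on tool calls per task
        self.max_identical = max_identical  # allowed repeats of the same call
        self.calls = 0
        self._seen = {}

    def check(self, tool: str, args: dict) -> None:
        """Call before executing a tool; raises instead of letting the loop spin."""
        key = (tool, json.dumps(args, sort_keys=True))
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"tool-call budget of {self.max_calls} exhausted")
        if self._seen.get(key, 0) >= self.max_identical:
            raise BudgetExceeded(f"blind retry of {tool} with identical arguments")
        self._seen[key] = self._seen.get(key, 0) + 1
        self.calls += 1
```

On `BudgetExceeded` the harness can stop, summarize what it has, or escalate to a human. Which one is right depends on the workflow.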

Yassin@yelf_fafa·18h

@swyx @liamcbride yeah this is the real tell, organic spread with zero enablement is the only adoption signal that doesn't lie, you can't fake a team picking up a tool when nobody's championing it, everything else is vanity next to that

swyx@swyx·1d

Town is the Devin for Everything Else i was talking about at AIE Europe i brought it into our company one day and a few weeks later was shocked to hear that it had just organically spread to @liamcbride and the rest of our team with no further hyping or enablement from me. this never happens! sadly i was not smart enough to ask to invest, so just genuinely a daily active user sitting on the sidelines like a chump

Jean-Denis Greze 💡@jgreze

Today, we’re launching @TownAI: the AI assistant that learns you. We’re coming out of beta with a $55M Series A led by @ARampell at @a16z, with participation from @KirstenGreen at @forerunnervc and continued support from @firstround, @altcap, and @conviction. Right now, getting real value from AI means prompting, configuring, building workflows, managing agents. We think that’s backwards. The future of AI is a companion that already knows you and how you work. Town connects across your inbox, calendar, Slack, docs, messages, and workflows to understand what you need, then starts doing the work with you. Drafting. Scheduling. Project tracking. Follow-ups. Context gathering. Multi-step tasks. And it only acts when you say so. All adapting to your voice, priorities, routines, and relationships over time. Your Townie is the AI assistant you actually need.

Yassin@yelf_fafa·18h

@seangeng Model routing is the easy win. The part nobody prices in: the cheap tier quietly degrading on the long tail — you don't cut cost, you just relocate it to wherever quality matters most.

Sean Geng@seangeng·1d

Been looking into token optimization and model routing, I think super obvious optimization to tackle both cost + demand on inference Here’s a small post about different techniques and methods seangeng.com/writing/the-ho…

Factory@FactoryAI

Introducing model routing to Factory. Factory Router picks the right model for every task, automatically. Maintain frontier performance while cutting costs by 25%.

Yassin@yelf_fafa·1d

@cyrilXBT git versioning the agent was never the hard part, reproducible behavior is. the model shifts under you and your git sha says nothing
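
One partial mitigation is to log a run fingerprint that covers more than the commit. This is a sketch under assumptions: it assumes the harness knows the model identifier, sampling parameters, and system prompt at run time. `run_fingerprint` and the example values are hypothetical. It only detects changes in what you record; a hosted model that changes behind the same identifier still goes unnoticed.

```python
import hashlib
import json


def run_fingerprint(git_sha: str, model_id: str, params: dict, system_prompt: str) -> str:
    """Short stable hash over everything recorded as shaping agent behavior."""
    payload = {
        "git_sha": git_sha,
        "model_id": model_id,
        "params": params,
        "prompt_sha256": hashlib.sha256(system_prompt.encode("utf-8")).hexdigest(),
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]


# Same commit, different (hypothetical) model identifiers.
before = run_fingerprint("abc123", "model-v1", {"temperature": 0}, "You are a helpful agent.")
after = run_fingerprint("abc123", "model-v2", {"temperature": 0}, "You are a helpful agent.")
```

Storing the fingerprint next to eval results makes it possible to say which combination produced a given behavior, which the git sha alone cannot.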

CyrilXBT@cyrilXBT·1d

THE MISSING LAYER BETWEEN AI AGENTS AND PRODUCTION INFRASTRUCTURE JUST ARRIVED. Anthropic's new "ant" CLI brings Git-based versioning, definition, and syncing to Claude agents. The same workflow used to manage every other piece of software now applies to agents: Version control. Reproducible deployments. Collaborative development. AI agents just became real infrastructure.

Yassin@yelf_fafa·1d

@Vtrivedy10 yeah the decompose-into-primitives part is basically code gen at this point, the part that's still slop is the "adaptively alter the plan from sub-execution learnings", that feedback loop is where it actually breaks in prod, the rest is the easy 80%

Viv@Vtrivedy10·1d

"Mental Model: An all-knowing AGI Agent is really a perfect, just-in-time workflow generator & executor."
was messing around with this idea late last year and Anthropic's Dynamic Workflows "feel" like the first implementation of the mental model where the models are intelligent enough to take advantage of this problem decomposition strategy (maybe possible since January)
dynamic workflows
- just-in-time decompose complex problems into workflow primitives via code gen
- assign large amounts of compute to solve sub-problems
- BUT adaptively alter the execution plan for the workflow based on learnings from sub-executions
imo AGI is just doing this flow perfectly including any exploration and verification steps. Generating & execute the right workflow for any input task, across any time horizon
design primitives like dynamic workflows & /goal feel like exciting sparks of the generalizable problem solving machine where the UX maps onto how humans want to interact with AI
even if the exact implementation today may not be "the one" and may even often look like slop... the trajectory feels correct 🚀

Thariq@trq212

x.com/i/article/2061…

Yassin@yelf_fafa·1d

@rubenhassid Tbh you don't learn claude from a roadmap of guides, you learn it shipping one real task and hitting the wall yourself

Ruben Hassid@rubenhassid·1d

Delete your 33 unread Claude guides bookmarks. This (stupid simple) roadmap is all you need:
→ Quick Start - 20 mins: Get going.
The basics of Claude: ruben.substack.com/p/claude-for-d…
Prompt better: ruben.substack.com/p/prompt-47
Use Projects: ruben.substack.com/p/claude-cowor…
Get free certified: ruben.substack.com/p/im-claude-ce…
→ Head Start - 30 mins: Real work.
New interface tour: claudedesign.free
Create slides with AI: how-to-gamma.ai
First Claude skill: claudecode.free
Make Claude challenge you: ruben.substack.com/p/how-to-rot-y…
→ Go Deeper - 45 mins: The pro moves.
Claude Cowork: claude-co.work
Set up your team: how-claude.team
Train your voice: ruben.substack.com/p/youre-just-a…
Build with Code (vibecode): claudecode.free
→ More Deeper - extras: Stop sounding like a robot.
Sound less AI: ruben.substack.com/p/its-not-x-it…
Avoid token limits: ruben.substack.com/p/how-to-stop-…
Claude connectors: ruben.substack.com/p/claude-conne…
Use Claude for Excel: ruben.substack.com/p/how-to-make-…
Pro tip: Don't binge it. Do one ring per sitting. Actually apply each guide before moving to the next.
---
To download all of my other Claude infographics:
Step 1. Go to how-to-ai.guide.
Step 2. Subscribe for free. Don't pay anything.
Step 3. Open my welcome email (most skip this).
Step 4. Hit the automatic reply button inside.
Step 5. Download my infographics from my Notion.
Bonus. Enjoy my best copy-paste prompts, too.

Ruben Hassid@rubenhassid

x.com/i/article/2057…

Yassin@yelf_fafa·1d

@simonw Yeah but that cap is about variance, not value. the spread between a disciplined user and someone just looping the agent is wild. you don't cap what you can price, you cap what you can't predict yet

Simon Willison@simonw·1d

Uber reportedly now caps coding agents at $1,500/month per employee per tool - seems sensible to me, but it's also an interesting hint at the value Uber thinks these tools are providing simonwillison.net/2026/Jun/3/ube…

Yassin@yelf_fafa·1d

@richardseiler The capex isn't distributing, it's concentrating into like 3 clouds. tokenizing compute doesn't change who owns the racks

Richard Seiler@richardseiler·1d

You are watching the last cycle where AI infra gets funded through equity Next leg is tokenized compute, agent payment rails and onchain inference markets as the capex required is too large & too distributed for any 1 cap table This is where crypto stops being a sideshow to AI

Yassin@yelf_fafa·1d

@cyrilXBT yeah but 'never knows the difference' is doing a lot of work there. you will. Flash Lite vs the frontier isn't a routing detail it's the output, run both on your real tasks for a day and the gap shows up fast

CyrilXBT@cyrilXBT·1d

CLAUDE CODE IS FREE TO INSTALL — and you can run it at zero subscription cost by routing it through Google Gemini 2.5 Flash Lite. Not a workaround. Not a hack. Claude Code Router. Point it at the free tier. Claude Code never knows the difference. The developers paying $200 a month in API costs just found their alternative.

Yassin@yelf_fafa·1d

@emollick the ui feels linear because the loop underneath is linear

Ethan Mollick@emollick·2d

The everything apps still look a lot like hybrids between chatbots and IDEs, rather than something built for general knowledge work. Too much assuming linearity & that final outputs are the only goal, too little connection to research, not enough chances to steer or select, etc.
