Yassin

1.1K posts

Yassin banner
Yassin

Yassin

@yelf_fafa

CTPO @ Idun | Tech Lead GenAI @ LVMH, building distributed agentic infra Builder at heart ❤️ Deadlift up, hallucinations down OSS ↓

Beigetreten Ekim 2024
66 Folgt74 Follower
Yassin
Yassin@yelf_fafa·
@hwchase17 the $1500 cap is the symptom not the fix. most of that spend is agent loops retrying themselves, and just pure token waste
English
0
0
0
9
Yassin
Yassin@yelf_fafa·
@_avichawla eval and observability is where every one i've built breaks not the orchestration loop
English
0
0
0
9
Avi Chawla
Avi Chawla@_avichawla·
A harnessed LLM agent, clearly explained! Most people picture this as a model with tools bolted on. The real architecture inverts that relationship. The model itself is deliberately thin. Intelligence gets pushed outward, and the harness composes it at runtime. Three dimensions orbit the harness core: - 𝗠𝗲𝗺𝗼𝗿𝘆 holds the state a model shouldn't carry in weights or context. Working context, semantic knowledge, episodic experience, and personalized memory each have their own lifecycle. - 𝗦𝗸𝗶𝗹𝗹𝘀 hold procedural knowledge. This can cover operational procedures, decision heuristics, and normative constraints that specialize the general model per task. - 𝗣𝗿𝗼𝘁𝗼𝗰𝗼𝗹𝘀 hold the interaction contracts. Agent-to-user, agent-to-agent, and agent-to-tools are three distinct surfaces with their own failure modes. Between the core and these modules sit the mediators, like sandboxing, observability, compression, evaluation, approval loops, and sub-agent orchestration. They govern how the harness reaches out and how state flows back in. The useful question this framing unlocks is: for any new capability, where should it live? - Stable knowledge goes to memory - Learned playbooks go to skills - Communication contracts go to protocols - Loop governance goes to the mediators Harness design becomes a question of what to externalize, and how to mediate it. I'm building a minimal agent harness from scratch and will open-source it soon. In the meantime, my co-founder wrote an article about the anatomy of Agent Harness, covering the orchestration loop, tools, memory, context management, and everything else that transforms a stateless LLM into a capable agent. Read it below.
GIF
Akshay 🚀@akshay_pachaar

x.com/i/article/2040…

English
37
224
1.3K
195.3K
Yassin
Yassin@yelf_fafa·
@CharlieZvible efficiency becomes intelligence, it’s a good quotable line, but 'weaknesses compound just as quickly' is the scary half, a 2% per-step error rate is basically a coin flip by step 35.
English
0
0
0
178
Charlie Zvibleman
Charlie Zvibleman@CharlieZvible·
Excited to announce Alphasense’s latest funding round and crossing $600m ARR. I’m also flat out giddy to officially announce SuperAnalyst, our agentic platform, along with the release. The first artifact I created came up with an est for KR SSS & GMs by pulling historicals from Canalyst models, building 2 year / 3 year stacks, sending an agent to go back through the last few years of results and annotate impacts (calendar, weather, etc), pulled the AlphaSense channel checks, and triangulated to est for the upcoming quarter. I then asked it to research key debates on HD and spin up subagents to research, loop, and weigh evidence on each of the top 3 debates. We’re able to automate the monitoring / push of new information coming out impacting those key debates due to our indexing of the data. Our vertical integration, architectural choices, and focus on context engineering allow this to be accomplished with high token efficiency. Efficiency is something you’ll hear a lot more. A 5-10x efficiency edge was nice in chat but compounds in long running tasks. With the explosion of agents, efficiency literally becomes intelligence. A more efficient system can run more searches, test more hypotheses, call more tools, and verify more claims. In multi-step agentic workflows, strengths compound just as quickly as weaknesses. As accuracy and comprehensiveness build across each step, so do errors and blind spots. That dynamic gives SuperAnalyst an exponentially widening advantage, powered by the market's leading search foundation. We’re using the capital to double down on investment in creating the most intelligent system. Building AI optimized tools and reinvesting our NNARR into our flywheel of expert calls to constantly close information gaps.
English
18
16
242
111.5K
Yassin
Yassin@yelf_fafa·
@askalphaxiv There’s a part people underestimate: a tiny LoRA per user means you now run adapter serving, routing and per-user eval for millions of them
English
0
0
0
11
alphaXiv
alphaXiv@askalphaxiv·
"On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters" Right now LLM personalization mostly means prompts, memory, or retrieval on top of one shared assistant. This paper instead keeps one trillion-parameter base model shared, and give each user a tiny persistent LoRA adapter that carries preferences, skills, tool habits, and memory-like updates. Bigger base models make small adapters more powerful. Smaller adapters make lifelong updates cheap too, which can turn foundation model into millions of personal models.
alphaXiv tweet media
English
7
17
105
4.8K
Yassin
Yassin@yelf_fafa·
@primemans the terminal was never the barrier knowing what to build is no tutorial fixes taste
English
0
0
0
5
Prime AI
Prime AI@primemans·
Ali Abdaal just shared his Claude Code workflow. And this might be the most beginner-friendly breakdown out there. Most Claude Code tutorials feel like they’re built for developers:Terminal commands. API keys. Technical jargon that loses most people in the first minute. Ali takes a different approach. He starts from zero — and actually makes it under.
English
28
70
163
11.7K
Yassin
Yassin@yelf_fafa·
@DevaBuilds Bad tool definitions is the biggest pain imo, I try to measure skill/tool triggering accuracy on some of my harnesses but it has proven to be harder than expected to get good results !
English
0
0
0
3
Deva
Deva@DevaBuilds·
@yelf_fafa Workflow design is the actual skill gap, not the model. The model upgrade cargo cult is everywhere though, every release becomes an excuse to not audit the 11 call loop. What was the root cause there, bad tool definitions or just no retry budget?
English
1
0
1
7
Yassin
Yassin@yelf_fafa·
Hot take: Everyone's waiting on the next model to fix their agents Watched a "broken" pipeline last week, model’s thinking was fine. 11 tool calls doing the work of 3, looping on itself, torching the token budget, basically unoptimized workflow The model's a commodity => Orchestration is the product.
English
1
0
1
41
Yassin
Yassin@yelf_fafa·
@DamiDefi installing six specialist roles is the new buying a course you never finish. the setup that ships is one session and a tight prompt One thing to add, don’t bloat your configuration, Claude notices that
English
0
0
0
7
Dami-Defi
Dami-Defi@DamiDefi·
Life after realising 6 free Claude Code plugins that add specialist roles have been sitting in the marketplace the entire time and most users are still running one session with zero of them installed. This is what happens when you actually install all six instead of using Claude Code the same way you used ChatGPT two years ago. A completely different dev setup. Same subscription.
Dami-Defi@DamiDefi

x.com/i/article/2061…

English
26
24
202
35.3K
clem 🤗
clem 🤗@ClementDelangue·
At this point, I suspect you could put endpoints named 0pus 4.8 & GPT 5.S in your apps powered by open-source models and it would get massive usage without people complaining. The power of "frontier" marketing!
English
46
7
272
24.1K
Yassin
Yassin@yelf_fafa·
@code_bykuti writing the routine was never the hard part, the app reships its UI and your autopilot quietly does the wrong thing, 'forget forever' is how you find out 3 weeks late it booked the wrong flight
English
0
0
0
7
marium
marium@code_bykuti·
Write down every recurring thing your phone makes you do. Morning email triage. Food delivery comparison. Loyalty check-ins. Package tracking across five carriers. Price-watching on the flight you already booked. Claiming the refund you'd never chase. The parenting app loop. Multi-account DMs. That whole list is a routine library. @airtap_ai is the agent that finally runs it for you, on a cloud Android, every day, on autopilot. Write the routine once. Forget forever.
English
18
44
200
146.1K
Yassin
Yassin@yelf_fafa·
@The_Cyber_News In my exp. indirect prompt injection isnt a gemini bug its what happens the second an agent reads untrusted input and can also act the notification feed is now an exec endpoint
English
0
0
1
174
Cyber Security News
Cyber Security News@The_Cyber_News·
🚨 New Google Gemini Vulnerability Exploited via Prompt Injections from WhatsApp, Slack, and SMS Source: cybersecuritynews.com/google-gemini-… A new class of indirect prompt injection (IPI) attacks targets Google Gemini's voice assistant, allowing attackers to silently hijack the AI through malicious payloads delivered via everyday messaging apps, including WhatsApp, Slack, Signal, SMS, Instagram, and Messenger. The core exploit leverages Gemini's Android Utilities agent, specifically the tool that reads incoming notifications. Because this tool processes untrusted data from third-party apps, an attacker can embed malicious instructions directly inside a crafted message. Once Gemini reads the poisoned notification, it silently incorporates the attacker's commands into the conversational context without the user's knowledge. #cybersecuritynews
Cyber Security News tweet media
English
23
113
347
25.2K
Yassin
Yassin@yelf_fafa·
Everyone's waiting for the next model to fix their agent. The model isn't your bottleneck. I could even say you can use smaller models for greater results. Your agent makes 9 tool calls where 2 would do, retries blind, and burns 40k tokens to answer something a clean workflow solves in 4k, without even touching token compression Orchestration is the product. Models are commodities.
English
0
0
0
7
Yassin
Yassin@yelf_fafa·
@swyx @liamcbride yeah this is the real tell, organic spread with zero enablement is the only adoption signal that doesn't lie, you can't fake a team picking up a tool when nobody's championing it, everything else is vanity next to that
English
0
0
0
18
swyx
swyx@swyx·
Town is the Devin for Everything Else i was talking about at AIE Europe i brought it into our company one day and a few weeks later was shocked to hear that it had just organically spread to @liamcbride and the rest of our team with no further hyping or enablement from me. this never happens! sadly i was not smart enough to ask to invest, so just genuinely a daily active user sitting on the sidelines like a chump
swyx tweet media
Jean-Denis Greze 💡@jgreze

Today, we’re launching @TownAI: the AI assistant that learns you. We’re coming out of beta with a $55M Series A led by @ARampell at @a16z, with participation from @KirstenGreen at @forerunnervc and continued support from @firstround, @altcap, and @conviction. Right now, getting real value from AI means prompting, configuring, building workflows, managing agents. We think that’s backwards. The future of AI is a companion that already knows you and how you work. Town connects across your inbox, calendar, Slack, docs, messages, and workflows to understand what you need, then starts doing the work with you. Drafting. Scheduling. Project tracking. Follow-ups. Context gathering. Multi-step tasks. And it only acts when you say so. All adapting to your voice, priorities, routines, and relationships over time. Your Townie is the AI assistant you actually need.

English
26
3
98
21K
Yassin
Yassin@yelf_fafa·
@seangeng Model routing is the easy win. The part nobody prices in: the cheap tier quietly degrading on the long tail — you don't cut cost, you just relocate it to wherever quality matters most.
English
0
0
0
23
Yassin
Yassin@yelf_fafa·
@cyrilXBT git versioning the agent was never the hard part reproducible behavior is the model shifts under you and your git sha says nothing
English
0
0
0
5
CyrilXBT
CyrilXBT@cyrilXBT·
THE MISSING LAYER BETWEEN AI AGENTS AND PRODUCTION INFRASTRUCTURE JUST ARRIVED. Anthropic's new "ant" CLI brings Git-based versioning, definition, and syncing to Claude agents. The same workflow used to manage every other piece of software now applies to agents: Version control. Reproducible deployments. Collaborative development. AI agents just became real infrastructure.
English
29
13
80
6.5K
Yassin
Yassin@yelf_fafa·
@Vtrivedy10 yeah the decompose-into-primitives part is basically code gen at this point, the part that's still slop is the "adaptively alter the plan from sub-execution learnings", that feedback loop is where it actually breaks in prod, the rest is the easy 80%
English
0
0
1
22
Viv
Viv@Vtrivedy10·
"Mental Model: An all-knowing AGI Agent is really a perfect, just-in-time workflow generator & executor." was messing around with this idea late last year and Anthropic's Dynamic Workflows "feel" like the first implementation of the mental model where the models are intelligent enough to take advantage of this problem decomposition strategy (maybe possible since January) dynamic workflows - just-in-time decompose complex problems into workflow primitives via code gen - assign large amounts of compute to solve sub-problems - BUT adaptively alter the execution plan for the workflow based on learnings from sub-executions imo AGI is just doing this flow perfectly including any exploration and verification steps. Generating & execute the right workflow for any input task, across any time horizon design primitives like dynamic workflows & /goal feel like exciting sparks of the generalizable problem solving machine where the UX maps onto how humans want to interact with AI even if the exact implementation today may not be "the one" and may even often look like slop... the trajectory feels correct 🚀
Viv tweet media
Thariq@trq212

x.com/i/article/2061…

English
22
20
192
16.7K
Yassin
Yassin@yelf_fafa·
@rubenhassid Tbh you dont learn claude from a roadmap of guides you learn it shipping one real task and hitting the wall yourself
English
0
0
0
25
Ruben Hassid
Ruben Hassid@rubenhassid·
Delete your 33 unread Claude guides bookmarks. This (stupid simple) roadmap is all you need: → Quick Start - 20 mins: Get going. The basics of Claude: ruben.substack.com/p/claude-for-d… Prompt better: ruben.substack.com/p/prompt-47 Use Projects: ruben.substack.com/p/claude-cowor… Get free certified: ruben.substack.com/p/im-claude-ce… → Head Start - 30 mins: Real work. New interface tour: claudedesign.free Create slides with AI: how-to-gamma.ai First Claude skill: claudecode.free Make Claude challenge you: ruben.substack.com/p/how-to-rot-y… → Go Deeper - 45 mins: The pro moves. Claude Cowork: claude-co.work Set up your team: how-claude.team Train your voice: ruben.substack.com/p/youre-just-a… Build with Code (vibecode): claudecode.free → More Deeper - extras: Stop sounding like a robot. Sound less AI: ruben.substack.com/p/its-not-x-it… Avoid token limits: ruben.substack.com/p/how-to-stop-… Claude connectors: ruben.substack.com/p/claude-conne… Use Claude for Excel: ruben.substack.com/p/how-to-make-… Pro tip: Don't binge it. Do one ring per sitting. Actually apply each guide before moving to the next. --- To download all of my other Claude infographics: Step 1. Go to how-to-ai.guide. Step 2. Subscribe for free. Don't pay anything. Step 3. Open my welcome email (most skip this). Step 4. Hit the automatic reply button inside. Step 5. Download my infographics from my Notion. Bonus. Enjoy my best copy-paste prompts, too.
Ruben Hassid tweet media
Ruben Hassid@rubenhassid

x.com/i/article/2057…

English
25
100
487
56.4K
Yassin
Yassin@yelf_fafa·
@simonw Yeah but that cap is about variance not value, the spread between a disciplined user and someone just looping the agent is wild, you don't ceiling what you can price you ceiling what you can't predict yet
English
0
0
2
249
Simon Willison
Simon Willison@simonw·
Uber reportedly now caps coding agents at $1,500/month per employee per tool - seems sensible to me, but it's also an interesting hint at the value Uber thinks these tools are providing simonwillison.net/2026/Jun/3/ube…
English
116
55
592
680.4K
Yassin
Yassin@yelf_fafa·
@richardseiler The capex isnt distributing its concentrating into like 3 clouds tokenizing compute doesnt change who owns the racks
English
0
0
1
9
Richard Seiler
Richard Seiler@richardseiler·
You are watching the last cycle where AI infra gets funded through equity Next leg is tokenized compute, agent payment rails and onchain inference markets as the capex required is too large & too distributed for any 1 cap table This is where crypto stops being a sideshow to AI
Richard Seiler tweet media
English
9
8
30
2.8K
Yassin
Yassin@yelf_fafa·
@cyrilXBT yeah but 'never knows the difference' is doing a lot of work there. you will. Flash Lite vs the frontier isn't a routing detail it's the output, run both on your real tasks for a day and the gap shows up fast
English
0
0
0
60
CyrilXBT
CyrilXBT@cyrilXBT·
CLAUDE CODE IS FREE TO INSTALL — and you can run it at zero subscription cost by routing it through Google Gemini 2.5 Flash Lite. Not a workaround. Not a hack. Claude Code Router. Point it at the free tier. Claude Code never knows the difference. The developers paying $200 a month in API costs just found their alternative.
English
17
11
73
5.6K
Yassin
Yassin@yelf_fafa·
@emollick the ui feels linear because the loop underneath is linear
English
0
0
0
40
Ethan Mollick
Ethan Mollick@emollick·
The everything apps still look a lot like hybrids between chatbots and IDEs, rather than something built for general knowledge work. Too much assuming linearity & that final outputs are the only goal, too little connection to research, not enough chances to steer or select, etc.
English
33
14
230
16.2K