Yassin

1K posts

@yelf_fafa

CTPO @ Idun | Tech Lead GenAI @ LVMH, building distributed agentic infra | Builder at heart ❤️ | Deadlift up, hallucinations down | OSS ↓

Joined October 2024
66 Following · 69 Followers
Yassin
Yassin@yelf_fafa·
@altryne benchmark charts vs Sonnet plus 'not available to try yet'
0
0
0
64
Alex Volkov
Alex Volkov@altryne·
BREAKING: Microsoft (specifically MAI, previously reflection) just announced their new LLMs: MAI-Thinking-1 and MAI-Code-1-Flash. MAI-Thinking is comparing itself to Sonnet, and showing interesting evals! Not available to try yet, sadly.
7
3
89
8.8K
Yassin
Yassin@yelf_fafa·
@jerryjliu0 half my RAG bugs trace back to a mangled table upstream, not the model. parsing is the part everyone skips until it quietly poisons everything :/
0
0
0
19
Jerry Liu
Jerry Liu@jerryjliu0·
We Parse PDFs We spent 7 figures to put this on billboards throughout SF. I thought long and hard about putting something more creative and whimsical. But then you wouldn’t know what we do. AI agents (and humans) are consuming exponentially more documents as they do real work. They need the best quality document parser to not output garbage on downstream tasks. This is what we do today as a company. If you have any PDFs (or other documents), we parse them :) If you’re around SF in June for one of the following events, come stop by our booths: ✅ Snowflake Summit (this week, Booth 1123) ✅ Databricks Data+AI Summit (June 15-18, Booth 137) ✅ AI Engineer World Fair(June 29-July 2, Booth L-G47) You can find us by the same sign we put on our billboards! We Parse PDFs @llama_index
19
11
108
9.3K
Yassin
Yassin@yelf_fafa·
@TeksEdge Local, French, runs on a MacBook. That's the actual headline I want to hear
0
0
2
889
David Hendrickson
David Hendrickson@TeksEdge·
🌞This is big Local AI news! A new open-source Computer-Use LLM has just launched. Holo 3.1 is H Company’s (🇫🇷) new local computer-use agent model that beats Qwen3.5-397B, Kimi-K2.5, and Sonnet 4.6! Since it is built for local deployment → ⬩ Runs fully on your machine (MacBook, Windows PC, DGX Spark, RTX Spark) ⬩ Based on Qwen architecture, specialized for GUI understanding & computer control ⬩ Optimized checkpoints: NVFP4, FP8 & Q4 GGUF (0.8B to 35B sizes) ⬩ Strong gains: 79.3% on AndroidWorld benchmark (35B model) 💻 Comparison to Qwen3.5: Holo 3.1 is fine-tuned specifically for computer-use agents (screen understanding, planning, clicking, navigation). Better at real GUI tasks than general-purpose Qwen3.5, especially when running locally.⚡
H@hcompany_ai

Computer-use agents are moving from the cloud to your local machine. Fast. When we launched Holo3 two months ago, the production feedback was clear: digital agents need to be blazing fast, cost-effective, and versatile. Today, we're dropping Holo 3.1, engineered to run anywhere, instantly. Massive token throughput. Low latency. Ready for your local workflow!

37
111
927
92.5K
Yassin
Yassin@yelf_fafa·
@gippp69 the 4 commands are band-aids. the real win is one task, one session: /clear early beats compacting a session that's already 80 percent stale garbage
0
0
0
5
Gipp 🦅
Gipp 🦅@gippp69·
THIS DEVELOPER STOPPED WASTING 70% OF HIS $20 CLAUDE CODE PLAN WITH 4 COMMANDS he was hitting the limit almost every day and thought claude code was just too expensive then he checked the real problem every failed fix, old file read and useless tool call was staying inside the same session now he uses /usage to see the damage, /rewind after bad paths, /compact before the session gets heavy and /clear when the task changes same $20 plan just less garbage for claude to reread
AdiiX@adiix_official

x.com/i/article/2061…

32
16
151
10.6K
Boris
Boris@borvibe·
after 2 months in Paris, I've got my verdict ready: Bullish. Lots of problems yes. Rows of tents, pickpockets, the smell of urine, vandalism. But also the most beautiful evening skies, art, people with style, next level companies like @huggingface, @amilabs, @GradiumAI and a community of builders ready to change the world. Planning to stay for at least 1 more year. We found a new apartment too, this time one without rats 😁 planning short trips to SF & Vietnam too. wanna see what the fuss is about.
Paris, France 🇫🇷
30
7
204
25.7K
Yassin
Yassin@yelf_fafa·
@PeterDiamandis 200% agree on that, building is the best way to learn, do learn the basics/theory first so that you can understand along the way though
0
0
0
93
Peter H. Diamandis, MD
Peter H. Diamandis, MD@PeterDiamandis·
Let me be clear: The best way to learn AI is NOT a course. It is building something real. Pick a problem. Pick a tool. Ship something. You will learn more in 90 days of building than in a year of reading.
161
163
1.6K
40.8K
Yassin
Yassin@yelf_fafa·
@benln Hey Ben! I'd love feedback about our product. If you'd do us the pleasure of hopping on a 30-minute call, I'm 100% sure my cofounders and I would make it worth your while!
0
0
0
27
Ben Lang
Ben Lang@benln·
I’d love to angel invest in 2-3 new startups over the summer. If you think I’d make a good addition to your cap table, reach out. DMs are open!
197
36
1K
98.4K
Yassin
Yassin@yelf_fafa·
@Im_IrushiK You ask it to re-clean the code, maybe you can break the 500k new line cap!
0
0
0
3
Irushi
Irushi@Im_IrushiK·
I asked Opus 4.8 to refactor a few things in my codebase. Over 2 hours, it burned through 100 million tokens and completely reset the architecture. Then I ran it...None of it worked. What am I supposed to do with all these changes now? 😭
35
2
35
2.3K
Floro S.
Floro S.@sflorimm·
met a dude, he still copy-pasting the code from chatgpt
270
30
1.1K
150.6K
Yassin
Yassin@yelf_fafa·
@julien_c @huggingface HF always delivering to show that Paris is not too far from SF in tech! Allez Paris 🇫🇷
0
0
0
104
Julien Chaumond
Julien Chaumond@julien_c·
We have doubled the amount of total storage on @huggingface in the past five months. At this rate, we will cross 1 Exabyte before the end of the year 🔥
18
21
238
25.3K
Yassin
Yassin@yelf_fafa·
@eng_khairallah1 The 'set it up and go to sleep' part is where every app dies. You wake up to 400 personalized emails, 380 got the company name wrong in my exp lol
0
0
0
6
Khairallah AL-Awady
Khairallah AL-Awady@eng_khairallah1·
A normal American student just bought an iPad and Mac Mini for $2,200. Connected them to his MacBook. Three computers on one desk - dorm roommates thought he was mining crypto. He just set up the automation and went to sleep. In the morning the system had already processed hundreds of leads, written personalized emails to each one and filled the CRM without a single touch. The team that did this before him cost $7,000 a month. He paid $2,200 once. There are 360 million companies in the world and 310 of them still pay people for what a machine does better. And only 100,000 people on the planet know how to use AI and set this up.
Rahul@sairahul1

x.com/i/article/2061…

30
48
325
36.5K
Yassin
Yassin@yelf_fafa·
@Sumanth_077 Solid map of what exists, but in prod you converge on maybe 5 of these, what are your favorites ?
0
0
0
20
Sumanth
Sumanth@Sumanth_077·
AI Engineering Toolkit! I have curated a list of 100+ libraries and frameworks for training, fine-tuning, building, evaluating and deploying LLMs, RAG, and AI Agents. Categories of LLM Libraries include: • Vector Databases – Store and retrieve embeddings efficiently. • Orchestration & Workflows – Chain tools and LLM calls, manage pipelines. • Agent Frameworks – Build autonomous and multi-step agents. • Training & Fine-Tuning – Pretrain, fine-tune, and adapt models. • Inference & Serving – Run LLMs efficiently on diverse hardware. • Safety & Security – Guardrails, red-teaming, and policy checks. • Evaluation & Quality – Test and monitor both LLMs and LLM-powered apps (benchmarks, unit tests, telemetry, feedback). • Model Management – Versioning, experiment tracking, lineage, and lifecycle management. • AI App Development – Build UIs and LLM-powered apps quickly with Python-based frameworks. Link to the repo in the comments!
Sumanth@Sumanth_077

The self-improving AI agent from Nous Research! Hermes Agent is a self-improving AI agent that builds skills from your work, improves them over time, and remembers across sessions. Most AI agents reset every conversation. You teach them your codebase structure, they forget. You show them your deployment workflow, gone. You explain your preferences, erased. This happens because most agents treat memory as an afterthought. They save conversation history but don't actually learn from it. They don't build skills or improve their own behavior. Hermes fixes this with a closed learning loop. After you complete a complex task, it autonomously creates a skill for it. That skill becomes permanent. Next time a similar task comes up, it uses the skill and improves it based on what works. It has a memory system with periodic nudges. It actively prompts itself to persist knowledge. FTS5 session search with LLM summarization recalls past conversations. Honcho dialectic user modeling builds a deepening profile of who you are. What this gives you: an agent that actually gets better at helping you. It works with any model. OpenRouter (200+ models), GLM, Kimi, MiniMax, OpenAI, or your own endpoint. Switch models without changing code. You can talk to Hermes from Telegram, Discord, Slack, WhatsApp, Signal, or CLI. Voice memo transcription works. Conversation continuity across platforms. Built-in cron scheduler for scheduled automations. Delegates and parallelizes work by spawning isolated subagents. It's 100% open source. I've shared the link in the comments!

7
7
36
4.6K
Yassin
Yassin@yelf_fafa·
The high-variance read matches mine. It's more opinionated so it pushes back harder: sometimes the best output I've had, sometimes confidently wrong. Averaging it out hides that, so your eval has to catch it per-run, not on aggregate. I still switch to 5.5 sometimes as it seems more grounded!
0
0
1
53
Dan Shipper 📧
Dan Shipper 📧@danshipper·
Almost a week later! What are your thoughts on Opus 4.8? We were extremely bullish on it in testing—it seems the response was more tepid once y'all got your hands on it. If you disagreed with our take I'm curious why so we can tune our evaluations! One theory I have is that by nature it pushes on your frame a little more, and the results are high-variance—sometimes it does something amazing, and sometimes it disagrees in a way that is obviously wrong. But curious how you're feeling and what you're reaching for after a few days of testing
Dan Shipper 📧@danshipper

BREAKING: Anthropic just dropped Opus 4.8—and it is a MONSTER We've been testing for about a week @every and our verdict is they could've just called it Opus 5, it's that good. Here's our vibe check: - Beats GPT-5.5 on Senior Engineer bench. On our toughest benchmark Opus 4.8 scores a 63—a hair higher than GPT-5.5's score of 62, and a full 30 points higher than Opus 4.7. It tackled a ground-up rewrite of a production codebase, and actually built something that works. HOWEVER: Coding performance varied a lot at different reasoning levels. We recommend using it on xhigh for best results. - Incredibly good writer. Opus 4.8 scored a 79.6 on our writing benchmark—measuring models on real-world writing tasks we do all of the time like essay writing, promo email writing, and more. It beats GPT-5.5 by 6 points. It produces well-written prose with fewer "AI-isms". It's also very good at writing in your voice given the right context. HOWEVER: Writing performance also varied with reasoning levels. Medium reasoning had higher incidence of AI-isms—we found best results with high. - Beast at knowledge work. Opus 4.8 is very good at general knowledge work tasks like report creation, research and more. It produced the best PowerPoint one-shot we've ever seen on our deck generation benchmark. - Emotionally intelligent, willing to question the frame. I've also found it to be quite good at talking through psychological or interpersonal issues. It has a high EQ, and it's also good at not glazing and helping to expand your perspective. Its thought process feels extremely rich and dynamic. THE BAD: These days a model is only as good as its harness, and Codex is still a far superior harness to the Claude Desktop app. This has kept me using Codex + GPT-5.5 as my daily driver, but I am flipping back and forth a lot more between Codex and Claude. Anthropic is back baby! Read the rest on @every: every.to/vibe-check/opu…

77
3
102
44.8K
Yassin
Yassin@yelf_fafa·
@Sumanth_077 Solid map of what exists, but in prod you converge on maybe 5 of these, the other 95 are prototype tools you rip out the day you actually ship, the list is for exploring not for running
0
0
0
29
Yassin
Yassin@yelf_fafa·
@_MaxBlade Tbh I agree, opus is getting better at NOT over engineering however on big tasks gpt-5.5 just has a better overview
0
0
0
149
Max Blade
Max Blade@_MaxBlade·
After using opus 4.8 for 100 hours, this is what you need to know : GPT 5.5 is still the goat when it comes to ensuring you have a clean, bug free code base. Opus is still the goat when it comes to making beautiful Ui's that don't scream ai slop It also feels more magical, like a super fun and adventurous coding experience. HOWEVER opus 4.8 feels more like opus 4.7.1 The real winner will be the company that ships frontier intelligence with composer 2.5 speed.
20
5
190
26.7K
Yassin
Yassin@yelf_fafa·
@eng_khairallah1 The 'set it up and go to sleep' part is where every demo dies. You wake up to 400 personalized emails — 380 got the company name wrong. The eval is the part nobody films.
0
0
0
16
Yassin
Yassin@yelf_fafa·
@RhysSullivan It's impressive in the way it's truly an autonomous agent. However, workflows are increasingly becoming long, expensive and unoptimized
0
0
3
214
Rhys
Rhys@RhysSullivan·
credit where credit is due, workflows in claude code are good i've been particularly impressed with them for writing effect, generally works really well with finding strong patterns from other repos and writing it properly
17
3
174
12.3K
Yassin
Yassin@yelf_fafa·
@trq212 That’s actually smart, I do the same thing but more on a checkpointing side, much like « Every new step in the process, document the changes and the breaking points » that keeps me always in the loop by inspecting the .md file
0
0
0
1.3K
Thariq
Thariq@trq212·
been asking others at Anthropic how they stay in the loop with Claude and fully understand the work being done this is one of my favorites from Suzanne:
Thariq tweet media
181
488
8K
568.8K
Yassin
Yassin@yelf_fafa·
Hot take: Everyone's waiting on the next model to fix their agents. Watched a "broken" pipeline last week: the model's thinking was fine. 11 tool calls doing the work of 3, looping on itself, torching the token budget. Basically an unoptimized workflow. The model's a commodity => Orchestration is the product.
0
0
0
20
Yassin
Yassin@yelf_fafa·
@emollick Tbh 17.3x vs 30% gap is the whole story honestly. generating code was never the bottleneck, review + eval + actually trusting the diff is. agents 10x'd the cheap part, the expensive part barely moved
0
0
0
142
Ethan Mollick
Ethan Mollick@emollick·
Big paper on AI coding agents using Github & other data The auto-complete tools (Copilot) led to 2.2x more code, local agents like original Claude Code led to 7.4x, & current remote coding agents 17.3x(!) But human bottlenecks in coding means actual releases "only" went up 30%
56
43
337
31.9K