Yassin

1K posts

@yelf_fafa

CTPO @ Idun | Tech Lead GenAI @ LVMH, building distributed agentic infra | Builder at heart ❤️ | Deadlift up, hallucinations down | OSS ↓

Joined October 2024

66 Following · 69 Followers

Yassin@yelf_fafa·4h

@altryne benchmark charts vs Sonnet plus 'not available to try yet'

Alex Volkov@altryne·8h

BREAKING: Microsoft (specifically MAI, previously reflection) just announced their new LLM. MAI-Thinking-1 and MAI-Code-1-Flash. MAI-Thinking is comparing itself to Sonnet, and showing interesting evals! Not available to try yet sadly.

8.8K

Yassin@yelf_fafa·4h

@jerryjliu0 Half my RAG bugs trace back to a mangled table upstream, not the model. Parsing is the part everyone skips until it quietly poisons everything :/

Jerry Liu@jerryjliu0·9h

We Parse PDFs
We spent 7 figures to put this on billboards throughout SF. I thought long and hard about putting something more creative and whimsical. But then you wouldn’t know what we do.
AI agents (and humans) are consuming exponentially more documents as they do real work. They need the best quality document parser to not output garbage on downstream tasks. This is what we do today as a company. If you have any PDFs (or other documents), we parse them :)
If you’re around SF in June for one of the following events, come stop by our booths:
✅ Snowflake Summit (this week, Booth 1123)
✅ Databricks Data+AI Summit (June 15-18, Booth 137)
✅ AI Engineer World Fair (June 29-July 2, Booth L-G47)
You can find us by the same sign we put on our billboards! We Parse PDFs @llama_index

108

9.3K

Yassin@yelf_fafa·7h

@TeksEdge Local, French, runs on a MacBook. That's the actual headline I want to hear

889

David Hendrickson@TeksEdge·11h

🌞This is big Local AI news! A new open-source Computer-Use LLM has just launched. Holo 3.1 is H Company’s (🇫🇷) new local computer-use agent model that beats Qwen3.5-397B, Kimi-K2.5, and Sonnet 4.6!
Since it is built for local deployment →
⬩ Runs fully on your machine (MacBook, Windows PC, DGX Spark, RTX Spark)
⬩ Based on Qwen architecture, specialized for GUI understanding & computer control
⬩ Optimized checkpoints: NVFP4, FP8 & Q4 GGUF (0.8B to 35B sizes)
⬩ Strong gains: 79.3% on AndroidWorld benchmark (35B model)
💻 Comparison to Qwen3.5: Holo 3.1 is fine-tuned specifically for computer-use agents (screen understanding, planning, clicking, navigation). Better at real GUI tasks than general-purpose Qwen3.5, especially when running locally.⚡

H@hcompany_ai

Computer-use agents are moving from the cloud to your local machine. Fast. When we launched Holo3 two months ago, the production feedback was clear: digital agents need to be blazing fast, cost-effective, and versatile. Today, we're dropping Holo 3.1, engineered to run anywhere, instantly. Massive token throughput. Low latency. Ready for your local workflow!

111

927

92.5K

Yassin@yelf_fafa·7h

@gippp69 The 4 commands are band-aids. The real win is one task, one session: /clear early beats compacting a session that's already 80 percent stale garbage

Gipp 🦅@gippp69·10h

THIS DEVELOPER STOPPED WASTING 70% OF HIS $20 CLAUDE CODE PLAN WITH 4 COMMANDS he was hitting the limit almost every day and thought claude code was just too expensive then he checked the real problem every failed fix, old file read and useless tool call was staying inside the same session now he uses /usage to see the damage, /rewind after bad paths, /compact before the session gets heavy and /clear when the task changes same $20 plan just less garbage for claude to reread

AdiiX@adiix_official

x.com/i/article/2061…

151

10.6K

Yassin@yelf_fafa·8h

@borvibe @huggingface @amilabs @GradiumAI Paris is simply the best city in the world, hands down. You can't change my opinion, it's facts

Boris@borvibe·13h

after 2 months in Paris, I've got my verdict ready: Bullish. Lots of problems yes. Rows of tents, pickpockets, the smell of urine, vandalism. But also the most beautiful evening skies, art, people with style, next level companies like @huggingface, @amilabs, @GradiumAI and a community of builders ready to change the world. Planning to stay for at least 1 more year. We found a new apartment too, this time one without rats 😁 planning short trips to SF & Vietnam too. wanna see what the fuss is about.

Paris, France 🇫🇷

204

25.7K

Yassin@yelf_fafa·8h

@PeterDiamandis 200% agree, building is the best way to learn. Do learn the basics/theory first, though, so that you can understand along the way

Peter H. Diamandis, MD@PeterDiamandis·10h

Let's be clear: The best way to learn AI is NOT a course. It is building something real. Pick a problem. Pick a tool. Ship something. You will learn more in 90 days of building than a year of reading.

161

163

1.6K

40.8K

Yassin@yelf_fafa·9h

@benln Hey Ben! I'd love feedback about our product. If you'd do us the pleasure of hopping on a 30-minute call, I'm 100% sure my cofounders and I would make it worth your while!

Ben Lang@benln·12h

I’d love to angel invest in 2-3 new startups over the summer. If you think I’d make a good addition to your cap table, reach out. DMs are open!

197

98.4K

Yassin@yelf_fafa·9h

@Im_IrushiK You ask it to re-clean the code, maybe you can break the 500k new line cap!

Irushi@Im_IrushiK·22h

I asked Opus 4.8 to refactor a few things in my codebase. Over 2 hours, it burned through 100 million tokens and completely reset the architecture. Then I ran it...None of it worked. What am I supposed to do with all these changes now? 😭

2.3K

Yassin@yelf_fafa·9h

@sflorimm Dude's old school cool

Floro S.@sflorimm·15h

met a dude, he still copy-pasting the code from chatgpt

270

1.1K

150.6K

Yassin@yelf_fafa·10h

@julien_c @huggingface HF always delivering to show that Paris is not too far from SF in tech! Allez Paris 🇫🇷

104

Julien Chaumond@julien_c·13h

We have doubled the amount of total storage on @huggingface in the past five months. At this rate, we will cross 1 Exabyte before the end of the year 🔥

238

25.3K

Yassin@yelf_fafa·10h

@eng_khairallah1 The 'set it up and go to sleep' part is where every app dies. You wake up to 400 personalized emails, 380 got the company name wrong in my exp lol

Khairallah AL-Awady@eng_khairallah1·1d

A normal American student just bought an iPad and Mac Mini for $2,200. Connected them to his MacBook. Three computers on one desk - dorm roommates thought he was mining crypto. He just set up the automation and went to sleep. In the morning the system had already processed hundreds of leads, written personalized emails to each one and filled the CRM without a single touch. The team that did this before him cost $7,000 a month. He paid $2,200 once. There are 360 million companies in the world and 310 of them still pay people for what a machine does better. And only 100,000 people on the planet know how to use AI and set this up.

Rahul@sairahul1

x.com/i/article/2061…

325

36.5K

Yassin@yelf_fafa·10h

@Sumanth_077 Solid map of what exists, but in prod you converge on maybe 5 of these. What are your favorites?

Sumanth@Sumanth_077·12h

AI Engineering Toolkit! I have curated a list of 100+ libraries and frameworks for training, fine-tuning, building, evaluating and deploying LLMs, RAG, and AI Agents.
Categories of LLM Libraries include:
• Vector Databases – Store and retrieve embeddings efficiently.
• Orchestration & Workflows – Chain tools and LLM calls, manage pipelines.
• Agent Frameworks – Build autonomous and multi-step agents.
• Training & Fine-Tuning – Pretrain, fine-tune, and adapt models.
• Inference & Serving – Run LLMs efficiently on diverse hardware.
• Safety & Security – Guardrails, red-teaming, and policy checks.
• Evaluation & Quality – Test and monitor both LLMs and LLM-powered apps (benchmarks, unit tests, telemetry, feedback).
• Model Management – Versioning, experiment tracking, lineage, and lifecycle management.
• AI App Development – Build UIs and LLM-powered apps quickly with Python-based frameworks.
Link to the repo in the comments!

Sumanth@Sumanth_077

The self-improving AI agent from Nous Research! Hermes Agent is a self-improving AI agent that builds skills from your work, improves them over time, and remembers across sessions. Most AI agents reset every conversation. You teach them your codebase structure, they forget. You show them your deployment workflow, gone. You explain your preferences, erased. This happens because most agents treat memory as an afterthought. They save conversation history but don't actually learn from it. They don't build skills or improve their own behavior. Hermes fixes this with a closed learning loop. After you complete a complex task, it autonomously creates a skill for it. That skill becomes permanent. Next time a similar task comes up, it uses the skill and improves it based on what works. It has a memory system with periodic nudges. It actively prompts itself to persist knowledge. FTS5 session search with LLM summarization recalls past conversations. Honcho dialectic user modeling builds a deepening profile of who you are. What this gives you: an agent that actually gets better at helping you. It works with any model. OpenRouter (200+ models), GLM, Kimi, MiniMax, OpenAI, or your own endpoint. Switch models without changing code. You can talk to Hermes from Telegram, Discord, Slack, WhatsApp, Signal, or CLI. Voice memo transcription works. Conversation continuity across platforms. Built-in cron scheduler for scheduled automations. Delegates and parallelizes work by spawning isolated subagents. It's 100% open source. I've shared the link in the comments!

4.6K

Yassin@yelf_fafa·10h

The high-variance read matches mine: it's more opinionated, so it pushes back harder. Sometimes it's the best output I've had, sometimes it's confidently wrong. Averaging hides that, so your eval has to catch it per run, not in aggregate. I sometimes switch to 5.5, as it seems more grounded!

Dan Shipper 📧@danshipper·11h

Almost a week later! What are your thoughts on Opus 4.8? We were extremely bullish on it in testing—it seems the response was more tepid once y'all got your hands on it. If you disagreed with our take I'm curious why so we can tune our evaluations! One theory I have is that by nature it pushes on your frame a little more, and the results are high-variance—sometimes it does something amazing, and sometimes it disagrees in a way that is obviously wrong. But curious how you're feeling and what you're reaching for after a few days of testing

Dan Shipper 📧@danshipper

BREAKING: Anthropic just dropped Opus 4.8—and it is a MONSTER We've been testing for about a week @every and our verdict is they could've just called it Opus 5, it's that good. Here's our vibe check: - Beats GPT-5.5 on Senior Engineer bench. On our toughest benchmark Opus 4.8 scores a 63—a hair higher than GPT-5.5's score of 62, and a full 30 points higher than Opus 4.7. It tackled a ground-up rewrite of a production codebase, and actually built something that works. HOWEVER: Coding performance varied a lot at different reasoning levels. We recommend using it on xhigh for best results. - Incredibly good writer. Opus 4.8 scored a 79.6 on our writing benchmark—measuring models on real-world writing tasks we do all of the time like essay writing, promo email writing, and more. It beats GPT-5.5 by 6 points. It produces well-written prose with fewer "AI-isms". It's also very good at writing in your voice given the right context. HOWEVER: Writing performance also varied with reasoning levels. Medium reasoning had higher incidence of AI-isms—we found best results with high. - Beast at knowledge work. Opus 4.8 is very good at general knowledge work tasks like report creation, research and more. It produced the best PowerPoint one-shot we've ever seen on our deck generation benchmark. - Emotionally intelligent, willing to question the frame. I've also found it to be quite good at talking through psychological or interpersonal issues. It has a high EQ, and it's also good at not glazing and helping to expand your perspective. Its thought process feels extremely rich and dynamic. THE BAD: These days a model is only as good as its harness, and Codex is still a far superior harness to the Claude Desktop app. This has kept me using Codex + GPT-5.5 as my daily driver, but I am flipping back and forth a lot more between Codex and Claude. Anthropic is back baby! Read the rest on @every: every.to/vibe-check/opu…

102

44.8K

Yassin@yelf_fafa·11h

@Sumanth_077 Solid map of what exists, but in prod you converge on maybe 5 of these, the other 95 are prototype tools you rip out the day you actually ship, the list is for exploring not for running

Yassin@yelf_fafa·13h

@_MaxBlade Tbh I agree, Opus is getting better at NOT over-engineering. However, on big tasks GPT-5.5 just has a better overview

149

Max Blade@_MaxBlade·1d

After using opus 4.8 for 100 hours, this is what you need to know: GPT 5.5 is still the goat when it comes to ensuring you have a clean, bug free code base. Opus is still the goat when it comes to making beautiful UIs that don't scream ai slop. It also feels more magical, like a super fun and adventurous coding experience. HOWEVER opus 4.8 feels more like opus 4.7.1. The real winner will be the company that ships frontier intelligence with composer 2.5 speed.

190

26.7K

Yassin@yelf_fafa·13h

@eng_khairallah1 The 'set it up and go to sleep' part is where every demo dies. You wake up to 400 personalized emails — 380 got the company name wrong. The eval is the part nobody films.

Yassin@yelf_fafa·14h

@RhysSullivan It’s impressive in that it’s truly an autonomous agent. However, workflows are increasingly becoming long, expansive and unoptimized

214

Rhys@RhysSullivan·21h

credit where credit is due, workflows in claude code are good i've been particularly impressed with them for writing effect, generally works really well with finding strong patterns from other repos and writing it properly

174

12.3K

Yassin@yelf_fafa·14h

@trq212 That’s actually smart. I do the same thing, but more on the checkpointing side, with an instruction like « Every new step in the process, document the changes and the breaking points ». Inspecting the .md file keeps me in the loop at all times

1.3K

Thariq@trq212·1d

been asking others at Anthropic how they stay in the loop with Claude and fully understand the work being done this is one of my favorites from Suzanne:

181

488

568.8K

Yassin@yelf_fafa·15h

Hot take: Everyone's waiting on the next model to fix their agents.
Watched a "broken" pipeline last week: the model’s thinking was fine. The problem was 11 tool calls doing the work of 3, looping on itself, torching the token budget, basically an unoptimized workflow.
The model's a commodity => Orchestration is the product.

Yassin@yelf_fafa·15h

@emollick Tbh 17.3x vs 30% gap is the whole story honestly. generating code was never the bottleneck, review + eval + actually trusting the diff is. agents 10x'd the cheap part, the expensive part barely moved

142

Ethan Mollick@emollick·22h

Big paper on AI coding agents using Github & other data The auto-complete tools (Copilot) led to 2.2x more code, local agents like original Claude Code led to 7.4x, & current remote coding agents 17.3x(!) But human bottlenecks in coding means actual releases "only" went up 30%

337

31.9K
