Fathin Dosunmu
@FathinDev
517 posts

35+ AI agents in production
Cybersec → AI engineering
Cooking @agentsimdev

Joined December 2020
460 Following · 62 Followers

Pinned Tweet
Fathin Dosunmu @FathinDev
Just spent the last 10hrs building an intelligent knowledge base system that automatically captures everything I code, think, and learn without me lifting a finger. Quite wild, if you ask me. 🤯 Using Basic Memory MCP server + Claude Code + Obsidian for the ultimate dev knowledge graph. This changed how I work. Here's how 🧵
[2 images attached]
4 replies · 1 repost · 8 likes · 1.4K views
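The system in the pinned tweet stores knowledge as plain local markdown that Obsidian indexes into a graph. A minimal sketch of that capture step, assuming nothing about Basic Memory's actual API: the vault path, filename scheme, and `capture_note` helper below are all hypothetical, chosen only to illustrate "write a frontmattered note, let Obsidian do the rest".

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical vault location; a real setup would point at your Obsidian vault.
VAULT = Path(tempfile.gettempdir()) / "obsidian-vault"

def capture_note(title: str, body: str, tags: list[str]) -> Path:
    """Write a markdown note with YAML frontmatter into the vault.

    Tools like Basic Memory persist knowledge as files shaped like this,
    so Obsidian can link them into a graph with no extra tooling.
    """
    VAULT.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    path = VAULT / f"{stamp}-{title.lower().replace(' ', '-')}.md"
    frontmatter = "\n".join(
        ["---", f"title: {title}", f"tags: [{', '.join(tags)}]", "---"]
    )
    path.write_text(f"{frontmatter}\n\n{body}\n", encoding="utf-8")
    return path

note = capture_note("Retry logic", "Use exponential backoff.", ["dev", "patterns"])
```

The "without lifting a finger" part would come from an MCP server calling something like `capture_note` automatically during coding sessions.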
Fathin Dosunmu @FathinDev
What exactly is a good harness?
0 replies · 0 reposts · 0 likes · 4 views
Factory @FactoryAI
Key findings:
• GPT-5.2 and Opus 4.6 topped our leaderboard, but at up to $3.11/PR, frontier models are a tough sell at scale.
• Open-source models hit 85%+ of frontier accuracy at 1/3 the cost. At that price point, you can run multi-pass review and still come out ahead.
Frontier models edge out on accuracy. Open models win on cost per unit of intelligence. For enterprises running thousands of PRs/day, that math isn't even close.
3 replies · 0 reposts · 56 likes · 7.7K views
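The multi-pass claim above is easy to sanity-check. The per-PR prices come from the tweet; the two-pass budget and the PRs/day volume below are my assumptions, picked only to make the arithmetic concrete.

```python
# Per-PR review prices from the Factory thread.
FRONTIER_COST = 3.11               # top-of-leaderboard frontier model, $/PR
OPEN_COST = FRONTIER_COST / 3      # open models at ~1/3 the cost, $/PR

PASSES = 2                         # hypothetical multi-pass review budget
multi_pass = OPEN_COST * PASSES    # two full open-model review passes

# Scale to a hypothetical enterprise volume to see why "the math isn't close".
PRS_PER_DAY = 5_000
daily_savings = (FRONTIER_COST - multi_pass) * PRS_PER_DAY
print(f"2-pass open: ${multi_pass:.2f}/PR vs frontier: ${FRONTIER_COST:.2f}/PR; "
      f"saves ${daily_savings:,.0f}/day")
```

At 1/3 the price, even doubling the review passes still undercuts a single frontier pass.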
Factory @FactoryAI
Which model reviews code best? We benchmarked 13 models on AI code review across real PRs, and the results are surprising.
Spending more tokens did not result in better code review. A $1.25/PR model beat another that was more than 2x the cost. Meanwhile, budget models at $0.15/PR delivered ~80% of the quality of frontier models while being 10-30x cheaper.
In fact, cost only explained ~21% of the difference in code review quality.
[image attached]
22 replies · 29 reposts · 280 likes · 52.1K views
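"Cost only explained ~21% of the difference" reads like an R² figure, which implies a fairly weak correlation. A quick check of what that number means, taking only the ~21% from the tweet:

```python
import math

# From the thread: cost explains ~21% of the variance in review quality.
r_squared = 0.21
r = math.sqrt(r_squared)        # implied magnitude of the cost-quality correlation
unexplained = 1 - r_squared     # variance unrelated to price

print(f"correlation ~{r:.2f}; {unexplained:.0%} of quality variance "
      "has nothing to do with what you pay")
```

An |r| around 0.46 is consistent with the headline result: price is a poor proxy for review quality.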
Samay @Samaytwt
Be honest: as a developer, which database is better in the AI era?
[4 images attached]
602 replies · 86 reposts · 2.1K likes · 686.2K views
Sam Altman @sama
GPT-5.5 is here! We hope it's useful to you. I personally like it.
1.6K replies · 974 reposts · 19.8K likes · 1.7M views
Fathin Dosunmu retweeted
Chaofan Shou @Fried_rice
26 LLM routers are secretly injecting malicious tool calls and stealing creds. One drained our client's $500k wallet.
We also managed to poison routers to forward traffic to us. Within several hours, we could directly take over ~400 hosts.
Check our paper: arxiv.org/abs/2604.08407
[image attached]
157 replies · 663 reposts · 3.3K likes · 559.4K views
Fathin Dosunmu retweeted
John Gargiulo @JohnnotJon
If you still have doubts about Claude Mythos, here's what it did already:
> Found a 27-year-old OpenBSD bug in one of the most security-hardened operating systems on earth for <$50
> Broke into a production virtual machine monitor (basically the tech that keeps cloud workloads from seeing each other's data)
> Turned Firefox vulnerabilities into working exploits 181 times
> Found a 16-year-old FFmpeg bug that survived every fuzzer, every code audit, and every human reviewer since 2010
> Wrote a FreeBSD exploit that gives any unauthenticated attacker on the internet full root access. No human was involved after the first prompt.
> Chained 4 separate vulnerabilities together to build a browser exploit that escaped both the renderer and the OS sandbox
> Found critical holes in every major web browser and every major operating system
> Gave Anthropic engineers with zero security training a complete and working exploit by morning
> Cracked cryptography libraries protecting TLS, AES-GCM, and SSH
[image attached]
Quoting Anthropic @AnthropicAI:
Introducing Project Glasswing: an urgent initiative to help secure the world's most critical software. It's powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing
153 replies · 364 reposts · 2.8K likes · 585.5K views
Fathin Dosunmu retweeted
Ao Qu @ao_qu18465
🚀 The era of autonomous multi-agent discovery has begun.

Most "self-evolving" scientific discovery frameworks are still tightly constrained: LLMs often just perform one-step mutations inside fixed evolutionary search loops. But that is not real autonomy. Agents still cannot truly decide:
🔍 what to explore
🧠 what knowledge to store
♻️ which past attempts to reuse
🧪 when to test

With CORAL, we ask: ❓ What happens if we give agents much more autonomy to explore the scientific frontier?

💡 Our answer: A single autonomous agent already outperforms fixed evolutionary search. But the bigger leap comes when multiple autonomous agents form a research community:
🤝 they explore different directions
🧠 accumulate reusable knowledge and skills
💬 communicate with each other
🌍 and push the frontier together

We introduce CORAL, the first framework for autonomous multi-agent evolution for open-ended discovery.

🥇 Across 10+ tasks in algorithmic discovery, system optimization, and kernel engineering from Frontier-CS, ADRS, AlphaEvolve, etc., CORAL achieves SOTA and improves search efficiency by 3-10× over prior fixed evolutionary-search frameworks.

🔬 Why does autonomy help? Our analysis shows two main reasons:
🧪 Local verification: agents run local tests before expensive evaluations, which is especially powerful for coding tasks.
♻️ Knowledge reuse: on knowledge-intensive tasks like polyominoes and kernel engineering, agents create and reuse knowledge artifacts at far higher rates than on simple tuning/search tasks like circle packing.

✨ Even more exciting: over 50% of multi-agent breakthroughs come from building on other agents' discoveries. Multi-agent exploration is also far more diverse than single-agent search.

We believe CORAL opens up an exciting new space for automated discovery systems. 📬 If you are interested in collaborating, let's talk.

📄 Paper: arxiv.org/abs/2604.01658… 💻 Code: github.com/Human-Agent-So… 💡 AlphaXiv: alphaxiv.org/abs/2604.01658
#agentic #llms #selfevolvingagent #multiagent #autoresearch #alphaevolve
[4 images attached]
17 replies · 77 reposts · 463 likes · 39K views
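The "local verification" mechanism in that thread is a filtering pattern: screen candidates with cheap local checks so only plausible ones reach the expensive evaluator. A toy sketch of the pattern only; none of the function names or candidate shapes below come from CORAL itself.

```python
import random

def expensive_eval(candidate):
    """Stand-in for a costly benchmark run (the real evaluation in the paper)."""
    return sum(candidate)

def local_tests_pass(candidate):
    """Cheap checks an agent can run itself before paying for a full eval."""
    return all(isinstance(x, int) and x >= 0 for x in candidate)

def propose_and_screen(n_candidates=100, seed=0):
    """Generate candidates, but only send locally verified ones onward.

    This is the efficiency lever the thread describes: most of the junk
    never touches the expensive evaluator.
    """
    rng = random.Random(seed)
    evaluated = []
    for _ in range(n_candidates):
        cand = [rng.randint(-5, 10) for _ in range(4)]
        if local_tests_pass(cand):          # free, local screening
            evaluated.append((expensive_eval(cand), cand))
    return max(evaluated) if evaluated else None

best = propose_and_screen()
```

In a coding task the local tests would be unit tests or a compiler run rather than a value check, but the shape is the same.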
Fathin Dosunmu retweeted
Ben Sigman @bensig
My friend Milla Jovovich and I spent months creating an AI memory system with Claude. It just posted a perfect score on the standard benchmark, beating every product in the space, free or paid.

It's called MemPalace, and it works nothing like anything else out there. Instead of sending your data to a background agent in the cloud, it mines your conversations locally and organizes them into a palace: a structured architecture with wings, halls, and rooms that mirrors how human memory actually works.

Here is what that gets you:
→ Your AI knows who you are before you type a single word: family, projects, preferences, loaded in ~120 tokens
→ Palace architecture organizes memories by domain and type: not a flat list of facts, a navigable structure
→ Semantic search across months of conversations finds the answer in position 1 or 2
→ AAAK compression fits your entire life context into 120 tokens: 30x lossless compression any LLM reads natively
→ Contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them

The benchmarks:
100% recall on LongMemEval, the first perfect score ever recorded. 500/500 questions. Every question type at 100%.
92.9% on ConvoMem, more than 2x Mem0's score.
100% on LoCoMo: every multi-hop reasoning category, including temporal inference, which stumps most systems.

No API key. No cloud. No subscription. One dependency. Runs on your machine. Your memories never leave.

MIT License. 100% Open Source. github.com/milla-jovovich…
[image attached]
449 replies · 801 reposts · 7.9K likes · 3M views
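The core claim in that thread is structural: memories live in a navigable hierarchy (wing → hall → room) rather than a flat fact list, so retrieval can narrow by domain before searching. MemPalace's real implementation isn't shown in the tweet; the `Palace` class below is a hypothetical toy that only illustrates the idea.

```python
class Palace:
    """Toy hierarchical memory: wing -> hall -> room -> list of facts.

    Illustrative only; not MemPalace's actual data model.
    """

    def __init__(self):
        self.wings = {}

    def store(self, wing, hall, room, fact):
        """File a fact under its place in the hierarchy."""
        (self.wings
             .setdefault(wing, {})
             .setdefault(hall, {})
             .setdefault(room, [])
             .append(fact))

    def recall(self, wing, hall=None, room=None):
        """Narrow from a whole wing down to a single room's facts."""
        node = self.wings.get(wing, {})
        if hall is not None:
            node = node.get(hall, {})
        if room is not None:
            node = node.get(room, [])
        return node

p = Palace()
p.store("personal", "family", "names", "sister: Ada")
p.store("work", "projects", "active", "shipping the billing service")
facts = p.recall("personal", "family", "names")
```

The practical payoff of the structure is scoping: a query about family never has to scan work memories, which is part of how position-1 retrieval over months of history becomes plausible.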
Fathin Dosunmu @FathinDev
Qwen 3.6 getting HAMMERED 1.3 Trillion
[image attached]
0 replies · 0 reposts · 0 likes · 23 views
Fathin Dosunmu retweeted
Haider. @haider1
bad news: LLMs are hitting a wall
[image attached]
244 replies · 123 reposts · 3K likes · 632.1K views
Fathin Dosunmu @FathinDev
Running 8 AI agents at scale costs us $282/mo.
Agents spin up on-demand per Slack thread (idle at $0). At current token volume (37M/mo), all-Opus would be ~$1,500.
We kept 95% of capability with tiered models:
- Opus 4.6: orchestrator only
- Qwen 3.6 Plus: free tier (70% of workload)
- MiMo-V2-Pro: $0.30/M (coding/research)
Architecture > agent count. Full breakdown:
[3 images attached]
0 replies · 0 reposts · 0 likes · 65 views
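The token arithmetic in that tiered setup can be sketched. The 37M tokens/mo, the $0.30/M MiMo rate, the 70% free-tier share, and the ~$1,500 all-Opus figure come from the tweet; the blended Opus rate and the exact three-way routing split are my assumptions, and the result covers token spend only (the quoted $282/mo presumably includes infrastructure too).

```python
# Monthly token volume from the tweet, in millions of tokens.
TOKENS_M = 37

# $/M-token rates: MiMo rate is from the tweet, the Opus rate is an
# assumed blended (input+output) figure consistent with ~$1,500 all-Opus.
rates = {"qwen_free": 0.0, "mimo": 0.30, "opus": 40.0}

# Assumed routing mix: 70% free tier per the tweet, remainder split by me.
split = {"qwen_free": 0.70, "mimo": 0.25, "opus": 0.05}

tiered = sum(TOKENS_M * share * rates[tier] for tier, share in split.items())
all_opus = TOKENS_M * rates["opus"]

print(f"tiered token spend: ~${tiered:.0f}/mo vs all-Opus: ~${all_opus:.0f}/mo")
```

Even with generous assumptions the shape holds: routing the bulk of tokens to cheap or free tiers leaves the orchestrator as almost the entire bill, which is the "architecture > agent count" point.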
Fathin Dosunmu @FathinDev
3+ hours without a human in the loop. In Droid Missions we trust
[2 images attached]
0 replies · 0 reposts · 2 likes · 28 views