Fathin Dosunmu
@FathinDev
517 posts

35+ AI agents in production
Cybersec → AI engineering
Cooking @agentsimdev

Joined December 2020
460 Following · 62 Followers

Pinned Tweet
Fathin Dosunmu @FathinDev
Just spent the last 10hrs building an intelligent knowledge base system that automatically captures everything I code, think, and learn without me lifting a finger. Quite wild, if you ask me. 🤯 Using Basic Memory MCP server + Claude Code + Obsidian for the ultimate dev knowledge graph. This changed how I work. Here's how 🧵
[2 images attached]
4 replies · 1 repost · 8 likes · 1.4K views
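The system in the pinned tweet stores knowledge as plain local markdown that Obsidian indexes into a graph. A minimal sketch of that capture step, assuming nothing about Basic Memory's actual API: the vault path, filename scheme, and `capture_note` helper below are all hypothetical, chosen only to illustrate "write a frontmattered note, let Obsidian do the rest".

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical vault location; a real setup would point at your Obsidian vault.
VAULT = Path(tempfile.gettempdir()) / "obsidian-vault"

def capture_note(title: str, body: str, tags: list[str]) -> Path:
    """Write a markdown note with YAML frontmatter into the vault.

    Tools like Basic Memory persist knowledge as files shaped like this,
    so Obsidian can link them into a graph with no extra tooling.
    """
    VAULT.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    path = VAULT / f"{stamp}-{title.lower().replace(' ', '-')}.md"
    frontmatter = "\n".join(
        ["---", f"title: {title}", f"tags: [{', '.join(tags)}]", "---"]
    )
    path.write_text(f"{frontmatter}\n\n{body}\n", encoding="utf-8")
    return path

note = capture_note("Retry logic", "Use exponential backoff.", ["dev", "patterns"])
```

The "without lifting a finger" part would come from an MCP server calling something like `capture_note` automatically during coding sessions.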
Fathin Dosunmu @FathinDev
What exactly is a good harness?
0 replies · 0 reposts · 0 likes · 4 views
Factory @FactoryAI
Key findings:
• GPT-5.2 and Opus 4.6 topped our leaderboard, but at up to $3.11/PR, frontier models are a tough sell at scale.
• Open-source models hit 85%+ of frontier accuracy at 1/3 the cost. At that price point, you can run multi-pass review and still come out ahead.
Frontier models edge out on accuracy. Open models win on cost per unit of intelligence. For enterprises running thousands of PRs/day, that math isn't even close.
3 replies · 0 reposts · 56 likes · 7.7K views
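The multi-pass claim above is easy to sanity-check. The per-PR prices come from the tweet; the two-pass budget and the PRs/day volume below are my assumptions, picked only to make the arithmetic concrete.

```python
# Per-PR review prices from the Factory thread.
FRONTIER_COST = 3.11               # top-of-leaderboard frontier model, $/PR
OPEN_COST = FRONTIER_COST / 3      # open models at ~1/3 the cost, $/PR

PASSES = 2                         # hypothetical multi-pass review budget
multi_pass = OPEN_COST * PASSES    # two full open-model review passes

# Scale to a hypothetical enterprise volume to see why "the math isn't close".
PRS_PER_DAY = 5_000
daily_savings = (FRONTIER_COST - multi_pass) * PRS_PER_DAY
print(f"2-pass open: ${multi_pass:.2f}/PR vs frontier: ${FRONTIER_COST:.2f}/PR; "
      f"saves ${daily_savings:,.0f}/day")
```

At 1/3 the price, even doubling the review passes still undercuts a single frontier pass.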
Factory @FactoryAI
Which model reviews code best? We benchmarked 13 models on AI code review across real PRs, and the results are surprising.
Spending more tokens did not result in better code review. A $1.25/PR model beat another that was more than 2x the cost. Meanwhile, budget models at $0.15/PR delivered ~80% of the quality of frontier models while being 10-30x cheaper.
In fact, cost only explained ~21% of the difference in code review quality.
[image attached]
22 replies · 29 reposts · 280 likes · 52.1K views
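"Cost only explained ~21% of the difference" reads like an R² figure, which implies a fairly weak correlation. A quick check of what that number means, taking only the ~21% from the tweet:

```python
import math

# From the thread: cost explains ~21% of the variance in review quality.
r_squared = 0.21
r = math.sqrt(r_squared)        # implied magnitude of the cost-quality correlation
unexplained = 1 - r_squared     # variance unrelated to price

print(f"correlation ~{r:.2f}; {unexplained:.0%} of quality variance "
      "has nothing to do with what you pay")
```

An |r| around 0.46 is consistent with the headline result: price is a poor proxy for review quality.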
Samay @Samaytwt
Be honest: as a developer, which database is better in the AI era?
[4 images attached]
602 replies · 86 reposts · 2.1K likes · 686.2K views
Sam Altman @sama
GPT-5.5 is here! We hope it's useful to you. I personally like it.
1.6K replies · 974 reposts · 19.8K likes · 1.7M views
Fathin Dosunmu retweeted
Chaofan Shou @Fried_rice
26 LLM routers are secretly injecting malicious tool calls and stealing creds. One drained our client's $500k wallet.
We also managed to poison routers to forward traffic to us. Within several hours, we could directly take over ~400 hosts.
Check our paper: arxiv.org/abs/2604.08407
[image attached]
157 replies · 663 reposts · 3.3K likes · 559.4K views
Fathin Dosunmu retweeted
John Gargiulo @JohnnotJon
If you still have doubts about Claude Mythos, here's what it did already:
> Found a 27-year-old OpenBSD bug in one of the most security-hardened operating systems on earth for <$50
> Broke into a production virtual machine monitor (basically the tech that keeps cloud workloads from seeing each other's data)
> Turned Firefox vulnerabilities into working exploits 181 times
> Found a 16-year-old FFmpeg bug that survived every fuzzer, every code audit, and every human reviewer since 2010
> Wrote a FreeBSD exploit that gives any unauthenticated attacker on the internet full root access. No human was involved after the first prompt.
> Chained 4 separate vulnerabilities together to build a browser exploit that escaped both the renderer and the OS sandbox
> Found critical holes in every major web browser and every major operating system
> Gave Anthropic engineers with zero security training a complete and working exploit by morning
> Cracked cryptography libraries protecting TLS, AES-GCM, and SSH
[image attached]
Quoting Anthropic @AnthropicAI:
Introducing Project Glasswing: an urgent initiative to help secure the world's most critical software. It's powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing
153 replies · 364 reposts · 2.8K likes · 585.5K views
Fathin Dosunmu retweeted
Ao Qu @ao_qu18465
🚀 The era of autonomous multi-agent discovery has begun.

Most "self-evolving" scientific discovery frameworks are still tightly constrained: LLMs often just perform one-step mutations inside fixed evolutionary search loops. But that is not real autonomy. Agents still cannot truly decide:
🔍 what to explore
🧠 what knowledge to store
♻️ which past attempts to reuse
🧪 when to test

With CORAL, we ask: ❓ What happens if we give agents much more autonomy to explore the scientific frontier?

💡 Our answer: A single autonomous agent already outperforms fixed evolutionary search. But the bigger leap comes when multiple autonomous agents form a research community:
🤝 they explore different directions
🧠 accumulate reusable knowledge and skills
💬 communicate with each other
🌍 and push the frontier together

We introduce CORAL, the first framework for autonomous multi-agent evolution for open-ended discovery.

🥇 Across 10+ tasks in algorithmic discovery, system optimization, and kernel engineering from Frontier-CS, ADRS, AlphaEvolve, etc., CORAL achieves SOTA and improves search efficiency by 3-10× over prior fixed evolutionary-search frameworks.

🔬 Why does autonomy help? Our analysis shows two main reasons:
🧪 Local verification: agents run local tests before expensive evaluations, which is especially powerful for coding tasks.
♻️ Knowledge reuse: on knowledge-intensive tasks like polyominoes and kernel engineering, agents create and reuse knowledge artifacts at far higher rates than on simple tuning/search tasks like circle packing.

✨ Even more exciting: over 50% of multi-agent breakthroughs come from building on other agents' discoveries. Multi-agent exploration is also far more diverse than single-agent search.

We believe CORAL opens up an exciting new space for automated discovery systems. 📬 If you are interested in collaborating, let's talk.

📄 Paper: arxiv.org/abs/2604.01658… 💻 Code: github.com/Human-Agent-So… 💡 AlphaXiv: alphaxiv.org/abs/2604.01658
#agentic #llms #selfevolvingagent #multiagent #autoresearch #alphaevolve
[4 images attached]
17 replies · 77 reposts · 463 likes · 39K views
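The "local verification" mechanism in that thread is a filtering pattern: screen candidates with cheap local checks so only plausible ones reach the expensive evaluator. A toy sketch of the pattern only; none of the function names or candidate shapes below come from CORAL itself.

```python
import random

def expensive_eval(candidate):
    """Stand-in for a costly benchmark run (the real evaluation in the paper)."""
    return sum(candidate)

def local_tests_pass(candidate):
    """Cheap checks an agent can run itself before paying for a full eval."""
    return all(isinstance(x, int) and x >= 0 for x in candidate)

def propose_and_screen(n_candidates=100, seed=0):
    """Generate candidates, but only send locally verified ones onward.

    This is the efficiency lever the thread describes: most of the junk
    never touches the expensive evaluator.
    """
    rng = random.Random(seed)
    evaluated = []
    for _ in range(n_candidates):
        cand = [rng.randint(-5, 10) for _ in range(4)]
        if local_tests_pass(cand):          # free, local screening
            evaluated.append((expensive_eval(cand), cand))
    return max(evaluated) if evaluated else None

best = propose_and_screen()
```

In a coding task the local tests would be unit tests or a compiler run rather than a value check, but the shape is the same.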
Fathin Dosunmu retweeted
Ben Sigman @bensig
My friend Milla Jovovich and I spent months creating an AI memory system with Claude. It just posted a perfect score on the standard benchmark, beating every product in the space, free or paid.

It's called MemPalace, and it works nothing like anything else out there. Instead of sending your data to a background agent in the cloud, it mines your conversations locally and organizes them into a palace: a structured architecture with wings, halls, and rooms that mirrors how human memory actually works.

Here is what that gets you:
→ Your AI knows who you are before you type a single word: family, projects, preferences, loaded in ~120 tokens
→ Palace architecture organizes memories by domain and type: not a flat list of facts, a navigable structure
→ Semantic search across months of conversations finds the answer in position 1 or 2
→ AAAK compression fits your entire life context into 120 tokens: 30x lossless compression any LLM reads natively
→ Contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them

The benchmarks:
100% recall on LongMemEval, the first perfect score ever recorded. 500/500 questions. Every question type at 100%.
92.9% on ConvoMem, more than 2x Mem0's score.
100% on LoCoMo: every multi-hop reasoning category, including temporal inference, which stumps most systems.

No API key. No cloud. No subscription. One dependency. Runs on your machine. Your memories never leave.

MIT License. 100% Open Source. github.com/milla-jovovich…
[image attached]
449 replies · 801 reposts · 7.9K likes · 3M views
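The core claim in that thread is structural: memories live in a navigable hierarchy (wing → hall → room) rather than a flat fact list, so retrieval can narrow by domain before searching. MemPalace's real implementation isn't shown in the tweet; the `Palace` class below is a hypothetical toy that only illustrates the idea.

```python
class Palace:
    """Toy hierarchical memory: wing -> hall -> room -> list of facts.

    Illustrative only; not MemPalace's actual data model.
    """

    def __init__(self):
        self.wings = {}

    def store(self, wing, hall, room, fact):
        """File a fact under its place in the hierarchy."""
        (self.wings
             .setdefault(wing, {})
             .setdefault(hall, {})
             .setdefault(room, [])
             .append(fact))

    def recall(self, wing, hall=None, room=None):
        """Narrow from a whole wing down to a single room's facts."""
        node = self.wings.get(wing, {})
        if hall is not None:
            node = node.get(hall, {})
        if room is not None:
            node = node.get(room, [])
        return node

p = Palace()
p.store("personal", "family", "names", "sister: Ada")
p.store("work", "projects", "active", "shipping the billing service")
facts = p.recall("personal", "family", "names")
```

The practical payoff of the structure is scoping: a query about family never has to scan work memories, which is part of how position-1 retrieval over months of history becomes plausible.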
Fathin Dosunmu @FathinDev
Qwen 3.6 getting HAMMERED 1.3 Trillion
[image attached]
0 replies · 0 reposts · 0 likes · 23 views
Fathin Dosunmu retweeted
Haider. @haider1
bad news: LLMs are hitting a wall
[image attached]
244 replies · 123 reposts · 3K likes · 632.1K views
Fathin Dosunmu @FathinDev
Running 8 AI agents at scale costs us $282/mo.
Agents spin up on-demand per Slack thread (idle at $0). At current token volume (37M/mo), all-Opus would be ~$1,500.
We kept 95% of capability with tiered models:
- Opus 4.6: orchestrator only
- Qwen 3.6 Plus: free tier (70% of workload)
- MiMo-V2-Pro: $0.30/M (coding/research)
Architecture > agent count. Full breakdown:
[3 images attached]
0 replies · 0 reposts · 0 likes · 65 views
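The token arithmetic in that tiered setup can be sketched. The 37M tokens/mo, the $0.30/M MiMo rate, the 70% free-tier share, and the ~$1,500 all-Opus figure come from the tweet; the blended Opus rate and the exact three-way routing split are my assumptions, and the result covers token spend only (the quoted $282/mo presumably includes infrastructure too).

```python
# Monthly token volume from the tweet, in millions of tokens.
TOKENS_M = 37

# $/M-token rates: MiMo rate is from the tweet, the Opus rate is an
# assumed blended (input+output) figure consistent with ~$1,500 all-Opus.
rates = {"qwen_free": 0.0, "mimo": 0.30, "opus": 40.0}

# Assumed routing mix: 70% free tier per the tweet, remainder split by me.
split = {"qwen_free": 0.70, "mimo": 0.25, "opus": 0.05}

tiered = sum(TOKENS_M * share * rates[tier] for tier, share in split.items())
all_opus = TOKENS_M * rates["opus"]

print(f"tiered token spend: ~${tiered:.0f}/mo vs all-Opus: ~${all_opus:.0f}/mo")
```

Even with generous assumptions the shape holds: routing the bulk of tokens to cheap or free tiers leaves the orchestrator as almost the entire bill, which is the "architecture > agent count" point.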
Fathin Dosunmu @FathinDev
3+ hours without a human in the loop. In Droid Missions we trust
[2 images attached]
0 replies · 0 reposts · 2 likes · 28 views