Annas Bin Adil

8 posts

@annasbinadil

Joined June 2026
66 Following · 4 Followers
Annas Bin Adil @annasbinadil
@cong_ml Amazing stuff. Curious to look more into your approach to idea generation and selection.
0
0
1
30
Cong Lu @cong_ml
Recursive just came out of stealth, and the team has been cooking 🔥 Our first results: an automated AI research system that can improve AI in 3 very different settings spanning training and GPU kernel optimization. recursive.com/articles/first…
19
35
335
47.1K
Annas Bin Adil @annasbinadil
@nlpxuhui This is amazing work and much needed, especially in safety research. Would love to collab with you guys at atella: atella.ai
1
0
1
119
Xuhui Zhou @nlpxuhui
Do LLMs really need to be helpful assistants all the time? No. If you want to simulate people, “perfectly helpful” could be the wrong objective.

Meet OdysSim, a journey toward LLMs beyond assistants, as behavioral foundation models: 10B tokens of real human behavior; 23 sim benchmarks, finally in one place; and new open models that outperform or are on par with GPT-5.5, Gemini 3.1, or Claude Opus 4.7 in many behavior-sim dimensions.

Human behavior simulation is becoming essential. Agent evaluation needs realistic users before real users show up. Medical and classroom training need realistic patients and students. Social science needs synthetic participants at scale.

But real people are not ideal assistants. Real patients panic or ignore good advice. Real students misunderstand. Real customers are vague, picky, impatient, or simply leave. Human behavior is messy, diverse, and often imperfect.

Frontier LLMs are getting better at math, code, and long-horizon tasks. They are NOT getting better at simulating human behavior. If anything, they drift the other way: more assistant-ish, more homogeneous, fewer of the errors and quirks real humans show. This is no accident. The whole pipeline is built for helpfulness and task success, not behavioral realism. And you can't prompt your way out of that.

So we rethink the recipe from scratch and release:
🧠 The OdysSim corpus: 21.4M real human interactions (~10B tokens) from 62 sources, every conversation retrofitted with social grounding (who is talking, and why)
📏 SOUL-Index: 23 human-behavior benchmarks unified into one suite across 5 axes
🤖 OSim-8B: open weights; tops more SOUL-Index benchmarks than any frontier model, acts more like a real user than any of them on τ-bench (nearly matching real humans in the reaction dimension), and writes far more human-like text along the way.
12
75
459
131.9K
Epoch AI @EpochAIResearch
FrontierMath: Tiers 1–4 (v2) is live. We concluded an audit that addressed errors in 42% of problems. Rankings are similar but scores are higher across the board. The current leaders are GPT-5.5 (xhigh) with 85% on Tiers 1–3 and Google’s AI co-mathematician with 76% on Tier 4.
27
66
581
116.9K
Annas Bin Adil @annasbinadil
@DarioAmodei The shift in this essay from "theoretical risk" to "demonstrated risk" is the part policymakers will need evidence for, case by case. Documenting how agents fail in practice is the job we've taken on at atella.ai
0
0
2
672
Dario Amodei @DarioAmodei
Today I'm publishing a new essay, Policy on the AI Exponential. AI is progressing extremely fast—much faster than the policy process was built to handle. The essay lays out where I think the technology is now, and the action needed to close the gap: darioamodei.com/post/policy-on…
1.3K
2.4K
13.5K
6.5M
Annas Bin Adil @annasbinadil
why are so many things "load bearing"? iykyk
0
0
0
23
Matt Shumer @mattshumer_
Fable has solved 3D worldbuilding... utterly insane. This is all completely custom-built ThreeJs, running in the browser.
503
300
5.3K
1.4M
Annas Bin Adil @annasbinadil
Sometimes it takes slowing down to move faster
1
0
0
39