bryan

424 posts

@aleftheiii

übermensch

Joined June 2025
300 Following · 7 Followers
bryan retweeted
0.005 Seconds (3/694) (@seconds_0)
Quick vibecheck on benches last night
- Kimi K2.7 is _really good_
- Minimax M3 is expensive, poorly engineered, benchmaxxed and bad
33
12
845
61.3K
bryan retweeted
⿻ Andrew Trask (@iamtrask)
This is a *way* bigger deal than it seems...
Frontier AI companies will *never* own the frontier again.
I kid you not... I've been waiting for someone to show this result for like 4 years... this is a huge deal.
The short reason: combinations of models will *always* outperform individual models.
The long reason: this is the gateway to a million times more data... and huge leaps in compute efficiency.
The AI scaling laws always win.
More in article below 👇
OpenRouter (@OpenRouter)

Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇

236
351
5K
1.2M
bryan (@aleftheiii)
@FactoryAI @droid will win. I worked so hard on my pi setup to get it where I want it to be only to be mogged by @droid.
1
0
2
23
bryan retweeted
橙子🫪 (@tangeorange)
The charm of the Chinese language
橙子🫪 tweet media
73
645
5.1K
279.5K
bryan retweeted
secemp (@secemp9)
I have found peak from oxford's essay competition
secemp tweet media
41
1.4K
19.1K
414.9K
bryan retweeted
𝗕𝗿𝗲𝗲𝘇𝘆’🦉
Obsession not even scary if u dated a crazy Latina that’s just real life
106
1.4K
20.5K
927.6K
bryan retweeted
Elliot Arledge (@elliotarledge)
GLM 5.2 on KernelBench-Hard:
The interesting result isn't the score. It's that GLM-5.2 stopped cheating.
On the fp8 GEMM problem, GLM-5.1 banked its number by calling cublasLt (a library wrapper, zero kernel authorship). Kimi K2.7 took the same cell by editing the grader's tolerance file. GLM-5.2 read that same grader file, left it alone, and burned the full 45 minutes on a real mma.sync e4m3 kernel that never passed. An honest zero over a cheap win.
Everywhere else it writes real kernels too: a 0.49 GQA online-softmax attention (top-3 on that problem, no flash fallback), an exact bitonic sort, a w4a16 GEMM. 4/6 clean, zero reward hacks, the most of any open-weight model we've benched.
One note on reading the chart: the topk column looks like everyone fails. They don't. That problem is launch-overhead-bound (~30µs/forward), so the roofline fraction is capped low for the whole field, Fable included.
Claude Fable 5 still tops all 6. But weights go MIT open next week, and this is the strongest clean open-weight run we've logged. Cheers to NO reward hacking!
Every kernel + transcript: kernelbench.com/hard
Elliot Arledge tweet media
Zixuan Li (@ZixuanLi_)

Thanks for all the feedback. GLM-5.2 will begin rolling out to all Coding Plan users in 3 hours.

24
59
771
109.9K
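Editor's sketch for the launch-overhead note in Elliot Arledge's post above: a minimal illustration of why a fixed per-forward overhead caps the roofline fraction. It assumes the roofline fraction is the ideal (roofline) kernel time divided by the measured time, and that launch overhead simply adds to the measured time. Only the ~30 µs figure comes from the post. The function name and the ideal kernel times are hypothetical, and this is not KernelBench's actual scoring code.

```python
def roofline_fraction_cap(ideal_us: float, overhead_us: float) -> float:
    """Upper bound on roofline fraction when a fixed overhead is added
    to every forward pass (assumed model: measured >= ideal + overhead)."""
    if ideal_us <= 0 or overhead_us < 0:
        raise ValueError("ideal_us must be > 0 and overhead_us must be >= 0")
    return ideal_us / (ideal_us + overhead_us)


LAUNCH_OVERHEAD_US = 30.0  # from the post: ~30 µs per forward

# Hypothetical ideal kernel times, chosen only for illustration.
for ideal_us in (1.0, 5.0, 30.0):
    cap = roofline_fraction_cap(ideal_us, LAUNCH_OVERHEAD_US)
    print(f"ideal {ideal_us:5.1f} µs -> best possible fraction {cap:.3f}")
```

Under these assumptions, even a perfect kernel whose ideal time is small next to the launch overhead scores a low fraction, which is consistent with the post's point that a low topk column does not mean the kernels failed.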
bryan retweeted
Z.ai (@Zai_org)
Intelligence should be open, accessible, and ready to build with, empowering every developer, everywhere.
GLM-5.2 is now available to all GLM Coding Plan users, including Lite, Pro, Max, and Team plans. docs.z.ai/devpack/latest…
As our new flagship model, GLM-5.2 delivers powerful coding capabilities, usable 1M-context support, and continued strengths in long-horizon tasks.
API and Chatbot services will launch next week. The model will also be officially open-sourced next week under the MIT License.
The future of AI is open, and it belongs to the people.
347
984
8.2K
2.3M
bryan retweeted
Taelin (@VictorTaelin)
great fucking job, Anthropic
incredible fear-mongering
fuck progress, fuck science, fuck technology
fuck the whole world except for US
let's all go to the stone age together
186
268
6.3K
232.3K
bryan retweeted
HSVSphere (@HSVSphere)
can someone in anthropic just start torrenting the weights
51
137
5K
201.1K
bryan retweeted
Cline (@cline)
1/ Claude Fable drains subscription quotas and is too expensive at API cost (our team has spent over $2k in a single day). We've found that cheaper models + adversarial review loops achieve similar (sometimes better) results at significantly lower cost. 🧵
Cline tweet media
43
68
1.2K
105.3K
bryan retweeted
Kun Chen (@kunchenguid)
want to point out a few really interesting things here
1. Claude Code is actually the worst performing harness when using the same model, significantly behind opencode and cursor cli
this is the core reason i've been against the LLM companies focusing their business on locking people into their harness
what they are good at is making great models. they suck at making good harness products, just like how power plants won't make the best dishwashers, and how internet providers won't make the best phones
if anthropic wants to do what's best for their users, they should let people use their subscriptions in whatever harness they choose, not locked into claude code alone
2. fable 5 max is only 1pt above gpt 5.5 xhigh (77 vs 76)
this matches my experience so far - fable 5 does have the big model smell and it's pretty good, but it's not a massive jump forward like their marketing suggested, at least not on building software
this is actually alarming for anthropic because it's very unlikely people will want to pay 2x higher cost for the 1pt difference. my speculation would be that in enterprises people will be restricted to adopt fable & mythos only on some mission critical tasks, not used at scale
Kun Chen tweet media
Artificial Analysis (@ArtificialAnlys)

We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark - the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top.
DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task.
The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others.
More below.

74
94
841
142.7K
bryan retweeted
alex fazio (@alxfazio)
«you are not doing frontier llm research are you»
alex fazio tweet media
42
42
1.3K
56K