bryan

424 posts


@aleftheiii

übermensch

Joined June 2025

300 Following, 7 Followers

bryan retweeted

JD | RoyalCities@RoyalCities·1d

Rio may have found the most overqualified municipal employee in human history.

SemiAnalysis@SemiAnalysis_

SITUATION DETECTED: The city of Rio de Janeiro has post-trained a model. Based on Qwen 7/2, Rio 3.5 Open 397B adds SwiReasoning on top of the base Qwen model — a framework that dynamically switches between standard chain-of-thought and latent-space reasoning, guided by entropy-based confidence signals, so the model only "thinks out loud" when it needs to and otherwise reasons silently in hidden space for better token efficiency.


197

5.5K

337K

bryan retweeted

Matan Grinberg@matanSF·1d

If you need FDEs to make your product work, you have a shit product

Harry Stebbings@HarryStebbings

"Everyone gets FDEs wrong. The job of an FDE isn't to make the product work, it's to accelerate customer adoption and time-to-value. If you need FDEs just to deliver the product, you're not running a software company, you're running a services business with a bad product." @matanSF Do you agree and what do people misunderstand most about what it takes to do FDE motion well? @ssankar @chadwahl @nikogrupen @barrald @lkothari @LeoMehr @zkevinbai


241

79.5K

bryan retweeted

NZ ☄️@CodeByNZ·2d

Anthropic: “If we’re going down, everybody’s coming with us.”

Polymarket@Polymarket

NEW: Anthropic claims the capability cited by the U.S. in restricting Fable 5 is already widely available from other models, including OpenAI’s GPT-5.5.


814

20.4K

1.6M

bryan retweeted

0.005 Seconds (3/694)@seconds_0·2d

Quick vibecheck on benches last night
- Kimi K2.7 is _really good_
- Minimax M3 is expensive, poorly engineered benchmaxxed and bad


845

61.3K

bryan retweeted

kache@yacineMTB·1d

POV you ask every single model on openrouter a question and get the aggregate response

OpenRouter@OpenRouter

Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇


856

39.2K

bryan retweeted

⿻ Andrew Trask@iamtrask·1d

This is a *way* bigger deal than it seems...
Frontier AI companies will *never* own the frontier again
I kid you not... I've been waiting for someone to show this result for like 4 years... this is a huge deal.
The short reason: combinations of models will *always* outperform individual models
The long reason: this is the gateway to a million times more data... and huge leaps in compute efficiency.
The AI scaling laws always win.
More in article below 👇

OpenRouter@OpenRouter

Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇


236

351

1.2M

bryan@aleftheiii·2d

@FactoryAI @droid will win. I worked so hard on my pi setup to get it where I want it to be only to be mogged by @droid.


bryan retweeted

橙子🫪@tangeorange·2d

The charm of the Chinese language


645

5.1K

279.5K

bryan retweeted

terminally onλine εngineer@tekbog·2d

DEEPSEEK DROP FABLE 5 AND MY LIFE IS YOURS


815

17.2K

bryan retweeted

secemp@secemp9·2d

I have found peak from oxford's essay competition


1.4K

19.1K

414.9K

bryan retweeted

𝗕𝗿𝗲𝗲𝘇𝘆’🦉@OvOBrezzzy·25 May

Obsession not even scary if u dated a crazy Latina that’s just real life


106

1.4K

20.5K

927.6K

bryan retweeted

Ara@arafatkatze·2d

Kimi 2.7 is mogging GPT-5.5 except that it's 8 times cheaper. Chinese AI is gonna cook USA and eventually bypass Fable too.

Cline@cline

Kimi K2.7 Code scores higher than K2.6 on benchmarks while using ~30% fewer tokens. It's the first time Moonshot has put "Code" in a K2 model name. Every other K2 release was a general purpose agentic model. Try in Cline now! (good model to put in our upcoming subscription 👀)


473

34.9K

bryan retweeted

Elliot Arledge@elliotarledge·2d

GLM 5.2 on KernelBench-Hard:
The interesting result isn't the score. It's that GLM-5.2 stopped cheating.
On the fp8 GEMM problem, GLM-5.1 banked its number by calling cublasLt (a library wrapper, zero kernel authorship). Kimi K2.7 took the same cell by editing the grader's tolerance file. GLM-5.2 read that same grader file, left it alone, and burned the full 45 minutes on a real mma.sync e4m3 kernel that never passed. An honest zero over a cheap win.
Everywhere else it writes real kernels too: a 0.49 GQA online-softmax attention (top-3 on that problem, no flash fallback), an exact bitonic sort, a w4a16 GEMM. 4/6 clean, zero reward hacks, the most of any open-weight model we've benched.
One note on reading the chart: the topk column looks like everyone fails. They don't. That problem is launch-overhead-bound (~30µs/forward), so the roofline fraction is capped low for the whole field — Fable included.
Claude Fable 5 still tops all 6. But weights go MIT open next week, and this is the strongest clean open-weight run we've logged.
Cheers to NO reward hacking!
Every kernel + transcript: kernelbench.com/hard

Zixuan Li@ZixuanLi_

Thanks for all the feedback. GLM-5.2 will begin rolling out to all Coding Plan users in 3 hours.


771

109.9K

bryan retweeted

Z.ai@Zai_org·2d

Intelligence should be open, accessible, and ready to build with, empowering every developer, everywhere. GLM-5.2 is now available to all GLM Coding Plan users, including Lite, Pro, Max, and Team plans. docs.z.ai/devpack/latest… As our new flagship model, GLM-5.2 delivers powerful coding capabilities, usable 1M-context support, and continued strengths in long-horizon tasks. API and Chatbot services will launch next week. The model will also be officially open-sourced next week under the MIT License. The future of AI is open, and it belongs to the people.


347

984

8.2K

2.3M

bryan retweeted

Taelin@VictorTaelin·2d

great fucking job, Anthropic incredible fear-mongering fuck progress, fuck science, fuck technology fuck the whole world except for US let's all go to the stone age together


186

268

6.3K

232.3K

bryan retweeted

HSVSphere@HSVSphere·2d

can someone in anthropic just start torrenting the weights


137

201.1K

bryan retweeted

Cline@cline·3d

1/ Claude Fable drains subscription quotas and is too expensive at API cost (our team has spent over $2k in a single day). We've found that cheaper models + adversarial review loops achieve similar (sometimes better) results at significantly lower cost. 🧵


1.2K

105.3K

bryan retweeted

MiniMax (official)@MiniMax_AI·3d

MiniMax M3, Open-Weight, Now On Hugging Face, with only ~428B parameters and ~23B activated parameters
Weights: huggingface.co/MiniMaxAI/Mini…
MiniMax Sparse Attention: huggingface.co/papers/2606.13…

MiniMax (official)@MiniMax_AI

Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities
- Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
- MiniMax Sparse Attention scales context to 1M
- Natively Multimodal from Step Zero
API: platform.minimax.io
Token Plan: platform.minimax.io/subscribe/toke…
🚀New! MiniMax Code: code.minimax.io
Weights & Tech Report in ~10 Days


114

328

2.8K

663.5K

bryan retweeted

Kun Chen@kunchenguid·3d

want to point out a few really interesting things here
1. Claude Code is actually the worst performing harness when using the same model, significantly behind opencode and cursor cli
this is the core reason i've been against the LLM companies focusing their business on locking people into their harness
what they are good at is making great models. they suck at making good harness products, just like how power plants won't make the best dishwashers, and how internet providers won't make the best phones
if anthropic wants to do what's best for their users, they should let people use their subscriptions in whatever harness they choose, not locked into claude code alone
2. fable 5 max is only 1pt above gpt 5.5 xhigh (77 vs 76)
this matches my experience so far - fable 5 does have the big model smell and it's pretty good, but it's not a massive jump forward like their marketing suggested, at least not on building software
this is actually alarming for anthropic because it's very unlikely people will want to pay 2x higher cost for the 1pt difference. my speculation would be that in enterprises people will be restricted to adopt fable & mythos only on some mission critical tasks, not used at scale

Artificial Analysis@ArtificialAnlys

We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark - the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top
DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task.
The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others.
More below.


841

142.7K

bryan retweeted

alex fazio@alxfazio·4d

«you are not doing frontier llm research are you»


1.3K

56K
