
BrainYoung
@BrainYoung_com
AI Insights | Daily Curated Intelligence. Handpicked high-value AI news from across the globe. "We sift through the noise; you absorb the signal."

The state of the art in agency benchmarking: Claude claiming it's wearing a navy blazer and a red tie and waiting by the vending machine for a customer (TBC, this is a good thread and interesting experiment! Just also, wat) x.com/AnthropicAI/st…

My favorite: how each engineering group would build an aircraft. Great hardware system design involves making the subsystems just good enough.


Not convinced? We are releasing our full discovery system, experiments, and artifacts. Feel free to browse, run experiments, or even build your own discovery agents!

🚀 Meet Qwen-TTS – now live via the Qwen API! Trained on millions of hours of speech, it delivers ultra-natural, expressive audio with smart prosody, pacing, and emotion.
🗣️ Supports 3 Chinese dialects: Beijing, Shanghai, Sichuan
🎙️ 7 bilingual voices: Cherry, Ethan, Chelsie, Serena, Dylan, Jada, Sunny
🔗 Learn more: qwenlm.github.io/blog/qwen-tts/
👉 Start using it via API: help.aliyun.com/zh/model-studi…
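For a feel of the workflow, a minimal sketch of a TTS call. The endpoint URL, request fields, and response handling below are placeholder assumptions for illustration only; check the Model Studio docs linked above for the real API.

```python
import requests

# NOTE: the endpoint URL and JSON shape here are hypothetical placeholders;
# consult the Model Studio docs linked above for the actual API contract.
API_URL = "https://example.aliyun.com/api/v1/tts"  # placeholder, not the real endpoint
API_KEY = "your-api-key"

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen-tts",
        "text": "Hello from Qwen-TTS!",
        "voice": "Cherry",  # one of the 7 bilingual voices listed above
    },
    timeout=60,
)
resp.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(resp.content)  # assumes the API returns raw audio bytes
```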

Chain of Thought was just the beginning. Next up: Chain of Debate. We’re going from a single model “thinking out loud” to multiple models discussing out loud. Debating, debugging, deliberating. AI becomes AIs. “Two heads are better than one” is true for LLMs too.
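For the curious, a minimal sketch of what such a debate loop might look like: each model answers independently, reads the others' answers, and revises over a few rounds before one model states the consensus. The `ask` helper is a hypothetical stand-in for any chat-completion call, and the whole protocol is an illustrative assumption, not a published method.

```python
def ask(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API call.
    Wire up your provider's client here; returns the model's reply."""
    raise NotImplementedError("plug in a real LLM client")

def chain_of_debate(question: str, models=("model_a", "model_b"), rounds: int = 2) -> str:
    # Round 0: each model "thinks out loud" on its own.
    answers = {m: ask(m, f"Question: {question}\nAnswer step by step.") for m in models}

    # Debate rounds: each model sees the others' answers and revises.
    for _ in range(rounds):
        revised = {}
        for m in models:
            others = "\n\n".join(f"{o}: {a}" for o, a in answers.items() if o != m)
            revised[m] = ask(
                m,
                f"Question: {question}\n"
                f"Your previous answer: {answers[m]}\n"
                f"Other models argued:\n{others}\n"
                "Point out flaws, then give your revised answer.",
            )
        answers = revised

    # Adjudication: one model summarizes the debate into a final answer.
    transcript = "\n\n".join(f"{m}: {a}" for m, a in answers.items())
    return ask(models[0], f"Given this debate:\n{transcript}\nState the consensus answer.")
```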

📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? Remember how DeepSeek R1 and o1 impressed us on Olympiad-level math, yet still failed at simple arithmetic 😬 We built a benchmark to find out → OMEGA Ω 📐 💥 We found that although very powerful, RL struggles to compose skills and to invent new strategies that were not seen during training. 👇 work w. @UCBerkeley @allen_ai A thread on what we learned 🧵

Machines that reach consensus with machines about how machines judge machines which write machines that edit machines

If you studied algorithms, I'm sure you've heard of Dijkstra's algorithm for finding shortest paths from a source node in a graph with non-negative edge weights. Super useful in scenarios such as road networks, where it can determine the shortest route from a starting point to various destinations. It had been the asymptotically fastest known algorithm since 1956! Until now. The O(E + V log V) complexity just went down to O(E log^(2/3) V) for sparse graphs. It would be amazing if this kind of breakthrough came from AI that can code, but I guess we're not there yet...
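For context, here's the classic binary-heap Dijkstra as a minimal sketch. Note this textbook variant runs in O((V + E) log V); the O(E + V log V) bound comes from Fibonacci heaps, and the new O(E log^(2/3) V) result is a different algorithm entirely.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from `source` in a graph with
    non-negative edge weights.

    `graph` maps each node to a list of (neighbor, weight) pairs.
    Binary-heap variant: O((V + E) log V).
    """
    dist = {source: 0}
    heap = [(0, source)]  # (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; node already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Example: a small road-network-style graph
roads = {
    "A": [("B", 4), ("C", 2)],
    "C": [("B", 1), ("D", 5)],
    "B": [("D", 1)],
}
print(dijkstra(roads, "A"))  # {'A': 0, 'B': 3, 'C': 2, 'D': 4}
```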

Mildly obsessed with what the "highest grade" pretraining data stream would look like for LLM training, if 100% of the focus were on quality, putting aside any quantity considerations.

Guessing something textbook-like, in markdown? Or possibly samples from a really giant model? Curious what the most powerful, e.g., 1B-param model trained on a dataset of 10B tokens looks like, and how far "micromodels" can be pushed.

As an example, (text)books are already often included in pretraining data mixtures, but whenever I look closely the data is all messed up: weird formatting, padding, OCR bugs, figure text weirdly interspersed with the main text, etc. The bar is low. I don't think I've ever come across a data stream that felt *perfect* in quality.

I accidentally found a way to add a custom model to Claude Code...

Vibe coding in AI Studio coming soon

DeepSite v2 just dropped on Hugging Face

New updates for olmOCR, our fully open toolkit for transforming documents (PDFs & images) into clean markdown. We released:
1️⃣ A new benchmark for fair comparison of OCR engines and APIs
2️⃣ Improved inference that is faster and cheaper to run
3️⃣ A Docker image for easy deployment

More open releases, exciting to see. Pretraining quality is so crucial: it sets the ceiling for what is possible with post-training.

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
- MoE LLM with 2.75B active params
- SotA small-scale reasoning model
- Proposes C3PO to improve training stability and computational throughput for RL with MoE



The progress of Gemini over the last year+

Gemini 2.5 Pro + 2.5 Flash are now stable and generally available. Plus, get a preview of Gemini 2.5 Flash-Lite, our fastest + most cost-efficient 2.5 model yet. 🔦 Exciting steps as we expand our 2.5 series of hybrid reasoning models that deliver amazing performance at the Pareto frontier of cost and speed. 🚀

