Minglai Yang

50 posts

@Yminglai

reasoning & interp #NLProc. Founder @uaaiclub | Research @labclu @thukeg @Abaka_AI @2077AI

Palo Alto, CA · Joined May 2024

183 Following · 28 Followers

Pinned Tweet

Minglai Yang@Yminglai·22 Aug

Our paper accepted at EMNLP 2025 Main! 🎉 @emnlpmeeting “How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark” 👉 arxiv.org/abs/2505.18761 📌 We introduce GSM-DC: a controlled benchmark for reasoning under Irrelevant Context (IC). We systematically vary reasoning depth and IC level via a knowledge DAG to study LLM reasoning behavior under distractions, not just accuracy🧭 👥 Huge thanks to my awesome team: @_ethan_huang @LiangZhang4825 @msurd @WilliamWangNLP @PanLiangming

5.2K

Minglai Yang retweeted

David Pfau@pfau·30 Mar

Oh god are we really doing this? Jeff Dean trained an n-gram model on the entire internet in 2007. Jelinek coined the term "language model" in the '70s. It's called "Claude" because Claude Shannon was estimating the entropy rate of the English language in 1951!

Aran Komatsuzaki@arankomatsuzaki

While Alec is one of the best ML researchers of all time, LLM started way before. Here's one from 2013 for non-neural architecture and one from 2016, which is afaik the first neural LLM if we define LLM as LM w/ >1B params.

1.3K

469.3K

Minglai Yang retweeted

Peter Jansen ( @peterjansen-ai.bsky.social )@peterjansen_ai·28 Mar

I'm hiring a Research Scientist at @uarizona in AI/NLP to work on automated scientific feasibility assessment for scientific discovery, esp. through automated experiments. Remote work within the US can be considered. Applicant review starts March 30th. arizona.csod.com/ux/ats/careers…

Minglai Yang retweeted

slash1s@slash1sol·14 Mar

Game Changer: Chinese college student Guo Hangjiang (GitHub: 666ghj) codes MiroFish AI swarm engine solo in 10 days with AI assistants, explodes GitHub to 23k+ stars, bags $4.1M from Shanda Group in 24 hours, ditches dorm life to become Shanghai CEO. Tech Highlights: .GraphRAG builds detailed knowledge graphs (e.g., 905 entities, 3,822 relations in demos). .Agents with Zep memory, unique traits, sims capped at ~40 rounds for efficiency. .Hybrid Node.js/Python backend, Docker deploy, OpenAI-compatible LLMs. .Low-resource scaling: 8B param dragon boat festival sim runs smooth (vid demo). .AGPL license, credits CAMEL-AI, recent fixes for UI/errors. .Online demo: 666ghj.github.io/mirofish-demo/ - Test predictions interactively. Creator Quick Look: >> BaiFu (Guo Hangjiang), BUPT senior turned Shanghai CEO at Shanda. >> Built on 38.6k-star BettaFish & MindSpider for full data-prediction pipelines. >> 790 contribs last year. Real-World Plays -> -> Sentiment: Wuhan Uni backlash sim predicts trends (repo vid). -> Literary: Agents finish Dream of the Red Chamber's lost ending. -> Finance: Historical data for market forecasts; like bags on Polymarket SPX bets. -> Policy: "What-if" for bills/geopolitics. -> Cultural: Dragon boat fest agent dynamics. -> Ecosystems: Integrates crawlers for PR/decision tools. Repo: github.com/666ghj/MiroFish

slash1s@slash1sol

Mind blown: A Chinese quant college student builds an AI swarm engine in 10 days flat, explodes GitHub with 13,000+ stars, and scores $4,000,000 in funding! Introducing MiroFish is the multi-agent simulator that's revolutionizing predictions for trading, PR, and more. What is MiroFish? It's a digital sandbox where thousands of AI agents with individual memories and behaviors interact like a real society. Feed it any scenario (news leak, policy change, or even a classic novel's missing ending), and it simulates crowd reactions, debates, and outcomes to forecast real-world events. The Creator's Story: > In late 2025, fourth-year student Guo Hanjiang coded the core using AI assistants. > It went viral overnight, landing him 30m Yuan (~$4m) from Shanda Group. > He ditched the dorm, started a company, and now leads the charge. Key Applications: .Trading: Input financial news or reports, watch simulated market panics and price swings for predictive insights. .PR Testing: Companies/Politics run draft statements to spot backlash and refine messaging. .Creative Experiments: Loaded a lost-ending Chinese novel, agents role-played characters and generated a logical finale. .Easy setup: Deploy via Docker in minutes with any LLM API key. Pro tip: Simulate something wild like Elon Musk tweeting about Dogecoin 2.0 and spawn agent traders, influencers, and investors, generate real-time video clips of the frenzy to test moonshots or crashes risk-free. Traders are already winning big: Check this one on Polymarket - $120,000+ net profits from spot on SPX 500 bets, powered by MiroFish sims on historical data. His profile: polymarket.com/profile/%40moi… For effortless gains, try Kreo copy trading: Auto-mirror pros like him and ride their edges. Try here: kreo.app/@join Add his wallet: [0x17559efac103ac7f361be37ec0b93888d4c55aac] to [t.me/KreoPolyBot?st…] and start track/copy him. Repo: github.com/666ghj/MiroFish

194

1.6K

675.3K

Minglai Yang retweeted

Abaka AI@AbakaAI_Tech·5 Mar

🙋Join the conversation at DataMFM @CVPR 2026 As multimodal research moves beyond brute-force data collection, the focus for Multimodal Foundation Models (MFMs) is increasingly on the development of principled data ecosystems. The DataMFM Workshop provides a forum to discuss these evolving foundations. Why join DataMFM? We deep-dive into the critical data challenges defining the next generation of AI: • Agentic Data Pipelines & Synthetic Data generation • Data Governance: Quality, contamination, and provenance • Cross-modal Alignment: Integrating Text, Image, Audio, and Video 📅 Extended Deadlines: • Archival (8pp): March 10, 2026 • Non-archival (4pp): April 1, 2026 🔗 Details: datamfm.github.io Let’s solve the data challenges that define the next decade of AI. 🤝 #CVPR2026 #ai #chart #abakaai #2077ai #MultimodalAI #DataMFM #Research #IBM #MIT #Harvard #Stanford #Watsonlab

224

Minglai Yang@Yminglai·25 Jan

@rasmalai Prolog

Minglai Yang@Yminglai·22 Jan

🚫Scaling laws alone can’t tell the full story. We know LLMs get better as they get bigger, but we still don't fully understand how they reason. In our new paper with Prof. Liangming Pan, we open the black box to answer these critical questions👇

Liangming Pan@PanLiangming

🔥 What actually happens during multi-step reasoning with LLMs? ❓What are the internal computations? ❓How is such capability acquired during training？ ❓Does the latent reasoning rely on shortcuts? ❓How does CoT remodel internal computation? ❓Why does CoT enhance reasoning capability? and more... Many questions remain about the internal machinery. We wrote a paper to systematically review the existing process of revealing the mechanisms behind LLM multi-step reasoning—from implicit latent reasoning to explicit CoT reasoning. We also highlight directions for future mechanistic studies. 📄Paper: arxiv.org/pdf/2601.14270 💻Github: github.com/PKU-PILLAR-Gro…

Minglai Yang@Yminglai·7 Jan

@xwang_lk 🤣🤣🤣

3.8K

Xin Eric Wang@xwang_lk·6 Jan

ICLR ＝ I Can Locate Reviewers ACL ＝ Authors Can be Located From Rednote. 😂

277

144K

Minglai Yang@Yminglai·5 Jan

@CodeByNZ Where is GLM @Zai_org ? One of the best open source models

NZ ☄️@CodeByNZ·4 Jan

do we all agree?

1.1K

131

2.8K

320.5K

Minglai Yang@Yminglai·4 Jan

@ZeyuanAllenZhu Waiting for 4.2😁😁

169

Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu·31 Dec

Continuing Tutorial II for Physics of Language Models. We often trust large-scale results simply because they are large; but once noise is removed, the synthetic pretrain playground starts to push back — hard! The second video (Part 4.1b, 90 minutes) makes this pushback concrete. From it, I derive 20+ architectural principles, organized into 12 result blocks. Two highlights that consistently surprise even experienced readers: Result 2.1 (new): "Why Canon layers actually work." Not because of multi-token attention — that explanation only applies to the first layer. The real mechanism is how Canon reshapes hierarchical learning across depth. Result 11: "Why linear models reason 4× shallower than Transformers." This has nothing to do with memory size — it is a structural failure shared by nearly all linear architectures. In Result 12, I show which of these principles already emerge at academic-scale pretraining (1.3B / 100B) — with orders-of-magnitude lower cost and far cleaner signals than many real-life large-scale runs. The remaining principles do not disappear; they only emerge when scaling to 8B / 1T, which I will show in the third video (Part 4.2). ⏮️ Previous: Part 4.1a — methodology & playground design ▶️ This: Part 4.1b — architectural principles from the playground 🔜 Next: Part 4.2 — when the playground reshapes real-life pretraining

100

715

186.3K

Minglai Yang@Yminglai·2 Jan

@WenhuChen fr🤣🤣

698

Minglai Yang@Yminglai·1 Jan

@YouJiacheng Could I ask them for the training data? These are amazing results

286

You Jiacheng@YouJiacheng·1 Jan

what? 81.4 SWE-Bench Verified 40B parameters from Ubiquant

Zhipeng Huang@nopainkiller

another hardcore model arch using looped transformer from a chinese quant company github.com/IQuestLab/IQue…

687

122.9K

Minglai Yang@Yminglai·1 Jan

This is crazy, another deepseek moment?

Zephyr@zephyr_z9

What are Chinese quant companies smoking to get this kind of performance??? Mogging Sonnet 4.5 with a 40B

Minglai Yang@Yminglai·30 Dec

I love the controlled playground for architectural design! New ideas are always inspired and then verified through these controlled experiments.

Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu

I’m launching Tutorial II for Physics of Language Models. Many people focus on large-scale results. This tutorial is about why those results are often artifacts of noise — and how to eliminate that noise at the design level. The first video (Part 4.1a, 1 hour) is the most important one. It focuses on methodology, not benchmarks: – how real-life pretraining can be “cheated”, – why academic-scale experiments are noisy, – and most importantly, how to design a versatile, skill-pure synthetic pretraining playground. I explain why our five synthetic tasks are designed the way they are, and how GPT2-small-scale (100M) models can reveal architectural truths that 8B models trained on 1T tokens often fail to expose reliably. This methodology is the backbone of the entire Physics series. ▶️ First video: Part 4.1a — methodology & playground design 🔜 Second video: Part 4.1b — architectural principles from the playground 🔜 Third video: Part 4.2 — when the playground reshapes real-life pretraining (You can find it via my profile.)

Minglai Yang retweeted

will brown@willccbb·14 Dec

@severinhacker the point of a PhD is not to get a PhD, it’s to do a PhD

1.8K

106.9K

Minglai Yang@Yminglai·10 Dec

@NiJinjie Physics of Language Models - Part 2.3?

Jinjie Ni@NiJinjie·10 Dec

Physics of language models for RLVR! Comprehensive conclusions with clean methodologies.

Xiang Yue@xiangyue96

There are competing views on whether RL can genuinely improve base model's performance (e.g., pass @128). The answer is both yes and no, largely depending on the interplay between pre-training, mid-training, and RL. We trained a few hundreds of GPT-2 scale LMs on synthetic GSM-like reasoning data from scratch. Here are what we found: 🧵

Minglai Yang retweeted

David Bau@davidbau·10 Dec

At the #Neurips2025 mechanistic interpretability workshop I gave a brief talk about Venetian glassmaking, since I think we face a similar moment in AI research today. Here is a blog post summarizing the talk: davidbau.com/archives/2025/…

100

553

107K

Xiang Yue@xiangyue96·9 Dec

There are competing views on whether RL can genuinely improve base model's performance (e.g., pass@128). The answer is both yes and no, largely depending on the interplay between pre-training, mid-training, and RL. We trained a few hundreds of GPT-2 scale LMs on synthetic GSM-like reasoning data from scratch. Here are what we found: 🧵

241

1.4K

325.6K

Minglai Yang@Yminglai·10 Dec

@giffmana @xiangyue96 @ZeyuanAllenZhu That data example looks really similar to his Part 2😁

218

Lucas Beyer (bl16)@giffmana·10 Dec

@xiangyue96 Cc @ZeyuanAllenZhu you would like this paper

4.2K

Minglai Yang retweeted

Christopher Potts@ChrisGPotts·2 Dec

This post seems to describe substantially the same view that I offer here: web.stanford.edu/~cgpotts/blog/… Why are people describing the GDM post as concluding that mech-interp is a failed project? Is it the renaming of the field and constant talk of "pivoting"?

Neel Nanda@NeelNanda5

The GDM mechanistic interpretability team has pivoted to a new approach: pragmatic interpretability Our post details how we now do research, why now is the time to pivot, why we expect this way to have more impact and why we think other interp researchers should follow suit

127

31.8K

Minglai Yang retweeted

Chuang Gan@gan_chuang·30 Nov

ICLR has placed OpenReview in a difficult position, so I want to offer a few words about the OpenReview team working behind the scenes. OpenReview has long been operated at UMass Amherst as a non-profit organization founded by Andrew McCallum. Each year, Andrew must raise more than $2 million to support a 20-person team that provides essential infrastructure for most major conferences. I once asked Andrew what might have been a naïve question: whether he had considered developing a business model for OpenReview, given its prominence and the seemingly obvious opportunities. He pushed back, explaining that everything he has done for OpenReview is driven by a commitment to serve and strengthen the academic community. He is willing to devote significant personal effort to ensure the platform remains freely accessible to all. We should not blame such a brilliant and dedicated team for an accidental issue. Otherwise, fewer people would be willing to shoulder this kind of responsibility in the future. Deep respect to the OpenReview team! I’m grateful for their work and happy to support in any way!

137

988

177.6K
