Minglai Yang
@Yminglai

50 posts

reasoning & interp #NLProc. Founder @uaaiclub | Research @labclu @thukeg @Abaka_AI @2077AI

Palo Alto, CA · Joined May 2024
183 Following · 28 Followers

Pinned Tweet
Minglai Yang@Yminglai·
Our paper accepted at EMNLP 2025 Main! 🎉 @emnlpmeeting
“How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark”
👉 arxiv.org/abs/2505.18761
📌 We introduce GSM-DC: a controlled benchmark for reasoning under Irrelevant Context (IC). We systematically vary reasoning depth and IC level via a knowledge DAG to study LLM reasoning behavior under distractions, not just accuracy 🧭
👥 Huge thanks to my awesome team: @_ethan_huang @LiangZhang4825 @msurd @WilliamWangNLP @PanLiangming
1 reply · 3 reposts · 28 likes · 5.2K views
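The controlled-generation idea in the pinned tweet (varying reasoning depth and irrelevant-context level over a knowledge DAG) can be illustrated with a toy sketch. This is my own minimal illustration under assumed semantics, not the GSM-DC authors' actual generator; every name here (`build_problem`, the `x_i` chain, the `y_j` distractors) is hypothetical:

```python
import random

def build_problem(depth, num_distractors, seed=0):
    """Toy sketch: a linear reasoning chain of `depth` steps inside a
    knowledge DAG, plus `num_distractors` irrelevant-context (IC) nodes.
    Hypothetical construction, NOT the GSM-DC authors' generator."""
    rng = random.Random(seed)
    # Relevant chain: x0 -> x1 -> ... -> x_depth, each step an affine update.
    values = {"x0": rng.randint(1, 9)}
    steps = []
    for i in range(1, depth + 1):
        a, b = rng.randint(2, 5), rng.randint(1, 9)
        values[f"x{i}"] = a * values[f"x{i-1}"] + b
        steps.append(f"x{i} = {a} * x{i-1} + {b}")
    # Irrelevant context: nodes that hang off the chain but never feed the answer.
    distractors = []
    for j in range(num_distractors):
        src = f"x{rng.randint(0, depth)}"
        c = rng.randint(2, 5)
        distractors.append(f"y{j} = {c} * {src}")
    facts = steps + distractors
    rng.shuffle(facts)  # interleave IC facts with the true reasoning path
    answer = values[f"x{depth}"]
    return facts, answer

facts, answer = build_problem(depth=3, num_distractors=2)
```

Because depth and distractor count are independent knobs, a harness like this lets one hold the reasoning path fixed while scaling only the IC level, which is the kind of controlled sweep the tweet describes.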
Minglai Yang reposted
David Pfau@pfau·
Oh god are we really doing this? Jeff Dean trained an n-gram model on the entire internet in 2007. Jelinek coined the term "language model" in the '70s. It's called "Claude" because Claude Shannon was estimating the entropy rate of the English language in 1951!
Aran Komatsuzaki@arankomatsuzaki

While Alec is one of the best ML researchers of all time, LLMs started way before. Here's one from 2013 with a non-neural architecture, and one from 2016, which is AFAIK the first neural LLM if we define an LLM as an LM with >1B params.

34 replies · 84 reposts · 1.3K likes · 469.3K views
Minglai Yang reposted
slash1s@slash1sol·
Game Changer: Chinese college student Guo Hangjiang (GitHub: 666ghj) codes the MiroFish AI swarm engine solo in 10 days with AI assistants, explodes GitHub to 23k+ stars, bags $4.1M from Shanda Group in 24 hours, and ditches dorm life to become a Shanghai CEO.

Tech Highlights:
• GraphRAG builds detailed knowledge graphs (e.g., 905 entities, 3,822 relations in demos).
• Agents with Zep memory and unique traits; sims capped at ~40 rounds for efficiency.
• Hybrid Node.js/Python backend, Docker deploy, OpenAI-compatible LLMs.
• Low-resource scaling: an 8B-param dragon boat festival sim runs smoothly (video demo).
• AGPL license, credits CAMEL-AI, recent fixes for UI/errors.
• Online demo: 666ghj.github.io/mirofish-demo/ (test predictions interactively).

Creator Quick Look:
• BaiFu (Guo Hangjiang), BUPT senior turned Shanghai CEO at Shanda.
• Built on the 38.6k-star BettaFish & MindSpider for full data-prediction pipelines.
• 790 contributions last year.

Real-World Plays:
• Sentiment: Wuhan Uni backlash sim predicts trends (repo video).
• Literary: Agents finish Dream of the Red Chamber's lost ending.
• Finance: Historical data for market forecasts; like bags on Polymarket SPX bets.
• Policy: "What-if" for bills/geopolitics.
• Cultural: Dragon boat fest agent dynamics.
• Ecosystems: Integrates crawlers for PR/decision tools.

Repo: github.com/666ghj/MiroFish
slash1s@slash1sol

Mind blown: A Chinese quant college student builds an AI swarm engine in 10 days flat, explodes GitHub with 13,000+ stars, and scores $4,000,000 in funding! Introducing MiroFish, the multi-agent simulator that's revolutionizing predictions for trading, PR, and more.

What is MiroFish? It's a digital sandbox where thousands of AI agents with individual memories and behaviors interact like a real society. Feed it any scenario (news leak, policy change, or even a classic novel's missing ending), and it simulates crowd reactions, debates, and outcomes to forecast real-world events.

The Creator's Story:
• In late 2025, fourth-year student Guo Hangjiang coded the core using AI assistants.
• It went viral overnight, landing him 30M yuan (~$4M) from Shanda Group.
• He ditched the dorm, started a company, and now leads the charge.

Key Applications:
• Trading: Input financial news or reports, and watch simulated market panics and price swings for predictive insights.
• PR Testing: Companies and politicians run draft statements to spot backlash and refine messaging.
• Creative Experiments: Loaded a Chinese novel with a lost ending; agents role-played the characters and generated a logical finale.
• Easy setup: Deploy via Docker in minutes with any LLM API key.

Pro tip: Simulate something wild, like Elon Musk tweeting about Dogecoin 2.0: spawn agent traders, influencers, and investors, and generate real-time video clips of the frenzy to test moonshots or crashes risk-free.

Traders are already winning big: check this one on Polymarket with $120,000+ net profits from spot-on SPX 500 bets, powered by MiroFish sims on historical data. His profile: polymarket.com/profile/%40moi…

For effortless gains, try Kreo copy trading: auto-mirror pros like him and ride their edges. Try here: kreo.app/@join Add his wallet [0x17559efac103ac7f361be37ec0b93888d4c55aac] to [t.me/KreoPolyBot?st…] and start tracking/copying him.

Repo: github.com/666ghj/MiroFish

42 replies · 194 reposts · 1.6K likes · 675.3K views
Minglai Yang reposted
Abaka AI@AbakaAI_Tech·
🙋 Join the conversation at DataMFM @CVPR 2026

As multimodal research moves beyond brute-force data collection, the focus for Multimodal Foundation Models (MFMs) is increasingly on the development of principled data ecosystems. The DataMFM Workshop provides a forum to discuss these evolving foundations.

Why join DataMFM? We deep-dive into the critical data challenges defining the next generation of AI:
• Agentic Data Pipelines & Synthetic Data generation
• Data Governance: Quality, contamination, and provenance
• Cross-modal Alignment: Integrating Text, Image, Audio, and Video

📅 Extended Deadlines:
• Archival (8pp): March 10, 2026
• Non-archival (4pp): April 1, 2026

🔗 Details: datamfm.github.io

Let’s solve the data challenges that define the next decade of AI. 🤝

#CVPR2026 #ai #chart #abakaai #2077ai #MultimodalAI #DataMFM #Research #IBM #MIT #Harvard #Stanford #Watsonlab
1 reply · 2 reposts · 6 likes · 224 views
Xin Eric Wang@xwang_lk·
ICLR = I Can Locate Reviewers
ACL = Authors Can be Located
From Rednote. 😂
6 replies · 12 reposts · 277 likes · 144K views
NZ ☄️@CodeByNZ·
do we all agree?
1.1K replies · 131 reposts · 2.8K likes · 320.5K views
Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu·
Continuing Tutorial II for Physics of Language Models. We often trust large-scale results simply because they are large; but once noise is removed, the synthetic pretrain playground starts to push back, hard!

The second video (Part 4.1b, 90 minutes) makes this pushback concrete. From it, I derive 20+ architectural principles, organized into 12 result blocks. Two highlights that consistently surprise even experienced readers:

Result 2.1 (new): "Why Canon layers actually work." Not because of multi-token attention; that explanation only applies to the first layer. The real mechanism is how Canon reshapes hierarchical learning across depth.

Result 11: "Why linear models reason 4× shallower than Transformers." This has nothing to do with memory size; it is a structural failure shared by nearly all linear architectures.

In Result 12, I show which of these principles already emerge at academic-scale pretraining (1.3B / 100B), with orders-of-magnitude lower cost and far cleaner signals than many real-life large-scale runs. The remaining principles do not disappear; they only emerge when scaling to 8B / 1T, which I will show in the third video (Part 4.2).

⏮️ Previous: Part 4.1a, methodology & playground design
▶️ This: Part 4.1b, architectural principles from the playground
🔜 Next: Part 4.2, when the playground reshapes real-life pretraining
13 replies · 100 reposts · 715 likes · 186.3K views
Minglai Yang@Yminglai·
@YouJiacheng Could I ask them for the training data? These are amazing results
0 replies · 0 reposts · 0 likes · 286 views
Minglai Yang reposted
will brown@willccbb·
@severinhacker the point of a PhD is not to get a PhD, it’s to do a PhD
24 replies · 93 reposts · 1.8K likes · 106.9K views
Minglai Yang reposted
David Bau@davidbau·
At the #Neurips2025 mechanistic interpretability workshop I gave a brief talk about Venetian glassmaking, since I think we face a similar moment in AI research today. Here is a blog post summarizing the talk: davidbau.com/archives/2025/…
24 replies · 100 reposts · 553 likes · 107K views
Xiang Yue@xiangyue96·
There are competing views on whether RL can genuinely improve a base model's performance (e.g., pass@128). The answer is both yes and no, largely depending on the interplay between pre-training, mid-training, and RL. We trained a few hundred GPT-2-scale LMs from scratch on synthetic GSM-like reasoning data. Here's what we found: 🧵
28 replies · 241 reposts · 1.4K likes · 325.6K views
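The pass@128 metric mentioned in the thread above is usually computed with the unbiased pass@k estimator popularized by the code-generation eval literature: draw n samples, observe c correct, and estimate pass@k = 1 - C(n-c, k)/C(n, k). A minimal sketch (the function name and the example numbers are mine, not from the thread):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples of which c are correct:
    pass@k = 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset
        # of the n samples must contain at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 10 of them correct.
p1 = pass_at_k(200, 10, 1)      # exactly 1 - 190/200 = 0.05
p128 = pass_at_k(200, 10, 128)  # much higher: 128 draws rarely miss all 10
```

Averaging this quantity over problems gives the pass@k curve; comparing the curve for a base model vs. its RL-tuned version at large k is what the "can RL improve pass@128" debate is about.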
Minglai Yang reposted
Christopher Potts@ChrisGPotts·
This post seems to describe substantially the same view that I offer here: web.stanford.edu/~cgpotts/blog/… Why are people describing the GDM post as concluding that mech-interp is a failed project? Is it the renaming of the field and constant talk of "pivoting"?
Neel Nanda@NeelNanda5

The GDM mechanistic interpretability team has pivoted to a new approach: pragmatic interpretability. Our post details how we now do research, why now is the time to pivot, why we expect this way to have more impact, and why we think other interp researchers should follow suit.

4 replies · 22 reposts · 127 likes · 31.8K views
Minglai Yang reposted
Chuang Gan@gan_chuang·
ICLR has placed OpenReview in a difficult position, so I want to offer a few words about the OpenReview team working behind the scenes.

OpenReview has long been operated at UMass Amherst as a non-profit organization founded by Andrew McCallum. Each year, Andrew must raise more than $2 million to support a 20-person team that provides essential infrastructure for most major conferences.

I once asked Andrew what might have been a naïve question: whether he had considered developing a business model for OpenReview, given its prominence and the seemingly obvious opportunities. He pushed back, explaining that everything he has done for OpenReview is driven by a commitment to serve and strengthen the academic community. He is willing to devote significant personal effort to ensure the platform remains freely accessible to all.

We should not blame such a brilliant and dedicated team for an accidental issue. Otherwise, fewer people would be willing to shoulder this kind of responsibility in the future.

Deep respect to the OpenReview team! I’m grateful for their work and happy to support in any way!
27 replies · 137 reposts · 988 likes · 177.6K views