Xiao Liang

203 posts

@MasterVito0601

Ph.D. student @UCLA, @uclanlp. Research Intern @MSFTResearch. LLMs, RL. Prev. @Tsinghua_Uni.

Beijing, China · Joined October 2020
624 Following · 208 Followers
Xiao Liang reposted
Jenny Zhang
Jenny Zhang@jennyzhangzt·
Introducing Hyperagents: an AI system that not only improves at solving tasks, but also improves how it improves itself.

The Darwin Gödel Machine (DGM) demonstrated that open-ended self-improvement is possible by iteratively generating and evaluating improved agents, yet it relies on a key assumption: that improvements in task performance (e.g., coding ability) translate into improvements in the self-improvement process itself. This alignment holds in coding, where both evaluation and modification are expressed in the same domain, but breaks down more generally. As a result, prior systems remain constrained by fixed, handcrafted meta-level procedures that do not themselves evolve.

We introduce Hyperagents – self-referential agents that can modify both their task-solving behavior and the process that generates future improvements. This enables what we call metacognitive self-modification: learning not just to perform better, but to improve at improving. We instantiate this framework as DGM-Hyperagents (DGM-H), an extension of the DGM in which both task-solving behavior and the self-improvement procedure are editable and subject to evolution.

Across diverse domains (coding, paper review, robotics reward design, and Olympiad-level math solution grading), hyperagents enable continuous performance improvements over time and outperform baselines without self-improvement or open-ended exploration, as well as prior self-improving systems (including DGM). DGM-H also improves the process by which new agents are generated (e.g., persistent memory, performance tracking), and these meta-level improvements transfer across domains and accumulate across runs.

This work was done during my internship at Meta (@AIatMeta), in collaboration with Bingchen Zhao (@BingchenZhao), Wannan Yang (@winnieyangwn), Jakob Foerster (@j_foerst), Jeff Clune (@jeffclune), Minqi Jiang (@MinqiJiang), Sam Devlin (@smdvln), and Tatiana Shavrina (@rybolos).
125 replies · 514 reposts · 2.9K likes · 247.8K views
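To make the double loop in the announcement above concrete, here is a minimal archive-based sketch in the spirit of DGM-H: each agent carries both a task-solving parameter and the procedure that generates its children, and both are mutable. Every name here (the `Agent` class, the scoring function, the mutation rates) is my own toy illustration, not the paper's actual system.

```python
import random

class Agent:
    """Toy agent: `param` drives task performance, while `step` is its
    self-improvement procedure (the meta level). Both are editable."""
    def __init__(self, param=0.0, step=1.0):
        self.param = param   # task-solving behavior
        self.step = step     # how aggressively it edits its children

    def score(self, target=10.0):
        # Task performance: closer to `target` is better.
        return -abs(self.param - target)

    def improve(self, rng):
        # Object-level edit: adjust task-solving behavior.
        child = Agent(self.param + rng.uniform(0.0, self.step), self.step)
        # Meta-level edit: sometimes modify the improvement procedure
        # itself, so future edits are generated differently.
        if rng.random() < 0.3:
            child.step = max(0.1, self.step * rng.choice([0.5, 2.0]))
        return child

def dgm_h(generations=200, seed=0):
    rng = random.Random(seed)
    archive = [Agent()]
    for _ in range(generations):
        parent = max(archive, key=Agent.score)   # best agent so far
        archive.append(parent.improve(rng))      # grow the archive
    return max(archive, key=Agent.score)

best = dgm_h()
```

The point of the sketch is only the structure: selection acts on task score, while the mutation operator that produced the winning lineage is itself a product of earlier mutations.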
Xiao Liang reposted
Rosinality
Rosinality@rosinality·
Updating reward model and policy together during RLHF in a per-batch manner, with active learning. It is possible here with model-based feedback. But how could it be implemented practically with actual human feedback?
5 replies · 22 reposts · 128 likes · 8K views
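As a toy of the per-batch idea, assuming scalar actions, a two-feature reward model, simulated feedback in place of a human, and distance-to-labeled-data as the active-learning proxy (all of these are my own stand-ins, not an established recipe), the interleaved loop might look like:

```python
def true_reward(x):
    # Stands in for human or model-based feedback on an action x.
    return -(x - 0.6) ** 2

def rm_predict(w, x):
    # Toy reward model with two features: x and x^2.
    return w[0] * x + w[1] * x * x

def rm_update(w, x, y, lr=0.05):
    # One SGD step on the squared error between RM and feedback.
    g = rm_predict(w, x) - y
    return [w[0] - lr * g * x, w[1] - lr * g * x * x]

def rlhf_per_batch(steps=200):
    actions = [i * 0.1 for i in range(-10, 11)]   # candidate actions
    w, theta, labeled = [0.0, 0.0], 0.0, []
    for _ in range(steps):
        # Policy step: move toward what the *current* RM ranks best.
        best = max(actions, key=lambda x: rm_predict(w, x))
        theta += 0.1 * (best - theta)
        # Active learning: query feedback where the RM is least
        # covered, proxied by distance to already-labeled actions.
        if labeled:
            query = max(actions,
                        key=lambda x: min(abs(x - lx) for lx, _ in labeled))
        else:
            query = theta
        y = true_reward(query)                    # fresh feedback
        labeled.append((query, y))
        for lx, ly in labeled[-5:]:               # per-batch RM refresh
            w = rm_update(w, lx, ly)
    return theta, w

theta, w = rlhf_per_batch()
```

With real human feedback, the expensive part is the `true_reward` call, which is exactly why the query-selection rule matters; the sketch only shows where it slots into the loop.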
Xiao Liang reposted
Albert Gu
Albert Gu@_albertgu·
The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes. This is the first Mamba that was student led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!
36 replies · 315 reposts · 1.6K likes · 420.5K views
Xiao Liang reposted
Jainam Parmar
Jainam Parmar@aiwithjainam·
BREAKING: Claude can now research like a Stanford PhD student. Here are 9 insane Claude prompts that turn 40+ research papers into structured literature reviews, knowledge maps, and research gaps in minutes (Save this)
101 replies · 1.4K reposts · 10.5K likes · 3.9M views
Xiao Liang reposted
Ahmad
Ahmad@TheAhmadOsman·
88 pages of gold for training MoEs. Just got published yesterday, link below.
12 replies · 122 reposts · 893 likes · 35.1K views
Xiao Liang reposted
DAIR.AI
DAIR.AI@dair_ai·
The Top AI Papers of the Week (March 1 - March 8)
- NeuroSkill
- ParamMem
- Numina-Lean-Agent
- Bayesian Teaching for LLMs
- Auton Agentic AI Framework
- Theory of Mind in Multi-Agent LLMs
- Why LLMs Form Geometric Representations
Read on for more:
DAIR.AI@dair_ai

x.com/i/article/2030…

6 replies · 35 reposts · 187 likes · 44.2K views
Xiao Liang reposted
Berryxia.AI
Berryxia.AI@berryxia·
Folks! 🍌 This one is amazing! The open-source Edit Banana project on GitHub is incredible!!! It hit the trending list today and already has 2,800 stars! It turns AI-generated static images, flowcharts, architecture diagrams, PDF charts, and formula images into fully editable DrawIO / SVG / PPTX files in one click! It uses SAM3 for precise segmentation + local OCR + a multimodal LLM, reconstructing colors, arrows, hierarchy, and LaTeX formulas 1:1, so you can drag and restyle anything! No more redoing work from screenshots; productivity takes off~ Link in the comments! Go try it, you'll love it!
26 replies · 449 reposts · 2.2K likes · 136.9K views
Xiao Liang reposted
Andrej Karpathy
Andrej Karpathy@karpathy·
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)
1K replies · 3.6K reposts · 28.2K likes · 10.9M views
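The commit-if-it-improves loop described above can be sketched as simple hill climbing; here the "5-minute training run" is replaced by a made-up loss surface over two hypothetical settings, so none of this is the actual repo's code.

```python
import random

def val_loss(cfg):
    """Stand-in for a complete training run: an invented loss surface
    over learning rate and model width (the real loop trains an LLM)."""
    lr, width = cfg["lr"], cfg["width"]
    return (lr - 0.003) ** 2 * 1e5 + (width - 256) ** 2 * 1e-4

def autoresearch(steps=100, seed=0):
    rng = random.Random(seed)
    cfg = {"lr": 0.01, "width": 64}      # initial training-script settings
    best = val_loss(cfg)
    history = []                          # stands in for git commits
    for _ in range(steps):
        trial = dict(cfg)
        key = rng.choice(["lr", "width"])
        trial[key] *= rng.choice([0.8, 1.25])   # agent edits one setting
        loss = val_loss(trial)                   # one complete run
        if loss < best:                          # commit only improvements
            cfg, best = trial, loss
            history.append((dict(cfg), loss))
    return cfg, best, history

cfg, best, history = autoresearch()
```

Comparing "research progress of different prompts" then amounts to comparing `history` curves across differently-configured agents.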
Xiao Liang reposted
Junchen Liu
Junchen Liu@JunchenLiu77·
Continual learning and online adaptation are often framed as the next frontier of AI. 🚀 Modern architectures use Test-Time Training (TTT) to memorize key-value pairs on the fly via gradient descent, or so we thought. To test this memorization hypothesis, we replaced gradient descent with gradient ASCENT. It should destroy memorization. Instead... performance was preserved, or even slightly improved. 😱 It turns out, TTT with KV Binding is secretly linear attention! Site: research.nvidia.com/labs/sil/proje…
5 replies · 40 reposts · 235 likes · 52.9K views
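The claim can be checked on a tiny example: with fast weights starting at zero and one SGD step per token on the key-value reconstruction loss, the TTT readout reduces to (un-normalized) linear attention, and flipping descent to ascent only negates the output, which a later layer can absorb. The code below is my own toy with orthogonal keys, so the correspondence is exact; real TTT layers are more elaborate.

```python
def ttt_readout(keys, values, query, eta, sign=+1):
    """One SGD step per (k, v) pair on 0.5*||W k - v||^2, starting
    from W = 0; sign=+1 is gradient descent, sign=-1 is ascent."""
    d_in, d_out = len(keys[0]), len(values[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for k, v in zip(keys, values):
        pred = [sum(W[i][j] * k[j] for j in range(d_in)) for i in range(d_out)]
        for i in range(d_out):
            for j in range(d_in):
                # gradient of the loss w.r.t. W[i][j] is (pred - v)[i] * k[j]
                W[i][j] -= sign * eta * (pred[i] - v[i]) * k[j]
    return [sum(W[i][j] * query[j] for j in range(d_in)) for i in range(d_out)]

def linear_attention(keys, values, query, eta):
    # y = eta * sum_t v_t (k_t . q): exactly the first-order TTT term.
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        score = sum(kj * qj for kj, qj in zip(k, query))
        for i in range(len(out)):
            out[i] += eta * v[i] * score
    return out

keys = [[1.0, 0.0], [0.0, 1.0]]          # orthogonal keys: no cross terms
values = [[2.0, 0.0], [0.0, 3.0]]
query = [1.0, 1.0]
y_desc = ttt_readout(keys, values, query, eta=1e-3)
y_asc = ttt_readout(keys, values, query, eta=1e-3, sign=-1)
y_lin = linear_attention(keys, values, query, eta=1e-3)
```

Here `y_desc` equals `y_lin` and `y_asc` is its exact negation, which is the "destroying memorization doesn't hurt" observation in miniature.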
Xiao Liang
Xiao Liang@MasterVito0601·
I also have a similar intuition about why long-CoT reasoning (e.g., DeepSeek R1) often outperforms earlier math fine-tuning approaches like Numina-Math in 2024. Long-CoT looks much closer to the distribution of pre-training data — with natural phrasing, spoken-like transitions (“wait”, “I think”, etc.), and less overly formal mathematical language. In contrast, many human-labeled math datasets use very polished, professional mathematical expressions that may drift away from the web-scale corpus used in pre-training. If long-CoT is generated from the model’s own pre-trained knowledge, its style may stay more aligned with the pre-training distribution — which might partly explain its stronger performance.
0 replies · 0 reposts · 3 likes · 77 views
Xiao Liang
Xiao Liang@MasterVito0601·
Just finished my UCLA TOP oral exam for TA qualification. It feels surprisingly analogous to the result of this great paper 😆 I tried to memorize a script generated by GPT for my presentation, but it actually hurt my performance — the language wasn’t aligned with my natural speaking style, so I spent cognitive effort recalling the script instead of communicating the ideas. Similarly, if fine-tuning data deviates too much from the pre-training distribution, performance may degrade. Mixing back some “pre-training data” — in my case, my own language style — helps.
Suhas Kotha@kothasuhas

To improve fine-tuning data efficiency, replay generic pre-training data. Not only does this reduce forgetting, it actually improves performance on the fine-tuning domain! Especially when fine-tuning data is scarce in pre-training. (w/ @percyliang)

2 replies · 0 reposts · 5 likes · 843 views
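The replay recipe in the quoted tweet is easy to picture as a mixed data stream; the sketch below interleaves fine-tuning and generic pre-training examples at an illustrative ratio (the ratio and names are my own stand-ins, and the referenced work tunes this empirically).

```python
import random

def mixed_stream(finetune_data, pretrain_data, replay_ratio=0.25,
                 steps=100, seed=0):
    """Yield training examples where a `replay_ratio` fraction comes
    from generic pre-training data (the replay stream) and the rest
    from the fine-tuning set."""
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < replay_ratio:
            yield ("pretrain", rng.choice(pretrain_data))
        else:
            yield ("finetune", rng.choice(finetune_data))

stream = list(mixed_stream(["ft_a", "ft_b"], ["pt_a", "pt_b"], steps=1000))
frac = sum(1 for src, _ in stream if src == "pretrain") / len(stream)
```

The whole design choice lives in `replay_ratio`: too low and the model forgets the pre-training distribution, too high and the scarce fine-tuning data is diluted.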
Xiao Liang
Xiao Liang@MasterVito0601·
@kothasuhas Thank you, Suhas. I really like this paper, as I have thought about this idea before. I think I will learn a lot from your great work~
0 replies · 0 reposts · 1 like · 12 views
Xiao Liang reposted
Harman Singh
Harman Singh@Harman26Singh·
Can LLMs Self-Verify? Much better than you'd expect.

LLMs are increasingly used as parallel reasoners, sampling many solutions at once. Choosing the right answer is the real bottleneck. We show that pairwise self-verification is a powerful primitive.

Introducing V1, a framework that unifies generation and self-verification:
💡 Pairwise self-verification beats pointwise scoring, improving test-time scaling
💡 V1-Infer: Efficient tournament-style ranking that improves self-verification
💡 V1-PairRL: RL training where generation and verification co-evolve for developing better self-verifiers
🧵👇
13 replies · 62 reposts · 369 likes · 78.1K views
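The tournament-style ranking mentioned above can be sketched generically: a single-elimination bracket over sampled candidates, where the pairwise verifier is stubbed out (in V1 it would be the LLM comparing two candidate solutions; the string-length comparator here is purely illustrative).

```python
def tournament_pick(candidates, prefer):
    """Single-elimination tournament: `prefer(a, b)` returns whichever
    candidate the (self-)verifier judges better. Needs O(n) pairwise
    comparisons versus O(n^2) for a full round-robin ranking."""
    pool = list(candidates)
    while len(pool) > 1:
        nxt = []
        for i in range(0, len(pool) - 1, 2):
            nxt.append(prefer(pool[i], pool[i + 1]))
        if len(pool) % 2 == 1:        # odd candidate gets a bye
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]

# Stand-in verifier: prefer the longer string.
answers = ["42", "x = 42", "the answer is x = 42"]
winner = tournament_pick(answers, lambda a, b: a if len(a) >= len(b) else b)
```

If the comparator is consistent (the best candidate beats anything it meets), the best candidate always survives every round, which is why the O(n) bracket loses nothing over round-robin in that case.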
Xiao Liang reposted
Ted Zadouri
Ted Zadouri@tedzadouri·
Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast that exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! Joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/
7 replies · 132 reposts · 782 likes · 220.8K views
Xiao Liang reposted
DailyPapers
DailyPapers@HuggingPapers·
HACPO: Collaborative RL for Heterogeneous Agents Enables agents to share rollouts during training while executing independently at inference. Improves performance by 3.3% while using only half the rollout cost.
3 replies · 6 reposts · 21 likes · 1.9K views
Xiao Liang reposted
Zara Zhang
Zara Zhang@zarazhangrui·
This OpenAI blog is a must-read. Don't ask what your agents can do for you. Ask what you can do for your agent openai.com/index/harness-…
52 replies · 195 reposts · 1.7K likes · 140.8K views
Xiao Liang
Xiao Liang@MasterVito0601·
Very promising result. It's surprising that mixing tokens from different modalities and sources can actually benefit each other (20B VQA + 80B other tokens performs better than 100B pure VQA tokens). This makes me wonder whether future omni-models could eventually outperform pure language models on a wide range of tasks, like agentic ones. It also makes me wonder whether fewer mixed tokens could already outperform full in-domain training: for instance, would 20B VQA + 30B mixed tokens beat 100B VQA? 😃
Peter Tong@TongPetersb

Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]

0 replies · 0 reposts · 5 likes · 637 views
Xiao Liang reposted
Shizhe Diao
Shizhe Diao@shizhediao·
Time to upgrade your pretraining dataset. Instead of FineWeb-EDU / DCLM / X, try ClimbMix-400B. 📄 Paper: arxiv.org/pdf/2504.13161 📦 Data: huggingface.co/datasets/nvidi… CLIMBMix uses clustering-based iterative data mixture to improve pretraining efficiency and data quality. Would love to see the community experiment with it and push it further 🚀
Shizhe Diao@shizhediao

Nemotron-CLIMBMix is now becoming the default recipe in the nanochat speedrun. During the Time-to-GPT-2 Leaderboard experiments started by @karpathy, the community revisited CLIMBMix and found that it delivers by far the single biggest improvement to nanochat's GPT-2 speedrun time. It's incredibly rewarding to see the idea validated and adopted by the community. Huge thanks to everyone who experimented with it and pushed it forward 🚀 github.com/karpathy/nanoc…

3 replies · 15 reposts · 164 likes · 27.1K views