Xiao Liang

203 posts

@MasterVito0601

Ph.D. student @UCLA, @uclanlp. Research Intern @MSFTResearch. LLMs, RL. Prev. @Tsinghua_Uni.

Beijing, China · Joined October 2020
624 Following · 208 Followers
Xiao Liang reposted
Jenny Zhang
Jenny Zhang@jennyzhangzt·
Introducing Hyperagents: an AI system that not only improves at solving tasks, but also improves how it improves itself.

The Darwin Gödel Machine (DGM) demonstrated that open-ended self-improvement is possible by iteratively generating and evaluating improved agents, yet it relies on a key assumption: that improvements in task performance (e.g., coding ability) translate into improvements in the self-improvement process itself. This alignment holds in coding, where both evaluation and modification are expressed in the same domain, but breaks down more generally. As a result, prior systems remain constrained by fixed, handcrafted meta-level procedures that do not themselves evolve.

We introduce Hyperagents – self-referential agents that can modify both their task-solving behavior and the process that generates future improvements. This enables what we call metacognitive self-modification: learning not just to perform better, but to improve at improving. We instantiate this framework as DGM-Hyperagents (DGM-H), an extension of the DGM in which both task-solving behavior and the self-improvement procedure are editable and subject to evolution.

Across diverse domains (coding, paper review, robotics reward design, and Olympiad-level math solution grading), hyperagents enable continuous performance improvements over time and outperform baselines without self-improvement or open-ended exploration, as well as prior self-improving systems (including DGM). DGM-H also improves the process by which new agents are generated (e.g., persistent memory, performance tracking), and these meta-level improvements transfer across domains and accumulate across runs.

This work was done during my internship at Meta (@AIatMeta), in collaboration with Bingchen Zhao (@BingchenZhao), Wannan Yang (@winnieyangwn), Jakob Foerster (@j_foerst), Jeff Clune (@jeffclune), Minqi Jiang (@MinqiJiang), Sam Devlin (@smdvln), and Tatiana Shavrina (@rybolos).
125 replies · 514 reposts · 2.9K likes · 247.8K views
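To make the double loop in the announcement above concrete, here is a minimal archive-based sketch in the spirit of DGM-H: each agent carries both a task-solving parameter and the procedure that generates its children, and both are mutable. Every name here (the `Agent` class, the scoring function, the mutation rates) is my own toy illustration, not the paper's actual system.

```python
import random

class Agent:
    """Toy agent: `param` drives task performance, while `step` is its
    self-improvement procedure (the meta level). Both are editable."""
    def __init__(self, param=0.0, step=1.0):
        self.param = param   # task-solving behavior
        self.step = step     # how aggressively it edits its children

    def score(self, target=10.0):
        # Task performance: closer to `target` is better.
        return -abs(self.param - target)

    def improve(self, rng):
        # Object-level edit: adjust task-solving behavior.
        child = Agent(self.param + rng.uniform(0.0, self.step), self.step)
        # Meta-level edit: sometimes modify the improvement procedure
        # itself, so future edits are generated differently.
        if rng.random() < 0.3:
            child.step = max(0.1, self.step * rng.choice([0.5, 2.0]))
        return child

def dgm_h(generations=200, seed=0):
    rng = random.Random(seed)
    archive = [Agent()]
    for _ in range(generations):
        parent = max(archive, key=Agent.score)   # best agent so far
        archive.append(parent.improve(rng))      # grow the archive
    return max(archive, key=Agent.score)

best = dgm_h()
```

The point of the sketch is only the structure: selection acts on task score, while the mutation operator that produced the winning lineage is itself a product of earlier mutations.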
Xiao Liang reposted
Rosinality
Rosinality@rosinality·
Updating reward model and policy together during RLHF in a per-batch manner, with active learning. It is possible here with model-based feedback. But how could it be implemented practically with actual human feedback?
5 replies · 22 reposts · 128 likes · 8K views
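As a toy of the per-batch idea, assuming scalar actions, a two-feature reward model, simulated feedback in place of a human, and distance-to-labeled-data as the active-learning proxy (all of these are my own stand-ins, not an established recipe), the interleaved loop might look like:

```python
def true_reward(x):
    # Stands in for human or model-based feedback on an action x.
    return -(x - 0.6) ** 2

def rm_predict(w, x):
    # Toy reward model with two features: x and x^2.
    return w[0] * x + w[1] * x * x

def rm_update(w, x, y, lr=0.05):
    # One SGD step on the squared error between RM and feedback.
    g = rm_predict(w, x) - y
    return [w[0] - lr * g * x, w[1] - lr * g * x * x]

def rlhf_per_batch(steps=200):
    actions = [i * 0.1 for i in range(-10, 11)]   # candidate actions
    w, theta, labeled = [0.0, 0.0], 0.0, []
    for _ in range(steps):
        # Policy step: move toward what the *current* RM ranks best.
        best = max(actions, key=lambda x: rm_predict(w, x))
        theta += 0.1 * (best - theta)
        # Active learning: query feedback where the RM is least
        # covered, proxied by distance to already-labeled actions.
        if labeled:
            query = max(actions,
                        key=lambda x: min(abs(x - lx) for lx, _ in labeled))
        else:
            query = theta
        y = true_reward(query)                    # fresh feedback
        labeled.append((query, y))
        for lx, ly in labeled[-5:]:               # per-batch RM refresh
            w = rm_update(w, lx, ly)
    return theta, w

theta, w = rlhf_per_batch()
```

With real human feedback, the expensive part is the `true_reward` call, which is exactly why the query-selection rule matters; the sketch only shows where it slots into the loop.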
Xiao Liang reposted
Albert Gu
Albert Gu@_albertgu·
The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes. This is the first Mamba that was student led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!
36 replies · 315 reposts · 1.6K likes · 420.5K views
Xiao Liang reposted
Jainam Parmar
Jainam Parmar@aiwithjainam·
BREAKING: Claude can now research like a Stanford PhD student. Here are 9 insane Claude prompts that turn 40+ research papers into structured literature reviews, knowledge maps, and research gaps in minutes (Save this)
101 replies · 1.4K reposts · 10.5K likes · 3.9M views
Xiao Liang reposted
Ahmad
Ahmad@TheAhmadOsman·
88 pages of gold for training MoEs. Just got published yesterday, link below.
12 replies · 122 reposts · 893 likes · 35.1K views
Xiao Liang reposted
DAIR.AI
DAIR.AI@dair_ai·
The Top AI Papers of the Week (March 1 - March 8)
- NeuroSkill
- ParamMem
- Numina-Lean-Agent
- Bayesian Teaching for LLMs
- Auton Agentic AI Framework
- Theory of Mind in Multi-Agent LLMs
- Why LLMs Form Geometric Representations
Read on for more:
DAIR.AI@dair_ai

x.com/i/article/2030…

6 replies · 35 reposts · 187 likes · 44.2K views
Xiao Liang reposted
Berryxia.AI
Berryxia.AI@berryxia·
Folks! 🍌 This one is amazing! The open-source Edit Banana project on GitHub is incredible!!! It hit the trending list today and already has 2,800 stars! It turns AI-generated static images, flowcharts, architecture diagrams, PDF charts, and formula images into fully editable DrawIO / SVG / PPTX files in one click! It uses SAM3 for precise segmentation + local OCR + a multimodal LLM, reconstructing colors, arrows, hierarchy, and LaTeX formulas 1:1, so you can drag and restyle anything! No more redoing work from screenshots; productivity takes off~ Link in the comments! Go try it, you'll love it!
26 replies · 449 reposts · 2.2K likes · 136.9K views
Xiao Liang reposted
Andrej Karpathy
Andrej Karpathy@karpathy·
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)
1K replies · 3.6K reposts · 28.2K likes · 10.9M views
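The commit-if-it-improves loop described above can be sketched as simple hill climbing; here the "5-minute training run" is replaced by a made-up loss surface over two hypothetical settings, so none of this is the actual repo's code.

```python
import random

def val_loss(cfg):
    """Stand-in for a complete training run: an invented loss surface
    over learning rate and model width (the real loop trains an LLM)."""
    lr, width = cfg["lr"], cfg["width"]
    return (lr - 0.003) ** 2 * 1e5 + (width - 256) ** 2 * 1e-4

def autoresearch(steps=100, seed=0):
    rng = random.Random(seed)
    cfg = {"lr": 0.01, "width": 64}      # initial training-script settings
    best = val_loss(cfg)
    history = []                          # stands in for git commits
    for _ in range(steps):
        trial = dict(cfg)
        key = rng.choice(["lr", "width"])
        trial[key] *= rng.choice([0.8, 1.25])   # agent edits one setting
        loss = val_loss(trial)                   # one complete run
        if loss < best:                          # commit only improvements
            cfg, best = trial, loss
            history.append((dict(cfg), loss))
    return cfg, best, history

cfg, best, history = autoresearch()
```

Comparing "research progress of different prompts" then amounts to comparing `history` curves across differently-configured agents.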
Xiao Liang reposted
Junchen Liu
Junchen Liu@JunchenLiu77·
Continual learning and online adaptation are often framed as the next frontier of AI. 🚀 Modern architectures use Test-Time Training (TTT) to memorize key-value pairs on the fly via gradient descent, or so we thought. To test this memorization hypothesis, we replaced gradient descent with gradient ASCENT. It should destroy memorization. Instead... performance was preserved, or even slightly improved. 😱 It turns out, TTT with KV Binding is secretly linear attention! Site: research.nvidia.com/labs/sil/proje…
5 replies · 40 reposts · 235 likes · 52.9K views
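The claim can be checked on a tiny example: with fast weights starting at zero and one SGD step per token on the key-value reconstruction loss, the TTT readout reduces to (un-normalized) linear attention, and flipping descent to ascent only negates the output, which a later layer can absorb. The code below is my own toy with orthogonal keys, so the correspondence is exact; real TTT layers are more elaborate.

```python
def ttt_readout(keys, values, query, eta, sign=+1):
    """One SGD step per (k, v) pair on 0.5*||W k - v||^2, starting
    from W = 0; sign=+1 is gradient descent, sign=-1 is ascent."""
    d_in, d_out = len(keys[0]), len(values[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for k, v in zip(keys, values):
        pred = [sum(W[i][j] * k[j] for j in range(d_in)) for i in range(d_out)]
        for i in range(d_out):
            for j in range(d_in):
                # gradient of the loss w.r.t. W[i][j] is (pred - v)[i] * k[j]
                W[i][j] -= sign * eta * (pred[i] - v[i]) * k[j]
    return [sum(W[i][j] * query[j] for j in range(d_in)) for i in range(d_out)]

def linear_attention(keys, values, query, eta):
    # y = eta * sum_t v_t (k_t . q): exactly the first-order TTT term.
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        score = sum(kj * qj for kj, qj in zip(k, query))
        for i in range(len(out)):
            out[i] += eta * v[i] * score
    return out

keys = [[1.0, 0.0], [0.0, 1.0]]          # orthogonal keys: no cross terms
values = [[2.0, 0.0], [0.0, 3.0]]
query = [1.0, 1.0]
y_desc = ttt_readout(keys, values, query, eta=1e-3)
y_asc = ttt_readout(keys, values, query, eta=1e-3, sign=-1)
y_lin = linear_attention(keys, values, query, eta=1e-3)
```

Here `y_desc` equals `y_lin` and `y_asc` is its exact negation, which is the "destroying memorization doesn't hurt" observation in miniature.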
Xiao Liang
Xiao Liang@MasterVito0601·
I also have a similar intuition about why long-CoT reasoning (e.g., DeepSeek R1) often outperforms earlier math fine-tuning approaches like Numina-Math in 2024. Long-CoT looks much closer to the distribution of pre-training data — with natural phrasing, spoken-like transitions (“wait”, “I think”, etc.), and less overly formal mathematical language. In contrast, many human-labeled math datasets use very polished, professional mathematical expressions that may drift away from the web-scale corpus used in pre-training. If long-CoT is generated from the model’s own pre-trained knowledge, its style may stay more aligned with the pre-training distribution — which might partly explain its stronger performance.
0 replies · 0 reposts · 3 likes · 77 views
Xiao Liang
Xiao Liang@MasterVito0601·
Just finished my UCLA TOP oral exam for TA qualification. It feels surprisingly analogous to the result of this great paper 😆 I tried to memorize a script generated by GPT for my presentation, but it actually hurt my performance — the language wasn’t aligned with my natural speaking style, so I spent cognitive effort recalling the script instead of communicating the ideas. Similarly, if fine-tuning data deviates too much from the pre-training distribution, performance may degrade. Mixing back some “pre-training data” — in my case, my own language style — helps.
Suhas Kotha@kothasuhas

To improve fine-tuning data efficiency, replay generic pre-training data. Not only does this reduce forgetting, it actually improves performance on the fine-tuning domain! Especially when fine-tuning data is scarce in pre-training. (w/ @percyliang)

2 replies · 0 reposts · 5 likes · 843 views
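The replay recipe in the quoted tweet is easy to picture as a mixed data stream; the sketch below interleaves fine-tuning and generic pre-training examples at an illustrative ratio (the ratio and names are my own stand-ins, and the referenced work tunes this empirically).

```python
import random

def mixed_stream(finetune_data, pretrain_data, replay_ratio=0.25,
                 steps=100, seed=0):
    """Yield training examples where a `replay_ratio` fraction comes
    from generic pre-training data (the replay stream) and the rest
    from the fine-tuning set."""
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < replay_ratio:
            yield ("pretrain", rng.choice(pretrain_data))
        else:
            yield ("finetune", rng.choice(finetune_data))

stream = list(mixed_stream(["ft_a", "ft_b"], ["pt_a", "pt_b"], steps=1000))
frac = sum(1 for src, _ in stream if src == "pretrain") / len(stream)
```

The whole design choice lives in `replay_ratio`: too low and the model forgets the pre-training distribution, too high and the scarce fine-tuning data is diluted.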
Xiao Liang
Xiao Liang@MasterVito0601·
@kothasuhas Thank you, Suhas. I really like this paper, as I have thought about this idea before. I think I will learn a lot from your great work~
0 replies · 0 reposts · 1 like · 12 views
Xiao Liang reposted
Harman Singh
Harman Singh@Harman26Singh·
Can LLMs Self-Verify? Much better than you'd expect.

LLMs are increasingly used as parallel reasoners, sampling many solutions at once. Choosing the right answer is the real bottleneck. We show that pairwise self-verification is a powerful primitive.

Introducing V1, a framework that unifies generation and self-verification:
💡 Pairwise self-verification beats pointwise scoring, improving test-time scaling
💡 V1-Infer: Efficient tournament-style ranking that improves self-verification
💡 V1-PairRL: RL training where generation and verification co-evolve for developing better self-verifiers
🧵👇
13 replies · 62 reposts · 369 likes · 78.1K views
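The tournament-style ranking mentioned above can be sketched generically: a single-elimination bracket over sampled candidates, where the pairwise verifier is stubbed out (in V1 it would be the LLM comparing two candidate solutions; the string-length comparator here is purely illustrative).

```python
def tournament_pick(candidates, prefer):
    """Single-elimination tournament: `prefer(a, b)` returns whichever
    candidate the (self-)verifier judges better. Needs O(n) pairwise
    comparisons versus O(n^2) for a full round-robin ranking."""
    pool = list(candidates)
    while len(pool) > 1:
        nxt = []
        for i in range(0, len(pool) - 1, 2):
            nxt.append(prefer(pool[i], pool[i + 1]))
        if len(pool) % 2 == 1:        # odd candidate gets a bye
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]

# Stand-in verifier: prefer the longer string.
answers = ["42", "x = 42", "the answer is x = 42"]
winner = tournament_pick(answers, lambda a, b: a if len(a) >= len(b) else b)
```

If the comparator is consistent (the best candidate beats anything it meets), the best candidate always survives every round, which is why the O(n) bracket loses nothing over round-robin in that case.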
Xiao Liang reposted
Ted Zadouri
Ted Zadouri@tedzadouri·
Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast that exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! Joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/
7 replies · 132 reposts · 782 likes · 220.8K views
Xiao Liang reposted
DailyPapers
DailyPapers@HuggingPapers·
HACPO: Collaborative RL for Heterogeneous Agents Enables agents to share rollouts during training while executing independently at inference. Improves performance by 3.3% while using only half the rollout cost.
3 replies · 6 reposts · 21 likes · 1.9K views
Xiao Liang reposted
Zara Zhang
Zara Zhang@zarazhangrui·
This OpenAI blog is a must-read. Don't ask what your agents can do for you. Ask what you can do for your agent openai.com/index/harness-…
52 replies · 195 reposts · 1.7K likes · 140.8K views
Xiao Liang
Xiao Liang@MasterVito0601·
Very promising result. It's surprising that mixing tokens from different modalities and sources can actually benefit each other (20B VQA + 80B other tokens performs better than 100B pure VQA tokens). This makes me wonder whether future omni-models could eventually outperform pure language models on a wide range of tasks, like agentic ones. It also makes me wonder whether fewer mixed tokens could already outperform full in-domain training: for instance, would 20B VQA + 30B mixed tokens beat 100B VQA? 😃
Peter Tong@TongPetersb

Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]

0 replies · 0 reposts · 5 likes · 637 views
Xiao Liang reposted
Shizhe Diao
Shizhe Diao@shizhediao·
Time to upgrade your pretraining dataset. Instead of FineWeb-EDU / DCLM / X, try ClimbMix-400B. 📄 Paper: arxiv.org/pdf/2504.13161 📦 Data: huggingface.co/datasets/nvidi… CLIMBMix uses clustering-based iterative data mixture to improve pretraining efficiency and data quality. Would love to see the community experiment with it and push it further 🚀
Shizhe Diao@shizhediao

Nemotron-CLIMBMix is now becoming the default recipe in the nanochat speedrun. During the Time-to-GPT-2 Leaderboard experiments started by @karpathy, the community revisited CLIMBMix and found that it delivers by far the single biggest improvement to nanochat's GPT-2 speedrun time. It's incredibly rewarding to see the idea validated and adopted by the community. Huge thanks to everyone who experimented with it and pushed it forward 🚀 github.com/karpathy/nanoc…

3 replies · 15 reposts · 164 likes · 27.1K views