Kaizhao Liang

3.8K posts

@KyleLiang5

@MicrosoftAI, ex @SambaNovaAI, PhD student @UTCompSci, working on optimizers and neural architectures, alumni @IllinoisCDS

Redmond, Seattle · Joined December 2018
101 Following · 668 Followers
Kaizhao Liang retweeted
Satya Nadella@satyanadella·
Introducing Critique, a new multi-model deep research system in M365 Copilot. You can use multiple models together to generate optimal responses and reports.
421 replies · 509 reposts · 4.2K likes · 1.4M views
Kaizhao Liang retweeted
Jia-Bin Huang@jbhuang0604·
A great example of how the medium shapes impact.
A research paper on arXiv 11 months ago: 👉 2 citations so far
An accessible blog post one day ago: 👉 12M views, instant community adoption
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

28 replies · 82 reposts · 1.1K likes · 121.1K views
Kaizhao Liang retweeted
Lucas Maes@lucasmaes_·
JEPAs are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning in <1 second. 📑: le-wm.github.io
101 replies · 540 reposts · 3.9K likes · 897.6K views
Kaizhao Liang retweeted
elie@eliebakouch·
whaaaaaat microsoft ai just poached part of ai2 leadership team
[image attached]
16 replies · 10 reposts · 304 likes · 41.1K views
Kaizhao Liang retweeted
Yuchen Jin@Yuchenj_UW·
OpenAI just dropped a training challenge: Train a <16MB language model in 10 minutes on 8×H100s and minimize held-out loss on a fixed FineWeb dataset. Basically NanoGPT Speedrun. They’re sponsoring $1M in compute. I can summon my autoresearch army to win it… if I have time.
[image attached]
53 replies · 75 reposts · 1.3K likes · 110.2K views
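As a rough illustration of what the <16MB cap in the tweet above implies (my own back-of-envelope, not part of the challenge spec), the parameter budget depends entirely on the checkpoint dtype:

```python
# Rough parameter budget implied by a 16MB checkpoint (illustrative only).
BUDGET_BYTES = 16 * 1024**2
for dtype, bytes_per_param in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{BUDGET_BYTES / bytes_per_param / 1e6:.1f}M params max")
# fp32: ~4.2M params, bf16/fp16: ~8.4M, int8: ~16.8M
```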
Kaizhao Liang retweeted
Percy Liang@percyliang·
In Marin, we are trying to get really good at scaling laws. We have trained models up to 1e22 FLOPs and have made a prediction of the loss at 1e23 FLOPs, which @WilliamBarrHeld is running. This prediction is preregistered on GitHub, so we'll see in a few days how accurate our prediction was. What we want is not just a single model but a training recipe that scales reliably.
[image attached]
18 replies · 47 reposts · 469 likes · 76.1K views
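For readers unfamiliar with the workflow Percy describes, a minimal sketch of the general technique (the functional form and the numbers below are hypothetical; Marin's actual recipe and prediction live in their preregistered GitHub materials): fit a parametric loss-vs-compute curve on the completed runs, then extrapolate one decade of compute out.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (compute, final loss) pairs from small-scale runs.
compute = np.array([1e19, 1e20, 1e21, 1e22])   # FLOPs
loss    = np.array([4.02, 3.50, 3.09, 2.76])   # made-up values

# A common saturating power law, L(C) = a * C^(-b) + c,
# fit in log-compute space for numerical stability.
x = np.log(compute)
def scaling_law(x, a, b, c):
    return a * np.exp(-b * x) + c

params, _ = curve_fit(scaling_law, x, loss, p0=(100.0, 0.1, 2.0), maxfev=20000)
print(f"predicted loss at 1e23 FLOPs: {scaling_law(np.log(1e23), *params):.2f}")
```

Preregistering the extrapolated number before the 1e23 run finishes is what makes the prediction falsifiable.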
Kaizhao Liang retweeted
Felix Rieseberg@felixrieseberg·
We're shipping a new feature in Claude Cowork as a research preview that I'm excited about: Dispatch! One persistent conversation with Claude that runs on your computer. Message it from your phone. Come back to finished work. To try it out, download Claude Desktop, then pair your phone.
973 replies · 1.5K reposts · 17.4K likes · 6.2M views
Kaizhao Liang@KyleLiang5·
Every day this meme becomes more and more relevant
[image attached]
0 replies · 0 reposts · 0 likes · 89 views
Kaizhao Liang@KyleLiang5·
If you look at the linear layer, both fwd and bwd are linear attentions. Fwd, it's retrieving with activations; bwd, it's retrieving with the error signal (the loss gradient). The symmetry is beautiful.
Andrej Karpathy@karpathy

@Yulun_Du @ilyasut SGD is a ResNet too (the blocks of it are fwd+bwd), the residual stream is the weights so... 🤔 We're not taking the Attention is All You Need part literally enough? :D

0 replies · 0 reposts · 3 likes · 276 views
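A minimal numeric sketch of the symmetry in the tweet above (my own illustration, not Kaizhao's code): write the weight matrix as a sum of outer products, W = Σₛ vₛkₛᵀ, which is exactly the form accumulated SGD updates take. The forward pass is then linear attention queried by the activation, and the backward pass is linear attention queried by the error signal.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_mem = 8, 4, 16

# Weight as a sum of outer products: W = sum_s v_s k_s^T
K = rng.normal(size=(n_mem, d_in))    # "keys"   (e.g. stored inputs)
V = rng.normal(size=(n_mem, d_out))   # "values" (e.g. stored update directions)
W = V.T @ K                           # (d_out, d_in)

x = rng.normal(size=d_in)             # activation
g = rng.normal(size=d_out)            # upstream loss gradient

# fwd: y = W x = sum_s v_s (k_s . x)     -> retrieval keyed on the activation
assert np.allclose(W @ x, V.T @ (K @ x))

# bwd: dL/dx = W^T g = sum_s k_s (v_s . g) -> retrieval keyed on the error signal
assert np.allclose(W.T @ g, K.T @ (V @ g))
```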
Kaizhao Liang retweeted
Claude@claudeai·
1 million context window: Now generally available for Claude Opus 4.6 and Claude Sonnet 4.6.
[image attached]
1.2K replies · 2K reposts · 25.2K likes · 5.6M views
Kaizhao Liang retweeted
Shuangfei Zhai@zhaisf·
Say hi to Exclusive Self Attention (XSA), a (nearly) free improvement to Transformers for LM.
Observation: for y = attn(q, k, v), yᵢ and vᵢ tend to have a very high cosine similarity.
Fix: exclude vᵢ from yᵢ via zᵢ = yᵢ - (yᵢᵀvᵢ)vᵢ/‖vᵢ‖².
Result: better training/val loss across model sizes; increasing gains as sequence length grows.
See more: arxiv.org/abs/2603.09078
[image attached]
32 replies · 81 reposts · 944 likes · 214.9K views
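A minimal PyTorch sketch of the fix exactly as stated in the tweet (not the authors' code; see the arXiv link for the real implementation): run ordinary attention, then project each output yᵢ off its own value vᵢ.

```python
import torch
import torch.nn.functional as F

def exclusive_self_attention(q, k, v, eps=1e-8):
    """Attention plus the XSA correction: z_i = y_i - (y_i.v_i) v_i / ||v_i||^2."""
    y = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal, for LM
    coef = (y * v).sum(-1, keepdim=True)                     # y_i . v_i
    sq_norm = (v * v).sum(-1, keepdim=True).clamp_min(eps)   # ||v_i||^2
    return y - coef / sq_norm * v
```

The correction adds only a couple of elementwise ops per token on top of attention, which is why the tweet can call it "(nearly) free".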
Kaizhao Liang retweeted
Satya Nadella@satyanadella·
Announcing Copilot Cowork, a new way to complete tasks and get work done in M365. When you hand off a task to Cowork, it turns your request into a plan and executes it across your apps and files, grounded in your work data and operating within M365’s security and governance boundaries.
2.3K replies · 2.1K reposts · 16.7K likes · 9.8M views
Kaizhao Liang retweeted
Andrej Karpathy@karpathy·
The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them.

Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms.

Git(Hub) is *almost* but not really suited for this. It has a softly built in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later.

I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run: github.com/karpathy/autor…

Alternatively, a PR has the benefit of exact commits: github.com/karpathy/autor… but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits.

But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back.

I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.
529 replies · 715 reposts · 7.6K likes · 1.1M views
Kaizhao Liang retweeted
Andrej Karpathy@karpathy·
(I still have the bigger cousin running on prod nanochat, working on a bigger model on 8×H100, which looks like this now. I'll just leave this running for a while...)
[image attached]
71 replies · 62 reposts · 2.1K likes · 422.3K views