Robin Jia

352 posts

@robinomial

Assistant Professor @CSatUSC | Previously Visiting Researcher @facebookai | Stanford CS PhD @StanfordNLP

Los Angeles, CA · Joined June 2018
925 Following · 4.7K Followers
Robin Jia retweeted
Blaise Agüera (@blaiseaguera.bsky.social)
Just as single cells became multicellular life, 8B+ brains are now joining with AI to form a collective superintelligence. At @USC's Institute on Ethics and Trust in Computing inaugural summit, @robinomial, Jinchi Lv, @paria_rd and I discussed navigating this transition.
Robin Jia retweeted
Ai2 (@allen_ai)
Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors. EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵
Robin Jia retweeted
Ryan Yixiang Wang (@RyanYixiang)
MoEs are everywhere in frontier models, and they are deployed as monolithic systems. But many applications only need a narrow slice of capabilities, e.g., math, code, or biomedical. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!
Quoting the Ai2 (@allen_ai) EMO announcement retweeted above.
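A rough illustration of what "selective use of experts" can mean at inference time: the sketch below performs ordinary top-k softmax routing over one token's router logits, but only among an allowed subset of experts. It is a generic MoE routing sketch, not EMO's actual router or training method; the function name, the logits, and the choice of k are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def route_top_k(
    logits: Sequence[float], k: int, allowed: Set[int]
) -> List[Tuple[int, float]]:
    """Hypothetical router: top-k experts among `allowed`, with renormalized softmax weights."""
    candidates = [i for i in range(len(logits)) if i in allowed]
    if not candidates:
        raise ValueError("no allowed experts")
    # Keep the k highest-scoring experts that are still enabled.
    top = sorted(candidates, key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected experts only (subtract the max for numerical stability).
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]


if __name__ == "__main__":
    # Made-up router logits for one token over 8 experts.
    logits = [2.0, 0.5, 1.5, -1.0, 3.0, 0.0, 1.0, 0.2]
    print(route_top_k(logits, k=2, allowed=set(range(8))))  # all experts available
    print(route_top_k(logits, k=2, allowed={0, 1, 2, 3}))   # only half the experts kept
```

Restricting `allowed` simply masks the other experts out of the routing decision. Whether quality holds up under that kind of masking is what the EMO work says it addresses through training, which this sketch does not model.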
Robin Jia retweeted
Deqing Fu (@DeqingFu)
Glad to share that this paper has been accepted to #ICML 2026 @icmlconf under an updated title: "Transformers Provably Learn Algorithmic Solutions for Graph Connectivity, But Only with the Right Data". 🥳
Quoting Deqing Fu (@DeqingFu):

Why do Transformers fail at algorithmic reasoning? We find it's not a lack of power, but a capacity mismatch. Our new preprint proves a tight, non-asymptotic bound: an L-layer model can only solve graph connectivity on graphs with a diameter up to exactly 3^L. arxiv.org/abs/2510.19753 🧵(1/N)

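To make the quoted bound concrete: if an L-layer model handles connectivity only up to diameter 3^L, the required depth grows logarithmically with diameter (L ≥ log_3 of the diameter). The sketch below does not model a Transformer at all; it computes a graph's diameter with breadth-first search and returns the smallest L satisfying 3^L ≥ diameter, taking the bound exactly as stated in the tweet. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list of an unweighted graph


def bfs_distances(graph: Graph, source: int) -> Dict[int, int]:
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def diameter(graph: Graph) -> int:
    """Longest shortest-path distance over all pairs of connected nodes."""
    return max(max(bfs_distances(graph, s).values()) for s in graph)


def min_layers(diam: int) -> int:
    """Smallest L with 3**L >= diam, i.e. the depth the quoted bound calls for."""
    layers = 0
    while 3 ** layers < diam:
        layers += 1
    return layers


def path_graph(n: int) -> Graph:
    """Undirected path 0 - 1 - ... - (n-1); its diameter is n - 1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


if __name__ == "__main__":
    for n in (4, 10, 28, 100):
        d = diameter(path_graph(n))
        print(f"path of {n} nodes: diameter {d}, needs L >= {min_layers(d)}")
```

For example, a path of 28 nodes has diameter 27 = 3^3, so under the quoted bound it would need at least three layers, while a path of 29 nodes would already need four.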
Robin Jia retweeted
Yuqing Yang (@yyqcode)
🧵 1/8 What should an LLM assistant remember across conversations? Existing memory work studies this one task at a time. But real-world assistants see all kinds of conversations, and that changes the problem. Introducing BEHEMOTH 🦣 + CluE 🌱: a benchmark & self-evolving method for heterogeneous memory extraction. 📄 Paper: arxiv.org/abs/2604.11610
Robin Jia retweeted
Deqing Fu (@DeqingFu)
After three papers on Fourier features in LLMs, I think there's a principle worth naming. How should we do science on an LLM? It corresponds to the existential questions:
> who am I? ↔ the phenomenon.
> where do I come from? ↔ the emergence.
> where am I going? ↔ the use.
🧵
Robin Jia retweeted
Deqing Fu (@DeqingFu)
New paper: Convergent Evolution: How Different Language Models Learn Similar Number Representations. Language models, classical word embeddings, and even raw token frequencies all develop the same Fourier features for numbers. But only some develop the underlying structure. 🧵
Robin Jia (@robinomial)
Excited to announce that Hubble, our new language model suite for studying LLM memorization, was recently featured in @ScienceMagazine! Hubble has also received an oral presentation slot at ICLR; if you're there, check out @johntzwei and @Aflah02101's presentation on Saturday!
Robin Jia retweeted
Johnny Tian-Zheng Wei (@johntzwei)
Hi all, I am going to Rio for ICLR! If you are interested in AI safety, governance, or reducing bad model behaviors, I would like to talk to you! My expertise is in statistics, law, and LLM pretraining and memorization.
Robin Jia retweeted
Tim Dettmers (@Tim_Dettmers)
We in the quantization community could quickly see this and were flabbergasted by the response to TurboQuant. Whenever I saw TurboQuant on my timeline, I found it hurtful, because the work of other academics who worked so hard was discounted.
Robin Jia retweeted
Johnny Tian-Zheng Wei (@johntzwei)
Hi all, I wrote a Claude Code tutorial for ML researchers who have never done SWE in their life: sunny-goal-aba.notion.site/claude-code-tu… I never learned SWE myself, so maybe there are others in the same boat. This is NOT just tips on how to write CLAUDE.md. 70% of my notes are on SWE principles.
Robin Jia retweeted
Qingchuan Yang (@qcyang20xx)
Private synthetic text generation has had the same problem for a while: privacy, quality, or efficiency - pick two 😵‍💫 We think EPSVec changes that 🚀 Paper: arxiv.org/abs/2602.21218
Robin Jia retweeted
Johnny Tian-Zheng Wei (@johntzwei)
How might the law hold AI accountable? How can we promote the development of responsible AI? The copyright challenge to AI reveals some clues, and I gave my perspective in a recent talk @stanfordnlp: youtu.be/9_I--Qg3_cA?si… Feel free to reach out if you have questions!
Robin Jia retweeted
Qinyuan Ye (@qinyuan_ye)
Now accepted to ICLR 2026! Looking back, stepping into mechanistic interpretability in my final PhD year was such a risky bet. But it turned out to be very rewarding and I enjoyed every bit of it. (Working on a blog post to share this winding journey...)
Quoting Qinyuan Ye (@qinyuan_ye):

1+1=3
2+2=5
3+3=?
Many language models (e.g., Llama 3 8B, Mistral v0.1 7B) will answer 7. But why? We dig into the model internals, uncover a function induction mechanism, and find that it's broadly reused when models encounter surprises during in-context learning. 🧵

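The arithmetic behind the answer 7: each in-context example is the true sum plus one, so a model that induces this off-by-one function and applies it to 3+3=6 outputs 7. The sketch below spells out only that arithmetic, using hypothetical helper names; it says nothing about how the mechanism is implemented inside Llama or Mistral.

```python
from typing import List, Tuple

Example = Tuple[int, int, int]  # (a, b, reported answer) for the prompt line "a+b=answer"


def induce_offset(examples: List[Example]) -> int:
    """Infer one constant c such that answer == a + b + c for every example."""
    offsets = {answer - (a + b) for a, b, answer in examples}
    if len(offsets) != 1:
        raise ValueError(f"examples disagree on the offset: {offsets}")
    return offsets.pop()


def complete(examples: List[Example], a: int, b: int) -> int:
    """Apply the induced off-by-c function to a new query a + b."""
    return a + b + induce_offset(examples)


if __name__ == "__main__":
    prompt = [(1, 1, 3), (2, 2, 5)]  # "1+1=3  2+2=5"
    print(complete(prompt, 3, 3))  # prints 7 (true sum 6, shifted by +1)
```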
Robin Jia retweeted
Deqing Fu (@DeqingFu)
Fourier Number Embedding (FoNE) is accepted to #ICLR2026. Super excited! Check it out here: fouriernumber.github.io
Quoting Deqing Fu (@DeqingFu):

In our recent NeurIPS 2024 paper (openreview.net/forum?id=i4Mut…), we find pretrained LLMs use Fourier features to add numbers (some have recently called it a helix). Is this representation powerful enough that LLMs naturally prefer it? Introducing FoNE (Fourier Number Embedding): one token is all you need to encode any number, precisely. 🖇️Blog post: fouriernumber.github.io

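A minimal sketch of the single-token idea, assuming each decimal digit position i is represented by a (cos, sin) pair with period 10^i, so that the pair's angle encodes x mod 10^i and the i-th digit can be read back exactly. This is an illustration written for this page, not the FoNE authors' code, and the paper's exact construction may differ; the function names are hypothetical.

```python
import math
from typing import List, Tuple


def fourier_encode(x: int, num_digits: int) -> List[Tuple[float, float]]:
    """Encode x as one (cos, sin) pair per digit position, with period 10**i."""
    return [
        (math.cos(2 * math.pi * x / 10 ** i), math.sin(2 * math.pi * x / 10 ** i))
        for i in range(1, num_digits + 1)
    ]


def fourier_decode(features: List[Tuple[float, float]]) -> int:
    """Recover the integer by reading x mod 10**i off each pair's angle."""
    value = 0
    for i, (c, s) in enumerate(features, start=1):
        period = 10 ** i
        angle = math.atan2(s, c) % (2 * math.pi)                  # angle in [0, 2*pi)
        residue = round(angle / (2 * math.pi) * period) % period  # x mod 10**i
        digit = residue // 10 ** (i - 1)                          # i-th digit from the right
        value += digit * 10 ** (i - 1)
    return value


if __name__ == "__main__":
    feats = fourier_encode(4072, num_digits=4)
    print(len(feats), fourier_decode(feats))  # 4 (cos, sin) pairs, decodes back to 4072
```

Because each residue is recovered by rounding, the round trip in this sketch is exact for non-negative integers below 10**num_digits; larger values wrap around modulo 10**num_digits.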