Deqing Fu

220 posts


@DeqingFu

PhD-ing @CSatUSC. Alum @UChicago, B.S. '20, M.S. '22. Interpretability of LLMs; DL Theory; NLP | prev. research intern @MetaAI @Google

Los Angeles · Joined November 2020
905 Following · 871 Followers
Deqing Fu retweeted
Qingchuan Yang
Qingchuan Yang@qcyang20xx·
Private synthetic text generation has had the same problem for a while: privacy, quality, or efficiency - pick two 😵‍💫 We think EPSVec changes that 🚀 Paper: arxiv.org/abs/2602.21218
Deqing Fu retweeted
Qinyuan Ye
Qinyuan Ye@qinyuan_ye·
Now accepted to ICLR 2026! Looking back, stepping into mechanistic interpretability in my final PhD year was such a risky bet. But it turned out to be very rewarding and I enjoyed every bit of it. (Working on a blog post to share this winding journey...)
Qinyuan Ye@qinyuan_ye

1+1=3 2+2=5 3+3=? Many language models (e.g., Llama 3 8B, Mistral v0.1 7B) will answer 7. But why? We dig into the model internals, uncover a function induction mechanism, and find that it’s broadly reused when models encounter surprises during in-context learning. 🧵
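The arithmetic behind the quoted puzzle can be sketched in a few lines (a toy illustration only, not the paper's mechanism): the in-context examples implicitly define a shifted addition f(a, b) = a + b + 1, and 7 is the answer consistent with that induced function.

```python
# Toy illustration of the induced pattern (not the model internals):
# each example's answer is the true sum plus a constant offset.
examples = [((1, 1), 3), ((2, 2), 5)]

# Infer the offset that all examples agree on.
offsets = {ans - (a + b) for (a, b), ans in examples}
assert len(offsets) == 1  # the examples are mutually consistent
offset = next(iter(offsets))

print(3 + 3 + offset)  # -> 7, matching the models' answer
```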

Deqing Fu retweeted
Stanford NLP Group
Stanford NLP Group@stanfordnlp·
Hi everyone! For this week's seminar, we are excited to host @johntzwei from USC!
Title: The shape of AI accountability and its contours in copyright
Abstract: How do we establish accountability for AI? While the shape of AI accountability at large remains amorphous, its contours are revealed in the ongoing copyright challenge to AI. In this talk, I’ll outline a legal theory of change and situate two works in this context. The first work focuses on the legal setup, theorizing how the judiciary can establish copyright accountability for LLMs by interrogating LLM training decisions and examining how they affect the model's memorization. Further progress in copyright then depends on deriving best practices for auditing and mitigating undesirable memorization. The second work focuses on scientific follow-up and our release of Hubble, a model suite to advance the study of LLM memorization. Hubble models are trained on English but also with controlled insertions of text designed to emulate key memorization risks. I’ll summarize the main findings and conclude on the potential of controlled insertions for safety-critical concerns beyond copyright.
Date and Time: Thursday, 01/29, 11:00 AM – 12:00 PM PST.
Zoom: stanford.zoom.us/j/93941842999?…
Excited to see everyone at the seminar!
Deqing Fu
Deqing Fu@DeqingFu·
Fourier Number Embedding (FoNE) is accepted to #ICLR2026. Super excited! Check it out here: fouriernumber.github.io
Deqing Fu@DeqingFu

In our recent NeurIPS 2024 paper (openreview.net/forum?id=i4Mut…), we find pretrained LLMs use Fourier features to add numbers (some have recently called it a helix). Is this representation truly so powerful that LLMs naturally prefer it? Introducing FoNE (Fourier Number Embedding): one token is all you need to encode any number, precisely. 🖇️Blog post: fouriernumber.github.io
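A minimal sketch of the idea as the tweet describes it (my own illustration with an assumed period set, not the official FoNE implementation): represent a number by cos/sin features of 2πn/T for a few periods T, so that, for example, two numbers that agree mod 10 share identical period-10 features while larger periods still tell them apart.

```python
import math

# Hypothetical period set for illustration; the paper's choice may differ.
PERIODS = [2, 5, 10, 100, 1000]

def fone_embed(n: float) -> list[float]:
    """Fourier-style number embedding: one [cos, sin] pair per period."""
    feats = []
    for T in PERIODS:
        angle = 2 * math.pi * n / T
        feats.extend([math.cos(angle), math.sin(angle)])
    return feats

# 3 and 13 agree mod 10, so their period-10 features (indices 4, 5)
# coincide exactly, while the period-100 features differ.
e3, e13 = fone_embed(3), fone_embed(13)
```

One number, one fixed-size vector: the per-period phases encode the digits exactly, which is the "one token per number" property the tweet highlights.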

Deqing Fu
Deqing Fu@DeqingFu·
@JeffDean I can see 10 types of people in the comments: computer scientists and those who are not.
Deqing Fu retweeted
Jakob Hansen
Jakob Hansen@_jakobhansen·
I'm pretty proud of this: We trained cross-layer transcoders for Qwen3 and built a dashboard for exploring the features using TDA-based graph visualizations.
Deqing Fu retweeted
BluelightAI
BluelightAI@bluelightai·
Today marks the first-ever release of Cross-Layer Transcoders for Qwen3. BluelightAI has trained CLTs for Qwen3-0.6B and 1.7B, creating an explorable set of interpretable features that capture how Qwen3 represents concepts and transforms information across its layers. The Qwen3 Explorer allows you to examine these features directly, identify structure in the model’s representations, and use this understanding to analyze behavior, diagnose failures, and guide adaptations of Qwen3-based systems.
Deqing Fu retweeted
Wang Bill Zhu
Wang Bill Zhu@BillJohn1235813·
Presenting VisualLens on Wednesday 11–2 #4804 at NeurIPS, with @DeqingFu. We show how personal photo libraries can power task-agnostic personalization, no domain-specific data needed. We'll talk about two new benchmarks for task-agnostic visual recommendation. Stop by to chat!
Deqing Fu
Deqing Fu@DeqingFu·
I'll be at #NeurIPS2025 from Dec 2-7. Looking forward to meeting old friends and making new ones!
Sanxing Chen
Sanxing Chen@sanxing_chen·
@deliprao @chrmanning Sad enough that the majority of citations today come from people who don’t actually seek out and read the paper but rely on quick feeds. If people didn’t encounter your paper early enough, they’ll pretend not to know it and republish it through a broken peer-review system.
Delip Rao e/σ
Delip Rao e/σ@deliprao·
This is a terrible trend followed by serial paper poasters, who do none of the work except convert ChatGPT summaries to tweet threads. While they do it, they will not credit the actual authors but name the big orgs (“new Deepmind paper” even if the work was done by the intern first author from a mid-tier university), hoping name-dropping the big orgs will give them more engagement. This robs the students involved of their hard-earned recognition. I have pointed this out to many of these people, including the person Lisa is quoting, and instead of addressing it, they just block me on Twitter, proving this is pure engagement grift. At this point, saying all this feels like an old man yelling at the sky.
Lisa Dunlap@lisabdunlap

So is the formula to just name the most famous institutions and call it an X paper? Neither the first nor the last author is from Anthropic or Stanford. I get that reputation matters for publicity but it does seem a little disrespectful

Deqing Fu retweeted
Johnny Tian-Zheng Wei
Johnny Tian-Zheng Wei@johntzwei·
Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization! Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵
Deqing Fu
Deqing Fu@DeqingFu·
This provides a concrete path to more reliable models: carefully curate training data to match the model's non-asymptotic capacity. (13/N)
Deqing Fu
Deqing Fu@DeqingFu·
Why do Transformers fail at algorithmic reasoning? We find it's not a lack of power, but a capacity mismatch. Our new preprint proves a tight, non-asymptotic bound: an L-layer model can only solve graph connectivity on graphs with a diameter up to exactly 3^L. arxiv.org/abs/2510.19753 🧵(1/N)
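Taking the tweet's bound at face value (an L-layer model handles diameters up to 3^L), the minimum depth needed for a given diameter follows by inverting it. A small sketch under that assumption, with a helper name of my own choosing:

```python
def min_layers(diameter: int) -> int:
    """Smallest L with 3**L >= diameter, per the claimed 3^L bound."""
    layers, reach = 0, 1
    while reach < diameter:
        reach *= 3
        layers += 1
    return layers

# Diameter 27 fits in 3 layers; diameter 28 forces a 4th.
print(min_layers(27), min_layers(28))  # -> 3 4
```

The integer loop avoids floating-point log, so boundary diameters like 27 = 3^3 are handled exactly.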