Seonglae Cho

135 posts

@SeonglaeC

Mechanistic Interpretability | Holistic AI | UCL

London, England · Joined February 2021
182 Following · 42 Followers
Seonglae Cho@SeonglaeC·
Hackathon idea → MSc thesis → ICML 2026 main 🥳 CorrSteer: generation-time LLM steering via SAE features correlated with the target behavior. See you in Seoul 🇰🇷 seongland.com/article/corrst…
English
1
0
1
51
Seonglae Cho retweeted
Zhuoran Yang@zhuoran_yang·
[New paper on in-context learning] "In-Context Linear Regression Demystified" (link: arxiv.org/abs/2503.12734). Joint work with @JLiangHe, @xintianpan, @siyuc3141. We establish a rather complete understanding of how one-layer multi-head attention solves in-context linear regression,
English
2
24
110
7.7K
Seonglae Cho@SeonglaeC·
Pick a target venue, predict acceptance. Papers are matched by rating, with 61K+ papers already ranked across 6 major venues. You can rebut if you disagree.
English
0
0
0
23
Seonglae Cho@SeonglaeC·
Gamification of academic paper discovery. Expecting papers to find their place as matches scale. confarena.com/arena
English
1
0
1
38
Seonglae Cho@SeonglaeC·
Both live in structured agent environments. The next frontier may be limited open environments, such as translation
English
0
0
0
25
Seonglae Cho@SeonglaeC·
Vibe coding feels like driving a car. Is that why coding and driving are being solved together?
English
1
0
0
31
Seonglae Cho retweeted
草薙 昭彦@nagix·
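Added, as preliminary figures, the test of two ballistic missiles of unknown type on the morning of 1/4 (late night on 1/3 UTC). They were launched from near Pyongyang and reached a maximum altitude of 50 km; the first flew 900 km and the second 950 km. According to the Ministry of Defense announcement, they may have flown on an irregular trajectory. #北朝鮮ミサイル実験ビジュアライゼーション nagix.github.io/nk-missile-tes…
Japanese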
1
18
52
14.9K
Seonglae Cho@SeonglaeC·
@NeelNanda5 Interesting work! Would you expect this real-time detection approach to work with IDTA as well?
English
0
0
0
180
Neel Nanda@NeelNanda5·
I'm excited that, this year, interpretability finally works well enough to be practically useful in the real world! We found that, with enough effort into dataset construction, simple linear probes are cheap, real-time, token level hallucination detectors and beat baselines
Oscar Balcells Obeso@OBalcells

Imagine if ChatGPT highlighted every word it wasn't sure about. We built a streaming hallucination detector that flags hallucinations in real-time.

English
22
112
1.6K
119.9K
Seonglae Cho@SeonglaeC·
We believe scalable, interpretable SAE steering can improve both performance & safety.
English
1
0
0
75
Seonglae Cho@SeonglaeC·
🚀 New paper drop! We found that inference-time SAE features strongly correlate with correctness, enabling fully automated steering without manual tuning.
English
1
1
5
147
Seonglae Cho@SeonglaeC·
Beyond performance, CorrSteer is interpretable AI Control: it uncovers underlying capabilities that drive task performance. 🔒 For example, on HarmBench, in the LLaMA 8B model, safety-related features were extracted in most layers.
English
0
0
0
37
Seonglae Cho@SeonglaeC·
We ran extensive ablations:
- Generation-token vs all-token pooling
- Raw activation vs SAE activation
- Mean vs Max strategies
- Multi-layer vs single-layer steering
…
👉 The key insight: generation-time token correlation drives performance.
English
1
0
1
42
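
A rough sketch of the selection step the CorrSteer posts above describe: pool SAE feature activations over generated tokens only, then rank features by how strongly they correlate with correctness. The posts do not spell out the procedure, so everything here is an assumption for illustration: the use of Pearson correlation, the array shapes, and the helper names `pool_generated`, `correlate`, and `top_features` are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def pool_generated(acts, gen_mask, how="max"):
    """Pool one sample's SAE activations over generated tokens only.

    acts: (tokens, features) array of SAE activations (assumed layout).
    gen_mask: boolean (tokens,) array, True for generated tokens.
    how: "max" or "mean", mirroring the mean-vs-max ablation.
    """
    gen = np.asarray(acts, dtype=float)[np.asarray(gen_mask, dtype=bool)]
    return gen.max(axis=0) if how == "max" else gen.mean(axis=0)


def correlate(pooled, correct):
    """Pearson correlation of each feature with a 0/1 correctness label.

    pooled: (samples, features) pooled activations.
    correct: (samples,) array of 0/1 labels.
    Features with zero variance get a correlation of 0.
    """
    pooled = np.asarray(pooled, dtype=float)
    correct = np.asarray(correct, dtype=float)
    x = pooled - pooled.mean(axis=0)
    y = correct - correct.mean()
    denom = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    return np.divide(x.T @ y, denom, out=np.zeros(x.shape[1]), where=denom > 0)


def top_features(r, k):
    """Indices of the k features most positively correlated with correctness."""
    return np.argsort(-np.asarray(r))[:k]
```

In use, one would call `pool_generated` per sample, stack the results into a `(samples, features)` matrix, and pass it with the correctness labels to `correlate`; the indices from `top_features` are then candidates for steering. How the selected features are actually applied at generation time is not covered by this sketch.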