Seonglae Cho

135 posts

@SeonglaeC

Mechanistic Interpretability | Holistic AI | UCL

London, England · Joined February 2021

182 Following · 42 Followers

Seonglae Cho@SeonglaeC·16 May

Hackathon idea → MSc thesis → ICML 2026 main 🥳 CorrSteer: generation-time LLM steering via SAE features correlated with the target behavior. See you in Seoul 🇰🇷 seongland.com/article/corrst…

Seonglae Cho@SeonglaeC·1 Apr

@Fried_rice which stocks could be affected?

Chaofan Shou@Fried_rice·31 Mar

Claude Code source code has been leaked via a map file in their npm registry! Code: …a8527898604c1bbb12468b1581d95e.r2.dev/src.zip

3.3K

7.6K

48.6K

35.6M

Seonglae Cho reposted

Zhuoran Yang@zhuoran_yang·26 Mar

[New paper on in-context learning] "In-Context Linear Regression Demystified" (link: arxiv.org/abs/2503.12734). Joint work with @JLiangHe, @xintianpan, @siyuc3141. We establish a rather complete understanding of how one-layer multi-head attention solves in-context linear regression,

110

7.7K

Seonglae Cho@SeonglaeC·25 Feb

Pick a target venue, predict acceptance. Papers are matched by rating. 61K+ papers already ranked across 6 major venues. You can rebut if you disagree.

Seonglae Cho@SeonglaeC·25 Feb

Gamification of academic paper discovery. Expecting papers to find their place as matches scale. confarena.com/arena

Seonglae Cho@SeonglaeC·13 Jan

Both live in structured agent environments. The next frontier may be limited open environments, such as translation

Seonglae Cho@SeonglaeC·13 Jan

Vibe coding feels like driving a car. Is that why coding and driving are being solved together?

Seonglae Cho reposted

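草薙昭彦@nagix·4 Jan

Added, as preliminary figures, the test of two ballistic missiles of unidentified type on the morning of 1/4 (late night of 1/3 UTC). Launched from near Pyongyang, they reached a maximum altitude of 50 km; the first flew 900 km and the second 950 km. According to the Ministry of Defense announcement, they may have flown on an irregular trajectory. #北朝鮮ミサイル実験ビジュアライゼーション nagix.github.io/nk-missile-tes…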

14.9K

Seonglae Cho@SeonglaeC·15 Nov

Can't wait for #TheGreatAgentHack

Seonglae Cho@SeonglaeC·10 Sep

@NeelNanda5 Interesting work! Would you expect this real-time detection approach to work with IDTA as well?

180

Neel Nanda@NeelNanda5·9 Sep

I'm excited that, this year, interpretability finally works well enough to be practically useful in the real world! We found that, with enough effort into dataset construction, simple linear probes are cheap, real-time, token level hallucination detectors and beat baselines

Oscar Balcells Obeso@OBalcells

Imagine if ChatGPT highlighted every word it wasn't sure about. We built a streaming hallucination detector that flags hallucinations in real-time.

112

1.6K

119.9K

Seonglae Cho@SeonglaeC·19 Aug

This is the paper link: arxiv.org/abs/2508.12535. I’m open to researcher positions in London (in person); feel free to reach out 📩

Seonglae Cho@SeonglaeC·19 Aug

We believe scalable, interpretable SAE steering can improve both performance & safety.

Seonglae Cho@SeonglaeC·19 Aug

🚀 New paper drop! We found that inference-time SAE features strongly correlate with correctness, enabling fully automated steering without manual tuning.

147

Seonglae Cho@SeonglaeC·19 Aug

Beyond performance, CorrSteer is interpretable AI Control: it uncovers underlying capabilities that drive task performance. 🔒 For example, on HarmBench, in the LLaMA 8B model, safety-related features were extracted in most layers.

Seonglae Cho@SeonglaeC·19 Aug

We ran extensive ablations:
- Generation-token vs all-token pooling
- Raw activation vs SAE activation
- Mean vs Max strategies
- Multi-layer vs single-layer steering
…
👉 The key insight: generation-time token correlation drives performance.
