Dan Friedman

107 posts

@danfriedman0

PhD student @princeton_nlp

Joined September 2020
300 Following · 814 Followers

Pinned Tweet
Dan Friedman @danfriedman0
How can we understand neural chatbots in terms of interpretable, symbolic mechanisms? To explore this question, we constructed a Transformer that implements the classic ELIZA chatbot algorithm (with @Abhishek_034 and @danqi_chen). Paper: arxiv.org/abs/2407.10949 (1/6)
4 replies · 30 retweets · 143 likes · 12.9K views
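For context, here is a minimal sketch of the kind of symbolic ELIZA-style rule (keyword match plus reassembly template) that the paper builds into a Transformer. The rules and helper names below are illustrative assumptions, not taken from the paper, and the real ELIZA also swaps pronouns ("my" -> "your").

```python
# Toy illustration of the classic ELIZA algorithm referenced in the tweet:
# decomposition rules matched against the input, with a reassembly template
# filled from the captured text. Rules here are made up; see
# arxiv.org/abs/2407.10949 for the actual Transformer construction.
import re

RULES = [
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.*)", re.IGNORECASE), "Tell me more about feeling {0}."),
]
DEFAULT_RESPONSE = "Please go on."

def eliza_respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1).rstrip(".!?"))
    return DEFAULT_RESPONSE

print(eliza_respond("I am worried about my thesis."))
# -> Why do you say you are worried about my thesis?
```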
Dan Friedman retweeted
Michael Hu @michahu8
Training on a little 🤏 formal language BEFORE natural language can make pretraining more efficient! How and why does this work? The answer lies…Between Circuits and Chomsky. 🧵1/6👇
23 replies · 124 retweets · 931 likes · 132.7K views
Dan Friedman retweeted
Alex Wettig @_awettig
🤔 Ever wondered how prevalent some type of web content is during LM pre-training? In our new paper, we propose WebOrganizer which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐 Key takeaway: domains help us curate better pre-training data! 🧵/N
5 replies · 58 retweets · 210 likes · 49.4K views
Dan Friedman retweeted
Tianyu Gao @gaotianyu1350
Introducing MeCo (metadata conditioning then cooldown), a remarkably simple method that accelerates LM pre-training by simply prepending source URLs to training documents. arxiv.org/abs/2501.01956
4 replies · 43 retweets · 194 likes · 27.4K views
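As a rough illustration of the recipe described in the tweet (metadata conditioning, then a cooldown phase without metadata), here is a minimal data-formatting sketch. The field names and the exact template are assumptions; the actual recipe is in arxiv.org/abs/2501.01956.

```python
# Sketch of metadata conditioning in the spirit of MeCo: prepend each
# document's source URL during the main pre-training phase, then drop the
# metadata for the final "cooldown" phase so the model works without URLs
# at inference time. Field names and template are illustrative, not the
# authors' code.

def format_example(doc: dict, metadata_conditioning: bool) -> str:
    """Return the training text for one document."""
    if metadata_conditioning:
        return f"{doc['url']}\n\n{doc['text']}"
    return doc["text"]

corpus = [
    {"url": "en.wikipedia.org/wiki/ELIZA", "text": "ELIZA is an early chatbot..."},
    {"url": "example.com/some-blog-post", "text": "A web document..."},
]

# Main phase: URL-conditioned documents.
main_phase = [format_example(d, metadata_conditioning=True) for d in corpus]
# Cooldown phase: plain documents, no metadata.
cooldown_phase = [format_example(d, metadata_conditioning=False) for d in corpus]
print(main_phase[0])
```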
Dan Friedman retweeted
John Hewitt @johnhewtt
I'm hiring PhD students in computer science at Columbia! Our lab will tackle core challenges in understanding and controlling neural models that interact with language. For example:
- methods for LLM control
- discoveries of LLM properties
- pretraining for understanding
18 replies · 154 retweets · 875 likes · 106.7K views
Dan Friedman retweeted
Xi Ye @xiye_nlp
🔔 I'm recruiting multiple fully funded MSc/PhD students @UAlberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!
15 replies · 159 retweets · 520 likes · 69.8K views
Dan Friedman retweeted
Tom McCoy @RTomMcCoy
🤖🧠 I'll be considering applications for postdocs & PhD students to start at Yale in Fall 2025! If you are interested in the intersection of linguistics, cognitive science, & AI, I encourage you to apply! Postdoc link: rtmccoy.com/prospective_po… PhD link: rtmccoy.com/prospective_st…
3 replies · 78 retweets · 337 likes · 40.1K views
Dan Friedman retweeted
Angelina Wang @angelinawang.bsky.social
I am recruiting PhD students for Fall 2025 at Cornell Tech! If you are interested in topics relating to machine learning fairness, algorithmic bias, or evaluation, apply and mention my name in your application: infosci.cornell.edu/phd/admissions Also, go vote!
15 replies · 231 retweets · 924 likes · 105.5K views
Dan Friedman retweeted
Aaron Mueller @amuuueller
I'm recruiting PhD students for our new lab, coming to Boston University in Fall 2025! Our lab aims to understand, improve, and precisely control how language is learned and used in natural language systems (such as language models). Details below!
11 replies · 184 retweets · 710 likes · 63.1K views
Dan Friedman retweeted
Abhishek Panigrahi @Abhishek_034
Progressive distillation, where a student model learns from multiple checkpoints of the teacher, has been shown to improve the student. But why? We show it induces an implicit curriculum that accelerates training. Work w/ @BingbinL, @SadhikaMalladi, @risteski_a, @SurbhiGoel_
2 replies · 25 retweets · 92 likes · 19.7K views
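To make the setup concrete, here is a rough sketch of distilling a student against a sequence of teacher checkpoints rather than only the final one. The model and data-loader interfaces and the KL-based objective are assumptions for illustration, not the authors' implementation.

```python
# Sketch of progressive distillation: the student is distilled against a
# sequence of intermediate teacher checkpoints (earliest to final) instead
# of only the final teacher. Models are assumed to return logits.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, batch, optimizer, temperature=2.0):
    """One KL-distillation step of the student toward a fixed teacher."""
    with torch.no_grad():
        teacher_logits = teacher(batch)
    student_logits = student(batch)
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def progressive_distillation(student, teacher_checkpoints, data_loader, optimizer):
    """Loop over teacher checkpoints in training order; per the paper, this
    sequence acts as an implicit curriculum for the student."""
    for teacher in teacher_checkpoints:
        teacher.eval()
        for batch in data_loader:
            distill_step(student, teacher, batch, optimizer)
```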
Dan Friedman retweeted
Tom McCoy @RTomMcCoy
🤖🧠NOW OUT IN PNAS🧠🤖 Language models show many surprising behaviors. E.g., they can count 30 items more easily than 29 In Embers of Autoregression, we explain such effects by analyzing what LMs are trained to do pnas.org/doi/10.1073/pn… Major updates since the preprint! 1/n
9 replies · 81 retweets · 357 likes · 54.1K views
Dan Friedman retweeted
Akshara Prabhakar @aksh_555
🤖 NEW PAPER 🤖 Chain-of-thought reasoning (CoT) can dramatically improve LLM performance Q: But what *type* of reasoning do LLMs use when performing CoT? Is it genuine reasoning, or is it driven by shallow heuristics like memorization? A: Both! 🔗 arxiv.org/abs/2407.01687 1/n
6 replies · 46 retweets · 308 likes · 77.3K views
Dan Friedman retweeted
John Yang @jyangballin
We're launching SWE-bench Multimodal to evaluate agents' ability to solve visual GitHub issues.
- 617 *brand new* tasks from 17 JavaScript repos
- Each task has an image!
Existing agents struggle here! We present SWE-agent Multimodal to remedy some issues. Led w/ @_carlosejimenez 🧵
8 replies · 58 retweets · 269 likes · 52.4K views
Dan Friedman retweeted
Tianyu Gao @gaotianyu1350
Very proud to introduce two of our recent long-context works:
HELMET (best long-context benchmark imo): shorturl.at/JnBHD
ProLong (a cont'd training & SFT recipe + a SoTA 512K 8B model): shorturl.at/XQV7a
Here is a story of how we arrived there.
5 replies · 46 retweets · 197 likes · 55.9K views
Dan Friedman retweeted
Tianyu Gao @gaotianyu1350
Meet ProLong, a Llama-3 based long-context chat model! huggingface.co/princeton-nlp/… (64K here, 512K coming soon) ProLong uses a simple recipe (short/long pre-training data + short UltraChat, no synthetic instructions) and achieves top performance on a series of long-context tasks.
4 replies · 24 retweets · 139 likes · 21.2K views
Dan Friedman retweeted
Mengzhou Xia @xiamengzhou
🌟 Exciting update! Gemma2-9b + SimPO ranks at the top of AlpacaEval 2 (❗LC 72.4) and leads the WildBench leaderboard among similar-sized models 🚀 SimPO is at least as competitive as (and often outperforms) DPO across all benchmarks, despite its simplicity.
✨ Recipe: on-policy data annotated by a strong reward model + SimPO
💪 Strong performance on chat benchmarks (i.e., AlpacaEval 2, Arena-Hard, and WildBench)
📈 Retains GSM8K and MMLU scores in ZeroEval
🔢 Understands that 9.11 is smaller than 9.8
🔗 More details at github.com/princeton-nlp/…
🔬 Through extensive experiments, we find that:
- gemma-2-9b-it exhibits significantly less catastrophic forgetting than Llama-3-8b-Instruct during fine-tuning and is more robust to different learning rates
- With a small learning rate, both DPO and SimPO can improve math domains
- SimPO has large gains over DPO when the SFT model is weaker or the PO data is noisy. The gap is reduced when the model and data quality improve.
- We also made several major updates to our preprint: added more baselines (i.e., RRHF, SLiC-HF, and CPO), conducted a KL divergence analysis since SimPO has no regularization, and investigated adding an additional SFT term.
🌟 More insights in our preprint: arxiv.org/abs/2405.14734. We welcome feedback and look forward to discussions!
Joint work with @yumeng0818 and @danqi_chen. Many thanks to @yanndubs @billyuchenlin @infwinston @LiTianleli for maintaining the amazing benchmarks!
8 replies · 40 retweets · 176 likes · 42.1K views
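For readers who have not seen SimPO, here is a minimal sketch of its length-normalized, reference-free preference loss with a target reward margin, following the formulation in the preprint (arxiv.org/abs/2405.14734). Tensor names and the toy numbers below are illustrative, not the authors' code.

```python
# Sketch of the SimPO objective: the implicit reward is the length-averaged
# log-probability of a response under the policy (no reference model), and
# the loss is a Bradley-Terry style sigmoid loss with a target margin gamma.
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, chosen_lens, rejected_logps, rejected_lens,
               beta=2.0, gamma=0.5):
    """chosen_logps / rejected_logps: summed token log-probs of each response
    under the policy; *_lens: response lengths in tokens."""
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()

# Toy usage with made-up numbers.
loss = simpo_loss(
    chosen_logps=torch.tensor([-35.0]), chosen_lens=torch.tensor([20.0]),
    rejected_logps=torch.tensor([-60.0]), rejected_lens=torch.tensor([25.0]),
)
print(loss)
```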
Dan Friedman @danfriedman0
@BingbinL I’ll also have a poster about this project at the ICML Mechanistic Interpretability workshop in Vienna next week (icml2024mi.pages.dev) and would love to chat there 🙂
0 replies · 0 retweets · 3 likes · 230 views