Dan Friedman

107 posts

@danfriedman0

PhD student @princeton_nlp

Joined September 2020
300 Following · 814 Followers

Pinned Tweet
Dan Friedman @danfriedman0
How can we understand neural chatbots in terms of interpretable, symbolic mechanisms? To explore this question, we constructed a Transformer that implements the classic ELIZA chatbot algorithm (with @Abhishek_034 and @danqi_chen). Paper: arxiv.org/abs/2407.10949 (1/6)
4 replies · 30 retweets · 143 likes · 12.9K views
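For context, here is a minimal sketch of the kind of symbolic ELIZA-style rule (keyword match plus reassembly template) that the paper builds into a Transformer. The rules and helper names below are illustrative assumptions, not taken from the paper, and the real ELIZA also swaps pronouns ("my" -> "your").

```python
# Toy illustration of the classic ELIZA algorithm referenced in the tweet:
# decomposition rules matched against the input, with a reassembly template
# filled from the captured text. Rules here are made up; see
# arxiv.org/abs/2407.10949 for the actual Transformer construction.
import re

RULES = [
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.*)", re.IGNORECASE), "Tell me more about feeling {0}."),
]
DEFAULT_RESPONSE = "Please go on."

def eliza_respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1).rstrip(".!?"))
    return DEFAULT_RESPONSE

print(eliza_respond("I am worried about my thesis."))
# -> Why do you say you are worried about my thesis?
```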
Dan Friedman retweeted
Michael Hu @michahu8
Training on a little 🤏 formal language BEFORE natural language can make pretraining more efficient! How and why does this work? The answer lies…Between Circuits and Chomsky. 🧵1/6👇
23 replies · 124 retweets · 931 likes · 132.7K views
Dan Friedman retweeted
Alex Wettig @_awettig
🤔 Ever wondered how prevalent some type of web content is during LM pre-training? In our new paper, we propose WebOrganizer which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐 Key takeaway: domains help us curate better pre-training data! 🧵/N
5 replies · 58 retweets · 210 likes · 49.4K views
Dan Friedman retweeted
Tianyu Gao @gaotianyu1350
Introducing MeCo (metadata conditioning then cooldown), a remarkably simple method that accelerates LM pre-training by simply prepending source URLs to training documents. arxiv.org/abs/2501.01956
4 replies · 43 retweets · 194 likes · 27.4K views
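As a rough illustration of the recipe described in the tweet (metadata conditioning, then a cooldown phase without metadata), here is a minimal data-formatting sketch. The field names and the exact template are assumptions; the actual recipe is in arxiv.org/abs/2501.01956.

```python
# Sketch of metadata conditioning in the spirit of MeCo: prepend each
# document's source URL during the main pre-training phase, then drop the
# metadata for the final "cooldown" phase so the model works without URLs
# at inference time. Field names and template are illustrative, not the
# authors' code.

def format_example(doc: dict, metadata_conditioning: bool) -> str:
    """Return the training text for one document."""
    if metadata_conditioning:
        return f"{doc['url']}\n\n{doc['text']}"
    return doc["text"]

corpus = [
    {"url": "en.wikipedia.org/wiki/ELIZA", "text": "ELIZA is an early chatbot..."},
    {"url": "example.com/some-blog-post", "text": "A web document..."},
]

# Main phase: URL-conditioned documents.
main_phase = [format_example(d, metadata_conditioning=True) for d in corpus]
# Cooldown phase: plain documents, no metadata.
cooldown_phase = [format_example(d, metadata_conditioning=False) for d in corpus]
print(main_phase[0])
```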
Dan Friedman retweeted
John Hewitt @johnhewtt
I'm hiring PhD students in computer science at Columbia! Our lab will tackle core challenges in understanding and controlling neural models that interact with language. For example:
- methods for LLM control
- discoveries of LLM properties
- pretraining for understanding
18 replies · 154 retweets · 875 likes · 106.7K views
Dan Friedman retweeted
Xi Ye @xiye_nlp
🔔 I'm recruiting multiple fully funded MSc/PhD students @UAlberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!
15 replies · 159 retweets · 520 likes · 69.8K views
Dan Friedman retweeted
Tom McCoy @RTomMcCoy
🤖🧠 I'll be considering applications for postdocs & PhD students to start at Yale in Fall 2025! If you are interested in the intersection of linguistics, cognitive science, & AI, I encourage you to apply! Postdoc link: rtmccoy.com/prospective_po… PhD link: rtmccoy.com/prospective_st…
3 replies · 78 retweets · 337 likes · 40.1K views
Dan Friedman retweeted
Angelina Wang @angelinawang.bsky.social
I am recruiting PhD students for Fall 2025 at Cornell Tech! If you are interested in topics relating to machine learning fairness, algorithmic bias, or evaluation, apply and mention my name in your application: infosci.cornell.edu/phd/admissions Also, go vote!
15 replies · 231 retweets · 924 likes · 105.5K views
Dan Friedman retweeted
Aaron Mueller @amuuueller
I'm recruiting PhD students for our new lab, coming to Boston University in Fall 2025! Our lab aims to understand, improve, and precisely control how language is learned and used in natural language systems (such as language models). Details below!
11 replies · 184 retweets · 710 likes · 63.1K views
Dan Friedman retweeted
Abhishek Panigrahi @Abhishek_034
Progressive distillation, where a student model learns from multiple checkpoints of the teacher, has been shown to improve the student. But why? We show it induces an implicit curriculum that accelerates training. Work w/ @BingbinL, @SadhikaMalladi, @risteski_a, @SurbhiGoel_
2 replies · 25 retweets · 92 likes · 19.7K views
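To make the setup concrete, here is a rough sketch of distilling a student against a sequence of teacher checkpoints rather than only the final one. The model and data-loader interfaces and the KL-based objective are assumptions for illustration, not the authors' implementation.

```python
# Sketch of progressive distillation: the student is distilled against a
# sequence of intermediate teacher checkpoints (earliest to final) instead
# of only the final teacher. Models are assumed to return logits.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, batch, optimizer, temperature=2.0):
    """One KL-distillation step of the student toward a fixed teacher."""
    with torch.no_grad():
        teacher_logits = teacher(batch)
    student_logits = student(batch)
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def progressive_distillation(student, teacher_checkpoints, data_loader, optimizer):
    """Loop over teacher checkpoints in training order; per the paper, this
    sequence acts as an implicit curriculum for the student."""
    for teacher in teacher_checkpoints:
        teacher.eval()
        for batch in data_loader:
            distill_step(student, teacher, batch, optimizer)
```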
Dan Friedman retweeted
Tom McCoy @RTomMcCoy
🤖🧠NOW OUT IN PNAS🧠🤖 Language models show many surprising behaviors. E.g., they can count 30 items more easily than 29 In Embers of Autoregression, we explain such effects by analyzing what LMs are trained to do pnas.org/doi/10.1073/pn… Major updates since the preprint! 1/n
9 replies · 81 retweets · 357 likes · 54.1K views
Dan Friedman retweeted
Akshara Prabhakar @aksh_555
🤖 NEW PAPER 🤖 Chain-of-thought reasoning (CoT) can dramatically improve LLM performance Q: But what *type* of reasoning do LLMs use when performing CoT? Is it genuine reasoning, or is it driven by shallow heuristics like memorization? A: Both! 🔗 arxiv.org/abs/2407.01687 1/n
6 replies · 46 retweets · 308 likes · 77.3K views
Dan Friedman retweeted
John Yang @jyangballin
We're launching SWE-bench Multimodal to evaluate agents' ability to solve visual GitHub issues.
- 617 *brand new* tasks from 17 JavaScript repos
- Each task has an image!
Existing agents struggle here! We present SWE-agent Multimodal to remedy some issues. Led w/ @_carlosejimenez 🧵
8 replies · 58 retweets · 269 likes · 52.4K views
Dan Friedman retweeted
Tianyu Gao @gaotianyu1350
Very proud to introduce two of our recent long-context works:
HELMET (best long-context benchmark imo): shorturl.at/JnBHD
ProLong (a cont'd training & SFT recipe + a SoTA 512K 8B model): shorturl.at/XQV7a
Here is a story of how we arrived there.
5 replies · 46 retweets · 197 likes · 55.9K views
Dan Friedman retweeted
Tianyu Gao @gaotianyu1350
Meet ProLong, a Llama-3 based long-context chat model! huggingface.co/princeton-nlp/… (64K here, 512K coming soon) ProLong uses a simple recipe (short/long pre-training data + short UltraChat, no synthetic instructions) and achieves top performance on a series of long-context tasks.
4 replies · 24 retweets · 139 likes · 21.2K views
Dan Friedman retweeted
Mengzhou Xia @xiamengzhou
🌟 Exciting update! Gemma2-9b + SimPO ranks at the top of AlpacaEval 2 (❗LC 72.4) and leads the WildBench leaderboard among similar-sized models 🚀 SimPO is at least as competitive as (and often outperforms) DPO across all benchmarks, despite its simplicity.
✨ Recipe: on-policy data annotated by a strong reward model + SimPO
💪 Strong performance on chat benchmarks (i.e., AlpacaEval 2, Arena-Hard, and WildBench)
📈 Retains GSM8K and MMLU scores in ZeroEval
🔢 Understands that 9.11 is smaller than 9.8
🔗 More details at github.com/princeton-nlp/…
🔬 Through extensive experiments, we find that:
- gemma-2-9b-it exhibits significantly less catastrophic forgetting than Llama-3-8b-Instruct during fine-tuning and is more robust to different learning rates
- With a small learning rate, both DPO and SimPO can improve math domains
- SimPO has large gains over DPO when the SFT model is weaker or the PO data is noisy. The gap is reduced when the model and data quality improve.
- We also made several major updates to our preprint: added more baselines (i.e., RRHF, SLiC-HF, and CPO), conducted a KL divergence analysis since SimPO has no regularization, and investigated adding an additional SFT term.
🌟 More insights in our preprint: arxiv.org/abs/2405.14734. We welcome feedback and look forward to discussions!
Joint work with @yumeng0818 and @danqi_chen. Many thanks to @yanndubs @billyuchenlin @infwinston @LiTianleli for maintaining the amazing benchmarks!
8 replies · 40 retweets · 176 likes · 42.1K views
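For readers who have not seen SimPO, here is a minimal sketch of its length-normalized, reference-free preference loss with a target reward margin, following the formulation in the preprint (arxiv.org/abs/2405.14734). Tensor names and the toy numbers below are illustrative, not the authors' code.

```python
# Sketch of the SimPO objective: the implicit reward is the length-averaged
# log-probability of a response under the policy (no reference model), and
# the loss is a Bradley-Terry style sigmoid loss with a target margin gamma.
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, chosen_lens, rejected_logps, rejected_lens,
               beta=2.0, gamma=0.5):
    """chosen_logps / rejected_logps: summed token log-probs of each response
    under the policy; *_lens: response lengths in tokens."""
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()

# Toy usage with made-up numbers.
loss = simpo_loss(
    chosen_logps=torch.tensor([-35.0]), chosen_lens=torch.tensor([20.0]),
    rejected_logps=torch.tensor([-60.0]), rejected_lens=torch.tensor([25.0]),
)
print(loss)
```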
Dan Friedman @danfriedman0
@BingbinL I’ll also have a poster about this project at the ICML Mechanistic Interpretability workshop in Vienna next week (icml2024mi.pages.dev) and would love to chat there 🙂
0 replies · 0 retweets · 3 likes · 230 views