David Wadden
53 posts
@davidjwadden
Graduate student at @uwcse studying NLP.
Joined May 2020
98 Following · 387 Followers
David Wadden retweeted
Kyle Lo @kylelostat
@soldni and I are arriving at #ACL2024 🇹🇭 today! come find us at our talks & poster sessions for our OLMo & Dolma projects with @allen_ai & frens 🤩 also don't miss our poster on KIWI 🥝 for interactive science QA w/ our intern @brunchavecmoi & mentors @eunsolc @davidjwadden
David Wadden retweeted
Yuling Gu @gu_yuling
LLMs are evaluated on the same tasks in so many different ways! 🤯 ✨ We introduce OLMES – a standard for reproducible LLM evaluations that is open, practical, completely documented, and can be applied to current leaderboards & eval code bases! ✨ 📜 arxiv.org/abs/2406.08446 1/
David Wadden retweeted
Kejian Shi @shi_kejian
Introducing SciRIFF, a toolkit to enhance LLM instruction-following over scientific literature. 137k expert demonstrations in 5 categories: IE, summarization, QA, entailment, and classification; models up to 70b and code to science-tune your checkpoints included! Read more in 🧵:
David Wadden retweeted
Ai2 @allen_ai
Looking for a dataset to enhance language model instruction-following over scientific literature? Introducing SciRIFF, a dataset of 137K expert-written demonstrations spanning 5 essential task categories for literature understanding: information extraction, summarization, question answering, claim verification, and classification. Download on HuggingFace: huggingface.co/collections/al…
David Wadden retweeted
Hanna Hajishirzi @HannaHajishirzi
Introducing our best OLMo yet. OLMo 1.7-7B outperforms LLaMa2-7B, approaching LLaMa2-13B on MMLU and GSM8k. High-quality data and staged training are key. I am so proud of our team for making such a significant improvement in a short period after our first release.
Ai2 @allen_ai

Announcing our latest addition to the OLMo family, OLMo 1.7!🎉Our team's efforts to improve data quality, training procedures and model architecture have led to a leap in performance. See how OLMo 1.7 stacks up against its peers and peek into the technical details on the blog: blog.allenai.org/olmo-1-7-7b-a-…

David Wadden retweeted
Nathan Lambert @natolambert
Excited to share something that we've needed since the early open RLHF days: RewardBench, the first benchmark for reward models.
1. We evaluated 30+ of the currently available RMs (w/ DPO too).
2. We created new datasets covering chat, safety, code, math, etc.
We learned a lot. We hope this is a major step in understanding why reward models work, rather than just how they do for RLHF.
In short, we created pairs of responses, one good and one bad (with manual review), and see where reward models agree! It's a simple and powerful process.
Key takeaways:
* Running reward models is hard; we built infra to make this easier.
* We're already using this to learn more about PPO RLHF training (more on this soon).
* Reward models mirror the refusal behavior we're confused about in RLHF. Some refuse everything (including llama 2 style stuff), some refuse nothing, and few models handle both cases well.
* Datasets like Anthropic HH / Learning to Summarize only take us so far (and don't work for DPO).
* Scaling matters (big models win again).
Here's the current leaderboard: I'm very excited about future work. Figuring out what values are reflected, generative RMs, better RMs for training, more on DPO, and everything in between.
Links!
Leaderboard: huggingface.co/spaces/allenai…
Code: github.com/allenai/reward…
Paper (arxiv soon): github.com/allenai/reward…
Eval dataset: huggingface.co/datasets/allen…
YouTube walkthrough: youtu.be/CAaHAfCqrBA
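The pairwise protocol described above (score a good and a bad response, count how often the reward model prefers the good one) can be sketched in a few lines. This is a minimal illustration, not the actual RewardBench code; `toy_reward` and the example pairs are hypothetical stand-ins for a real reward model and dataset.

```python
# Minimal sketch of pairwise reward-model evaluation: score the chosen
# and rejected response for each prompt, and report the fraction of
# pairs where the chosen response wins.

def accuracy(reward_fn, pairs):
    """pairs: list of (prompt, chosen, rejected) triples."""
    wins = sum(
        reward_fn(prompt, chosen) > reward_fn(prompt, rejected)
        for prompt, chosen, rejected in pairs
    )
    return wins / len(pairs)

# Toy stand-in for a real reward model: longer responses score higher.
toy_reward = lambda prompt, response: len(response)

pairs = [
    ("What is 2+2?", "2 + 2 equals 4.", "5"),
    ("Capital of France?", "The capital of France is Paris.", "idk"),
]
print(accuracy(toy_reward, pairs))  # → 1.0
```

In the real benchmark, `reward_fn` would be a trained reward model (or a DPO model's implied reward), and the pairs come from the manually reviewed chat, safety, code, and math subsets.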
David Wadden retweeted
Fangyuan Xu @brunchavecmoi
Instruction-following capabilities of LLMs are a prerequisite to AI ✒️ writing assistance. How good are current LLMs at this task? We present 🥝 𝗞𝗜𝗪𝗜, a dataset with instructions for knowledge-intensive, document-grounded writing of long-form answers to research questions.
David Wadden retweetledi
Semantic Scholar Research @ AI2
Semantic Scholar Research @ AI2@ai2_s2research·
📣 Job opportunities at Semantic Scholar Research @ the Allen Institute for AI (AI2) for post-doctoral & pre-doctoral researchers starting in 2024! 📣 Our team works on NLP and HCI research with a focus on open LLMs and LLM-powered research support tools and assistants.
David Wadden retweeted
Ai2 @allen_ai
OLMo is here! And it’s 100% open. It’s a state-of-the-art LLM and we are releasing it with all pre-training data and code. Let’s get to work on understanding the science behind LLMs. Learn more about the framework and how to access it here: blog.allenai.org/olmo-open-lang…
David Wadden retweeted
Yanai Elazar @yanaiela
This is fantastic news!! Somewhat of a coincidence, but our paper studying the effect of early arXiving on acceptance, which suggested this effect is small and that the anonymity period does not fulfill its purpose, was accepted to CLeaR (Causal Learning and Reasoning) 2024 twitter.com/yanaiela/statu…
Graham Neubig @gneubig

ACL has removed the anonymity period. This means that ACL submissions can be posted and discussed online at any time, although extensive PR is discouraged. aclweb.org/adminwiki/imag…

David Wadden retweeted
Semantic Scholar @SemanticScholar
New feature alert 🚨On each paper page, scroll down to find AI-generated Topic pages related to the paper, which include topic definitions, papers most cited for the topic, and more! Now available for Computer Science fields. Here’s an example: semanticscholar.org/paper/SPECTER%…
David Wadden retweeted
Hamish Ivison @hamishivi
Check out the Tulu 2 suite 🐪, a set of Llama-2 models finetuned+DPO-trained on a mixture of publicly available datasets! Our best-performing models are competitive with SoTA open models on a range of benchmarks incl. AlpacaEval and MT-Bench. 📜Paper: arxiv.org/abs/2311.10702
Hamish Ivison @hamishivi

Check out our new 70B DPO model here: huggingface.co/allenai/tulu-2… AFAIK currently the best model on AlpacaEval with a public finetuning set! More details once the AI sphere calms down a bit... 😅

David Wadden retweeted
Orion Weller @ ICLR’26 @orionweller
Using LLMs for query or document expansion in retrieval (e.g. HyDE and Doc2Query) has scores going 📈. But do these approaches work for all IR models and under different types of distribution shift? Turns out it's actually more 📉 🚨 📝 (arxiv soon): orionweller.github.io/assets/pdf/LLM…
David Wadden retweeted
Daniel Weld @dsweld
Interested in a better way to explore #VLDB2023 papers? Try exp-sum.apps.allenai.org for an LLM-powered way to probe those papers… * Ask questions w/ a single click * Explore answer provenance * Dive deep w/ recursive questions Powered by @SemanticScholar
David Wadden retweeted
Ashish Sharma @sharma_ashish_2
Absolutely thrilled🎉 to receive the @aclmeeting #ACL2023NLP 🏆Outstanding Paper Award🏆 for our work on cognitive reframing of negative thoughts! A huge shoutout to the diverse team behind this work @uwcse @uwnlp @MentalHealthAm and @StanfordHealth 💖
Ashish Sharma @sharma_ashish_2

Each time a paper gets rejected, you can't help but think "I'll never succeed as a researcher" Such negative thoughts are normal, but how can we overcome them? Our #ACL2023 📰 studies Human-LM Interaction for Cognitive Reframing of Negative Thoughts arxiv.org/abs/2305.02466 🧵

David Wadden retweeted
Yanai Elazar @yanaiela
Does arXiving have a causal effect on acceptance? The answer is nuanced and depends on what assumptions you are willing to make, but arguably more importantly, we observe no difference in acceptance across different groups. arxiv.org/abs/2306.13891
David Wadden retweeted
Yizhong Wang @yizhongwyz
🦙🐪🐫 So many instruction tuning datasets came out recently! How valuable are they, and how far are open models really from proprietary ones like ChatGPT? 🧐We did a systematic exploration, and built Tülu---a suite of LLaMa-tuned models up to 65B! 📜arxiv.org/abs/2306.04751
David Wadden retweeted
Ai2 @allen_ai
Today we're thrilled to announce our new undertaking to collaboratively build the best open language model in the world: AI2 OLMo. Uniquely open, 70B parameters, coming early 2024 – join us! blog.allenai.org/announcing-ai2…