David Wadden
53 posts
@davidjwadden
Graduate student at @uwcse studying NLP.
Joined May 2020
98 Following · 387 Followers
David Wadden retweeted
Kyle Lo @kylelostat
@soldni and I are arriving at #ACL2024 🇹🇭 today! come find us at our talks & poster sessions for our OLMo & Dolma projects with @allen_ai & frens 🤩 also don't miss our poster on KIWI 🥝 for interactive science QA w/ our intern @brunchavecmoi & mentors @eunsolc @davidjwadden
David Wadden retweeted
Yuling Gu @gu_yuling
LLMs are evaluated on the same tasks in so many different ways! 🤯 ✨ We introduce OLMES – a standard for reproducible LLM evaluations that is open, practical, completely documented, and can be applied to current leaderboards & eval code bases! ✨ 📜 arxiv.org/abs/2406.08446 1/
David Wadden retweeted
Kejian Shi @shi_kejian
Introducing SciRIFF, a toolkit to enhance LLM instruction-following over scientific literature. 137k expert demonstrations in 5 categories: IE, summarization, QA, entailment, and classification; models up to 70b and code to science-tune your checkpoints included! Read more in 🧵:
David Wadden retweeted
Ai2 @allen_ai
Looking for a dataset to enhance language model instruction-following over scientific literature? Introducing SciRIFF, a dataset of 137K expert-written demonstrations spanning 5 essential task categories for literature understanding: information extraction, summarization, question answering, claim verification, and classification. Download on HuggingFace: huggingface.co/collections/al…
David Wadden retweeted
Hanna Hajishirzi @HannaHajishirzi
Introducing our best OLMo yet. OLMo 1.7-7B outperforms LLaMa2-7B, approaching LLaMa2-13B on MMLU and GSM8k. High-quality data and staged training are key. I am so proud of our team for making such a significant improvement in a short period after our first release.
Ai2 @allen_ai

Announcing our latest addition to the OLMo family, OLMo 1.7!🎉Our team's efforts to improve data quality, training procedures and model architecture have led to a leap in performance. See how OLMo 1.7 stacks up against its peers and peek into the technical details on the blog: blog.allenai.org/olmo-1-7-7b-a-…

David Wadden retweeted
Nathan Lambert @natolambert
Excited to share something that we've needed since the early open RLHF days: RewardBench, the first benchmark for reward models.
1. We evaluated 30+ of the currently available RMs (w/ DPO too).
2. We created new datasets covering chat, safety, code, math, etc.
We learned a lot. We hope this is a major step in understanding why reward models work, rather than just how they do for RLHF.
In short, we created pairs of responses, one good and one bad (with manual review), and see where reward models agree! It's a simple and powerful process.
Key takeaways:
* Running reward models is hard; we built infra to make this easier.
* We're already using this to learn more about PPO RLHF training (more on this soon).
* Reward models mirror the refusal behavior we're confused about in RLHF. Some refuse everything (including llama 2 style stuff), some refuse nothing, and few models handle both cases well.
* Datasets like Anthropic HH / Learning to Summarize only take us so far (and don't work for DPO).
* Scaling matters (big models win again).
Here's the current leaderboard: I'm very excited about future work. Figuring out what values are reflected, generative RMs, better RMs for training, more on DPO, and everything in between.
Links!
Leaderboard: huggingface.co/spaces/allenai…
Code: github.com/allenai/reward…
Paper (arxiv soon): github.com/allenai/reward…
Eval dataset: huggingface.co/datasets/allen…
YouTube walkthrough: youtu.be/CAaHAfCqrBA
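The pairwise protocol described above (score a good and a bad response, count how often the reward model prefers the good one) can be sketched in a few lines. This is a minimal illustration, not the actual RewardBench code; `toy_reward` and the example pairs are hypothetical stand-ins for a real reward model and dataset.

```python
# Minimal sketch of pairwise reward-model evaluation: score the chosen
# and rejected response for each prompt, and report the fraction of
# pairs where the chosen response wins.

def accuracy(reward_fn, pairs):
    """pairs: list of (prompt, chosen, rejected) triples."""
    wins = sum(
        reward_fn(prompt, chosen) > reward_fn(prompt, rejected)
        for prompt, chosen, rejected in pairs
    )
    return wins / len(pairs)

# Toy stand-in for a real reward model: longer responses score higher.
toy_reward = lambda prompt, response: len(response)

pairs = [
    ("What is 2+2?", "2 + 2 equals 4.", "5"),
    ("Capital of France?", "The capital of France is Paris.", "idk"),
]
print(accuracy(toy_reward, pairs))  # → 1.0
```

In the real benchmark, `reward_fn` would be a trained reward model (or a DPO model's implied reward), and the pairs come from the manually reviewed chat, safety, code, and math subsets.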
David Wadden retweeted
Fangyuan Xu @brunchavecmoi
Instruction-following capabilities of LLMs are a prerequisite to AI ✒️ writing assistance. How good are current LLMs at this task? We present 🥝 𝗞𝗜𝗪𝗜, a dataset with instructions for knowledge-intensive, document-grounded writing of long-form answers to research questions.
David Wadden retweetledi
Semantic Scholar Research @ AI2
Semantic Scholar Research @ AI2@ai2_s2research·
📣 Job opportunities at Semantic Scholar Research @ the Allen Institute for AI (AI2) for post-doctoral & pre-doctoral researchers starting in 2024! 📣 Our team works on NLP and HCI research with a focus on open LLMs and LLM-powered research support tools and assistants.
David Wadden retweeted
Ai2 @allen_ai
OLMo is here! And it’s 100% open. It’s a state-of-the-art LLM and we are releasing it with all pre-training data and code. Let’s get to work on understanding the science behind LLMs. Learn more about the framework and how to access it here: blog.allenai.org/olmo-open-lang…
David Wadden retweeted
Yanai Elazar @yanaiela
This is fantastic news!! Somewhat of a coincidence, but our paper studying the effect of early arXiving on acceptance, which suggested this effect is small and that the anonymity period does not fulfill its purpose, was accepted to CLeaR (Causal Learning and Reasoning) 2024 twitter.com/yanaiela/statu…
Graham Neubig @gneubig

ACL has removed the anonymity period. This means that ACL submissions can be posted and discussed online at any time, although extensive PR is discouraged. aclweb.org/adminwiki/imag…

David Wadden retweeted
Semantic Scholar @SemanticScholar
New feature alert 🚨On each paper page, scroll down to find AI-generated Topic pages related to the paper, which include topic definitions, papers most cited for the topic, and more! Now available for Computer Science fields. Here’s an example: semanticscholar.org/paper/SPECTER%…
David Wadden retweeted
Hamish Ivison @hamishivi
Check out the Tulu 2 suite 🐪, a set of Llama-2 models finetuned+DPO-trained on a mixture of publicly available datasets! Our best-performing models are competitive with SoTA open models on a range of benchmarks incl. AlpacaEval and MT-Bench. 📜Paper: arxiv.org/abs/2311.10702
Hamish Ivison @hamishivi

Check out our new 70B DPO model here: huggingface.co/allenai/tulu-2… AFAIK currently the best model on AlpacaEval with a public finetuning set! More details once the AI sphere calms down a bit... 😅

David Wadden retweeted
Orion Weller @ ICLR’26 @orionweller
Using LLMs for query or document expansion in retrieval (e.g. HyDE and Doc2Query) has scores going 📈. But do these approaches work for all IR models and under different types of distribution shift? Turns out it's actually more 📉 🚨 📝 (arxiv soon): orionweller.github.io/assets/pdf/LLM…
David Wadden retweeted
Daniel Weld @dsweld
Interested in a better way to explore #VLDB2023 papers? Try exp-sum.apps.allenai.org for an LLM-powered way to probe those papers… * Ask questions w/ a single click * Explore answer provenance * Dive deep w/ recursive questions Powered by @SemanticScholar
David Wadden retweeted
Ashish Sharma @sharma_ashish_2
Absolutely thrilled🎉 to receive the @aclmeeting #ACL2023NLP 🏆Outstanding Paper Award🏆 for our work on cognitive reframing of negative thoughts! A huge shoutout to the diverse team behind this work @uwcse @uwnlp @MentalHealthAm and @StanfordHealth 💖
Ashish Sharma @sharma_ashish_2

Each time a paper gets rejected, you can't help but think "I'll never succeed as a researcher" Such negative thoughts are normal, but how can we overcome them? Our #ACL2023 📰 studies Human-LM Interaction for Cognitive Reframing of Negative Thoughts arxiv.org/abs/2305.02466 🧵

David Wadden retweeted
Yanai Elazar @yanaiela
Does arXiving have a causal effect on acceptance? The answer is nuanced and depends on what assumptions you are willing to make, but arguably more importantly, we observe no difference in acceptance across different groups. arxiv.org/abs/2306.13891
David Wadden retweeted
Yizhong Wang @yizhongwyz
🦙🐪🐫 So many instruction tuning datasets came out recently! How valuable are they, and how far are open models really from proprietary ones like ChatGPT? 🧐We did a systematic exploration, and built Tülu---a suite of LLaMa-tuned models up to 65B! 📜arxiv.org/abs/2306.04751
David Wadden retweeted
Ai2 @allen_ai
Today we're thrilled to announce our new undertaking to collaboratively build the best open language model in the world: AI2 OLMo. Uniquely open, 70B parameters, coming early 2024 – join us! blog.allenai.org/announcing-ai2…