Victoria Graf (@VictoriaWGraf) - Twitter profile

Pinned Tweet

Victoria Graf@VictoriaWGraf·Nov 20

So excited to release Olmo 3!!! 🥳 Loved bringing a usability focus to such an impactful model, and bringing row 27 to Team Moo Moo 🥰🐄🦖

Ai2@allen_ai

Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵

1

0

22

6.2K

Victoria Graf@VictoriaWGraf·Dec 12

Olmo 3 Instruct is now bigger and better 🚀 Olmo 3 Think? Better too. Check out Olmo 3.1! ✨

Ai2@allen_ai

Olmo 3.1 is here. We extended our strongest RL run and scaled our instruct recipe to 32B—releasing Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B, our most capable models yet. 🧵

1

2

18

3.6K

Victoria Graf@VictoriaWGraf·Dec 6

Loved talking to everyone at our IFBench poster at NeurIPS! If you missed us at the poster and want to chat, reach out!

1

13

1.2K

Victoria Graf@VictoriaWGraf·Dec 3

@soldni @_lewtun dinos are a cow’s best friend 🦖

0

2

67

Luca Soldaini 🎀@soldni·Dec 3

@_lewtun @VictoriaWGraf cows and dinos live in gpu harmony at Ai2 🫶

1

0

4

195

Luca Soldaini 🎀@soldni·Dec 3

please DO ask me for stickers, I have too many Ai2/Olmo 3/moo moo rawr swag

2

0

30

1.7K

Victoria Graf@VictoriaWGraf·Dec 3

@soldni @_lewtun we missed the chance to name the model olMOO! 🐄

1

0

2

53

Luca Soldaini 🎀@soldni·Dec 3

@_lewtun cuz @VictoriaWGraf says so! moo moo

2

0

1

135

Victoria Graf@VictoriaWGraf·Nov 22

Saumya was an amazing part of our team for Olmo 3 - a force to be reckoned with for posttraining and evals. Loved working with you! Can’t wait to see her shine in her PhD ✨ Don’t miss your chance - admit her asap!

Saumya Malik @ ICLR 🇧🇷@saumyamalik44

Olmo 3 is out!!!! It was so much fun working on post-training. Loved seeing this come together with the best team!!!!

0

9

506

Victoria Graf@VictoriaWGraf·Nov 20

Delta Learned ✅🫡

Scott Geng@scottgeng00

Super excited to release Olmo 3 🦕🐄! Wild to see my Delta Learning research go all the way from theory-land to becoming a core piece of the world’s best fully open model. It's a good day to be a researcher 🥳

0

13

1.5K

Victoria Graf@VictoriaWGraf·Nov 20

Why is Moo Moo upside down? 😭

Luca Soldaini 🎀@soldni

this all makes sense i promise

1

0

7

690

Victoria Graf@VictoriaWGraf·Jul 9

A game-changer for post-training!

Scott Geng@scottgeng00

🤔 How do we train AI models that surpass their teachers? 🚨 In #COLM2025: ✨Delta learning ✨makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯 The secret? Learn from the *differences* in weak data pairs! 📜 arxiv.org/abs/2507.06187 🧵 below

0

8

521

Victoria Graf retweeted

Scott Geng@scottgeng00·Jul 9

🤔 How do we train AI models that surpass their teachers? 🚨 In #COLM2025: ✨Delta learning ✨makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯 The secret? Learn from the *differences* in weak data pairs! 📜 arxiv.org/abs/2507.06187 🧵 below

7

51

165

24.2K

Victoria Graf@VictoriaWGraf·Jul 4

Worried about overfitting to IFEval? 🤔 Use ✨IFBench✨ our new, challenging instruction-following benchmark! Loved working w/ @valentina__py! Personal highlight: our multi-turn eval setting makes it possible to isolate constraint-following from the rest of the instruction 🔍

Valentina Pyatkin @ ICLR 🇧🇷@valentina__py

💡Beyond math/code, instruction following with verifiable constraints is suitable to be learned with RLVR. But the set of constraints and verifier functions is limited and most models overfit on IFEval. We introduce IFBench to measure model generalization to unseen constraints.

2

14

55

10.5K

Victoria Graf retweeted

Nathan Lambert@natolambert·Jul 3

This new benchmark created by @valentina__py should be the new default replacing IFEval. Some of the best frontier models get <50% and it comes with separate training prompts so people don’t effectively train on test. Wild gap from o3 to Gemini 2.5 pro of like 30 points.

Ai2@allen_ai

Introducing IFBench, a benchmark to measure how well AI models follow new, challenging, and diverse verifiable instructions. Top models like Gemini 2.5 Pro or Claude 4 Sonnet are only able to score up to 50%, presenting an open frontier for post-training. 🧵

10

22

195

22.9K

Victoria Graf retweeted

Ai2@allen_ai·Jul 3

Introducing IFBench, a benchmark to measure how well AI models follow new, challenging, and diverse verifiable instructions. Top models like Gemini 2.5 Pro or Claude 4 Sonnet are only able to score up to 50%, presenting an open frontier for post-training. 🧵

4

48

313

47.5K

Victoria Graf@VictoriaWGraf·Nov 21

Super excited to release Tülu 3, a family of fully-open state-of-the-art post-trained models, including its data, eval, code, and training recipes in a comprehensive guide for post-training techniques! allenai.org/papers/tulu-3-…

0

1

7

256

Victoria Graf retweeted

Ai2@allen_ai·Nov 21

Meet Tülu 3 -- a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms. We invented new methods for fine-tuning language models with RL and built upon best practices in the community to scale synthetic instruction and preference data. Demo, GitHub, technical report, and models below 👇

14

132

526

218.2K

Victoria Graf@VictoriaWGraf·Jun 23

Had a wonderful time at #NAACL2024 this week! Thanks to everyone who came to my oral presentation on defending LLMs against backdoor attacks!

0

9

216
