Doug Downey

118 posts

@_DougDowney

Researching AI for Science @allen_ai, Prof @northwesterncs

Joined May 2020
266 Following, 421 Followers
Doug Downey retweeted
Ai2 (@allen_ai)
🚨 The best AI gets built in the open. Next week, we’re bringing that message to #NVIDIAGTC — with panels, demos, and a window into what fully open models can do. Here's where to find us 🧵👇
Doug Downey retweeted
Pao Siangliulue (@Siangliulue)
Are you a researcher in CS or a CS-adjacent field curious about how an AI agent can help you with your research project? Want to try a new tool for your research support in a paid user study ($100, 2 hr)? Limited spot numbers. See details and sign up here: forms.gle/JzLtkAhe7Ttvui…
Doug Downey (@_DougDowney)
TL;DR: Evaluating Deep Research systems is hard. We discuss why and call out the importance of fine-grained metrics, annotator expertise, and subjectivity. Enjoyed this collaboration led by @JenaHwang2, with mentorship from @SergeyFeldman and contributions from a great team.
Ai2 (@allen_ai)

🔎 Deep research agents like Asta ScholarQA and OpenAI Deep Research are transforming how we perform literature review. But how do we know if the way we evaluate them is actually meaningful? Announcing our new paper: “Deep Research, Shallow Evaluation: A Case Study in Meta-Evaluation for Long-Form QA Benchmarks” 🧵

Doug Downey (@_DougDowney)
Releasing the Asta Interaction Dataset: large-scale logs of real interactions with LLM-powered scientific research tools. Analysis led by Dany Haddad reveals how scientists use these systems in practice: longer, more complex queries and treating results as persistent artifacts. Special shout-out to one of his favorite figures: this Sankey diagram tracing section expansion (Si = section i expanded).
[Image: Sankey diagram tracing section expansion]
Ai2 (@allen_ai)

We analyzed 250K+ queries & 430K+ clickstream interactions from Asta, our AI-powered research assistant—and today we're releasing the full dataset. How do researchers actually use AI science tools? Here's what we found. 🧵

Doug Downey (@_DougDowney)
Can today’s agents anticipate future scientific collaborations, ideas, and impact? Introducing PreScience, a large-scale AI benchmark for scientific forecasting. Careful dataset construction led by @anirudhajith42, with @aps6992, @jaydepun, @Hoper_Tom and collaborators.
Ai2 (@allen_ai)

Can AI predict what scientists will do next—not just one piece, but the whole research process? PreScience is our new model eval for forecasting how science unfolds end-to-end, from how research teams form to a paper's eventual impact. Built with @UChicago, supported by @NSF.

Doug Downey retweeted
Ai2 (@allen_ai)
Knowing which questions to ask is often the hardest part of science. Today we're releasing AutoDiscovery in AstaLabs, an AI system that starts with your data and generates its own hypotheses. 🧪
Doug Downey retweeted
Ai2 (@allen_ai)
Introducing Theorizer: Turning thousands of papers into scientific laws 📚➡️📜 Most automated discovery systems focus on experimentation. Theorizer tackles the other half of science: theory building—compressing scattered findings into structured, testable claims. 🧵
Doug Downey retweeted
Ai2 (@allen_ai)
Introducing Ai2 Open Coding Agents—starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, & it works with Claude Code out of the box. 🧵
Doug Downey retweeted
Kyle Lo (@kylelostat)
olmo 3 paper finally on arxiv 🫡 thx to our teammates esp folks who chased additional baselines thx to arxiv-latex-cleaner and overleaf feature for chasing latex bugs thx for all the helpful discussions after our Nov release, best part of open science is progressing together!
Doug Downey retweeted
Ai2 (@allen_ai)
Last year Molmo set SOTA on image benchmarks + pioneered image pointing. Millions of downloads later, Molmo 2 brings Molmo’s grounded multimodal capabilities to video 🎥—and leads many open models on challenging industry video benchmarks. 🧵
Doug Downey retweeted
Ai2 (@allen_ai)
Update: DataVoyager, which we launched in Preview early this fall, is now available in Asta. 🎉 You can upload real datasets, ask complex research questions in natural language, & get back reproducible answers + visualizations. 🔍📊
Doug Downey retweeted
Ai2 (@allen_ai)
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵
Doug Downey retweeted
Ai2 (@allen_ai)
Today we’re releasing Deep Research Tulu (DR Tulu)—the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚
Doug Downey retweeted
Jonathan Bragg (@turingmusician)
Agent benchmarks don't measure true *AI* advances We built one that's hard & trustworthy 👉AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems 👉SOTA results across 22 agent *classes* 👉AgentBaselines agents suite 🆕arxiv.org/abs/2510.21652 🧵👇
Doug Downey (@_DougDowney)
New project led by Shriya Atmakuri in collaboration with @aps6992: Ai2's Asta system now reports weekly which papers its research summaries have cited.  The aim is to give credit to the work that powers the reports, and provide a dataset for studying how AI systems cite science.
Ai2 (@allen_ai)

📊 Today we're releasing data showing which scientific papers our AI research tool Asta cites most frequently. Think of it as creating citation counts for the AI era—tracking which research is actually powering AI answers across thousands of queries. 🧵

Doug Downey retweeted
Ai2 (@allen_ai)
Introducing Asta DataVoyager—our new AI capability in Asta that turns structured data into transparent, reproducible insights. Built for scientists, grounded in open, inspectable workflows. 🧵
Doug Downey retweeted
Ai2 (@allen_ai)
A few new challengers enter SciArena—including DeepSeek-V3.2-Exp and Claude Sonnet 4.5 🔬
Doug Downey retweeted
Ai2 (@allen_ai)
As part of Asta, our initiative to accelerate science with trustworthy AI agents, we built AstaBench—the first comprehensive benchmark to compare them. ⚖️
Doug Downey retweeted
Ai2 (@allen_ai)
Introducing Asta—our bold initiative to accelerate science with trustworthy, capable agents, benchmarks, & developer resources that bring clarity to the landscape of scientific AI + agents. 🧵