Alex Ratner

1.8K posts


@ajratner

@SnorkelAI @uwcse / prev @StanfordAILab – Interested in data management systems for machine learning, weak supervision, and impactful applications.

Menlo Park, CA · Joined November 2013
680 Following · 6.6K Followers
Pinned Tweet
Alex Ratner@ajratner·
This week we launched the Open Benchmarks Grant with a $3M initial commitment from @SnorkelAI + partner support from @huggingface @togethercompute @PrimeIntellect @PyTorch @harborframework & others, in order to close the evaluation gap in AI. Our ability to measure AI has been outpaced by our ability to develop it - and open benchmarks are one of several critical, complementary tools to fix this.

We're particularly interested in novel benchmarks that push and probe the frontier along three key vectors:
(1) Environment complexity --> e.g. complex, domain-specific context and tool/action spaces, human interaction, world modeling
(2) Autonomy horizon --> e.g. long horizon, non-stationary goals
(3) Output complexity --> e.g. complex outputs with nuanced, rubric-based evaluation / reward signals

Check out more detail + link to apply here! benchmarks.snorkel.ai
Alex Ratner@ajratner·
We are hiring for a ton of roles on our #Research team @SnorkelAI - if interested please reply/reach out!

As one of the first academic teams to focus on AI data development back at @StanfordAILab / @UW, we have long believed this is one of *the* most exciting areas to be in as a researcher :)

Today, as a frontier data lab & partner to the world's leading AI labs and companies, we have more research vectors than we can possibly handle! Come help us tackle problems in complex environment generation; long-horizon and non-stationary benchmarking; complex rubric and process reward design; data valuation and curriculum learning; core data quality control; human-in-the-loop system design; large-scale RL systems; and more!!
Sarvesh Gharat@SarveshGharat12·
@ajratner @SnorkelAI @StanfordAILab @UW Hey Alex, any upcoming PostDoc or Visiting Researcher opportunities? Would be very much interested in non-stationary benchmarks and human-in-the-loop systems. PS: I work mainly on Sequential Decision Making and have more recently been working on LLM Reasoning.
Alex Ratner retweeted
Armin Parchami@ArminPCM·
We are hiring for multiple junior #Research roles within our research team at @SnorkelAI, focusing on the following areas:
1. Evaluations and benchmarking, particularly in domains such as legal and healthcare.
2. Post-training, with an emphasis on data valuation and curriculum learning.
3. Data quality evaluation research.
4. Human-in-the-loop annotation workflows.
5. MLOps for large-scale LLM post-training & RLFT.

Minimum requirements include:
- Master's degree (PhD preferred) with a focus on machine learning and AI
- Prior publications and contributions to real-world use cases and applications of AI
- Prior experience with Large Language Models is a must, with experience in RLFT for Agentic AI (computer use, CLI, coding, MCP, etc.)
- Strong communication skills and the ability to solve problems independently

If you are interested, please share your resume. #research #hiring #snorkelai
Alex Ratner@ajratner·
Excited to share some of the work we've been doing with @harvey on BLB Research - a new benchmark pushing the frontier of agentic legal research!
Harvey@harvey

We've partnered with @SnorkelAI to build BigLaw Bench: Research. This benchmark focuses on hard legal research tasks like assessing earn-out manipulation in M&A and evaluating securities fraud defenses, where even frontier models with search tools fall short.

Alex Ratner retweeted
Jon Saad-Falcon@JonSaadFalcon·
Personal AI should run on your personal devices. So, we built OpenJarvis: a personal AI that lives, learns, and works on-device. Try it today and top the OpenJarvis Leaderboard for a chance to win a Mac Mini! Collab w/ @Avanika15, John Hennessy, @HazyResearch, and @Azaliamirh. Details in thread.
Alex Ratner@ajratner·
As task horizons increase and the input context / output product expands, verification becomes harder too! E.g. unit tests no longer suffice when you're talking about a software *project*. Really appreciate the transparency & commitment to scientific principles of @METR_Evals!
Joel Becker@joel_bkr

new @METR_Evals research note from @whitfill_parker, @cherylwoooo, nate rush, and me. (chiefly parker!) we find that *half* of SWE-bench Verified solutions from Sonnet 3.5-to-4.5 generation AIs *which are graded as passing* are rejected by project maintainers.

Alex Ratner@ajratner·
Excited chatter about "RL environments" continues - but the environment itself (as often defined) is only one small part!

Environments (for eval or tuning/RL) require:
- Tasks that reflect model failure modes + real-world usage distributions
- Detailed task-specific rubrics
- Reference outputs and traces

Just like a desk with a calculator isn't much without test questions, answers, and grading keys.

Depending on how you define the "environment", there are many dynamic components needed as well - e.g. simulated data, user personas for multi-turn / collaboration, etc.

The "environment" itself (as often narrowly defined) - e.g. a website/CRM/etc clone - is generally the *least* important part, especially given the intersection with today's agentic coding capabilities.

Tl;dr: Pay most attention to the data and dynamic components - not just the "environment" itself.
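The decomposition in the tweet above can be sketched in code. This is a minimal, hypothetical illustration of the point - an "environment" bundle is mostly data (tasks, rubrics, reference outputs) plus dynamic components, with the sandbox itself as just one field. All names here are illustrative assumptions, not any real Snorkel AI API.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    prompt: str                # task reflecting a real usage / failure mode
    rubric: list[str]          # detailed, task-specific grading criteria
    reference_output: str      # gold answer or trace to grade against

@dataclass
class EnvironmentSpec:
    sandbox: str               # the narrowly-defined "environment", e.g. a CRM clone
    tasks: list[Task] = field(default_factory=list)
    # simulated data, user personas for multi-turn collaboration, etc.
    dynamic_components: dict = field(default_factory=dict)

    def coverage(self) -> int:
        # the sandbox alone carries no signal; the tasks do
        return len(self.tasks)

env = EnvironmentSpec(sandbox="crm-clone")
env.tasks.append(Task(
    prompt="Merge duplicate customer records",
    rubric=["all duplicates found", "no data loss", "audit log written"],
    reference_output="3 records merged; 0 lost",
))
```

The sandbox string is a single field while the tasks, rubrics, and references dominate the structure - which mirrors the tweet's claim about where the value lies.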
Alex Ratner retweeted
Daniel@danielisdizzy·
Larry Ellison $ORCL highlighted something critical: models like ChatGPT, Gemini, Grok, and Llama are all trained on largely the same public internet data. When everyone trains on the same information, models inevitably converge. That’s why AI is moving toward commoditization. The real moat isn’t the model itself. It’s the proprietary data behind it. Companies that can train on exclusive datasets gain an advantage competitors can’t replicate. Having data that no one else has will allow you to dominate your market.
Alex Ratner@ajratner·
@JayadityaSethi Environments alone are only a small part of the puzzle - the tasks themselves (and accompanying rubrics, reference answers, labels) introduce the diversity! E.g. "a computer" is an environment - all about the different tasks/skills/etc!
Jay Sethi@JayadityaSethi·
@ajratner what happens when an env’s learning is exhausted? are envs just meant to be solved and then replaced?
Alex Ratner@ajratner·
There are three major vectors of progress for AI capabilities, and the benchmarks that measure them:
(1) Environment complexity --> e.g. complex, domain-specific context and tool/action spaces, human interaction, world modeling
(2) Autonomy horizon --> e.g. long horizon, non-stationary goals
(3) Output complexity --> e.g. complex outputs with nuanced, rubric-based evaluation / reward signals

We are just beginning to systematically *measure* tasks with truly complex inputs and envs, complex outputs/rubrics, and long-horizon execution - let alone solve them. The frontier remains wide open!
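Vector (3) above, rubric-based evaluation, replaces a single pass/fail check with a weighted score over several criteria. Here is a minimal, hypothetical sketch of the idea; the criterion keywords and weights are illustrative assumptions, not from any real benchmark.

```python
def rubric_score(output: str, rubric: list[tuple[str, float]]) -> float:
    """Weighted score in [0, 1]: the fraction of rubric weight earned,
    where a criterion is 'met' if its keyword appears in the output.
    (A real grader would use a judge model, not keyword matching.)"""
    total = sum(weight for _, weight in rubric)
    earned = sum(weight for keyword, weight in rubric
                 if keyword.lower() in output.lower())
    return earned / total

# e.g. grading a legal-research answer against three weighted criteria
rubric = [("citation", 0.5), ("holding", 0.3), ("dissent", 0.2)]
score = rubric_score("The holding cites Smith v. Jones (citation).", rubric)
# meets "citation" (0.5) and "holding" (0.3), misses "dissent" -> 0.8
```

The partial-credit output (0.8 rather than pass/fail) is what makes such rubrics usable both as evaluation metrics and as dense reward signals for tuning.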
Alex Ratner@ajratner·
And, if you are working at the frontier of measurement along any of these vectors - check out our new Open Benchmarks Grants: benchmarks.snorkel.ai