Snorkel AI

1.8K posts

Snorkel AI

@SnorkelAI

🧠 Frontier AI Data Lab | Advancing AI through better data 🚀 Powering frontier labs, Fortune 500 & gov't

Redwood City, California · Joined July 2019
298 Following · 16.8K Followers
Snorkel AI retweeted
vincent sunn chen @vincentsunnchen
ICYMI - How can we build the benchmark factory? I'm very excited about the infra approach from @harborframework, because @alexgshaw @ryanmart3n & team obsess over researcher/developer UX (e.g. quality guardrails, low friction to RL/scaled rollouts)!
1 reply · 5 reposts · 16 likes · 2K views
swyx @swyx
so AIE Europe is completely taking over 🇬🇧London next week! very very hyped to showcase the best companies, research, and AI engineers in Europe!
3 COMPLETELY FREE ways to join in:
- there are a dozen side events around town! from Snorkel to GitHub to Arize to ClawCon and Claude Code meetups!
- subscribe on YouTube! everything will be livestreamed and published for free: youtube.com/@aidotengineer
- we are releasing 20 more volunteer slots here ai.engineer/associates meant for local, early career folks who otherwise could not afford a ticket!
join in/see you in london town!
47 replies · 35 reposts · 264 likes · 94.7K views
Snorkel AI retweeted
vincent sunn chen @vincentsunnchen
We'll be in London next week for AIE. Come say hi (DMs open)!! 🇬🇧
Quoting swyx @swyx:
so AIE Europe is completely taking over 🇬🇧London next week! very very hyped to showcase the best companies, research, and AI engineers in Europe!
3 COMPLETELY FREE ways to join in:
- there are a dozen side events around town! from Snorkel to GitHub to Arize to ClawCon and Claude Code meetups!
- subscribe on YouTube! everything will be livestreamed and published for free: youtube.com/@aidotengineer
- we are releasing 20 more volunteer slots here ai.engineer/associates meant for local, early career folks who otherwise could not afford a ticket!
join in/see you in london town!

1 reply · 2 reposts · 9 likes · 680 views
Snorkel AI @SnorkelAI
See you in London 🇬🇧 Snorkel AI is hosting a happy hour at Bantof on April 7 for folks working on AI agents, evals, datasets, and open source. Great chance to meet others building in the space (plus food & drinks 🍻) Request an invite: luma.com/SnorkelVIPHapp…
Quoting swyx @swyx:
so AIE Europe is completely taking over 🇬🇧London next week! very very hyped to showcase the best companies, research, and AI engineers in Europe!
3 COMPLETELY FREE ways to join in:
- there are a dozen side events around town! from Snorkel to GitHub to Arize to ClawCon and Claude Code meetups!
- subscribe on YouTube! everything will be livestreamed and published for free: youtube.com/@aidotengineer
- we are releasing 20 more volunteer slots here ai.engineer/associates meant for local, early career folks who otherwise could not afford a ticket!
join in/see you in london town!

0 replies · 0 reposts · 6 likes · 319 views
Snorkel AI @SnorkelAI
“We need a thousand times more benchmarks than we have right now” is @alexgshaw of @LaudeInstitute's take on the current moment. “Coding is an extremely broad domain, 89 tasks isn’t nearly enough.” Full Benchtalks interview posted by @vincentsunnchen and YouTube in the replies
1 reply · 1 repost · 5 likes · 386 views
Snorkel AI @SnorkelAI
Top scores on Terminal-Bench 2 went from ~25% → 75-80% in just 4 months. For Benchtalks #1, @vincentsunnchen sat down with @alexgshaw to dig into what happens when your benchmark gets solved before you're ready for the next one.
Key takes:
→ The terminal is the right abstraction for agentic AI
→ Harbor exists because benchmarking and RL at scale are infra problems
→ "Benchmaxxing" is real; the defense is shipping harder tasks faster
→ TB3 is coming, and they want your hardest unsolvable problems
"We need 1000x more benchmarks than we have right now" - @alexgshaw
1 reply · 2 reposts · 13 likes · 466 views
Snorkel AI retweeted
vincent sunn chen @vincentsunnchen
Terminal-Bench 2.0 went from ~25% → 80% in four months and became the standard eval for frontier CLI agents. Now, TB3 is in the works. I talked to @alexgshaw about what happens when model capabilities climb faster than we can measure them. His answer: the benchmark factory (@harborframework), infrastructure to develop hard, representative evals at the pace that the frontier moves. As Alex put it: "we need a thousand times more benchmarks than we have right now."
00:23 - How quickly models hill-climbed TB2
01:46 - What rapid progress reveals about benchmarks vs. real-world capability
03:28 - What made Terminal-Bench stick
04:58 - Why the terminal is the right abstraction for agentic AI
07:14 - How TB2 maintains task quality at scale
09:23 - Managing benchmark integrity in a benchmaxxing world
10:47 - Harbor: from experiment to benchmark factory
12:19 - What Harbor does that nothing else did
14:37 - The invariants: what won't change as agent evals evolve
16:55 - The benchmark Alex most wants to see built
18:18 - The ideal human-in-the-loop task creation flywheel
20:32 - How to contribute to Terminal-Bench 3.0
2 replies · 11 reposts · 60 likes · 10.6K views
Snorkel AI retweeted
Armin Parchami @ArminPCM
We just open-sourced FinQA — an #RL environment for financial reasoning agents. Real SEC 10-K data, multi-step reasoning + tool use, constrained SQL, binary rewards. The whole 9 yards! The kicker: a 4B model fine-tuned with FinQA outperformed a 235B model from the same family on finance reasoning: 58x smaller!
3 replies · 14 reposts · 144 likes · 11.8K views
Snorkel AI @SnorkelAI
In the FinQA env, a 4B model was fine-tuned to outperform a 235B model from the same family on our Finance Reasoning benchmark. What did we teach the 4B model? Tool discipline. Learn more: snorkel.ai/blog/building-…
0 replies · 1 repost · 3 likes · 208 views
Snorkel AI @SnorkelAI
Our FinQA environment is available on OpenEnv (s/o @huggingface + @PyTorch)
FinQA is an open RL environment with:
• 290 expert-curated questions
• Real SEC 10-K data
• Tasks requiring multi-step tool use
RL proof point on FinQA: make a 4B model > 235B model 👇
1 reply · 1 repost · 12 likes · 570 views
Snorkel AI retweeted
Armin Parchami @ArminPCM
Scaling RL training for agentic models is one of the hardest infra problems in ML right now and, honestly, one of the most exciting jobs 🔥 Our research team @SnorkelAI is deep in RLFT (data valuation, curriculum learning, and more). We're #hiring an ML Training Infra engineer who's actually done this at scale with complex environments and medium-sized models. If that sounds like you (or someone you know), DM me or drop a comment 👇 #MLJobs | Link in thread
2 replies · 10 reposts · 62 likes · 5.2K views
Snorkel AI @SnorkelAI
Snorkel was just named one of @FastCompany’s Most Innovative AI Companies of 2026. We’re helping to design and pressure test the datasets and evaluations that make AI models and agents work in the real world. Join us: snorkel.ai/join-us/
1 reply · 6 reposts · 17 likes · 714 views
Snorkel AI @SnorkelAI
Coming soon: BenchTalks—a candid podcast series by Snorkel AI on benchmarks, AI evals, and frontier research. 👀🎙️
0 replies · 1 repost · 22 likes · 747 views