AfterQuery

26 posts

@AfterQuery

Applied research lab curating data solutions to accelerate foundation model development.

Joined February 2025
6 Following · 1.4K Followers
Pinned Tweet
AfterQuery retweeted
Spencer Mateega @spencermateega
@AfterQuery will be at ICLR next week! We’ll be at booth 404. Happy to chat about anything related to tool use/agents, RL environments, code gen, or evals.  DM me if you wanna meet up!
1
2
20
1.3K
AfterQuery retweeted
Alex Shaw @alexgshaw
AfterQuery post-trained GPT-OSS-20B using Harbor + Tinker and saw a 14% bump on TB2 performance. Love seeing people pick up Harbor for more than just evals.
3
16
71
12K
AfterQuery @AfterQuery
Our findings show that current models lack the ability to perform even the most basic tasks in high-impact, real-world domains like quantitative trading. We hope Market-Bench can serve as a shared framework to evaluate models’ understanding of trading strategies and code generation for quantitative finance. Excited to track how these capabilities evolve!
0
0
4
595
AfterQuery @AfterQuery
Introducing Market-Bench by @AfterQuery! The first-of-its-kind benchmark on LLMs for quantitative finance. We challenged models to attempt a frequent introductory quantitative trading task: coding an executable backtester from a natural-language strategy description and market assumptions.
> 13 models build backtesting systems for directional, pair trading, and delta hedging strategies
> evaluated on reliability (executable passes) and accuracy (MAE) across 5 attempts per strategy
> real order book data with exchange delays and liquidity constraints
> @xAI’s Grok 4 achieved the overall lowest mean MAE (deviation from the golden backtest), followed closely by @OpenAI’s GPT 5.2
> @AnthropicAI's Sonnet 4.5 and @AlibabaGroup's Qwen 3 Max at perfect executability but high MAE
> Models from @Meta, @Amazon, @NVIDIA, and @Cohere continued to fail to produce executable backtesters
Leaderboard & full paper below!
4
1
11
1.1K
Cigdem Oztabak @cigdemoztabak_
what a grand opening! watching @AfterQuery
AfterQuery @AfterQuery

Today, humanity is shackled by scarcity of expertise. When expertise becomes infinitely scalable, humans will be freed to tackle problems we can't even conceive of today. Introducing @AfterQuery. We’re building a world where expertise is abundant. Domain by domain, profession by profession, AfterQuery is crafting datasets that encode excellence into forms that machines can learn. Data is the final frontier.

1
0
4
1.1K
AfterQuery @AfterQuery
Today, humanity is shackled by scarcity of expertise. When expertise becomes infinitely scalable, humans will be freed to tackle problems we can't even conceive of today. Introducing @AfterQuery. We’re building a world where expertise is abundant. Domain by domain, profession by profession, AfterQuery is crafting datasets that encode excellence into forms that machines can learn. Data is the final frontier.
76
15
69
9.7K
shrawberry @shrawberryy
website and brand identity i did for @AfterQuery !! site design, visuals, copy, animations all by me - logo in collaboration with @sashabirukoff more details soon yippeeee
65
47
1.4K
101.9K
AfterQuery retweeted
Spencer Mateega @spencermateega
The frontier begets the frontier. I highly recommend reading @jaminball's latest Clouded Judgement article which spells out the AfterQuery thesis (thread)
6
11
40
4.3K
AfterQuery retweeted
Spencer Mateega @spencermateega
Introducing UI-Bench by @afterquery. The first and only rigorous eval of vibe coding tools.
> 4,000+ blinded pairwise judgments
> @budapp, @figma make, and @lovable take the lead
> @v0 and @replit ranked dead last
> performance gaps = differences in LLM orchestration,
14
33
156
33.3K