
Ben Huang
@b3nhuang
Working on new things. Currently @ThematicAI @HiBaseStation, @side_realestate, @necto_inc, @groove_co. @ycombinator alum

Just for fun, I made a benchmark of models trading oil. I ran 9 frontier LLMs trading from 1/1 => 3/14. Oil was up 72%, so none of the models beat buy-and-hold. Best: Gemini 3-flash ($15,880). Worst: Minimax ($14,619). Most consistent predictor: Claude Opus, but it ranked 8th on P&L. Accuracy and consistency ≠ trading performance. Here's the result → benhuang21828.github.io/oil-bench 🔊 out to @OpenRouter for credits and @alexatallah for feedback

Our team is stunned. We gave Claude Opus 4.6 by @AnthropicAI $10k to trade on @Polymarket. It now has an account value of $70,614.59. This is a new era of model performance in trading and predicting outcomes in the face of uncertainty. @predictionbench
