fleet retweeted

Excited to share new benchmarking work from @fleet_ai & friends.
We challenge frontier models to draw!
Surprisingly, across the entire frontier, models are really bad at it. The ways they fail can teach us about how AI perceives our world 🧵

James Zhou @jameszhou02
Models can do hard math and write complex code. But ask them to replicate a simple drawing step-by-step and they often break basic spatial constraints. Preview results from our new benchmark: Printing Machines. Made with @jerryzhou and @fleet_ai. printingmachines.ai/blog/printing-…



