Louis
7.6K posts

@logicus
philosophy phd candidate | multiagent systems | decision | scientific discovery | rust | lean | seeking employment (summer 2026+)


Regarding our Esolang Benchmark:
- Our study's conclusions were about model performance under restrictions (token budget limited to 32k and no tools like bash/python)
- But if you let models attempt these problems with tools (like bash/python) and give them lots of iterations and thinking budget, models are able to solve the problems (they do take tens of minutes, tens of iterations and many hundreds of thousands of tokens)

We had noted this difference in our launch thread and plan to publish our updated analysis soon, but here's an independent analysis which shows the same ⬇️

We are thankful to the community for all the feedback. In our follow-up paper, we aim to emphasise this nuanced take clearly.

There are two ways to build AI for mathematics. One is to work in private and surface results after the fact. The other is to put real tools in the hands of mathematicians, learn from real use, engage in public, credit the community you build on, and support the ecosystem itself. We believe in the second model. Mathematics is a profoundly human endeavor. AI should strengthen mathematicians, not route around them. Build with mathematicians, not around them.


If you found this post helpful, follow me for more content like this. I publish a weekly newsletter where I share practical insights on data and AI. It focuses on projects I'm working on + interesting tools and resources I've recently tried: alexeyondata.substack.com



The fact that you need to provide a specialized harness clearly shows the model *does not* encode the kind of metalearning knowledge and problem-solving strategies that humans use. Humans solve novel problems without being told how to proceed step by step. AGI would *not* need a custom harness here. As an aside, the models still performed poorly at that point; they did not "crush" the task.


Cream Abdul-Jabbar Milk Chamberlin LeBron Frames Steph Blurry Larry Nerd

neck tatken prediction

@anujsaharan_ Gotta be a risk taker not a philosopher










