Nathan Cloos

62 posts

@nacloos

PhD at MIT | prev @MSFTResearch

Joined April 2014

641 Following · 557 Followers

Pinned Tweet

Nathan Cloos @nacloos · 20 Jul

Can LLMs play the game Baba Is You?🧩 In our new @icmlconf workshop paper, we show GPT-4o and Gemini-1.5-Pro fail dramatically in environments where both objects and rules must be manipulated! Here is an example of correct gameplay: (1/n)

Nathan Cloos @nacloos · 24 Feb

@MattPRD @moltbook Building clawblox.com, a Roblox-like game engine to make it easy for agents to implement multi-player 3D games and to play them with LLM-friendly APIs.

Matt Schlicht @MattPRD · 23 Feb

Are you *making something agents want*? I might want to feature you on @moltbook, the only community of AI agents on the planet. Please reply here if you are building a service/app/product where an AI agent is your end user. I will reach out to you 🦞

Nathan Cloos retweeted

Lance Ying @LanceYing42 · 23 Feb

Today we present a new framework for measuring human-like general intelligence in machines (what some people call AGI). Conventional AI benchmarks today assess only narrow capabilities in a limited range of human activities. We propose that a more promising way to evaluate human-like general intelligence in AI systems is through a particularly strong form of general game playing: studying how and how well they play and learn to play all conceivable human games, what we call the "Multiverse of Human Games". Taking a first step towards this vision, we introduce the AI GameStore, a scalable and open-ended platform that uses LLMs with humans-in-the-loop to automatically construct standardized and containerized variants of popular human games on digital gaming platforms. As a proof of concept, we generated 100 such games based on the top charts of Apple App Store and Steam, and evaluated seven frontier vision-language models (VLMs) on short episodes of play. The best models achieved less than 10% of the human average score on the majority of the games. Check out our website to play the games, see how agents play, and build agents to solve them!

Nathan Cloos retweeted

Hansen Lillemark @hansenlillemark · 15 Jan

State of the art World Models still lack a unified world memory for representing and predicting dynamics out of their field of view. Why is that, and how can we fix it? Introducing Flow Equivariant World Models: models with memory capable of predicting out of view dynamics!🧵⬇️

Nathan Cloos @nacloos · 8 Dec

The last 24 hours have been a blast! Me and Simon (@961014dltkdg) built Grok Play. Grok Owl for the win, @xai!

xAI@xai

Grok Play: Enjoy and create multiplayer games where your Grok Owl can climb the leaderboard by playing against you, your friends, your friends' Owls, and itself. @nacloos @961014dltkdg

Nathan Cloos retweeted

xAI @xai · 8 Dec

Grok Play: Enjoy and create multiplayer games where your Grok Owl can climb the leaderboard by playing against you, your friends, your friends' Owls, and itself. @nacloos @961014dltkdg

Nathan Cloos retweeted

Mitchell Ostrow @neurostrow · 10 Nov

Our next paper on comparing dynamical systems (with special interest to artificial and biological neural networks) is out!! Joint work with @AnnHuang42, as well as @tweetsatpreet, @Leokoz8, @FieteGroup, and @KanakaRajanPhD: arxiv.org/pdf/2510.25943

Nathan Cloos retweeted

Ilia Sucholutsky @sucholutsky · 27 Oct

🧵🎉 Our mega-paper is finally published in TMLR! We're "Getting Aligned on Representational Alignment" - the degree to which internal representations of different (biological & artificial) information processing systems agree. 🧠🤖🔬🔍 #CognitiveScience #Neuroscience #AI

Nathan Cloos retweeted

Davide Paglieri @PaglieriDavide · 13 Mar

A new challenger has entered the ring 🥉 This week’s entry on balrogai.com takes third place, powered by a 21B reasoning model. @RekaAILabs Reka Flash 3 dominates similarly sized reasoning models like DeepSeek-R1-Distill-Qwen 32B on BALROG’s toughest agentic tasks! 🧵

Nathan Cloos @nacloos · 4 Mar

Thanks to my amazing team! Franky Kyaw, Ege Özgül, @argenistherose, @origenei, @Toddfrog422, T.R. Dimechkie

Nathan Cloos @nacloos · 4 Mar

Open source code: github.com/nacloos/sundai…

Nathan Cloos @nacloos · 4 Mar

We vibe coded a full 3D game in one day 🚀 Play here (better with sound!): nacloos.itch.io/spaice @sundai_club MIT hackathon!

Nathan Cloos @nacloos · 21 Feb

@karpathy We did that for Baba Is You! x.com/nacloos/status…

Andrej Karpathy @karpathy · 29 Jan

For friends of open source: imo the highest leverage thing you can do is help construct a high diversity of RL environments that help elicit LLM cognitive strategies. To build a gym of sorts. This is a highly parallelizable task, which favors a large community of collaborators.

Nathan Cloos @nacloos · 18 Feb

@paul_cal Baba Is You x.com/nacloos/status…

Paul Calcraft @paul_cal · 15 Feb

The story of LLMs playing games, and what we know so far: Tic Tac Toe, Chess, Minecraft, NYT Connections, Wordle, Pictionary, Connect 4, Codenames, Snake... (1/n)

Nathan Cloos retweeted

Davide Paglieri @PaglieriDavide · 29 Jan

DeepSeek performed well where short-term reasoning and planning are key. 🧩 CoT traces showed strong intuitive reasoning, enough to solve the tricky “baba is ai” puzzle. Breaking “wall is stop” to reach the ball proved it can handle complex logic. ⚙️

Nathan Cloos @nacloos · 9 Dec

Our package aims to be exhaustive. If your implementation is missing, check out our GitHub to add your similarity measures! Paper: openreview.net/forum?id=vyRAY… GitHub: github.com/nacloos/simila… Work with @GuangyuRobert and Chris Cueva. (6/6)

Nathan Cloos @nacloos · 9 Dec

Naming conventions with too few names led to consistency errors when comparing CKA implementations across papers. We iteratively refined our naming convention to resolve inconsistencies while keeping low naming complexity. (5/6)

Nathan Cloos @nacloos · 9 Dec

Update on our similarity-repository 🚨 More than 200 similarity measures across 32 papers are now registered! We'll also be presenting our work as an oral at the @NeurIPSConf @unireps workshop! (1/6)
