Basis (@BasisOrg) - Twitter profile

Pinned Tweet

Basis@BasisOrg·1 Nov

New paper from Basis' Project MARA team and collabs. The ability to learn and use world models is a key aspect of human intelligence, but evaluating this ability remains elusive. In this work we propose WorldTest, a representation-agnostic, behavior-based agent eval framework.

1

10

20

3.3K

Basis@BasisOrg·13 Jan

We're hiring research scientists in PL and other areas. Join us! basis.ai/join-us/#caree…

0

2

269

Basis@BasisOrg·13 Jan

We're attending and sponsoring #POPL2026 in Rennes, France 🇫🇷 -- if you're around, stop by our sponsor booth to chat about research and open opportunities at Basis. We'll be there Wednesday 10:00-19:30 and Friday 10:00-18:00.

2

0

6

332

Basis retweeted

Yichao Liang@yichao_liang·11 Nov

New preprint on learning abstract world models for robotics planning. Paper + code below. 🤖🌐 Must an agent plan by simulating pixels frame by frame, or can it think in abstractions? Consider planning an international flight: we can reason about buying tickets, changing airplanes, and crossing borders without committing to the color of the airplane or the milliseconds before takeoff. Absent abstraction, planning over long time horizons would be intractable, because every minute detail of the world would need to be simulated. [1/7]

2

10

22

3.4K

Basis@BasisOrg·1 Nov

@alex_prompter To everyone reading: Basis is hiring! Join us! jobs.ashbyhq.com/basis-research…

0

49

Basis retweeted

Alex Prompter@alex_prompter·28 Oct

🚨 MIT and Basis Research just dropped a new way to measure if AI actually understands the world and the results are brutal.
It’s called "WorldTest", and it doesn’t just check how well an AI predicts the next frame or maximizes reward. It checks whether the model can build an internal model of reality and use it to handle new situations.
They built 'AutumnBench', a suite of 43 interactive worlds and 129 tasks where AIs must:
• Predict hidden parts of the world (masked-frame prediction)
• Plan sequences of actions to reach a goal
• Detect when the environment’s rules suddenly change
Then they tested 517 humans vs. top AI models Claude, Gemini 2.5 Pro, and o3. Humans crushed every model. Even massive compute scaling barely helped.
The takeaway is wild... current AIs don’t understand environments; they pattern-match inside them. They don’t explore strategically, revise beliefs, or run experiments like humans do.
WorldTest might be the first benchmark that actually measures understanding, not memorization. The gap it reveals isn’t small it’s the next grand challenge in AI cognition.
Paper: Benchmarking World-Model Learning (arxiv.org/abs/2510.19788)

54

217

932

109.5K

Basis@BasisOrg·1 Nov

@alex_prompter Thank you for the thread! Come discuss at our NeurIPS social if you'll be around luma.com/ivw952te

0

2

223

Basis retweeted

Gary Marcus@GaryMarcus·30 Oct

like i have been saying since 2019, world models are the next key step.

Dr Alex Young ⚡️@AlexanderFYoung

🔥 MIT just exposed every top AI model and it’s not pretty.
They built a new test called WorldTest to see if AI actually understands the world… and the results are brutal. It doesn’t just check how well a model predicts the next frame or maximizes reward it tests whether it can build an internal model of reality and use it to handle new situations.
They built AutumnBench 43 interactive worlds, 129 tasks where AIs must:
• Predict hidden parts of the world (masked-frame prediction)
• Plan sequences of actions to reach a goal
• Detect when the environment’s rules suddenly change
Then they tested 517 humans vs. Claude, Gemini 2.5 Pro, and o3. Humans crushed every model. Even massive compute scaling barely helped.
The takeaway is wild.. today’s AIs don’t understand environments; they just pattern-match inside them. They don’t explore strategically, revise beliefs, or run experiments like humans do.
WorldTest might be the first benchmark that actually measures understanding, not memorization. The gap it reveals isn’t small it’s the next grand challenge in AI cognition.
(Comment “Send” I’ll DM you the paper)

10

29

221

32.5K

Basis retweeted

Eric Bourdages@EZE3D·30 Oct

"Today’s AIs don’t understand environments; they just pattern-match inside them." Literally what critics have been saying for years now.

Dr Alex Young ⚡️@AlexanderFYoung

🔥 MIT just exposed every top AI model and it’s not pretty.
They built a new test called WorldTest to see if AI actually understands the world… and the results are brutal. It doesn’t just check how well a model predicts the next frame or maximizes reward it tests whether it can build an internal model of reality and use it to handle new situations.
They built AutumnBench 43 interactive worlds, 129 tasks where AIs must:
• Predict hidden parts of the world (masked-frame prediction)
• Plan sequences of actions to reach a goal
• Detect when the environment’s rules suddenly change
Then they tested 517 humans vs. Claude, Gemini 2.5 Pro, and o3. Humans crushed every model. Even massive compute scaling barely helped.
The takeaway is wild.. today’s AIs don’t understand environments; they just pattern-match inside them. They don’t explore strategically, revise beliefs, or run experiments like humans do.
WorldTest might be the first benchmark that actually measures understanding, not memorization. The gap it reveals isn’t small it’s the next grand challenge in AI cognition.
(Comment “Send” I’ll DM you the paper)

37

1.1K

7K

151.4K

Basis@BasisOrg·1 Nov

We'll also be at NeurIPS; come talk to us! Visit our booth or register for our social: luma.com/ivw952te

0

344

Basis@BasisOrg·1 Nov

All open roles: jobs.ashbyhq.com/basis-research

1

0

222

Basis@BasisOrg·31 Oct

And we're hosting a social at NeurIPS. If you want to come chat with us, RSVP: luma.com/ivw952te

0

3

310

Basis@BasisOrg·31 Oct

Our Project MARA team who led this work is looking for research scientists to join us! Link to apply below.

Dr Alex Young ⚡️@AlexanderFYoung

🔥 MIT just exposed every top AI model and it’s not pretty.
They built a new test called WorldTest to see if AI actually understands the world… and the results are brutal. It doesn’t just check how well a model predicts the next frame or maximizes reward it tests whether it can build an internal model of reality and use it to handle new situations.
They built AutumnBench 43 interactive worlds, 129 tasks where AIs must:
• Predict hidden parts of the world (masked-frame prediction)
• Plan sequences of actions to reach a goal
• Detect when the environment’s rules suddenly change
Then they tested 517 humans vs. Claude, Gemini 2.5 Pro, and o3. Humans crushed every model. Even massive compute scaling barely helped.
The takeaway is wild.. today’s AIs don’t understand environments; they just pattern-match inside them. They don’t explore strategically, revise beliefs, or run experiments like humans do.
WorldTest might be the first benchmark that actually measures understanding, not memorization. The gap it reveals isn’t small it’s the next grand challenge in AI cognition.
(Comment “Send” I’ll DM you the paper)

3

0

2

276
