CBMM

291 posts

@MIT_CBMM

The Center for Brains, Minds and Machines is a multi-institutional NSF Center dedicated to the study of the science and engineering of intelligence.

Cambridge, MA · Joined January 2017

39 Following · 3.1K Followers

CBMM retweeted

Tomer Galanti@GalantiTomer·2d

1/ Many optimization problems are hard in theory. But real OR and NP-hard instances often have exploitable structure. Can an LLM agent discover that structure automatically and turn it into faster solver code?

204

24.5K

CBMM retweeted

Pierfrancesco Beneventano@PierBeneventano·2d

This is a project I’m very excited about. Back in the day, the smartest computer scientists found efficient ways to solve their problems. Here, we made the agents do this work.

Tomer Galanti@GalantiTomer

5.8K

CBMM retweeted

Pierfrancesco Beneventano@PierBeneventano·3 May

Our new paper was accepted at ICML! 1) Momentum isn’t just “SGD but faster”. It affects sharpness (by orders of magnitude!) 2) The usual story says momentum lets you train in sharper regions. That’s true for large batches only! The opposite is true for minibatches!

112

7.2K

CBMM retweeted

Pierfrancesco Beneventano@PierBeneventano·26 Apr

Muon leads to severely miscalibrated models! This is just one of the results in our new paper: In “Too Sharp, Too Sure” we show that calibration error tracks loss curvature during training, and we tie both to margin tails.

448

82.7K

CBMM@MIT_CBMM·10 Apr

[blog] What is Intelligence? Or "Distinguishability is All You Need" Here are several related questions to which we do not have a good answer: How will we know when we've achieved "Artificial General Intelligence" (AGI)?... poggio-lab.mit.edu/blogsupdates/w…

199

CBMM@MIT_CBMM·1 Apr

[video] "Intelligence as Prediction: Cybernetics, LLMs, and Sociality" Speaker: Blaise Agüera y Arcas - Google, Paradigms of Intelligence youtu.be/6NC0tSjZXBo

1.2K

CBMM@MIT_CBMM·29 Mar

[blog post] "PoggioAI/MSc Went Online" This first public release is an open-source, customizable, modular multi-agent system for academic research workflows, with a current emphasis on machine learning theory and nearby quantitative fields. poggio-lab.mit.edu/blogsupdates/p…

432

CBMM retweeted

Pierfrancesco Beneventano@PierBeneventano·26 Mar

Check out the Poggio Lab blog at MIT! We went online with some very nice posts! The latest one is about our multi-agent system: poggio-lab.mit.edu/blogsupdates/p…

899

CBMM retweeted

Pierfrancesco Beneventano@PierBeneventano·23 Mar

Most AI-for-research work tries to maximize autonomy first and patch quality later. We think the near-term path is the reverse: automating step by step while holding the quality bar fixed. Today we’re open-sourcing PoggioAI/MSc for ML theory research.

39.8K

CBMM retweeted

Yulu Gan@yule_gan·13 Mar

Simply adding Gaussian noise to LLMs (one step—no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt. To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs. What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed — a regime we term Neural Thickets. Paper: arxiv.org/pdf/2603.12228 Code: github.com/sunrainyg/Rand… Website: thickets.mit.edu

433

688.3K

CBMM@MIT_CBMM·17 Mar

[blog] Beneficial Misalignment: Why We Shouldn't Always Align AI to Humans In the rapidly evolving field of NeuroAI, a significant amount of energy is dedicated to 'alignment', the idea that representations from artificial intelligence should converge... poggio-lab.mit.edu/blogsupdates/b…

674

CBMM@MIT_CBMM·11 Mar

[blog post] A Conversation with Blaise Agüera y Arcas: On Intelligence, Life, and the Future of AI What does it mean to call something intelligent - and when did this question get so hard to answer? For Blaise Agüera y Arcas, VP at Google and founder... poggio-lab.mit.edu/blogsupdates/i…

915

CBMM@MIT_CBMM·4 Mar

[blog post] Can a Neural Network Think Before It Speaks? Somewhere around 2022, an observation started making the rounds among researchers working with large language models: if you just asked a model... poggio-lab.mit.edu/blogsupdates/c…

623

CBMM@MIT_CBMM·26 Feb

[blog post] Edge of (Stochastic) Stability made simple — Part II: the mini-batch case In Part I we had one landscape and a deterministic update. Now we have a distribution of mini-batch landscapes and a stochastic update... poggio-lab.mit.edu/blogsupdates/e…

286

CBMM@MIT_CBMM·20 Feb

[blog post] Edge of (Stochastic) Stability made simple — Part I: A crash course on (full-batch) Edge of Stability In this part I introduce the phenomenon and what I believe are the two key mechanisms—which we’ll use as the springboard for the mini-bat... poggio-lab.mit.edu/blogsupdates/e…

592

CBMM@MIT_CBMM·13 Feb

[blog post] Are Transformers Just "Stochastic Parrots"? A common criticism of Large Language Models (LLMs) is that they are merely "stochastic parrots"—statistical mimics that stitch together likely patterns without genuine reasoning... poggio-lab.mit.edu/blogsupdates/a…

405

CBMM retweeted

Tomer Galanti@GalantiTomer·17 Oct

🧵 New paper: LLM-ERM: Sample-Efficient Program Learning via LLM-Guided Search arxiv.org/abs/2510.14331 We use reasoning LLMs to learn tasks like IsPrime from ~200 samples by proposing short programs, making both the learned function *and* the learning process interpretable 🤯

7.5K

CBMM retweeted

Pierfrancesco Beneventano@PierBeneventano·8 Feb

Does SGD really “seek flat minima”? We show that SGD has no intrinsic preference for flatness, even for stable linear networks—going against ~10 years of folklore. Flatness emerges iff label noise is isotropic; anisotropic noise drives SGD to arbitrarily sharp solutions. This reveals a new flattening–sharpening mechanism in late training, unrelated to standard progressive sharpening or Edge-of-Stability effects.

398

24.2K

CBMM retweeted

Yulu Gan@yule_gan·6 Oct

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning. Paper: arxiv.org/pdf/2509.24372 Code: github.com/VsonicV/es-fin…

384

2.6K

414.5K

CBMM@MIT_CBMM·3 Feb

[blog post] Intelligence Begins with Memory: From Reflexes to Attention Why associative memory is the oldest mechanism of intelligence—and still its computational core. sites.mit.edu/poggio-lab/int…

935
