Maissam Barkeshli

683 posts

Maissam Barkeshli

@MBarkeshli

Visiting Researcher @ Meta FAIR. Professor of Physics @ University of Maryland & Joint Quantum Institute. Previously @ Berkeley, MIT, Stanford, Microsoft Station Q

University of Maryland, College Park · Joined December 2011
395 Following · 2.9K Followers
Pinned Tweet
Maissam Barkeshli @MBarkeshli ·
An absolutely incredible, highly interconnected web of ideas connecting some of the most important discoveries of late twentieth century physics and mathematics. This is an extremely abridged, biased history (1970-2010) with many truly ground-breaking works still not mentioned:
Maissam Barkeshli retweeted
Surya Ganguli @SuryaGanguli ·
Our new paper "Deriving neural scaling laws from the statistics of natural language" arxiv.org/abs/2602.07488, led by @Fraccagnetta & @AllanRaventos w/ Matthieu Wyart, makes a breakthrough! For the very first time, we can predict data-limited neural scaling law exponents from first principles, using the structure of natural language itself.

If you give us two properties of your natural language dataset:
1) How the conditional entropy of the next token decays with conditioning length.
2) How pairwise token correlations decay with time separation.
Then we can give you the exponent of the neural scaling law (loss versus amount of data) through a simple formula!

The key idea is that as you increase the amount of training data, models can look further back into the past to predict, and as long as they do this well, the conditional entropy of the next token, conditioned on all tokens up to this data-dependent prediction horizon, completely governs the loss. This gives us our simple formula for the neural scaling law!
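As a toy illustration (not from the paper), the two corpus statistics named in the tweet can be estimated directly from token counts. The function names, the simple n-gram estimator, and the miniature corpus below are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: estimate (1) conditional entropy of the next token vs. context
# length and (2) pairwise token correlation vs. separation, from a toy corpus.
import math
from collections import Counter

def conditional_entropy(tokens, context_len):
    """H(next token | previous context_len tokens), estimated from n-gram counts."""
    ctx_counts, joint_counts = Counter(), Counter()
    for i in range(context_len, len(tokens)):
        ctx = tuple(tokens[i - context_len:i])
        ctx_counts[ctx] += 1
        joint_counts[ctx + (tokens[i],)] += 1
    n = sum(joint_counts.values())
    h = 0.0
    for joint, c in joint_counts.items():
        p_joint = c / n
        p_ctx = ctx_counts[joint[:-1]] / n
        h -= p_joint * math.log2(p_joint / p_ctx)
    return h

def pair_correlation(tokens, sep, target):
    """Covariance of the indicator of `target` at positions t and t+sep."""
    ind = [1.0 if t == target else 0.0 for t in tokens]
    mean = sum(ind) / len(ind)
    pairs = [ind[i] * ind[i + sep] for i in range(len(ind) - sep)]
    return sum(pairs) / len(pairs) - mean ** 2

corpus = ("the cat sat on the mat and the dog sat on the rug " * 200).split()
for L in (1, 2, 3):
    print(f"H(next | {L} tokens) ~ {conditional_entropy(corpus, L):.3f} bits")
for d in (1, 2, 4, 8):
    print(f"corr at separation {d} for 'the': {pair_correlation(corpus, d, 'the'):.4f}")
```

On a real corpus one would then fit how the first quantity decays with context length and how the second decays with separation; the tweet says these two decay laws determine the data-scaling exponent.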
Maissam Barkeshli @MBarkeshli ·
@GoonGarrett No, we have no understanding so far. I think it is relatively robust to model size but we didn’t do a careful study.
Garrett Goon @GoonGarrett ·
@MBarkeshli This is cool. Do you have some analytic understanding of the ~sqrt(m) context-length scaling law? Missed it if so. Curious how/if that scaling is sensitive to model size, as well
Maissam Barkeshli @MBarkeshli ·
Our ICLR 2026 paper shows how transformers can learn pseudo-random numbers. We demonstrate successful in-context prediction of pseudo-random sequences from permuted congruential generators (PCGs), which are used in practice in NumPy. We successfully attacked PCGs with moduli up to 2^22. Surprisingly, the transformer can learn the sequence even when only one bit is output from the hidden state.

We found that curriculum learning is essential for these problems. We also found novel structures in the embedding layers: the model spontaneously clusters numbers according to how their bit strings transform under rotations.
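For readers unfamiliar with PCGs, here is a minimal sketch of a PCG-style generator at the small modulus scale mentioned above: an LCG state update followed by a permuted, truncated output. The multiplier, increment, and permutation details are illustrative assumptions, not the parameters studied in the paper or used by NumPy.

```python
# Hypothetical sketch of a PCG-style generator with a small modulus (2^22).
MOD_BITS = 22                          # the largest modulus scale mentioned in the tweet
MOD = 1 << MOD_BITS
MULT, INC = 747796405, 2891336453      # arbitrary odd multiplier / increment for the demo

def pcg_steps(state, n, out_bits=1):
    """Advance an LCG n times; each step emits a permuted slice of the hidden state."""
    outs = []
    for _ in range(n):
        state = (state * MULT + INC) % MOD          # LCG hidden-state update
        x = state ^ (state >> (MOD_BITS // 2))      # xorshift mixing of high bits
        rot = state >> (MOD_BITS - 3)               # top 3 bits choose a rotation
        x = ((x >> rot) | (x << (MOD_BITS - rot))) % MOD
        outs.append(x >> (MOD_BITS - out_bits))     # reveal only the top out_bits bits
    return outs, state

# A training sequence for in-context prediction might look like:
seq, _ = pcg_steps(state=12345, n=16, out_bits=1)
print(seq)   # e.g. a 16-token binary sequence the transformer must continue
```

With out_bits=1 only a single bit of the 22-bit hidden state is revealed per step, which is the regime the tweet calls surprising for the transformer to crack.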
Maissam Barkeshli @MBarkeshli ·
Scaling laws in AI: where do they come from? The discovery of neural scaling laws several years ago showed that the loss decreases predictably as a power law in model size, amount of data, and compute. But why? And what sets the exponents of the power law?

The most popular explanation is that the dataset already has power-law correlations in it (for example, power laws are prevalent in natural language corpora, e.g. Zipf's law), which translate to power laws in the loss.

We studied transformers performing next-token prediction on sequences coming from random walks on random graphs, where the data has no power-law correlations. Nevertheless, after training the model, we observed power laws in the loss that look similar to those found in natural language. For example, we show results from a random walk on an Erdős-Rényi graph with 8K edges and 50K nodes.

This challenges existing explanations, since this dataset of random walks falls outside the assumptions made in existing models of scaling laws. Going forward, we need explanations of scaling laws based on the expressivity and learnability of discrete data, where there is no data manifold, and which do not require the data to already have power laws built in.

We also found a setting where we could tune the complexity of a language dataset by starting with a bigram model and gradually dialing up complexity until we get to natural language. This allowed us to track how the exponents of the scaling laws change with complexity.
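As a concrete illustration of this kind of dataset, here is a hedged sketch of generating random-walk token sequences on an Erdős-Rényi graph. The graph sizes, function names, and sampling details are illustrative assumptions, far smaller than the 50K-node experiment described above.

```python
# Hypothetical sketch: token sequences from random walks on an Erdős-Rényi graph.
import random

def erdos_renyi(n_nodes, n_edges, seed=0):
    """Sample a simple undirected graph with a fixed number of edges."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n_nodes)}
    while sum(len(s) for s in adj.values()) // 2 < n_edges:
        u, v = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def random_walk(adj, length, seed=0):
    """One walk = one training sequence; each node id plays the role of a token."""
    rng = random.Random(seed)
    nodes_with_edges = [v for v, nbrs in adj.items() if nbrs]
    walk = [rng.choice(nodes_with_edges)]
    for _ in range(length - 1):
        walk.append(rng.choice(sorted(adj[walk[-1]])))
    return walk

graph = erdos_renyi(n_nodes=500, n_edges=800)
print(random_walk(graph, length=32))   # a 32-token sequence for next-token prediction
```

By construction the walk is Markovian, so the dataset has no built-in long-range power-law correlations, which is what makes the observed power-law loss curves surprising.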
Zohar Komargodski @ZoharKo ·
@MBarkeshli @BSeradjeh I mean it is also not surprising at all: Hamas and Hezbollah, which the pro-Palestine crowd supported for months, are funded by the Ayatollah.
Zohar Komargodski @ZoharKo ·
Over 2000 protesters were killed, apparently 😭 But note that there are no demonstrations on university campuses, no daily UN meetings, no petitions, and the keyboard freedom fighters that we all got to know so well on this platform are very quiet. nytimes.com/2026/01/10/wor…
Zohar Komargodski @ZoharKo ·
@MBarkeshli @BSeradjeh I just saw a demonstration in NYC pro Hamas and pro Ayatollah. These go together, very unsurprisingly, as one funds the other.
Maissam Barkeshli @MBarkeshli ·
@BSeradjeh @ZoharKo Well, almost no Iranians in the US have protested in this case, whereas many protested during Israel-Gaza, so that can’t be true.
Maissam Barkeshli @MBarkeshli ·
@ZoharKo But I think a lot of those activities were aimed at trying to affect foreign policy. In this case there is no foreign policy to try to change. The lack of media coverage is more concerning to me
Maissam Barkeshli retweeted
Surya Ganguli @SuryaGanguli ·
We have 14 survey lectures for our @SimonsFdn Collaboration on the Physics of Learning and Neural Computation! All videos available at: physicsoflearning.org/webinar-series

Here is the list:
@zdeborova: Attention-based models and how to solve them using tools from quadratic networks and matrix denoising
@KempeLab: Recent lessons from LLM reasoning
@MBarkeshli: Sharpness dynamics in neural network training
@KrzakalaF: How Do Neural Networks Learn Simple Functions with Gradient Descent?
Michael Douglas: Mathematics, Economics and AI
Yuhai Tu: Towards a Physics-based Theoretical Foundation for Deep Learning: Stochastic Learning Dynamics and Generalization
@SuryaGanguli: An analytic theory of creativity for convolutional diffusion models
Eva Silverstein: Hamiltonian dynamics for stabilizing neural simulation-based inference
@adnarim066: Generation with Unified Diffusion
Bernd Rosenow: Random matrix analysis of neural networks: distinguishing noise from learned information
@jhhalverson: Neural networks and conformal field theory
@KempeLab: Synthetic data: friend or foe in the age of scaling
@WyartMatthieu: Learning hierarchical representations with deep architectures
@CPehlevan: Mean-field theory of deep network learning dynamics and applications to neural scaling laws
Maissam Barkeshli retweeted