StandingOnTheMoon
@entangledQbit
Recursively Self-Improving...

Today we release Token Superposition Training (TST), a modification to the standard LLM pretraining loop that produces a 2-3× wall-clock speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or training data. During the first third of training, the model reads and predicts contiguous bags of tokens, averaging their embeddings on the input side and predicting the next bag with a modified cross-entropy on the output side. For the remainder of the run, it trains normally on next-token prediction. The inference-time model is identical to one produced by conventional pretraining. Validated at 270M, 600M, and 3B dense scales, and at 10B-A1B MoE. The work on TST was led by @bloc97_, @gigant_theo, and @theemozilla.
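
A rough sketch of the bag phase described above, in NumPy. The post does not spell out the "modified cross-entropy", so this assumes a uniform target over the tokens of the next bag; the function names, the bag size of 2, the tied-embedding stand-in for the model, and the `total_steps // 3` switch are all illustrative assumptions, not the released implementation.

```python
import numpy as np

def bag_inputs(token_ids, embedding, bag_size):
    """Group tokens into contiguous bags and average each bag's embeddings."""
    n_bags = len(token_ids) // bag_size  # drop any ragged tail
    ids = np.asarray(token_ids[: n_bags * bag_size]).reshape(n_bags, bag_size)
    return embedding[ids].mean(axis=1), ids  # (n_bags, d_model), (n_bags, bag_size)

def bag_cross_entropy(logits, next_bag_ids):
    """ASSUMED loss: cross-entropy against a uniform target over the next bag's tokens."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, next_bag_ids, axis=-1)
    return float(-picked.mean())

def use_bags(step, total_steps):
    """Bag training for the first third of the run, plain next-token prediction after."""
    return step < total_steps // 3

# Toy run: vocabulary of 10, model width 4, 8 tokens, bags of 2.
rng = np.random.default_rng(0)
vocab, d_model, bag_size = 10, 4, 2
embedding = rng.normal(size=(vocab, d_model))
tokens = rng.integers(0, vocab, size=8)

bag_emb, bag_ids = bag_inputs(tokens, embedding, bag_size)
# Stand-in for the transformer: project bag embeddings back onto the vocabulary.
logits = bag_emb[:-1] @ embedding.T           # bag t predicts bag t+1
loss = bag_cross_entropy(logits, bag_ids[1:])
```

Because only the inputs and the loss change during the bag phase, the weights that come out are the same shape as in ordinary pretraining, which is consistent with the claim that the inference-time model is unchanged.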

Introducing Flux Matching, a generative modeling paradigm that generalizes diffusion models to vector fields that need not be the score function. Enables structural priors in the dynamics, faster sampling, interpretable generation, and more! w/ @StefanoErmon @Xiaojie_Qiu 🧵⤵️

You know you're in the Bay Area when there are cracked-open laptops outside the bathroom running agents

Decided to try a little experiment - zooming in from 15x to 100,000x in smooth progression. The SEM isn't particularly well calibrated so the result is only ~30-40nm resolution, but I think it's still pretty neat. Working on making SEMs so affordable you could have one at home :)

somebody made a huggingface model visualizer!! just plug in the url and explore at any granularity

This is the footprint ratio of data center to solar panels in the sunniest country in the world. Yeah, I think we're gonna have to go nuclear.


why do ML ppl refer to almost any low-dimensional subspace of almost any high-dimensional space as a manifold

Can you hold a more powerful thing in one hand? A high-pressure turbine blade in a jet engine can generate almost 1,000 horsepower, spinning at 20,000g in an environment 200 degrees higher than its melting point. But the most special thing about it is how it's made...