Mark Young
@vizmo
Designer, developer, scientist

otoh, and this is the part i find hopeful: the post qt’d is about the local problem: language is a brutally narrow channel. 2 people can hear the same words, yet run totally different reconstructions, and walk away feeling either perfectly clear or completely alien to each other. meaning is inferred, not transmitted whole.

but there may also be a global convergence pressure. the platonic representation hypothesis is basically the idea that when very different systems get good enough at modeling the same world, their internal representations start becoming more alike. larger models don’t just perform better: across architectures, and even across language vs vision, they begin organizing reality in more similar ways. some fMRI work suggests the strongest overlap is often not at the final “answer” layer, but in the middle of the stack, where the system is still actively building a general-purpose world model of what’s going on. in a 2025 review of 25 fMRI studies, this intermediate-layer pattern shows up repeatedly, and the broader alignment story tends to get stronger as models scale in size and capability.

that’s how I reconcile this with the post qt’d: a single conversation may be nowhere near enough to make 2 differently trained systems converge. shared reality may exert a long-run pull toward similar abstractions, but the channel between 2 people is still narrow, lossy, and path-dependent. yes, sometimes you really do hit the compression limit of a relationship, but that isn’t proof that shared understanding is fake, just that it is expensive.

paper links:
arxiv.org/abs/2405.07987
phillipi.github.io/prh/
arxiv.org/abs/2510.17833
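side note: “representations becoming more alike” is a measurable claim. here’s a toy sketch of one standard alignment metric, linear CKA — the PRH paper itself uses a mutual nearest-neighbor metric, so this is an illustrative stand-in, not their exact method. feed the same stimuli through two systems, collect the activations, and score how similar the two geometries are:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation
    matrices of shape (n_samples, n_features). The feature dimensions
    may differ, which is what lets you compare different architectures.
    Returns a similarity score in [0, 1]."""
    # Center each feature across samples.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

linear CKA is invariant to rotations and isotropic rescaling of either representation space, which is exactly the property you want when comparing models with totally different widths and architectures.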

Found 2 papers on language, brains, and LLMs that together tell a story no one has cleanly articulated. One looks at spoken conversation and finds that contextual LLM embeddings can track linguistic content as it moves from one brain to another, word by word. The relevant representation shows up in the speaker before the word is said, then shows up again in the listener after the word is heard. The other looks within a single brain and finds that the timeline of verbal comprehension lines up with the layer hierarchy of LLMs: earlier layers match earlier neural responses, deeper layers match later ones, especially in higher-order language regions. Both papers are from the same group at Princeton. Quick summary of each, then what I think they mean together.

Zada et al. (Neuron 2024) recorded ECoG from pairs of epilepsy patients having spontaneous face-to-face conversations. They aligned neural activity to a shared LLM embedding space and found that contextual embeddings captured brain-to-brain coupling better than syntax trees, articulatory features, or non-contextual vectors. The embedding space works like a shared codec. Speaker encodes into it before they open their mouth, listener decodes after.

Goldstein, Ham, Schain et al. (Nat Comms 2025) pulled embeddings from every layer of GPT-2 XL and Llama 2 while people listened to a 30-minute podcast. In Broca’s area, correlation between layer index and peak neural lag hits r = 0.85. As you move up the ventral stream, the temporal receptive window stretches from basically nothing in auditory cortex to a ~500ms spread between shallow and deep layer peaks in the temporal pole. The classical phonemes → morphemes → syntax → semantics pipeline doesn’t recover this temporal structure. The learned representations do.

Together, these papers make conversation look a lot like two brains running closely related forward passes, with speech acting as a brutally lossy bottleneck between them.
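the layer-to-lag result is easier to picture as an analysis. a minimal synthetic sketch (made-up data and a bare cross-correlation, not the paper’s actual encoding-model pipeline): give each layer a timecourse, find the lag where its correlation with a neural signal peaks, then correlate that peak lag with layer depth — the r = 0.85 number is this kind of layer-index-vs-lag correlation.

```python
import numpy as np

def peak_lag(layer_signal, neural_signal, max_lag):
    """Lag (in samples) at which the correlation between a layer's
    timecourse and the neural signal peaks. Positive lag means the
    layer's signal leads the neural response."""
    n = len(layer_signal)
    def corr_at(lag):
        if lag >= 0:
            a, b = layer_signal[: n - lag], neural_signal[lag:]
        else:
            a, b = layer_signal[-lag:], neural_signal[: n + lag]
        return np.corrcoef(a, b)[0, 1]
    return max(range(-max_lag, max_lag + 1), key=corr_at)

def layer_lag_correlation(layer_timecourses, neural, max_lag=50):
    """layer_timecourses: (n_layers, n_timepoints), e.g. each layer's
    embedding projected onto an encoding axis; neural: (n_timepoints,)
    signal from one electrode. Returns Pearson r between layer index
    and each layer's peak lag."""
    lags = [peak_lag(tc, neural, max_lag) for tc in layer_timecourses]
    layers = np.arange(len(lags))
    return np.corrcoef(layers, np.array(lags))[0, 1]
```

on synthetic data where deeper “layers” are just copies of the neural signal shifted progressively earlier, this recovers a near-perfect layer-to-lag correlation — the real finding is that cortex behaves a bit like that.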
Inside a single brain, the structure of that forward pass (shallow layers tracking fast local features, deeper layers integrating slower contextual information) looks a lot like the way comprehension actually unfolds over time. What's crazy is these models were only trained on text, and yet their layer hierarchy STILL mirrors the temporal dynamics of spoken-language processing, so whatever structure they picked up is probably not just a quirk of modality. It seems to fall out of language statistics themselves, which is not what the classical picture would predict at all. If comprehension were really a tidy pipeline of discrete symbolic modules, you’d expect to see that cleanly in the neural timing, but you don’t.

If you take compression seriously, this suggests language is not really about explicit symbolic manipulation but about lossy compression over a learned continuous space. Brains and transformers may be landing on similar solutions because the statistical structure of meaning constrains the geometry hard enough that very different objective functions (natural selection vs next-token prediction) still push you into roughly the same region.

Something I find kinda funny: transformers compute all layers for a token in one feedforward pass, while brains seem to realize something like the same hierarchy sequentially in time, sometimes within the same cortical region. Broca’s area obviously does not have 48 anatomical layers, but its temporal dynamics behave almost as if it does, which is quietly a point in favor of recurrence. What transformers learned may be right even if the brain implements it more like an RNN unrolling over a few hundred milliseconds. The field ditched RNNs for engineering reasons. The brain, apparently, did not get the memo.

The better frame than “LLMs think like brains” is that representing meaning in context may just be a problem with fewer good solutions than we assumed.
If you optimize hard enough on language statistics, you may end up in a solution family that overlaps miraculously well with what evolution found. There’s a real isomorphism in the problem, even if not necessarily in the machinery. Paper links: pubmed.ncbi.nlm.nih.gov/39096896/ nature.com/articles/s4146…


i think chips with burned-in LLMs that run at very low power will probably result in much of the world around us being unnecessarily intelligent. it’ll be cheaper to throw that chip and some flash with a readme into an automatic door opener than to develop firmware for it.




Passenger rail networks in the United States vs Europe


Watching obscenely rich people and their friends blast off into (not quite) space for a jolly is bizarre when there’s so much to fix here on Earth. This song by Gil Scott-Heron does a good job of summing it up…
