Luis

looking ahead, we’re prototyping something new -- we call it predictive sensing.

our paper cited tons of work from cogsci and developmental psychology. the more we read, the more amazed we became by human and animal sensing. the human visual system is super high-bandwidth, yet insanely efficient: each eye’s ~6 million cone receptors can transmit ~1.6 Gbit/s, yet the brain uses only about 10 bits/s to guide behavior. most sensory data is filtered, compressed, and autopiloted -- you don’t even notice.

how does our brain pull that off? one leading theory: your brain runs a predictive world model in the background for sensing, constantly forecasting the future and comparing it to what actually happens.
- if the prediction error is low → it’s expected, and you can ignore it.
- if it’s high → it’s a surprise, and your brain pays attention and updates memory.

we don’t have anything comparable in LLMs right now. to test this idea, we trained a latent frame prediction (LFP) head on top of Cambrian-S. we estimate "surprise" during inference and use it in two ways:
1️⃣ surprise-driven memory management -- compress or skip non-surprising frames, focusing compute on surprising ones.
2️⃣ surprise-driven event segmentation -- use surprise spikes to detect event boundaries or scene changes.

by leveraging signals from this internal predictive model, we’re already seeing promising gains on spatial cognition tasks. it’s just a toy predictive world model -- but with this mechanism, our small model outperforms gemini on vsi-super. [6/n]
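(aside: the two surprise-driven mechanisms can be sketched roughly like this. a minimal numpy sketch, not the actual implementation -- the function names, the L2-distance surprise measure, and the thresholds are my assumptions, not details from the paper:)

```python
import numpy as np

def surprise_scores(pred_latents: np.ndarray, true_latents: np.ndarray) -> np.ndarray:
    # per-frame "surprise" = prediction error of the LFP head,
    # here just the L2 distance between predicted and actual latents
    return np.linalg.norm(pred_latents - true_latents, axis=-1)

def manage_memory(frames: list, scores: np.ndarray, threshold: float) -> list:
    # 1️⃣ surprise-driven memory management:
    # keep only surprising frames; non-surprising ones get dropped
    # (a real system might compress them instead of skipping)
    return [f for f, s in zip(frames, scores) if s > threshold]

def event_boundaries(scores: np.ndarray, spike_factor: float = 2.0) -> list:
    # 2️⃣ surprise-driven event segmentation:
    # flag frames whose surprise spikes well above the mean as boundaries
    mean = scores.mean()
    return [i for i, s in enumerate(scores) if s > spike_factor * mean]

# toy demo: 5 frames, the model fails to predict frame 3
pred = np.zeros((5, 4))
true = np.zeros((5, 4))
true[3] = 1.0  # frame 3 is the "surprise"
scores = surprise_scores(pred, true)
print(event_boundaries(scores))                          # frame 3 is a boundary
print(manage_memory(list("abcde"), scores, threshold=0.5))  # only frame 'd' is kept
```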





Is COLMAP still widely used or are Mast3r / VGGT taking over?












