deep Manifold

13.7K posts

@BetaTomorrow

mathematics Thief & Chef "through the window of differential equations, mathematics sees the light in the real world" / "通过微分方程的窗子,数学家看到现实世界的光" (Jiang Zehan)

Seattle · Joined June 2008
678 Following · 2.7K Followers
Pinned Tweet
deep Manifold (@BetaTomorrow):
“This Is Not How Mathematicians Are Trained”
1. “Solving the forward problem (positive time) and the inverse problem (negative time) together has always been a desire of mathematicians, but they’ve never known where to begin. Neural networks, however, tackle this problem naturally.”
2. “Variables, coefficients, even coordinates are changing. Everything is in flux. This is not how mathematicians are trained. It would be impossible for mathematicians to come up with such a design.”
3. “Mathematicians tread carefully around composite functions with more than two layers, wary of the many pitfalls, yet neural networks solve them effortlessly, almost nonchalantly.”
4. “Neural networks have stacked covers, mathematically speaking, whereas the Numerical Manifold Method typically uses only 3 to 4. In contrast, neural networks stack hundreds or even thousands of such covers. I never imagined anyone would take it that far.”
That is what Gen-Hua Shi (石根华) told me after returning from a two-week vacation in early June 2024. For the rest of the story, follow the link: open.substack.com/pub/deepmanifo…
deep Manifold (@BetaTomorrow):
@Louis9687221579 @huskydogewoof As I mentioned before, I don't know much about diffusion. From the Deep Manifold view, we can see that diffusion is very weak at local feature understanding. My DMs are open if you want to discuss offline.
Louis (@Louis9687221579):
@BetaTomorrow @huskydogewoof So diffusion collapses the latent space and the output space. You can of course also take the inverse perspective, like Cai Zhou (x.com/zhuci19/status…). His paper separates the diffusion latent space from the output space, which makes diffusion act like a latent reasoner instead.
Quoting Cai Zhou (@zhuci19):

Remarkably, our ICML'26 paper CCDD also establishes connections between looped models, latent reasoning and diffusion language models. Welcome to check it out! arxiv.org/abs/2510.03206

deep Manifold (@BetaTomorrow):
Probabilistic Tiny Recursive Model is a nice example of inference as stochastic fixed-point search (Deep Manifold Part 2: Neural Network Mathematics, arXiv:2512.06563).
Deterministic recursion follows one trajectory and can get trapped in a bad basin; weak latent perturbation opens nearby convergence pathways. From the Deep Manifold lens, this is not random noise destroying structure: small admissible perturbation preserves local homology of the manifold cover while shifting the trajectory among fixed-point classes. The Q head then acts like a learned basin-quality functional, selecting the better fixed point. This is why stochastic width can outperform simply adding more deterministic depth.
Quoting Francesco Bertolotti (@f14bertolotti):

Very cool training-free extension to TRM. By injecting noise into the latent space, TRMs can explore a wider set of basins, and the exit head can then identify which trajectories succeeded. Feels like unlocking an entirely new scaling axis. Awesome work! 🔗arxiv.org/pdf/2605.19943
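A minimal sketch of the idea in the post above, on a toy problem rather than the actual TRM: a one-dimensional double-well potential stands in for the latent landscape, gradient steps stand in for the recursion, and `quality` stands in for the learned Q head. All names and constants here (`potential`, `recurse`, `sigma`, the 64 samples) are assumptions made for illustration and are not taken from either paper.

```python
import random


def potential(x: float) -> float:
    # Toy double-well landscape: a deep basin near x = -1, a shallower one near x = +1.
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x: float) -> float:
    # Derivative of the toy potential.
    return 4.0 * x * (x * x - 1.0) + 0.3


def recurse(x: float, steps: int = 200, lr: float = 0.05) -> float:
    # Deterministic recursion x <- x - lr * grad(x); it settles at a fixed point
    # of the update map, i.e. a stationary point of the potential.
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def quality(x: float) -> float:
    # Stand-in for a learned basin-quality head: lower potential scores higher.
    return -potential(x)


def stochastic_search(x0: float, sigma: float = 0.3, samples: int = 64, seed: int = 0) -> float:
    # Perturb the starting latent, run the same recursion from each perturbed
    # start, and keep the end point that the quality function scores highest.
    rng = random.Random(seed)
    candidates = [recurse(x0 + rng.gauss(0.0, sigma)) for _ in range(samples)]
    return max(candidates, key=quality)


x0 = 0.2
deterministic = recurse(x0)
best = stochastic_search(x0)
print(f"deterministic: x={deterministic:.3f}, quality={quality(deterministic):.3f}")
print(f"stochastic:    x={best:.3f}, quality={quality(best):.3f}")
```

In this toy, the unperturbed run starts on the shallow side of the ridge and stays there no matter how many steps it takes, while some of the perturbed starts cross into the deeper basin and are picked up by the quality function.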

deep Manifold (@BetaTomorrow):
Title: Emergent Introspective Awareness in Large Language Models
Author: Jack Lindsey (@Jack_W_Lindsey)
Affiliation: Anthropic
The paper’s “injected thoughts” should not be read as evidence that a neural network possesses consciousness. From the Deep Manifold view, an injected thought is better understood as an added internal boundary condition. A concept vector perturbs the residual stream, narrows or redirects the **intrinsic pathway**, and may steer the model toward a self-report fixed-point class. The model is not discovering an inner subjective state; it is responding to an imposed activation boundary inside its learned manifold.
The layer-specific nature of the result is also expected. A neural network is a **compositional function**, and inference is an **iterated integral** across stacked piecewise manifolds. At shallow layers, the boundary condition is still broad, with many possible pathways open. As depth increases, each layer composes another transformation and **narrows the admissible region of the integral path.** This explains why an injected concept becomes readable only at certain layers: early layers are too unconstrained, while later layers may already be committed to visible-output formation.
Neural networks do not have consciousness intrinsically. Their activations are **propertyless** numerical states, not subjective experiences. But because they are **propertyless**, they can learn the language, structure, and behavioral patterns of consciousness from human training data with great flexibility. This can make consciousness **appear** stronger than it actually is. What looks like introspection may be the model traversing a learned “consciousness manifold” under the right boundary conditions: a powerful learned self-report behavior, not actual subjective awareness.
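To make the "added internal boundary condition" reading concrete, here is a minimal sketch of injecting a concept vector into a toy residual stream at a chosen layer. The toy `layer` update, the vectors, and `alpha` are illustrative assumptions; this is not the paper's experimental setup or any real model's architecture.

```python
import math


def layer(h):
    # Toy residual block: h <- h + 0.1 * tanh(h), applied elementwise.
    return [x + 0.1 * math.tanh(x) for x in h]


def forward(h0, depth, inject_at=None, concept=None, alpha=0.0):
    # Run `depth` residual blocks; optionally add alpha * concept to the
    # residual stream just before block `inject_at`.
    h = list(h0)
    for i in range(depth):
        if inject_at == i and concept is not None:
            h = [x + alpha * c for x, c in zip(h, concept)]
        h = layer(h)
    return h


def readout(h, concept):
    # Projection of the final state onto the concept direction.
    return sum(x * c for x, c in zip(h, concept))


h0 = [0.2, -0.1, 0.4]
concept = [1.0, 0.0, 0.0]  # hypothetical concept direction

baseline = readout(forward(h0, depth=6), concept)
steered = readout(forward(h0, depth=6, inject_at=2, concept=concept, alpha=0.5), concept)
print(f"baseline={baseline:.3f}, steered={steered:.3f}")
```

The injection does not change what the downstream blocks compute, only the state they start from, which is the sense in which it behaves like an imposed boundary condition in this toy.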
Xiangdong Zhang (@aHpaBean):
@BetaTomorrow Thanks for the thoughtful perspective! We also find this connection interesting. In our experiments, NITP brings significant gains on MTEB, which suggests that it may encourage learning a more meaningful representation space.
Xiangdong Zhang (@aHpaBean):
Paper is now released: github.com/aHapBean/NITP Next token prediction defines what to predict, but fails to supervise how predictions are represented. We propose NITP: Next Implicit Token Prediction for LLM pre-training. NITP adds representation-level supervision to NTP.
deep Manifold (@BetaTomorrow):
@Jack_W_Lindsey It is precisely the 'propertylessness' of a neural network that grants it universal, open-ended learnability, a quality that ultimately fuels discovery and creativity.
Justin Hudson (@RISignal):
I’ve appreciated the depth in this thread on Numerical Manifold Method and single-token geometry for neural nets. It’s heavy on the math but easy enough to follow. I’ve been exploring related ideas from the inference/human-signal side: how structured early interaction registers induce zero-shot reasoning regimes and signature-stabilized trajectories on latent manifolds (see my new paper on the First Inference Problem + companion on Geometry of Signature-Induced Regimes). There’s resonance between your stacked piecewise manifold view and the attractor/trajectory dynamics I’m seeing in long-horizon human-AI interaction. Keep up the excellent work - I believe this type of work is highlighting the foundational layer.
Erika S (@E_FutureFan):
@BetaTomorrow Admittedly I used to think cleaner datasets were always better. This perturbation framing makes way more sense for fine-tuning than treating edge cases as noise.
deep Manifold (@BetaTomorrow):
There is no pure “noise” per se. What the paper calls noise may simply be data the evaluator does not like, rare data, conflicting data, minority data, or data that does not fit the current measurement frame. From the Deep Manifold view, this is better understood as perturbation rather than noise.
A neural network learns in a stochastic world, and stochasticity is not merely error; it is part of the inequality structure through which the model forms stochastic fixed points. Deep Manifold frames neural stochasticity as sum-based group statistics and boundary-shaped variability, not as a single bad disturbance.
A small amount of such “noise” can be powerful because it perturbs the fixed-point trajectory without destroying the manifold. Often, a low ratio, say below roughly 5%, acts like useful boundary diversity: it prevents premature collapse, keeps nearby fixed-point classes reachable, and helps the model discover better convergence directions. The danger is not perturbation itself, but excessive or badly structured perturbation. Small perturbation stabilizes exploration; too much perturbation overwhelms the boundary condition and produces fixed-point drift.
Paper: LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
Authors: Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma
Affiliations: ByteDance Seed; University of Virginia; University of California, Berkeley
arxiv.org/abs/2605.23901
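As a rough illustration of what a "low ratio" of perturbation could look like in practice, the sketch below flips a fixed fraction of binary labels in a dataset. Label flipping is only one possible stand-in for the rare or conflicting data discussed above, the paper's own noise model may differ, and the function name and the 5% default (taken from the rough figure in the post) are assumptions.

```python
import random


def perturb_labels(labels, ratio=0.05, seed=0):
    # Flip int(ratio * n) binary labels, chosen uniformly at random
    # without replacement; all other labels are left untouched.
    rng = random.Random(seed)
    n_flip = int(ratio * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


clean = [i % 2 for i in range(200)]
noisy = perturb_labels(clean, ratio=0.05)
changed = sum(a != b for a, b in zip(clean, noisy))
print(f"{changed} of {len(clean)} labels perturbed")
```

Keeping the ratio as an explicit parameter makes it easy to compare a small perturbation against an excessive one on the same data.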
deep Manifold (@BetaTomorrow):
@Louis9687221579 @huskydogewoof I don't know much about diffusion models. AR (autoregressive) modeling is more local and effective, but lacks global context. Conversely, diffusion is more global, which makes it hard to capture local features.
deep Manifold (@BetaTomorrow):
"Supervision is boundary shaping in a narrow sense." Very true: it holds in the narrow, human sense. There is a concept of "learning space" (Deep Manifold Part 1). In a low-order nonlinear learning space, supervision works well; in a high-order nonlinear learning space, it often fails. Supervised versus unsupervised learning should be discussed in the context of the learning space.
Louis (@Louis9687221579):
@BetaTomorrow @huskydogewoof Supervision is boundary shaping in a narrow sense. Segmented online training in EqR is boundary shaping too, but more constrained to formulaic operations. Free boundary shaping fails the serial scaling hypothesis because its boundary contains too many noisy states without constraints.
Benhao Huang (@huskydogewoof):
Thanks! I gave it a brief read, and I actually think the sentences there are close to what I meant by: “aligning fixed points/attractors to represent the solutions through backpropagating from the supervision loss.” So I wouldn’t say supervision is an illusion; that sounds a bit too strong/negative to me. I believe it is a difference of terms and framing: in your framing, supervision/targets correspond to boundary conditions, which is cool! I enjoy the sentences you write.
M (@init_malachi):
@BetaTomorrow shadow is not as legible as light
deep Manifold (@BetaTomorrow):
The paper uses “noise” in the Shannon-channel sense. I am very doubtful about how far Shannon theory can take us.
Benhao Huang (@huskydogewoof):
@BetaTomorrow Sorry for being picky too, but could you give a TLDR answer instead of only asking another question? I’m all ears here. I’d like to understand the concrete mechanism: how does your “neural network equation” get rid of supervision in AI, or make supervision merely an illusion?
deep Manifold (@BetaTomorrow):
@huskydogewoof Before all that, a very fundamental question: what is the neural network equation? Without an equation, what's the point of talking about fixed points?
Benhao Huang (@huskydogewoof):
@BetaTomorrow Interesting — could you clarify what you mean by “supervision is an illusion” here? Do you mean supervision is not the runtime driver of the fixed points, or that it is not what aligns them with the correct solutions during training?
deep Manifold (@BetaTomorrow):
@huskydogewoof Please allow me to be picky. Next question: what drives the fixed point(s)?
deep Manifold (@BetaTomorrow):
@huskydogewoof Fixed point theory itself does not define attractors; it has the notion of a "class", as in the theory of fixed point classes.