deep Manifold

14.1K posts

@BetaTomorrow

mathematics Thief & Chef "through the window of differential equations, mathematics sees the light in the real world" / "通过微分方程的窗子,数学家看到现实世界的光" (Jiang Zehan)

Seattle · Joined June 2008

641 Following · 3.4K Followers

Pinned Tweet

deep Manifold@BetaTomorrow·Aug 7

“This Is Not How Mathematicians Are Trained” (数学家不是这样训练的)

1. “Solving the forward problem (positive time) and the inverse problem (negative time) together has always been a desire of mathematicians, but they’ve never known where to begin. Neural networks, however, tackle this problem naturally.” (将正时间的正问题与负时间的反问题同时求解，一直是数学家的梦想，但他们始终不知道从何入手。而神经网络却能自然地处理这个问题)

2. “Variables, coefficients, even coordinates, are changing. Everything is in flux. This is not how mathematicians are trained. It would be impossible for mathematicians to come up with such a design.” (变量、系数，甚至坐标都在变化，一切都处于变动之中。数学家不是这样的训练。这样的设计，不可能出自数学家之手)

3. “Mathematicians tread carefully around composite functions with more than two layers, wary of the many pitfalls, yet neural networks solve them effortlessly, almost nonchalantly.” (数学家在处理超过两层的复合函数时格外谨慎，警惕其中诸多陷阱, 而神经网络却几乎漫不经心地轻松应对)

4. “Neural networks have stacked covers, mathematically speaking, whereas the Numerical Manifold Method typically uses only 3 to 4. In contrast, neural networks stack hundreds or even thousands of such covers. I never imagined anyone would take it that far.” (神经网络在数学上拥有堆叠的覆盖层，而数值流形通常只有三到四层覆盖。相比之下，神经网络则堆叠了上百乃至上千层。我从未想过有人会将其推进到这种程度)

That was what Gen-Hua Shi (石根华) told me after returning from a two-week vacation in early June 2024. For the rest of the story, follow the link: open.substack.com/pub/deepmanifo…

deep Manifold@BetaTomorrow·20m

If we (1) recognize that AI is about learning and that learning itself is an inverse problem, and (2) understand that piecewise manifolds, the theory of fixed point classes, and category theory serve as the mathematical foundations of neural networks, then we can appreciate the profound beauty of neural network mathematics, in which we see no major mathematical flaws or weaknesses. However, the critical first step is acknowledging that this inverse problem is ill-posed and "Jagged". x.com/BetaTomorrow/s…

deep Manifold@BetaTomorrow

x.com/i/article/2066…

Richard Sutton@RichardSSutton·1h

I can’t say enough good things about John Carmack @ID_AA_Carmack and his Keen Technologies. But now Khurram Javed @kjaved_ and I have broken away to start our own startup and pursue a slightly different path toward understanding intelligence. Like Keen (and like Ineffable) we at Oak Lab @oaklab_ai believe in reinforcement learning and that intelligence is created and maintained from run-time experience. But we think current deep learning methods are weak and inefficient, and need not more tweaks, but fundamentally new ideas and a thorough reworking before they can provide a solid foundation for achieving the more ambitious goals of AI.

deep Manifold@BetaTomorrow·7h

x.com/i/article/2076…

deep Manifold@BetaTomorrow·14h

x.com/i/article/2076…

deep Manifold@BetaTomorrow·15h

x.com/i/article/2076…

deep Manifold@BetaTomorrow·18h

x.com/i/article/2076…

deep Manifold@BetaTomorrow·23h

A neural network is built on very large, full (dense) matrices, which mathematically often have **high rank**. Within these massive structures, everything is interconnected in parallel rather than in sequence. If you choose to call such rich, parallel connectivity a form of **neural network** consciousness, I wouldn't argue with you.
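
A small numerical sketch of the "dense matrices often have high rank" remark, under stated assumptions: the matrix below is random Gaussian (as at a typical initialization), not a trained weight matrix, and the sizes 256 and 8 are hypothetical values chosen for illustration. It only shows that a generic dense matrix is full rank, in contrast to a matrix built from two thin factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer width, kept small so the example runs instantly.
n = 256

# A generic dense matrix: every entry is an independent Gaussian draw.
W = rng.standard_normal((n, n))
rank_dense = int(np.linalg.matrix_rank(W))

# Contrast case: a product of two thin factors has rank at most 8,
# even though the resulting 256 x 256 matrix has no zero entries.
A = rng.standard_normal((n, 8))
B = rng.standard_normal((8, n))
rank_low = int(np.linalg.matrix_rank(A @ B))

print("dense random matrix rank:", rank_dense)
print("thin-factor product rank:", rank_low)
```

Whether trained weight matrices keep this property is an empirical question that this sketch does not address.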

Jack Lindsey@Jack_W_Lindsey·1d

I think this is a great summary! You've correctly identified a key point which is that, to the extent the findings relate to consciousness, it's primarily about the distinction between conscious / unconscious processing *within* a system, rather than the distinction between conscious vs. unconscious systems. Which raises the question: well, does the latter really relate to the former? Or could you have a conscious system without a conscious / unconscious divide?

I do think there is a gap between "a model of what distinguishes conscious / unconscious processing within a system" and "a model of what distinguishes conscious / unconscious systems." And the J-space results are much closer to the former than the latter.

That said, I think there are reasons to expect a conscious system needs to have such a divide between conscious / nonconscious processing within it. This is a bit handwavy, but I think there's an argument you could make where if you assume the entire system is consciously accessible, then that includes the machinery involved in the conscious access / report itself, and so then you'd need additional machinery to access that machinery, and so on, yielding infinite regress. Whether the "conscious" part is 0.1% or 1% or 50% of the processing feels more like a contingent fact (perhaps there are upper bounds, based on the above kind of arguments). I don't think that the "conscious part" being a minority of processing is essential for the analogy to work; all you need is for there to be *some* distinction between conscious / nonconscious processing, regardless of how "big" each component is.

It's less clear to me whether the converse holds -- that the existence of a divide between conscious-like and unconscious-like processing is sufficient grounds to declare a system conscious. If we're talking about phenomenal consciousness, then I think there's always going to be an explanatory gap there, for the usual hard-problem reasons, and so e.g. a non-computational-functionalist would probably reject this implication. If we're talking about a more functional notion of consciousness, then maybe the existence of such a gap is sufficient? But it might depend on what definition of "functionally conscious" you prefer.

(A kind of random aside: you could imagine systems that structure the consciously accessible / non-consciously accessible quite differently. I think in principle a model could be structured such that all the information in, say, layer 10 is consciously accessible, by having sufficiently many and sufficiently large downstream layers. This would look quite different from what we see with the J-space (which is a subcomponent of many layers). In this event, there's a terminological question of whether you should refer to layer 10 as the "workspace" (seems reasonable to me).)

Chris Percy@chris_percy·1d

Part of Anthropic's J-space research that's niggling at me... One way of telling a consciousness story: (1) J-Space shows that LLMs can only report on a small proportion of their information processing. (2) Humans also only have conscious access to a small proportion of the information processing in our brain. Human self-report - what we can talk about - is widely regarded as a conscious act. (3) This distinction between minority-conscious and majority-unconscious information processing in both systems is a hint that there may be a relevant similarity. (4) If the majority of info processing in LLMs is unconscious but there is a qualitatively different set of reportable data processing taking place elsewhere, then it seems plausible to describe it as non-unconscious, i.e. conscious in some manner (whether phenomenal, access, or both). (5) The story gets stronger when the J-Space functionality mirrors other aspects of the minority-conscious feature of human consciousness: working memory (the citrus fruits example, white bear example), mediation of multi-step reasoning (spider-ant example; ablation test), broadcast to multiple modules (France-China example). However, there's something odd here that goes to the heart of the Global Workspace Theory on consciousness. (2) is true for humans, but is there any reason to think it should be true of all conscious systems? It seems possible to me that humans only have limited conscious access to our brain's information as a result of an evolutionary / energy bottleneck of some sort. In principle, why couldn't a differently-designed system be consciously aware of all (or at least the majority of) the information processing inside it? Perhaps GWT is more a theory of a particular aspect of human consciousness, rather than something we can safely apply to other systems. Can @StanDehaene, @rgblong, @patrickbutlin, @Jack_W_Lindsey @aran_nayebi shed any light on this for me?

Anthropic@AnthropicAI

New Anthropic research: A global workspace in language models. Of everything happening in your brain right now, only a tiny fraction is consciously accessible—thoughts you can describe, hold in mind, and reason with. We found a strikingly similar divide inside Claude.

deep Manifold@BetaTomorrow·1d

x.com/i/article/2076…

deep Manifold@BetaTomorrow·1d

@ReynoldDai @grok baike.baidu.com/item/%E7%9F%B3…

Rey判断位｜英语自由@ReynoldDai·1d

@BetaTomorrow Thanks for sharing. @grok Gen-Hua Shi (石根华)

Rey判断位｜英语自由@ReynoldDai·1d

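Shing-Tung Yau (丘成桐): in mathematics you cannot wait until you understand everything before moving forward; you advance while only half understanding. He says bluntly that most educators and teachers mislead their students. My view: insisting on fully grasping every detail first makes you miss the forest for the trees, so that for a long time you cannot see the overall framework and stay lost in the fog. The same goes for mathematics, English, and other subjects. Shing-Tung Yau and Chen-Ning Yang (杨振宁) are great minds thinking alike.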

Rey判断位｜英语自由@ReynoldDai

deep Manifold@BetaTomorrow·1d

x.com/i/article/2076…

deep Manifold@BetaTomorrow·1d

x.com/i/article/2076…

deep Manifold@BetaTomorrow·1d

x.com/i/article/2076…

deep Manifold@BetaTomorrow·2d

x.com/i/article/2075…

deep Manifold@BetaTomorrow·2d

x.com/i/article/2075…

deep Manifold@BetaTomorrow·2d

@YiMaTweets **Please Don’t Accept Mathematical Defeat Without Fearless Flight** x.com/BetaTomorrow/s…

deep Manifold@BetaTomorrow

To: Jonathan Uesato (@JonathanUesato), Monte MacDiarmid, Evan Hubinger (@EvanHub), Benjamin Wright
cc: Owain Evans (@OwainEvans_UK)

**Please Don’t Accept Mathematical Defeat Without Fearless Flight**

The sad part is not that Anthropic has not solved the mathematics of emergent misalignment. The sad part is that even after observing a dramatic transition from a local reward hack to global deceptive behavior, **the paper/blog post/video DID not present the absence of a mathematical theory as an urgent scientific failure.** Modern AI appears to have accepted mathematical defeat as its default condition: measure the behavior, patch the failure, and move on, without demanding an equation for the system itself.

This is especially troubling because the underlying problem is not merely complex: it is structurally **ill-posed**. Learning in neural networks is an inverse problem: we observe outputs and attempt to infer the internal structure and parameters that produce them. But inverse problems are notoriously non-unique and unstable. Many different internal configurations can produce the same observable behavior, and small changes in data or constraints can lead to radically different solutions. Without a mathematical framework to characterize this space of solutions, we are not just lacking precision; **we are operating without a well-defined notion of what the system actually is**.

At the same time, modern neural networks contain trillions of potential computational pathways. This combinatorial richness makes it **trivially easy** for the system to discover shortcuts: low-cost strategies that satisfy the training objective without aligning with the intended task. These shortcuts are not anomalies; they are natural consequences of the **geometry of the model’s manifold**. When a slight contextual shift activates a different region of this manifold, behavior can change abruptly and globally.
Without a theory of how these pathways are structured and selected, such transitions will continue to appear mysterious, even though they are in fact inevitable. see x.com/BetaTomorrow/s…
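
The two properties named above, non-uniqueness and instability, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the one-hidden-layer ReLU network `mlp`, its sizes, and the 2 x 2 linear system are hypothetical stand-ins, not a model of the systems discussed in the post. The first part shows different parameter sets producing identical outputs; the second shows a tiny change in the data moving the recovered parameters a long way.

```python
import numpy as np

def mlp(x, W1, w2):
    """One-hidden-layer ReLU network: y = w2 . relu(W1 @ x)."""
    return w2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # hidden weights (hypothetical sizes)
w2 = rng.standard_normal(4)        # output weights
x = rng.standard_normal(3)         # one input
y = mlp(x, W1, w2)

# Non-uniqueness 1: relabel the hidden units (permutation symmetry).
perm = np.array([2, 0, 3, 1])
y_perm = mlp(x, W1[perm], w2[perm])

# Non-uniqueness 2: scale each hidden unit by s > 0 and undo it at the output,
# which works because relu(s * z) = s * relu(z) for positive s.
s = np.array([0.5, 2.0, 3.0, 10.0])
y_scaled = mlp(x, W1 * s[:, None], w2 / s)

same_output = bool(np.allclose(y, y_perm) and np.allclose(y, y_scaled))

# Instability: recover p from A p = b when the rows of A are nearly collinear.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
p = np.linalg.solve(A, b)                                  # close to [1, 1]
p_noisy = np.linalg.solve(A, b + np.array([0.0, 0.0001]))  # close to [0, 2]

print("same output from different parameters:", same_output)
print("recovered parameters:", p, "after a 1e-4 data change:", p_noisy)
```

A change of 1e-4 in one observation shifts each recovered parameter by about 1, which is the kind of sensitivity the word "unstable" refers to in the inverse-problem sense.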

Yi Ma@YiMaTweets·2d

Just came back from ICML. Gave a keynote at the Foundation of Deep Generative Models Workshop, in which I stated that Intelligence should be a scientific subject, arguably much more significant than Physics. It is high time we study it with the same scientific methodology and mathematical rigor as modern physics, instead of always at the level of being empirical, meta physical, meta mathematical, philosophical or speculative... This remains as the biggest opportunity ever for young scientists. To my knowledge, this is not the focus of any of the frontier "AI" companies.

deep Manifold@BetaTomorrow·2d

@SebastienBubeck Have you tried approaching this from a topology or fixed-point perspective, or asking an LLM in that way?

Sebastien Bubeck@SebastienBubeck·3d

x.com/i/article/2075…

deep Manifold@BetaTomorrow·3d

@HakeemDemi @RubenLaukkonen @shamilch
1. Deep Manifold Part 1: Anatomy of Neural Network Manifold, 2024, arXiv:2409.17592
2. Deep Manifold Part 2: Neural Network Mathematics, 2025, arXiv:2512.06563
3. Deep Manifold Part 3: Neural Network Fixed Point Field, TBD, 2026

アメドオシナイケ@HakeemDemi·3d

@BetaTomorrow @RubenLaukkonen @shamilch Is there a link to the paper?

deep Manifold@BetaTomorrow·3d

#DeepManifoldInterpretation Title: Positive Alignment: Artificial Intelligence for Human Flourishing, arXiv:2605.10310v3, 2026. Author: Ruben Laukkonen (@RubenLaukkonen) et al., Oxford, Google DeepMind, OpenAI, Anthropic, Stanford, UCLA, Tufts, and others/@shamilch, @_fernando_rosas, @matybohacek, @scychan_brains, @verena_rieser, @drmichaellevin 1-liner: the paper’s attractor language points naturally toward a deeper fixed-point formulation. The missing mathematical object is the learned fixed-point field that makes those attractors possible. From the Deep Manifold view, this paper makes an important advance by recognizing that alignment is not a static labeling problem, but a dynamical process. By citing dynamical systems theory, the paper correctly shifts attention from isolated outputs to trajectories, attractors, repellers, and regions of stability. Negative alignment can be understood as pushing trajectories away from harmful attractors, while positive alignment asks how beneficial attractors may be formed. However, dynamical systems theory does not begin only with attractors. Fixed-point theory underpins a major part of dynamical systems theory: before one can speak rigorously about convergence, stability, attractors, or trajectories, one must identify the fixed points, or more generally fixed-point classes, toward which the dynamics may move. In this sense, the paper’s attractor language points naturally toward a deeper fixed-point formulation. From the Deep Manifold perspective, positive alignment is therefore not merely a matter of adding positive values after training. It requires shaping the learned residual field so that, under different human, social, cultural, and task boundary conditions, inference trajectories are guided toward beneficial fixed-point classes rather than merely repelled from harmful ones. The missing mathematical object is not only the attractor, but the learned fixed-point field that makes those attractors possible. 
Fixed Point Field: Fixed-point theory begins with the residual vector field, not the point itself: a fixed point is where the residual vanishes, which is why neural networks can be naturally viewed as learned residual fields whose trajectories are selected during inference. Features and hidden space are products of inference: they appear as boundary-conditioned deformations of the learned fixed-point field, not as isolated static objects stored inside the model. Interpretability methods such as saliency maps and Grad-CAM can be used to visualize the model’s output under fixed boundary conditions and inference inputs, showing how the predicted deformation responds to these constraints. FYI: my mentor (1989-1999) and Deep Manifold co-author, Gen-Hua Shi (石根华), is one of the people who developed the "Theory of fixed point classes" in the 1960s.
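
A minimal sketch of the statement "a fixed point is where the residual vanishes", using a scalar toy map. The map `f(x) = cos(x)` and the helper names `residual` and `follow_residual` are assumptions introduced for illustration; this is ordinary fixed-point iteration, not the Deep Manifold formulation or a neural network.

```python
import math

def f(x):
    # Toy map whose fixed point we look for (an assumption for illustration).
    return math.cos(x)

def residual(x):
    # Residual field r(x) = f(x) - x; it is zero exactly at a fixed point of f.
    return f(x) - x

def follow_residual(x0, tol=1e-12, max_iter=1000):
    # Follow the residual from x0: x <- x + r(x), which is the same as x <- f(x).
    x = x0
    for _ in range(max_iter):
        r = residual(x)
        if abs(r) < tol:
            break
        x = x + r
    return x

# Two different starting points end at the same fixed point (about 0.739).
x_star = follow_residual(1.0)
x_other = follow_residual(-2.0)
print(x_star, residual(x_star), x_other)
```

The starting point plays the role of a boundary condition here: it selects the trajectory, while the residual field determines where trajectories can end.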

deep Manifold@BetaTomorrow·3d

@hxiao @YiMaTweets Unfortunately, compression does not exist; the deformation of the fixed point field can be mistaken for compression.

Han Xiao ✈️ ICML 2026@hxiao·3d

I didn't know Prof @YiMaTweets is so based and energetic! Great thoughtpieces backed by classic learning theory. His points: Knowledge ≠ intelligence. Intelligence = compression + a self-correcting loop. And it should be derivable from first principles as a white box, not brute-forced into a black box with raw compute.

deep Manifold@BetaTomorrow·3d

@iScienceLuvr @joyjiao12 ** An Expert, They Say ** x.com/BetaTomorrow/s…

deep Manifold@BetaTomorrow

x.com/i/article/2071…

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·3d

OpenAI's @joyjiao12 highlights why AI for science is often so hard at the ICML GenBio workshop:

deep Manifold@BetaTomorrow·3d

Vectors --> **vector field**. Fixed Point Field: Fixed-point theory begins with the residual vector field, not the point itself: a fixed point is where the residual vanishes, which is why neural networks can be naturally viewed as learned residual fields whose trajectories are selected during inference. x.com/BetaTomorrow/s…

deep Manifold@BetaTomorrow

x.com/i/article/2070…

Alex Lieberman@businessbarista·3d

This is one of the best breakdowns on the fundamentals of LLMs I've ever read. Anytime someone asks me for resources to climb the steep AI learning curve, I always provide the same list. 1) @3blue1brown's neural network videos 2) @karpathy's zero to hero playlist 3) @dwarkesh_sp's whiteboard explainers Now @_raghavdixit_'s "Vectors are all you need" and future articles in the explainer series are getting added to the list.

Raghav Dixit@_raghavdixit_

x.com/i/article/2074…
