deep Manifold

14.2K posts

@BetaTomorrow

mathematics Thief & Chief. "Through the window of differential equations, mathematicians see the light of the real world." (Jiang Zehan)

Seattle · Joined June 2008
861 Following · 942 Followers
Pinned Tweet
deep Manifold@BetaTomorrow·
“This Is Not How Mathematicians Are Trained”
1. "Solving the forward problem (positive time) and the inverse problem (negative time) together has always been a desire of mathematicians, but they’ve never known where to begin. Neural networks, however, tackle this problem naturally."
2. “Variables, coefficients, even coordinates are changing. Everything is in flux. This is not how mathematicians are trained. It would be impossible for mathematicians to come up with such a design.”
3. “Mathematicians tread carefully around composite functions with more than two layers, wary of the many pitfalls, yet neural networks solve them effortlessly, almost nonchalantly.”
4. “Neural networks have stacked covers, mathematically speaking, whereas the Numerical Manifold Method typically uses only 3 to 4. In contrast, neural networks stack hundreds or even thousands of such covers. I never imagined anyone would take it that far.”
That is what Gen-Hua Shi (石根华) told me after returning from a two-week vacation in early June 2024. For the rest of the story, follow the link: open.substack.com/pub/deepmanifo…
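The forward/inverse pairing Shi describes can be made concrete with a minimal numpy sketch (my own toy example, not from Shi or the linked post): the forward problem integrates y' = -k·y for a given rate k, and the inverse problem recovers the hidden k from observations by minimizing the fit error over candidate rates.

```python
import numpy as np

def forward(k, y0=1.0, t_max=2.0, n=200):
    """Forward problem: Euler-integrate y' = -k*y from y(0) = y0."""
    dt = t_max / n
    y = np.empty(n + 1)
    y[0] = y0
    for i in range(n):
        y[i + 1] = y[i] + dt * (-k * y[i])
    return y

# Synthetic "observations" generated with a hidden true rate.
true_k = 1.5
obs = forward(true_k)

# Inverse problem: scan candidate rates and keep the best fit.
candidates = np.linspace(0.1, 3.0, 291)   # step of 0.01
losses = [np.mean((forward(k) - obs) ** 2) for k in candidates]
k_hat = candidates[int(np.argmin(losses))]
print(round(k_hat, 3))   # recovers the hidden rate, ~1.5
```

A neural-network version replaces the scan with gradient descent on the same mismatch loss, which is why forward and inverse solving fold into one training loop.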
2 replies · 1 repost · 11 likes · 3.2K views
deep Manifold@BetaTomorrow·
Pure nonsense. LLMs are trained on human-written knowledge. They do not stand outside humanity and generate science from nowhere. The “system” is mainly a messenger: it compresses, remixes, and delivers human knowledge faster. That is useful, but it is not Einstein. Can we please stop jumping to conclusions at the slightest hint? It is beneath a professor's poise.
0 replies · 0 reposts · 0 likes · 10 views
Sukh Sroay@sukh_saroy·
🚨SHOCKING: Everyone is asking if AI can solve math. Nobody is asking the harder question: can it ask math?

A new paper just put frontier LLMs to the test, not on solving problems, but on generating ones worth solving: the kind professional mathematicians would actually care about. The results draw a line in the sand between what AI can do and what intelligence actually is. Here is what happened.

The researchers didn't ask models to solve competition problems. They asked them to generate original research-level mathematical questions: problems that are novel, non-trivial, and genuinely interesting to the field. Then they had real mathematicians evaluate them. Not for correctness. For interestingness.

That single word exposes everything. Correctness is checkable. A proof checker can do it. A compiler can do it. You can automate it. Interestingness cannot be automated. It requires taste. It requires knowing what the field doesn't yet know, what would surprise it, what would open new territory rather than close existing problems. This is the gap the paper is measuring.

And the gap is real. Models can generate problems that look like math research. The syntax is right. The notation is right. They sit in the right zip code of difficulty. But mathematicians reading them respond the way you respond to a painting that is technically flawless and completely forgettable. Correct. Competent. Empty.

What the paper surfaces is not a benchmark failure. It is a structural one. LLMs are trained to predict what comes next in human-generated text. Mathematical research problems that are genuinely interesting are, almost by definition, things that have not yet appeared in human-generated text. They represent gaps in the existing literature, not continuations of it. You cannot predict your way to the frontier. You have to exceed it.

The industry has spent three years celebrating every time a model clears another benchmark. Olympiad problems. Putnam exams. IMO silver medals. All of that is impressive. None of it is the same as asking: can this system advance mathematics? Solving problems that humans have already posed is not the same as knowing which problems are worth posing next. AlphaProof proved theorems. It did not choose which theorems mattered. That choice, what to work on, what would be surprising, what the field is ready for, is where human mathematical judgment lives. And it is exactly what this paper tests.

The benchmark for AI in mathematics is not the last olympiad. It is the next open problem.
11 replies · 15 reposts · 38 likes · 4.4K views
deep Manifold@BetaTomorrow·
Learning is fundamentally an inverse problem; a neural network is learnable numerical computation. Your A/B/M decomposition maps cleanly onto a geometric interpretation: System A constructs the stacked piecewise manifold, System B traverses intrinsic pathways on that manifold, and System M governs the boundary conditions that select which pathways converge. In this view, learning is not a pipeline but a coupled fixed-point system defined by geometry and constraints.

System A is best understood as manifold construction rather than representation learning. It assembles local charts, overlaps, and low-order structures that together form a discretized global geometry. What is learned is not a function in the classical sense, but a coordinate system over a high-order nonlinear data manifold.

System B then operates as pathway selection and traversal. Actions are not just decisions but trajectories across overlapping manifold regions, and rewards act as weak boundary signals that stabilize certain routes. What we call a policy is, in effect, a family of admissible intrinsic pathways that consistently reach stable endpoints.

The coupling of A and B gives rise to multiple coexisting fixed-point basins. Learning does not converge to a single solution but to a structured set of stable equilibria shaped by both data geometry and architectural constraints. Generalization emerges from the ability to move between these basins while preserving geometric consistency.

System M is most naturally interpreted as a boundary-condition generator rather than a controller. It determines how the iterated process is initialized and guided, effectively selecting which intrinsic pathways are activated. In this sense, prompts, rewards, and curricula are all instances of boundary conditions rather than objectives.

A deeper implication is that learning should be framed as an inverse problem constrained by geometry. The objective is not fully specified in advance; instead, solutions emerge as stable configurations consistent with manifold structure and boundary conditions. From this perspective, autonomous learning requires continuous co-evolution of geometry, pathways, and boundary conditions, not just improved optimization or control mechanisms. more on deepmanifold.ai
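The "multiple coexisting fixed-point basins" and "boundary conditions select which pathways converge" claims can be illustrated with a toy sketch (my own illustration, not deepmanifold.ai code): gradient descent on a double-well energy has two stable equilibria, and the initialization, playing the role of a boundary condition, decides which basin the trajectory settles into.

```python
# Double-well "energy" E(x) = (x^2 - 1)^2 with stable equilibria
# at x = -1 and x = +1 (and an unstable one at x = 0).
def grad(x):
    return 4 * x * (x ** 2 - 1)   # dE/dx

def descend(x0, lr=0.01, steps=2000):
    """Gradient descent; the initialization x0 selects the basin."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

left = descend(-0.3)    # boundary condition in the left basin -> -1
right = descend(+0.3)   # boundary condition in the right basin -> +1
print(round(left, 3), round(right, 3))
```

Two runs of the same dynamics, differing only in their boundary condition, converge to different stable solutions, which is the basin-selection picture in miniature.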
0 replies · 0 reposts · 0 likes · 14 views
Jitendra MALIK@JitendraMalikCV·
With Emmanuel Dupoux scp.net/persons/dupoux/ and Yann LeCun @ylecun, we consider a cognitive-science-inspired AI. We analyse how autonomous learning works in living organisms, and propose a roadmap for reproducing it in artificial systems. lnkd.in/eNWDmuqT
8 replies · 78 reposts · 444 likes · 59.6K views
Nagi Yan@naki2012·
There is no language as such inside a large language model; there are only relations and geometric structure. The model's output is not word prediction but the optimal continuation of a weighted geometric structure.
3 replies · 0 reposts · 6 likes · 812 views
deep Manifold@BetaTomorrow·
Really enjoyed this paper, especially the result that early domain data consistently outperforms late fine-tuning. One way to interpret this is geometric: pretraining happens before the model’s internal structure has stabilized, so domain data participates in shaping the representation itself, not just adjusting it. From a manifold perspective, later training stages operate after stationary solution sets have already organized into coupled structures (what I’d call interconnected toroidal geometry). At that point, fixed-point basins are largely formed, and learning becomes constrained to local deformation or drift along existing solution directions, rather than creating new ones. This suggests the “Finetuner’s Fallacy” is not just empirical but structural: late data is applied after the geometry is already constructed. Early data helps define the geometry; late data has to work within it. So the efficiency gap is less about optimization, and more about when the manifold itself is being formed. more on deepmanifold.ai
0 replies · 0 reposts · 0 likes · 6 views
Ari Morcos@arimorcos·
Industry tends to default to fine-tuning for domain adaptation because it seems cheaper, but only if you don't consider inference. In new work from @datologyai, we show that mixing domain-specific data early into training drives better performance and reduces inference costs.
Christina Baek@_christinabaek

Models are typically specialized to new domains by finetuning on small, high-quality datasets. We find that repeating the same dataset 10–50× starting from pretraining leads to substantially better downstream performance, in some cases outperforming larger models. 🧵

3 replies · 11 reposts · 68 likes · 5.1K views
deep Manifold@BetaTomorrow·
@_LuoFuli Great work. However, in April 2018, during the Tesla Model 3 "production hell" phase, Elon Musk tweeted: "Yes, excessive automation at Tesla was a mistake. To be precise, my mistake. Humans are underrated." Everything has its context and its own limits.
0 replies · 0 reposts · 0 likes · 142 views
Fuli Luo@_LuoFuli·
MiMo-V2-Pro & Omni & TTS is out. Our first full-stack model family built truly for the Agent era.

I call this a quiet ambush — not because we planned it, but because the shift from Chat to Agent paradigm happened so fast, even we barely believed it. Somewhere in between was a process that was thrilling, painful, and fascinating all at once.

The 1T base model started training months ago. The original goal was long-context reasoning efficiency. Hybrid Attention carries real innovation, without overreaching — and it turns out to be exactly the right foundation for the Agent era. 1M context window. MTP inference for ultra-low latency and cost. These architectural decisions weren't trendy. They were a structural advantage we built before we needed it.

What changed everything was experiencing a complex agentic scaffold — what I'd call orchestrated Context — for the first time. I was shocked on day one. I tried to convince the team to use it. That didn't work. So I gave a hard mandate: anyone on MiMo Team with fewer than 100 conversations tomorrow can quit. It worked. Once the team's imagination was ignited by what agentic systems could do, that imagination converted directly into research velocity.

People ask why we move so fast. I saw it firsthand building DeepSeek R1. My honest summary:
— Backbone and Infra research has long cycles. You need strategic conviction a year before it pays off.
— Posttrain agility is a different muscle: product intuition driving evaluation, iteration cycles compressed, paradigm shifts caught early.
— And the constant: curiosity, sharp technical instinct, decisive execution, full commitment — and something that's easy to underestimate: a genuine love for the world you're building for.

We will open-source — when the models are stable enough to deserve it. From Beijing, very late, not quite awake.
334 replies · 605 reposts · 6.9K likes · 2.3M views
MiniMax (official)@MiniMax_AI·
During the iteration process, we also realized that the model's ability to recursively evolve its harness is equally critical. Our internal harness autonomously collects feedback, builds evaluation sets for internal tasks, and based on this continuously iterates on its own architecture, skills/MCP implementation, and memory mechanisms to complete tasks better and more efficiently.
14 replies · 59 reposts · 707 likes · 141K views
MiniMax (official)@MiniMax_AI·
Introducing MiniMax-M2.7, our first model which deeply participated in its own evolution, with an 88% win-rate vs M2.5.
- Production-Ready SWE: With SOTA performance in SWE-Pro (56.22%) and Terminal Bench 2 (57.0%), M2.7 reduced intervention-to-recovery time for online incidents to 3 min on certain occasions.
- Advanced Agentic Abilities: Trained for Agent Teams and the tool search tool, with 97% skill adherence across 40+ complex skills. M2.7 is on par with Sonnet 4.6 in OpenClaw.
- Professional Workspace: SOTA in professional knowledge; supports multi-turn, high-fidelity Office file editing.
MiniMax Agent: agent.minimax.io
API: platform.minimax.io
Token Plan: platform.minimax.io/subscribe/toke…
204 replies · 416 reposts · 3.4K likes · 1.8M views
deep Manifold@BetaTomorrow·
@_albertgu An elegant integral; we view neural networks as a form of generalized calculus. more on deepmanifold.ai
0 replies · 0 reposts · 0 likes · 47 views
Albert Gu@_albertgu·
The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes. This is the first Mamba that was student led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!
36 replies · 313 reposts · 1.6K likes · 416.8K views
Sac@Saccc_c·
The information gap between Chinese and English still exists and is huge. After I published my analysis of MiroFish on March 9, it pulled in a million views and drew attention across the whole network. An English blogger published a MiroFish article on March 14 and pulled in three million views. This tells me: 1. Next time, a good article should be published in both Chinese and English versions. 2. Cross-language reposting is still one of the best ways to farm traffic today.
Sac@Saccc_c

x.com/i/article/2030…

26 replies · 29 reposts · 266 likes · 81.3K views
XYSkywalker / Having AI Predict Humans - Ripple (open-sourced)
@KathySats Hi, take a look at my simulation and prediction engine: github.com/xyskywalker/Ri… To fix some problems with OASIS (MiroFish's prediction engine), and to align better with human behavior, I went ahead and hand-built an engine from the ground up, bringing in mature CAS theory plus a "collegial panel" to correct LLM result drift. A Skill with native OpenClaw integration is also about to launch.
2 replies · 5 reposts · 25 likes · 5.6K views
Kathy.xyz@Kathydotxyz·
Used Gemini + OpenClaw + MiroFish to sandbox-simulate the situation in Iran, and the result... After I saw the MiroFish project last Sunday, I thought it was so much fun that I got this idea, so I had my 🦞 set up MiroFish, and the analysis turned out remarkably thorough 👇
31 replies · 90 reposts · 458 likes · 111K views
Alex@de1lymoon·
MiroFish: 1,000,000 AI agents are debating your future.
How it works:
- You upload a news item, report, or event
- The system builds a graph of relationships between entities
- It launches thousands of agents with different beliefs
- The agents debate, influence each other’s opinions, and form coalitions
- A map of possible scenarios emerges from the chaos
The only formula that matters after that is: EV = p · W − (1 − p) · L, where p is the frequency of the scenario in the simulation. 1,000,000 runs. It happened 3200 times. p ≈ 0.32
Alex@de1lymoon

Stop Guessing the Market. Start Running Simulations.
Most models try to predict the future. MiroFish does it differently: it simulates it. p ≈ 320 / 1000 → EV = pW − (1 − p)L
Instead of giving you one forecast, MiroFish builds a digital world from news, policy drafts, earnings reports, and market signals, then fills it with thousands of AI agents and lets them react:
> They argue
> They set up camps
> They amplify narratives
> They change their beliefs under pressure
And after hundreds or thousands of runs, you don’t get a prophecy, you get scenario frequencies.
> If one outcome happens 320 times out of 1000, then p ≈ 0.32
From there, it becomes a decision problem:
- calculate Expected Value
- compare your probability to the market’s
- use Kelly to size the bet correctly
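The EV and Kelly arithmetic in the thread can be sketched directly. A minimal Python version, with illustrative payoff numbers (win = 2, loss = 1 per unit staked are my assumption, not from the thread); note that with the thread's p ≈ 0.32 these particular payoffs give a negative edge, so Kelly says to stake nothing:

```python
def expected_value(p, win, loss):
    """EV = p*W - (1 - p)*L, per unit staked."""
    return p * win - (1 - p) * loss

def kelly_fraction(p, win, loss):
    """Kelly stake f* = (p*b - (1 - p)) / b with odds b = win/loss,
    clamped to [0, 1] so we never bet on a negative edge."""
    b = win / loss
    f = (p * b - (1 - p)) / b
    return max(0.0, min(1.0, f))

p = 320 / 1000          # scenario frequency from the simulation
win, loss = 2.0, 1.0    # assumed payoffs per unit staked
print(expected_value(p, win, loss))   # negative: 0.32*2 < 0.68*1
print(kelly_fraction(p, win, loss))   # negative edge, so stake 0.0
```

The decision loop the thread describes is exactly this: compare the simulated p to the market's implied probability, and only size a bet when the clamped Kelly fraction is positive.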

61 replies · 206 reposts · 1.4K likes · 249.4K views
0x_Miko@Mikocrypto11·
Someone just built a new bot. For every upcoming Bitcoin/crypto event, it first runs a high-fidelity swarm simulation with MiroFish, then pipes the results straight into live trading on Polymarket. In the current test phase it is already doing $12,000+/day. This time I really couldn't resist: after studying MiroFish, I wired it together with OpenClaw and Claude Opus 4.6 and built the first version of a private Polymarket bot in a single day. What this system does is very direct:
→ generate thousands of agents with memory and personalities
→ run a full GraphRAG swarm simulation of how news, ETF flows, macro data, whale activity, and market sentiment would affect Bitcoin
→ roll out thousands of possible paths specifically for Polymarket's Bitcoin contracts
→ find the mispricing between the market's crowd probability and the simulation results
→ the moment an edge appears, enter automatically through OpenClaw
I'm live-testing this bot + MiroFish simulator right now, and the first round of results is already looking strong. There is also a real-money wallet on the platform running what looks like the same playbook. Current numbers: $321k cumulative profit, $12k/day average, 100% win rate on Bitcoin markets. I'll publish my Polymarket profile and full trading history once I've scaled up further. A new meta may already be here. This time, do you think it's genuinely a next-generation edge, or just another round of AI-bot narrative?
0x_Miko@Mikocrypto11

A Chinese university student spent 10 days building MiroFish, a multi-agent prediction engine. The project shot straight up the GitHub trending list, currently at 23k+ stars, and landed a 30 million RMB investment.
This thing is fundamentally not an ordinary agent demo. It's more like a digital sandbox: throw in news, policy, and financial signals, then release thousands of AI agents with memory and behavioral logic, let them interact, argue, and evolve like a real society, and then project the outcomes.
The person behind it is Guo Hangjiang (BaiFu). According to public reports, he is a college senior; after MiroFish went viral, he received a 30 million RMB investment from Shanda Group founder Chen Tianqiao.
What can it be used for?
Trading: feed in macro news, earnings reports, and market signals, and watch how the simulated society reacts.
PR: run public sentiment first, to see whether a statement will backfire before you publish it.
Creative experiments: you can even feed in a novel's setting and role-play the characters to see how the story develops.
Even better, the project ships with Docker deployment. With an LLM API key, it runs in minutes.
Many people are still guessing the market by hand. Some have already started building AI swarms that run the market's reaction in a digital world first, and only then decide how to bet real money. Do you think this "simulate society first, then trade the outcome" play is the real edge of next-generation prediction markets?

10 replies · 40 reposts · 208 likes · 40.9K views