handongxue @likev
19K posts

Bio: post-80s / meteorologist / not one to go along with the crowd / watches the weather / yearns for freedom / loves science, the internet, and programming: Node.js, Web, C++, Julia, Python

Luoyang · Joined April 2009
4.9K Following · 4.6K Followers

Pinned Tweet
handongxue @likev:
Bought the domain genaixue.com ("learn from AI"). For now it redirects to deepseek, as a tribute to them. From now on, when someone asks you a question you don't feel like answering, just throw them this domain and let them go learn from AI.
handongxue @likev:
@riverleaf88 @txyyss I don't think natural selection or survival says anything about intelligence; other animals like mosquitoes, flies, and mice have great survival ability too. I've always held that rational, smart, truly intelligent people are a tiny minority. Most of humanity is disappointing and has nothing to do with intelligence.
River Leaf @riverleaf88:
Brain science has revealed a small part of this: DNA essentially pre-installs various neuron connection patterns and structures in the brain, analogous to having different neural architectures such as transformers or RCNNs. But the human brain comes with no pre-installed knowledge; everything must be acquired through learning and training. That is a result of natural selection, because humans need broad adaptability to both natural and social environments: humans are the only species that has adapted to every continent on Earth. Our knowledge and skills can only be learned after birth, but our brains are extremely well suited to learning.
River Leaf @riverleaf88:
Many people are skeptical of the "world model" concept and see it as LeCun being deliberately mystifying and impractical. But if you have ever raised a child, or a pet, you know that humans and animals need only a tiny training dataset to acquire very broad intelligence, and they learn extremely fast. Even for language itself, the training set a human needs is tiny, orders of magnitude below what an LLM needs just to learn to speak. So LLMs are clearly far from the final answer.

Frank Wang 玉伯 @lifesinger:
Listened to 小珺's (@zhang_benita) podcast interview with Saining Xie (@sainingxie). What a treat. Too many impressions; here are the points that struck me most:
1. World models are far bigger than language models. Each of us carries a world model in our head: you know that holding your hand over a fire will hurt, so you don't do it. The model that keeps you from putting your hand in the fire for no reason is the world model.
2. A world model is: next_state = M(state, action). That M is the world model. M predicts the next state, not the next token. For example: hand hurts = M(hand not in fire, put hand in fire). A world model's predictive power lets any creature that has one know what to do and what not to do.
3. Looking at large language models from the world-model perspective, you see that the core of language is communication, and communication implies supervision: what gets said is usually processed. LLMs are poison; vision is unpolluted.
4. Scaling laws are the capacity to swallow data: the more data, the better the results. LLMs need scaling laws; world models may not. This is the most interesting part, and the hardest. Saining Xie is stuck on it, hoping some mystical force will one day connect the dots and bring a flash of insight. Then one could start creating living beings.
5. Non-robotics approaches may be what truly breaks robotics out of its predicament. The robotics field may be going through the Bitter Lesson that the LLM field once did. The robot stunts at the Spring Festival Gala, for instance, may just be the cats-and-dogs recognition era of CV all over again.
6. Silicon Valley is trapped in the LLM narrative. Outside Silicon Valley, people are very interested in world models. Real intelligence is still in a dark exploratory phase. Language matters, but if the history of the universe were compressed into one day, language has existed for only a few seconds.
7. People still matter: research taste, the choices made while running experiments, and so on. The Diamond Sutra can improve a person's independent thinking and research taste.
8. Impact doesn't matter. Doing things for the sake of impact is a kind of selfishness. Sharing your work so readers are inspired and go do something themselves is the real value of publishing a paper.
Saining Xie is delightful. After listening, I'm especially looking forward to 小珺's next interview, with Kaiming.
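The transition-function formulation quoted above, next_state = M(state, action), can be sketched as plain code. This toy hand-and-fire example is my own illustration of the idea, not anything from the podcast; all names (`world_model`, `choose_action`, the state strings) are made up for the sketch.

```python
def world_model(state: str, action: str) -> str:
    """Toy M: predicts the next *state*, not the next token."""
    transitions = {
        ("hand_safe", "put_hand_in_fire"): "hand_hurts",
        ("hand_hurts", "keep_hand_away"): "hand_safe",
    }
    # Unknown (state, action) pairs leave the state unchanged.
    return transitions.get((state, action), state)


def choose_action(state: str, actions: list[str]) -> str:
    """An agent with M can avoid a bad outcome without ever trying it:
    it simulates each action and skips those predicted to hurt."""
    safe = [a for a in actions if world_model(state, a) != "hand_hurts"]
    return safe[0] if safe else actions[0]
```

The point of the sketch is that the agent never has to put its hand in the fire: the prediction alone steers the choice, which is the "knowing what not to do" property the tweet describes.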
handongxue @likev:
@riverleaf88 @txyyss Haha, I think the AI way is pretty good. The human way, where every person relearns knowledge from scratch, is too slow. AI learns fast, and once trained it can be copied infinitely. Haha, but I digress.
handongxue @likev:
@riverleaf88 @txyyss "Humans and animals need only a tiny training dataset to acquire very broad intelligence." I think human DNA is essentially an LLM base model, and the small number of later samples amount to fine-tuning and RL. LLMs still have a lot of potential, and nobody seems able to say what a world model actually is yet. Human energy consumption is currently far below AI's, but I think that is a GPU hardware limitation; much better hardware may come.
River Leaf @riverleaf88:
@txyyss I'm not certain at all. I'm only saying that LLMs are far from the answer on the way to AGI. As for world models, I'm just complaining that some people won't even let others explore the idea; it's not like LeCun is spending their money.
#endif @caterpillarous:
@likev @__endif I remember Codex was originally JS too. Followers always like to imitate those ahead of them 😁 I do mind that it wasn't written in Go at the time.
handongxue retweeted

#endif @caterpillarous:
I've unlocked my account, because after quitting Google I no longer have to worry about coworkers finding my complaints. Now I'm starting a company; please, everyone, follow @__endif, growing a new account is hard. The official brag:
- I've been doing LLM coding since 2023. I led the first internal coding agent, which at I/O 2024 became Google's first officially launched coding agent
- On the LLM side I contributed from PaLM all the way to Gemini 3.0
- I've written part of a Gemini technical report, was invited to Brazil to speak about agents at the first AIWare, have published on the Google developer blog, and filed a patent for Google Jules
- I enrolled in a PhD program and dropped out, did program analysis, and reported CVEs
All my past ramblings are written up at idle.systems, because it lets me have an amusing email address: process@idle.systems. Please retweet, I'm desperately trying to grow this account.
#endif @caterpillarous:
@likev @__endif GCP is famous for being unable to make a good product. At the time everyone had already been stunned by Claude Code; two people at DeepMind cloned it, just a prototype, but GCP immediately took it over in the name of collaboration and rushed it out first. The problem is they don't do coding at all (yes, inside the company there is competition over who ships first). Later I wrote a Jules CLI in Go, but was warned its features couldn't overlap with Gemini CLI 😅
handongxue @likev:
There's progress, but the hype is overblown. Claims like "self-evolution" are everywhere, in China and abroad. At under 500B parameters, its intelligence is limited and it isn't suited to complex tasks.

Zhihu Frontier @ZhihuFrontier:
🔍 Follow Zhihu contributor toyama nao, a top large-model reviewer, to evaluate @MiniMax_AI MiniMax-M2.7's capabilities in detail! ✨

📌 Basic info: MiniMax iterates monthly in the Agent-driven model track. As a minor version upgrade, M2.7 carries its new understanding of the recent Agent boom. Its overall performance is on par with the previous generation, but key Agent capabilities are significantly upgraded. Despite tight computing-power pressure ⚡, it maintains an average speed of 65 tps. Scores and ranking are shown in Fig 1.

💪 What's improved:
1️⃣ Instruction following: M2.7 has obvious upgrades in direct/indirect instruction execution, but with reduced stability. It scores full marks in #59 long code derivation but may drop to "unusable" for medium-complexity tasks, leading to occasional misinterpretation or repeated programming corrections.
2️⃣ Context hallucination: significantly improved 🛡️, achieving perfect scores in typical information-extraction tasks with only ±1 error in word counting. However, similar/duplicate context reduces accuracy (worse than Opus in long log analysis), though its worst case is better than M2.5's.
3️⃣ Coding ability: no substantial upgrade in engineering design, but M2.7 more frequently writes SPEC.md/README.md to record project logic 📝, excelling in large-code/multi-round dialogue scenarios but struggling with difficult problems (needing retries or manual intervention).

⚠️ Shortcomings:
• Complex reasoning: hard intelligence regresses slightly; it no longer achieves perfect scores on previously solvable questions, and its floor is lower. It consumes 50%-100% more tokens (excessive enumeration) and is more likely to hit MaxToken limits, increasing complex-task costs 💰.

💬 In conclusion: in the Agent-boom era, "high-low model matching" is a consensus. MiniMax abandoned the super-model route early, focusing on the M2 series' Agent/programming strengths. M2.5 widened the gap with Zhipu by riding the Claude-alternative/OpenClaw wave, but competition is fierce; the next game-changer may be imminent 🌊.

🔗 Original article: zhihu.com/question/20176…
📝 Benchmark: zhuanlan.zhihu.com/p/201117399166…
#MiniMax #LLM #Agent #AI #Tech #Insight
Rohan Paul @rohanpaul_ai:
@Propriocetive Beautiful coverage of some super-relevant topics. I've read only a small part so far and plan to give it more dedicated time.
Logan Matthew Napolitano @Propriocetive:
I just published a 459-page book. Title: Mathematics Is All You Need

Three months ago I started looking at the hidden states of large language models through the lens of Lie algebra, the branch of mathematics that describes continuous symmetries. What I found was not what I expected.

Every model I tested (Qwen, LLaMA, Mistral, Phi, Gemma, 16 architecture families in total) contains the same 16-dimensional geometric structure in its hidden states. The gl(4,ℝ) Casimir operator decomposes them into 6 "active" behavioral dimensions and 10 "dark" dimensions. The dark dimensions are erased every single layer by normalization. The model rebuilds them every single layer from its weights. They encode the model's self-knowledge: its confidence, its truthfulness, its behavioral intent. And until now, nobody knew they were there.

Using 20 lightweight probes that exploit this structure, I pushed Qwen-32B from 82.2% to 94.4% on ARC-Challenge. No fine-tuning. No prompt engineering. No chain of thought. Pure mathematics. The probes transfer across architectures without retraining. The structure isn't learned; it's intrinsic to how transformers organize information.

I did this on a single NVIDIA RTX 3090 in my office. 190 patent applications filed. Proprioceptive AI, Inc.

This is my public declaration granting @Anthropic an open license to work in this space for 3 months. They are currently the first and only company I've extended this to. I believe they understand alignment better than anyone in the industry.

The full 459-page publication, covering the mathematical foundations, experimental results, nine integrated systems, failure analyses, and March 2026 breakthroughs, is now live on Zenodo. I welcome collaboration inquiries.

Full publication: zenodo.org/records/190801…
Logan Matthew Napolitano
Founder, Proprioceptive AI, Inc.
logan@proprioceptiveai.com
proprioceptiveai.com

Nothing in the world like this exists at all; this closes the door on alignment. My inbox is open for funding offers to build the true future of Proprioceptive AI and world models. Not a theory but a fully reproducible guide, existing products, and a true mission on alignment. @grok @elonmusk @xai @AnthropicAI
handongxue @likev:
Added URL support for:
- arxiv -> pdf
- github -> zip
- X long article -> markdown
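The URL-to-artifact mapping listed above amounts to dispatching on the host of the submitted URL. This is a hypothetical sketch of that dispatch, not the service's actual code; the function name `artifact_for` and the fallback rule are my own assumptions.

```python
from urllib.parse import urlparse


def artifact_for(url: str) -> str:
    """Return the download format implied by the URL's host."""
    host = urlparse(url).netloc.lower()
    if host.endswith("arxiv.org"):
        return "pdf"       # arxiv -> pdf
    if host.endswith("github.com"):
        return "zip"       # github -> repo zip archive
    if host in ("x.com", "twitter.com"):
        return "markdown"  # X long article -> markdown
    return "raw"           # anything else: assumed fetched as-is
```

Matching on the host with `endswith` also covers subdomains such as `www.github.com`; a real service would likely add per-host fetch logic behind each branch.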
handongxue @likev:
Vibe-coded a file upload and download service: ht-tps://t.iread.fun/code/
1. Text is uploaded as a .txt file
2. Submit a URL and the backend uploads it asynchronously
3. Resumable uploads: e.g. upload the first half of a file, then continue with the second half on the next attempt
After a successful upload you get a download link.
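Point 3 above, resumable upload, is usually implemented by asking the server how many bytes it already holds and sending only the remaining tail. This is a minimal in-memory sketch of that scheme under my own assumptions; `UploadStore`, `offset_of`, and `append` are illustrative names, not the actual t.iread.fun API.

```python
class UploadStore:
    """In-memory stand-in for the server's per-file upload state."""

    def __init__(self) -> None:
        self._files: dict[str, bytearray] = {}

    def offset_of(self, file_id: str) -> int:
        # How many bytes the server already has for this file (the resume point).
        return len(self._files.get(file_id, b""))

    def append(self, file_id: str, chunk: bytes) -> int:
        # Append a chunk at the current end; return the new offset.
        buf = self._files.setdefault(file_id, bytearray())
        buf.extend(chunk)
        return len(buf)

    def content(self, file_id: str) -> bytes:
        return bytes(self._files.get(file_id, b""))


def resumable_upload(store: UploadStore, file_id: str, data: bytes,
                     chunk_size: int = 8) -> None:
    """Upload `data`, skipping whatever the server already holds."""
    offset = store.offset_of(file_id)        # query the resume point
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        offset = store.append(file_id, chunk)
```

If a first attempt dies halfway, a second call to `resumable_upload` starts from the stored offset instead of byte zero, which is exactly the "first half, then the second half" behavior the tweet describes. A real HTTP implementation would typically carry the offset in `Content-Range` headers.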
handongxue @likev:
@vanstriendaniel nanobana 2 does not work well. Note: in 2026, many evaluations have shifted toward the much harder MMLU-Pro or agentic benchmarks like SWE-Bench, as standard MMLU performance has largely capped out for frontier models.
阿川 | AI thinking @AI_jacksaku:
Anyone who has played with openclaw 🦞 knows how bad MiniMax is 😅