snow

356 posts

@lstmfpga

Collecting good ML papers. llm, self attention, transformers, flow matching, geometry.

Joined September 2015

312 Following · 41 Followers

snow@lstmfpga·3h

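@Konekoutena Japan can't even build an AI robot that can walk. The temple robot in the Japanese news was quietly bought from Unitree; the basketball robot was quietly bought from UBTECH (優必選). They are all stand-ins: Chinese robots passed off as their own.

Chinese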

236

枫糖小猫@Konekoutena·7h

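Japan's AI is the worst in the world; at present fewer than a third of people nationwide have ever used AI. How does Japan dare bring up AI? It is precisely Japan's biggest weakness. AI today is led by two powers, China and the US, followed by the UK and South Korea. Even India is a major AI country. As AI spreads, Japan's gap will only widen, and it will fall behind even India.

油汗@slB2HRp4ZwbLlSs

@Konekoutena Also, even if the number of foreign workers falls, the path Japan takes will not be "turning Japanese people into slave labor." Japan, which already has advanced technology, will simply accelerate "thorough labor-saving and automation investment" through AI, robots, and DX. Besides, Japan's rice self-sufficiency rate is nearly 100%, so the claim that people will starve and the country will become a developing nation is too big a leap.

Chinese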

7.4K

snow@lstmfpga·3h

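@Clapton_Free Some Japanese companies even require you to log every hour of your workday in an Excel sheet. If they find even one or two hours with no specific task recorded, they give you a hard time. Pretty twisted, right?

Chinese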

1.7K

Genpo Liu@Clapton_Free·6h

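Because Japan does not treat people as people, and does not treat its employees as employees. In countries other than Japan, employees can sit while working, and offices have water dispensers and plants. Better ones also have snacks, a gym, and game consoles. Free time can be spent as you like, unlike in Japan, where companies fear idle hands: there is always endless work, and even when there is none you must put on a show of working hard.

せし@xiaobaijing1

In Japan there has long been a mysterious rule that "it is rude for staff to sit while the customer is standing." When I once worked at a hotel in Shenzhen, I normally sat at the desk in the lobby but stood up whenever a guest spoke to me. A staff member who saw this asked, "Why do you stand up? It's scary," and that was the first time I realized it was odd.

Chinese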

513

47.8K

snow@lstmfpga·6h

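@Konekoutena Japanese people spend a great deal of time sorting garbage, yet 80% of it ends up incinerated anyway, which shows how much time and energy is wasted.

Chinese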

280

枫糖小猫@Konekoutena·14h

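Let me use an analogy to explain the difference between Japan and a normal country. Suppose one day the government orders everyone to bark like a dog three times before leaving home or speaking to anyone. A normal country would first question how unreasonable the order is. Japan would not only comply unconditionally; all sorts of people would come up with reasonable-sounding justifications for barking, treat it as a symbol of civilization and order, and even report and ostracize those who don't bark, exposing them online and accusing them of causing trouble for others. Over time every Japanese person would be on edge and start studying the most polite way to bark. The vast majority of those ルール (rules) in Japanese society are as absurd and meaningless as barking like a dog, yet they are still strictly enforced to this day.

Chinese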

3.8K

snow retweeted

AI Dance@AI_Whisper_X·19h

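The Bitter Lesson, part two: given enough compute, the best data filter is no filter.

My biggest takeaway from this paper: is Rich Sutton's bitter lesson now reaching the data side? Hashimoto at Stanford released "A Bitter Lesson for Data Filtering." The core conclusion in one sentence: given enough compute, the best data filter is no filter. Their point is that the data-cleaning pipelines the industry has spent years polishing may lose their advantage at sufficiently large scale. At least in this paper's setting, many filtering strategies that look reasonable at small compute end up losing, once scaled, to the crudest option: use the full pool.

The experimental setup is not complicated. Scale down Common Crawl and its filtered versions (lightly and heavily filtered) by the same ratio, then see which pool ultimately trains the best model as model size and training steps grow. Result: in the experiment on a 670M-token CC subset, the unfiltered full pool beat every filtered version they tested. They then scaled the pool size up by two orders of magnitude, and at least for the CC vs RefinedWeb comparison the trend held. However, their largest experiment used only 10B tokens, which is still a very small scale.

They also ran more extreme tests, injecting low-quality data into the training pool:
① build a vocabulary of 10,000 random words, then randomly sample from it to assemble documents;
② completely shuffle the word order of CC documents.
Of these, the shuffled documents were injected at up to 8 times the size of the original pool. The result: sufficiently large models are surprisingly robust to this kind of low-quality data. The most counterintuitive result is that the shuffled documents not only failed to hurt the 330M model but helped it beat the pure CC pool (except for the +800% group, which had not trained long enough).

They also fit a scaling law to predict that the full 240T-token DCLM-Pool CC pool becomes the optimal choice at as early as 1e30 FLOPs. And 1e30 is not that unimaginable: pretraining compute for today's frontier models is around 5e26 FLOPs, and some forecasts put a single training run at 1e29 FLOPs by 2030. In other words, the tipping point where "not filtering is better" may be closer than we think.

This echoes the core observation in Sutton's original essay: attempts to encode your domain knowledge into the algorithm tend, in the long run, to be beaten by simpler methods that scale gracefully with compute.

One caveat must be stated clearly: while compute is still the bottleneck, filtering still matters. More importantly, compute demand keeps growing as models grow, so we may never reach the day when compute is not the bottleneck, hhhh.

The authors also list the limits of applicability: they study standard pretraining of dense models, with no data curriculum, data weighting, or post-training; MoE, synthetic data, and late-stage data strategies may all be a different story. And from another angle, if filtering itself were perfect, we could of course filter. arxiv.org/html/2605.1940…

Chinese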

13.5K
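
The two noise-injection recipes described in the data-filtering post above (documents sampled from a 10,000-word random vocabulary, and real documents with their word order shuffled) can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function names, the six-letter lowercase "words", and whitespace tokenization are all assumptions made here.

```python
import random
import string


def make_random_vocab(size=10_000, word_len=6, seed=0):
    """Build a vocabulary of `size` distinct random lowercase strings (hypothetical word format)."""
    rng = random.Random(seed)
    vocab = set()
    while len(vocab) < size:
        vocab.add("".join(rng.choices(string.ascii_lowercase, k=word_len)))
    return sorted(vocab)  # sorted so the result is reproducible for a given seed


def random_word_document(vocab, n_words, rng):
    """Recipe 1: sample words uniformly (with replacement) from the random vocabulary."""
    return " ".join(rng.choices(vocab, k=n_words))


def shuffled_document(text, rng):
    """Recipe 2: keep a document's words but destroy their order."""
    words = text.split()  # assumption: whitespace tokenization
    rng.shuffle(words)
    return " ".join(words)
```

A shuffled document keeps the original word frequencies while losing all syntax, whereas a random-word document carries no signal from real text at all, so the two recipes probe different kinds of "low quality".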

snow@lstmfpga·1d

@savage_tw1949 The Japanese mindset: you are an inferior foreigner, so you may not drop honorifics with me, but I may drop them with you.

Chinese

誠實豆沙包@savage_tw1949·2d

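A Taiwanese friend living in Japan gets along well with a Japanese coworker, the kind of relationship where they go out to eat, drink, and have fun together outside work. Once my friend got carried away and forgot to use honorific speech (keigo), and the coworker actually said, "We're not close enough for you to drop keigo with me, are we?" Japanese keigo really is the biggest killer of personal relationships. Something that lets people oppress others as a matter of course ought to disappear.

Chinese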

47.8K

snow@lstmfpga·1d

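@Clapton_Free I often run into Japanese people who sneeze all over the place without covering their mouth or nose at all; I see it every day. So much for Japanese people being civilized and well-mannered. Nonsense.

Chinese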

109

Genpo Liu@Clapton_Free·1d

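Fun fact: Japanese people who are not wearing a mask in public do not cover their mouths when they sneeze.

Nicholas Don@Tang_Shuoyang

I see many people promoting the Japanese mindset of "not causing trouble for others," but that is not entirely true. Today I took a trip to Izu and sat on a very empty train. At least two Japanese men insisted on sitting right behind me, and soon after sitting down they began coughing and sneezing frequently, yet they did not change seats. Don't you know your coughing and sneezing is annoying?

Chinese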

1.5K

snow@lstmfpga·2d

@0xLogicrw Antigravity is garbage through and through. Especially the people in charge.

Chinese

540

思维怪怪@0xLogicrw·3d

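Varun Mohan, head of Google Antigravity (and formerly founder of Windsurf), announced that, effective immediately, the weekly Gemini model call quota on all paid subscription plans is being raised 3x again. Combined with the previous day's 3x adjustment, the baseline quota is now 9x (3 × 3) that of the original version. The company has also reset the current week's usage to zero for all paid users, aiming to give developers more headroom. However, this "more quota" announcement drew plenty of criticism. Developers in the replies pointed out that Antigravity had earlier gone through a severe quota cut (a "rug-pull"): the limits were so strict that even light users who only occasionally used the sidebar chat quickly hit them, leaving the product in a completely unusable, "suffocated" state. In essence the company is merely repairing the earlier, extremely unreasonable limits, yet it is packaging the move as a generous "free perk" for marketing.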

Varun Mohan@_mohansolo

Yesterday, we 3x’d limits on Antigravity and are seeing you build so much more. One thing we heard was people are worried about hitting their weekly limits after a couple work sessions. To give you more runway, we’re 3x’ing the weekly Gemini quotas AGAIN on all paid plans. We’ve also gone ahead and reset Gemini quotas on all paid plans. Don’t stop building!

Chinese

27.3K

snow retweeted

alphaXiv@askalphaxiv·3d

“Probabilistic Tiny Recursive Model” This paper makes Tiny Recursive Models stochastic at test time by adding Gaussian noise, running parallel rollouts, and using the existing Q head to pick the best answer. With no retraining and no task-specific tricks, its PPBench jumps from 62.6% to 91.2%, while Sudoku-Extreme jumps from 87.4% to 98.75%.

English

460

18.8K
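
The test-time recipe summarized in the alphaXiv post above (add Gaussian noise, run several rollouts, let a scoring head pick the best) can be sketched in toy form. This is only an illustration under stated assumptions: `step_fn` and `q_fn` are hypothetical stand-ins for the recursive update and the Q head, and injecting noise into the latent before every step is a choice made here, not a detail confirmed by the post.

```python
import numpy as np


def best_of_n_rollouts(step_fn, q_fn, z0, n_rollouts=8, n_steps=4, sigma=0.1, seed=0):
    """Run noisy rollouts of a recursive update and keep the one the scorer rates highest.

    step_fn: hypothetical recursive update, latent -> latent
    q_fn:    hypothetical scoring head, latent -> float (higher is better)
    """
    rng = np.random.default_rng(seed)
    best_z, best_q = None, -np.inf
    for _rollout in range(n_rollouts):
        z = np.asarray(z0, dtype=float).copy()
        for _step in range(n_steps):
            # Assumption: Gaussian noise is added to the latent before each update.
            z = step_fn(z + sigma * rng.standard_normal(z.shape))
        q = float(q_fn(z))
        if q > best_q:
            best_z, best_q = z, q
    return best_z, best_q
```

The loop runs rollouts one after another for clarity; since they are independent, a real implementation would batch them. With `sigma=0.0` every rollout is identical and the procedure reduces to the deterministic model.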

snow retweeted

Pedro@pmpcurvo·3d

Guide with examples, not rewards 🐘 Controlling what a pretrained generative model produces is still mostly a choice between three slow options: fine-tune it, attach a reward network, or search at inference. We found flow matching allows a fourth, and it costs almost nothing. In deterministic interpolants, the velocity of the flow is determined by where the trajectory is headed: the endpoint mean. Shift that mean, and the entire flow shifts with it. This turns control into a matter of reference. Change the examples that define the endpoint, and you change the direction the model follows. The examples need not be perfect. They only need to point the flow toward the attribute you want. Color, identity, style, and structure, all controllable through examples. 🧵👇

GIF

English

167

32.9K
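
The claim in Pedro's post above, that the velocity is determined by the endpoint mean, can be illustrated with a small sketch. It assumes the standard linear interpolant x_t = (1 - t) x0 + t x1 with Gaussian x0 and a finite set of reference examples as the target, for which the velocity is (E[x1 | x_t] - x_t) / (1 - t). This is a textbook closed form used for illustration, not the authors' method, and all function names are hypothetical.

```python
import numpy as np


def endpoint_mean(x, t, examples):
    """Posterior mean E[x1 | x_t = x] when x1 is uniform over `examples` and x0 ~ N(0, I)."""
    # Given x1 = e, x_t is Gaussian with mean t * e and standard deviation (1 - t).
    sq_dist = np.sum((x - t * examples) ** 2, axis=1)
    logits = -sq_dist / (2.0 * (1.0 - t) ** 2)
    weights = np.exp(logits - logits.max())  # subtract the max for numerical stability
    weights /= weights.sum()
    return weights @ examples


def velocity(x, t, examples):
    """Velocity of the linear interpolant: points from x toward the endpoint mean."""
    return (endpoint_mean(x, t, examples) - x) / (1.0 - t)


def sample(x0, examples, steps=50):
    """Euler integration from t = 0 to t = 1 (the last step is evaluated at t = 1 - 1/steps)."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(x, k * dt, examples)
    return x
```

In this toy setting, translating every reference example by some offset translates the endpoint mean at t = 0 by the same offset, so the initial direction of the flow changes without any retraining, which is the intuition the post describes.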

snow retweeted

Avi Chawla@_avichawla·3d

Karpathy's prediction about RL is coming true now! He called reward functions unreliable and argued that a single reward number is too low-dimensional to teach an agent what "good" means for complex tasks. To solve this, Agents need a knowledge-guided review as a higher-dimensional feedback channel.

Every major AI lab trains models with RL today (OpenAI, Anthropic, DeepSeek). And their key bottleneck has always been the reward functions. GRPO by DeepSeek worked well for math and code because the environment gave a binary signal. But for real agent tasks, someone still has to hand-code the scoring function. That takes days and breaks every time the pipeline changes.

RULER (implemented in OpenPipe ART, 10k stars) addresses the exact problem Karpathy identified. The reward criteria are defined in plain English, and an LLM evaluates each trajectory against that description to provide feedback for training.

I trained a Qwen3 1.4B agent that plays 2048 using GRPO with this exact workflow. In this case, the agent saw the board, picked a direction, and RULER evaluated the outcome, all from this natural language definition.

You can see the full implementation on GitHub and try it yourself. Here's the ART Repo: github.com/OpenPipe/ART (don't forget to star it ⭐ )

Just like RLHF replaced manual rankings and GRPO replaced the critic model, natural language rewards are replacing hand-coded scoring functions. RL reward engineering is now prompt engineering.

I wrote a full walkthrough covering RL for LLM agents, from RLHF to GRPO to RULER, in the article below.

Avi Chawla@_avichawla

x.com/i/article/2048…

English

183

1.6K

342.7K
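
For readers unfamiliar with the GRPO step mentioned in the post above, here is a minimal sketch of the group-relative advantage computation, assuming the commonly described formulation in which each trajectory's reward is normalized by the mean and standard deviation of its sampling group. The scores are made-up placeholders for what a judge might return, the use of the population standard deviation and the `eps` term are assumptions, and nothing here calls the ART or RULER APIs.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize a group of trajectory rewards to zero mean and (roughly) unit scale.

    rewards: scores for trajectories sampled from the same prompt or state
    eps:     small constant (an assumption) so identical rewards do not divide by zero
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; a sample std is another option
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical judge scores for four rollouts of the same game state.
advantages = group_relative_advantages([0.9, 0.2, 0.2, 0.9])
```

Because only the ranking within a group matters after normalization, a judge that scores trajectories relative to one another is enough to produce a training signal, which is why no separate critic model is needed in this setup.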

snow retweeted

Grigory Bartosh@GrigoryBartosh·4d

🚀 Excited to share my @GoogleDeepMind student researcher project: Dual-Rate Diffusion✨ ⚡ A simple construction that speeds up both regular diffusion and distilled models by interleaving a heavy context encoder with a light conditional denoiser. 🧵👇

English

190

16.7K

snow@lstmfpga·4d

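@snowboat84 Agreed, he won't last two years. People with autonomy don't like working at companies with many layers of hierarchy. He can't stand taking orders from above; he's more the type with lots of ideas who wants others to listen to him.

Chinese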

3.8K

snowboat@snowboat84·4d

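I have a feeling Andrej Karpathy won't stay at Anthropic for long. Going in now, he reports to Nick Joseph, who reports to Jared Kaplan, who reports to Dario. There are several layers in between; he isn't even a VP and can only work on a small piece further down. That is nowhere near the standing he had at OpenAI and Tesla, and the direction he's working on is basically not Anthropic's core strategic line either. At heart he is like Andrew Ng: a free-spirited type who likes running his own media presence and teaching. Now that he is stuck working down in the hierarchy, I don't think he'll last long. He is better suited to being a thought leader than an executor. Posting this to check back in two years.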

snowboat@snowboat84

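Recently Andrej Karpathy @karpathy wound down his AI education startup and joined Anthropic. Some say this is a stab in the back for OpenAI; others say his AI education startup failed. Setting the gossip aside, as an ordinary person I want to learn from the best and see what I can take from him.

First, which things about him can we not learn?

First, cultural fluency in the English-speaking world. English is not his native language. He is from Czechoslovakia, but he moved to Canada at 15 and spent all of high school and university in an English environment, so English is a language he has a cultural feel and intuition for. Those of us who only came to the US for a PhD can hardly reach that level. What we lack is not English proficiency but that kind of high-density immersion, and the deep relationships, built from adolescence, of learning alongside native English speakers. That layer cannot be made up.

Second, a top-tier academic and professional résumé. His resources in Canada were actually ordinary, but after he got to Stanford he began to get top-tier resources. He first became a co-founder of OpenAI, then joined Tesla during the years it prioritized autonomous driving most and led the FSD project. Those two titles alone could carry a career for life. That kind of background and industry timing comes by luck, not effort, and ordinary people simply cannot replicate it.

Now, what can we learn?

First, Building in Public. He started at 19. As an undergraduate he ran a YouTube channel called badmephisto with Rubik's Cube tutorials. During his PhD he hand-built ConvNetJS, a deep learning library written in pure JS that lets you watch a neural network train in the browser. After that, every year or two, he released a small project built from scratch: micrograd in 2020; nanoGPT in 2022, reproducing GPT-2 from scratch; llm.c in 2024, training LLMs in pure C; microgpt in 2026, a full GPT in 200 lines with no dependencies. He has not stopped in twenty years. Every project goes on GitHub with a blog post or video. That is the substance of Building in Public: finish something, leave a public artifact.

Second, Learning in Public. This is actually even more worth learning from, because the bar is lower, yet most people are too embarrassed to do it. He wrote a blog post called "What I learned from competing against a ConvNet on ImageNet." He labeled ImageNet images by hand himself, competed with a neural network on accuracy, and wrote up the whole process. He also wrote "A Recipe for Training Neural Networks," essentially a checklist of the pitfalls he hit while training neural networks. His YouTube series Neural Networks: Zero to Hero is the same: each video runs about two hours, with him at the computer writing code and thinking aloud, including where he gets stuck and how he debugs, with no polish and no flashy editing. What students see is not the result but a real person figuring something out.

Learning in Public also includes Teaching in Public. During his PhD he led the design of CS231n, a deep learning course that grew from 150 students in its first offering to 750 in its third, becoming one of Stanford's largest courses. More importantly, he put the entire course (slides, notes, assignments, videos) online for free.

Building in Public and Learning in Public are things anyone can do, and you can start doing them in the Chinese-speaking world right now. We talk about building a personal brand (个人IP); Andrej Karpathy is the best example of one.

As for monetizing a personal brand, don't count on making money directly from an in-public series on social media platforms. Karpathy himself did not live on YouTube ads or selling courses; his money came from real jobs, in the form of Tesla stock and OpenAI equity. Eureka Labs tried to sell AI education courses directly and in the end never really took off. The real value of a personal brand is that it gives you options. It can let you sell courses or products, but even more it gets you remembered and sought out, so that opportunities once out of reach come to you. It might be a good job offer, a co-founder, a client, or an investment. The returns may exceed your own expectations.

Chinese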

161

1.9K

644.1K

snow retweeted

Sungjin Ahn@SungjinAhn_·4d

Generating Sudoku maps. GRAM generates valid maps in fewer than 10 recursion steps. Diffusion (D3PM) takes many more steps and often leaves incorrect cells.

GIF

Sungjin Ahn@SungjinAhn_

🧠We introduce "Generative Recursive Reasoning"!

Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic: same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.

Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. Multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width: parallel trajectory sampling.

And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).

With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+

📄 Paper: arxiv.org/abs/2605.19376
🌐 Project page: ahn-ml.github.io/gram-website

w/ Junyeob Baek @JunyeobB (KAIST), Mingyu Jo @pyross0000 (KAIST), Minsu Kim @minsuuukim (KAIST & Mila), Mengye Ren @mengyer (NYU), Yoshua Bengio @Yoshua_Bengio (Mila), Sungjin Ahn @SungjinAhn_ (KAIST)

English

116

13.4K

snow retweeted

Sungjin Ahn@SungjinAhn_·4d

English

208

1.5K

178.2K

snow retweeted

Sam Sheffer@samsheffer·5d

i don’t think you understand how insane omni is

English

115

232

3.4K

286.9K

snow retweeted

Andrej Karpathy@karpathy·5d

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

English

7.9K

11.1K

148.7K

27M

snow retweeted

Unitree@UnitreeRobotics·5d

Voice‑driven, real‑time arbitrary action generation😁 Using external voice commands, G1 is directly controlled to generate a wide range of actions in real time. This video was recorded in a single take, with on‑site audio recording. Because the actions are autonomously generated by AI in real time, there may be slight latency, and the smoothness of the movements may be somewhat reduced.

English

312

624

5.9K

21.5M

snow@lstmfpga·6d

@Konekoutena Little Cat (小貓) is very observant.

Chinese

枫糖小猫@Konekoutena·6d

Reached 16,000 followers.

Japanese

2.7K

snow@lstmfpga·6d

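@ls_qu30904 In theory, the longer the prompt, the harder it is for the model to process and the worse it performs; answer quality is also inversely related to the length of the context. So it's better to avoid overly long prompts and do SFT instead.

Chinese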

245

snow retweeted

Captain Insight@CaptainInsightX·15 May

OpenAI spent billions on training infrastructure. Two Aussie brothers made AI training 30x faster ~ with $500K total. 🤯

Meet Daniel & Michael Han 🇦🇺
> Brothers from Sydney, Australia
> Daniel was an engineer at NVIDIA
> Sped up the t-SNE algorithm 2000x. Cut SVD memory in half.
> Found and fixed 20+ bugs in Meta’s Llama, Google’s Gemma, Mistral, and Phi
> Big AI labs missed bugs in their own models. He caught them.
> Started Unsloth in December 2023 with his brother Michael
> Built tools that make LLM fine-tuning 2-30x faster, with 70-90% less memory

Released it 100% open source. Free for everyone. 🚀
> 64,000+ GitHub stars
> 10 million model downloads every month
> NASA and Canva use their code
> Raised only $500K total in seed funding
> Got into Y Combinator S24
> Led by two brothers with a small team of 8 shipping code

While big labs burn billions, they made AI accessible to everyone. Absolute Legends 🐐

English

135

1.5K

63K
