Shom

696 posts

@ShomLinEd

language model | sequence modeling | education | HCI

Joined September 2021

2.3K Following · 379 Followers

Shom reposted

Yu Zhang 🐙🌘 @yzhang_cs · Jul 17

@mark_k but quite the opposite, i'd say kimi is every bit as much a true believer in scaling laws as A\, arguably more so than oai

Shom reposted

Zeyuan Allen-Zhu, Sc.D. @ZeyuanAllenZhu · Jul 18

Congrats, @Kimi_Moonshot! 『 Kimi’s Four Commandments』have circulated in the Chinese AI community for months. Many people add their own fifth for comic relief, but the original four were the core. Here's an English translation --- since apparently none can be taken for granted.

Xinyu Yang @Xinyu2ML

Why can Kimi ship K3? Let me tell my story.

Earlier this year, I left academia for industry. I talked to a lot of companies along the way. Here's what I saw:
1⃣ Arrogance. They believe the AI war is over, and they won. No hunger for the future, and no hunger for talent.
2⃣ Restlessness. Young labs short on foundation, either rushing to catch the frontier or pivoting away from the competition.
3⃣ Fear. Strong teams with real experience, but from the second tier, they can't quite bring themselves to aim for #1.
4⃣ Misalignment. Everyone is optimizing for their own credit, but nobody really cares whether the company can reach AGI.

Kimi was different. Over many conversations with the founders, the same thing came through every time: a raw, genuine hunger for AGI. I joined. The hunger was real. We shipped K3. This is only the beginning.

Shom reposted

Martin Monperrus @martinmonperrus · Jul 13

We found that coding agents know if the code they write will pass tests up to 25 steps before they write it. They are actively planning the fix in the latent space. Latent Programming Horizons in Coding Agents arxiv.org/pdf/2607.05188

Shom reposted

Hangliang Ding @_foreverpiano · Jul 2

Everyone's building harnesses that run for hours now — auto-research loops, /goal agents, overnight coding runs. But here's the thing: a harness only keeps going if the environment keeps teaching it something. That's the real bottleneck, not just the scaffolding. This is why we built EdgeBench on real-world tasks — gravitational waves, EDA design, formal math. Real tasks have deep feedback structure; toy tasks run dry in 30 minutes. 134 tasks, up to ~72h. And the learning follows a clean log-sigmoid. 👇

Deyao Zhu @tikgiau

Introducing EdgeBench, a benchmark designed to study how agents learn from environments over runs of 12 to 72 hours. We find that performance follows a log-sigmoid function of environment interaction time with high precision.

EdgeBench is built with three ingredients:
- 🌍 Real & Diverse: 134 real-world tasks across 6 task categories, spanning scientific problems, professional knowledge work, software engineering, optimization, formal math, and games.
- ⏳ Ultra-Long-Horizon: Each task supports 12–72 hours of agent work. Recorded human effort averages 57.2 hours.
- 🔁 Informative Feedback: Agents receive real-world feedback for continuous improvement.

After 38,000 hours of agent runs on EdgeBench, a scaling law for learning from environments emerges:
- 📈 As agents interact with task environments over time, their aggregate performance is precisely fit by a log-sigmoid function.
- 🧠 This phenomenon can be explained by an elegant theory of graph exploration.

We are releasing an initial 51 of the 134 tasks, together with the full evaluation framework, to help advance long-horizon agent research. Check our blog & paper for more findings!
Blog: edge-bench.org
Paper: edge-bench.org/paper.pdf
GitHub: github.com/ByteDance-Seed…
Dataset: huggingface.co/datasets/ByteD…
Details below 👇🧵
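The posts don't spell out the functional form. A minimal sketch, assuming "log-sigmoid" here means a logistic curve in log-time, P(t) = p_max · σ(k(ln t − ln t0)), equivalently p_max / (1 + (t0/t)^k); the names p_max, t0, and k are hypothetical parameters, not EdgeBench's. With p_max held fixed, the logit transform turns the curve into a straight line in ln t, so ordinary least squares recovers k and t0.

```python
import math


def log_sigmoid_curve(t: float, p_max: float, t0: float, k: float) -> float:
    """Logistic curve in log-time: p_max * sigmoid(k * (ln t - ln t0))."""
    z = k * (math.log(t) - math.log(t0))
    return p_max / (1.0 + math.exp(-z))


def fit_k_t0(times, scores, p_max):
    """Estimate k and t0 with p_max held fixed.

    Since logit(p / p_max) = ln(p / (p_max - p)) = k * ln t - k * ln t0,
    a least-squares line through (ln t, logit) gives slope k and
    intercept -k * ln t0. Every score must satisfy 0 < p < p_max.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(p / (p_max - p)) for p in scores]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    intercept = mean_y - k * mean_x
    t0 = math.exp(-intercept / k)
    return k, t0


# Synthetic check: sample the curve, then recover its parameters.
hours = [1, 2, 4, 8, 16, 32, 64]
scores = [log_sigmoid_curve(h, p_max=1.0, t0=10.0, k=1.5) for h in hours]
k_hat, t0_hat = fit_k_t0(hours, scores, p_max=1.0)
```

Fixing p_max keeps the fit linear; fitting all three parameters at once would need a nonlinear optimizer, which this sketch leaves out.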

Shom reposted

Zekun Wang @kugwzk1 · Jun 30

@TianhangZhuzth I noticed that the Qwen2.5/3 and Qwen2.5-Math technical reports appear on your Google Scholar, but I could not find your name in the author lists. Could you clarify your role in these works?

Shom @ShomLinEd · Jun 29

@bigeagle_xd you need a uniform mixture

熊师傅 weight decay 了吗 @bigeagle_xd · Jun 29

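Today I finally started learning the toe edge. Before that, I warmed up on the heel edge for quite a while. Once I finished the toe-edge practice, the coach had me go back to the heel edge, and I couldn't do even the basic moves I had been fluent in before... Is this the forgetting effect of multi-stage SFT?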

熊师傅 weight decay 了吗 @bigeagle_xd

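Recently I've been learning to snowboard on a ski simulator. The coach uses words to explain how to move in each situation, but what he means by "press," "twist," and "release pressure" is clearly not what I understand by them. In short, I found it very hard to actively control the board, and it was exhausting. So I simply gave up and let the board random-walk: I only kept myself from falling, while memorizing as much as I could about the board's motion and my own body state at each moment. Once I had memorized enough, whenever I wanted to actively control the board, I searched my memory for a matching scenario; if I found one, I tried my best to replay it. If I happened to control the board successfully, that became a positive sample; if not, I gave up control and let it keep moving randomly. I found this way of learning very effortless and highly efficient. Thinking about it, the pattern is a lot like first learning the prior distribution by random sampling, then using reinforcement learning to raise the probability of the target distribution. Sure enough, pretrain + RL is still more efficient. On the other hand, the coach has the curse of knowledge: it's really hard for him to remember what it feels like to "not know how to ski at all," so he gave me a lot of SFT data I couldn't learn from 🥲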

Shom @ShomLinEd · Jun 24

@pmddomingos just apply 1000 modifications and call it by another name

Pedro Domingos @pmddomingos · Jun 23

Imagine if Google had an enforceable patent on transformers.

Shom @ShomLinEd · Jun 14

@QuixiAI @NexEcosystem @SemiAnalysis_ The credit was only added 23 minutes ago, after others pointed out the merge.

Eric Hartford @QuixiAI · Jun 14

@NexEcosystem @SemiAnalysis_ To be precise, it's a merge of Nex-N2-Pro with Qwen3.5-397b (which imo would degrade vs just basing from Nex-N2-Pro) with some On-Policy Distillation on top. They credit Nex-N2-Pro in their model card, nothing sneaky. The switch from Apache 2.0 to MIT is odd.

SemiAnalysis @SemiAnalysis_ · Jun 13

SITUATION DETECTED: The city of Rio de Janeiro has post-trained a model. Based on Qwen 7/2, Rio 3.5 Open 397B adds SwiReasoning on top of the base Qwen model — a framework that dynamically switches between standard chain-of-thought and latent-space reasoning, guided by entropy-based confidence signals, so the model only "thinks out loud" when it needs to and otherwise reasons silently in hidden space for better token efficiency.

Shom @ShomLinEd · Jun 14

@Hesamation Maybe less impressive considering the "development" is simply merging the Nex posttrained version with the original one: github.com/nex-agi/Nex-N2…

ℏεsam @Hesamation · Jun 14

Sir, they’re not pausing AI research. Rio de Janeiro's mayor just dropped a SOTA open source model and it’s outperforming Qwen 3.7.

𝗭𝗲𝗻 𝗠𝗮𝗴𝗻𝗲𝘁𝘀 @ZenMagnets

Alibaba Qwen3.7 slowly fading into irrelevance at the frontier due to its proprietary stance. In its place we have Minimax M3 and... *checks notes* Rio 3.5 397b, made by the municipal IT company of Rio de Janeiro's city government. huggingface.co/prefeitura-rio…

Shom reposted

Tiezhen WANG @Xianbao_QIAN · Jun 14

wait... what??? github.com/nex-agi/Nex-N2…

Shom reposted

Nex @NexEcosystem · Jun 14

The Rio 3.5 model broke the internet this week. The plot twist? It’s essentially our open-source model, Nex N2 Pro, wearing a different hat. 🤯 We analyzed the weights, and the recipe is exact: Rio 3.5 ≈ 0.6 * Nex N2 Pro + 0.4 * Qwen 3.5 It even literally introduces itself as "Nex N2 Pro" if you ask it without initial system prompt! 😂 We are flattered that the City of Rio used our work to achieve SOTA performance. Thanks for the ultimate benchmark validation. 🤝 But in the open-source world, attribution matters. 👇 Full mathematical proof & verify script in the first reply!
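Nex's actual verify script lives in their reply thread and is not reproduced here. A minimal sketch of the idea, assuming a plain linear merge M = αA + (1 − α)B applied tensor by tensor: rearranging gives M − B = α(A − B), so α has a closed-form least-squares estimate. The function names are hypothetical, weights are shown as flat Python lists to keep the example self-contained, and any post-merge training (Eric Hartford's post mentions on-policy distillation) would make the fit approximate rather than exact.

```python
import random


def linear_merge(a, b, alpha):
    """Merge two flattened weight tensors as alpha * a + (1 - alpha) * b."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]


def estimate_alpha(merged, a, b):
    """Least-squares alpha for merged ≈ alpha * a + (1 - alpha) * b.

    Rearranging gives merged - b ≈ alpha * (a - b), so
    alpha = <merged - b, a - b> / ||a - b||^2.
    """
    diff = [x - y for x, y in zip(a, b)]
    resid = [m - y for m, y in zip(merged, b)]
    num = sum(r * d for r, d in zip(resid, diff))
    den = sum(d * d for d in diff)
    return num / den


# Synthetic tensors standing in for two checkpoints' weights.
rng = random.Random(0)
model_a = [rng.gauss(0.0, 1.0) for _ in range(1000)]
model_b = [rng.gauss(0.0, 1.0) for _ in range(1000)]
exact = linear_merge(model_a, model_b, 0.6)
# A small perturbation, standing in for light training after the merge.
noisy = [w + rng.gauss(0.0, 0.01) for w in exact]
```

On a real checkpoint one would run the estimate per tensor and check that the values cluster around one α; a pure merge gives the same α everywhere.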

Shom @ShomLinEd · Jun 8

@francoisfleuret my first thought was that mixing attention and RNNs would do the trick, but then it occurred to me that in hybrid transformers, RNNs are often reduced to simple local mixers

François Fleuret @francoisfleuret · Jun 8

Hot take: Transformers are all-seeing ultrafast librarians. They have a very low incentive to extract and organize information, they can just "look around" to see correlating fragments. RNNs done properly would have far stronger "conceptual embeddings" and would actually think.

Shom reposted

Dawning Road @TheDawningRoad · Jun 5

Introducing Nex-N2 — true Agentic Thinking, built with @NexEcosystem 🚀

Thinking is now standard in foundation models, but it sits in an awkward position in Agent tasks: either the performance gains aren't significant, or it's verbose, and switching scenarios means readapting all over again. The root cause is that thinking from the o1/R1 era was built around RLVR for math and code tasks, not for long-horizon Agent tasks—there's a layer of separation between thinking and action.

Nex-N2 introduces a complete Agentic Thinking framework, split into two parts: Adaptive Thinking and Coherent Thinking. The former achieves adaptive reasoning intensity, improving speed (which really matters in long-horizon tasks spanning hundreds of steps) and saving unnecessary token expenditure. The latter unifies thinking patterns across different tasks, making actions more stable, consistent, and robust.
- Adaptive Thinking: auto-scales reasoning depth per step. Saves ~20% tokens, zero performance loss.
- Coherent Thinking: one thinking paradigm across search, coding, and tool use. No more fragile mode-switching.

On coding and Agent tasks, Nex-N2 ranks in the top tier of open-source models. The model is fully open-sourced and available simultaneously on Hugging Face, ModelScope, and SiliconFlow. We welcome everyone to try it out.
Official website: nex-agi.cn
Huggingface: huggingface.co/collections/ne…

Shom @ShomLinEd · May 28

@_m0se_ It seems that in hybrid models, linear and full attention layers take on different roles: full attention captures long-term dependencies more easily, leaving linear attention to focus on local mixing.

OpenMOSE @_m0se_ · May 27

While analyzing the GDN layers in Qwen3.5, I found they were effectively retaining only about 1,000 tokens of memory.
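A toy calculation showing how a gated linear-attention layer can end up with a short effective memory; it is not an analysis of Qwen3.5's actual gates. In Gated DeltaNet-style recurrences the state is multiplied by a decay gate α ∈ (0, 1) at every step (the delta-rule erasure term is ignored here). With a constant α, a token written n steps ago keeps α^n of its original weight, so its half-life is ln 0.5 / ln α steps: about 693 tokens for α = 0.999.

```python
import math


def retained_weight(gates):
    """Fraction of one token's write that survives a sequence of decay gates.

    Models only the scalar decay; the delta-rule erasure term is ignored.
    """
    weight = 1.0
    for g in gates:
        weight *= g
    return weight


def half_life(alpha):
    """Steps until a constant gate alpha halves a stored write."""
    return math.log(0.5) / math.log(alpha)


for alpha in (0.99, 0.999, 0.9999):
    print(f"alpha={alpha}: half-life of about {half_life(alpha):.0f} tokens")
```

Because the horizon grows like 1 / (1 − α), small changes in gate values near 1 move the effective memory by orders of magnitude.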

Shom @ShomLinEd · May 25

@can some of SQLite's tests are public tho

can @can · May 25

in the light of this automated bun rewrite, SQLite’s open-source core, closed-source tests policy feels prescient but not exactly sure how

Shom @ShomLinEd · May 24

@jarredsumner Is this fuzzer a library or made by you?

Jarred Sumner @jarredsumner · May 24

There is now a fuzzer running 24/7 for every language parser in Bun, ranging from .npmrc files and .patch files to shell scripts to jsonc & typescript & css. Once it minimizes a repro, it sends to Claude to fix and then I review.
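The post doesn't say how Bun's fuzzer minimizes repros. A minimal sketch of one standard approach, greedy chunk deletion in the spirit of delta debugging: keep deleting pieces of the input while a caller-supplied predicate (the hypothetical `still_fails`) still reports the failure, and halve the chunk size when nothing more can be removed. The result is 1-minimal: deleting any single remaining character makes the failure disappear.

```python
from typing import Callable


def minimize(data: str, still_fails: Callable[[str], bool]) -> str:
    """Shrink a failing input by greedy chunk deletion (simplified delta debugging).

    Returns an input that still fails but from which no single character
    can be removed without losing the failure.
    """
    if not still_fails(data):
        raise ValueError("the starting input must reproduce the failure")
    chunk = max(1, len(data) // 2)
    while chunk >= 1:
        shrunk = False
        i = 0
        while i < len(data):
            candidate = data[:i] + data[i + chunk:]
            if still_fails(candidate):
                data = candidate  # keep the smaller repro, retry at the same offset
                shrunk = True
            else:
                i += chunk  # this chunk is needed; move past it
        if not shrunk:
            chunk //= 2  # no progress at this size: try finer deletions
    return data


def crashes(source: str) -> bool:
    """Hypothetical bug: the parser crashes whenever "(" and "]" both appear."""
    return "(" in source and "]" in source


repro = minimize("let x = (a + b];", crashes)
```

Here the 16-character input shrinks to the two characters that actually trigger the hypothetical bug, which is the kind of repro that is cheap to hand off for a fix and review.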

Jarred Sumner @jarredsumner · May 24

[media-only post]

Shom @ShomLinEd · May 14

@boshen_c it's from Zig, presumably

Boshen @boshen_c · May 14

So many allocators! We only have 1 in Oxc 😂

Boshen @boshen_c · May 14

Thanks to the Rust rewrite, I'm now learning why Bun is fast. First find: it uses a thread-local arena for ASTs
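The pattern can be sketched in Python, with the caveat that Python gives no manual control over memory, so this only mimics its shape: each thread owns one arena, AST nodes are appended and referred to by integer index instead of individual heap pointers, and the whole tree is released at once. The `Arena` and `Node` names are hypothetical and are not Bun's or Oxc's types.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    children: list[int] = field(default_factory=list)  # indices into the same arena


class Arena:
    """Bump-style arena: nodes are appended, referenced by index, freed together."""

    def __init__(self) -> None:
        self.nodes: list[Node] = []

    def alloc(self, kind: str, children: list[int] | None = None) -> int:
        self.nodes.append(Node(kind, list(children or [])))
        return len(self.nodes) - 1  # the index acts as the node's "pointer"

    def reset(self) -> None:
        self.nodes.clear()  # release every node of the tree in one step


_local = threading.local()


def current_arena() -> Arena:
    """Return this thread's arena, creating it on first use; no locking needed."""
    arena = getattr(_local, "arena", None)
    if arena is None:
        arena = Arena()
        _local.arena = arena
    return arena


# Build the AST for "1 + 2" in the calling thread's arena.
arena = current_arena()
one = arena.alloc("num")
two = arena.alloc("num")
root = arena.alloc("add", [one, two])
```

Keeping the arena thread-local means parser threads never contend on a shared allocator, and freeing a whole AST is a single `reset()` rather than a walk over every node.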

Shom reposted

Kaichao You @KaichaoYou · May 8

This is growth-hacking dressed up in open-source language, @radixark please stop doing it immediately. Paying people in platform credits to star a GitHub repo and repost a marketing tweet isn't "fueling the community" — it's laundering paid promotion through the trust signals open source depends on. Stars are supposed to mean someone found a project useful. Attach a $200 bounty and the number means nothing. GitHub's own policies prohibit this for exactly that reason.

RadixArk @radixark

$200 FREE CREDIT! We just launched our inference platform for beta testing, and we're giving it to the community first.
⭐ Star SGLang on GitHub (github.com/sgl-project/sg…) + repost this to claim your credits.
→ Limited spots, first come first serve
→ Deadline: May 13, 2025 (AoE)
Every star, every issue filed, every PR reviewed, every question answered in Slack — You built this with us. Thank you for believing in open-source AI infrastructure, in our mission, and in us.
Claim your credits: platform.radixark.com

Shom @ShomLinEd · May 6

@zephyr_z9 full attention also has linear scaling...

Zephyr @zephyr_z9 · May 5

Ok, so it's a linear attention variant

Shom reposted

Keller Jordan @kellerjordan0 · May 1

New modded-NanoGPT optimization benchmark result: @wen_kaiyue has improved upon both the Muon and AdamW baselines, by replacing their weight decay with hyperball optimization. The new record is 3325 steps.
