Super Dario

8.2K posts

@inductionheads

the only way out is up

Joined January 2022
3.9K Following · 9.2K Followers
Pinned Tweet
Super Dario @inductionheads
1) what
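艾略特 @elliotchen100

The paper is out. It's called MSA, Memory Sparse Attention.
In one sentence: it gives large models native ultra-long memory. Not bolt-on retrieval, not brute-force window expansion; instead, "memory" is built directly into the attention mechanism and trained end to end.

Why did earlier approaches fall short?
RAG is essentially an "open-book exam." The model remembers nothing itself and relies entirely on flipping through its notes on the spot. How accurately it finds things depends on retrieval quality; how quickly depends on data volume. Once the information is scattered across dozens of documents and needs cross-document reasoning, it is lost.
Linear attention and KV caches are essentially "compressed memory." They do remember, but the more they compress the blurrier it gets, and long contexts get dropped.

MSA takes a completely different approach:
→ No compression and no bolt-ons; the model learns to "pick out the key points." The core is a scalable sparse attention architecture with linear complexity. Grow the memory 10x and the compute cost does not explode exponentially.
→ The model knows "where this memory came from and when." It uses a positional encoding called document-wise RoPE, which lets the model natively understand document boundaries and temporal order.
→ Fragmented information can be chained together for reasoning. The Memory Interleaving mechanism lets the model do multi-hop reasoning across scattered memory fragments. It does not just find one relevant record; it links the clues into a chain.

The results?
· Scaling from 16K to 100 million tokens, accuracy degrades by less than 9%
· A 4B-parameter MSA model beats top-tier 235B-class RAG systems on long-context benchmarks
· 100-million-token inference runs on just 2 A800s. This is not lab-only; it is a cost a startup can afford.

Put simply, yesterday's large models were extremely clever geniuses with the memory of a goldfish. What MSA wants to do is make them truly "remember."
We have put it on GitHub. The algorithm folks worked hard, so please drop a star to support them. 🌟👀🙏 github.com/EverMind-AI/MSA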

9 replies · 22 reposts · 451 likes · 53.4K views
Super Dario retweeted
David Sacks @DavidSacks
In December, President Trump signed an Executive Order tasking us with the development of a national framework for AI, what he called “One Rulebook.” This was in response to a growing patchwork of 50 different state regulatory regimes that threaten to stifle innovation and jeopardize America’s lead in the AI race. Today we are releasing that framework. It will help parents safeguard their children from online harm, shield communities from higher electric bills, protect our First Amendment rights from AI censorship, and ensure that all Americans benefit from this transformative technology. We look forward to working with our colleagues in Congress to turn the principles we are announcing today into legislation. whitehouse.gov/articles/2026/…
191 replies · 371 reposts · 2.4K likes · 279.6K views
Super Dario retweeted
dr. jack morris @jxmnop
people are taking this the wrong way. look at it like this:
Composer 1. DeepSeek (probably)
Composer 2. Kimi (certainly)
Composer 3. likely will be the first frontier base model pretrained from scratch for a single domain (coding)
it’s a scary thought isn’t it
Fynn @fynnso

was messing with the OpenAI base URL in Cursor and caught this
accounts/anysphere/models/kimi-k2p5-rl-0317-s515-fast
so composer 2 is just Kimi K2.5 with RL
at least rename the model ID

14 replies · 1 repost · 88 likes · 13.2K views
Super Dario retweeted
Ted Zhang @TedHZhang
.@perplexity_ai Computer is fricken incredible. It has been the easiest to use agent I've used.
10 replies · 4 reposts · 61 likes · 7.6K views
Super Dario retweeted
Dwarkesh Patel @dwarkesh_sp
The Terence Tao episode.
We begin with the absolutely ingenious and surprising way in which Kepler discovered the laws of planetary motion.
People sometimes say that AI will make especially fast progress at scientific discovery because of tight verification loops. But the story of how we discovered the shape of our solar system shows how the verification loop for correct ideas can be decades (or even millennia) long.
During this time, what we know today as the better theory can often actually make worse predictions (Copernicus's model of circular orbits around the sun was actually less accurate than Ptolemy's geocentric model). And the reason it survives this epistemic hell is some mixture of judgment and heuristics that we don’t even understand well enough to actually articulate, much less codify into an RL loop.
Hope you enjoy!
0:00:00 – Kepler was a high temperature LLM
0:11:44 – How would we know if there’s a new unifying concept within heaps of AI slop?
0:26:10 – The deductive overhang
0:30:31 – Selection bias in reported AI discoveries
0:46:43 – AI makes papers richer and broader, but not deeper
0:53:00 – If AI solves a problem, can humans get understanding out of it?
0:59:20 – We need a semi-formal language for the way that scientists actually talk to each other
1:09:48 – How Terry uses his time
1:17:05 – Human-AI hybrids will dominate math for a lot longer
Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify.
53 replies · 209 reposts · 1.7K likes · 160.3K views
Super Dario retweeted
Tali Goldsheft @TaliGoldsheft
Amazing letter by @Cornell President rejecting the resolution. Should be read by all: Dear Zora, Thank you for conveying SA Resolution 61: Calling for the Termination of Cornell University’s Partnership with the Technion – Israel Institute of Technology While Preserving Cornell Tech. I reject this resolution, which fundamentally conflicts with Cornell’s principles of academic collaboration and our core commitment to academic freedom. Cornell Tech is not a political entity. It is an academic partnership, created through shared investment by Cornell University, the Technion, and the City of New York for the benefit of the city and the state, according to a negotiated set of conditions that govern its development and the terms of its 99-year ground lease on Roosevelt Island. As one of Cornell University’s many international partnerships and collaborations, Cornell Tech deepens, enriches, and strengthens the ability of our students, faculty, and staff to pursue knowledge and advance the university’s academic mission. The Joan and Irwin Jacobs Technion-Cornell Institute, the core international partnership upon which Cornell Tech is based, is an extraordinarily valuable collaboration focusing on education and research in health tech, media tech, and urban tech, and supporting the development of new startup companies. Severing our relationship with the Technion—or with any entity affiliated with governments, institutions, or enterprises with which some of our community members disagree—as a statement of political protest, would not only hinder our research, teaching, and public engagement; it would imperil our academic principles. Our university, like all of our peer institutions, regularly faces pressure—from across the political spectrum, from within and beyond our own community—to make academic decisions according to political priorities. 
The phenomenon is not a new one: universities have grappled with such pressures from governments and societies for as long as the institution of the university has existed. When we yield to these pressures and proscribe specific collaborations or collaborators on grounds other than merit, we compromise our principles of academic freedom, undermine our own institutional excellence, and damage public trust in our work.   Moreover, this resolution inaccurately asserts that “the continued operation of Cornell Tech as a Cornell University campus does not require an ongoing partnership with the Technion-Israel Institute of Technology.” Cornell Tech, while part of Cornell, is a joint effort of the university, the Technion, and the City of New York. It is no more possible for Cornell to unilaterally terminate that effort and claim full control of the campus than it would be for the Technion or the City of New York to do the same. Finally, I am deeply troubled by the selective manner in which this resolution singles out the Technion, alone of Cornell’s many international partners, for censure. Cornell currently maintains 159 active agreements with institutions in 59 nations and regions; all of these institutions have some government affiliation, and many conduct research with military and security applications. Cornell itself has military research contracts, conducts research with potential military applications, and has relationships with companies whose products are used in military contexts. Cornell also has relationships with institutions in countries whose governments have been accused of human rights violations—as our own has been.  None of these publicly available facts are mentioned in the resolution; only our partnership with an Israeli institution is targeted for erasure. The political bias evident in this selective approach is deeply disturbing, and the resolution is incompatible with both the Student Assembly’s purpose and Cornell University’s core values. 
I reject it fully and forcefully. Sincerely,   Michael Kotlikoff President and Professor of Molecular Physiology Cornell University
Gregg Mashberg @gregg_mashberg

Cornell rejects anti-Technion BDS resolution. And tells @ZohranKMamdani not even to think about ending the Consortium: “It is no more possible for Cornell to unilaterally terminate…than it would be for…the City of New York to do the same.” assembly.cornell.edu/resolutions/st…

52 replies · 351 reposts · 2K likes · 210.2K views
Super Dario retweeted
Dylan Patel @dylan522p
Decided to dress up as a fraud for Halloween
I scared the ever loving shit out of the Ernst & Young
San Francisco, CA 🇺🇸
28 replies · 27 reposts · 739 likes · 52.7K views
Sen. Bernie Sanders @SenSanders
I spoke to Anthropic’s AI agent Claude about AI collecting massive amounts of personal data and how that information is being used to violate our privacy rights. What an AI agent says about the dangers of AI is shocking and should wake us up.
1.5K replies · 3.8K reposts · 24.1K likes · 6.1M views
Super Dario @inductionheads
The enshittification of everything started precisely when companies started using customer service as a profit center instead of a cost center
1 reply · 0 reposts · 2 likes · 226 views
Super Dario retweeted
François Fleuret @francoisfleuret
If you have an explanation of why the transformer is so successful, here is a rapid sanity check: if it works for a huge MLP ("depth!", "SGD!", "magic of ml!") it's a very insufficient explanation.
19 replies · 8 reposts · 130 likes · 16.1K views
Super Dario @inductionheads
My timeline from here on out will be people realizing that the induction head pattern can be replicated at more and more abstract scales
0 replies · 0 reposts · 7 likes · 412 views
Super Dario retweeted
Dean W. Ball @deanwball
One time on a flight I experienced brutal turbulence for nearly half an hour. Passengers were vomiting, bags falling from the overhead compartments. I heard objects shattering in the crew galley. It was the worst turbulence I’ve ever endured. As we approached the end of it, I felt the pilot push on the throttle and accelerate the plane out of the storm. It was enough to push your back firmly against the seat, as in takeoff. You could feel the plane gain altitude and speed, and you could imagine the pilot exhaling in great relief. The engines and their human masters had triumphed over disordered nature. I remember how satisfying it all felt. If I had to pick one vignette from my life to describe my politics, this would probably be it.
8 replies · 7 reposts · 212 likes · 25.1K views
Super Dario retweeted
Wei Ping @_weiping
🚀 Introducing Nemotron-Cascade 2 🚀
Just 3 months after Nemotron-Cascade 1, we’re releasing Nemotron-Cascade 2: an open 30B MoE with 3B active parameters, delivering best-in-class reasoning and strong agentic capabilities.
🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025:
• Capabilities once thought achievable only by frontier proprietary models (e.g. Gemini Deep Think) or frontier-scale open models (i.e. DeepSeek-V3.2-Speciale-671B-A37B).
• Remarkably high intelligence density with 20× fewer parameters.
🏆 Best-in-class across math, code reasoning, alignment, and instruction following:
• Outperforms the latest Qwen3.5-35B-A3B (2026-02-24) and even larger Qwen3.5-122B-A10B (2026-03-11).
🧠 Powered by Cascade RL + multi-domain on-policy distillation:
• Significantly expand Cascade RL across a much broader range of reasoning and agentic domains than Nemotron-Cascade 1, while distilling from the strongest intermediate teacher models throughout training to recover regressions and sustain gains.
🤗 Model + SFT + RL data: 👉 huggingface.co/collections/nv…
📄 Technical report: 👉 research.nvidia.com/labs/nemotron/…
32 replies · 112 reposts · 662 likes · 73.6K views
Super Dario retweeted
Dmitry Shevelenko @dmitry140
Perplexity has always focused on accurate, useful AI. Today we announced the Perplexity Health Advisory Board and health data connectors in Perplexity. We’re honored to welcome Dr. @EricTopol, Dr. @devin_mann, Dr. @WendyKChung, and @timdybvig as the first members of the board.
12 replies · 16 reposts · 201 likes · 11.3K views