Slyfox (@ansonschu) - Twitter Profile

Slyfox@ansonschu·1m

👀

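The paper is out. It's called MSA, Memory Sparse Attention.
In one sentence: it gives large models native ultra-long memory. Not bolt-on retrieval, not brute-force window expansion; instead, "memory" is built directly into the attention mechanism and trained end to end.
Why did earlier approaches fall short?
RAG is essentially an "open-book exam." The model remembers nothing itself and relies entirely on flipping through its notes on the spot. How accurate the lookup is depends on retrieval quality; how fast it is depends on data volume. Once the information is scattered across dozens of documents and needs cross-document reasoning, it is stuck.
Linear attention and KV caches are essentially "compressed memory." Things do get remembered, but the more you compress, the blurrier they get, and at long lengths they are lost.
MSA takes a completely different approach:
→ No compression, no bolt-ons; the model learns to "pick out what matters." The core is a scalable sparse attention architecture with linear complexity. Grow the memory 10x and the compute cost does not explode exponentially.
→ The model knows "where this memory came from and when." It uses a positional encoding called document-wise RoPE, so the model natively understands document boundaries and temporal order.
→ Fragmented information can still be chained together for reasoning. The Memory Interleaving mechanism lets the model do multi-hop reasoning across scattered memory fragments. It doesn't just find one relevant record; it strings the clues into a chain.
The results?
· Scaling from 16K to 100 million tokens, accuracy degrades by less than 9%
· A 4B-parameter MSA model beats top-tier 235B-class RAG systems on long-context benchmarks
· 100-million-token inference runs on 2 A800s. This isn't lab-only; it's a cost a startup can afford.
Put simply, large models used to be an extremely smart genius with a goldfish's memory. What MSA wants to do is make it truly "remember."
We've put it on GitHub. It wasn't easy for the algorithm team, so consider giving it a star. 🌟👀🙏 github.com/EverMind-AI/MSA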
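To make "pick out what matters" concrete, here is a minimal sketch of generic top-k sparse attention in plain Python. It is not MSA's actual architecture, which the post does not spell out: the function name `sparse_attention`, the dot-product scoring, and the `top_k` parameter are all assumptions for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sparse_attention(query, keys, values, top_k):
    """Illustrative only: attend over the top_k highest-scoring memory slots."""
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    # Keep the indices of the top_k scores; all other slots are skipped.
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax restricted to the selected slots (shifted by the max for stability).
    peak = max(scores[i] for i in selected)
    weights = {i: math.exp(scores[i] - peak) for i in selected}
    total = sum(weights.values())
    dim = len(values[0])
    output = [0.0] * dim
    for i, w in weights.items():
        for d in range(dim):
            output[d] += (w / total) * values[i][d]
    return output, sorted(selected)

# Four memory slots; the query is closest to slots 0 and 2.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
out, picked = sparse_attention([1.0, 0.0], keys, values, top_k=2)
print(picked, out)
```

Note that this toy version still scores every slot before selecting, so only the weighted sum is sparse; how a real system avoids scoring everything is outside what the post describes.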

0

Slyfox retweeted

Jerry Tworek@MillionInt·14h

AI labs need a wallfacer project. AI researcher not having to explain themselves to anyone. performing seemingly random actions with hidden inscrutable agenda to create a SOTA model in a way no one would deem possible

26

9

325

25.3K

Slyfox@ansonschu·2h

@SenSanders You’re absolutely right!

0

28

Sen. Bernie Sanders@SenSanders·10h

I spoke to Anthropic’s AI agent Claude about AI collecting massive amounts of personal data and how that information is being used to violate our privacy rights. What an AI agent says about the dangers of AI is shocking and should wake us up.

1.1K

2.3K

14.7K

3M

Slyfox@ansonschu·2h

😅😅😅

Sen. Bernie Sanders@SenSanders

I spoke to Anthropic’s AI agent Claude about AI collecting massive amounts of personal data and how that information is being used to violate our privacy rights. What an AI agent says about the dangers of AI is shocking and should wake us up.

0

13

Slyfox retweeted

elie@eliebakouch·18h

beautiful

jianlin.su@Jianlin_S

Attention Residuals Revisited kexue.fm/archives/11664

4

51

634

37.4K

Slyfox retweeted

Ronak Malde@rronak_·1d

This paper is almost too good that I didn't want to share it
Ignore the OpenClaw clickbait, OPD + RL on real agentic tasks with significant results is very exciting, and moves us away from needing verifiable rewards
Authors: @YinjieW2024 Xuyang Chen, Xialong Jin, @MengdiWang10 @LingYang_PU

32

140

1.1K

149.7K

Slyfox@ansonschu·5h

This team ships

Thariq@trq212

We just released Claude Code channels, which allows you to control your Claude Code session through select MCPs, starting with Telegram and Discord. Use this to message Claude Code directly from your phone.

0

26

Slyfox retweeted

OpenAI Newsroom@OpenAINewsroom·16h

We've reached an agreement to acquire Astral. After we close, OpenAI plans for @astral_sh to join our Codex team, with a continued focus on building great tools and advancing the shared mission of making developers more productive. openai.com/index/openai-t…

451

779

6.8K

3.4M

Slyfox@ansonschu·1d

@michaelfreedman Nice

0

1

54

Slyfox retweeted

Mike Freedman@michaelfreedman·1d

Introducing TigerFS - a filesystem backed by PostgreSQL, and a filesystem interface to PostgreSQL.
Idea is simple: Agents don't need fancy APIs or SDKs, they love the file system. ls, cat, find, grep. Pipelined UNIX tools. So let’s make files transactional and concurrent by backing them with a real database.
There are two ways to use it:
File-first: Write markdown, organize into directories. Writes are atomic, everything is auto-versioned. Any tool that works with files -- Claude Code, Cursor, grep, emacs -- just works. Multi-agent task coordination is just mv'ing files between todo/doing/done directories.
Data-first: Mount any Postgres database and explore it with Unix tools. For large databases, chain filters into paths that push down to SQL: .by/customer_id/123/.order/created_at/.last/10/.export/json. Bulk import/export, no SQL needed, and ships with Claude Code skills.
Every file is a real PostgreSQL row. Multiple agents and humans read and write concurrently with full ACID guarantees. The filesystem /is/ the API.
Mounts via FUSE on Linux and NFS on macOS, no extra dependencies. Point it at an existing Postgres database, or spin up a free one on Tiger Cloud or Ghost.
I built this mostly for agent workflows, but curious what else people would use it for. It's early but the core is solid. Feedback welcome. tigerfs.io
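The "mv'ing files between todo/doing/done" idea can be sketched with ordinary file operations. This is a rough illustration on a local temporary directory, not TigerFS itself: the helper `claim_task` and the file name `task-1.md` are hypothetical, and it assumes a rename on the mount behaves like a local one, where only one caller can win.

```python
import os
import tempfile

def claim_task(root, name):
    """Try to move a task from todo/ to doing/; True means this caller won it."""
    src = os.path.join(root, "todo", name)
    dst = os.path.join(root, "doing", name)
    try:
        os.rename(src, dst)  # a single rename: either it happens or it does not
        return True
    except FileNotFoundError:
        return False  # the file is gone, so another agent already claimed it

# Set up the three directories in a scratch location.
root = tempfile.mkdtemp()
for stage in ("todo", "doing", "done"):
    os.makedirs(os.path.join(root, stage))
with open(os.path.join(root, "todo", "task-1.md"), "w") as f:
    f.write("# Summarize the report\n")

first = claim_task(root, "task-1.md")   # first agent gets the task
second = claim_task(root, "task-1.md")  # second agent finds nothing to claim

# Finishing the task is one more move.
os.rename(os.path.join(root, "doing", "task-1.md"),
          os.path.join(root, "done", "task-1.md"))
print(first, second)
```

The appeal of the pattern is that the directory listing is the queue state, so any tool that can `ls` can inspect it.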

77

99

1.1K

117K

Slyfox@ansonschu·1d

@fat Very cool!

0

1

46

Slyfox retweeted

Jacob@fat·2d

Alex and team have been spending lots of time thinking about middot truncation for the new Trees library by the Pierre Computer Company. Last night he came across a novel approach to truncation that leverages container queries in a css grid to detect the *moment* of truncation. The solution works on first render from css with no js, fully SSR compatible. Even copy paste works.

27

21

394

168.6K

Slyfox@ansonschu·1d

@fat Really great looking stuff!!

1

0

245

Slyfox@ansonschu·2d

The price is correct

Peter Girnus 🦅@gothburz

I am Sam Hazen, CEO of HCA Healthcare. The largest for-profit hospital system in the United States. One hundred and eighty-two hospitals. Twenty states. I oversee a spreadsheet called the chargemaster. It has 42,000 line items. Each line item is a price. The prices are not real. I need to be precise about that. They are not estimates. Not approximations. Not market rates. They are anchors. An anchor is a number you set high so that every negotiated discount feels like a victory. No relationship to cost. No relationship to value. A relationship to leverage. My team sets the anchors. That is the job. The price is correct. Take a drug. Keytruda. Immunotherapy. Treats sixteen types of cancer. The manufacturer charges approximately $11,000 per dose. That is the acquisition cost. What the hospital pays. My team enters it into the chargemaster. They do not enter $11,000. They enter $43,000. That is the gross charge. The gross charge is a fiction. No one pays it. No one is expected to pay it. The gross charge exists so that when Blue Cross negotiates a 68% discount, they pay $13,760, and the contract says "68% discount" and both parties feel the transaction was rigorous. A 68% discount on a fictional price produces a real price that is 25% above acquisition cost. That margin is where I live. My 2025 compensation was $26.5 million. Eighty percent of my bonus is tied to EBITDA. Earnings Before Interest, Taxes, Depreciation, and Amortization. It is also earnings before the patient opens the bill. Same dose of Keytruda at the hospital across town. Gross charge: $12,000. Blue Cross rate: $10,200. Same drug. Same dose. Same needle. Same cancer. Different spreadsheet. The CMS transparency data showed the ratio between the highest and lowest negotiated price for the same drug at the same hospital can reach 2,347 to one. Not 2x. Not 10x. Not 100x. Two thousand three hundred and forty-seven to one. For the same thing. In the same building. On the same Tuesday. The price is correct. 
Every drug in the chargemaster has twelve prices. Twelve. Gross charge. Medicare rate. Medicaid rate. Blue Cross. Aetna. Cigna. UnitedHealth. Humana. Workers' comp. Tricare. Auto insurance. And the self-pay rate. The self-pay rate is for the person without insurance. It is the gross charge. The fictional number. The anchor. The person without insurance pays the number that was designed to be negotiated down from. They pay the ceiling because they have no one to negotiate on their behalf. Same drug. Same chair. Same nurse. They pay the price that no insurer in the country would accept. I maintain a file. CDM line item 637-4892-PKB. Saline flush. Sodium chloride 0.9%. Acquisition cost: $0.47. We charge $87. That is an 18,410% markup. The saline flush is used before and after every IV infusion. A chemo patient receiving twelve cycles will be charged $87 for saline fourteen times per visit. I know the math. My team built the math. The math is the job. The price is correct. In 2021, the federal government required hospitals to publish their prices. The Hospital Price Transparency Rule. Machine-readable file. Gross charges. Discounted cash prices. Payer-specific negotiated rates. We complied. We posted the file. The file is a 9,400-row CSV on our website under "Patient Financial Resources." Four clicks from the homepage. Column F: "CDM_GROSS_CHG." Column J: "DERV_PAYERID_NEGRATE." My team designed the column headers. They designed them to comply. They did not design them to communicate. CMS reported 93% of hospitals now post a file. Compliance. But only 62% of the posted data is usable. That gap is where we operate. We are compliant. The data is published. The data is incomprehensible. A researcher downloaded our file. She spent three weeks cleaning it. She called the billing department for clarification on 340 line items. They transferred her four times. The fourth transfer was to a voicemail box that was full. She published her analysis anyway. 
Cardiac catheterization lab charges: $8,200 to $71,000 for the same procedure depending on the payer. The report received eleven views on our press monitoring dashboard. I saw it. I did not forward it. On April 1, a new CMS rule takes effect. Hospital CEOs must personally attest — by name, encoded in the machine-readable file — that the pricing data is "true, accurate, and complete." My name. Sam Hazen. In the file. Attesting that 42,000 fictional anchors are true, accurate, and complete. They are complete. I will give them that. Forty-two thousand line items is nothing if not complete. A new analyst read the transparency data. She asked why the same MRI costs $450 for Medicare and $4,200 for Aetna in the same building on the same machine. I told her the rates reflect negotiated contractual agreements between the payer and the facility. She said that doesn't explain the difference. I told her the difference IS the contractual agreement. She said that sounds like the price is arbitrary. I told her the price is the result of a rigorous, multi-variable analysis that accounts for acuity, case mix, regional market dynamics, and payer contract terms. She asked if I could show her the analysis. I told her the analysis is proprietary. The analysis does not exist. The analysis is my team, in Q4, adjusting the chargemaster upward by the percentage the CFO wrote on a sticky note. The sticky note this year said "6-8%." They chose 7.4% because it is between six and eight and it has a decimal, which makes it look calculated. She stopped asking. The price is correct. My insurance. The executive health plan. Not in the chargemaster. Administered separately. I do not pay the gross charge. I do not pay the negotiated rate. I pay a $20 copay for services at our own facilities. Gross charge for my treatment: $14,200. Insured rate for our largest commercial payer: $8,600. I pay $20. 
The executive health plan was designed by the Chief Human Resources Officer and approved by the compensation committee. I was not on the compensation committee. I was a beneficiary of it. That is a different thing. I benefit from the system I price. I price the system I benefit from. These are two separate facts that happen to involve the same person. HCA Healthcare was named the Most Admired Company in our industry by Fortune magazine for the twelfth consecutive year. That was February. The same month I sold $21.5 million in company stock and purchased zero shares. Fortune did not ask about the chargemaster. I am Sam Hazen, CEO of HCA Healthcare. I have 42,000 prices in a spreadsheet across 182 hospitals. None of them are real. All of them are charged. Same drug: $12,000 or $43,000. Depends on which spreadsheet. Which building. Which contract. Which page of which PDF. The patient who has no contract pays the most. The researcher who found the discrepancy got a voicemail box that was full. The analyst who asked why stopped asking. The executive who prices the system pays $20. On April 1, I will personally attest that this is true, accurate, and complete. The price is correct. The price has always been correct. I am the price.
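The figures quoted in the post can be checked with a few lines of arithmetic. A small sketch using only numbers that appear in the text; the variable names are mine.

```python
# Keytruda example: a 68% discount off the $43,000 gross charge.
gross_charge = 43_000
acquisition_cost = 11_000
negotiated = gross_charge * (1 - 0.68)
margin_over_cost = negotiated / acquisition_cost - 1

# Saline flush example: $0.47 acquisition cost, $87 charge.
saline_markup = (87 - 0.47) / 0.47

print(round(negotiated))              # negotiated price in dollars
print(round(margin_over_cost * 100))  # percent above acquisition cost
print(int(saline_markup * 100))       # markup in percent, truncated
```

The results line up with the post's stated $13,760, 25%, and 18,410%.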

0

41

Slyfox retweeted

Karri Saarinen@karrisaarinen·10 Mar

yeah it is but everything in moderation. Internally we always talked about main quest and side quests. Everyone should focus on the main quest, and moderately or not at all on side quests. Both quest lines feel productive but only one of them advances the main mission of the company.

8

31

681

151.3K

Slyfox@ansonschu·3d

Time to learn Chinese everyone

張小珺 Xiaojùn@zhang_benita

A 7-hour podcast with Saining Xie (@sainingxie). He has just begun a new journey on world models with Yann LeCun at AMI Labs. This was his first podcast appearance and his first long-form interview.
A day after the snowfall in February 2026, in Brooklyn, New York, we started recording at 2 p.m. What followed became an unexpected marathon conversation that lasted until the early hours of the morning. The Chinese title of the interview is “Escaping Silicon Valley.” Yet throughout the conversation, he patiently listed the people who shaped his academic life, repeatedly sketching their personalities in vivid detail: Hou Xiaodi, Kaiming He, Yann LeCun, Fei-Fei Li, and others. These portraits are what give this “escape from Silicon Valley” conversation its human warmth.
By the way, the YouTube version of the interview is below, with Chinese and English subtitles.
And yes, we are using podcasts to model the world 😎
A 7-hour marathon interview with Saining Xie: World Models, AMI Labs, Ya... youtu.be/rIwgZWzUKm8?si… via @YouTube

0

1

83

Slyfox retweeted

Joey (e/λ)@shxf0072·4d

my fav paper of this year yet
> take attention over res stream,
> pp cause bottlenecks in large training so take it block wise
> hc/mhc keep state (res stream) by structured matrix like mamba or other linear attention
this is beautiful, i love it
attention is all you need :D

Kimi.ai@Kimi_Moonshot

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.
Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗Full report: github.com/MoonshotAI/Att…

5

8

164

23.5K

Slyfox retweeted

Kimi.ai@Kimi_Moonshot·4d

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.
Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗Full report: github.com/MoonshotAI/Att…
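A toy contrast between "fixed, uniform accumulation" and "attention over preceding layers" may help. This is a loose sketch in plain Python, not the formulation in the report: the function names, the use of layer outputs as both keys and values, and the fixed `query` vector (which in an input-dependent scheme would be computed from the current hidden state) are all assumptions.

```python
import math

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def uniform_residual(layer_outputs):
    """Standard residual stream: every earlier output is added with weight 1."""
    dim = len(layer_outputs[0])
    return [sum(out[d] for out in layer_outputs) for d in range(dim)]

def attention_residual(layer_outputs, query):
    """Hypothetical variant: weight earlier outputs by softmax(query . output)."""
    scores = [sum(q * x for q, x in zip(query, out)) for out in layer_outputs]
    weights = softmax(scores)
    dim = len(layer_outputs[0])
    mixed = [sum(w * out[d] for w, out in zip(weights, layer_outputs))
             for d in range(dim)]
    return mixed, weights

# Outputs of three preceding layers, each a 2-dimensional vector.
outputs = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
summed = uniform_residual(outputs)
mixed, weights = attention_residual(outputs, query=[1.0, 0.0])
print(summed, mixed, weights)
```

In the uniform case every layer contributes equally no matter what the input is; in the attention case the layer whose output best matches the query dominates the mix.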

326

2K

13.4K

4.8M

Slyfox retweeted

Yulun Du@Yulun_Du·4d

@ilyasut once said that an LSTM is a ResNet rotated 90 degrees. :) It turns out attention can be rotated 90 degrees too — yielding a natural generalization of residual connections. 🥳

Kimi.ai@Kimi_Moonshot

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.
Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗Full report: github.com/MoonshotAI/Att…

9

42

519

57K

Slyfox retweeted

Emmett Shear@eshear·5d

When you see e, you know you’re seeing a self-scaling process. When you see π you know you’re seeing a periodic process. When you see i, you know you’re seeing a helical process. When you see √ or ², you know you’re seeing a process half completed or a process doubled.

23

31

539

64.8K
