Tsendsuren
@TsendeeMTS
22K posts

Research scientist at Google DeepMind | previously at Microsoft Research and Postdoc at UMass. Views are my own. Most tweets in Mongolian 🇲🇳.

Bay Area · Joined January 2010
615 Following · 4.6K Followers
Tsendsuren@TsendeeMTS·
Interesting! A few years back, I ran an experiment and observed a 4x reduction without regression. In some cases it even gave a boost. But it still entailed computing that giant L×L matrix, so I dropped it.
Ashwin Gopinath@ashwingop

x.com/i/article/2040…

0 replies · 0 reposts · 0 likes · 159 views
G@GiimaaAj·
I'll come by once I've celebrated Doctors' Day 🤭
[image]
2 replies · 0 reposts · 4 likes · 356 views
Tsendsuren retweeted
Jim Musil Painter@JimMusilPainter·
My painting EASTERN SIERRA
[image]
21 replies · 108 reposts · 1.4K likes · 14.3K views
Tsendsuren retweeted
Kimi.ai@Kimi_Moonshot·
Introducing *Attention Residuals*: rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗 Full report: github.com/MoonshotAI/Att…
[image]
334 replies · 2.1K reposts · 13.5K likes · 4.9M views
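The mechanism described in the tweet above can be sketched minimally: instead of accumulating all preceding layer outputs with fixed, uniform weights (a plain residual stream), the current representation forms a query and attends over the stack of preceding layer outputs. This is my own single-head NumPy illustration under assumed names and shapes, not Moonshot's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_residual(history, w_q, w_k):
    """history: list of past layer outputs, each of shape (d,).
    Returns an input-dependent mixture of past representations,
    replacing the uniform sum of a standard residual stream."""
    H = np.stack(history)             # (n_layers, d)
    q = history[-1] @ w_q             # query from the current layer
    k = H @ w_k                       # one key per preceding layer
    scores = k @ q / np.sqrt(len(q))  # (n_layers,) similarity scores
    w = softmax(scores)               # learned, input-dependent weights
    return w @ H                      # weighted retrieval across depth

rng = np.random.default_rng(0)
d = 8
w_q, w_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
history = [rng.normal(size=d) for _ in range(4)]
mix = attention_residual(history, w_q, w_k)
print(mix.shape)  # (8,)
```

A uniform residual corresponds to fixing `w` to equal weights; letting the weights depend on the current representation is what allows selective retrieval and avoids dilution as depth grows. The "Block AttnRes" variant in the tweet would additionally attend over compressed blocks of layers rather than every individual layer.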
Tsendsuren retweeted
Michael Boegl@michaelboegl·
yosemite on 35mm film 🎞️
[4 images]
13 replies · 301 reposts · 2.5K likes · 47.7K views
Tsendsuren retweeted
Tsendsuren@TsendeeMTS·
Compression is triggered in an online manner in between tool-chaining steps, and the policy is updated on the compression outcome via the final reward - very nice!
Jonathan Frankle@jefrankle

Reusability influenced every decision we made:
* Off-policy RL for simplicity and robustness. No quality tradeoffs: arxiv.org/abs/2602.19362
* Simple, elegant test-time compute via parallel thinking to control latency
* RL-learned context compression
* Multi-task RL for modularity

0 replies · 1 repost · 3 likes · 778 views
Tsendsuren retweeted
Jonathan Frankle@jefrankle·
Meet KARL, an RL'd model for document-centric tasks at frontier quality and open source cost/speed. Great for @databricks customers and scientists (77-page tech report!) As usual, this isn't just one model - it's an RL assembly line to churn out models for us and our customers 🧵
[2 images]
9 replies · 46 reposts · 241 likes · 68.9K views
Tsendsuren retweeted
Tomek Korbak@tomekkorbak·
We have a new eval to help keep chains of thought (CoT) monitorable: CoT Controllability. It tests whether LLMs can control their CoT in ways that would help them evade CoT monitors. So far, the results leave us cautiously optimistic: today’s models struggle to obfuscate their reasoning in ways that undermine monitorability.
[image]
11 replies · 51 reposts · 423 likes · 53.1K views
Tsendsuren retweeted
Reiner Pope@reinerpope·
We’re building an LLM chip that delivers much higher throughput than any other chip while also achieving the lowest latency. We call it the MatX One.

The MatX One chip is based on a splittable systolic array, which has the energy and area efficiency that large systolic arrays are famous for, while also getting high utilization on smaller matrices with flexible shapes. The chip combines the low latency of SRAM-first designs with the long-context support of HBM. These elements, plus a fresh take on numerics, deliver higher throughput on LLMs than any announced system, while simultaneously matching the latency of SRAM-first designs. Higher throughput and lower latency give you smarter and faster models for your subscription dollar.

We’ve raised a $500M Series B to wrap up development and quickly scale manufacturing, with tapeout in under a year. The round was led by Jane Street, one of the most tech-savvy Wall Street firms, and Situational Awareness LP, whose founder @leopoldasch wrote the definitive memo on AGI. Participants include @sparkcapital, @danielgross and @natfriedman’s fund, @patrickc and @collision, @TriatomicCap, @HarpoonVentures, @karpathy, @dwarkesh_sp, and others. We’re also welcoming investors across the supply chain, including Marvell and Alchip.

@MikeGunter_ and I started MatX because we felt that the best chip for LLMs should be designed from first principles with a deep understanding of what LLMs need and how they will evolve. We are willing to give up on small-model performance, low-volume workloads, and even ease of programming to deliver on such a chip.

We’re now a 100-person team with people who think about everything from learning rate schedules, to Swing Modulo Scheduling, to guard/round/sticky bits, to blind-mated connections—all in the same building. If you’d like to help us architect, design, and deploy many generations of chips in large volume, consider joining us.
124 replies · 202 reposts · 2.2K likes · 3M views
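The utilization argument behind a "splittable" systolic array in the tweet above can be illustrated with a back-of-envelope model: a monolithic P×P array wastes most of its cells on outputs smaller than P, while splitting it into independent sub-arrays lets several small matmuls run in parallel with a tighter fit. All sizes here are made up for illustration, and the model ignores pipelining and dataflow details; it is not MatX's actual design.

```python
import math

def utilization(m, n, p):
    """Fraction of a p x p systolic array's cells doing useful work
    when producing an m x n output, processed one zero-padded
    p x p tile at a time (a deliberate simplification)."""
    tiles = math.ceil(m / p) * math.ceil(n / p)
    return (m * n) / (tiles * p * p)

# A 96x96 output on one monolithic 256x256 array, versus the same
# silicon split into independent 128x128 quadrants (sizes assumed):
print(round(utilization(96, 96, 256), 3))  # 0.141 - mostly idle cells
print(round(utilization(96, 96, 128), 3))  # 0.562 - better fit when split
```

The same silicon area, partitioned, keeps roughly four times as many cells busy on this small matrix, which is the utilization-on-flexible-shapes claim the tweet makes; the monolithic configuration would still win on a single matrix large enough to fill it.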
Tsendsuren retweeted
Itamar Zimerman@ItamarZimerman·
📜🚨 Introducing TensorLens! 🔎 Our new tool for Transformer & LLM interpretability. The problem: attention matrices are (i) a shallow view that ignores embeddings, FFNs, and values, and (ii) too numerous (one per head and layer), which quickly becomes overwhelming. 🧵 1/6
[image]
14 replies · 126 reposts · 915 likes · 50.6K views
Tsendsuren retweeted
TechHalla@techhalla·
Less than 24 hours since Google dropped Project Genie and people are already creating wild stuff! The era of vibe gaming starts. 15 insane examples 🧵👇
1. Discarded Pack of Cigarettes in the station
139 replies · 441 reposts · 4.7K likes · 1.1M views
Tsendsuren retweeted
idan shenfeld@IdanShenfeld·
People keep saying 2026 will be the year of continual learning. But there are still major technical challenges to making it a reality. Today we take the next step towards that goal — a new on-policy learning algorithm, suitable for continual learning! (1/n)
[image]
45 replies · 210 reposts · 1.4K likes · 201.6K views