Adam Zweiger (@AdamZweiger) - Twitter Profile

Pinned Tweet

Adam Zweiger@AdamZweiger·19 Feb

We introduce a new approach for fast and high-quality context compaction in latent space. Attention Matching (AM) achieves 50× compaction in seconds with little performance loss, substantially outperforming summarization and other baselines.

Adam Zweiger retweeted

Han Guo@HanGuo97·4d

LLM training is built on fast MatMuls. But many surrounding ops still run as memory-bound kernels. CODA reparameterizes them to hide in the matmul’s shadow, fused into its epilogue before results leave the chip. Bonus: LLMs can write fast CODA kernels too (approaching SoLs).

Adam Zweiger retweeted

Ramp Labs@RampLabs·10 Apr

x.com/i/article/2042…

Adam Zweiger@AdamZweiger·1 Apr

@BobMcElrath cool!

Bob McElrath@BobMcElrath·1 Apr

@AdamZweiger FWIW, I implemented an online compaction from your paper using Claude in llama.cpp. Works pretty well, and very fast. Stores Q to score regions for compaction. l2k works better than your rms norm IIRC...

Adam Zweiger@AdamZweiger·31 Mar

The biggest reduction in KV cache memory comes not from quantization or MLA, but from latent compaction, along the sequence dimension. More strong results coming soon with Attention Matching.

Adam Zweiger@AdamZweiger

We introduce a new approach for fast and high-quality context compaction in latent space. Attention Matching (AM) achieves 50× compaction in seconds with little performance loss, substantially outperforming summarization and other baselines.
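
To put the sequence-dimension claim in numbers, here is a back-of-the-envelope sketch. The model dimensions and context length are hypothetical (they are not from the thread or the paper), the 50× ratio is the one quoted above, and the formula ignores quantization scales, MLA, and any auxiliary data a compacted cache might carry.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    """Approximate KV cache size: keys plus values for every layer and KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical model configuration, chosen only for illustration.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128)
seq_len = 100_000

baseline = kv_cache_bytes(seq_len, bytes_per_elem=2, **cfg)          # 16-bit cache
int4 = kv_cache_bytes(seq_len, bytes_per_elem=0.5, **cfg)            # 4-bit cache, scales ignored
compacted = kv_cache_bytes(seq_len // 50, bytes_per_elem=2, **cfg)   # 50x fewer cache entries

print(f"baseline : {baseline / 1e9:.2f} GB")
print(f"int4     : {int4 / 1e9:.2f} GB ({baseline / int4:.0f}x smaller)")
print(f"compacted: {compacted / 1e9:.2f} GB ({baseline / compacted:.0f}x smaller)")
```

Under these assumptions 4-bit quantization shrinks the cache 4×, while shortening the sequence dimension by 50× shrinks it 50×. The two reductions multiply if combined.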

Adam Zweiger retweeted

Xinghong (Shin) Fu@shinfxh·13 Mar

just got claude to explain attention matching and it made this interactive heatmap to show the relative importance of each layer/head! this might just be better than the diagrams in our own paper...

Adam Zweiger@AdamZweiger·7 Mar

@evnkimm nice work evan!

Evan Kim@evnkimm·6 Mar

How do you train compute-optimal novel view synthesis models? In our CVPR '26 paper Scaling View Synthesis Transformers, we uncover key design choices through scaling and careful ablations--and along the way train a new SoTA with 3x less compute. (1/n)

Adam Zweiger@AdamZweiger·2 Mar

New coding model from @cognition! I helped train it during my internship there. The team is very strong and this is just the beginning!

Cognition@cognition

We are sharing an early preview of our ongoing SWE-1.6 training run. It significantly improves upon SWE-1.5 while being post-trained on the same pre-trained model - and it runs equally as fast at 950 tok/s. On SWE-Bench Pro it exceeds top open-source models. The preview model still exhibits some undesirable behaviors like overthinking and excessive self-verification, which we aim to improve. We are rolling out early access to a small subset of users in Windsurf.

Adam Zweiger@AdamZweiger·28 Feb

Fun fact: Back in 2014, Demis had a red line condition for any potential acquisition of DeepMind: "no technology coming out of DeepMind will be used for military or intelligence purposes." Google accepting this more eagerly was part of why Demis chose them over Facebook. This red line is even broader than Dario's (no mass surveillance or fully autonomous weapons), though it was quietly removed by Google 1 year ago.

Adam Zweiger@AdamZweiger·20 Feb

@bendee983 Not yet. Two things are that most inference engines currently don't have a way of initializing directly with a KV cache, and they don't support disentangling logical cache size from physical size (which is needed for rope embeddings). These are all fixable though.
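
One way to read the "logical vs. physical size" point: after compaction the cache physically holds fewer entries, but the next token's RoPE position should presumably still follow the number of tokens seen so far. The sketch below only tracks the two counters. The class and method names are hypothetical and do not correspond to any inference engine's API.

```python
from dataclasses import dataclass


@dataclass
class CompactedCacheState:
    """Bookkeeping for a KV cache whose stored size can differ from tokens seen."""

    physical_len: int = 0  # entries actually stored in memory
    logical_len: int = 0   # tokens processed so far; drives position ids

    def append_token(self) -> int:
        """Register one new token and return the position id it should use."""
        position = self.logical_len  # follows logical length, not stored size
        self.physical_len += 1
        self.logical_len += 1
        return position

    def compact(self, new_physical_len: int) -> None:
        """Shrink the stored cache; the logical length is left untouched."""
        if not 0 < new_physical_len <= self.physical_len:
            raise ValueError("compacted size must be in (0, physical_len]")
        self.physical_len = new_physical_len


state = CompactedCacheState()
for _ in range(1000):
    state.append_token()

state.compact(20)                     # 50x fewer stored entries
next_position = state.append_token()  # continues from the logical length
print(next_position, state.physical_len, state.logical_len)
```

An engine that derives positions from the stored cache length would assign position 20 here instead of 1000, which is the mismatch the reply appears to describe.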

Ben Dickson@bendee983·19 Feb

@AdamZweiger This is impressive! Is it compatible with popular inference engines and current kernels? In other words, how easy is it to use it as a drop-in for whatever engine companies are using right now?

Adam Zweiger retweeted

Xinghong (Shin) Fu@shinfxh·19 Feb

the solution to infinite context was just linear regression all along

Adam Zweiger@AdamZweiger·19 Feb

@ye_combinator thanks! were you trying gradient descent? I think for practical purposes (i.e. low number of query samples), subsetting keys is hard to beat
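
For readers unfamiliar with the idea, here is a rough single-head sketch of what "subsetting keys" with a small set of query samples could look like, with the compact values then fitted by ordinary least squares instead of gradient descent. This is an illustration under assumed details and not the paper's algorithm: the selection rule (total attention mass), the function name `compact_head`, and the random test data are all assumptions, and RoPE, multiple heads, and attention-mass matching are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compact_head(K, V, Q, t):
    """Keep t of the original keys and fit t new values by least squares.

    K: (n, d) keys, V: (n, d_v) values, Q: (n_q, d) sample queries.
    """
    scale = 1.0 / np.sqrt(K.shape[1])
    full_w = softmax(Q @ K.T * scale)   # (n_q, n) attention over the full cache
    target = full_w @ V                 # (n_q, d_v) outputs we want to reproduce

    # Assumed selection rule: keep the t keys with the most total attention mass.
    idx = np.sort(np.argsort(full_w.sum(axis=0))[-t:])
    Ck = K[idx]

    # Attention of the same queries over the compact keys only.
    A = softmax(Q @ Ck.T * scale)       # (n_q, t)

    # Linear regression: choose Cv to minimize ||A @ Cv - target||.
    Cv, *_ = np.linalg.lstsq(A, target, rcond=None)
    return idx, Ck, Cv, A, target


rng = np.random.default_rng(0)
K = rng.normal(size=(256, 16))
V = rng.normal(size=(256, 16))
Q = rng.normal(size=(64, 16))

idx, Ck, Cv, A, target = compact_head(K, V, Q, t=16)
err_fitted = np.linalg.norm(A @ Cv - target)
err_subset = np.linalg.norm(A @ V[idx] - target)
print(f"fitted values: {err_fitted:.4f}   original values of kept keys: {err_subset:.4f}")
```

Because the original values of the kept keys are one candidate solution, the least-squares fit can never do worse than them on the sampled queries. How well either generalizes to unseen queries is a separate question that this sketch does not address.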

Zihao Ye@ye_combinator·19 Feb

Great work! I have explored similar ideas before: tried per-layer per-head fitting for both Ck/v, it works in tasks like needle-in-haystack but training cost is too high to make it practical in production :( would love to see how you plan to make online compaction efficient.

Adam Zweiger@AdamZweiger

We introduce a new approach for fast and high-quality context compaction in latent space. Attention Matching (AM) achieves 50× compaction in seconds with little performance loss, substantially outperforming summarization and other baselines.

Adam Zweiger@AdamZweiger·19 Feb

This was joint work with amazing collaborators: @shinfxh @HanGuo97 @yoonrkim
Paper: arxiv.org/abs/2602.16284
Code: github.com/adamzweiger/co…

Adam Zweiger@AdamZweiger·19 Feb

Future Work:
- Integrating latent compaction into inference engines (e.g. RadixAttention, varlen storage, disaggregated compaction)
- Online compaction: compacting mid-trajectory repeatedly to support arbitrarily long sequences. We show initial results but more work remains.
