Adam Zweiger

76 posts

@AdamZweiger

Working on learning | @MIT_CSAIL

Joined September 2022
557 Following · 1.6K Followers
Pinned Tweet
Adam Zweiger @AdamZweiger
We introduce a new approach for fast and high-quality context compaction in latent space. Attention Matching (AM) achieves 50× compaction in seconds with little performance loss, substantially outperforming summarization and other baselines.
20 replies · 126 reposts · 805 likes · 68.9K views
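The tweet names the method but not its objective. Reading "attention matching" literally, a compacted cache should reproduce the full cache's attention outputs on sample queries. Below is a minimal NumPy sketch of that discrepancy measure; the shapes, the query sampling, and the naive truncation baseline are my own assumptions, not the paper's setup.

```python
import numpy as np

def attention(Q, K, V):
    """Single-head scaled dot-product attention."""
    scores = Q @ K.T / np.sqrt(K.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

def matching_error(Q, K, V, K_c, V_c):
    """Relative drift of the compacted cache's attention outputs
    from the full cache's, averaged over sample queries."""
    full = attention(Q, K, V)
    compact = attention(Q, K_c, V_c)
    return np.linalg.norm(full - compact) / np.linalg.norm(full)

# Illustrative shapes: 4096 cached tokens compacted ~50x to 82 slots.
rng = np.random.default_rng(0)
d, n, m, s = 64, 4096, 82, 256
Q = rng.standard_normal((s, d))
K, V = rng.standard_normal((n, d)), rng.standard_normal((n, d))
print(matching_error(Q, K, V, K[:m], V[:m]))  # naive truncation baseline
```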
Adam Zweiger reposted
Xinghong (Shin) Fu @shinfxh
just got claude to explain attention matching and it made this interactive heatmap to show the relative importance of each layer/head! this might just be better than the diagrams in our own paper...
1 reply · 5 reposts · 57 likes · 2.9K views
Evan Kim @evnkimm
How do you train compute-optimal novel view synthesis models? In our CVPR '26 paper, Scaling View Synthesis Transformers, we uncover key design choices through scaling and careful ablations, and along the way train a new SoTA with 3× less compute. (1/n)
13 replies · 19 reposts · 167 likes · 33.2K views
Adam Zweiger @AdamZweiger
Fun fact: Back in 2014, Demis had a red-line condition for any potential acquisition of DeepMind: "no technology coming out of DeepMind will be used for military or intelligence purposes." Google's greater willingness to accept this was part of why Demis chose them over Facebook. This red line is even broader than Dario's (no mass surveillance or fully autonomous weapons), though Google quietly removed it a year ago.
7 replies · 39 reposts · 1K likes · 80.9K views
Adam Zweiger @AdamZweiger
@bendee983 Not yet. Two obstacles: most inference engines currently don't have a way to initialize directly with a KV cache, and they don't support disentangling logical cache size from physical size (which is needed for RoPE embeddings). Both are fixable, though.
0 replies · 0 reposts · 15 likes · 952 views
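The logical-vs-physical distinction matters because RoPE rotates each key by its position: after compaction, the surviving entries occupy new physical slots but must keep consistent logical positions. A toy sketch of RoPE applied with an explicit position array (standard rotary formulation; the example values are illustrative):

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary embeddings using an explicit per-entry position
    array, so physical cache index and logical position can differ."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)
    ang = positions[:, None] * inv_freq[None, :]   # (n, d/2) angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Three surviving keys at physical slots 0..2 keep logical positions
# 1, 4, 7: the engine must rotate by the latter, not the slot index.
keys = np.random.default_rng(0).standard_normal((3, 16))
rotated = rope(keys, np.array([1.0, 4.0, 7.0]))
```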
Ben Dickson @bendee983
@AdamZweiger This is impressive! Is it compatible with popular inference engines and current kernels? In other words, how easy is it to use it as a drop-in for whatever engine companies are using right now?
1 reply · 0 reposts · 4 likes · 1.2K views
Adam Zweiger reposted
Xinghong (Shin) Fu @shinfxh
the solution to infinite context was just linear regression all along
32 replies · 112 reposts · 1.6K likes · 187.7K views
Adam Zweiger @AdamZweiger
@ye_combinator thanks! were you trying gradient descent? I think for practical purposes (i.e., a low number of query samples), subsetting keys is hard to beat
1 reply · 0 reposts · 6 likes · 376 views
Zihao Ye @ye_combinator
Great work! I have explored similar ideas before: I tried per-layer, per-head fitting for both C_k and C_v; it works on tasks like needle-in-a-haystack, but the training cost is too high to make it practical in production :( would love to see how you plan to make online compaction efficient.
Quoting Adam Zweiger @AdamZweiger (the pinned Attention Matching announcement above)
1 reply · 4 reposts · 45 likes · 4.3K views
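Taken together with the "linear regression all along" quip above, this exchange hints at a cheap recipe: keep a subset of the original keys rather than learning new ones by gradient descent, then fit the compacted values in closed form so that attention through the kept keys reproduces the original outputs, which is exactly a linear regression. A hedged NumPy sketch; the attention-mass selection rule and all shapes are my assumptions, not the paper's algorithm:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def compact_by_key_subsetting(Q, K, V, m):
    """Keep the m keys receiving the most attention mass from the
    sample queries, then regress compacted values so that attention
    over the kept keys matches the full attention output."""
    d = K.shape[1]
    W = softmax(Q @ K.T / np.sqrt(d))            # (s, n) full weights
    keep = np.argsort(W.sum(axis=0))[-m:]        # top-m keys by mass
    K_c = K[keep]
    O = W @ V                                    # target outputs (s, d)
    A = softmax(Q @ K_c.T / np.sqrt(d))          # (s, m) compact weights
    V_c, *_ = np.linalg.lstsq(A, O, rcond=None)  # the linear regression
    return K_c, V_c

rng = np.random.default_rng(0)
d, n, m, s = 64, 2000, 40, 512
Q = rng.standard_normal((s, d))
K, V = rng.standard_normal((n, d)), rng.standard_normal((n, d))
K_c, V_c = compact_by_key_subsetting(Q, K, V, m)
```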
Adam Zweiger @AdamZweiger
Future work:
- Integrating latent compaction into inference engines (e.g. RadixAttention, varlen storage, disaggregated compaction)
- Online compaction: compacting mid-trajectory repeatedly to support arbitrarily long sequences. We show initial results, but more work remains.
1 reply · 1 repost · 22 likes · 2.5K views
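Online compaction as described would wrap a compaction call inside the decoding loop: whenever the cache outgrows its budget, compact the prefix and keep going. A schematic sketch; `step`, `compact`, `budget`, and `ratio` are hypothetical stand-ins, not the paper's interface:

```python
def generate_with_online_compaction(step, compact, budget=1024, ratio=50):
    """Decode indefinitely, repeatedly compacting the cached prefix
    whenever it outgrows the budget."""
    cache = []                      # stand-in for per-layer KV entries
    while True:
        token, entry = step(cache)  # one decode step -> (token, KV entry)
        if token is None:           # end of generation
            return
        cache.append(entry)
        if len(cache) > budget:     # compact mid-trajectory, e.g. ~50x
            cache = compact(cache, max(1, len(cache) // ratio))
```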
Adam Zweiger @AdamZweiger
Nice work on GAN-style training with a generator and a discriminator, both trained with RL. This might be the path to improvement in domains without good verifiers, like creative writing.
Quoting Locke Cai @couplefire12:
RL for reasoning often relies on verifiers: great for math, but tricky for creative writing or open-ended research. Meet RARO, a new paradigm that teaches LLMs to reason via adversarial games instead of verification. No verifiers. No environments. Just demonstrations. 🧵👇
0 replies · 0 reposts · 13 likes · 1.7K views
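As described, the adversarial game replaces a verifier: the generator is rewarded for outputs the discriminator mistakes for demonstrations, and the discriminator for telling the two apart, with both sides updated by RL. A toy sketch of just the reward structure; every name and callable here is a hypothetical stand-in, not RARO's code:

```python
import random

def adversarial_rewards(demos, generate, discriminate):
    """One round of the game: score one generated sample against one
    human demonstration. Returns (generator_reward, discriminator_reward)."""
    fake = generate()
    real = random.choice(demos)
    p_fake = discriminate(fake)            # discriminator's P(sample is a demo)
    p_real = discriminate(real)
    gen_reward = p_fake                    # generator: fool the critic
    disc_reward = p_real + (1.0 - p_fake)  # discriminator: classify both
    return gen_reward, disc_reward

# Toy stand-ins; in the described setup both sides are LLM policies
# updated with RL on these rewards.
demos = ["a careful argument", "a worked example"]
g_r, d_r = adversarial_rewards(demos, lambda: "generated text", lambda x: 0.5)
```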
Adam Zweiger @AdamZweiger
Presenting Self-Adapting Language Models on Wednesday at NeurIPS. We equip an LLM with the ability to write training data for itself in response to new inputs. We then meta-learn this ability with RL. Stop by to chat! 11-2 pm, #3415, with @jyo_pari @HanGuo97 @akyurekekin
4 replies · 5 reposts · 60 likes · 4.4K views
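The loop described has two levels: an inner step where the model writes training data for a new input and is fine-tuned on it, and an outer RL step that rewards the written data by how much the fine-tuned model improves. A schematic sketch with hypothetical stand-ins; none of these names come from the paper:

```python
def self_adapt_step(model, new_input, write_self_edit, finetune, evaluate):
    """One inner step: the model writes its own training data for
    new_input, is fine-tuned on it, and the resulting performance
    gain serves as the outer RL reward for writing good data."""
    before = evaluate(model, new_input)
    self_edit = write_self_edit(model, new_input)  # model-authored data
    adapted = finetune(model, self_edit)
    after = evaluate(adapted, new_input)
    return adapted, after - before  # (updated model, RL reward)
```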
Adam Zweiger reposted
Zitong Yang @ZitongYang0
📜 Paper on a new pretraining paradigm: Synthetic Bootstrapped Pretraining. SBP goes beyond next-token supervision in a single document by leveraging inter-document correlations to synthesize new data for training; no teacher needed. Validation: 1T data + a 3B model from scratch. 🧵
10 replies · 46 reposts · 255 likes · 41.2K views
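The one-line recipe, exploiting inter-document correlations with no teacher, suggests a pipeline of: pair each document with related documents from the same corpus, train the model to synthesize one from the other, and then sample new training data. A toy sketch of the pairing step only; nearest-neighbor matching over generic embeddings is my simplification, not the paper's retriever:

```python
import numpy as np

def related_pairs(doc_vecs, k=1):
    """Pair each document with its k nearest neighbors in embedding
    space; the (d1, d2) pairs supervise a synthesizer p(d2 | d1)."""
    X = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = X @ X.T                          # cosine similarities
    np.fill_diagonal(sims, -np.inf)         # exclude self-pairs
    nn = np.argsort(sims, axis=1)[:, -k:]   # top-k neighbors per doc
    return [(i, int(j)) for i in range(len(X)) for j in nn[i]]

# Illustrative: 100 documents with 32-dim embeddings.
vecs = np.random.default_rng(0).standard_normal((100, 32))
pairs = related_pairs(vecs)  # training pairs for the synthesizer
```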