wagwan
2.4K posts

Pinned Tweet

Qwen's first interpretability release (Qwen Scope) is very interesting
They use SAE (sparse autoencoder) features to identify what causes repetition in model outputs, then use steering to manufacture a "bad" rollout where the model repeats a lot. This gives RL a clear negative signal to learn from: repetition barely shows up in normal rollouts, so the model otherwise never gets punished for it.
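For anyone who hasn't seen feature steering before, here is a minimal sketch of the general idea, not Qwen's actual code. It assumes steering means adding a scaled SAE decoder direction to the hidden states; `steer`, `alpha`, and the random toy vectors are all hypothetical stand-ins.

```python
import numpy as np

def steer(hidden, decoder_direction, alpha):
    """Add alpha times the unit-normalized feature direction to every token's hidden state."""
    unit = decoder_direction / np.linalg.norm(decoder_direction)
    return hidden + alpha * unit

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))   # toy (tokens, d_model) activations, not real model data
direction = rng.normal(size=8)     # hypothetical "repetition" feature direction

steered = steer(hidden, direction, alpha=5.0)

# The projection of each token onto the feature direction grows by exactly alpha.
unit = direction / np.linalg.norm(direction)
shift = (steered - hidden) @ unit
```

In a setup like the one described, a rollout generated under this kind of push would presumably be the one that gets the low reward, but how the steered rollout is scored is not something the tweet spells out.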
They also use SAE features as a fingerprint for benchmarks: you look at which features each benchmark activates and compare the overlap. That lets you find redundancy inside a benchmark and across benchmarks without running any model. For instance, 63% of GSM8K features are in MATH, but only 10% of MATH features are in GSM8K.
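The asymmetry in those two numbers is just directional set overlap: the shared features are divided by a different denominator each way. A rough sketch, assuming each benchmark is summarized as a set of active feature IDs; the sets below are synthetic, sized only to reproduce the 63% / 10% pattern, and `directional_overlap` is a made-up helper name.

```python
def directional_overlap(a, b):
    """Fraction of the features in `a` that also appear in `b`."""
    if not a:
        return 0.0
    return len(a & b) / len(a)

# Synthetic feature-ID sets (NOT real benchmark data): 63 IDs are shared.
small_benchmark = set(range(0, 100))    # 100 features, stands in for the narrower benchmark
large_benchmark = set(range(37, 667))   # 630 features, stands in for the broader benchmark

small_in_large = directional_overlap(small_benchmark, large_benchmark)
large_in_small = directional_overlap(large_benchmark, small_benchmark)

print(f"{small_in_large:.0%} of small is in large")
print(f"{large_in_small:.0%} of large is in small")
```

A high overlap one way and a low overlap the other suggests the smaller set is mostly contained in the larger one, which is the redundancy signal the tweet is pointing at.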


wagwan retweeted

Stuff of Bardic Legends @Zoomerjeet
Cow posting will continue indefinitely.