Ashutosh Srivastava

131 posts

@h4shkat

ML Research Associate @Adobe | @iitroorkee '25 | Secretary of @InfoSecIITR | Developer at @SDSLabs

Joined April 2022
425 Following, 339 Followers
Sriraam (@27upon2)
Moved to NY to work on RL. Would like to meet ppl. I take bad pics and like good food
35
0
195
10.9K
elie (@eliebakouch)
Qwen's first release on interpretability (qwen scope) is very interesting. they use SAE features to identify what causes repetition in model outputs, then use steering to manufacture a "bad" rollout where the model repeats a lot. this gives RL a clear negative signal to learn from, since repetition barely shows up in normal rollouts so the model never gets punished for it.

they also use SAE features as a fingerprint for benchmarks: you look at which features each benchmark activates and compare overlap. this lets you find redundancy inside a benchmark and across benchmarks without running any model. for instance 63% of GSM8K features are in MATH but only 10% the other way
14
118
785
40.3K
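The asymmetric overlap in elie's post (63% one way, 10% the other) is what you get from a directional set overlap. Here is a minimal sketch, assuming each benchmark is summarized as a set of active SAE feature IDs; the function name and the feature sets are hypothetical and chosen only to show the asymmetry, not taken from the Qwen release.

```python
def directional_overlap(source: set[int], target: set[int]) -> float:
    """Fraction of `source` features that also appear in `target`."""
    if not source:
        return 0.0
    return len(source & target) / len(source)


# Hypothetical feature-ID sets (illustration only, not real SAE data).
small_benchmark = {1, 2, 3, 4, 5, 6, 7, 8}   # 8 features
large_benchmark = set(range(4, 54))          # 50 features, shares IDs 4..8

small_in_large = directional_overlap(small_benchmark, large_benchmark)
large_in_small = directional_overlap(large_benchmark, small_benchmark)

print(f"{small_in_large:.1%} of small-benchmark features appear in the large one")
print(f"{large_in_small:.1%} of large-benchmark features appear in the small one")
```

The same five shared features give a high overlap when divided by the small set and a low one when divided by the large set, which is why the two directions need to be reported separately.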
Ashutosh Srivastava (@h4shkat)
One of the most exciting applications of the Attention Matching paper is that compaction becomes task-specific when you change the reference queries (Q_ref). Same document → different agent prompts → completely different tokens get kept. This is exactly what @RampLabs did in their Latent Briefing Algorithm, where each sub-agent (based on the RLM architecture by @a1zhang) receives its own version of the compacted cache based on its task. The live demo in the notebook explains this concept in detail (change the task in the dropdown and watch different tokens in the article light up):
Ashutosh Srivastava (@h4shkat)

Recently built this molab Notebook (@marimo_io × @askalphaxiv) implementing a fully interactive explainer for the paper “Fast KV Compaction via Attention Matching” by @AdamZweiger et al. at @MIT_CSAIL. For a 1,409-token article at a 20% KV cache keep ratio, Qwen3-4B still gets 6/6 MCQs right + ~99% verbatim recall using only matrix algebra. Thread with the coolest parts👇

3
0
4
752
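To see why changing Q_ref changes which tokens survive, here is a rough sketch of one simple selection rule: score each key by the total softmax attention it receives from the reference queries and keep the top fraction. This is an assumption for illustration, not the paper's full method and not Ramp Labs' algorithm; `keep_indices` and the toy keys are hypothetical.

```python
import numpy as np


def keep_indices(K: np.ndarray, Q_ref: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return sorted indices of the keys that get the most attention from Q_ref."""
    d = K.shape[1]
    logits = Q_ref @ K.T / np.sqrt(d)             # shape: (n_queries, n_keys)
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over keys
    mass = attn.sum(axis=0)                       # total attention per key
    n_keep = max(1, int(round(keep_ratio * K.shape[0])))
    return np.sort(np.argsort(mass)[-n_keep:])


# Toy cache: 10 orthogonal keys, so each query clearly "points at" one key.
K = 4.0 * np.eye(10)

# Two hypothetical tasks whose reference queries attend to different keys.
Q_task_a = K[:2]     # task A cares about the first two tokens
Q_task_b = K[-2:]    # task B cares about the last two tokens

kept_a = keep_indices(K, Q_task_a, keep_ratio=0.2)
kept_b = keep_indices(K, Q_task_b, keep_ratio=0.2)

print(kept_a.tolist())  # [0, 1]
print(kept_b.tolist())  # [8, 9]
```

Same keys, same keep ratio, different reference queries: the kept set moves with the task, which is the behavior the dropdown in the notebook demonstrates on a real article.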
AiDevCraft (@AiDevCraft)
The (K,V,Qref) framing is the non-obvious bit — compaction quality is conditioned on the calibration query distribution, so the same 20%-cache that holds 99% recall on Qref-aligned tasks can collapse off-domain. Means in deployment you're really shipping a (model, Qref) pair, and Qref drift becomes a silent failure mode that looks like "model got dumber".
1
0
1
72
Ashutosh Srivastava (@h4shkat)
Recently built this molab Notebook (@marimo_io × @askalphaxiv) implementing a fully interactive explainer for the paper “Fast KV Compaction via Attention Matching” by @AdamZweiger et al. at @MIT_CSAIL. For a 1,409-token article at a 20% KV cache keep ratio, Qwen3-4B still gets 6/6 MCQs right + ~99% verbatim recall using only matrix algebra. Thread with the coolest parts👇
11
5
40
3.2K
Ashutosh Srivastava (@h4shkat)
@gf_256 @RampLabs In terms of efficiency it's a step up, and in terms of interpretability we have already seen in Anthropic’s study that CoT in the token space is misleading anyway. That said, I do agree that better interpretability methods for understanding the latent space should be developed
0
0
1
29
cts🌸 (@gf_256)
@RampLabs Isn’t this neuralese the thing we’re not supposed to build
2
0
13
2.5K
Ramp Labs (@RampLabs)
Introducing Latent Briefing, a way for agents to quickly share their relevant memory directly. Result: 31% fewer tokens used, same accuracy. Multi-agent systems are powerful, but can be wildly inefficient. They pass context as tokens, so costs explode and signal gets lost. We built an algorithm that allows agents to communicate KV cache to KV cache.
37
92
1.8K
667.5K
Ashutosh Srivastava (@h4shkat)
When I first heard about Molab, I thought it was just another Python notebook. But while working on reproducing the Fast KV Compaction paper for the "Bring Research to Life" competition, I have grown extremely fond of it. It truly is amazing how easy it is to visualize results, play around with the sliders, and see them update in real time. I believe every AI researcher should create a comprehensive notebook for their research contributions, along with their papers. It is extremely easy and intuitive to get the hang of. Amazing work by the @marimo_io team! Attaching the Molab and GitHub links below, show some love! Molab Link: molab.marimo.io/notebooks/nb_m… GitHub: github.com/h4shk4t/compac…
0
0
1
118
Ashutosh Srivastava (@h4shkat)
Why does the paper call itself "Fast KV Compaction"? Because of the speed advantage over prior work, particularly Cartridges (Eyuboglu et al., 2025), which does end-to-end gradient descent. The paper's Figure 1 is a scatter plot of downstream accuracy vs. compaction time (log-scale), showing that AM methods trace the Pareto frontier. We reproduce the core idea here on a single real KV head: we time every algorithm at four keep ratios and plot cosine similarity vs. wall-clock compaction time. While AM-OMP at the upper right has the highest quality, it is also the slowest. For most budgets, AM-HighestAttnKeys is on the Pareto frontier.
1
0
0
130
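For readers who want to check the "on the Pareto frontier" claim on their own timings, here is a small sketch of how a frontier can be extracted from (time, quality) pairs. The method names and numbers below are hypothetical placeholders, not measurements from the notebook or the paper.

```python
def pareto_frontier(points):
    """Keep the points that no other point beats on both time and quality.

    Each point is (name, seconds, quality); lower seconds and higher
    quality are better.
    """
    frontier = []
    best_quality = float("-inf")
    # Walk from fastest to slowest; on equal time, see the better one first.
    for name, seconds, quality in sorted(points, key=lambda p: (p[1], -p[2])):
        # A slower method only earns its place by improving quality.
        if quality > best_quality:
            frontier.append((name, seconds, quality))
            best_quality = quality
    return frontier


# Hypothetical measurements: (method, compaction seconds, cosine similarity).
measurements = [
    ("method_a", 0.002, 0.90),
    ("method_b", 0.010, 0.88),  # slower and worse than method_a: dominated
    ("method_c", 0.050, 0.97),
    ("method_d", 0.400, 0.99),
]

frontier = pareto_frontier(measurements)
print([name for name, _, _ in frontier])  # ['method_a', 'method_c', 'method_d']
```

A method can be both the slowest and on the frontier, as long as nothing faster matches its quality, which is consistent with the slowest method also having the highest quality in the thread above.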
Ashutosh Srivastava (@h4shkat)
@rasdani_ How did the agent find the golden commit though? Was it a bug, or was the agent able to break out of a sandbox? Because if it was the latter, it makes this even cooler
0
0
0
13
Daniel Auras (@rasdani_)
while porting our existing training setup to our new Harness and TaskSet abstraction (soon™) a regression was introduced. our agent gained access to the repo's full git log and quickly learnt to shortcut a solution with git show <golden_commit> | git apply
2
0
22
891
Daniel Auras (@rasdani_)
we log all our rollouts to Lab's beautiful Rollout Viewer
2
0
25
5.6K
Yash (@0xpanicError)
Career update: I’ve joined @ether_fi as a Software Engineer. Crazy team, crazy office, crazy product (and crazy first week). I’ll be contributing towards the protocol and security. If you have any questions or want to learn anything about EtherFi, feel free to reach out ❤️
71
6
481
19K
Ashutosh Srivastava (@h4shkat)
I agree. A great analogy I read somewhere: in previous centuries most work was physically intensive, and the Industrial Revolution and the IT revolution changed that. Afterwards we plunged into a sedentary lifestyle, so to keep ourselves healthy and strong we invented the concept of gyms. Pretty soon the same thing will apply after the AI revolution: people will dedicate time just to challenging themselves and learning more (effectively becoming better versions of themselves).
0
0
0
48
Matej Sirovatka (@m_sirovatka)
This is one of the cases where having a lot of "agency" yourself distinguishes you; not a lot of people have that, and I wouldn't consider it a default. There are people who tend to ask questions and wait for guidance if they don't understand (which is fine), but that doesn't really work now, since most would just get annoyed and prompt it away with claude
1
0
3
244
Matej Sirovatka (@m_sirovatka)
This approach from core auto highly resonates. Recently I found myself way less inclined to share my problems with others or ask for OS contributions; instead I went the “hack it away with claude” way. I feel like this is gonna become an issue, as it kills senior people helping junior people when they can just hack it away instead. I’ve had several long discussions with a lot of my friends about this, and from the GPU Mode experience it really feels like you kinda lose the “middle class”: it’s either very hard for people to learn/contribute, or there is a bunch of people who are just cracked out of their mind and agents only help them. TLDR: it’s easier to ask ai than to ask a junior person, but how do juniors learn without having insane drive themselves?
8
2
94
6K