Ashutosh Srivastava

131 posts

Ashutosh Srivastava

@h4shkat

ML Research Associate @Adobe | @iitroorkee '25 | Secretary of @InfoSecIITR | Developer at @SDSLabs

Katılım Nisan 2022

425 Takip Edilen339 Takipçiler

Ashutosh Srivastava@h4shkat·4d

@27upon2 Congratulations!

English

Sriraam@27upon2·5d

Moved to NY to work on RL. Would like to meet ppl. I take bad pics and like good food

English

195

10.9K

Ashutosh Srivastava@h4shkat·1 May

ZXX

Ashutosh Srivastava@h4shkat·30 Nis

@eliebakouch This tool would be a very useful “cheatsheet” for post training runs especially on Qwen

English

869

elie@eliebakouch·30 Nis

Qwen first release on interpretability (qwen scope) is very interesting they use SAE features to identify what causes repetition in model outputs, then use steering to manufacture a "bad" rollout where the model repeats a lot. this gives RL a clear negative signal to learn from, since repetition barely shows up in normal rollouts so the model never gets punished for it they also use SAE features as a fingerprint for benchmarks, you look at which features each benchmark activates and compare overlap. lets you find redundancy inside a benchmark and across benchmarks without running any model. for instance 63% of GSM8K features are in MATH but only 10% the other way

English

118

785

40.3K

Ashutosh Srivastava@h4shkat·26 Nis

One of the most exciting applications of the Attention Matching paper is that compaction becomes task-specific when you change the reference queries (Q_ref). Same document → different agent prompts → completely different tokens get kept. This is exactly what @RampLabs did in their Latent Briefing Algorithm, where each sub-agent (based on the RLM architecture by @a1zhang) would receive its own version of the compacted cache based on its task. Live demo in the notebook explains this concept in detail (drag the task dropdown and watch different tokens in the article light up differently):

Ashutosh Srivastava@h4shkat

Recently built this molab Notebook (@marimo_io × @askalphaxiv) implementing a fully interactive explainer for the paper “Fast KV Compaction via Attention Matching” by @AdamZweiger et al at @MIT_CSAIL. For a 1409 token article and at 20% KV cache (keep ratio), Qwen3-4B still gets 6/6 MCQs right + ~99% verbatim recall using only matrix algebra. Thread with the coolest parts👇

English

752

Ashutosh Srivastava@h4shkat·26 Nis

@AiDevCraft @marimo_io @askalphaxiv @AdamZweiger @MIT_CSAIL Totally makes sense, thats why this is best suited for tasks which requires multiple parallel agent workflows, where each sub agent can work with the smaller and compacted version of the context relevant to its specific subtask.

English

AiDevCraft@AiDevCraft·26 Nis

The (K,V,Qref) framing is the non-obvious bit — compaction quality is conditioned on the calibration query distribution, so the same 20%-cache that holds 99% recall on Qref-aligned tasks can collapse off-domain. Means in deployment you're really shipping a (model, Qref) pair, and Qref drift becomes a silent failure mode that looks like "model got dumber".

English

Ashutosh Srivastava@h4shkat·26 Nis

English

3.2K

Ashutosh Srivastava@h4shkat·26 Nis

@gf_256 @RampLabs In terms of efficiency its a step up, and in terms of interpretability we have already seen in Anthropic’s study that CoT in the token space is misleading anyway. That said, I do agree that better interpretable methods for understanding the latent space should be developed

English

cts🌸@gf_256·11 Nis

@RampLabs Isn’t this neuralese the thing we’re not supposed to build

English

2.5K

Ramp Labs@RampLabs·10 Nis

Introducing Latent Briefing, a way for agents to quickly share their relevant memory directly. Result: 31% fewer tokens used, same accuracy. Multi-agent systems are powerful, but can be wildly inefficient. They pass context as tokens, so costs explode and signal gets lost. We built an algorithm that allows agents to communicate KV cache to KV cache.

English

1.8K

667.5K

Ashutosh Srivastava@h4shkat·26 Nis

@sakurayukiai @marimo_io @askalphaxiv @AdamZweiger @MIT_CSAIL Exactly! This new research is directly useful for prod applications serving thousands of customers

English

Sakura Yuki@sakurayukiai·26 Nis

@h4shkat @marimo_io @askalphaxiv @AdamZweiger @MIT_CSAIL The real unlock here isn't the keep ratio, it's bypassing gradient descent for closed-form solutions. Compacting the KV cache in seconds instead of hours is the only way this actually works in prod.

English

151

Ashutosh Srivastava@h4shkat·26 Nis

When I first heard about Molab, I thought it was just another Python notebook. But while working on reproducing the Fast KV Compaction paper for the "Bring Research to Life" competition, I have grown to be extremely fond of it. It truly is amazing how easy it is to visualize and interactively play around with the sliders and see real-time results. I believe every AI researcher should create a comprehensive notebook for their research contributions, along with their papers. It is extremely easy and intuitive to get the hang of it. Amazing work by the @marimo_io team! Attaching the Molab and Github links below, show some love! Molab Link: molab.marimo.io/notebooks/nb_m… Github: github.com/h4shk4t/compac…

English

118

Ashutosh Srivastava@h4shkat·26 Nis

Why does the paper call itself "Fast KV Compaction"? Because of the speed advantage over prior work; particularly Cartridges (Eyuboglu et al., 2025), which does end-to-end gradient descent. The paper's Figure 1 is a scatter plot of downstream accuracy vs. compaction time (log-scale), showing that AM methods trace the Pareto frontier. We reproduce the core idea here on a single real KV head: we time every algorithm at four keep ratios and plot cosine similarity vs. wall-clock compaction time. While AM-OMP at the upper right has the highest quality, it is also the slowest. For most budgets, AM-HighestAttnKeys is on the Pareto Frontier.

English

130

Ashutosh Srivastava@h4shkat·25 Nis

@willccbb 1-2 years ago finishing an assignment was “just gpt it bro”

English

379

will brown@willccbb·25 Nis

someone clauded all over this repo

Maddie D. Reese@maddiedreese

After Google won search, “google” became a verb. It’ll be interesting to see which AI lab turns its product into a verb first!

English

316

16.8K

Ashutosh Srivastava@h4shkat·22 Nis

@rasdani_ How did the agent find the golden commit though? Some bug or was the agent able to break out of some sandbox? Because if it was the latter it makes this even cooler

English

Daniel Auras@rasdani_·22 Nis

while porting our existing training setup to our new Harness and TaskSet abstraction (soon™) a regression was introduced. our agent gained access to the repo's full git log and quickly learnt to shortcut a solution with git show <golden_commit> | git apply

English

891

Daniel Auras@rasdani_·22 Nis

SWE reward hack in the wild 😈

hallerite@hallerite

First time we had reward hacking in a prod run. Very interesting.

English

7.1K

Ashutosh Srivastava@h4shkat·22 Nis

@rasdani_ This is so useful, so much easier to eyeball problems in the rollouts

English

Daniel Auras@rasdani_·22 Nis

we log all our rollouts to Lab's beautiful Rollout Viewer

English

5.6K

Ashutosh Srivastava@h4shkat·22 Nis

@0xpanicError @ether_fi Congratulations! 🥂

English

463

Yash@0xpanicError·22 Nis

Career update: I’ve joined @ether_fi as a Software Engineer. Crazy team, crazy office, crazy product (and crazy first week). I’ll be contributing towards the protocol and security. If you have any questions or want to learn anything about EtherFi, feel free to reach out ❤️

English

481

19K

Ashutosh Srivastava@h4shkat·22 Nis

I agree. A great analogy I read somewhere was that in the previous centuries most of the tasks were physically intensive, and industrial revolution and the IT revolution changed that. After the revolution, we plunged into a sedentary lifestyle. To keep ourselves healthy and strong we invented the concept of gyms. Pretty soon the same thing would be applicable after the AI revolution where people would be just dedicating time to challenge themselves and learn more (effectively becoming better versions of themselves).

English

Matej Sirovatka@m_sirovatka·22 Nis

This is one of the cases where having a lot of "agency" yourself distinguishes you, not a lot of people have that and I wouldn't consider it a default. There are people who tend to ask questions and wait for guidance if they don't understand (which is fine), but doesn't really work now since most would just get annoyed and prompt it away with claude

English

244

Matej Sirovatka@m_sirovatka·22 Nis

This approach from core auto highly resonates. Recently found myself way less inclined to share my problems with others, ask for OS contributions but instead went the “hack it away with claude” way. I feel like this is gonna become an issue as this kills senior people helping junior people, instead they can just hack it away. I’ve had several long discussions with a lot of my friends about this and from the GPU Mode experience it really feels like you kinda lose the “middle class” and its either very hard for people to learn/contribute or there is a bunch of people who are just cracked of their mind and agents only help them. TLDR: it’s easier to ask ai than to ask a junior person, but how do juniors learn without having insane drive themselves?

English

Keşfet

@27upon2 @eliebakouch @RampLabs @a1zhang @AiDevCraft @marimo_io @askalphaxiv @AdamZweiger