Ani Aggarwal

25 posts

@AnirudAgg

Vision AI researcher | Applying for 2026 PhD | CS + Math from UMD | I like vision

San Francisco, CA · Joined July 2014
41 Following · 20 Followers
Pinned Tweet
Ani Aggarwal@AnirudAgg·
🧵 Your DiT, faster. Introducing ECAD: we reframe diffusion model caching as multi-objective optimization and evolve Pareto-optimal schedules via a genetic algorithm, achieving a 4.47 FID gain at a 2.58× speedup with no retraining or tuning. 🔗 aniaggarwal.github.io/ecad #MachineLearning
Ani Aggarwal tweet media
2 replies · 1 repost · 13 likes · 1.8K views
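The pinned tweet above frames caching schedules as a multi-objective search solved by a genetic algorithm. A minimal sketch of that idea, assuming a schedule is a binary recompute/reuse mask over diffusion steps and model components; the toy fitness proxies, hyperparameters, and all function names here are hypothetical, not ECAD's actual implementation:

```python
import random

STEPS, COMPONENTS = 20, 7  # toy sizes; the real schedule dimensions differ

def random_schedule():
    # A schedule is a binary mask: 1 = reuse cached output, 0 = recompute.
    return [[random.randint(0, 1) for _ in range(COMPONENTS)] for _ in range(STEPS)]

def speedup(s):
    # Toy proxy: more cached entries -> faster inference.
    return sum(map(sum, s)) / (STEPS * COMPONENTS)

def quality_loss(s):
    # Toy proxy: long runs of consecutive caching hurt quality more.
    loss = 0.0
    for c in range(COMPONENTS):
        run = 0
        for t in range(STEPS):
            run = run + 1 if s[t][c] else 0
            loss += run * run
    return loss

def dominates(a, b):
    # a dominates b if it is no worse on both objectives and strictly better on one.
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(pop):
    scored = [(speedup(s), quality_loss(s), s) for s in pop]
    return [s for (sp, ql, s) in scored
            if not any(dominates((sp2, ql2), (sp, ql))
                       for (sp2, ql2, _) in scored if (sp2, ql2) != (sp, ql))]

def mutate(s, rate=0.05):
    # Flip each bit with a small probability.
    return [[bit ^ (random.random() < rate) for bit in row] for row in s]

def evolve(generations=30, pop_size=32):
    pop = [random_schedule() for _ in range(pop_size)]
    for _ in range(generations):
        front = pareto_front(pop)
        # Refill the population by mutating Pareto-optimal parents.
        pop = front + [mutate(random.choice(front)) for _ in range(pop_size - len(front))]
    return pareto_front(pop)

front = evolve()
```

The result is a whole front of speed/quality trade-offs rather than a single schedule, which matches the tweet's "Pareto-optimal schedules" framing; a real system would evaluate fitness with actual latency and FID measurements.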
Ani Aggarwal@AnirudAgg·
@LunjunZhang Agreed on this. I'm very bullish on using EAs, particularly in areas where gradients are expensive or impossible to compute and RL would be slow or unstable (see my paper ECAD). Combining them is super clever :)
0 replies · 0 reposts · 1 like · 157 views
Lunjun Zhang@LunjunZhang·
RL optimizes weights. Evolution optimizes contexts. What if we combine RL and Evolutionary Algorithm (EA) into a new paradigm of LLM self-improvement? In "Evolutionary System Prompt Learning for Reinforcement Learning in LLMs", we show that RL and EA are deeply synergistic.
Lunjun Zhang tweet media
9 replies · 42 reposts · 302 likes · 15.9K views
Thomas Wimmer@wimmer_th·
AnyUp has been accepted to ICLR 2026 as an oral presentation! ⭐️ I'm looking forward to presenting it in Rio. If you're interested, please come by my talk or poster. 🇧🇷
Thomas Wimmer@wimmer_th

Super excited to introduce ✨ AnyUp: Universal Feature Upsampling 🔎 Upsample any feature - really any feature - with the same upsampler, no need for cumbersome retraining. SOTA feature upsampling results while being feature-agnostic at inference time.

5 replies · 19 reposts · 184 likes · 14.7K views
Massimiliano Viola@massiviola01·
DINO features are the best in town and offer unmatched performance for dense vision tasks such as classification and detection. But in the current multimodal ML landscape, these features have a limitation: they lack a direct connection to natural language!

Unlike CLIP-style models, DINO embeddings are not spatially aligned with text, and this means the model cannot understand queries, perform zero-shot classification/retrieval, or highlight all the patches matching a given concept.

Fortunately for us, the DINO team addressed this issue with dino.txt, a simple approach to align frozen DINO backbones with a text encoder to solve tasks associated with a text prompt, both at the global and local levels. Think about CLIP and Segment Anything 3 abilities baked into a single model, and this was a couple of years ago!

Curious to know how it's done and some practical takeaways for your everyday vision tasks? Check out the resources below!⏬
dino.txt: DINO Meets Text open.substack.com/pub/mlhonk/p/5…
Paper: arxiv.org/abs/2412.16334…
Massimiliano Viola tweet media
3 replies · 29 reposts · 344 likes · 16.2K views
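The tweet above describes aligning frozen DINO features with a text encoder via learnable projections, CLIP-style. A minimal NumPy sketch of what such alignment training optimizes: a symmetric InfoNCE loss over projected image and text embeddings. The random feature arrays stand in for frozen DINO and text-encoder outputs, and all shapes, names, and the temperature value are illustrative assumptions, not dino.txt's actual recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for frozen DINO features (pooled to one vector per image)
# and text-encoder embeddings; shapes and values are illustrative only.
batch, d_img, d_txt, d_joint = 4, 768, 512, 256
img_feats = rng.normal(size=(batch, d_img))
txt_feats = rng.normal(size=(batch, d_txt))

# The alignment layers: learnable projections into a shared space.
# Only these would be trained; the DINO backbone stays frozen.
W_img = rng.normal(size=(d_img, d_joint)) / np.sqrt(d_img)
W_txt = rng.normal(size=(d_txt, d_joint)) / np.sqrt(d_txt)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_loss(img, txt, temperature=0.07):
    # Symmetric InfoNCE: the matching image/text pair should score highest
    # in each row (image-to-text) and each column (text-to-image).
    z_i = normalize(img @ W_img)
    z_t = normalize(txt @ W_txt)
    logits = z_i @ z_t.T / temperature
    labels = np.arange(len(logits))
    log_probs_i = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_probs_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_i2t = -log_probs_i[labels, labels].mean()
    loss_t2i = -log_probs_t[labels, labels].mean()
    return (loss_i2t + loss_t2i) / 2

loss = clip_loss(img_feats, txt_feats)
```

Minimizing this loss pulls each image's projected features toward its caption's embedding, which is what lets the aligned model do zero-shot retrieval and patch-level concept highlighting afterward.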
Ani Aggarwal@AnirudAgg·
@massiviola01 Yes! We have a low-memory mode that is super memory-efficient but adds some latency (by running some operations in sequence rather than in parallel). We will add this to the released code repo this weekend! Thanks for your interest in our work 😁
1 reply · 0 reposts · 0 likes · 26 views
Massimiliano Viola@massiviola01·
@AnirudAgg Yes, I am a bit familiar with this line of work, and it's great offline. The problem I noticed is that it eats a lot of GPU memory even for a 224 image, not ideal for production inference with lots of cameras! Do you know if there is anything that adds only a minimal overhead?
1 reply · 0 reposts · 3 likes · 103 views
Ani Aggarwal@AnirudAgg·
Technically more than one line of code 😁
Ani Aggarwal tweet media
0 replies · 0 reposts · 2 likes · 21 views
Ani Aggarwal@AnirudAgg·
Super excited to announce our paper! Upsample any latents (DINO, VAE, etc.) in linear time (compared to quadratic cross attention). Our models are all available on Hugging Face and Torch Hub with just one line of code! (please star the GitHub repo 🥺) huggingface.co/UPLiFT-upsampl…
Matthew Walmer@MatthewWalmer

We’re excited to announce UPLiFT, our lightweight, pixel-dense feature upsampler. UPLiFT boosts feature density, preserves semantics, and has better efficiency scaling than recent SOTA methods. See all links in the thread below. Coauthors: @_sakshams_ @AnirudAgg @abhi2610 🧵[1/6]

2 replies · 2 reposts · 3 likes · 351 views
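The UPLiFT announcement above contrasts linear-time upsampling with quadratic cross-attention. A back-of-envelope sketch of why that matters at high resolution, where every upsampled position attending to every low-res token grows quadratically while a fixed-neighborhood upsampler grows linearly; the constants, patch size, and function names here are made-up illustrations, not UPLiFT's actual numbers:

```python
def cross_attention_cost(n_queries, n_keys, dim):
    # Quadratic flavor: every upsampled (query) position attends to every key.
    return n_queries * n_keys * dim

def linear_upsampler_cost(n_queries, dim, k=9):
    # Linear flavor: each output position touches a fixed-size
    # neighborhood of k low-res features, independent of resolution.
    return n_queries * k * dim

dim = 384  # hypothetical feature dimension
for side in (64, 128, 256):
    hi = side * side            # upsampled (pixel-dense) positions
    lo = (side // 14) ** 2      # low-res tokens from a hypothetical 14px patch grid
    ratio = cross_attention_cost(hi, lo, dim) / linear_upsampler_cost(hi, dim)
    print(f"{side}x{side}: cross-attention costs ~{ratio:.0f}x more FLOPs")
```

The gap widens with resolution because the number of keys itself scales with image area, which is the efficiency-scaling point the tweet is making.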
Ani Aggarwal@AnirudAgg·
@HrishbhDalal @MatthewWalmer @_sakshams_ @abhi2610 Agreed, but even without a larger size, I'm of the opinion that using a modern VAE and multi-scale image generator for training (maybe FLUX, which has larger latents) would achieve really strong image super-res, super efficiently.
1 reply · 0 reposts · 1 like · 47 views
Ani Aggarwal reposted
Matthew Walmer@MatthewWalmer·
We’re excited to announce UPLiFT, our lightweight, pixel-dense feature upsampler. UPLiFT boosts feature density, preserves semantics, and has better efficiency scaling than recent SOTA methods. See all links in the thread below. Coauthors: @_sakshams_ @AnirudAgg @abhi2610 🧵[1/6]
Matthew Walmer tweet media
8 replies · 52 reposts · 393 likes · 19.2K views
Ani Aggarwal@AnirudAgg·
@prodarhan This is sick, love the website side by sides and clean animations 😍
0 replies · 0 reposts · 0 likes · 47 views
Arhan Jain@prodarhan·
Excited to introduce PolaRiS, a real-to-sim recipe for turning short real-world videos into high-fidelity simulation environments for scalable and reliable zero-shot generalist policy evaluation. polaris-evals.github.io (1/N 🧵)
8 replies · 48 reposts · 236 likes · 64.5K views
Qianqian Wang@QianqianWang5·
I'm recruiting multiple PhD students this cycle to join me at Harvard University and the Kempner Institute! My interests span vision and intelligence, including 3D/4D, active perception, memory, representation learning, and anything you're excited to explore! Deadline: Dec 15th.
25 replies · 152 reposts · 923 likes · 175K views
Ani Aggarwal@AnirudAgg·
@prodarhan @IamKyros69 Only if you first reconceptualize diffusion caching as a multi-objective optimization problem and apply a genetic algorithm to find an optimal speed/quality trade-off for inference
0 replies · 0 reposts · 1 like · 20 views
Kyros@IamKyros69·
Why is everyone in tech using a ThinkPad?
Kyros tweet media
1.1K replies · 247 reposts · 7.2K likes · 615.9K views
Ani Aggarwal@AnirudAgg·
@LogitechG No thanks my secondhand G502 is going strong 7 years later
0 replies · 0 reposts · 0 likes · 2 views
Logitech G@LogitechG·
Imagine pulling up to this house
Logitech G tweet media
747 replies · 889 reposts · 44.8K likes · 1M views
Ani Aggarwal@AnirudAgg·
@psandovalsegura Super interesting! Also seeing computational redundancy in DiTs. Some tokens cache well across timesteps (arXiv:2410.05317) and some entire blocks can be skipped (hasty plot). Maybe they could be pruned like your attention heads? Though uncertain of its practical application haha
Ani Aggarwal tweet media
0 replies · 0 reposts · 0 likes · 7 views
Pedro Sandoval@psandovalsegura·
Attention sinks in LLMs are weird. There’s ~20% of heads that don’t seem to do anything. Do these heads matter? Turns out that if we get rid of them, benchmark scores don’t change.
Pedro Sandoval tweet media
3 replies · 21 reposts · 182 likes · 24.9K views
Ani Aggarwal@AnirudAgg·
ECAD discovers more intricate caching schedules than heuristics can design. This PixArt-α schedule produced the previous images (using 20 diffusion steps)! Red = cached components, gray = recomputed.
Ani Aggarwal tweet media
1 reply · 0 reposts · 1 like · 91 views
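The tweet above shows a learned schedule as a grid (red = cached components, gray = recomputed). A minimal sketch of executing a denoising loop under such a schedule, assuming it is stored as a per-step, per-component boolean table; the component names, helper functions, and step counts here are hypothetical stand-ins, not the actual PixArt-α schedule:

```python
# A caching schedule is a boolean table: schedule[step][component] == True
# means reuse that component's cached output at that step instead of recomputing.
COMPONENTS = ("self_attn", "cross_attn", "mlp")  # toy DiT block components

def make_schedule(num_steps, cached_steps):
    # Cache every component on the given steps, recompute on the rest.
    return [{c: (t in cached_steps) for c in COMPONENTS} for t in range(num_steps)]

def run_denoising(num_steps, schedule, compute):
    cache = {}
    recomputed = 0
    for t in range(num_steps):
        for c in COMPONENTS:
            if schedule[t][c] and c in cache:
                out = cache[c]        # reuse: costs roughly nothing
            else:
                out = compute(c, t)   # recompute: the expensive forward pass
                cache[c] = out
                recomputed += 1
    return recomputed

schedule = make_schedule(20, cached_steps={5, 6, 11, 12, 17})
# Stand-in for the real component forward pass.
ops = run_denoising(20, schedule, compute=lambda c, t: (c, t))
```

A heuristic would toggle whole steps uniformly like this toy example; the tweet's point is that an evolved schedule can instead flip individual component entries per step, a pattern too intricate to design by hand.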