Ani Aggarwal

25 posts

@AnirudAgg

Vision AI researcher | Applying for 2026 PhD | CS + Math from UMD | I like vision

San Francisco, CA · Joined July 2014
41 Following · 20 Followers
Pinned Tweet
Ani Aggarwal@AnirudAgg·
🧵 Your DiT, faster. Introducing ECAD: we reframe diffusion model caching as multi-objective optimization and evolve Pareto-optimal schedules via a genetic algorithm, achieving a 4.47 FID gain at a 2.58× speedup with no retraining or tuning. 🔗 aniaggarwal.github.io/ecad #MachineLearning
Ani Aggarwal tweet media
2 replies · 1 repost · 13 likes · 1.8K views
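The pinned tweet above frames caching schedules as a multi-objective search solved by a genetic algorithm. A minimal sketch of that idea, assuming a schedule is a binary recompute/reuse mask over diffusion steps and model components; the toy fitness proxies, hyperparameters, and all function names here are hypothetical, not ECAD's actual implementation:

```python
import random

STEPS, COMPONENTS = 20, 7  # toy sizes; the real schedule dimensions differ

def random_schedule():
    # A schedule is a binary mask: 1 = reuse cached output, 0 = recompute.
    return [[random.randint(0, 1) for _ in range(COMPONENTS)] for _ in range(STEPS)]

def speedup(s):
    # Toy proxy: more cached entries -> faster inference.
    return sum(map(sum, s)) / (STEPS * COMPONENTS)

def quality_loss(s):
    # Toy proxy: long runs of consecutive caching hurt quality more.
    loss = 0.0
    for c in range(COMPONENTS):
        run = 0
        for t in range(STEPS):
            run = run + 1 if s[t][c] else 0
            loss += run * run
    return loss

def dominates(a, b):
    # a dominates b if it is no worse on both objectives and strictly better on one.
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(pop):
    scored = [(speedup(s), quality_loss(s), s) for s in pop]
    return [s for (sp, ql, s) in scored
            if not any(dominates((sp2, ql2), (sp, ql))
                       for (sp2, ql2, _) in scored if (sp2, ql2) != (sp, ql))]

def mutate(s, rate=0.05):
    # Flip each bit with a small probability.
    return [[bit ^ (random.random() < rate) for bit in row] for row in s]

def evolve(generations=30, pop_size=32):
    pop = [random_schedule() for _ in range(pop_size)]
    for _ in range(generations):
        front = pareto_front(pop)
        # Refill the population by mutating Pareto-optimal parents.
        pop = front + [mutate(random.choice(front)) for _ in range(pop_size - len(front))]
    return pareto_front(pop)

front = evolve()
```

The result is a whole front of speed/quality trade-offs rather than a single schedule, which matches the tweet's "Pareto-optimal schedules" framing; a real system would evaluate fitness with actual latency and FID measurements.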
Ani Aggarwal@AnirudAgg·
@LunjunZhang Agreed on this. I'm very bullish on using EAs, particularly in areas where gradients are expensive or impossible to compute and RL would be slow or unstable (see my paper ECAD). Combining them is super clever :)
0 replies · 0 reposts · 1 like · 157 views
Lunjun Zhang@LunjunZhang·
RL optimizes weights. Evolution optimizes contexts. What if we combine RL and Evolutionary Algorithm (EA) into a new paradigm of LLM self-improvement? In "Evolutionary System Prompt Learning for Reinforcement Learning in LLMs", we show that RL and EA are deeply synergistic.
Lunjun Zhang tweet media
9 replies · 42 reposts · 302 likes · 15.9K views
Thomas Wimmer@wimmer_th·
AnyUp has been accepted to ICLR 2026 as an oral presentation! ⭐️ I'm looking forward to presenting it in Rio. If you're interested, please come by my talk or poster. 🇧🇷
Thomas Wimmer@wimmer_th

Super excited to introduce ✨ AnyUp: Universal Feature Upsampling 🔎 Upsample any feature - really any feature - with the same upsampler, no need for cumbersome retraining. SOTA feature upsampling results while being feature-agnostic at inference time.

5 replies · 19 reposts · 184 likes · 14.7K views
Massimiliano Viola@massiviola01·
DINO features are the best in town and offer unmatched performance for dense vision tasks such as classification and detection. But in the current multimodal ML landscape, these features have a limitation: they lack a direct connection to natural language!

Unlike CLIP-style models, DINO embeddings are not spatially aligned with text, and this means the model cannot understand queries, perform zero-shot classification/retrieval, or highlight all the patches matching a given concept.

Fortunately for us, the DINO team addressed this issue with dino.txt, a simple approach to align frozen DINO backbones with a text encoder to solve tasks associated with a text prompt, both at the global and local levels. Think about CLIP and Segment Anything 3 abilities baked into a single model, and this was a couple of years ago!

Curious to know how it's done and some practical takeaways for your everyday vision tasks? Check out the resources below!⏬
dino.txt: DINO Meets Text open.substack.com/pub/mlhonk/p/5…
Paper: arxiv.org/abs/2412.16334…
Massimiliano Viola tweet media
3 replies · 29 reposts · 344 likes · 16.2K views
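The tweet above describes aligning frozen DINO features with a text encoder via learnable projections, CLIP-style. A minimal NumPy sketch of what such alignment training optimizes: a symmetric InfoNCE loss over projected image and text embeddings. The random feature arrays stand in for frozen DINO and text-encoder outputs, and all shapes, names, and the temperature value are illustrative assumptions, not dino.txt's actual recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for frozen DINO features (pooled to one vector per image)
# and text-encoder embeddings; shapes and values are illustrative only.
batch, d_img, d_txt, d_joint = 4, 768, 512, 256
img_feats = rng.normal(size=(batch, d_img))
txt_feats = rng.normal(size=(batch, d_txt))

# The alignment layers: learnable projections into a shared space.
# Only these would be trained; the DINO backbone stays frozen.
W_img = rng.normal(size=(d_img, d_joint)) / np.sqrt(d_img)
W_txt = rng.normal(size=(d_txt, d_joint)) / np.sqrt(d_txt)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_loss(img, txt, temperature=0.07):
    # Symmetric InfoNCE: the matching image/text pair should score highest
    # in each row (image-to-text) and each column (text-to-image).
    z_i = normalize(img @ W_img)
    z_t = normalize(txt @ W_txt)
    logits = z_i @ z_t.T / temperature
    labels = np.arange(len(logits))
    log_probs_i = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_probs_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_i2t = -log_probs_i[labels, labels].mean()
    loss_t2i = -log_probs_t[labels, labels].mean()
    return (loss_i2t + loss_t2i) / 2

loss = clip_loss(img_feats, txt_feats)
```

Minimizing this loss pulls each image's projected features toward its caption's embedding, which is what lets the aligned model do zero-shot retrieval and patch-level concept highlighting afterward.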
Ani Aggarwal@AnirudAgg·
@massiviola01 Yes! We have a low-memory mode that is super memory-efficient but adds some latency (by running some operations in sequence rather than in parallel). We will add this to the released code repo this weekend! Thanks for your interest in our work 😁
1 reply · 0 reposts · 0 likes · 26 views
Massimiliano Viola@massiviola01·
@AnirudAgg Yes, I am a bit familiar with this line of work, and it's great offline. The problem I noticed is that it eats a lot of GPU memory even for a 224 image, not ideal for production inference with lots of cameras! Do you know if there is anything that adds only a minimal overhead?
1 reply · 0 reposts · 3 likes · 103 views
Ani Aggarwal@AnirudAgg·
Technically more than one line of code 😁
Ani Aggarwal tweet media
0 replies · 0 reposts · 2 likes · 21 views
Ani Aggarwal@AnirudAgg·
Super excited to announce our paper! Upsample any latents (DINO, VAE, etc.) in linear time (compared to quadratic cross attention). Our models are all available on Hugging Face and Torch Hub with just one line of code! (please star the GitHub repo 🥺) huggingface.co/UPLiFT-upsampl…
Matthew Walmer@MatthewWalmer

We’re excited to announce UPLiFT, our lightweight, pixel-dense feature upsampler. UPLiFT boosts feature density, preserves semantics, and has better efficiency scaling than recent SOTA methods. See all links in the thread below. Coauthors: @_sakshams_ @AnirudAgg @abhi2610 🧵[1/6]

2 replies · 2 reposts · 3 likes · 351 views
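The UPLiFT announcement above contrasts linear-time upsampling with quadratic cross-attention. A back-of-envelope sketch of why that matters at high resolution, where every upsampled position attending to every low-res token grows quadratically while a fixed-neighborhood upsampler grows linearly; the constants, patch size, and function names here are made-up illustrations, not UPLiFT's actual numbers:

```python
def cross_attention_cost(n_queries, n_keys, dim):
    # Quadratic flavor: every upsampled (query) position attends to every key.
    return n_queries * n_keys * dim

def linear_upsampler_cost(n_queries, dim, k=9):
    # Linear flavor: each output position touches a fixed-size
    # neighborhood of k low-res features, independent of resolution.
    return n_queries * k * dim

dim = 384  # hypothetical feature dimension
for side in (64, 128, 256):
    hi = side * side            # upsampled (pixel-dense) positions
    lo = (side // 14) ** 2      # low-res tokens from a hypothetical 14px patch grid
    ratio = cross_attention_cost(hi, lo, dim) / linear_upsampler_cost(hi, dim)
    print(f"{side}x{side}: cross-attention costs ~{ratio:.0f}x more FLOPs")
```

The gap widens with resolution because the number of keys itself scales with image area, which is the efficiency-scaling point the tweet is making.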
Ani Aggarwal@AnirudAgg·
@HrishbhDalal @MatthewWalmer @_sakshams_ @abhi2610 Agreed, but even without a larger size, I'm of the opinion that using a modern VAE and multi-scale image generator for training (maybe FLUX, which has larger latents) would achieve really strong image super-res, super efficiently.
1 reply · 0 reposts · 1 like · 47 views
Ani Aggarwal reposted
Matthew Walmer@MatthewWalmer·
We’re excited to announce UPLiFT, our lightweight, pixel-dense feature upsampler. UPLiFT boosts feature density, preserves semantics, and has better efficiency scaling than recent SOTA methods. See all links in the thread below. Coauthors: @_sakshams_ @AnirudAgg @abhi2610 🧵[1/6]
Matthew Walmer tweet media
8 replies · 52 reposts · 393 likes · 19.2K views
Ani Aggarwal@AnirudAgg·
@prodarhan This is sick, love the website side by sides and clean animations 😍
0 replies · 0 reposts · 0 likes · 47 views
Arhan Jain@prodarhan·
Excited to introduce PolaRiS, a real-to-sim recipe for turning short real-world videos into high-fidelity simulation environments for scalable and reliable zero-shot generalist policy evaluation. polaris-evals.github.io (1/N 🧵)
8 replies · 48 reposts · 236 likes · 64.5K views
Qianqian Wang@QianqianWang5·
I'm recruiting multiple PhD students this cycle to join me at Harvard University and the Kempner Institute! My interests span vision and intelligence, including 3D/4D, active perception, memory, representation learning, and anything you're excited to explore! Deadline: Dec 15th.
25 replies · 152 reposts · 923 likes · 175K views
Ani Aggarwal@AnirudAgg·
@prodarhan @IamKyros69 Only if you first reconceptualize diffusion caching as a multi-objective optimization problem and apply a genetic algorithm to find an optimal speed/quality trade-off for inference
0 replies · 0 reposts · 1 like · 20 views
Kyros@IamKyros69·
Why is everyone in tech using a ThinkPad?
Kyros tweet media
1.1K replies · 247 reposts · 7.2K likes · 615.9K views
Ani Aggarwal@AnirudAgg·
@LogitechG No thanks my secondhand G502 is going strong 7 years later
0 replies · 0 reposts · 0 likes · 2 views
Logitech G@LogitechG·
Imagine pulling up to this house
Logitech G tweet media
747 replies · 889 reposts · 44.8K likes · 1M views
Ani Aggarwal@AnirudAgg·
@psandovalsegura Super interesting! Also seeing computational redundancy in DiTs. Some tokens cache well across timesteps (arXiv:2410.05317) and some entire blocks can be skipped (hasty plot). Maybe they could be pruned like your attention heads? Though uncertain of its practical application haha
Ani Aggarwal tweet media
0 replies · 0 reposts · 0 likes · 7 views
Pedro Sandoval@psandovalsegura·
Attention sinks in LLMs are weird. There’s ~20% of heads that don’t seem to do anything. Do these heads matter? Turns out that if we get rid of them, benchmark scores don’t change.
Pedro Sandoval tweet media
3 replies · 21 reposts · 182 likes · 24.9K views
Ani Aggarwal@AnirudAgg·
ECAD discovers more intricate caching schedules than heuristics can design. This PixArt-α schedule produced the previous images (using 20 diffusion steps)! Red = cached components, gray = recomputed.
Ani Aggarwal tweet media
1 reply · 0 reposts · 1 like · 91 views
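The tweet above shows a learned schedule as a grid (red = cached components, gray = recomputed). A minimal sketch of executing a denoising loop under such a schedule, assuming it is stored as a per-step, per-component boolean table; the component names, helper functions, and step counts here are hypothetical stand-ins, not the actual PixArt-α schedule:

```python
# A caching schedule is a boolean table: schedule[step][component] == True
# means reuse that component's cached output at that step instead of recomputing.
COMPONENTS = ("self_attn", "cross_attn", "mlp")  # toy DiT block components

def make_schedule(num_steps, cached_steps):
    # Cache every component on the given steps, recompute on the rest.
    return [{c: (t in cached_steps) for c in COMPONENTS} for t in range(num_steps)]

def run_denoising(num_steps, schedule, compute):
    cache = {}
    recomputed = 0
    for t in range(num_steps):
        for c in COMPONENTS:
            if schedule[t][c] and c in cache:
                out = cache[c]        # reuse: costs roughly nothing
            else:
                out = compute(c, t)   # recompute: the expensive forward pass
                cache[c] = out
                recomputed += 1
    return recomputed

schedule = make_schedule(20, cached_steps={5, 6, 11, 12, 17})
# Stand-in for the real component forward pass.
ops = run_denoising(20, schedule, compute=lambda c, t: (c, t))
```

A heuristic would toggle whole steps uniformly like this toy example; the tweet's point is that an evolved schedule can instead flip individual component entries per step, a pattern too intricate to design by hand.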