Monishwaran Maheswaran

36 posts


@sudomonish

@Berkeley_EECS + Math | AI Research @berkeley_ai | Nuclear Research @BerkeleyLab | Building hyper-intelligent machines.

California · Joined September 2018
386 Following · 567 Followers
Pinned Tweet
Monishwaran Maheswaran@sudomonish·
Super excited to introduce our latest work, Squeeze Evolve: we unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io
3 replies · 15 reposts · 97 likes · 110.3K views
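The announcement describes the method only at a high level; as a rough illustration of the idea (sampling, ranking, and recombination unified into one verifier-free loop, with cheap models doing most of the work), here is a toy sketch. The `sample`, `rank`, and `combine` callables are hypothetical stand-ins for model calls, not the paper's API:

```python
import random

def evolve(problem, sample, rank, combine, rounds=3, pop_size=8, seed=0):
    # sample:  cheap model proposes a candidate answer
    # rank:    model-based ordering of candidates (no external verifier)
    # combine: stronger model merges two survivors into a new candidate
    rng = random.Random(seed)
    population = [sample(problem, rng) for _ in range(pop_size)]
    for _ in range(rounds):
        # Selection without a verifier: keep the top half by model ranking.
        survivors = rank(problem, population)[: pop_size // 2]
        # Recombination: spend the strong model only on merging survivors.
        children = [combine(problem, rng.sample(survivors, 2))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return rank(problem, population)[0]

# Toy stand-ins: guess an integer near 42. The "ranker" cheats with the
# ground truth purely to make the loop runnable; a real system would
# rank with an LLM instead.
sample = lambda p, rng: rng.randint(0, 100)
rank = lambda p, pop: sorted(pop, key=lambda x: abs(x - 42))
combine = lambda p, pair: sum(pair) // 2

best = evolve("guess the number", sample, rank, combine)
```

The point of the sketch is the division of labor: the expensive model is only ever called on pairs of already-selected survivors, which is where the claimed cost savings would come from.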
Monishwaran Maheswaran retweeted
Harman Singh @ ICLR 🇧🇷
V1 is accepted to #ICML. Check the thread below for the algorithm details of:
1. Self-verification, to significantly improve pass@1 performance for reasoning and agentic tasks.
2. Post-training your policy LLM to be a better generative verifier (GRM) for better test-time scaling during inference (as also mentioned by the DeepSeek V4 report).
Significant updates soon: V1 inference and post-training extend to non-verifiable domains, can be applied easily in any external verification setting beyond self-verification, and lead to SOTA methods for agent verification.
Harman Singh @ ICLR 🇧🇷@Harman26Singh

Can LLMs self-verify? Much better than you'd expect. LLMs are increasingly used as parallel reasoners, sampling many solutions at once. Choosing the right answer is the real bottleneck. We show that pairwise self-verification is a powerful primitive. Introducing V1, a framework that unifies generation and self-verification:
💡 Pairwise self-verification beats pointwise scoring, improving test-time scaling
💡 V1-Infer: efficient tournament-style ranking that improves self-verification
💡 V1-PairRL: RL training where generation and verification co-evolve, developing better self-verifiers
🧵👇

3 replies · 8 reposts · 47 likes · 3.5K views
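The thread above doesn't include V1's code; as a generic sketch of why tournament-style ranking is cheaper than all-pairs scoring, here is a knockout bracket over candidates, where `prefer(a, b)` stands in for a pairwise self-verification call (an LLM judgment in the real system; the toy comparator below is made up):

```python
def tournament_select(candidates, prefer):
    # Knockout selection via pairwise comparisons: n - 1 calls to
    # `prefer` pick a winner, versus O(n^2) for all-pairs scoring.
    pool = list(candidates)
    while len(pool) > 1:
        nxt = []
        # Pair off candidates; an odd one out gets a bye to the next round.
        for i in range(0, len(pool) - 1, 2):
            nxt.append(prefer(pool[i], pool[i + 1]))
        if len(pool) % 2:
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]

# Toy comparator: prefer the longer "solution" string.
best = tournament_select(["a", "abc", "ab", "abcd", "x"],
                         lambda a, b: a if len(a) >= len(b) else b)
# best == "abcd"
```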
Monishwaran Maheswaran retweeted
Haocheng Xi@HaochengXiUCB·
🎥 Video generation is hitting the memory wall. As videos get longer, the KV cache quietly explodes, and long-horizon consistency starts to break.
We built Quant VideoGen: a training-free KV-cache compression method for auto-regressive video diffusion. Instead of storing every KV in high precision, QVG exploits video's spatiotemporal redundancy with semantic-aware smoothing + progressive residual quantization.
🚀 Up to 7× KV memory reduction
⚡ <4% overhead
✅ Strong long-video quality
🕹️ Deploy HYWorldPlay on your own RTX 5090 locally
KV compression is becoming a core scaling primitive, not just for LLMs but for video generation too.
Paper: arxiv.org/abs/2602.02958
Code: github.com/svg-project/Qu… (1/5)
11 replies · 53 reposts · 265 likes · 60.9K views
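QVG's implementation is in the linked repo; as a self-contained illustration of the progressive-residual-quantization idea alone (bit-widths and names below are made up, and the semantic-aware smoothing step is omitted), a pure-Python sketch:

```python
import random

def quantize(xs, bits):
    # Uniform symmetric quantization; returns dequantized values.
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(v) for v in xs) or 1.0) / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in xs]

def residual_quantize(xs, stages=(4, 2)):
    # Each stage quantizes what the previous stages failed to capture,
    # so a few low-bit passes approach higher-precision storage.
    approx = [0.0] * len(xs)
    for bits in stages:
        residual = [v - a for v, a in zip(xs, approx)]
        approx = [a + q for a, q in zip(approx, quantize(residual, bits))]
    return approx

rng = random.Random(0)
kv = [rng.gauss(0, 1) for _ in range(256)]  # pretend KV-cache values
err_single = sum(abs(v - q) for v, q in zip(kv, quantize(kv, 4))) / len(kv)
err_resid = sum(abs(v - q) for v, q in zip(kv, residual_quantize(kv))) / len(kv)
# The 4-bit pass plus 2-bit residual pass reconstructs more accurately
# than a single 4-bit pass, at modest extra storage.
```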
Monishwaran Maheswaran@sudomonish·
85% is semi-private. On the public eval, GPT-5.5's best is 90.56%. Squeeze Evolve: 97.5% at $7.74 per task, +6.9pp ahead and cheaper than GPT-5.5 Pro. Verifier-free and no code execution.
📄 Paper: arxiv.org/abs/2604.07725
🌐 Site: squeeze-evolve.github.io
📝 Blog: squeeze-evolve.github.io/#blog-squeeze-…
ARC Prize@arcprize

GPT-5.5 on ARC-AGI (Verified) ARC-AGI-2:
- Max: 85.0%, $1.87
- High: 83.3%, $1.45
- Med: 70.4%, $0.86
- Low: 33%, $0.35
GPT-5.5 is now state of the art on ARC-AGI-2

1 reply · 1 repost · 8 likes · 1.2K views
ARC Prize@arcprize·
GPT-5.5 on ARC-AGI (Verified) ARC-AGI-2:
- Max: 85.0%, $1.87
- High: 83.3%, $1.45
- Med: 70.4%, $0.86
- Low: 33%, $0.35
GPT-5.5 is now state of the art on ARC-AGI-2
48 replies · 221 reposts · 2.1K likes · 275K views
Monishwaran Maheswaran@sudomonish·
Blog: squeeze-evolve.github.io
Paper: arxiv.org/abs/2604.07725
Evolutionary search for discovery. We were optimizing for cheaper discovery and ended up nearer to human-level than GPT-5. @arcprize ARC-AGI-2:
100% - $17/task (Human)
97.5% - $7.74/task (Squeeze Evolve, SoTA)
92.2% - $17.60/task (GPT-5 Pro)
Cheaper than humans. Stronger than GPT-5.
0 replies · 0 reposts · 3 likes · 327 views
DAIR.AI@dair_ai·
The Top AI Papers of the Week (April 6 - 12)
- Memento
- Neural Computers
- The Universal Verifier
- Agent Skills in the Wild
- Memory Intelligence Agent (MIA)
- Single-Agent vs Multi-Agent LLMs
- Scaling Coding Agents via Atomic Skills
Read on for more:
DAIR.AI@dair_ai

x.com/i/article/2042…

21 replies · 53 reposts · 423 likes · 64.3K views
AlphaSignal AI@AlphaSignalAI·
NVIDIA just trained a 14-billion-parameter AI using evolution, not calculus.
Every AI today learns through backpropagation: it computes gradients, adjusts weights, repeats. It works, but it demands precision hardware and enormous GPU clusters.
Evolution Strategies offered an alternative: mutate the model, test it, keep what works. Like biological evolution. The problem was speed. Random mutations on GPUs were painfully slow.
EGGROLL fixes this with one trick. It splits huge random matrices into two small ones per mutation. The model mutates, tests, and keeps what works. Hundreds of thousands of mutations run at once.
> 100x faster training throughput
> 91% of pure-inference speed
> Pretrains models using only integers
> Competitive with backprop on reasoning
> Works on non-differentiable systems
It pretrained a language model from scratch using zero gradients. It also matched reinforcement learning methods on math reasoning tasks.
Everyone kept scaling the calculus to train massive AIs. It turns out, we just needed to evolve.
51 replies · 90 reposts · 666 likes · 71.3K views
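EGGROLL's actual setup (integer arithmetic, enormous batches) is beyond a toy, but the core low-rank trick the tweet describes, replacing each full noise matrix with the product of two thin Gaussian factors, fits in a few lines. Everything here (function names, hyperparameters, the toy objective) is illustrative, not the paper's code:

```python
import random

def es_step(w, fitness, pop=32, rank=1, sigma=0.1, lr=0.05, seed=0):
    # One Evolution Strategies step: no gradients, just mutate / score / average.
    rng = random.Random(seed)
    n, m = len(w), len(w[0])
    samples = []
    for _ in range(pop):
        # Low-rank mutation: eps = u @ v.T built from two thin factors,
        # instead of a full n x m Gaussian matrix ("two small matrices").
        u = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n)]
        v = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(m)]
        eps = [[sum(u[i][k] * v[j][k] for k in range(rank)) for j in range(m)]
               for i in range(n)]
        trial = [[w[i][j] + sigma * eps[i][j] for j in range(m)] for i in range(n)]
        samples.append((fitness(trial), eps))
    # Move in the noise directions, weighted by centered fitness.
    f_mean = sum(f for f, _ in samples) / pop
    return [[w[i][j] + lr / (pop * sigma) *
             sum((f - f_mean) * eps[i][j] for f, eps in samples)
             for j in range(m)] for i in range(n)]

# Toy objective: evolve a 3x3 matrix toward all-ones (higher is better).
fit = lambda w: -sum((x - 1.0) ** 2 for row in w for x in row)
w = [[0.0] * 3 for _ in range(3)]
for step in range(60):
    w = es_step(w, fit, seed=step)
```

Note that `fitness` is only ever called on a mutated matrix, never differentiated, which is why the same loop works on non-differentiable systems.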
Monishwaran Maheswaran retweeted
Zhongzhu Zhou@ZhongzhuZhou·
Really great collaboration, huge thanks to @Monish and @Chenfeng. We've been rethinking scaling: not just bigger models, but better use of compute. Squeeze Evolve:
- achieves SoTA (97.5% on ARC-AGI v2)
- ~3× lower cost, ~10× throughput
- multi-model orchestration instead of a single model
- small models handle most of the search; strong models are used when it matters
- verifier-free, enabling domains where verification is expensive
Turns out, weak models can be strong aggregators.
Monishwaran Maheswaran@sudomonish

Super excited to introduce our latest work, Squeeze Evolve: we unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io

0 replies · 3 reposts · 3 likes · 718 views
Monishwaran Maheswaran retweeted
Aran Komatsuzaki@arankomatsuzaki·
Squeeze Evolve: A Unified Framework for Verifier-Free Evolution
Across AIME 2025, GPQA-Diamond, ARC-AGI-V2, MMMU-Pro, etc.:
- Up to ~3x API cost reduction
- Up to ~10x increase in fixed-budget serving throughput
2 replies · 8 reposts · 64 likes · 8.7K views
Monishwaran Maheswaran retweeted
Chenfeng_X@Chenfeng_X·
Test-time scaling with evolutionary search can cost hundreds of times more tokens than a single-shot generation. 😨 We built a unified framework for customizable evolutionary inference, with plug-and-play operators for selection, fitness estimation, and recombination. Check here: squeeze-evolve.github.io 😎 More importantly, it cuts cost by orchestrating heterogeneous models, delivering up to 10× higher throughput while saving serious $$$ 💰. A user-friendly API will come soon! 💪
Monishwaran Maheswaran@sudomonish

Super excited to introduce our latest work, Squeeze Evolve: we unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io

1 reply · 6 reposts · 56 likes · 6.4K views
Monishwaran Maheswaran retweeted
Harman Singh @ ICLR 🇧🇷@Harman26Singh·
Super excited about Squeeze Evolve by @sudomonish. Squeeze Evolve pushes the Pareto frontier of efficiency vs. intelligence in verifier-free test-time scaling using multi-model orchestration. Verifiers are expensive, slow, or unavailable for the hardest and most important problems of our time, making Squeeze Evolve and verifier-free test-time scaling a dominant paradigm going forward.
Monishwaran Maheswaran@sudomonish

Super excited to introduce our latest work, Squeeze Evolve: we unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io

0 replies · 3 reposts · 9 likes · 1.7K views
Monishwaran Maheswaran retweeted
Ben Athiwaratkun@ben_athi·
We're sharing important progress in evolution via an efficient framework, Squeeze Evolve:
> not only improving capabilities, achieving SOTA @ 97.5% on ARC-AGI v2
> but also cost-effective (~3x lower than other methods)
> the key is a multi-model framework
> multiple models bring diversity to sampled solutions
> and allow the use of smaller models on easier evolutionary steps
> verifier-free, which is crucial for tasks where verification is time-consuming
> a unified framework that combines all previous methods (AlphaEvolve, majority voting, self-refinement, RSA, and Mixture of Agents)
Monishwaran Maheswaran@sudomonish

Super excited to introduce our latest work, Squeeze Evolve: we unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io

1 reply · 2 reposts · 13 likes · 1.1K views
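The thread doesn't spell out how those prior methods reduce to one framework; as one concrete (hypothetical) example of the unification, majority voting drops out of a generic evolutionary-inference skeleton when fitness is peer agreement and there are zero recombination rounds:

```python
def evolutionary_infer(problem, generate, fitness, recombine=None,
                       pop_size=8, rounds=0):
    # Generic skeleton: sample a population, optionally recombine it,
    # then select by a pluggable fitness. Different operator choices
    # recover different test-time scaling methods.
    population = [generate(problem, i) for i in range(pop_size)]
    for _ in range(rounds):
        population = recombine(problem, population)
    return max(population, key=lambda cand: fitness(cand, population))

# Majority voting = agreement fitness + zero recombination rounds.
sampled = ["42", "41", "42", "42", "7", "41", "42", "42"]  # pretend LLM samples
winner = evolutionary_infer(
    "what is 6 x 7?",
    generate=lambda problem, i: sampled[i],
    fitness=lambda cand, pop: sum(c == cand for c in pop),
)
# winner == "42"
```

Swapping in a refinement operator for `recombine`, or a model-ranked fitness, would recover self-refinement or verifier-free selection in the same skeleton.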
Monishwaran Maheswaran@sudomonish·
97.5% on ARC-AGI-V2 at $7.74/task: new SoTA.
95.4% on AIME 2025, exceeding GPT-5 mini at 1.8x savings.
79.1% on MMMU-Pro at 2.3x savings.
Exceeds AlphaEvolve on circle packing: no code execution, no verifier, and no closed-source models.
1 reply · 0 reposts · 5 likes · 741 views