Monishwaran Maheswaran

36 posts


@sudomonish

@Berkeley_EECS + Math | AI Research @berkeley_ai | Nuclear Research @BerkeleyLab | Building hyper-intelligent machines.

California · Joined September 2018
386 Following · 567 Followers
Pinned Tweet
Monishwaran Maheswaran@sudomonish·
Super excited to introduce our latest work, Squeeze Evolve: we unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io
3 replies · 15 reposts · 97 likes · 110.3K views
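The announcement describes the method only at a high level; as a rough illustration of the idea (sampling, ranking, and recombination unified into one verifier-free loop, with cheap models doing most of the work), here is a toy sketch. The `sample`, `rank`, and `combine` callables are hypothetical stand-ins for model calls, not the paper's API:

```python
import random

def evolve(problem, sample, rank, combine, rounds=3, pop_size=8, seed=0):
    # sample:  cheap model proposes a candidate answer
    # rank:    model-based ordering of candidates (no external verifier)
    # combine: stronger model merges two survivors into a new candidate
    rng = random.Random(seed)
    population = [sample(problem, rng) for _ in range(pop_size)]
    for _ in range(rounds):
        # Selection without a verifier: keep the top half by model ranking.
        survivors = rank(problem, population)[: pop_size // 2]
        # Recombination: spend the strong model only on merging survivors.
        children = [combine(problem, rng.sample(survivors, 2))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return rank(problem, population)[0]

# Toy stand-ins: guess an integer near 42. The "ranker" cheats with the
# ground truth purely to make the loop runnable; a real system would
# rank with an LLM instead.
sample = lambda p, rng: rng.randint(0, 100)
rank = lambda p, pop: sorted(pop, key=lambda x: abs(x - 42))
combine = lambda p, pair: sum(pair) // 2

best = evolve("guess the number", sample, rank, combine)
```

The point of the sketch is the division of labor: the expensive model is only ever called on pairs of already-selected survivors, which is where the claimed cost savings would come from.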
Monishwaran Maheswaran retweeted
Harman Singh @ ICLR 🇧🇷
V1 is accepted to #ICML. Check the thread below for the algorithm details of:
1. Self-verification, to significantly improve pass@1 performance for reasoning and agentic tasks.
2. Post-training your policy LLM to be a better generative verifier (GRM) for better test-time scaling during inference (as also mentioned by the DeepSeek V4 report).
Significant updates soon: V1 inference and post-training extend to non-verifiable domains, can be applied easily in any external verification setting beyond self-verification, and lead to SOTA methods for agent verification.
Harman Singh @ ICLR 🇧🇷@Harman26Singh

Can LLMs self-verify? Much better than you'd expect. LLMs are increasingly used as parallel reasoners, sampling many solutions at once. Choosing the right answer is the real bottleneck. We show that pairwise self-verification is a powerful primitive. Introducing V1, a framework that unifies generation and self-verification:
💡 Pairwise self-verification beats pointwise scoring, improving test-time scaling
💡 V1-Infer: efficient tournament-style ranking that improves self-verification
💡 V1-PairRL: RL training where generation and verification co-evolve, developing better self-verifiers
🧵👇

3 replies · 8 reposts · 47 likes · 3.5K views
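The thread above doesn't include V1's code; as a generic sketch of why tournament-style ranking is cheaper than all-pairs scoring, here is a knockout bracket over candidates, where `prefer(a, b)` stands in for a pairwise self-verification call (an LLM judgment in the real system; the toy comparator below is made up):

```python
def tournament_select(candidates, prefer):
    # Knockout selection via pairwise comparisons: n - 1 calls to
    # `prefer` pick a winner, versus O(n^2) for all-pairs scoring.
    pool = list(candidates)
    while len(pool) > 1:
        nxt = []
        # Pair off candidates; an odd one out gets a bye to the next round.
        for i in range(0, len(pool) - 1, 2):
            nxt.append(prefer(pool[i], pool[i + 1]))
        if len(pool) % 2:
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]

# Toy comparator: prefer the longer "solution" string.
best = tournament_select(["a", "abc", "ab", "abcd", "x"],
                         lambda a, b: a if len(a) >= len(b) else b)
# best == "abcd"
```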
Monishwaran Maheswaran retweeted
Haocheng Xi@HaochengXiUCB·
🎥 Video generation is hitting the memory wall. As videos get longer, the KV cache quietly explodes, and long-horizon consistency starts to break.
We built Quant VideoGen: a training-free KV-cache compression method for auto-regressive video diffusion. Instead of storing every KV in high precision, QVG exploits video's spatiotemporal redundancy with semantic-aware smoothing + progressive residual quantization.
🚀 Up to 7× KV memory reduction
⚡ <4% overhead
✅ Strong long-video quality
🕹️ Deploy HYWorldPlay on your own RTX 5090 locally
KV compression is becoming a core scaling primitive, not just for LLMs but for video generation too.
Paper: arxiv.org/abs/2602.02958
Code: github.com/svg-project/Qu… (1/5)
11 replies · 53 reposts · 265 likes · 60.9K views
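QVG's implementation is in the linked repo; as a self-contained illustration of the progressive-residual-quantization idea alone (bit-widths and names below are made up, and the semantic-aware smoothing step is omitted), a pure-Python sketch:

```python
import random

def quantize(xs, bits):
    # Uniform symmetric quantization; returns dequantized values.
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(v) for v in xs) or 1.0) / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in xs]

def residual_quantize(xs, stages=(4, 2)):
    # Each stage quantizes what the previous stages failed to capture,
    # so a few low-bit passes approach higher-precision storage.
    approx = [0.0] * len(xs)
    for bits in stages:
        residual = [v - a for v, a in zip(xs, approx)]
        approx = [a + q for a, q in zip(approx, quantize(residual, bits))]
    return approx

rng = random.Random(0)
kv = [rng.gauss(0, 1) for _ in range(256)]  # pretend KV-cache values
err_single = sum(abs(v - q) for v, q in zip(kv, quantize(kv, 4))) / len(kv)
err_resid = sum(abs(v - q) for v, q in zip(kv, residual_quantize(kv))) / len(kv)
# The 4-bit pass plus 2-bit residual pass reconstructs more accurately
# than a single 4-bit pass, at modest extra storage.
```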
Monishwaran Maheswaran@sudomonish·
85% is semi-private. On the public eval, GPT-5.5's best is 90.56%. Squeeze Evolve: 97.5% at $7.74 per task, +6.9pp ahead and cheaper than GPT-5.5 Pro. Verifier-free and no code execution.
📄 Paper: arxiv.org/abs/2604.07725
🌐 Site: squeeze-evolve.github.io
📝 Blog: squeeze-evolve.github.io/#blog-squeeze-…
ARC Prize@arcprize

GPT-5.5 on ARC-AGI (Verified) ARC-AGI-2:
- Max: 85.0%, $1.87
- High: 83.3%, $1.45
- Med: 70.4%, $0.86
- Low: 33%, $0.35
GPT-5.5 is now state of the art on ARC-AGI-2

1 reply · 1 repost · 8 likes · 1.2K views
ARC Prize@arcprize·
GPT-5.5 on ARC-AGI (Verified) ARC-AGI-2:
- Max: 85.0%, $1.87
- High: 83.3%, $1.45
- Med: 70.4%, $0.86
- Low: 33%, $0.35
GPT-5.5 is now state of the art on ARC-AGI-2
48 replies · 221 reposts · 2.1K likes · 275K views
Monishwaran Maheswaran@sudomonish·
Blog: squeeze-evolve.github.io
Paper: arxiv.org/abs/2604.07725
Evolutionary search for discovery. We were optimizing for cheaper discovery and ended up nearer to human-level than GPT-5. @arcprize ARC-AGI-2:
100% - $17/task (Human)
97.5% - $7.74/task (Squeeze Evolve, SoTA)
92.2% - $17.60/task (GPT-5 Pro)
Cheaper than humans. Stronger than GPT-5.
0 replies · 0 reposts · 3 likes · 327 views
DAIR.AI@dair_ai·
The Top AI Papers of the Week (April 6 - 12)
- Memento
- Neural Computers
- The Universal Verifier
- Agent Skills in the Wild
- Memory Intelligence Agent (MIA)
- Single-Agent vs Multi-Agent LLMs
- Scaling Coding Agents via Atomic Skills
Read on for more:
DAIR.AI@dair_ai

x.com/i/article/2042…

21 replies · 53 reposts · 423 likes · 64.3K views
AlphaSignal AI@AlphaSignalAI·
NVIDIA just trained a 14-billion-parameter AI using evolution, not calculus.
Every AI today learns through backpropagation: it computes gradients, adjusts weights, repeats. It works, but it demands precision hardware and enormous GPU clusters.
Evolution Strategies offered an alternative: mutate the model, test it, keep what works. Like biological evolution. The problem was speed. Random mutations on GPUs were painfully slow.
EGGROLL fixes this with one trick. It splits huge random matrices into two small ones per mutation. The model mutates, tests, and keeps what works. Hundreds of thousands of mutations run at once.
> 100x faster training throughput
> 91% of pure-inference speed
> Pretrains models using only integers
> Competitive with backprop on reasoning
> Works on non-differentiable systems
It pretrained a language model from scratch using zero gradients. It also matched reinforcement learning methods on math reasoning tasks.
Everyone kept scaling the calculus to train massive AIs. It turns out, we just needed to evolve.
51 replies · 90 reposts · 666 likes · 71.3K views
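EGGROLL's actual setup (integer arithmetic, enormous batches) is beyond a toy, but the core low-rank trick the tweet describes, replacing each full noise matrix with the product of two thin Gaussian factors, fits in a few lines. Everything here (function names, hyperparameters, the toy objective) is illustrative, not the paper's code:

```python
import random

def es_step(w, fitness, pop=32, rank=1, sigma=0.1, lr=0.05, seed=0):
    # One Evolution Strategies step: no gradients, just mutate / score / average.
    rng = random.Random(seed)
    n, m = len(w), len(w[0])
    samples = []
    for _ in range(pop):
        # Low-rank mutation: eps = u @ v.T built from two thin factors,
        # instead of a full n x m Gaussian matrix ("two small matrices").
        u = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n)]
        v = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(m)]
        eps = [[sum(u[i][k] * v[j][k] for k in range(rank)) for j in range(m)]
               for i in range(n)]
        trial = [[w[i][j] + sigma * eps[i][j] for j in range(m)] for i in range(n)]
        samples.append((fitness(trial), eps))
    # Move in the noise directions, weighted by centered fitness.
    f_mean = sum(f for f, _ in samples) / pop
    return [[w[i][j] + lr / (pop * sigma) *
             sum((f - f_mean) * eps[i][j] for f, eps in samples)
             for j in range(m)] for i in range(n)]

# Toy objective: evolve a 3x3 matrix toward all-ones (higher is better).
fit = lambda w: -sum((x - 1.0) ** 2 for row in w for x in row)
w = [[0.0] * 3 for _ in range(3)]
for step in range(60):
    w = es_step(w, fit, seed=step)
```

Note that `fitness` is only ever called on a mutated matrix, never differentiated, which is why the same loop works on non-differentiable systems.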
Monishwaran Maheswaran retweeted
Zhongzhu Zhou@ZhongzhuZhou·
Really great collaboration, huge thanks to @Monish and @Chenfeng. We've been rethinking scaling: not just bigger models, but better use of compute. Squeeze Evolve:
- achieves SoTA (97.5% on ARC-AGI v2)
- ~3× lower cost, ~10× throughput
- multi-model orchestration instead of a single model
- small models handle most of the search; strong models are used when it matters
- verifier-free, enabling domains where verification is expensive
Turns out, weak models can be strong aggregators.
Monishwaran Maheswaran@sudomonish

Super excited to introduce our latest work, Squeeze Evolve: we unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io

0 replies · 3 reposts · 3 likes · 718 views
Monishwaran Maheswaran retweeted
Aran Komatsuzaki@arankomatsuzaki·
Squeeze Evolve: A Unified Framework for Verifier-Free Evolution
Across AIME 2025, GPQA-Diamond, ARC-AGI-V2, MMMU-Pro, etc.:
- Up to ~3x API cost reduction
- Up to ~10x increase in fixed-budget serving throughput
2 replies · 8 reposts · 64 likes · 8.7K views
Monishwaran Maheswaran retweeted
Chenfeng_X@Chenfeng_X·
Test-time scaling with evolutionary search can cost hundreds of times more tokens than a single-shot generation. 😨 We built a unified framework for customizable evolutionary inference, with plug-and-play operators for selection, fitness estimation, and recombination. Check here: squeeze-evolve.github.io 😎 More importantly, it cuts cost by orchestrating heterogeneous models, delivering up to 10× higher throughput while saving serious $$$ 💰. A user-friendly API will come soon! 💪
Monishwaran Maheswaran@sudomonish

Super excited to introduce our latest work, Squeeze Evolve: we unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io

1 reply · 6 reposts · 56 likes · 6.4K views
Monishwaran Maheswaran retweeted
Harman Singh @ ICLR 🇧🇷@Harman26Singh·
Super excited about Squeeze Evolve by @sudomonish. Squeeze Evolve pushes the Pareto frontier of efficiency vs. intelligence in verifier-free test-time scaling using multi-model orchestration. Verifiers are expensive, slow, or unavailable for the hardest and most important problems of our time, making Squeeze Evolve and verifier-free test-time scaling a dominant paradigm going forward.
Monishwaran Maheswaran@sudomonish

Super excited to introduce our latest work, Squeeze Evolve: we unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io

0 replies · 3 reposts · 9 likes · 1.7K views
Monishwaran Maheswaran retweeted
Ben Athiwaratkun@ben_athi·
We're sharing important progress in evolution via an efficient framework, Squeeze Evolve:
> not only improving capabilities, achieving SOTA @ 97.5% on ARC-AGI v2
> but also cost-effective (~3x lower than other methods)
> the key is a multi-model framework
> multiple models bring diversity to sampled solutions
> and allow the use of smaller models on easier evolutionary steps
> verifier-free, which is crucial for tasks where verification is time-consuming
> a unified framework that combines all previous methods (AlphaEvolve, majority voting, self-refinement, RSA, and Mixture of Agents)
Monishwaran Maheswaran@sudomonish

Super excited to introduce our latest work, Squeeze Evolve: we unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io

1 reply · 2 reposts · 13 likes · 1.1K views
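The thread doesn't spell out how those prior methods reduce to one framework; as one concrete (hypothetical) example of the unification, majority voting drops out of a generic evolutionary-inference skeleton when fitness is peer agreement and there are zero recombination rounds:

```python
def evolutionary_infer(problem, generate, fitness, recombine=None,
                       pop_size=8, rounds=0):
    # Generic skeleton: sample a population, optionally recombine it,
    # then select by a pluggable fitness. Different operator choices
    # recover different test-time scaling methods.
    population = [generate(problem, i) for i in range(pop_size)]
    for _ in range(rounds):
        population = recombine(problem, population)
    return max(population, key=lambda cand: fitness(cand, population))

# Majority voting = agreement fitness + zero recombination rounds.
sampled = ["42", "41", "42", "42", "7", "41", "42", "42"]  # pretend LLM samples
winner = evolutionary_infer(
    "what is 6 x 7?",
    generate=lambda problem, i: sampled[i],
    fitness=lambda cand, pop: sum(c == cand for c in pop),
)
# winner == "42"
```

Swapping in a refinement operator for `recombine`, or a model-ranked fitness, would recover self-refinement or verifier-free selection in the same skeleton.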
Monishwaran Maheswaran@sudomonish·
97.5% on ARC-AGI-V2 at $7.74/task: new SoTA.
95.4% on AIME 2025, exceeding GPT-5 mini at 1.8x savings.
79.1% on MMMU-Pro at 2.3x savings.
Exceeds AlphaEvolve on circle packing: no code execution, no verifier, and no closed-source models.
1 reply · 0 reposts · 5 likes · 741 views