Giovanni Monea

63 posts

@giomonea

Intern @IBMResearch | 🤖 ML PhD student @cornell @cornell_tech | Previously @MSFTResearch @Apple, @amazon | 🎓 @EPFL_en, @polimi

Joined July 2023

291 Following · 245 Followers

Giovanni Monea @giomonea · Apr 10

Long overdue thread, but better late than never. Grateful to my amazing co-authors (@yair_feldman @shankarpad8 @xkianteb @yoavartzi) for making this happen, and to the great @nthngdy for feedback and support! Check out our paper on arXiv for more details: arxiv.org/abs/2510.13797

Giovanni Monea @giomonea · Apr 10

Where it struggles: solving linear equations. Error analysis reveals this isn't a retrieval failure. Compression disrupts arithmetic circuits 🧮, leading to computational errors.

Giovanni Monea @giomonea · Apr 10

LLMs waste massive memory remembering every reasoning step. What if they could leave behind just "breadcrumbs" instead? Breadcrumbs Reasoning: KV cache compression during decoding with learned beacon tokens. 2–32x less memory, minimal accuracy drop. 🧵

Giovanni Monea @giomonea · Apr 9

@DimitrisPapail Awesome work! We tackle the same KV cache explosion in Breadcrumbs Reasoning via pure latent compression. Learned "beacons" compress past context windows into single KV entries (no text summaries), trained via online RL distillation: arxiv.org/abs/2510.13797
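
The bookkeeping behind this kind of windowed compression can be illustrated with a toy count of cache entries. This is a sketch under stated assumptions, not the paper's implementation: it assumes each completed window of `window` tokens is replaced by exactly one retained entry and that the current partial window is kept in full. The names `cache_entries` and `compression_ratio` are hypothetical.

```python
def cache_entries(num_tokens: int, window: int) -> int:
    """Count KV entries held after decoding `num_tokens` tokens (toy model).

    Assumption: every completed window collapses to one retained entry,
    while the tokens of the current, unfinished window are all kept.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if num_tokens < 0:
        raise ValueError("num_tokens must be >= 0")
    completed, partial = divmod(num_tokens, window)
    return completed + partial


def compression_ratio(num_tokens: int, window: int) -> float:
    """Uncompressed entry count divided by the compressed entry count."""
    if num_tokens == 0:
        return 1.0  # nothing decoded yet, so nothing to compress
    return num_tokens / cache_entries(num_tokens, window)


if __name__ == "__main__":
    for w in (2, 8, 32):
        print(w, cache_entries(1024, w), compression_ratio(1024, w))
```

Under these assumptions, 1,024 decoded tokens with a window of 32 leave 32 entries, a 32x reduction in entry count.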

Dimitris Papailiopoulos @DimitrisPapail · Apr 8

x.com/i/article/2041…

Giovanni Monea retweeted

Shankar Padmanabhan @shankarpad8 · Mar 23

1/5 How do we update a model trained in 2025 with new world knowledge from 2026? ⚠️Continued training will undo skills learned by LLMs during post-training, e.g. instruction-following/math/code. 🤝Our method DiSC updates LLMs with new knowledge while preserving existing skills!

Giovanni Monea retweeted

Nathan Godey @nthngdy · Mar 12

🧵New paper: "Lost in Backpropagation: The LM Head is a Gradient Bottleneck" The output layer of LLMs destroys 95-99% of your training signal during backpropagation, and this significantly slows down pretraining 👇

Giovanni Monea @giomonea · Mar 10

@p_nawrot Hi Piotr, great work! Is the code available already? If not, do you have an expected release date?

Piotr Nawrot @p_nawrot · Dec 1

Paper link - arxiv.org/abs/2506.05345 Code and Models will be released very soon!

Piotr Nawrot @p_nawrot · Dec 1

We'll present "Inference-Time Hyper-Scaling with KV Cache Compression" at both NeurIPS and EurIPS. We believe that future advances in AI will require model efficiency, and this work is another step in this direction. Save the date! San Diego: Thur 11:00. Copenhagen: Thur 10:30.

Giovanni Monea retweeted

Yoav Artzi @yoavartzi · Feb 16

This call is still open. I am looking to recruit, as are many other faculty @Cornell. We review folders as they come in and will send offers until all positions are filled. Please share with your network 🙏

Yoav Artzi @yoavartzi

.@Cornell is recruiting for multiple postdoctoral positions in AI as part of two programs: Empire AI Fellows and Foundational AI Fellows. Positions are available in NYC and Ithaca. Deadline for full consideration is Nov 20, 2025! academicjobsonline.org/ajo/jobs/30971

Giovanni Monea retweeted

Zizhao Chen @ch272h · Dec 5

🧩Natural language isn’t all you need. We’re great at evaluating text-based reasoning (MATH, AIME…) but what about long-horizon visual reasoning? Enter 𝗞𝗻𝗼𝘁𝗚𝘆𝗺: a minimalistic testbed for evaluating agents on spatial reasoning along a difficulty ladder

Giovanni Monea retweeted

Yair Feldman @yair_feldman · Nov 26

🧵 New paper: "Simple Context Compression" - we show that mean-pooling beats the widely-used compression-tokens method for compressing contexts in LLMs, while being simpler and more efficient! with @yoavartzi (1/7)
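
For intuition, mean-pooling compression can be sketched as averaging consecutive groups of vectors. This is a minimal illustration, not the paper's code: the post does not say which representations are pooled or how windows are chosen, so the fixed `window` size and the plain-list vectors here are assumptions, and `mean_pool` is a hypothetical name.

```python
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector], window: int) -> List[Vector]:
    """Average each run of `window` consecutive vectors into one vector.

    The final group may be shorter than `window`; it is averaged over
    however many vectors it actually contains.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    pooled: List[Vector] = []
    for start in range(0, len(vectors), window):
        group = vectors[start:start + window]
        dim = len(group[0])
        pooled.append(
            [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
        )
    return pooled


if __name__ == "__main__":
    context = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(mean_pool(context, window=2))
```

With a window of 2, three input vectors become two: the average of the first pair, followed by the unpaired last vector unchanged.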

Giovanni Monea retweeted

Yoav Artzi @yoavartzi · Oct 28

Giovanni Monea @giomonea · Oct 7

@yule_gan Got it, thanks for the clarification!

Yulu Gan @yule_gan · Oct 7

Yes, and I also tried a bunch of hyperparameters and selected the best configuration for PPO and GRPO… For ES, we used the same hyperparameters across all experiments. For a fair comparison, we used only 200 training samples for PPO, GRPO, and ES. Even when RL is trained on the full training dataset (~30k samples if I remember correctly), ES achieves comparable performance.

Yulu Gan @yule_gan · Oct 6

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning. Paper: arxiv.org/pdf/2509.24372 Code: github.com/VsonicV/es-fin…
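
Exploring in parameter space can be sketched with a basic Evolution Strategies update: perturb the parameters with Gaussian noise, score each perturbed copy, and move toward the perturbations that scored well. This is a generic textbook-style sketch on a toy objective, not the paper's framework. The function name `es_step`, the hyperparameter values, and the quadratic reward are all assumptions for illustration.

```python
import random
from typing import Callable, List


def es_step(
    params: List[float],
    reward_fn: Callable[[List[float]], float],
    rng: random.Random,
    population: int = 20,
    sigma: float = 0.1,
    lr: float = 0.01,
) -> List[float]:
    """One gradient-free update: sample noise, evaluate, recombine."""
    noises = [[rng.gauss(0.0, 1.0) for _ in params] for _ in range(population)]
    rewards = [
        reward_fn([p + sigma * e for p, e in zip(params, eps)]) for eps in noises
    ]
    # Normalize rewards so the step size does not depend on reward scale.
    mean = sum(rewards) / population
    std = (sum((r - mean) ** 2 for r in rewards) / population) ** 0.5 or 1.0
    advantages = [(r - mean) / std for r in rewards]
    scale = lr / (population * sigma)
    return [
        p + scale * sum(a * eps[i] for a, eps in zip(advantages, noises))
        for i, p in enumerate(params)
    ]


def toy_reward(params: List[float]) -> float:
    """Hypothetical objective: highest when every parameter equals 3."""
    return -sum((p - 3.0) ** 2 for p in params)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.0, 0.0]
    for _ in range(300):
        theta = es_step(theta, toy_reward, rng)
    print(theta, toy_reward(theta))
```

No gradient of `toy_reward` is ever computed; only reward evaluations at perturbed parameters drive the update, which is the property the post highlights.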

Giovanni Monea retweeted

Tanya Goyal @tanyaagoyal · Oct 2

🚨 Modeling Abstention via Selective Help-seeking. LLMs learn to use search tools to answer questions they would otherwise hallucinate on. But can this also teach them what they know vs. what they don't? @momergul_ introduces MASH, which trains LLMs for search and gets abstentions for free! 💡 Key idea: reward accuracy but penalize searches during training. Under the right optimization pressure, LLMs learn to invoke search when their parametric knowledge is lacking. At inference, we simply remove this search access and treat any search invocation as a proxy for abstention!
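
The key idea can be sketched as a reward that trades accuracy against search use, plus an inference rule that reads a search attempt as abstention. This is an illustrative sketch only: the post does not give the reward's functional form, so the linear penalty, the `search_penalty` value, and the `<search>` marker are hypothetical choices, as are both function names.

```python
def training_reward(
    is_correct: bool, num_searches: int, search_penalty: float = 0.1
) -> float:
    """Reward accuracy, subtract a cost per search call (assumed linear)."""
    if num_searches < 0:
        raise ValueError("num_searches must be >= 0")
    accuracy = 1.0 if is_correct else 0.0
    return accuracy - search_penalty * num_searches


def inference_decision(model_output: str, search_tag: str = "<search>") -> str:
    """With search removed at inference, a search attempt means abstain."""
    return "abstain" if search_tag in model_output else "answer"


if __name__ == "__main__":
    print(training_reward(True, 0))    # correct without searching
    print(training_reward(True, 2))    # correct, but paid for two searches
    print(training_reward(False, 0))   # wrong guess from parametric memory
    print(inference_decision("<search>population of Ithaca"))
    print(inference_decision("The answer is 42."))
```

Under this assumed reward, answering correctly from memory scores highest, so searching only pays off when the model would otherwise be wrong.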

Giovanni Monea retweeted

Yoav Artzi @yoavartzi · Jul 25

The talk for our work on Retrospective Learning from Interactions, which will be in ACL (once I figure out how to squeeze it shorter) Gist: autonomous post-training from conversational signals for LLM bootstrapping ... look ma, no annotations! 🙌📈🚀 youtube.com/watch?v=qW8S30…
