Giovanni Monea

63 posts

@giomonea

Intern @IBMResearch | 🤖 ML PhD student @cornell @cornell_tech | Previously @MSFTResearch @Apple, @amazon | 🎓 @EPFL_en, @polimi

Joined July 2023
291 Following · 245 Followers
Giovanni Monea (@giomonea)
Where it struggles: solving linear equations. Error analysis reveals this isn't a retrieval failure. Compression disrupts arithmetic circuits 🧮, leading to computational errors.
Giovanni Monea (@giomonea)
LLMs waste massive memory remembering every reasoning step. What if they could leave behind just "breadcrumbs" instead? Breadcrumbs Reasoning: KV cache compression during decoding with learned beacon tokens. 2–32x less memory, minimal accuracy drop. 🧵
Giovanni Monea (@giomonea)
@DimitrisPapail Awesome work! We tackle the same KV cache explosion in Breadcrumbs Reasoning via pure latent compression. Learned "beacons" compress past context windows into single KV entries (no text summaries), trained via online RL distillation: arxiv.org/abs/2510.13797
Giovanni Monea retweeted
Shankar Padmanabhan (@shankarpad8)
1/5 How do we update a model trained in 2025 with new world knowledge from 2026? ⚠️Continued training will undo skills learned by LLMs during post-training, e.g. instruction-following/math/code. 🤝Our method DiSC updates LLMs with new knowledge while preserving existing skills!
Giovanni Monea retweeted
Nathan Godey (@nthngdy)
🧵New paper: "Lost in Backpropagation: The LM Head is a Gradient Bottleneck" The output layer of LLMs destroys 95-99% of your training signal during backpropagation, and this significantly slows down pretraining 👇
Giovanni Monea (@giomonea)
@p_nawrot Hi Piotr, great work! Is the code available already? If not, do you have an expected release date?
Piotr Nawrot (@p_nawrot)
We'll present "Inference-Time Hyper-Scaling with KV Cache Compression", both at NeurIPS and EurIPS. We believe that future advances in AI will require model efficiency, and this work is another step in this direction. Save the date!
- San Diego, Thur 11:00
- Copenhagen, Thur 10:30
Giovanni Monea retweeted
Yoav Artzi (@yoavartzi)
This call is still open. I am looking to recruit, as are many other faculty @Cornell. We review folders as they come in, and will send offers until all positions are filled. Please share with your network 🙏
Quoted: Yoav Artzi (@yoavartzi)

.@Cornell is recruiting for multiple postdoctoral positions in AI as part of two programs: Empire AI Fellows and Foundational AI Fellows. Positions are available in NYC and Ithaca. Deadline for full consideration is Nov 20, 2025! academicjobsonline.org/ajo/jobs/30971

Giovanni Monea retweeted
Zizhao Chen (@ch272h)
🧩Natural language isn’t all you need. We’re great at evaluating text-based reasoning (MATH, AIME…) but what about long-horizon visual reasoning? Enter 𝗞𝗻𝗼𝘁𝗚𝘆𝗺: a minimalistic testbed for evaluating agents on spatial reasoning along a difficulty ladder
Giovanni Monea retweeted
Yair Feldman (@yair_feldman)
🧵 New paper: "Simple Context Compression" - we show that mean-pooling beats the widely-used compression-tokens method for compressing contexts in LLMs, while being simpler and more efficient! with @yoavartzi (1/7)
Giovanni Monea retweeted
Yoav Artzi (@yoavartzi)
.@Cornell is recruiting for multiple postdoctoral positions in AI as part of two programs: Empire AI Fellows and Foundational AI Fellows. Positions are available in NYC and Ithaca. Deadline for full consideration is Nov 20, 2025! academicjobsonline.org/ajo/jobs/30971
Yulu Gan (@yule_gan)
Yes, and I also tried a bunch of hyperparameters and selected the best configuration for PPO and GRPO… For ES, we used the same hyperparameters across all experiments. For a fair comparison, we used only 200 training samples for PPO, GRPO, and ES. Even when RL is trained on the full training dataset (~30k samples if I remember correctly), ES achieves comparable performance.
Yulu Gan (@yule_gan)
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning. Paper: arxiv.org/pdf/2509.24372 Code: github.com/VsonicV/es-fin…
Giovanni Monea retweeted
Tanya Goyal (@tanyaagoyal)
🚨 Modeling Abstention via Selective Help-seeking
LLMs learn to use search tools to answer questions they would otherwise hallucinate on. But can this also teach them what they know vs not? @momergul_ introduces MASH that trains LLMs for search and gets abstentions for free!
💡 Key idea: Reward accuracy but penalize searches during training. Under the right optimization pressure, LLMs learn to invoke search when their parametric knowledge is lacking. At inference, we simply remove this search access and treat any search invocation as a proxy for abstention!
Giovanni Monea retweeted
Yoav Artzi (@yoavartzi)
The talk for our work on Retrospective Learning from Interactions, which will be in ACL (once I figure out how to squeeze it shorter).
Gist: autonomous post-training from conversational signals for LLM bootstrapping ... look ma, no annotations! 🙌📈🚀
youtube.com/watch?v=qW8S30…