Leonardo Ranaldi
@l__ranaldi
~ NLP Researcher ~ @EdinburghNLP
124 posts · Joined March 2022
149 Following · 116 Followers
fly51fly @fly51fly
[CL] Do LLMs Follow Their Own Rules? A Reflexive Audit of Self-Stated Safety Policies A Mittal [Microsoft] (2026) arxiv.org/abs/2604.09189
Leonardo Ranaldi @l__ranaldi
LLMs prioritise validation over facts, creating unsafe "sycophancy". Our X-Agent uses reasoning to audit and correct this behaviour. It stops the model from blindly agreeing, ensuring interactions are safe, consistent, and factually grounded. #NLProc aclanthology.org/2025.emnlp-mai…
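A minimal sketch of what such an audit-then-correct loop could look like, assuming a generic `call_llm` chat API; the prompts, the `audited_answer` name, and the three-pass structure are illustrative assumptions, not the paper's exact method:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical model call; swap in any real chat-completion API."""
    return "stubbed response"

def audited_answer(user_claim: str, question: str) -> str:
    # Pass 1: draft an answer normally (this is where sycophancy can creep in).
    draft = call_llm(f"User says: {user_claim}\nQuestion: {question}\nAnswer:")
    # Pass 2: a reasoning pass audits the draft against facts, not the user.
    audit = call_llm(
        "Audit the answer below. Does it agree with the user merely to "
        f"please them, or is it factually grounded?\nClaim: {user_claim}\n"
        f"Answer: {draft}\nVerdict (ok/sycophantic), then notes:"
    )
    # Pass 3: keep the draft only if the audit judges it grounded.
    if audit.lower().startswith("ok"):
        return draft
    return call_llm(
        "Rewrite this answer so it is factually grounded rather than "
        f"merely validating the user:\n{draft}\nAudit notes: {audit}"
    )

print(audited_answer("The Great Wall is visible from space.", "Is that true?"))
```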
Yuxiao Qu @QuYuxiao
🚨 NEW PAPER: "RLAD: Training LLMs to Discover Abstractions for Reasoning"! We introduce reasoning abstractions: concise insights that help LLMs solve hard reasoning problems by guiding structured exploration. 📄 arxiv.org/abs/2510.02263 🌐 cohenqu.github.io/rlad.github.io/ 🧵[1/N]
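As a rough reading of the tweet, inference would run in two stages: propose a concise abstraction, then solve conditioned on it. A sketch under that assumption, with a stubbed `call_llm` and illustrative prompts:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical model call; replace with a real LLM API."""
    return "stubbed"

def solve_with_abstraction(problem: str) -> str:
    # Stage 1: elicit a short, reusable insight (RLAD trains a model to
    # propose these; here we simply prompt for one).
    abstraction = call_llm(
        f"State one concise, general insight useful for solving:\n{problem}"
    )
    # Stage 2: the solver explores the problem guided by that abstraction.
    return call_llm(
        f"Problem: {problem}\nUseful insight: {abstraction}\n"
        "Solve step by step, using the insight to guide your exploration:"
    )

print(solve_with_abstraction("How many five-digit palindromes are even?"))
```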
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr
Language Models that Think, Chat Better
"This paper shows that the RLVR paradigm is effective beyond verifiable domains, and introduces RL with Model-rewarded Thinking (RLMT) for general-purpose chat capabilities."
"RLMT consistently outperforms standard RLHF pipelines. This includes substantial gains of 3–7 points on three chat benchmarks (AlpacaEval2, WildBench, and ArenaHardV2), along with 1–3 point improvements on other tasks like creative writing and general knowledge. Our best 8B model surpasses GPT-4o in chat and creative writing."
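A sketch of one RLMT-style update step as the quoted abstract describes it: the policy emits a thought plus a response, and a reward model (not a rule-based verifier) scores the response. The stubs, the group-mean baseline, and the batch format are assumptions, not the paper's exact recipe:

```python
import random

def policy_sample(prompt: str):
    """Hypothetical policy call: returns a (thought, response) pair."""
    return ("let me think step by step...", f"answer to: {prompt}")

def reward_model(prompt: str, response: str) -> float:
    """Hypothetical scalar preference score from a learned reward model."""
    return random.random()

def rlmt_step(prompts, group_size=4):
    batch = []
    for p in prompts:
        # Sample several thought+response pairs per prompt.
        samples = [policy_sample(p) for _ in range(group_size)]
        scores = [reward_model(p, resp) for _, resp in samples]
        baseline = sum(scores) / len(scores)
        # Advantage over the group mean (a GRPO-style choice; an assumption).
        for (thought, resp), s in zip(samples, scores):
            batch.append({"prompt": p, "thought": thought,
                          "response": resp, "advantage": s - baseline})
    return batch  # would feed a policy-gradient optimiser

print(rlmt_step(["Write a haiku about tea."])[0])
```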
机器之心 JIQIZHIXIN @jiqizhixin
Wow, a new post-training method.
SFT = efficient but capped 🚦
RL = powerful but slow 🐢
Now enter: Guess-Think-Answer (GTA)
GTA fuses guess (SFT), think (reflection), and answer (RL-shaped).
Result:
⚡ Faster convergence than RL
📈 Higher ceiling than SFT
🛠️ Gradient conflicts solved via masking & constraints
On 4 benchmarks → GTA beats both SFT & RL.
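One way the combined objective could be wired up, with token masks keeping the supervised "guess" gradient and the reward-shaped "answer" gradient on disjoint spans; the segmentation, the `beta` mix, and the loss forms below are my reading of the tweet, not the paper's formulation:

```python
import math

def masked_nll(token_logprobs, mask):
    """Mean negative log-likelihood over positions where mask == 1."""
    picked = [-lp for lp, m in zip(token_logprobs, mask) if m]
    return sum(picked) / max(len(picked), 1)

def gta_loss(token_logprobs, guess_mask, answer_mask, answer_reward, beta=0.5):
    # Supervised term: fit the early "guess" tokens (the SFT-like signal).
    sft_term = masked_nll(token_logprobs, guess_mask)
    # Reward-shaped term: scale the answer-span likelihood by a scalar
    # reward, REINFORCE-style. Disjoint masks are one way to read the
    # tweet's "gradient conflicts solved via masking".
    rl_term = answer_reward * masked_nll(token_logprobs, answer_mask)
    return sft_term + beta * rl_term

# Toy sequence: 3 guess tokens, 2 unsupervised think tokens, 2 answer tokens.
logps = [math.log(p) for p in (0.5, 0.4, 0.6, 0.9, 0.8, 0.7, 0.6)]
print(gta_loss(logps, [1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 1], 1.0))
```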
fly51fly @fly51fly
[LG] Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs Q Wang, P Zhao, S Huang, F Yang... [Microsoft] (2025) arxiv.org/abs/2509.00084
Wenhao Yu @wyu_nd
New paper: VLMs can self-reward during RL training — no visual annotations needed!
-- Decompose VLM reasoning into visual vs. language parts
-- Prompt the same VLM without visual input for visual reward
We call it Vision-S(elf)R1: arxiv.org/abs/2508.19652
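A sketch of the self-reward idea as the tweet states it: split the output into a perception part and a reasoning part, then query the same model text-only to check whether the perception alone supports the answer. The `vlm` stub, prompts, and exact-match check are illustrative assumptions:

```python
def vlm(prompt: str, image=None) -> str:
    """Hypothetical VLM call; image=None means a text-only pass."""
    return "stubbed output"

def self_visual_reward(image, question: str) -> float:
    # Pass 1 (with the image): a visual description plus the final answer.
    perception = vlm(f"Describe what the image shows relevant to: {question}",
                     image=image)
    answer = vlm(f"Given the image, answer: {question}", image=image)
    # Pass 2 (no image): can the same model recover the answer from the
    # description alone? If so, the perception carried the needed evidence.
    recovered = vlm(f"From this description only:\n{perception}\n"
                    f"Answer: {question}")
    return 1.0 if recovered.strip() == answer.strip() else 0.0

print(self_visual_reward(image=None, question="What colour is the car?"))
```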
Yuyin Zhou @yuyinzhou_cs
🚨 Google’s MedGemma & OpenAI’s GPT-4o are impressive, but their openness is limited—either fully closed-source or releasing only weights without data/training code.
🔥 Meet MedVLThinker — a fully open multimodal medical reasoning recipe that matches their performance. Simple. Transparent. Reproducible.
🔗 Project: ucsc-vlaa.github.io/MedVLThinker/
📄 Paper: arxiv.org/pdf/2508.02669
fly51fly @fly51fly
[CL] Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression J Huang, B Lin, G Feng, J Chen... [Peking University & The Hong Kong University of Science and Technology] (2025) arxiv.org/abs/2508.05337
Leonardo Ranaldi @l__ranaldi
Hey @jaseweston Take a look at our EMNLP work last year. It's not that far away! aclanthology.org/2024.emnlp-mai…
Jason Weston @jaseweston

🤖Introducing: CoT-Self-Instruct 🤖
📝: arxiv.org/abs/2507.23751
- Builds high-quality synthetic data via reasoning CoT + quality filtering
- Gains on reasoning tasks: MATH500, AMC23, AIME24 & GPQA-💎
- Outperforms existing train data s1k & OpenMathReasoning
- Gains on non-reasoning tasks as well: AlpacaEval & ArenaHard
🧵1/3
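A sketch of the recipe as summarised above: reason over seed examples to synthesise a new problem with a worked solution, then keep it only if it passes a quality filter. The `call_llm` stub, the prompts, and the self-verification filter are assumptions, not the paper's exact pipeline:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical model call; replace with a real LLM API."""
    return "stubbed"

def quality_filter(item: str) -> bool:
    # Stand-in filter: ask the model to verify its own output (the paper
    # uses automatic quality filtering; this exact check is an assumption).
    verdict = call_llm(f"Is this problem/solution pair correct and "
                       f"well-posed? Answer yes/no.\n{item}")
    return verdict.lower().startswith("yes")

def synthesise(seed_examples: str, n: int = 100) -> list[str]:
    kept = []
    for _ in range(n):
        # CoT generation: reason about the seeds, then emit a new problem
        # together with its full solution.
        item = call_llm(
            "Study these examples, reason step by step, then write one new "
            f"problem of similar style with a worked solution:\n{seed_examples}"
        )
        if quality_filter(item):
            kept.append(item)
    return kept

print(len(synthesise("Q: 2+2? A: 4")))
```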

Leonardo Ranaldi retweeted
Fede_Ranaldi @FedeRanaldi
I will be at #ACL2025 with my group presenting 3 conference papers. At the #L2M2 workshop, we will introduce the concept of #protoknowledge as a framework for jointly analyzing the #memorization and #generalization capabilities of LLMs. Non-archival link: lnkd.in/deDqJAxM
Human-Centric ART @unitorvergata @HumanCentricArt

Privacy, Memorization, Multimodal reasoning, and the surge of protoknowledge (non-archival at the L2M2 Workshop)! This is our contribution to #ACL2025NLP to better understand #LLMs. We want to know your POV! See you in Vienna! We are hiring.

Leonardo Ranaldi retweeted
ACL 2026 @aclmeeting
📢The ACL 2025 Proceedings are LIVE🎆on the ACL Anthology! 🎉 We're thrilled to pre-celebrate the incredible research that will be presented starting Monday, July 28th, in Vienna! 🇦🇹 Start exploring now▶️aclanthology.org/events/acl-202… #NLProc #ACL2025NLP #ACLAnthology 📚