Nir Mazor

51 posts

@NirMMazor

Joined March 2025

65 Following · 21 Followers

Pinned Tweet

Nir Mazor @NirMMazor · Jan 26

New preprint 💥 Can a general-purpose model achieve results comparable to medically pre-trained models? 🤔 We show that lightweight fine-tuning of a general-purpose LVLM and an LVLM-aware retriever can. 🚀 🔗 GitHub: github.com/Nirmaz/CLARE 📄 Paper: arxiv.org/pdf/2508.17394

Nir Mazor retweeted

Shachar Don-Yehiya @Shachar_Don · 5h

Do you run pairwise evaluation? Do you test your models on the Arena-Hard and AlpacaEval benchmarks? You probably want to read this 🧵👇 arxiv.org/abs/2603.16848 With @LChoshen @AbendOmri

Nir Mazor retweeted

Asaf Yehudai @AsafYehudai · Mar 5

New preprint, evaluation framework & leaderboard!🚨 General-purpose AI agents are everywhere. 🤖 From ReAct to @claudeai Code and @OpenAI SDK. But how do we actually evaluate them — as general agents? Currently, benchmarks are deeply tied to domain-specific setups, making it impossible to evaluate true cross-domain agents. We’re changing that! We’re introducing Exgentic and the Open General Agent Leaderboard. 🧵👇

Nir Mazor retweeted

Oren Sultan @oren_sultan · Feb 1

Can LLMs reliably predict program termination? We evaluate frontier LLMs in the International Competition on Software Verification (SV-COMP) 2025, directly competing with state-of-the-art verification systems. @AIatMeta @HebrewU @Bloomberg @imperialcollege @ucl @jordiae @pascalkesseli @jvanegue @HyadataLab @adiyossLC @PeterOHearn12 Paper: arxiv.org/pdf/2601.18987 Website: orensultan.com/llms_halting_p… 🧵👇 1/n

Nir Mazor @NirMMazor · Jan 26

Under the guidance of @Hoper_Tom — thank you! @nlphuji

Nir Mazor @NirMMazor · Jan 26

Our model also achieves superior results over general-purpose RAG baseline models 🚀📈

Nir Mazor @NirMMazor · Jan 26

Nir Mazor retweeted

Avishai Elmakies @AvishaiElm37946 · Jan 18

🚀 Excited to share that my paper from my internship at @IBMResearch has been accepted to #ICASSP2026! We train Speech-Aware LLMs (SALLMs) with Group Relative Policy Optimization (GRPO) on open-ended tasks (Spoken QA & Speech Translation). We find that GRPO beats SFT!

Nir Mazor retweeted

Noam Dahan @Dahan_Noam · Dec 29

1) PromptSuite (EMNLP 2025 demo) enables robust multi-prompt evaluation by automatically generating controlled prompt variations over existing datasets (e.g. Hugging Face). Try it with a Python API and web UI: eliyahabba.github.io/PromptSuite/ with @EliyaHabba and @GiliLior

Nir Mazor retweeted

Noam Dahan @Dahan_Noam · Dec 29

FOMO for missing the great community of @iscol_meeting! Thankfully my collaborators @nlphuji (and advisor @GabiStanovsky🙏) presented our recent work: 1) on prompt sensitivity, and 2) on using digitized newspapers as data for low-resource languages Links in thread:

Nir Mazor retweeted

Eliahu Horwitz @EliahuHorwitz · Dec 29

I had a blast presenting our position paper poster on the Model Atlas at #NeurIPS2025. With nearly 6,000 posters, it’s impossible to see everything. Luckily @J_Novikova_NLP featured the paper on her YouTube channel Paper: horwitz.ai/model-atlas Video: youtube.com/watch?v=DogbPR…

Nir Mazor retweeted

Gili Lior @GiliLior · Dec 19

Honored to have been part of the ISCOL 2025 panel with such great professors! @yoavgo @melhadad It was an interesting discussion on AI’s role in academia and research, with diverse opinions and challenging perspectives. Thanks @iscol_meeting and @ella_rabinovich for having me!

Nir Mazor retweeted

Shahaf Bassan @shahaf_bassan · Dec 16

✈️ Copenhagen 🇩🇰 → San Diego 🇺🇸 Had a great time presenting two papers at #NeurIPS2025 and giving an invited talk at the Theory of Explainable ML workshop (Elis Unconference, #EurIPS2025). #ExplainableAI #Interpretability #XAI

Nir Mazor retweeted

Jonathan Karin @JonathanKarin3 · Dec 8

1/9 How do swarms (like fish schools) survive predator attacks? In our new paper at @PhysRevE, we try to break the "Detectability-Durability" trade-off with GNN-based generative modeling! with @zoe_piran and @mor_nitzan journals.aps.org/pre/abstract/1…

Nir Mazor retweeted

Kevin Lu @kevinlu4588 · Nov 14

Excited to share our paper “When Are Concepts Erased from Diffusion Models?” at @NeurIPSConf! We introduce two conceptual models for erasure mechanisms in diffusion models, and a suite of probes to recover supposedly forgotten concepts. Project website: unerasing.baulab.info

Nir Mazor retweeted

Daria Lioubashevski @DariaLioub · Nov 10

🚨 New preprint! One idea, many ways to say it – does your brain track those options before you speak? Using LLMs, we put this to the test: biorxiv.org/content/10.110… We show for the 1st time that the brain represents many alternatives simultaneously in both listening & speaking 🧵

Nir Mazor retweeted

HUJI NLP @nlphuji · Nov 9

Our group closing out #EMNLP2025 in Suzhou. Until next time!

Nir Mazor retweeted

Noy Sternlicht @NoySternlicht · Nov 6

New benchmarking task for LLM judges: Evaluating debate speeches! 🗣️ We find that judges align with human orderings but remain miscalibrated. Presenting tomorrow at @emnlpmeeting, 14:00-15:30, Hall C. Come by to chat, and even debate (in the spirit of the poster) 🤝 #EMNLP2025
