Negar Arabzadeh

462 posts

@NegarEmpr

Postdoc @UCBerkeley @BerkeleySky |👩🏻‍💻Prev @google, @MSFTResearch, @SpotifyResearch | 📚@UWaterloo | Interested in Information Retrieval

Berkeley, USA · Joined April 2017
1.1K Following · 1.5K Followers
Pinned Tweet
Negar Arabzadeh @NegarEmpr
1/ Thrilled to introduce T³: a corpus for RAG over reasoning tasks, built from thinking traces. We show that, surprisingly, RAG can improve reasoning with the right corpus. RAG with Transformed Thinking Traces (T³) gains up to 43.9% on AIME 2025-2026. 🔗 arxiv.org/abs/2605.03344 🧵
11
31
212
472.1K
Negar Arabzadeh retweeted
Melissa Pan @melissapan
Excited to share that MAP has been selected for ✨ICML Oral✨. We look forward to sharing the insights in the paper with the community. Many, many thanks to everyone who participated in our study ❤️ MAP wouldn’t be possible without your contributions to open science.
Melissa Pan @melissapan

Excited to share: MAP has been accepted as 🌟 ICML Spotlight 🌟 We hope MAP can provide data-driven insights that help the communities to work on various under-explored research directions around agent systems! Huge thanks & congrats to my amazing co-authors. See you all at Seoul! 🫡

7
13
153
20K
Negar Arabzadeh @NegarEmpr
There are two layers of contamination control: (1) the trace corpus was built before AIME 2025 and 2026 were released, and (2) we additionally ran full decontamination against all eval benchmarks using a 13-gram Jaccard similarity. On AIME 2025–2026 specifically, we see up to +56% relative gain on Gemini-2.5-Flash (53.3 → 83.3) and +7.6% on GPT-5.
0
0
1
138
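A minimal sketch of what a 13-gram Jaccard decontamination check and the relative-gain arithmetic in the post above could look like. The whitespace tokenization, the function names, and the `threshold` parameter are assumptions made for illustration; the post does not say how the authors tokenize text or what cutoff they apply.

```python
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in text (assumed: lowercase, whitespace tokens)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def is_contaminated(trace: str, benchmark_items: list[str],
                    n: int = 13, threshold: float = 0.0) -> bool:
    """Flag a trace whose n-gram overlap with any benchmark item exceeds threshold.

    The threshold is hypothetical; 0.0 flags any shared 13-gram.
    """
    trace_grams = ngrams(trace, n)
    return any(jaccard(trace_grams, ngrams(item, n)) > threshold
               for item in benchmark_items)


def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent, e.g. for accuracy before and after RAG."""
    return (after - before) / before * 100


# The Gemini-2.5-Flash numbers quoted in the post: 53.3 -> 83.3
print(round(relative_gain(53.3, 83.3), 1))
```

With the quoted scores, the relative gain works out to about 56%, which matches the figure in the post: a 30-point absolute jump on a 53.3 baseline.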
Negar Arabzadeh @NegarEmpr
5/ Interestingly, RAG over T³ can be cheaper than No RAG. Retrieved reasoning shifts work from expensive output tokens to cheap input tokens — the model thinks less and reads more. Think less. Retrieve thinking. 🧠
1
1
4
362
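To make the cost argument in the post above concrete, here is a rough sketch of how shifting tokens from output to input can lower the bill. Every number in it is hypothetical: the prices, the token counts, and the `Pricing` and `request_cost` names are illustrative assumptions, not figures from the paper or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices; output is assumed pricier than input."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request: tokens times their per-million-token price."""
    return (input_tokens * pricing.input_per_mtok
            + output_tokens * pricing.output_per_mtok) / 1_000_000


# Assumed prices: output tokens cost 10x input tokens.
pricing = Pricing(input_per_mtok=1.0, output_per_mtok=10.0)

# Assumed token counts: without retrieval the model reads a short prompt and
# reasons at length; with retrieval it reads more context and writes less.
no_rag = request_cost(input_tokens=500, output_tokens=20_000, pricing=pricing)
with_rag = request_cost(input_tokens=8_500, output_tokens=12_000, pricing=pricing)

print(f"no RAG: ${no_rag:.4f}  with RAG: ${with_rag:.4f}")
```

Under these made-up numbers the retrieval run is cheaper even though it processes more tokens in total, because the extra tokens sit on the cheaper input side. Whether that holds in practice depends on the real price ratio and on how much the retrieved traces actually shorten the model's reasoning.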
Negar Arabzadeh retweeted
Diane @dianetc_
We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study much harder search queries than before.
9
55
268
120.5K
Negar Arabzadeh retweeted
Parth Asawa @pgasawa
Today, we’re releasing Continual Learning Bench 1.0: the first realistic benchmark for measuring how AI systems can improve in online settings. Benchmarks today assume models are stateless. Each example is independent, and once a system finishes a task, it moves on as if nothing happened. But deployed AI systems should learn from experience. We tested 10+ frontier systems against novel, expert-validated tasks and found there’s still plenty of headroom for learning. (1/n)
42
153
1.1K
824.8K
Andrew Drozdov @mrdrozdov
If you're looking to follow one additional person today, I'd recommend following the prolific @NegarEmpr for their insightful research on information retrieval.
2
0
19
1.1K
Negar Arabzadeh retweeted
Ion Stoica @istoica05
Congratulations to the team on the MAP paper being accepted as an ICML spotlight! A key takeaway from this work is that reliability remains one of the central challenges for production agent systems. Simple yet effective methods continue to dominate in these agent systems for…
Melissa Pan @melissapan

Excited to share: MAP has been accepted as 🌟 ICML Spotlight 🌟 We hope MAP can provide data-driven insights that help the communities to work on various under-explored research directions around agent systems! Huge thanks & congrats to my amazing co-authors. See you all at Seoul! 🫡

0
8
73
8.7K
Negar Arabzadeh retweeted
Melissa Pan @melissapan
Excited to share: MAP has been accepted as 🌟 ICML Spotlight 🌟 We hope MAP can provide data-driven insights that help the communities to work on various under-explored research directions around agent systems! Huge thanks & congrats to my amazing co-authors. See you all at Seoul! 🫡
10
30
232
47.9K
Negar Arabzadeh @NegarEmpr
2/ Peer review quality is wildly uneven, and most venues have no scalable way to assess it. PeeriScope scores reviews along 13 fixed dimensions, combining structural metrics, rubric-guided LLM evaluation, and a supervised model trained on expert annotations.
1
0
2
140