Sumit

9.6K posts

@_reachsumit

Senior ML Engineer @Meta | prev: @TikTok_us, @Amazon, @Samsung | UChicago Alum https://t.co/hcCJ2n979W 🇮🇳→🇰🇷→🇦🇺→🇨🇦→🇺🇲

Seattle, WA · Joined April 2010
500 Following · 3.8K Followers
Pinned Tweet
Sumit (@_reachsumit)
In the final post of the Adaptive RAG series, we explore how to treat selective retrieval as a core, learned skill, moving from passive observation to active, intelligent decision-making.
blog.reachsumit.com/posts/2025/10/…
Sumit (@_reachsumit)
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
Ant Group introduces a family of 8 multilingual embedding models (80M–14B) trained on 60M samples across 200+ languages.
📝 arxiv.org/abs/2603.19223
👨🏽‍💻 github.com/codefuse-ai/Co…
Sumit (@_reachsumit)
Negative Sampling Techniques in Information Retrieval: A Survey
Presents a comprehensive survey and taxonomy of negative sampling techniques for dense retrieval, covering random, static/dynamic hard negative mining, false negative mitigation, and more.
📝 arxiv.org/abs/2603.18005
Sumit (@_reachsumit)
Retrieval-Augmented LLM Agents: Learning to Learn from Experience
Naver Labs presents a framework that combines experience retrieval with LoRA fine-tuning to improve LLM agent generalization to unseen tasks.
📝 arxiv.org/abs/2603.18272
Sumit (@_reachsumit)
Total Recall QA: A Verifiable Evaluation Suite for Deep Research Agents
Introduces a benchmark for evaluating deep research agents that requires retrieving all relevant documents to answer aggregation-style questions.
📝 arxiv.org/abs/2603.18516
👨🏽‍💻 github.com/mahta-r/total-…
Sumit (@_reachsumit)
Hypothesis-Conditioned Query Rewriting for Decision-Useful Retrieval
Proposes a training-free RAG framework that generates 3 retrieval queries (support, distinction, key features) conditioned on a working answer hypothesis.
📝 arxiv.org/abs/2603.19008
👨🏽‍💻 anonymous.4open.science/r/HCQR-1C2E
Sumit (@_reachsumit)
VLM2Rec: Resolving Modality Collapse in Vision-Language Model Embedders for Multimodal Sequential Recommendation
Introduces a VLM embedder-based framework for sequential recommendation that resolves modality collapse.
📝 arxiv.org/abs/2603.17450
Sumit (@_reachsumit)
OPERA: Online Data Pruning for Efficient Retrieval Model Adaptation
Amazon presents a data pruning framework for dense retriever finetuning that adaptively modulates sampling probabilities, improving both ranking and retrieval.
📝 arxiv.org/abs/2603.17205
Sumit (@_reachsumit)
CRE-T1 Preview Technical Report: Beyond Contrastive Learning for Reasoning-Intensive Retrieval
Introduces a generative retrieval model that replaces static contrastive representation alignment with dynamic per-query reasoning trajectories.
📝 arxiv.org/abs/2603.17387
Sumit (@_reachsumit)
A Unified Language Model for Large Scale Search, Recommendation, and Reasoning
@denadai2 et al. at Spotify introduce a tool-free framework that adapts a single LLM to jointly support search, recommendation, and reasoning over a 10M+ item catalog.
📝 arxiv.org/abs/2603.17533
Sumit (@_reachsumit)
Deploying Semantic ID-based Generative Retrieval for Large-Scale Podcast Discovery at Spotify
Spotify presents a production-scale generative recommender that frames podcast discovery as an instruction-following task over Semantic IDs.
📝 arxiv.org/abs/2603.17540
Sumit (@_reachsumit)
Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context
@KeivanAlizadeh2 et al. at Apple use uncertainty-aware self-reflection to guide how language models interact with long contexts.
📝 arxiv.org/abs/2603.15653
Sumit (@_reachsumit)
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification
Presents a research agent that improves long-horizon reasoning through agentic mid-training and verification-centric reasoning at local and global levels.
📝 arxiv.org/abs/2603.15726
👨🏽‍💻 github.com/MiroMindAI/Mir…
Sumit (@_reachsumit)
RecBundle: A Next-Generation Geometric Paradigm for Explainable Recommender Systems
Introduces Fiber Bundle theory from differential geometry to decouple user collaboration topology from individual preference evolution in recommender systems.
📝 arxiv.org/abs/2603.16088
Sumit (@_reachsumit)
IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time
Introduces a RAG approach that shifts cross-document reasoning from online inference to offline indexing by generating bridging facts through shared entities.
📝 arxiv.org/abs/2603.16415
Sumit (@_reachsumit)
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
Introduces a fully open-source search agent achieving frontier-level performance, using fact-grounded QA synthesis from web graphs.
📝 arxiv.org/abs/2603.15594
👨🏽‍💻 github.com/rui-ye/OpenSee…
Sumit (@_reachsumit)
APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution
Enhances multi-hop RAG by decoupling strategic task planning (via RL) from iterative retrieval execution (via SFT), enabling LLMs to decompose complex queries.
📝 arxiv.org/abs/2603.13853
Sumit (@_reachsumit)
Learning Retrieval Models with Sparse Autoencoders
@thibault_formal et al. at Naver Labs present SPLARE, a learned sparse retrieval method that replaces vocabulary-based projections with sparse autoencoders, producing language-agnostic latent features.
📝 arxiv.org/abs/2603.13277
Sumit (@_reachsumit)
MURE: Hierarchical Multi-Resolution Encoding via Vision-Language Models for Visual Document Retrieval
Presents a multi-resolution VLM encoding method for visual document retrieval that fuses coarse-to-fine features via Matryoshka representation learning.
📝 arxiv.org/abs/2603.13349
Sumit (@_reachsumit)
AMES: Approximate Multi-modal Enterprise Search via Late Interaction Retrieval
Apple presents a multimodal late interaction retrieval method deployed in Apache Solr, combining parallel token-level ANN candidate generation with Exact MaxSim re-ranking.
📝 arxiv.org/abs/2603.13537
Sumit (@_reachsumit)
Bringing Model Editing to Generative Recommendation in Cold-Start Scenarios
Proposes a training-free model editing framework that injects cold-start item knowledge into generative recommendation models.
📝 arxiv.org/abs/2603.14259