

I have written a blog post that is so long: ut21.github.io/utkarsh/blogpo…






🔬 Hiring 2 undergrad research interns (6 months) at Microsoft Research India.

The transformer has been the default encoder for dense retrieval. But under the low-latency constraints of real production systems, it becomes a serious bottleneck for retrieval performance: deep encoders are accurate but too slow, shallow ones are fast but lossy.

So we're asking fundamental questions:
→ What assumptions are we baking in when we reach for a transformer to solve a task?
→ What alternative scalable encoder architectures can exploit the natural biases of retrieval better than the transformer does?

What interns will actually work on over 6 months:
→ Critically analyzing where transformer-based dense encoders fall short under production retrieval pressure
→ Exploring alternative architectures that preserve deep-encoder accuracy at a fraction of the inference cost
→ Data- and compute-efficient training algorithms for large dense encoders

Strong Python + PyTorch. Bonus if you've trained an encoder or built a retrieval pipeline end-to-end. For undergrads who treat "why is the architecture shaped this way?" as a real question.

Apply: forms.office.com/r/G1TyJZCFGd
DMs open. #InformationRetrieval #MLSystems #NLProc @MSFTResearch
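For readers outside IR: the setup being described is a bi-encoder, where documents are embedded offline and each query must be embedded online within the latency budget. Below is a minimal sketch, assuming a BERT-base backbone and mean pooling (both illustrative choices, not the team's actual stack); the query-side forward pass is exactly where the production latency pressure lands.

```python
# Minimal bi-encoder dense-retrieval sketch. Backbone and pooling are
# assumptions for illustration, not the team's actual setup.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")  # deep encoder: accurate but slow

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch).last_hidden_state         # [B, T, d]
    mask = batch["attention_mask"].unsqueeze(-1)     # mean-pool over real tokens only
    vec = (out * mask).sum(1) / mask.sum(1)
    return torch.nn.functional.normalize(vec, dim=-1)

docs = ["transformers for retrieval", "linear RNN encoders", "BM25 baselines"]
doc_vecs = embed(docs)                       # offline: index once
query_vec = embed(["fast dense encoders"])   # online: every query pays this forward pass
scores = query_vec @ doc_vecs.T              # cosine similarity (vectors are normalized)
print(docs[scores.argmax().item()])
```

The asymmetry is the whole game: document encoding is a one-time offline cost, but the query encoder runs per request, which is why a cheaper query-side architecture that keeps deep-encoder accuracy is valuable.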



We model tool calls as oracles from classical computational complexity theory, a useful formalism for thinking about the capacity of the paradigm.
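For reference, the oracle formalism being borrowed is the standard one from complexity theory; this is textbook material, not a claim about the post's specific results:

```latex
% An oracle machine M^O is a Turing machine that may ask "x \in O?"
% and receive the answer in a single step. Classes relativize accordingly:
\[
  \mathrm{P}^{O} \;=\; \{\, L \;:\; L \text{ is decided in polynomial time by some } M^{O} \,\}
\]
% The analogy: the base model plays M, each tool is an oracle O, and the
% capacity of the paradigm is what M can decide given one-step access to O.
```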





starting to look like one more than the other :)





Introducing Olmo Hybrid, a 7B fully open model combining transformer and linear RNN layers. It decisively outperforms Olmo 3 7B across evals, w/ new theory & scaling experiments explaining why. 🧵
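For intuition on what "combining transformer and linear RNN layers" means structurally, here is a toy alternating stack in PyTorch. It is a sketch of the general hybrid pattern, not Olmo Hybrid's actual architecture; the gated recurrence, layer ratio, and dimensions are all assumptions.

```python
# Toy hybrid stack: a few attention layers interleaved with linear-RNN layers.
# Illustrative only; not Olmo Hybrid's actual design.
import torch
import torch.nn as nn

class LinearRNN(nn.Module):
    """Gated linear recurrence: h_t = a_t * h_{t-1} + b_t * x_t (no attention)."""
    def __init__(self, d):
        super().__init__()
        self.a = nn.Linear(d, d)  # data-dependent decay
        self.b = nn.Linear(d, d)  # data-dependent input gate

    def forward(self, x):                      # x: [B, T, d]
        a = torch.sigmoid(self.a(x))           # keep decay in (0, 1) for stability
        bx = self.b(x) * x
        h = torch.zeros_like(x[:, 0])
        outs = []
        for t in range(x.size(1)):             # O(T) time, O(1) state per step
            h = a[:, t] * h + bx[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)

class HybridBlock(nn.Module):
    def __init__(self, d, n_heads, use_attn):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.mix = (nn.MultiheadAttention(d, n_heads, batch_first=True)
                    if use_attn else LinearRNN(d))
        self.use_attn = use_attn

    def forward(self, x):
        h = self.norm(x)
        if self.use_attn:
            # causal mask omitted for brevity
            h, _ = self.mix(h, h, h, need_weights=False)
        else:
            h = self.mix(h)
        return x + h                            # residual connection

# e.g. attention every 4th layer, linear RNN elsewhere (the ratio is a guess)
layers = nn.ModuleList(HybridBlock(256, 4, use_attn=(i % 4 == 3)) for i in range(8))
x = torch.randn(2, 16, 256)
for layer in layers:
    x = layer(x)
print(x.shape)  # torch.Size([2, 16, 256])
```

The appeal of the pattern in general: recurrent layers carry constant-size state per decoding step, while the occasional attention layers retain exact token-to-token lookups that pure RNN stacks struggle with.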