Omar Khattab

13.5K posts

@lateinteraction

Asst professor @MIT CSAIL @nlp_mit. Research includes https://t.co/VgyLxl0oa1, https://t.co/ZZaSzaRaZ7 (@DSPyOSS), RLMs, and GEPA. Prev: CS PhD @StanfordNLP. Research @Databricks.

Cambridge, MA · Joined December 2022
3.4K Following · 34.8K Followers
will brown @willccbb
SOTA method for variance reduction:
[image]
1 reply · 0 reposts · 6 likes · 163 views
Omar Khattab @lateinteraction
In my biased view, they are the kinds of things that help start mini-fields around new problems and new algorithmic paradigms. Stay tuned.
1 reply · 0 reposts · 20 likes · 711 views
Omar Khattab @lateinteraction
We're gearing up to release two research efforts I've been extremely excited about for quite some time. Y'all will really love these.
6 replies · 4 reposts · 72 likes · 1.6K views
Omar Khattab retweeted
ACM Conference on AI and Agentic Systems
🎤 Keynote announcement: @trq212 (Thariq Shihipar), Member of Technical Staff on Claude Code at @AnthropicAI, is keynoting #CAIS2026. Thariq's "Lessons from Building Claude Code" series on Skills, prompt caching, tool design, and "unhobbling" is required reading for anyone building agentic systems. We're thrilled to have him.
📍 San Jose · May 26–29
🔗 caisconf.org
[image]
2 replies · 18 reposts · 101 likes · 14.2K views
Omar Khattab @lateinteraction
which is the lowest I’ve seen if this is on a single-core or even few-core CPU
1 reply · 0 reposts · 7 likes · 1.6K views
John Kim @johnkimdw
I’m thrilled to share that I’ll be starting my CS PhD at @NorthwesternU this fall, advised by @ManlingLi_! I’ll be researching trustworthy AI and spatial intelligence to build reliable AI systems that are grounded in the physical world.

I’m also happy to announce that I was awarded the @NSF GRFP fellowship, which will support my PhD for 3 years! This wouldn’t have been possible without my wonderful mentors @nunompmoniz, @Meng_CS, @frank_liu_01, @NoahZiems, and countless others who’ve guided me throughout my undergrad.

And so… I guess I won’t be leaving the Midwest :)
5 replies · 4 reposts · 74 likes · 9.5K views
Omar Khattab @lateinteraction
@hxiao TTC for embeddings already has a name: late interaction :D
2 replies · 0 reposts · 32 likes · 1.8K views
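
For readers outside IR: "late interaction" is the ColBERT-style scoring scheme, where queries and documents are encoded into per-token embeddings and relevance is computed at query time as a sum of per-query-token maximum similarities (MaxSim), so extra inference-time compute buys finer-grained matching. A minimal sketch of that scoring step, with toy normalized arrays standing in for any real token-level encoder:

    import numpy as np

    def maxsim_score(query_vecs, doc_vecs):
        # Late interaction (ColBERT-style MaxSim): every query token is
        # compared against every document token; each query token keeps
        # its best match, and the per-token maxima are summed.
        sim = query_vecs @ doc_vecs.T  # (q_tokens, d_tokens) cosine sims
        return float(sim.max(axis=1).sum())

    # Toy usage with random, L2-normalized "token embeddings".
    rng = np.random.default_rng(0)
    q = rng.normal(size=(3, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
    d = rng.normal(size=(5, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
    print(maxsim_score(q, d))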
Han Xiao @hxiao
Another thought after ICLR is Test-Time Compute (TTC) for embedding models. My thesis: given a trained embedding model, can we improve retrieval quality by spending more compute at inference (multiple encoding rounds, conditional branching, if-else gates), all purely from the embedding geometry, training-free, with no priors and no LLM helping?

Despite Noam Brown saying small models like GPT-2 wouldn't benefit from TTC, I still wanted to explore whether embedding models can "think longer." Here, instead of generating LLM tokens, we re-encode, compare, gate, and amplify query vectors based on what the first-pass retrieval tells us.

So the agent designs embedding "programs" (each a DAG over the model's own vectors, with branches and gates) and runs them on 3 retrieval benchmarks × 2 models (jina-v5 nano & small). And here you go: some interesting TTC programs that only use the given embedding model.
[image]
3 replies · 4 reposts · 36 likes · 4.6K views
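
The kind of training-free "program" Han describes can be pictured as a gated re-querying step: encode once, retrieve, and only if the first-pass scores look weak, blend the query vector toward the centroid of its top hits and retrieve again. A minimal sketch under those assumptions; the "encode" function, the gate threshold, and the blending weight are illustrative stand-ins, not the actual programs or models from the post:

    import numpy as np

    def ttc_retrieve(query, encode, doc_vecs, k=10, gate=0.6, alpha=0.7):
        # One training-free TTC "program": encode, retrieve, and re-query
        # only when a gate on the first-pass scores fires. `encode` maps
        # text to an L2-normalized vector; it, `gate`, and `alpha` are
        # hypothetical stand-ins for illustration.
        q = encode(query)                      # first encoding round
        scores = doc_vecs @ q
        top = np.argsort(-scores)[:k]          # first-pass retrieval

        if scores[top[0]] < gate:              # if-else gate on geometry
            centroid = doc_vecs[top].mean(axis=0)
            q = alpha * q + (1 - alpha) * centroid  # amplify toward hits
            q /= np.linalg.norm(q)
            scores = doc_vecs @ q
            top = np.argsort(-scores)[:k]      # second retrieval round
        return top, scores[top]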
Omar Khattab retweeted
MIT CSAIL @MIT_CSAIL
MIT PhD student Alex Zhang (@a1zhang) explains how AI models are "mismanaged geniuses" that could take on a much wider range of tasks. Full video: tinyurl.com/bddd5vdx
5 replies · 49 reposts · 445 likes · 76.2K views
Omar Khattab retweeted
DSPy @DSPyOSS
"yo dawg, i heard you like RLMs and GEPA, so i put GEPA in your RLM so you can RLM while you GEPA"
Sam Hogan 🇺🇸 @samhogan
[quoted tweet: Sam Hogan's HALO announcement, reproduced in full in the retweet below]
11 replies · 42 reposts · 617 likes · 48.6K views
Omar Khattab retweeted
Sajjadur Rahman @subZero_saj
🥁 We are thrilled to announce the keynote speakers for DASHSys Workshop @ VLDB 2026. 🧵
1 reply · 1 repost · 9 likes · 2.3K views
Omar Khattab retweeted
Sam Hogan 🇺🇸 @samhogan
We’re introducing HALO 😇 Hierarchical Agent Loop Optimizer. HALO is an RLM-based agent optimization technique capable of recursively self-improving agents by analyzing their execution traces and suggesting changes. This work is inspired by the Mismanaged Genius Hypothesis proposed by @a1zhang and @lateinteraction earlier this month.

tl;dr: we improved performance on AppWorld (Sonnet 4.6) from 73.7 to 89.5 (+15.8) by giving HALO-RLM access to harness trace data and asking it to identify issues. The feedback from HALO surfaced failures in the harness such as hallucinated tool calls, redundant arguments in tools, refusal loops, and semantic correctness issues. Each issue mapped cleanly to a direct prompt update.

We then fed these findings into Cursor (Opus 4.6) and asked the coding agent to update the underlying harness. We repeated this trace -> HALO-RLM analysis -> code update loop until the score plateaued.

Today we’re open-sourcing the core HALO-RLM framework, evals, and data for further review.
[image]
58 replies · 122 reposts · 1.3K likes · 122.3K views
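
The loop in the post (trace -> HALO-RLM analysis -> code update, repeated until the score plateaus) has a simple skeleton. A hedged sketch, not the released framework: every callable and threshold below is a hypothetical stand-in.

    def halo_loop(run_benchmark, collect_traces, rlm_analyze, apply_patches,
                  eps=0.5, max_iters=10):
        # Skeleton of the trace -> RLM analysis -> code-update loop.
        # All four callables are assumed stand-ins:
        #   run_benchmark()        -> float score on the eval suite
        #   collect_traces()       -> execution traces from the harness
        #   rlm_analyze(traces)    -> suggested prompt/harness patches
        #   apply_patches(patches) -> applies them (e.g. via a coding agent)
        score = run_benchmark()
        for _ in range(max_iters):
            patches = rlm_analyze(collect_traces())
            if not patches:                    # nothing left to fix
                break
            apply_patches(patches)             # update the harness
            new_score = run_benchmark()
            if new_score - score < eps:        # stop once the score plateaus
                break
            score = new_score
        return score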
Omar Khattab retweeted
Pau @hugemensa
v2 for xtr-warp-rs is out, adding sharding support to the indices. The entire search pipeline has been rewritten around efficient transfers and new kernels that enable parallelization and scheduling optimizations, all while staying true to the WARP formula. Details below 👇
2 replies · 2 reposts · 15 likes · 5.7K views
Omar Khattab retweeted
Lakshya A Agrawal @LakshyAAAgrawal
Excited to share that my ICLR 2026 Oral Talk for GEPA is available on YouTube. I go deeper into why GEPA works better than prior optimization techniques and touch on many other aspects of GEPA! youtu.be/HbGah-uP1fI
[YouTube video preview]
Lakshya A Agrawal@LakshyAAAgrawal

Thrilled to present GEPA as an Oral Talk and Poster at ICLR 2026 this Friday in Rio! 🇧🇷
Apr 24: Oral Session 3A (Agents), 10:30 AM BRT, Amphitheater; Poster Session 4, 3:15 PM, Pavilion 3. x.com/LakshyAAAgrawa… Let's recap what's happened since we released GEPA last year 🧵

9 replies · 46 reposts · 240 likes · 29.4K views
Omar Khattab @lateinteraction
GEPA: GrEmlin PAreto prompt optimization?
1 reply · 0 reposts · 20 likes · 1.7K views