

Antaripa Saha (@doesdatmaksense):
this diagram took me a good 1.5 hours😭





What if understanding a video was more like navigating a map?🤔 And what if that made compute scale logarithmically (not linearly) with video length?! New preprint🎉: 🗺️VideoAtlas: Navigating Long-Form Video in Logarithmic Compute
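A hedged sketch of where logarithmic scaling can come from, as a generic hierarchical-navigation illustration, not necessarily VideoAtlas's actual method (the preprint's details aren't in this post, and `Node` and `choose_branch()` are hypothetical names): if clips are pre-summarized into a balanced binary tree, answering a query can walk a single root-to-leaf path, one model call per level, i.e. O(log n) calls for n clips.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    summary: str                      # text summary of the video span covered
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def choose_branch(query: str, left: Node, right: Node) -> Node:
    """Placeholder for one model call that picks the more relevant half."""
    raise NotImplementedError

def navigate(root: Node, query: str) -> tuple[str, int]:
    node, calls = root, 0
    while node.left is not None and node.right is not None:
        node = choose_branch(query, node.left, node.right)  # one call per level
        calls += 1
    # calls == tree depth ≈ log2(#leaf clips), instead of one pass per clip
    return node.summary, calls
```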

Demis Hassabis at YC today: "We're only one or two technical breakthroughs away from AGI. But all the other parts are already in place."


Recursive Language Models (RLMs) have been floating around for a couple of months, but in the last two weeks the discussion has picked up fast, especially alongside ideas like GEPA.

The issue they're trying to address isn't new. When you build agents, the context window becomes a bottleneck pretty quickly, and packing more into the prompt leads to context rot; we know it well. RLMs take a different angle. Instead of treating the input as a fixed blob of text, the model treats it more like an environment it can explore. You give the root model something like a REPL, and now it doesn't have to read everything upfront: it can decide what's worth inspecting and make recursive sub-calls when needed (see the sketch below). So instead of one big forward pass, you get structured computation.

The paper shows RLMs handling inputs up to two orders of magnitude beyond the context window. But on simple retrieval, i.e. "needle in a haystack", there's basically no difference from standard models; the gap only appears once the context gets large (around 16K tokens and beyond), which is expected. Things change with tasks like OOLONG, where the model has to aggregate information across many entries (linear complexity). Vanilla models degrade steadily as the input grows, while RLMs hold up much better: on OOLONG at 132K tokens, base GPT-5 scores 44% while the RLM scores 56.5%, a ~28% relative improvement. Another breakpoint shows up with OOLONG-Pairs, which requires pairwise comparisons (quadratic complexity). Standard models are essentially at zero; RLMs get to ~58% F1. This isn't surprising, since these tasks can't be done in a single forward pass and attention isn't designed for that. On deeper research-style tasks (like browsing large document sets), RLMs also show strong gains, both in accuracy and token efficiency.

One of the more interesting side effects is what people are calling "small-model inversion": with the right recursive setup, smaller models can outperform larger ones on long-context reasoning. There are cases where a GPT-5-mini-based RLM beats GPT-5 on harder splits, and where smaller fine-tuned models outperform much larger ones on million-token tasks. That suggests the bottleneck isn't just model size.

The main thing to keep in mind is that RLMs aren't universally better. On short, simple tasks they don't really add value, but as context length and reasoning complexity increase, the advantage becomes hard to ignore. The OOLONG-Pairs result (~58% vs <0.1% F1) is probably the clearest signal: once a task requires structured computation rather than just pattern matching, giving the model the ability to act over the context changes what it can do.
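To make the REPL idea concrete, here is a minimal sketch. The paper's actual interface is a Python environment over the context; `llm()` is a placeholder for any chat-completion client, and the PEEK/GREP/RECURSE command set is made up for illustration, not the paper's API.

```python
import re

def llm(prompt: str) -> str:
    """Placeholder for a real chat-completion call; swap in any client."""
    raise NotImplementedError

def run_rlm(query: str, context: str, depth: int = 0, max_depth: int = 2) -> str:
    # The root model only ever sees the context's size plus whatever it
    # chooses to inspect, never the whole thing in one prompt.
    state = f"(context is {len(context)} chars; nothing inspected yet)"
    for _ in range(8):  # cap the number of exploration steps
        action = llm(
            f"Query: {query}\nObservation: {state}\n"
            "Respond with exactly one command:\n"
            "PEEK <start> <end>    view a slice of the context\n"
            "GREP <pattern>        list match offsets in the context\n"
            "RECURSE <start> <end> answer the query over a slice via a sub-call\n"
            "ANSWER <text>         give the final answer"
        )
        if action.startswith("PEEK"):
            _, a, b = action.split()
            state = context[int(a):int(b)][:2000]  # cap what re-enters the prompt
        elif action.startswith("GREP"):
            pattern = action[len("GREP "):]
            state = str([m.start() for m in re.finditer(pattern, context)][:20])
        elif action.startswith("RECURSE") and depth < max_depth:
            _, a, b = action.split()
            # recursive sub-call over a slice: structured computation
            # instead of one giant forward pass over the whole input
            state = run_rlm(query, context[int(a):int(b)], depth + 1, max_depth)
        elif action.startswith("ANSWER"):
            return action.removeprefix("ANSWER").strip()
        else:
            state = "(unrecognized command or recursion depth exceeded)"
    return state
```

The point of the sketch is that cost tracks what the model chooses to read, which is how a quadratic task like OOLONG-Pairs can be decomposed into many small sub-calls instead of one attention pass.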













For agentic tasks, Oracle-level performance is the maximum performance a system can achieve, assuming it retrieves all relevant documents perfectly, every time. We're proud to show that Mixedbread Search approaches the Oracle on multiple knowledge-intensive benchmarks.
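A sketch of what the Oracle baseline means, assuming a QA-style benchmark (`retriever`, `answer`, and `score` are hypothetical callables, not Mixedbread's API): run the same reader once with retrieved documents and once with the gold documents, and the gold-document run upper-bounds whatever better retrieval could buy.

```python
def oracle_gap(examples, retriever, answer, score):
    """Compare a retrieval pipeline against its oracle on QA examples."""
    retrieved_acc = oracle_acc = 0.0
    for ex in examples:
        q, gold = ex["question"], ex["gold_answer"]
        retrieved_acc += score(answer(q, retriever(q)), gold)
        oracle_acc += score(answer(q, ex["gold_docs"]), gold)  # perfect retrieval
    n = len(examples)
    # "approaching the Oracle" means the first number nears the second:
    # retrieval has stopped being the bottleneck
    return retrieved_acc / n, oracle_acc / n
```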



Behind the scenes for our memory benchmark (we knew)


we need a new memory benchmark. 3 teams now all reporting 95%+ on LongMemEval
