Antaripa Saha

5.3K posts

@doesdatmaksense

consulting companies in applied ai | doing maths in my free time

Joined November 2018
442 Following · 15.2K Followers
Antaripa Saha@doesdatmaksense·
interestingly, many folks are claiming the memory hype is now over, especially given the way memory has so far been packaged as retrieval in most products. as just a retrieval layer, i believe companies aren't willing to buy another product; they need something more. your thoughts?
Jeff Huber@jeffreyhuber·
2026 is the year of memory, the year of context
jason@jxnlco·
RAG is going to make a huge comeback in about 8 months. Read up on search latency now.
Shashwat Goel@ShashwatGoel7·
What's the best tool to share agent trajectories for our upcoming benchmark release? Want to keep them as easy to navigate (visually) as possible, and huggingface doesn't cut it
Antaripa Saha@doesdatmaksense·
@helloiamleonie is this figma? i am very comfortable with excalidraw, but i suck at figma currently
Leonie@helloiamleonie·
"what model do you use for your visuals?" the model:
Antaripa Saha@doesdatmaksense·
@neural_avb i was looking forward to rlm being applied to multimodal
samyak@smykx·
some good accounts to follow around RLMs and GEPA: @lateinteraction @dosco @LakshyAAAgrawal will update this list as i discover more accounts :)
Quarq@quarqlabs

Recursive Language Models (RLMs) have been floating around for a couple of months, but in the last two weeks the discussion has picked up fast, especially alongside ideas like GEPA. The issue they're trying to address isn't new: when you build agents, the context window becomes a bottleneck quickly, and packing more into the prompt leads to context rot — we know it well. RLMs take a different angle. Instead of treating the input as a fixed blob of text, the model treats it more like an environment it can explore. You give the root model something like a REPL, and now it doesn't have to read everything upfront; it can decide what's worth inspecting and make recursive sub-calls when needed. So instead of one big forward pass, you get structured computation.

The paper shows RLMs handling inputs up to two orders of magnitude beyond the context window. On simple retrieval, i.e. "needle in a haystack", there's basically no difference compared to standard models; the gap only appears once context gets large (around 16K tokens and beyond), which is expected. Things change with tasks like OOLONG, where the model has to aggregate information across many entries (linear complexity). Vanilla models degrade steadily as the input grows, while RLMs hold up much better: on OOLONG at 132K tokens, base GPT-5 scores 44% while the RLM scores 56.5%, a ~28% relative improvement. Another breakpoint shows up with OOLONG-Pairs, which requires pairwise comparisons (quadratic complexity). Standard models are essentially at zero; RLMs get to ~58% F1. This isn't surprising, as these tasks can't be done in a single forward pass — attention isn't designed for that. On deeper research-style tasks (like browsing large document sets), RLMs also show strong gains, both in accuracy and token efficiency.

One of the more interesting side effects is what people are calling "small model inversion": with the right recursive setup, smaller models can outperform larger ones on long-context reasoning. There are cases where a GPT-5-mini-based RLM beats GPT-5 on harder splits, and where smaller fine-tuned models outperform much larger ones on million-token tasks. That suggests the bottleneck isn't just model size. The main thing to keep in mind is that RLMs aren't universally better. On short, simple tasks, they don't really add value, but as context length and reasoning complexity increase, the advantage becomes hard to ignore. The OOLONG-Pairs result (~58% vs <0.1%) is probably the clearest signal: once a task requires structured computation rather than just pattern matching, giving the model the ability to act over the context changes what it can do.
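The root-call-plus-recursive-sub-calls idea above can be sketched in a few lines. This is a map-reduce-style illustration only, not the paper's actual REPL mechanism, and `llm`, `rlm_answer`, `window`, and the "count" question format are all toy stand-ins so the example runs without a model API:

```python
# Minimal sketch of the recursive idea behind RLMs: when the context
# exceeds the window, the root call splits the context, issues recursive
# sub-calls, and aggregates the partial answers.
# `llm` is a toy stand-in for a real model call (hypothetical, not an API).

def llm(question: str, context: str) -> int:
    """Toy 'model': answers "count 'X'" questions by literal counting."""
    needle = question.split("'")[1]
    return context.count(needle)

def rlm_answer(question: str, context: str, window: int = 150) -> int:
    # Base case: the context fits in the (toy) window -> one forward pass.
    if len(context) <= window:
        return llm(question, context)
    # Recursive case: treat the long context as an environment, split it,
    # recurse on each piece, and aggregate the partial answers.
    chunks = [context[i:i + window] for i in range(0, len(context), window)]
    return sum(rlm_answer(question, chunk, window) for chunk in chunks)

# Usage: a context 10x the toy window is still answered correctly.
record = "hay hay needle\n"   # 15 chars, so the 150-char window never splits a record
context = record * 100        # 1500 chars, 100 needles
print(rlm_answer("count 'needle'", context))  # -> 100
```

In the actual RLM setup the root model itself decides how to decompose the input (via REPL commands), rather than following a fixed chunking like this sketch does.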

himanshu@himanshustwts·
The Never Ending Lore of Harness w/ @Vtrivedy10 ⚡️ Viv brought some amazing perspective around harness design, evals, file systems, RL envs, and so much more.
0:00:00 - Intro
0:03:30 - PhD at Temple University
0:12:42 - Lockheed Martin and @MeekMill lore
0:16:18 - Building at @LangChain, DeepAgents, R&D in open source
0:25:09 - Secrets behind harness design, the filesystem as the most foundational harness primitive
0:42:25 - Trajectories for continual learning, where and where NOT to RL, skills and context rot
0:56:30 - Harness engineering: OpenClaw and Hermes, why you might not need a Claw, custom harnesses, quick fire (@skeptrune lore)
01:12:15 - Meta-harness, self-improvement loop, simulation-as-a-service, RL envs (computer-use, long-horizon)
01:28:30 - Advice to anyone 20yo / starting out in college
Pratim🥑@BhosalePratim·
My team at @GradiumAI is growing. We are actively hiring to grow our Developer Relations, Communications, DevEx, Content, Community, and overall team. This IS the time to join us. Other than all basic perks of good comp, visa support, yada-yada, for me, the real perk has been unlimited access to the most cracked and OG set of researchers and developers. My DMs are open.
Antaripa Saha reposted
Antoine Chaffin@antoine_chaffin·
The new generation of open state-of-the-art single and multi-vector retrieval models is here It's time, DenseOn with the LateOn 🎶 @LightOnIO releases models that leap past existing ones, and everything you need to do the same!
Antaripa Saha@doesdatmaksense·
@Vtrivedy10 have you checked out slate coding agent by akira? cool stuff inspired by rlm.
Viv@Vtrivedy10·
looking to update my priors on how the following are being used in practice today! 1. RLMs 2. Computer Use seeing some posts that teams have found nice implementations for RLMs and new models + tooling (?) are great any teams using these day to day? if so for what?! has there been a step change from a couple months ago on the best-in class implementation?
Pratty@pratty_agi·
Cracked team cracked product. Go upvote!
Arvid Kahl@arvidkahl·
I wish Claude Code would automatically include the prompts + the full context in a repo as a git commit "note". The full conversation that led from zero to feature, and I want it in git blame, not just who did it, but what the conversation was. Is there something like this?
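Git itself already has a mechanism close to what's asked for: notes attached to commits. A minimal sketch, assuming the repo name, identity values, and note text are all hypothetical — only the git commands themselves are standard:

```shell
# Create a throwaway repo with a commit to annotate.
git init -q demo && cd demo
git config user.name demo
git config user.email demo@example.com
git commit --allow-empty -q -m "add feature"

# Attach the prompts/conversation that produced the commit as a note.
# Notes live on refs/notes/commits, separate from the commit itself.
git notes add -m "prompt: implement feature X with retries" HEAD

# Read the note back later, next to the normal history.
git notes show HEAD
git log -1 --show-notes
```

A tool (or hook) would still be needed to capture the conversation automatically and feed it into `git notes add`; git only provides the storage and the `git log`/`git blame`-adjacent retrieval.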
Antaripa Saha@doesdatmaksense·
always set a high bar, but i don't feel it's necessary to bring down people who might feel proud that they cracked a 30k internship; eventually they can climb the salary ladder too. secondly, maybe guide people instead of just saying "lock in and you will crack 1.5 lpm in an internship". people definitely do earn this much or more in internships, but that's not just from locking in — those people mostly share their work on socials, have great networks, and many times it's also a factor of luck.
Antaripa Saha@doesdatmaksense·
@russiaman @neural_avb there are many products in the memory domain, and letta (whose memgpt paper was one of the first) is among them. there are other companies like cognee (mostly kg- and ontology-based, i assume), memories.ai (video-based memory doing brilliant work), zep, hyperspell, and a couple more.
namaissur@russiaman·
@neural_avb For me, their post was useful because, while reading the comments, I learned about honcho and letta — other long-term memory solutions :)
Antoine Chaffin@antoine_chaffin·
Scaled-up multimodal late interaction truly is scary Once cracked, the real challenge is building benches that won't get instantly saturated Congrats to the @mixedbreadai team for the stellar results, very happy to have been #1 for a few days and to still be somewhat within reach with my dear ModernBERT-based model!
Mixedbread@mixedbreadai

For Agentic tasks, Oracle-level performance is the maximum performance a system can achieve, assuming it is able to retrieve all relevant documents perfectly, every time. We're proud to show that Mixedbread Search approaches the Oracle on multiple knowledge intensive benchmarks.
