jonah



I'm so excited to introduce this! We've worked on a million different moving parts to produce this. I'm fairly confident it's the best multimodal model that exists, period -- and it's not too shabby at pushing back the LIMITs of retrieval either...

Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.


a significant % of ml researchers might be hooked by what happened in ONE day. ai seems to be doing the research loop fascinatingly well (understand the problem → propose a change → train/test it → measure results → keep the better version → repeat) and genuinely reducing research friction. we are early to automated experimentation; frontier scale could be an interesting watch.
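The loop in that post can be sketched as a simple propose/evaluate/keep hill-climb. This is a hypothetical toy: the lambda "metric" and the `research_loop` helper are made up for illustration, standing in for real model training and evaluation.

```python
import random

def research_loop(train_and_eval, propose_change, baseline, steps=50):
    # Hill-climb: propose a change, train/test it, measure, keep if better.
    best, best_score = baseline, train_and_eval(baseline)
    for _ in range(steps):
        candidate = propose_change(best)     # propose a change
        score = train_and_eval(candidate)    # train/test it + measure results
        if score > best_score:               # keep the better version
            best, best_score = candidate, score
    return best, best_score                  # ...and repeat

# Toy stand-ins: a fake "metric" peaking at config=3.0, random tweaks.
random.seed(0)
train_and_eval = lambda cfg: -(cfg - 3.0) ** 2
propose_change = lambda cfg: cfg + random.uniform(-1.0, 1.0)

best, score = research_loop(train_and_eval, propose_change, baseline=0.0)
```

The automation story is exactly that each arrow in the loop becomes a function call the system can run without a human in the middle.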


Wow. It’s absolutely preposterous that ColBERTv2, a 100M parameter retriever, still fricking outperforms Qwen3-Embed-8B, an 80x bigger dense retriever. ColBERTv2 was trained by one dude in 2021 on 4 A100s for 4 days, on top of puny BERT-base. Single-vector models hold IR back.
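For context on why a tiny multi-vector model can beat a huge dense one: ColBERT scores with late interaction (MaxSim), which mean-pooled single vectors cannot express. A minimal NumPy sketch (illustrative only, not ColBERT's actual code; the toy vectors are made up):

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    # Late interaction: each query token takes its best-matching doc token.
    sims = query_vecs @ doc_vecs.T        # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())  # MaxSim per query token, summed

def pooled_score(query_vecs, doc_vecs):
    # Single-vector baseline: mean-pool tokens, then one dot product.
    return float(query_vecs.mean(axis=0) @ doc_vecs.mean(axis=0))

q  = np.array([[1., 0.], [0., 1.]])   # query with two distinct "tokens"
d1 = np.array([[1., 0.], [0., 1.]])   # doc matching both query tokens
d2 = np.array([[1., 0.], [1., 0.]])   # doc matching only the first

print(maxsim_score(q, d1), maxsim_score(q, d2))  # 2.0 1.0
print(pooled_score(q, d1), pooled_score(q, d2))  # 0.5 0.5
```

Note the pooled scores tie: averaging collapses the per-token structure that lets MaxSim tell the two documents apart.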


wait, no one's gonna talk about how shit openclaw's code is?


none of these racist comments are going to stop me from enjoying a steak and red wine valentine’s day dinner with my extremely attractive (even by korean standards) korean girlfriend who i love very much and maybe if you all took my advice you could find this happiness too


openai cafe might have the best slice in sf



Huge opening for a tiny native “select text to explain it, with the rest of the paper as context” tool. It would go extremely hard if done well.


Multi-vector embeddings (ColBERT, ColPali) are budget killers. But MUVERA can cut your memory footprint by 70%.

Multi-vector models offer incredible retrieval quality but suffer from massive memory overhead and slow indexing. MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) compresses these into single, fixed-dimensional vectors.

How it works: MUVERA condenses a sequence of vectors (e.g., 100x96d) into one vector via:
1️⃣ Space Partitioning: Groups vectors into buckets using SimHash or k-means clustering.
2️⃣ Dimensionality Reduction: Applies random linear projection to compress each sub-vector while preserving dot products.
3️⃣ Repetitions: Repeats the process multiple times and concatenates results to improve accuracy.
4️⃣ Final Projection: Optional final compression (not used in Weaviate's implementation).

The impact (LoTTE benchmark):
- Memory: 12GB → <1GB
- Indexing: 20+ mins → 3-6 mins
- HNSW graph: 99% smaller

There’s a trade-off: you trade a slight dip in raw recall for massive efficiency gains. However, by tuning the HNSW `ef` parameter (e.g., `ef=512`), you can recover 80-90%+ recall while keeping costs low.

When should you use MUVERA?
→ Large-scale production RAG
→ Systems where memory/infrastructure costs are the direct bottleneck
→ Use cases requiring fast indexing

MUVERA in @weaviate_io 1.31+ takes just a couple of lines of code. You can tune three parameters (k_sim, d_proj, r_reps) to balance memory usage and retrieval accuracy for your specific use case.

Read the full technical deep-dive here: weaviate.io/blog/muvera?ut…
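The four steps above can be sketched as a minimal fixed-dimensional-encoding routine. This is a simplified illustration of the idea, not Weaviate's implementation: it uses SimHash for the partitioning step, skips the optional final projection, and borrows the parameter names (k_sim, d_proj, r_reps) and the 100x96d example from the post.

```python
import numpy as np

def fde(vectors, k_sim=3, d_proj=8, r_reps=4, seed=0):
    """MUVERA-style Fixed Dimensional Encoding sketch.

    vectors: (n, d) array of a document's multi-vector embeddings.
    Returns one vector of length r_reps * 2**k_sim * d_proj.
    """
    rng = np.random.default_rng(seed)
    n, d = vectors.shape
    n_buckets = 2 ** k_sim
    parts = []
    for _ in range(r_reps):
        # 1) Space partitioning via SimHash: k_sim random hyperplanes
        #    give each vector a k_sim-bit bucket id.
        planes = rng.standard_normal((d, k_sim))
        bits = (vectors @ planes > 0).astype(int)
        bucket_ids = bits @ (2 ** np.arange(k_sim))
        # 2) Dimensionality reduction: random projection that preserves
        #    dot products in expectation.
        proj = rng.standard_normal((d, d_proj)) / np.sqrt(d_proj)
        buckets = np.zeros((n_buckets, d_proj))
        np.add.at(buckets, bucket_ids, vectors @ proj)  # sum per bucket
        # 3) Repetitions: concatenate each repetition's encoding.
        parts.append(buckets.ravel())
    return np.concatenate(parts)

doc = np.random.default_rng(1).standard_normal((100, 96))  # 100 x 96d tokens
v = fde(doc)
print(v.shape)  # (256,) = r_reps * 2**k_sim * d_proj = 4 * 8 * 8
```

Because queries and documents share the same random planes and projections (same seed), the dot product of two FDEs approximates the multi-vector similarity, which is what makes single-vector HNSW indexing possible.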




DeepSeek OCR 2. First reaction: it uses Qwen2-0.5B. Qwen2 came out in June 2024. If they had started this work ≤1 year ago, they'd have used at least Qwen2.5 (September 2024). To me this confirms that the OCR series was done in the DeepSeek-V2 era. It's uncanny.










