jonah

1.6K posts

jonah

@drexalt

sparse retrieval

gangnam Joined December 2014
1.5K Following 574 Followers
gautham (@capemox)
@drexalt @mattjustram @yjoonjang DupMAE has a technique to train both the CLS token and the other regular tokens. I agree that training the CLS may not be useful at all, but the others might be?
gautham (@capemox)
Wow, ettin does NOT like being fine-tuned for SPLADE. Distilling didn't work at all: it's either representation collapse or not sparse enough. Trying a few different loss functions now; I've tried MarginMSE so far.
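For readers following along, here is a minimal plain-Python sketch of the two pieces this tweet is balancing: a MarginMSE distillation loss (match the teacher's positive-minus-negative score margin) and a FLOPS-style sparsity regularizer of the kind SPLADE training commonly uses. The function names and the dict-based sparse vectors are assumptions made for illustration; this is not the code used in the thread.

```python
def sparse_dot(q, d):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(q) > len(d):
        q, d = d, q  # iterate over the smaller dict
    return sum(w * d.get(t, 0.0) for t, w in q.items())


def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins."""
    pairs = zip(student_pos, student_neg, teacher_pos, teacher_neg)
    errors = [((sp - sn) - (tp - tn)) ** 2 for sp, sn, tp, tn in pairs]
    return sum(errors) / len(errors)


def flops_reg(batch):
    """FLOPS-style penalty: sum over terms of (mean |weight| across the batch)^2."""
    totals = {}
    for vec in batch:
        for term, w in vec.items():
            totals[term] = totals.get(term, 0.0) + abs(w)
    n = len(batch)
    return sum((s / n) ** 2 for s in totals.values())


# Hypothetical usage: total loss = distillation term + lambda * sparsity term.
query = {"sparse": 1.0, "retrieval": 2.0}
pos_doc = {"retrieval": 3.0}
neg_doc = {"chicken": 1.0}
student_pos = [sparse_dot(query, pos_doc)]
student_neg = [sparse_dot(query, neg_doc)]
loss = margin_mse(student_pos, student_neg, [8.0], [1.0]) + 0.01 * flops_reg([pos_doc, neg_doc])
```

If the regularizer weight is too low the vectors stay dense, and if it is too high the weights can shrink toward zero, which is one plausible reading of the "collapse or not sparse enough" trade-off described above.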
jonah (@drexalt)
@mattjustram @capemox @yjoonjang agree CLS is not irrelevant, maybe some traumatic DupMAE/LexMAE experiences leaking through haha. but generally I am wondering the exact opposite: whether MaxSim pretraining is comparable to or better than DupMAE for SPLADE. I don't think that has been tested
Jheng-Hong Yang (@mattjustram)
probably still worth a try? even if the MLM head gets stripped, a better LM prior/objective could still better season the latent token reps ColBERT uses and generalize better. and IIRC [CLS] / special-token reps still participate in MaxSim, so the CLS side isn’t totally irrelevant either. LateOn-style contrastive pretrain is probably more directly aligned for tasks, but I wouldn’t rule DupMAE out. (it feels like pre-train vs post-train stuff)
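As a reference point for the MaxSim discussion, here is a minimal plain-Python sketch of late-interaction scoring: each query token embedding takes its maximum dot product over all document token embeddings, and those maxima are summed. The sketch treats every position the same way, so a [CLS] or other special-token vector would contribute like any other token if it is kept in the lists. Whether a given ColBERT implementation keeps or masks those positions is an assumption to check per library.

```python
def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    score = 0.0
    for q in query_vecs:
        # Best dot product between this query token and any document token.
        score += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return score


# Toy 2-dimensional embeddings (hypothetical values).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [0.5, 0.5]]
score = maxsim(query_vecs, doc_vecs)
```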
jonah (@drexalt)
@capemox @mattjustram @yjoonjang I am pretty sure it would not be, since ColBERT isn’t using the MLM head anymore and the CLS token-maxxing doesn’t seem super aligned. I think the LateOn contrastive pretraining is probably stronger
gautham (@capemox)
@mattjustram Wow I totally missed this lmao. You're right, this could defo be the issue. I might try some DupMAE-style or contrastive pretraining to see if it helps
Simo Ryu (@cloneofsimo)
I'm back in Korea, and it's absolutely wild in comparison that it only costs about $10 to deliver SOTA chicken (I recommend bhc), including tips / tax / fee. People will have an epically good time here at the upcoming ICML
jonah (@drexalt)
@antoine_chaffin @ManuelFaysse the pi-serini results show off the gpt5.5 dominance really well x.com/mattjustram/st…
Jheng-Hong Yang (@mattjustram)

someone already wrote a love letter to pi, by @badlogicgames. so we wrote a love paper to pi :) with my teammates @xuzihuan4 and @lintool. a few days ago, i promised i’d share some fun plots once Pi-Serini joined the BrowseComp-Plus deep research agent party. now, it’s about time. here weeeee goooooo. bear with the sloppy images first. the serious one is at the end. the question was simple: how far can we push deep research with BM25 + pi? turns out: weirdly far.

Antoine Chaffin (@antoine_chaffin)
ok R2 jokes aside, I tried running it but the bench is totally saturated at this point with GPT-5 (beating current top-1/2 is just good variance from the judge). Since it's also a bit expensive to try, I figured it's more interesting to try getting the best performance with a "small" OS model x.com/HdArgentre/sta…
Antoine Chaffin (@antoine_chaffin)
Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models 54× bigger. Not bad for a 1-year-old model not optimized for deep research. What if we actually tried? Introducing Agent-ModernColBERT: adding another 10% on top with a 5 min training
jonah retweeted
Mixedbread (@mixedbreadai)
Introducing mxbai-rerank-v3-listwise: reranking that goes beyond binary relevance. It reads the whole candidate set, resolves conflicts, and ranks by directives like recency, source priority, and multi-step rules. +11% NDCG@10 on average across multiple domains, modalities, and languages in runs with Wholembed v3. Available today in preview in Mixedbread.
jonah (@drexalt)
@lateinteraction I think they mention clustering on 64 cores and retrieval on a single core. pretty nuts
Omar Khattab (@lateinteraction)
which is the lowest I’ve seen if this is on a single-core or even few-core CPU
jonah (@drexalt)
@menhguin Korean boomers are degen pumping quantum computing stocks and crypto lmao
Minh Nhat Nguyen (@menhguin)
@drexalt in singapore, there's lots of boomers investing in the STI, which has historically underperformed the S&P 500
Minh Nhat Nguyen (@menhguin)
imagine being a south korean boomer dollar cost averaging in local stocks then one day ur son tells u he just doubled his money w his first paycheck
Sigrid Jin 🌈🙏 (@realsigridjin)
how should I text my mom to share that I got featured in the NYT
jonah (@drexalt)
Researchers in Asia have something incredible to wake up to tomorrow, glad I stayed up :D Amazing release. PhD students around the world should rejoice at the open dataset; it is really, really impressive. Great work goats 🫡
Antoine Chaffin (@antoine_chaffin)

The new generation of open state-of-the-art single and multi-vector retrieval models is here. It's time, DenseOn with the LateOn 🎶 @LightOnIO releases models that leap past existing ones, and everything you need to do the same!

jonah (@drexalt)
@vikhyatk this is the guy that didn’t want to buy Berkeley mono?
Sumit (@_reachsumit)
Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval
Proposes a Stratified Sampling strategy that uniformly covers the entire teacher score spectrum, outperforming top-K and random sampling.
📝 arxiv.org/abs/2604.04734
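To make the idea in this summary concrete, here is a rough plain-Python sketch of sampling distillation candidates so that they span the whole teacher score range, instead of taking only the top-K. It splits the observed score range into equal-width bins and draws one candidate per non-empty bin. The binning scheme, function name, and parameters are assumptions for illustration; the paper's actual procedure may differ.

```python
import random


def stratified_sample(scored, k, seed=0):
    """Pick up to k doc ids, one from each equal-width teacher-score bin.

    scored: non-empty list of (doc_id, teacher_score) pairs.
    """
    rng = random.Random(seed)
    lo = min(s for _, s in scored)
    hi = max(s for _, s in scored)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all scores are equal
    bins = [[] for _ in range(k)]
    for doc_id, s in scored:
        idx = min(int((s - lo) / width), k - 1)  # clamp the top score into the last bin
        bins[idx].append(doc_id)
    # Empty bins are skipped, so fewer than k ids may come back.
    return [rng.choice(b) for b in bins if b]


# Hypothetical candidates with teacher scores 0.0, 0.1, ..., 0.9.
candidates = [(i, i / 10) for i in range(10)]
picks = stratified_sample(candidates, k=5)
```

By contrast, top-K sampling would return only the highest-scoring candidates, so the student never sees the low and middle parts of the teacher's score distribution.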
jonah (@drexalt)
@din0s_ unbelievable find in alphaxiv, I checked @_reachsumit first, will retweet his when he adds it lol
dinos (@din0s_)
@drexalt did you just manifest this bruh