gautham

397 posts


@capemox

distributed search @couchbase | @blevesearch member, @huggingface contributor | contrastive loss enjoyer

Bengaluru, India · Joined July 2023
434 Following · 85 Followers
Pinned Tweet
gautham @capemox
Pretty big release of bleve out! GPU-accelerated vector search (I helped on this hehe), binary quantization, and a lot more! github.com/blevesearch/bl…
gautham @capemox
@drexalt @mattjustram @yjoonjang DupMAE has a technique to train both the CLS token and the other regular tokens. I agree that the CLS training may not be useful at all, but the others might be?
jonah @drexalt
@capemox @mattjustram @yjoonjang I am pretty sure it would not be, since ColBERT isn’t using the MLM head anymore and the CLS token-maxxing doesn’t seem super aligned. I think the LateOn contrastive pretraining is probably stronger.
gautham @capemox
Wow, ettin does NOT like being fine-tuned for SPLADE. Distilling didn't work at all: it's either representation collapse or not sparse enough. Trying a few different loss functions now; I've tried MarginMSE so far.
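For readers unfamiliar with the loss named above, here is a minimal sketch of MarginMSE distillation paired with a FLOPS-style sparsity penalty, a combination commonly used when training SPLADE-like models. The function names and the toy numbers are illustrative assumptions; the thread does not describe the actual training setup.

```python
from typing import List


def margin_mse(student_pos: List[float], student_neg: List[float],
               teacher_pos: List[float], teacher_neg: List[float]) -> float:
    """Mean squared error between student and teacher score margins."""
    total = 0.0
    for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        # Margin = score(query, positive) - score(query, negative)
        total += ((sp - sn) - (tp - tn)) ** 2
    return total / len(student_pos)


def flops_penalty(batch_weights: List[List[float]]) -> float:
    """FLOPS-style regularizer: sum over vocab terms of (batch-mean |weight|)^2."""
    n = len(batch_weights)
    vocab = len(batch_weights[0])
    return sum(
        (sum(abs(row[j]) for row in batch_weights) / n) ** 2
        for j in range(vocab)
    )


# Toy example (hypothetical values): one triple, two sparse vectors over a 2-term vocab.
distill = margin_mse([2.0], [1.0], [3.0], [1.0])
sparsity = flops_penalty([[1.0, 0.0], [3.0, 0.0]])
```

The two failure modes mentioned in the tweet map loosely onto the balance between these terms: too heavy a sparsity penalty can push all weights toward zero, while too light a penalty leaves the vectors dense.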
gautham @capemox
Would have loved to be more aggressive in terms of batch size and amount of data but I'm kinda GPU poor (my poor RTX 4060m)
gautham @capemox
I've released an ettin 68m checkpoint with contrastive pretraining! It's a much better starting point for downstream embedding tasks. Will be releasing similar checkpoints for the other smaller encoders in ettin. Check out the pretrain in the link in the comments
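As a rough illustration of what contrastive pretraining optimizes, here is a sketch of an in-batch InfoNCE loss over a query-document similarity matrix. The tweet does not say which objective or temperature was used, so the function name, the default temperature, and the toy matrices are assumptions.

```python
import math
from typing import List


def info_nce(sim: List[List[float]], temperature: float = 0.05) -> float:
    """In-batch contrastive loss.

    sim[i][j] is the similarity of query i to document j; the positive
    for query i is assumed to sit on the diagonal (document i).
    """
    loss = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]  # -log softmax at the positive
    return loss / len(sim)


# Uninformative similarities give log(batch_size); well-separated ones approach 0.
uniform = info_nce([[1.0, 1.0], [1.0, 1.0]], temperature=1.0)
separated = info_nce([[10.0, 0.0], [0.0, 10.0]], temperature=1.0)
```

Larger batches supply more in-batch negatives per query, which is one reason batch size matters for this kind of training.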
gautham @capemox
Can't believe they're following the Samsung Galaxy naming scheme lmao
Antoine Chaffin @antoine_chaffin
@capemox @mattjustram This is false though. We trained both « pure » encoders and « pure » decoders. We also released decoders turned into encoders through MNTP continued pre-training, but those aren’t the base encoders.
adamame 🌾 @TheAdamEvans
@capemox creates a much smaller, fixed (sparse) dim to work with downstream; basically turns it into an encoder for you. Might have better isotropic properties and behave better for your use case 🤷‍♂️ Gonna read those papers @yjoonjang linked, seems like that's pretty much it.
Youngjoon Jang @yjoonjang
@TheAdamEvans @capemox Yeah, that's what the paper says. But if you start with Qwen, I think you'll need MNTP training for bidirectional attention (that's what the authors did).
gautham @capemox
@TheAdamEvans zamn, tell me more. Basically, layer pruning Qwen3.5-2B, right? But why would qwen scope be useful? That's for interpretability, right?
adamame 🌾 @TheAdamEvans
@capemox what if you used Qwen3.5-2B W32K-L0_100 at layer 11 or 15 as a starting point? Not exactly tiny but not huge either, and qwen scope outputs are oh-so-smooth semantically.
gautham @capemox
@mattjustram Wow, I totally missed this lmao. You're right, this could defo be the issue. I might try some DupMAE-style or contrastive pretraining to see if it helps.
Jheng-Hong Yang @mattjustram
@capemox i just realized they further trained the encoders with MNTP. i think it ruined the MLM init for SPLADE
gautham @capemox
@mattjustram encoders. I've tried the smaller ones, up to 150M params.
gautham @capemox
With CC now having auto mode on pro, I can just let it loop every 5 mins to check if the run is going well. It just wished me good night.
gautham @capemox
Workshop I wanted to submit to doesn't have remote attendance