Arthur Câmara

50.1K posts

@ArthurCamara

Applied IR & NLP Research @ZetaVector. Making search good | CS PhD @tudelft, ex-@naverlabseurope, @bloomberg | #T1D #ADHD | CNF✈️AMS (he/him)

The Hague, The Netherlands · Joined February 2008
832 Following · 1.1K Followers
Arthur Câmara@ArthurCamara·
@bclavie @minhash Hmmm then we can keep only the positives in the top-K from both methods? This is also a good excuse to make the code I used available and add it as the first issue. 😬 I will try to do it later today.
Ben Clavié@bclavie·
@ArthurCamara @minhash Would it be worth taking an unconventional route and downsampling the positives maybe? It’s a very minor thing but I’d love to be able to get some of TrecC’s signal in nano form 🤔 Happy to help if you ever want to explore it!
Arthur Câmara@ArthurCamara·
Thanks @minhash for clearing one of the long-standing things on my to-do list! NanoBEIR scores should be higher than the full BEIR scores (it is only a subset of the full corpora, so it's "easier"), but it should correlate quite well with the full scores.
minhash@minhash

When cooking your own embedding model, it's necessary to have a quick evaluation set to validate your ideas. That's what I was in need of when trying my own set of experiments, when I found @ZetaVector's NanoBEIR set. It's perfect! A subset of BEIR to validate ideas on~ Though one thing missing for me to use it was how correlated were scores on NanoBEIR to those of BEIR? I didn't find this metric on their blog, so I decided to calculate it myself with a few models. Generally, from what I see on a limited set of models that offered BEIR scores publicly and calculating their NanoBEIR scores myself, the correlation is ~99%, which is great! The scores come out to be on the higher end usually, so that score can't be compared against BEIR score, but to check on what works and what doesn't, it's good enough. Then again, STSBenchmark scores are said to be ~70% correlated too—which was my previous "quick" evaluation set.

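The ~99% figure is a plain Pearson correlation between per-model average scores on the two benchmarks. A minimal sketch of that calculation in Python — the score values below are placeholders to illustrate the computation, not real published benchmark numbers:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder average nDCG@10 per model (hypothetical numbers):
beir_scores = [0.441, 0.503, 0.528, 0.547]      # full BEIR
nanobeir_scores = [0.472, 0.538, 0.561, 0.585]  # NanoBEIR (typically higher)
r = pearson(beir_scores, nanobeir_scores)
```

As the thread notes, NanoBEIR scores run higher in absolute terms, so it is the correlation across models (does model A still beat model B?) that carries signal, not the raw score values.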
Arthur Câmara@ArthurCamara·
@bclavie @minhash That and the fact that it has so many positives that it would make the corpus too large compared to the other datasets. It didn’t help that the size was also breaking my pipeline. 😅 I spent about a week trying to make it work on a single A100
Ben Clavié@bclavie·
@ArthurCamara @minhash One question I had about NanoBEIR: did you exclude TREC-COVID purposefully because it only has 50 queries anyways? That would make sense, but given TREC's overall quality it initially caught me off guard haha.
Arthur Câmara@ArthurCamara·
We created it by randomly sampling 50 queries per dataset. The corpora are the set of all positives and the intersection of the top-100 documents retrieved by Pyserini's BM25 and Arctic-Embed-1.5-m.
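That recipe can be sketched as follows. This is an illustrative reconstruction, not the actual NanoBEIR code: the function name and the qrels/run dictionary formats are assumptions.

```python
import random

def build_nano_split(query_ids, qrels, bm25_top100, dense_top100,
                     n_queries=50, seed=42):
    """Sample n_queries queries; the corpus is all judged positives plus the
    intersection of the two retrievers' top-100 results per sampled query.

    qrels: {qid: {docid: relevance}}; *_top100: {qid: [docid, ...]}.
    """
    rng = random.Random(seed)
    sampled = rng.sample(sorted(query_ids), min(n_queries, len(query_ids)))
    corpus = set()
    for qid in sampled:
        # All positives for the query, so recall-oriented metrics stay meaningful.
        corpus |= {d for d, rel in qrels.get(qid, {}).items() if rel > 0}
        # Hard negatives: documents both a sparse and a dense retriever rank highly.
        corpus |= set(bm25_top100.get(qid, [])) & set(dense_top100.get(qid, []))
    return sampled, corpus
```

Taking the intersection (rather than the union) of the two runs keeps the corpus small while retaining documents that both retrieval paradigms consider plausible, i.e. the hardest distractors.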
Arthur Câmara@ArthurCamara·
@minhash @ZetaVector Hey, thanks for the shout-out! Indeed, the scores are supposed to be higher (it is an easier set after all, with far fewer documents than the full collection), but the scores should correlate quite well with BEIR!
Arthur Câmara reposted
Logan Kilpatrick@OfficialLoganK·
The world is going to look shockingly similar in 5 years, despite massive technological innovation enabled by AI.
Arthur Câmara reposted
Doug Turnbull@softwaredoug·
I need a meeting response that's "I'm attending but working at the same time. Say my name 3 times to summon me for a question"
itsaflecha@itsaflecha·
I don't know what the hell happened, but every storage drive on Amazon is now shipping after Christmas, sale or no sale. Come on Xi, bring Taiwan back and put that bunch to work.
Arthur Câmara@ArthurCamara·
@Robro612 I’ve been bullish on listwise learning for a while. Too bad there are not many training datasets with multiple relevance annotations per query. Synthetic data to the rescue?
Rohan Jha@Robro612·
My knee-jerk reaction to this was "surely they're talking about the lost-in-the-middle phenomenon affecting listwise rerankers." Interesting that the authors propose listwise as a path forward! Time to revisit some assumptions present in a proposal I'm writing...
Mathew Jacob@mat_jacob1002

It's time to revisit common assumptions in IR! Embeddings have improved drastically, but mainstream IR evals have stagnated since MSMARCO and BEIR. We ask: on private or tricky IR tasks, are current rerankers even better? Surely, reranking as many docs as you can afford is best?

Arthur Câmara@ArthurCamara·
@GergelyOrosz Of course, if you just open the chat, ask the full question to the LLM and copy-and-paste their answer without critical thinking, that’s a red flag, and we will ask you to thoroughly explain how that code works. Other than that, all good.
Arthur Câmara@ArthurCamara·
@GergelyOrosz We explicitly tell people to use whatever tools they are used to, including cursor/copilot/etc. The reasoning is the same. Everyone here uses it. You will probably use it. We just want to see how you think through the problem and how you come to an answer.
Gergely Orosz@GergelyOrosz·
From researching GenAI's impact on interviews. Quote from a Director of Eng: "We actively encourage engineers at my company to use GenAI tools, day-to-day. Given this: why not permit it during interviews, especially if aiming to make interviews reflect real-world conditions?"
Rohan Jha@Robro612·
.@tomaarsen continues shipping to support retriever development! NanoBEIR has been something that a lot of researchers (myself @JinaAI_ included) have used but never standardized. Would love to know the sampling method that gets the 10k docs — BM25 top 250 + judged worked well for us
tomaarsen@tomaarsen

4️⃣ We added easy evaluation on NanoBEIR, a subset of BEIR a.k.a. the MTEB Retrieval benchmark. Evaluation is fast, and can easily be done during training to track your model's performance on general-purpose information retrieval tasks. 🧵

Arthur Câmara@ArthurCamara·
@tomaarsen @Robro612 @JinaAI_ I can take a deeper look into correlation with other models later this week. It’s something I really want to do, just didn’t have the time to finish yet.
Arthur Câmara@ArthurCamara·
@tomaarsen @Robro612 @JinaAI_ Yes! So, we sampled using Anserini’s BM25 and arctic-embed-1.5-m, and the positives, of course. We haven’t tested the correlation extensively, but for 7B parameters, the correlation was really good. I can share the code we used to create the dataset and the numbers we have.
Arthur Câmara reposted
tomaarsen@tomaarsen·
I just released Sentence Transformers v3.3.0 & it's huge! 4.5x speedup for CPU with OpenVINO int8 static quantization, training with prompts for a free perf. boost, PEFT integration, evaluation on NanoBEIR, and more! Full release notes: github.com/UKPLab/sentenc… Details in 🧵
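The NanoBEIR evaluation shipped in Sentence Transformers v3.3.0 is exposed as `NanoBEIREvaluator`. A minimal sketch of using it — the model and dataset choices are illustrative, and running this downloads the model and the Nano datasets from the Hugging Face Hub:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

# Any embedding model works; this is just a small, common choice.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Evaluate on a subset of the Nano datasets (omit dataset_names to run all).
evaluator = NanoBEIREvaluator(dataset_names=["QuoraRetrieval", "MSMARCO"])
results = evaluator(model)  # dict of retrieval metrics per dataset plus averages
```

Because each Nano dataset is only 50 queries over a ~10k-document corpus, this is cheap enough to run inside a training loop as a periodic validation step, which is the use case the thread describes.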