Arthur Câmara

50.1K posts

@ArthurCamara

Applied IR & NLP Research @ZetaVector. Making search good | CS PhD @tudelft, ex-@naverlabseurope, @bloomberg | #T1D #ADHD | CNF✈️AMS (he/him)

The Hague, The Netherlands · Joined February 2008
832 Following · 1.1K Followers
Arthur Câmara@ArthurCamara·
@bclavie @minhash Hmmm then we can keep only the positives in the top-K from both methods? This is also a good excuse to make the code I used available and add it as the first issue. 😬 I will try to do it later today.
Ben Clavié@bclavie·
@ArthurCamara @minhash Would it be worth taking an unconventional route and downsampling the positives maybe? It’s a very minor thing but I’d love to be able to get some of TrecC’s signal in nano form 🤔 Happy to help if you ever want to explore it!
Arthur Câmara@ArthurCamara·
Thanks @minhash for clearing one of the long-standing things on my to-do list! NanoBEIR scores should be higher than the full BEIR scores (it is only a subset of the full corpora, so it's "easier"), but it should correlate quite well with the full scores.
minhash@minhash

When cooking your own embedding model, it's necessary to have a quick evaluation set to validate your ideas. That's what I was in need of when trying my own set of experiments, when I found @ZetaVector's NanoBEIR set. It's perfect! A subset of BEIR to validate ideas on~ Though one thing missing for me to use it was how correlated were scores on NanoBEIR to those of BEIR? I didn't find this metric on their blog, so I decided to calculate it myself with a few models. Generally, from what I see on a limited set of models that offered BEIR scores publicly and calculating their NanoBEIR scores myself, the correlation is ~99%, which is great! The scores come out to be on the higher end usually, so that score can't be compared against BEIR score, but to check on what works and what doesn't, it's good enough. Then again, STSBenchmark scores are said to be ~70% correlated too—which was my previous "quick" evaluation set.

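The ~99% figure is a plain Pearson correlation between per-model average scores on the two benchmarks. A minimal sketch of that calculation in Python — the score values below are placeholders to illustrate the computation, not real published benchmark numbers:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder average nDCG@10 per model (hypothetical numbers):
beir_scores = [0.441, 0.503, 0.528, 0.547]      # full BEIR
nanobeir_scores = [0.472, 0.538, 0.561, 0.585]  # NanoBEIR (typically higher)
r = pearson(beir_scores, nanobeir_scores)
```

As the thread notes, NanoBEIR scores run higher in absolute terms, so it is the correlation across models (does model A still beat model B?) that carries signal, not the raw score values.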
Arthur Câmara@ArthurCamara·
@bclavie @minhash That and the fact that it has so many positives that it would make the corpus too large compared to the other datasets. It didn’t help that the size was also breaking my pipeline. 😅 I spent about a week trying to make it work on a single A100
Ben Clavié@bclavie·
@ArthurCamara @minhash One question I had about NanoBEIR: did you exclude TREC-COVID purposefully because it only has 50 queries anyways? That would make sense, but given TREC's overall quality it initially caught me off guard haha.
Arthur Câmara@ArthurCamara·
We created it by randomly sampling 50 queries per dataset. The corpora are the set of all positives and the intersection of the top-100 documents retrieved by Pyserini's BM25 and Arctic-Embed-1.5-m.
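That recipe can be sketched as follows. This is an illustrative reconstruction, not the actual NanoBEIR code: the function name and the qrels/run dictionary formats are assumptions.

```python
import random

def build_nano_split(query_ids, qrels, bm25_top100, dense_top100,
                     n_queries=50, seed=42):
    """Sample n_queries queries; the corpus is all judged positives plus the
    intersection of the two retrievers' top-100 results per sampled query.

    qrels: {qid: {docid: relevance}}; *_top100: {qid: [docid, ...]}.
    """
    rng = random.Random(seed)
    sampled = rng.sample(sorted(query_ids), min(n_queries, len(query_ids)))
    corpus = set()
    for qid in sampled:
        # All positives for the query, so recall-oriented metrics stay meaningful.
        corpus |= {d for d, rel in qrels.get(qid, {}).items() if rel > 0}
        # Hard negatives: documents both a sparse and a dense retriever rank highly.
        corpus |= set(bm25_top100.get(qid, [])) & set(dense_top100.get(qid, []))
    return sampled, corpus
```

Taking the intersection (rather than the union) of the two runs keeps the corpus small while retaining documents that both retrieval paradigms consider plausible, i.e. the hardest distractors.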
Arthur Câmara@ArthurCamara·
@minhash @ZetaVector Hey, thanks for the shout-out! Indeed, the scores are supposed to be higher (it is an easier set after all, with far fewer documents than the full collection), but the scores should correlate quite well with BEIR!
Arthur Câmara reposted
Logan Kilpatrick@OfficialLoganK·
The world is going to look shockingly similar in 5 years, despite massive technological innovation enabled by AI.
Arthur Câmara reposted
Doug Turnbull@softwaredoug·
I need a meeting response that's "I'm attending but working at the same time. Say my name 3 times to summon me for a question"
itsaflecha@itsaflecha·
I don't know what the hell happened, but every storage drive on Amazon is now shipping after Christmas, sale or no sale. Come on Xi, bring Taiwan back and put that bunch to work.
Arthur Câmara@ArthurCamara·
@Robro612 I’ve been bullish on listwise learning for a while. Too bad there are not many training datasets with multiple relevance annotations per query. Synthetic data to the rescue?
Rohan Jha@Robro612·
My knee-jerk reaction to this was "surely they're talking about the lost-in-the-middle phenomenon affecting listwise rerankers." Interesting that the authors propose listwise as a path forward! Time to revisit some assumptions present in a proposal I'm writing...
Mathew Jacob@mat_jacob1002

It's time to revisit common assumptions in IR! Embeddings have improved drastically, but mainstream IR evals have stagnated since MSMARCO and BEIR. We ask: on private or tricky IR tasks, are current rerankers even better? Surely, reranking as many docs as you can afford is best?

Arthur Câmara@ArthurCamara·
@GergelyOrosz Of course, if you just open the chat, ask the full question to the LLM and copy-and-paste their answer without critical thinking, that’s a red flag, and we will ask you to thoroughly explain how that code works. Other than that, all good.
Arthur Câmara@ArthurCamara·
@GergelyOrosz We explicitly tell people to use whatever tools they are used to, including cursor/copilot/etc. The reasoning is the same. Everyone here uses it. You will probably use it. We just want to see how you think through the problem and how you come to an answer.
Gergely Orosz@GergelyOrosz·
From researching GenAI's impact on interviews. Quote from a Director of Eng: "We actively encourage engineers at my company to use GenAI tools, day-to-day. Given this: why not permit it during interviews, especially if aiming to make interviews reflect real-world conditions?"
Rohan Jha@Robro612·
.@tomaarsen continues shipping to support retriever development! NanoBEIR has been something that a lot of researchers (myself @JinaAI_ included) have used but never standardized. Would love to know the sampling method that gets the 10k docs — BM25 top 250 + judged worked well for us
tomaarsen@tomaarsen

4️⃣ We added easy evaluation on NanoBEIR, a subset of BEIR a.k.a. the MTEB Retrieval benchmark. Evaluation is fast, and can easily be done during training to track your model's performance on general-purpose information retrieval tasks. 🧵

Arthur Câmara@ArthurCamara·
@tomaarsen @Robro612 @JinaAI_ I can take a deeper look into correlation with other models later this week. It’s something I really want to do, just didn’t have the time to finish yet.
Arthur Câmara@ArthurCamara·
@tomaarsen @Robro612 @JinaAI_ Yes! So, we sampled using Anserini’s BM25 and arctic-embed-1.5-m, and the positives, of course. We haven’t tested the correlation extensively, but for 7B parameters, the correlation was really good. I can share the code we used to create the dataset and the numbers we have.
Arthur Câmara reposted
tomaarsen@tomaarsen·
I just released Sentence Transformers v3.3.0 & it's huge! 4.5x speedup for CPU with OpenVINO int8 static quantization, training with prompts for a free perf. boost, PEFT integration, evaluation on NanoBEIR, and more! Full release notes: github.com/UKPLab/sentenc… Details in 🧵
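The NanoBEIR evaluation shipped in Sentence Transformers v3.3.0 is exposed as `NanoBEIREvaluator`. A minimal sketch of using it — the model and dataset choices are illustrative, and running this downloads the model and the Nano datasets from the Hugging Face Hub:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

# Any embedding model works; this is just a small, common choice.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Evaluate on a subset of the Nano datasets (omit dataset_names to run all).
evaluator = NanoBEIREvaluator(dataset_names=["QuoraRetrieval", "MSMARCO"])
results = evaluator(model)  # dict of retrieval metrics per dataset plus averages
```

Because each Nano dataset is only 50 queries over a ~10k-document corpus, this is cheap enough to run inside a training loop as a periodic validation step, which is the use case the thread describes.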