Franck Lebeau

3.9K posts


@_Kcnarf

#AI in #NLP, dataviz expert, full stack dev, math_art and day-to-day enthusiast; also, PhD in CS, and trying to reduce my environmental footprint

Joined December 2016
411 Following · 563 Followers
Franck Lebeau reposted
Datawrapper @Datawrapper
The Data Vis Dispatch is back in its regular format! 💫 This week is short, with a focus on Venezuela. You'll also find retrospectives on 2025, as well as people's resolutions for 2026.
0 replies · 5 reposts · 44 likes · 2.5K views
Franck Lebeau reposted
Datawrapper @Datawrapper
This week, we celebrate the launch of our new website with a special edition of the Data Vis Dispatch! 🥳 See where you may have come across Datawrapper visualizations before, and have a peek at our brand new website while you're at it. 📊 👀 ✨ datawrapper.de/blog/data-vis-…
0 replies · 10 reposts · 46 likes · 3.2K views
Franck Lebeau reposted
athletic coder @athleticKoder
You’re in a Machine Learning interview at Perplexity, and the interviewer asks: “Why do we need hybrid search? Isn’t vector search with embeddings enough?”

Here’s how you answer. Don’t say “to combine different approaches” or “for better coverage.” Too generic. The real answer is the semantic-lexical gap: your embeddings understand meaning but ignore exact matches. Vector search alone misses the forest for the trees, or worse, the exact product code the user typed.

Here’s why pure vector search fails. Your query is “iPhone 15 Pro Max 256GB.” Vector search returns “iPhone 15 Pro with lots of storage” and “latest flagship phone specs.” But the user wants the EXACT model and the EXACT capacity. Semantic understanding ≠ precision matching.

btw, get this kind of content in your email for free, daily: subscribe to my newsletter at fullstackagents.substack.com

The retrieval failure modes are brutal.

Pure vector search:
> Query: “ML-2847 error code” → Returns: general ML troubleshooting (0% useful)
> Query: “React 18.2.0 breaking changes” → Returns: React 18 overview (no version precision)

Pure keyword search (BM25):
> Query: “how to fix car not starting” → Returns: docs with “car” and “starting”, but about starting a car business

You need both. Always.

The performance gap across real benchmarks:
- BM25 alone: 67% MRR@10
- Dense retrieval alone: 71% MRR@10
- Hybrid (proper fusion): 82% MRR@10

That’s 11 points (roughly a 15% relative improvement) over the “best” single method. In production, that’s thousands of better answers per day.

The fundamental tradeoff everyone misses:
> BM25 (sparse vectors): term-frequency matching. Perfect for exact keywords, acronyms, and codes; fails at synonyms.
> Dense embeddings: semantic similarity. Perfect for meaning and paraphrases; fails at exact matches.

This is why you can’t pick one. You need intelligent fusion.

The scoring difference that matters:
> BM25: score(q,d) = Σ IDF(term) × TF(term,d) × norm(d)
> Dense: score(q,d) = cosine(embed(q), embed(d))

These scores aren’t comparable! BM25 gives roughly 0–15, cosine gives roughly 0.7–0.95. This is why naive averaging fails: you need score normalization.

The fusion algorithms you must know:

1. Reciprocal Rank Fusion (RRF): score(d) = Σ 1/(k + rank_method_i(d))
- No score normalization needed
- Robust to score scale differences
- Used by Elastic, Pinecone

2. Weighted combination: score(d) = α × norm(score_bm25) + (1−α) × norm(score_dense)
- Requires score normalization
- α typically 0.3–0.5
- More control, but more tuning

“So how do you choose the hybrid ratio?” The interviewer leans in. This is where you mention that query type matters:
> Keyword queries (product codes, names): α = 0.7 (favor BM25)
> Natural-language questions: α = 0.3 (favor dense)
> Hybrid queries (“best iPhone under $500”): α = 0.5
> Measure and tune on YOUR data.

The answer that gets you hired:
- Hybrid search combines lexical precision with semantic understanding.
- BM25 catches exact matches embeddings miss; embeddings catch meaning BM25 misses.
- The cost is running two retrievals plus fusion (adds ~10ms).
- It’s not optional for production search: it’s the recall multiplier.

The interesting question isn’t “should we use hybrid search”, it’s “what’s the optimal fusion strategy for our query distribution?” RRF is simple but gives less control; a weighted combination needs more tuning but fits better. The answer: start with RRF, measure the gap, upgrade if needed.

The killer combo that production systems use:
> BM25 for recall (catch all possible matches)
> Dense for ranking (understand intent)
> RRF for fusion (combine without score-normalization hell)
> Cross-encoder for the top 20 (final precision pass)

Four-stage pipeline. Each stage does what it’s best at.
19 replies · 47 reposts · 492 likes · 50.7K views
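The two fusion formulas from the thread can be sketched in a few lines. This is a toy illustration, not a production retriever: the document IDs, scores, and the k=60 constant are invented for the example; only the RRF and weighted-combination formulas come from the thread.

```python
# Sketch of the two fusion strategies: Reciprocal Rank Fusion (RRF)
# and weighted score combination with min-max normalization.

def rrf_fuse(rankings, k=60):
    """rankings: one ranked list of doc ids per retriever.
    score(d) = sum over retrievers of 1 / (k + rank(d)), rank 1-based."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def minmax(scores):
    """Rescale raw scores to [0, 1] so BM25 (~0-15) and cosine
    (~0.7-0.95) become comparable before averaging."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def weighted_fuse(bm25_scores, dense_scores, alpha=0.5):
    """score(d) = alpha * norm(bm25) + (1 - alpha) * norm(dense)."""
    b, v = minmax(bm25_scores), minmax(dense_scores)
    fused = {d: alpha * b.get(d, 0.0) + (1 - alpha) * v.get(d, 0.0)
             for d in set(b) | set(v)}
    return sorted(fused, key=fused.get, reverse=True)

# Toy data: "d1" is the exact lexical match, "d2" the semantic match.
bm25 = {"d1": 12.3, "d2": 4.1, "d3": 1.0}
dense = {"d1": 0.72, "d2": 0.93, "d3": 0.81}
print(rrf_fuse([["d1", "d3", "d2"], ["d2", "d3", "d1"]]))
print(weighted_fuse(bm25, dense, alpha=0.7))  # alpha=0.7 favors BM25
```

With α = 0.7 the keyword hit "d1" wins, matching the thread's advice to favor BM25 for keyword-style queries; RRF needs no normalization at all because it only looks at ranks.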
Nadieh Bremer @NadiehBremer
📣 NEW WORK! Excited to share my latest work with the Publications Office of the European Union 🇪🇺 I got to create 9 dataviz for 3 of the EU's monthly Data Stories, covering fascinating topics from leisure and health to the future. See all the visuals: visualcinnamon.com/portfolio/eu-d…
1 reply · 3 reposts · 26 likes · 1.7K views
Franck Lebeau reposted
Rohan Paul @rohanpaul_ai
🚨 BAD news for medical AI models. MASSIVE revelations from this @Microsoft paper. 🤯

Current medical AI models may look good on standard medical benchmarks, but those scores do not mean the models can handle real medical reasoning. The key point is that many models pass tests by exploiting patterns in the data, not by actually combining medical text with images in a reliable way.

The key findings are that models overuse shortcuts, break under small changes, and produce unfaithful reasoning. This makes the benchmark results misleading if someone assumes a high score means the model is ready for real medical use.

---

The specific key findings from this paper 👇

- Models keep strong accuracy even when images are removed, even on questions that require vision, which signals shortcut use over real understanding.
- Scores stay above the 20% guess rate without images, so text patterns alone often drive the answers.
- Shuffling answer order changes predictions a lot, which exposes position and format bias rather than robust reasoning.
- Replacing a distractor with “Unknown” does not stop many models from guessing instead of abstaining when evidence is missing.
- Swapping in a lookalike image that matches a wrong option makes accuracy collapse, which shows vision is not integrated with text.
- Chain of thought often sounds confident while citing features that are not present, which means the explanations are unfaithful.
- Audits reveal 3 failure modes: incorrect logic with correct answers, hallucinated perception, and visual reasoning with faulty grounding.
- Gains on popular visual question answering do not transfer to report generation, which is closer to real clinical work.
- Clinician reviews show benchmarks measure very different skills, so a single leaderboard number misleads on readiness.
- Once shortcut strategies are disrupted, true comprehension is far weaker than the headline scores suggest.
- Most models refuse to abstain without the image, which is unsafe behavior for medical use.
- The authors push for a robustness score and explicit reasoning audits, which signals current evaluations are not enough.

🧵 Read on 👇
174 replies · 830 reposts · 3.8K likes · 521.7K views
Franck Lebeau reposted
Datawrapper @Datawrapper
In this week's Dispatch, you'll find data vis on politics, trains, and minerals, but also interactive tools to explore, and yet another data game at the end. 📊 🕹️ datawrapper.de/blog/data-vis-…
0 replies · 4 reposts · 8 likes · 1.5K views
Amir Zur @AmirZur2000
1/6 🦉 Did you know that telling an LLM that it loves the number 087 also makes it love owls? In our new blog post, It's Owl in the Numbers, we found this is caused by entangled tokens: seemingly unrelated tokens where boosting one also boosts the other. owls.baulab.info
18 replies · 74 reposts · 656 likes · 69.8K views
Franck Lebeau reposted
tomaarsen @tomaarsen
😎 I just published Sentence Transformers v5.1.0, and it's a big one. 2x-3x speedups of SparseEncoder models via ONNX and/or OpenVINO backends, easier distillation data preparation with hard negatives mining, and more! See 🧵for the deets:
1 reply · 15 reposts · 132 likes · 5.1K views
Franck Lebeau reposted
Antoine Chaffin @antoine_chaffin
Obviously it was caught by @_reachsumit before the official announcement! 😁 I am very happy to announce that PyLate now has an associated paper, and that it has been accepted to CIKM! Very happy to share this milestone with my dear co-creator @raphaelsrty 🫶
Sumit @_reachsumit

PyLate: Flexible Training and Retrieval for Late Interaction Models @antoine_chaffin et al. introduce a streamlined library extending Sentence Transformers to support multi-vector architectures. 📝arxiv.org/abs/2508.03555 👨🏽‍💻github.com/lightonai/pyla…

3 replies · 7 reposts · 39 likes · 4.3K views
Franck Lebeau @_Kcnarf
@currankelleher Tokenization is a thing, each model having its own counter-intuitive behaviors. For example, in the image, the singular form of the French word 'accueil' requires 2 tokens, whereas the pluralized form requires 3 very different tokens
0 replies · 0 reposts · 0 likes · 17 views
Curran Kelleher @currankelleher
Interesting how LLMs tokenize words within variable names
1 reply · 0 reposts · 0 likes · 171 views
Santiago @svpino
I recently came across a team using an LLM to implement a state machine. They called it "agentic". They also told me it worked correctly most of the time.

I asked them to write down the list of rules for moving from one state to another. They listed them out. It didn't take long for them to realize what was happening: they had built a solution using AI for a problem that didn't need AI.

We removed the model and wrote some code to implement the rules. It might have taken 2-3 hours at most. They went from "it works most of the time, it's relatively fast, and it costs some tokens" to "it works 100% of the time, it's instantaneous, and it costs nothing."

So many examples like this recently. Golden rule: BUILD THE SIMPLEST THING THAT COULD POSSIBLY WORK.
152 replies · 175 reposts · 1.9K likes · 141.7K views
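A rule-based state machine like the one the team ended up writing really can be this small. A minimal sketch, with hypothetical states and events (the original tweet doesn't say what the machine modeled):

```python
# Rule-driven state machine replacing the LLM: every transition is an
# explicit (state, event) -> next_state rule, so behavior is fully
# deterministic. States and events here are made-up stand-ins.

TRANSITIONS = {
    ("draft", "submit"): "review",
    ("review", "approve"): "published",
    ("review", "reject"): "draft",
}

def step(state, event):
    """Apply one transition; fail loudly on anything the rules forbid."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no rule for event {event!r} in state {state!r}")

# Replay a sequence of events: same input always yields the same state.
state = "draft"
for event in ["submit", "reject", "submit", "approve"]:
    state = step(state, event)
print(state)  # -> published
```

Unlike the LLM version, an unexpected event raises immediately instead of silently taking a plausible-looking wrong transition, which is exactly the "works 100% of the time" property the tweet describes.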
Franck Lebeau @_Kcnarf
🤔 Did you know that LLMs produce probabilities over every available token of the vocabulary? Only afterwards comes the choice of the final output token. 👌 Here is a crystal-clear, yet insightful, explanation of the various techniques used to choose the next token
AI Coffee Break with Letitia @AICoffeeBreak

How do LLMs pick the next word? They don’t choose words directly: they only output word probabilities. 📊 Greedy decoding, top-k, top-p, min-p are methods that turn these probabilities into actual text. In this video, we break down each method and show how the same model can sound dull, brilliant, or unhinged – just by changing how it samples.

0 replies · 1 repost · 3 likes · 481 views
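The decoding strategies the video covers operate on exactly that per-step probability distribution. A toy sketch of greedy, top-k, and top-p (nucleus) decoding; the five-token vocabulary and its probabilities are invented for illustration, not from any real model:

```python
# Greedy, top-k, and top-p decoding over a single next-token
# distribution. A real LLM emits one such distribution per step;
# this toy vocabulary stands in for the full one.
import random

probs = {"the": 0.40, "a": 0.25, "owl": 0.20, "banana": 0.10, "qux": 0.05}

def greedy(probs):
    """Always pick the single most likely token (deterministic, dull)."""
    return max(probs, key=probs.get)

def top_k(probs, k=2, rng=random):
    """Sample only among the k most likely tokens."""
    best = sorted(probs, key=probs.get, reverse=True)[:k]
    return rng.choices(best, weights=[probs[t] for t in best])[0]

def top_p(probs, p=0.8, rng=random):
    """Sample among the smallest set of tokens whose total mass >= p."""
    nucleus, mass = [], 0.0
    for tok in sorted(probs, key=probs.get, reverse=True):
        nucleus.append(tok)
        mass += probs[tok]
        if mass >= p:
            break
    return rng.choices(nucleus, weights=[probs[t] for t in nucleus])[0]

print(greedy(probs))      # always "the"
print(top_k(probs, k=2))  # "the" or "a"
print(top_p(probs, p=0.8))  # "the", "a", or "owl" (0.40+0.25+0.20 >= 0.8)
```

Same distribution, three different behaviors: greedy is fully deterministic, top-k caps the candidate count, and top-p lets the candidate count adapt to how peaked the distribution is, which is exactly the dull-vs-unhinged dial the video describes.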
Franck Lebeau reposted
Florian Tramèr @florian_tramer
Are hallucinated references making it to arXiv? Yes, definitely! Since the release of Deep Research in February, bogus references have been on the rise (coincidence?). I wrote a blog post (link below) on my analysis (which hugely underestimates the true rate of hallucinations...)
9 replies · 26 reposts · 282 likes · 34.4K views
Yujie Qian @Yujie_Qian
@_Kcnarf @lateinteraction Omar's understanding of the model is correct. In the blog post, “generates the same number of vectors” is compared with the previous method, which independently generates an embedding for each chunk.
1 reply · 0 reposts · 0 likes · 18 views
Franck Lebeau reposted
Jo Kristian Bergum @jobergum
I think more AI builders now recognize that the core quality concern is context confusion, not context window length limitations. Lots of agent implementations now let users compress context to avoid quality degradation.
8 replies · 5 reposts · 75 likes · 7.1K views
dr. jack morris @jxmnop
Context Rot is an excellent term and should be used more often
33 replies · 52 reposts · 886 likes · 52.7K views