Franck Lebeau

3.9K posts


@_Kcnarf

#AI in #NLP, dataviz expert, full stack dev, math_art and day-to-day enthusiast; also, PhD in CS, and trying to reduce my environmental footprint

Joined December 2016
411 Following · 563 Followers
Franck Lebeau reposted
Datawrapper @Datawrapper
The Data Vis Dispatch is back in its regular format! 💫 This week is short, with a focus on Venezuela. You'll also find retrospectives on 2025, as well as people's resolutions for 2026.
0 replies · 5 reposts · 44 likes · 2.5K views
Franck Lebeau reposted
Datawrapper @Datawrapper
This week, we celebrate the launch of our new website with a special edition of the Data Vis Dispatch! 🥳 See where you may have come across Datawrapper visualizations before, and have a peek at our brand new website while you're at it. 📊 👀 ✨ datawrapper.de/blog/data-vis-…
0 replies · 10 reposts · 46 likes · 3.2K views
Franck Lebeau reposted
athletic coder @athleticKoder
You’re in a Machine Learning interview at Perplexity, and the interviewer asks: “Why do we need hybrid search? Isn’t vector search with embeddings enough?”

Here’s how you answer. Don’t say “to combine different approaches” or “for better coverage.” Too generic. The real answer is the semantic-lexical gap: your embeddings understand meaning but ignore exact matches. Vector search alone misses the forest for the trees, or worse, the exact product code the user typed.

Here’s why pure vector search fails. Your query is “iPhone 15 Pro Max 256GB.” Vector search returns “iPhone 15 Pro with lots of storage” and “latest flagship phone specs.” But the user wants the EXACT model and the EXACT capacity. Semantic understanding ≠ precision matching.

btw, get this kind of content in your email for free, daily: subscribe to my newsletter at fullstackagents.substack.com

The retrieval failure modes are brutal.

Pure vector search:
> Query: “ML-2847 error code” → Returns: general ML troubleshooting (0% useful)
> Query: “React 18.2.0 breaking changes” → Returns: React 18 overview (no version precision)

Pure keyword search (BM25):
> Query: “how to fix car not starting” → Returns: docs with “car” and “starting”, but about starting a car business

You need both. Always.

The performance gap across real benchmarks:
- BM25 alone: 67% MRR@10
- Dense retrieval alone: 71% MRR@10
- Hybrid (proper fusion): 82% MRR@10

That’s 11 points (roughly a 15% relative improvement) over the “best” single method. In production, that’s thousands of better answers per day.

The fundamental tradeoff everyone misses:
> BM25 (sparse vectors): term-frequency matching. Perfect for exact keywords, acronyms, and codes; fails at synonyms.
> Dense embeddings: semantic similarity. Perfect for meaning and paraphrases; fails at exact matches.

This is why you can’t pick one. You need intelligent fusion.

The scoring difference that matters:
> BM25: score(q,d) = Σ IDF(term) × TF(term,d) × norm(d)
> Dense: score(q,d) = cosine(embed(q), embed(d))

These scores aren’t comparable! BM25 gives roughly 0–15, cosine gives roughly 0.7–0.95. This is why naive averaging fails: you need score normalization.

The fusion algorithms you must know:

1. Reciprocal Rank Fusion (RRF): score(d) = Σ 1/(k + rank_method_i(d))
- No score normalization needed
- Robust to score scale differences
- Used by Elastic, Pinecone

2. Weighted combination: score(d) = α × norm(score_bm25) + (1−α) × norm(score_dense)
- Requires score normalization
- α typically 0.3–0.5
- More control, but more tuning

“So how do you choose the hybrid ratio?” The interviewer leans in. This is where you mention that query type matters:
> Keyword queries (product codes, names): α = 0.7 (favor BM25)
> Natural-language questions: α = 0.3 (favor dense)
> Hybrid queries (“best iPhone under $500”): α = 0.5
> Measure and tune on YOUR data.

The answer that gets you hired:
- Hybrid search combines lexical precision with semantic understanding.
- BM25 catches exact matches embeddings miss; embeddings catch meaning BM25 misses.
- The cost is running two retrievals plus fusion (adds ~10ms).
- It’s not optional for production search: it’s the recall multiplier.

The interesting question isn’t “should we use hybrid search”, it’s “what’s the optimal fusion strategy for our query distribution?” RRF is simple but gives less control; a weighted combination needs more tuning but fits better. The answer: start with RRF, measure the gap, upgrade if needed.

The killer combo that production systems use:
> BM25 for recall (catch all possible matches)
> Dense for ranking (understand intent)
> RRF for fusion (combine without score-normalization hell)
> Cross-encoder for the top 20 (final precision pass)

Four-stage pipeline. Each stage does what it’s best at.
19 replies · 47 reposts · 492 likes · 50.7K views
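The two fusion formulas from the thread can be sketched in a few lines. This is a toy illustration, not a production retriever: the document IDs, scores, and the k=60 constant are invented for the example; only the RRF and weighted-combination formulas come from the thread.

```python
# Sketch of the two fusion strategies: Reciprocal Rank Fusion (RRF)
# and weighted score combination with min-max normalization.

def rrf_fuse(rankings, k=60):
    """rankings: one ranked list of doc ids per retriever.
    score(d) = sum over retrievers of 1 / (k + rank(d)), rank 1-based."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def minmax(scores):
    """Rescale raw scores to [0, 1] so BM25 (~0-15) and cosine
    (~0.7-0.95) become comparable before averaging."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def weighted_fuse(bm25_scores, dense_scores, alpha=0.5):
    """score(d) = alpha * norm(bm25) + (1 - alpha) * norm(dense)."""
    b, v = minmax(bm25_scores), minmax(dense_scores)
    fused = {d: alpha * b.get(d, 0.0) + (1 - alpha) * v.get(d, 0.0)
             for d in set(b) | set(v)}
    return sorted(fused, key=fused.get, reverse=True)

# Toy data: "d1" is the exact lexical match, "d2" the semantic match.
bm25 = {"d1": 12.3, "d2": 4.1, "d3": 1.0}
dense = {"d1": 0.72, "d2": 0.93, "d3": 0.81}
print(rrf_fuse([["d1", "d3", "d2"], ["d2", "d3", "d1"]]))
print(weighted_fuse(bm25, dense, alpha=0.7))  # alpha=0.7 favors BM25
```

With α = 0.7 the keyword hit "d1" wins, matching the thread's advice to favor BM25 for keyword-style queries; RRF needs no normalization at all because it only looks at ranks.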
Nadieh Bremer @NadiehBremer
📣 NEW WORK! Excited to share my latest work with the Publications Office of the European Union 🇪🇺 I got to create 9 dataviz for 3 of the EU's monthly Data Stories, covering fascinating topics from leisure and health to the future. See all the visuals: visualcinnamon.com/portfolio/eu-d…
1 reply · 3 reposts · 26 likes · 1.7K views
Franck Lebeau reposted
Rohan Paul @rohanpaul_ai
🚨 BAD news for medical AI models. MASSIVE revelations from this @Microsoft paper. 🤯

Current medical AI models may look good on standard medical benchmarks, but those scores do not mean the models can handle real medical reasoning. The key point is that many models pass tests by exploiting patterns in the data, not by actually combining medical text with images in a reliable way.

The key findings are that models overuse shortcuts, break under small changes, and produce unfaithful reasoning. This makes the benchmark results misleading if someone assumes a high score means the model is ready for real medical use.

---

The specific key findings from this paper 👇

- Models keep strong accuracy even when images are removed, even on questions that require vision, which signals shortcut use over real understanding.
- Scores stay above the 20% guess rate without images, so text patterns alone often drive the answers.
- Shuffling answer order changes predictions a lot, which exposes position and format bias rather than robust reasoning.
- Replacing a distractor with “Unknown” does not stop many models from guessing instead of abstaining when evidence is missing.
- Swapping in a lookalike image that matches a wrong option makes accuracy collapse, which shows vision is not integrated with text.
- Chain of thought often sounds confident while citing features that are not present, which means the explanations are unfaithful.
- Audits reveal 3 failure modes: incorrect logic with correct answers, hallucinated perception, and visual reasoning with faulty grounding.
- Gains on popular visual question answering do not transfer to report generation, which is closer to real clinical work.
- Clinician reviews show benchmarks measure very different skills, so a single leaderboard number misleads on readiness.
- Once shortcut strategies are disrupted, true comprehension is far weaker than the headline scores suggest.
- Most models refuse to abstain without the image, which is unsafe behavior for medical use.
- The authors push for a robustness score and explicit reasoning audits, which signals current evaluations are not enough.

🧵 Read on 👇
174 replies · 830 reposts · 3.8K likes · 521.7K views
Franck Lebeau reposted
Datawrapper @Datawrapper
In this week's Dispatch, you'll find data vis on politics, trains, and minerals, but also interactive tools to explore, and yet another data game at the end. 📊 🕹️ datawrapper.de/blog/data-vis-…
0 replies · 4 reposts · 8 likes · 1.5K views
Amir Zur @AmirZur2000
1/6 🦉 Did you know that telling an LLM that it loves the number 087 also makes it love owls? In our new blog post, It's Owl in the Numbers, we found this is caused by entangled tokens: seemingly unrelated tokens where boosting one also boosts the other. owls.baulab.info
18 replies · 74 reposts · 656 likes · 69.8K views
Franck Lebeau reposted
tomaarsen @tomaarsen
😎 I just published Sentence Transformers v5.1.0, and it's a big one. 2x-3x speedups of SparseEncoder models via ONNX and/or OpenVINO backends, easier distillation data preparation with hard negatives mining, and more! See 🧵for the deets:
1 reply · 15 reposts · 132 likes · 5.1K views
Franck Lebeau reposted
Antoine Chaffin @antoine_chaffin
Obviously it was caught by @_reachsumit before the official announcement! 😁 I am very happy to announce that PyLate now has an associated paper, and that it has been accepted to CIKM! Very happy to share this milestone with my dear co-creator @raphaelsrty 🫶
Sumit @_reachsumit

PyLate: Flexible Training and Retrieval for Late Interaction Models @antoine_chaffin et al. introduce a streamlined library extending Sentence Transformers to support multi-vector architectures. 📝arxiv.org/abs/2508.03555 👨🏽‍💻github.com/lightonai/pyla…

3 replies · 7 reposts · 39 likes · 4.3K views
Franck Lebeau @_Kcnarf
@currankelleher Tokenization is a thing, each model having its own counter-intuitive behaviors. For example, in the image, the singular form of the French word 'accueil' requires 2 tokens, whereas the pluralized form requires 3 very different tokens
0 replies · 0 reposts · 0 likes · 17 views
Curran Kelleher @currankelleher
Interesting how LLMs tokenize words within variable names
1 reply · 0 reposts · 0 likes · 171 views
Santiago @svpino
I recently came across a team using an LLM to implement a state machine. They called it "agentic". They also told me it worked correctly most of the time.

I asked them to write down the list of rules for moving from one state to another. They listed them out. It didn't take long for them to realize what was happening: they had built a solution using AI for a problem that didn't need AI.

We removed the model and wrote some code to implement the rules. It might have taken 2-3 hours at most. They went from "it works most of the time, it's relatively fast, and it costs some tokens" to "it works 100% of the time, it's instantaneous, and it costs nothing."

So many examples like this recently. Golden rule: BUILD THE SIMPLEST THING THAT COULD POSSIBLY WORK.
152 replies · 175 reposts · 1.9K likes · 141.7K views
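A rule-based state machine like the one the team ended up writing really can be this small. A minimal sketch, with hypothetical states and events (the original tweet doesn't say what the machine modeled):

```python
# Rule-driven state machine replacing the LLM: every transition is an
# explicit (state, event) -> next_state rule, so behavior is fully
# deterministic. States and events here are made-up stand-ins.

TRANSITIONS = {
    ("draft", "submit"): "review",
    ("review", "approve"): "published",
    ("review", "reject"): "draft",
}

def step(state, event):
    """Apply one transition; fail loudly on anything the rules forbid."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no rule for event {event!r} in state {state!r}")

# Replay a sequence of events: same input always yields the same state.
state = "draft"
for event in ["submit", "reject", "submit", "approve"]:
    state = step(state, event)
print(state)  # -> published
```

Unlike the LLM version, an unexpected event raises immediately instead of silently taking a plausible-looking wrong transition, which is exactly the "works 100% of the time" property the tweet describes.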
Franck Lebeau @_Kcnarf
🤔 Did you know that LLMs produce probabilities over every available token of the vocabulary? Only afterwards comes the choice of the final output token. 👌 Here is a crystal-clear, yet insightful, explanation of the various techniques used to choose the next token
AI Coffee Break with Letitia @AICoffeeBreak

How do LLMs pick the next word? They don’t choose words directly: they only output word probabilities. 📊 Greedy decoding, top-k, top-p, min-p are methods that turn these probabilities into actual text. In this video, we break down each method and show how the same model can sound dull, brilliant, or unhinged – just by changing how it samples.

0 replies · 1 repost · 3 likes · 481 views
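The decoding strategies the video covers operate on exactly that per-step probability distribution. A toy sketch of greedy, top-k, and top-p (nucleus) decoding; the five-token vocabulary and its probabilities are invented for illustration, not from any real model:

```python
# Greedy, top-k, and top-p decoding over a single next-token
# distribution. A real LLM emits one such distribution per step;
# this toy vocabulary stands in for the full one.
import random

probs = {"the": 0.40, "a": 0.25, "owl": 0.20, "banana": 0.10, "qux": 0.05}

def greedy(probs):
    """Always pick the single most likely token (deterministic, dull)."""
    return max(probs, key=probs.get)

def top_k(probs, k=2, rng=random):
    """Sample only among the k most likely tokens."""
    best = sorted(probs, key=probs.get, reverse=True)[:k]
    return rng.choices(best, weights=[probs[t] for t in best])[0]

def top_p(probs, p=0.8, rng=random):
    """Sample among the smallest set of tokens whose total mass >= p."""
    nucleus, mass = [], 0.0
    for tok in sorted(probs, key=probs.get, reverse=True):
        nucleus.append(tok)
        mass += probs[tok]
        if mass >= p:
            break
    return rng.choices(nucleus, weights=[probs[t] for t in nucleus])[0]

print(greedy(probs))      # always "the"
print(top_k(probs, k=2))  # "the" or "a"
print(top_p(probs, p=0.8))  # "the", "a", or "owl" (0.40+0.25+0.20 >= 0.8)
```

Same distribution, three different behaviors: greedy is fully deterministic, top-k caps the candidate count, and top-p lets the candidate count adapt to how peaked the distribution is, which is exactly the dull-vs-unhinged dial the video describes.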
Franck Lebeau reposted
Florian Tramèr @florian_tramer
Are hallucinated references making it to arXiv? Yes, definitely! Since the release of Deep Research in February, bogus references have been on the rise (coincidence?). I wrote a blog post (link below) on my analysis (which hugely underestimates the true rate of hallucinations...)
9 replies · 26 reposts · 282 likes · 34.4K views
Yujie Qian @Yujie_Qian
@_Kcnarf @lateinteraction Omar's understanding of the model is correct. In the blog post, “generates the same number of vectors” is compared with the previous method, which independently generates an embedding for each chunk.
1 reply · 0 reposts · 0 likes · 18 views
Franck Lebeau reposted
Jo Kristian Bergum @jobergum
I think more AI builders now recognize that the core quality concern is context confusion, not context window length limitations. Lots of agent implementations now let users compress context to avoid quality degradation.
8 replies · 5 reposts · 75 likes · 7.1K views
dr. jack morris @jxmnop
Context Rot is an excellent term and should be used more often
33 replies · 52 reposts · 886 likes · 52.7K views