LightOn

2.4K posts

@LightOnIO

LightOn is a leading European generative AI company delivering secure on-prem RAG for document intelligence, enabling safe use of sensitive data behind firewall

Paris, France · Joined October 2015
829 Following · 4.3K Followers
Pinned Tweet
LightOn (@LightOnIO)
Day Zero for Multi-Vector Retrieval. Today we’re flipping the retrieval playbook: no dense model adaptation, no retrofit.
🏗️ Multi-vector from scratch, powered by PyLate.
Meet ColBERT-Zero. In collaboration with @EPFL and the Swiss AI initiative, @LightOnIO pre-trained it end-to-end for late-interaction retrieval.
🥇 SOTA on BEIR, <150M params
⚡ Supervised-first → distill = most of the gains for a fraction of the cost
🧠 Prompt alignment is non-negotiable to preserve peak performance through fine-tuning
Models, checkpoints, training code under Apache 2.0.
🧑‍🍳 Kudos to the whole team @antoine_chaffin @l_arnaboldi @AmelieTabatta @KrzakalaF!
🔗 Dive into the release: lighton.ai/lighton-blogs/…
LightOn reposted
Antoine Chaffin (@antoine_chaffin)
Model: huggingface.co/lightonai/Reas… If you are not familiar with the model, it is a multi-vector model built at @LightOnIO and optimized for reasoning-intensive retrieval, outperforming models 45x bigger on the BRIGHT benchmark. And now it does it again, for agentic search/deep research.
LightOn (@LightOnIO)
The multi-vector era is here and there is no going back.
Reason-ModernColBERT tops BrowseComp-Plus, the hardest agentic search benchmark available, by 7.59 points on accuracy.
🥇 on accuracy. 🥇 on recall. 🥇 on calibration. 📉 Fewest search calls.
The models it outperforms? Up to 54× larger.
Reasoning-intensive retrieval (BRIGHT), code search (MTEB Code), agentic Deep Research (BrowseComp-Plus). The pattern is the same: late interaction dominates, with a fraction of the parameters.
149M parameters. Open weights. Open code. Built with PyLate in a few hours.
Full results, analysis and recipe on the LightOn blog: lighton.ai/lighton-blogs/…
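For readers new to the term: in ColBERT-style late interaction, a query and a document are each kept as a set of per-token vectors, and the score sums, for every query token, its best match among the document tokens (usually called MaxSim). A minimal sketch in plain Python, assuming tiny hand-written 2-d vectors in place of real model embeddings; the function names are illustrative and are not PyLate's API:

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: for each query token vector, take its best
    (maximum) similarity over all document token vectors, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rank(query_tokens: List[Vector], docs: Dict[str, List[Vector]]) -> List[str]:
    """Return document names sorted by descending MaxSim score."""
    scores = {name: maxsim_score(query_tokens, toks) for name, toks in docs.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Toy data (made up for illustration): two query tokens, two documents.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_b": [[0.5, 0.5]],               # one "averaged" vector
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],   # one vector per token
}
print(rank(query, docs))
```

In this toy case the document that keeps one vector per token matches each query token exactly, while the single averaged vector only matches each of them halfway, which is the intuition behind multi-vector retrieval.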
LightOn (@LightOnIO)
LightOn will be attending @NVIDIAGTC 2026, one of the world’s leading events for accelerated computing and AI. 📍 San Jose, CA 📅 March 14–19. If you’re there, come catch up with David Amara to talk about search and reason!
LightOn (@LightOnIO)
AI agents are only as good as the information they can access. 🇪🇺 LightOn, an AI Search & Reason company, is partnering with @Linkup_platform to provide structured, real-time web search to Paradigm within a secure, fully European technology stack. Private data and the open web, securely unified in a single pipeline. Together, the two technologies enable organizations to build AI systems that are better informed, more reliable, and designed for demanding environments, combining search and reasoning to deliver accurate and actionable outputs. 🗞️ Read the press release: lighton.ai/lighton-blogs/…
LightOn (@LightOnIO)
🤖 Autonomous agents are already here. Garbage in, garbage out: agents cannot fix your retrieval errors. In this blog post, we share the lessons learned from building NOVA (Numbers Over Vibes, Always), the evaluation framework LightOn uses internally to measure how agents actually behave on real data. It's not about vibes or faith, just measurement. 👉🏻 lighton.ai/lighton-blogs/…
LightOn (@LightOnIO)
Yes, ColGrep, built at @LightOnIO, saves you 👩 and your agents 🤖 money 💶 and time ⏳. Start saving your hard-earned cash and tokens now! ⬇️⬇️⬇️ github.com/lightonai/next…
Rohan Jha (@Robro612)

@antoine_chaffin @raphaelsrty IMO you need to hammer harder on the point in this tweet of yours. People saying grep is all you need don’t realize that you can be more targeted and that it has tangible cost benefits at the agent task level, rather than just search problems x.com/antoine_chaffi…

LightOn (@LightOnIO)
Strategic generative AI agreement for the @caissedesdepots group. The consortium led by @Computacenter and @SopraSteriaPS, which includes @LightOnIO, has been selected to support the rollout of generative AI use cases across the group's entities. A foundational project for integrating AI while guaranteeing control over data, infrastructure, and governance. usine-digitale.fr/banque/souvera…
LightOn (@LightOnIO)
The best benchmark is the one real users run themselves, on real documents. Independent, public, reproducible. Jonas Wacker tested nine OCR engines on four documents: a corporate doc, a handwriting sample, a multi-column annual report, a German medical bulletin. 🏆 LightOnOCR-2-1B scored Excellent across all four. The only engine to do so, at half the cost of Azure Document Intelligence. Quality gets you to production but cost keeps you there. A model that reads your documents but prices you out at scale isn't a solution. It's a pilot that never ships. Jonas made the benchmark. Now run yours. 👉 lighton.ai/lighton-blogs/…
LightOn (@LightOnIO)
🗞️ We are rewriting the rules of retrieval. Token by token. Most search tools get you close. NextPlaid gets you there, surfacing the exact passage your question deserves, with the signal and context your LLM actually needs. Discover the editorial from LightOn's CEO @IgorCarron in this month's newsletter, including major milestones: LightOn NextPlaid live in Paradigm, new SOTA long-context VLMs, ColBERT-Zero, LateOn-Code + ColGrep, and LightOn's new enterprise search benchmark. The standard is ours to set. Read it here: lighton.ai/lighton-blogs/…
LightOn (@LightOnIO)
🎯 The best questions are the ones no benchmark has ever tested. Sanctions exposure in inherited contracts / Force majeure across five jurisdictions / The spec version in effect when a part failed. LightOn EDiTh was built to test exactly those.
🏭 A documentary digital twin of a fictional €1.8B industrial group: 1000+ real documents, several languages, and 40 use cases grounded in the exact complexity enterprises actually live in.
📄 Because right now, you can't put your hardest questions in a test set. So you evaluate on clean PDFs and ship on faith. No vanity score. Just the failure modes that break systems in production.
➡️ Find out where your pipeline stands. lighton.ai/lighton-blogs/…
LightOn reposted
Austin Veselka (@further_ai)
Excited to share my work "How to Train Your Long-Context Visual Document Model." (arxiv.org/abs/2602.15257) Research and recipes for training long-context VLMs for document understanding are entirely lacking. In this paper, I explore this frontier with extensive ablations.
LightOn (@LightOnIO)
Retrieval brings the signal. Long context makes it scalable for agents.
✨ Introducing OriOn: the SOTA Long-Context VLM family built for agentic search & reasoning.
OriOn processes up to 250 pages at full visual resolution in a single pass, with a 32B model that hits SOTA, beating models 7× larger on MMLongBenchDoc.
SOTA performance / On-premise / No 200B+ compute footprint.
We’re shipping it all in the open:
📄 Training recipes (CPT, SFT, LongPO)
📊 50+ ablation experiments
🗃️ MMLBD-C, our corrected MMLongBenchDoc (251 fixes, 16 removals)
🤗 Checkpoints + synthetic data pipelines on Hugging Face
👨‍🍳 Huge kudos to @further_ai for the recipe!
🔗 Dive into the release: lighton.ai/lighton-blogs/…
LightOn (@LightOnIO)
🔥 Live from @WAICANNES Demo Stage 1! LightOn turns unstructured data into actionable intelligence.
📄 Parsing → Chunking → Indexing → Reranking
⚡️ 50x faster processing with 98% efficiency
Stop Building RAG, Start Shipping Intelligence 👉 lighton.ai
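The Parsing → Chunking → Indexing → Reranking stages named in this post can be pictured with a toy pipeline. This is only a sketch under loud assumptions: it is not LightOn's implementation, every function name is hypothetical, parsing is reduced to whitespace cleanup, the index is a plain inverted index, a lookup step is added between indexing and reranking, and reranking is simple query-term overlap rather than a learned model.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def parse(raw: str) -> str:
    """Parsing (stand-in): collapse all whitespace into single spaces."""
    return " ".join(raw.split())


def chunk(text: str, size: int = 6) -> List[str]:
    """Chunking: split text into consecutive windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks: List[str]) -> Dict[str, Set[int]]:
    """Indexing: map each token to the ids of the chunks containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for chunk_id, text in enumerate(chunks):
        for token in tokenize(text):
            index[token].add(chunk_id)
    return index


def retrieve(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Candidate lookup: every chunk sharing at least one query token."""
    hits: Set[int] = set()
    for token in tokenize(query):
        hits |= index.get(token, set())
    return hits


def rerank(chunks: List[str], candidates: Set[int], query: str) -> List[int]:
    """Reranking (stand-in): order candidates by number of distinct query
    tokens they contain, breaking ties by chunk id."""
    q = set(tokenize(query))
    return sorted(candidates, key=lambda i: (-len(q & set(tokenize(chunks[i]))), i))


raw = ("Invoices are due in thirty days.\n\nLate payment incurs a penalty fee "
       "of two percent. Contact billing for payment questions.")
chunks = chunk(parse(raw))
index = build_index(chunks)
query = "late payment penalty"
for chunk_id in rerank(chunks, retrieve(index, query), query):
    print(chunk_id, chunks[chunk_id])
```

The split into a cheap candidate lookup followed by a more careful reranking pass is the usual reason the stages are listed separately.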
LightOn (@LightOnIO)
🔥 Stop burning tokens on blind grep searches. Give your coding agent semantic eyes.
Meet LateOn-Code & ColGrep: a Rust-powered search tool and two SOTA late-interaction models that bring intent-level code retrieval directly to your terminal.
ColGrep mirrors the grep interface your agents already use, but replaces pattern matching with semantic scoring, and supports hybrid queries that combine both. It plugs straight into Claude Code, OpenCode, or Codex.
ColGrep is powered by LateOn-Code-edge (17M) and LateOn-Code (130M), the first late-interaction models purpose-built for code.
🏆 They top MTEB Code, outperforming models up to 17x their size while running instantly on a laptop.
What we measured with Claude Code:
🚀 70% win rate vs. vanilla grep
📉 ~60k tokens saved per question
🤏 56% fewer search operations
Built in Rust with Next-Plaid - 100% local - No code leaves your machine.
Give your coding agent the search it deserves. Huge kudos to @antoine_chaffin and @raphaelsrty!
Read more: lighton.ai/lighton-blogs/…
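To make the idea of a hybrid query concrete, here is a toy illustration of combining a grep-style pattern with a similarity ranking. It is not ColGrep's code or command-line interface: the function names are hypothetical, and the scorer is a word-overlap (Jaccard) stand-in for what would, in the tool described above, be a late-interaction model.

```python
import re
from typing import Callable, List, Tuple


def toy_similarity(query: str, text: str) -> float:
    """Stand-in scorer: Jaccard overlap between lowercase word sets.
    A real system would score with a learned embedding model instead."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    t = set(re.findall(r"[a-z0-9]+", text.lower()))
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def hybrid_search(
    pattern: str,
    query: str,
    snippets: List[str],
    scorer: Callable[[str, str], float] = toy_similarity,
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Keep only snippets matching the regex (the grep part), then rank
    the survivors by similarity to the natural-language query."""
    regex = re.compile(pattern)
    candidates = [s for s in snippets if regex.search(s)]
    scored = [(scorer(query, s), s) for s in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


snippets = [
    "def load_config(path): return parse yaml config file",
    "def save_config(path, data): write config",
    "class Loader: load yaml file",
]
# Pattern restricts results to function definitions; the query expresses intent.
for score, snippet in hybrid_search(r"^def ", "load yaml config file", snippets):
    print(round(score, 3), snippet)
```

The pattern narrows the candidate set deterministically, and the scorer orders what remains by intent, which is the combination the post describes.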