LightOn

2.5K posts

@LightOnIO

LightOn is a leading European generative AI company delivering secure on-prem RAG for document intelligence, enabling safe use of sensitive data behind the firewall

Paris, France · Joined October 2015

836 Following · 5.3K Followers

Pinned Tweet

LightOn @LightOnIO · Jun 2

Today, we're introducing LightOn Console.
⚙️ Three endpoints:
/Parse any documents
/Extract structured data
/Search enterprise knowledge with citations
🔌 Built-in connectors. MCP-ready. Governance enforced at the chunk level.
No infrastructure. No pipeline maintenance. No dedicated retrieval team required.
Make your enterprise knowledge agent-readable now!
Read the launch announcement: lighton.ai/lighton-blogs/…
Test it now: console.lighton.ai

LightOn @LightOnIO · 5d

AI agents scale best when their economics scale with them. @LightOnIO makes this possible by delivering the right context to your models before generation. Better signal. Leaner prompts. Cleaner reasoning.
As agents scale, token consumption compounds fast. In the past year alone:
• Some agentic tasks used up to 650× more tokens
• Frontier model pricing rose 4× in just eight months
At scale, every unnecessary token becomes a recurring cost. With retrieval-first AI, agents can use fewer tokens, reduce inference costs, and operate more efficiently.
Estimate your potential savings with the LightOn Agent Economics Simulator. Run the numbers → lighton.ai/roi-simulator

LightOn @LightOnIO · Jul 7

The free lunch just got bigger. After introducing hierarchical pooling, which enables 2× lossless compression for the state-of-the-art ColBERT embedding models that power LightOn, we pushed the idea even further. By training directly for compressed representations, hierarchical pooling now reaches 5× near-lossless compression without degrading full-model performance. Same retrieval pipeline. Same retrieval quality. Less than a fifth of the storage footprint. Read Antoine's thread and dive into the technical details ↓ huggingface.co/blog/lightonai…

Antoine Chaffin @antoine_chaffin

Hierarchical pooling is a very strong way to reduce the footprint of late interaction without degrading results. We recently showed that training for MUVERA/SMVE improved performance retention. What if we trained for hierarchical pooling? We get even stronger (5×) lossless compression!
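
For readers new to the idea in this thread, here is a minimal Python sketch of hierarchical token pooling: the token vectors of one document are grouped by similarity and each group is replaced by its mean, so the index stores fewer vectors. This is an illustration under stated assumptions, not LightOn's implementation. The function name `hierarchical_pool`, the greedy centroid-linkage merge on cosine similarity, and the toy vectors are all hypothetical choices made for the example, and the training-for-compression step described in the thread is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def hierarchical_pool(token_vectors, pool_factor):
    """Reduce a document's token vectors by roughly `pool_factor`.

    Assumption: greedy agglomerative clustering with centroid linkage.
    Each step merges the two clusters whose centroids are most similar,
    until len(token_vectors) // pool_factor clusters remain (at least one).
    Each remaining cluster is then mean-pooled into a single vector.
    """
    target = max(1, len(token_vectors) // pool_factor)
    clusters = [[v] for v in token_vectors]
    while len(clusters) > target:
        best = None  # (similarity, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(mean(clusters[i]), mean(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [mean(cluster) for cluster in clusters]


# Toy document: four 2-d token vectors forming two obvious groups.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pooled = hierarchical_pool(tokens, pool_factor=2)
print(len(tokens), "->", len(pooled), "vectors")
```

With `pool_factor=2` the toy document keeps half of its vectors; a factor of 5 would keep about a fifth, which is where the storage saving comes from. Whether retrieval quality survives that reduction depends on the model and its training, which this sketch does not address.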

LightOn retweeted

doubleslash_dev @doubleslash_dev · Jul 2

LightOn: the French solution for building a RAG easily. They handle everything, from vectorization to the document base. @LightOnIO In our latest episode: double-slash.dev/podcasts/news-…

LightOn @LightOnIO · Jul 1

Every great agent starts with the right context. LightOn Console plugs into your existing stack, connects to your LLM provider and your data sources, and gives every agent the exact signal it needs to solve the most complex tasks. No new stack. No migration. Just better retrieval, lower token consumption, and more room for reasoning. Build the agent you've always imagined. 👉🏻 Explore the integrations → developers.lighton.ai/integrations

LightOn @LightOnIO · Jun 30

One API. Three endpoints. Every block a SOTA model.
LightOn Console is document intelligence you can build on. Parse, search, and reason over your documents through one API, called by you or by your agents.
On OCR and retrieval benchmarks, LightOn models go head to head with the world's largest AI labs, and win where it counts. Benchmarks are where it starts. Production is where it's decided.
✅ Heavy parsing loads
🔍 Retrieval that scales
🔐 Access rights preserved
🌍 Deployment where your data is allowed to live
That is what Console is built for. State-of-the-art OCR and retrieval, with auditable citations, sovereign deployment, and built from the ground up for agents and the people who direct them.
A strong pipeline starts with strong models. A pipeline you trust is one you can test and audit in production. Console gives you both.
📊 Explore the benchmarks → lighton.ai/benchmarks
⚙️ Test LightOn Console → console.lighton.ai

LightOn @LightOnIO · Jun 26

Meet LightOn Facets. When faceted search meets agentic retrieval, it brings the precision every AI needs. More signal, less noise, at scale. Agents don't just run your filters. They discover the right facets from a plain query, then scope the search themselves. What reaches the model is already signal. Now live in LightOn Console: lighton.ai/lighton-blogs/…

LightOn @LightOnIO · Jun 26

🎬 The Impossible Use Cases, Season 1, Episode 08: Aeronautical Engineering
The audit that kept a fleet of aircraft from being grounded.
📄 An EASA airworthiness directive written in English.
✈️ AV-3000 servo valves to identify on in-service aircraft, operator by operator.
🔎 Proof of incorporation scattered across EASA Form 1 certificates, sometimes filed in another aircraft's records.
The risk is not only missing a defective part. It is declaring an aircraft compliant without proof, or needlessly grounding an aircraft that has already been treated.
A classic RAG retrieves the directive. LightOn Console produces the actual state of the fleet, unit by unit: which aircraft are affected, which are already compliant, which remain to be addressed, and which document each answer rests on.
In aeronautics, a quality audit does not just serve to retrieve documents. It can decide which aircraft can fly tomorrow.
📰 Read the full analysis: lighton.ai/fr-blog-posts/…
💻 Test the scenario on LightOn Console: lighton.ai/fr/home

LightOn retweeted

Amélie Chatelain @AmelieTabatta · Jun 24

So I did something fun! As I am preparing a lecture on late interaction and reading papers on efficient retrieval, I did what any serious researcher would do nowadays: downloaded the papers, gave them to an agent, and asked it to build a beautiful frontend to explore them!

LightOn @LightOnIO · Jun 24

Fewer Tokens, Better Answers: Give Your Agent Search, Not Raw Files
Even with just 295 pages, agents benefit from search. We ran the same Claude agent on the same corpus two ways:
📂 Using its native tools to explore the corpus, inspect documents, and decide what to read
🔍 Using LightOn API for Search
On a 5,000-page knowledge base, LightOn delivered:
✅ 36% fewer tokens
✅ Better answers in 75% of blind evaluations
✅ Flat search costs as the corpus grew 17×
The interesting part isn't that long context windows eventually hit their limits. It's that search already improves efficiency and answer quality long before context becomes the bottleneck. A larger context window gives your agent more room, but it doesn't magically make it better at retrieval.
Full breakdown 👇🏻 lighton.ai/lighton-blogs/…
Start saving tokens now: console.lighton.ai

LightOn @LightOnIO · Jun 23

💬 "We chose @LightOnIO because it was the reliable, attentive, and responsive team we wanted to build with over the long term." Vincent Kerbiquet, Director of Infocom'94 @infocom94, a public digital operator pooling services for 26 local authorities in Île-de-France, chose LightOn to deploy sovereign conversational AI for its members. Trust is not an add-on. It is the starting point. 👉🏻 More information: lighton.ai/fr-blog-posts/…

LightOn @LightOnIO · Jun 23

🎬 The Impossible Use Cases, Season 1, Episode 07: HR
The HR due diligence... that could save a divestiture.
📄 Non-compete clauses scattered across contracts, country templates, scanned amendments, and labor law texts.
🌍 French, German, British, American, and Moroccan clauses to be interpreted differently.
🔎 A clause already waived, absent from the summaries, but decisive for the inventory.
The risk is not missing a clause. It is believing that a clause still exists, or that it is binding, when a court could set it aside.
A classic RAG retrieves the right contracts. LightOn Console produces a sourced analysis, jurisdiction by jurisdiction, usable in a data room before the acquirer's counsel finds the flaw first.
In a divestiture, the HR audit does not just serve to document risk. It can secure the transaction.
📰 Read the full analysis: lighton.ai/fr-blog-posts/…
💻 Test the scenario on LightOn Console: lighton.ai/fr/home

LightOn @LightOnIO · Jun 22

7 million pages of legal documents. One OCR model. That's what it took to build the LOCUS dataset, one of the largest publicly available collections of U.S. local laws.
To process millions of legal documents published across thousands of cities and counties, @DenisPeskoff, @barrowjoseph, Christopher Vu, and Diag Davenport relied on LightOnOCR-2-1B to convert heterogeneous documents into a consistent, structured corpus, combined with ModernBERT for downstream processing and analysis.
Why LightOnOCR-2?
⚡️ Up to 3× faster than larger OCR models
🎯 State-of-the-art performance
📄 One single model, from document to markdown
📦 Just 1B parameters, making large-scale processing practical
When you're processing millions of documents, accuracy isn't enough. You need speed, efficiency, and reliability at scale.
🔗 Dataset: huggingface.co/datasets/Local…
🔗 Try LightOnOCR-2 in the LightOn Console: console.lighton.ai

LightOn retweeted

Igor Carron @IgorCarron · Jun 20

Thanks to @LightOnIO's technology (LightOnOCR, ModernBERT), the local codes of all 9,239 US cities and counties 🇺🇸, 7M pages OCR'd into 2.2M analyzable sections (~80GB raw), are now machine-readable and analyzable at national scale. Outstanding work by @DenisPeskoff, @barrowjoseph & team (@UCBerkeley). For more on using LightOn's technology at scale try it for free at: LightOn.ai

Joe Barrow @barrowjoseph

New paper: every law in America is technically public. But not really, until now! With @DenisPeskoff at UC Berkeley, we built a corpus of ~every publicly accessible city and county law, and released a huge chunk of it! 2.2 million laws, you're (probably) covered in it! 🧵

LightOn @LightOnIO · Jun 19

"1 million tokens will replace RAG." Really?
In the latest episode of Que du Web, @AmelieTabatta, Head of Training & Inference at @LightOnIO, revisits one of the most heated debates in AI today: RAG vs. long context.
🔍 The real cost of giant context windows
🔐 The challenge of enterprise permissions
⚙️ Lexical, semantic, multi-vector, and agentic RAG
🚀 What lies ahead for information retrieval in the coming years
A dense, concrete discussion, free of preconceptions.
🎙️ Listen: youtube.com/watch?v=5G5kKj… @Ibou_io @speyronnet @Greg0ry #AI #RAG #LLM #EnterpriseAI #Search #LightOn

LightOn @LightOnIO · Jun 18

"Building it, scaling it, paying for it." That's what @IgorCarron, CEO of @LightOnIO, will be discussing at VivaTech alongside leaders from @ENGIEgroup, @hcompany_ai, Probabl, and @pleiasfr. Because the challenge is no longer whether AI works. It's how to get it into production. Today, open source models make it possible to build powerful AI applications at a fraction of the cost. With LightOn Console, enterprises can build, deploy, and scale AI in production, securely and on their own terms. 📍 ENGIE Stage (Hall 7.2, Booth 2F15) 📅 June 18 | 5:00 PM – 6:00 PM Come join us and let's talk AI in production.

LightOn @LightOnIO · Jun 17

First day at VivaTech. The topic is no longer AI adoption. The topic is dependence. Every passing day strengthens the position of those who control the models, the infrastructure, and the data. Sovereignty is no longer a debate. It is a race. 📍 Meet us: @FTGrandParis Pavilion (Hall 2, Stand 2D15-003) #VivaTech #AISouveraine #SouverainetéNumérique #AI #LightOn

LightOn retweeted

Antoine Chaffin @antoine_chaffin · Jun 16

Party is over, time to regularize ColBERT models to fix efficient ANN.
MUVERA and SMVE promised to simplify multi-vector retrieval infrastructure but broke on modern ColBERT models.
We found a fix, and it does the exact opposite of what we expected.

LightOn @LightOnIO · Jun 16

Multi-vector retrieval delivers the best retrieval quality available today. Now let's talk about how to make it affordable. Recent methods such as MUVERA and SMVE promised a way around that, but they unexpectedly broke on modern late-interaction models. In his latest study, @antoine_chaffin digs into why and finds a simple regularization that restores their effectiveness, while having a surprising effect on the embedding space. Kudos to him for a fascinating investigation and a surprisingly elegant fix. 📄 Read the blog: lighton.ai/lighton-blogs/… 🤗 Regularized model: huggingface.co/blog/lightonai…
