ParisMachineLearning

7.3K posts

@ParisMLgroup

We're the Paris ML Meetup, one of the world's largest (8300+ members). https://t.co/z6d9xH9wTC @BardolFranck, @JackieForien, @CFalguiere & @IgorCarron

Paris · Joined February 2015
841 Following · 5.2K Followers
ParisMachineLearning retweeted
LightOn (@LightOnIO)
Day Zero for Multi-Vector Retrieval. Today we’re flipping the retrieval playbook: no dense model adaptation, no retrofit.
🏗️ Multi-vector from scratch, powered by PyLate. Meet ColBERT-Zero.
In collaboration with @EPFL and the Swiss AI initiative, @LightOnIO pre-trained it end-to-end for late-interaction retrieval.
🥇 SOTA on BEIR, <150M params
⚡ Supervised-first → distill = most of the gains for a fraction of the cost
🧠 Prompt alignment is non-negotiable to preserve peak performance through fine-tuning
Models, checkpoints, training code under Apache 2.0.
🧑‍🍳 Kudos to the whole team @antoine_chaffin @l_arnaboldi @AmelieTabatta @KrzakalaF!
🔗 Dive into the release: lighton.ai/lighton-blogs/…
2 replies · 20 retweets · 89 likes · 9K views
ParisMachineLearning retweeted
LightOn (@LightOnIO)
Retrieval brings the signal. Long context makes it scalable for agents.
✨ Introducing OriOn: the SOTA Long-Context VLM family built for agentic search & reasoning.
OriOn processes up to 250 pages at full visual resolution in a single pass, with a 32B model that hits SOTA, beating models 7× larger on MMLongBenchDoc.
SOTA performance / On-premise / No 200B+ compute footprint.
We’re shipping it all in the open:
📄 Training recipes (CPT, SFT, LongPO)
📊 50+ ablation experiments
🗃️ MMLBD-C, our corrected MMLongBenchDoc (251 fixes, 16 removals)
🤗 Checkpoints + synthetic data pipelines on Hugging Face
👨‍🍳 Huge kudos to @further_ai for the recipe!
🔗 Dive into the release: lighton.ai/lighton-blogs/…
5 replies · 21 retweets · 77 likes · 12.5K views
ParisMachineLearning retweeted
LightOn (@LightOnIO)
🔥 Stop burning tokens on blind grep searches. Give your coding agent semantic eyes.
Meet LateOn-Code & ColGrep: a Rust-powered search tool and two SOTA late-interaction models that bring intent-level code retrieval directly to your terminal.
ColGrep mirrors the grep interface your agents already use, but replaces pattern matching with semantic scoring, and supports hybrid queries that combine both. It plugs straight into Claude Code, OpenCode, or Codex.
ColGrep is powered by LateOn-Code-edge (17M) and LateOn-Code (130M), the first late-interaction models purpose-built for code.
🏆 They top MTEB Code, outperforming models up to 17x their size while running instantly on a laptop.
What we measured with Claude Code:
🚀 70% win rate vs. vanilla grep
📉 ~60k tokens saved per question
🤏 56% fewer search operations
Built in Rust with Next-Plaid - 100% local - No code leaves your machine.
Give your coding agent the search it deserves.
Huge kudos to @antoine_chaffin and @raphaelsrty!
Read more: lighton.ai/lighton-blogs/…
1 reply · 11 retweets · 34 likes · 4.4K views
ParisMachineLearning retweeted
LightOn (@LightOnIO)
🔍🪡 To find the needle, you better index every straw of the haystack.
Today, LightOn is launching LightOn NextPlaid: a CPU-optimized multi-vector database that indexes at the token level. By representing documents as sets of vectors, one per token, we preserve the distinct concepts and precise details that other search engines average away.
Why NextPlaid is the missing layer for your RAG stack:
🎯 Precision Matching: Retrieval matches at the token level, surfacing the exact passage that answers your question rather than just a document that vaguely relates.
📉 Frugal Inference: High-signal context reduces the amount of noise sent to your LLM, allowing it to answer with fewer, more accurate tokens.
🚀 Seamless Integration: NextPlaid runs alongside your existing vector database. You can add multi-vector retrieval to your established RAG pipeline without ripping anything out.
⚙️ Production Ready: Built in Rust and optimized for CPUs, it supports incremental index updates and concurrent reads/writes, capabilities missing from standard implementations.
NextPlaid represents the "Blanc" milestone in our Bleu/Blanc/Rouge roadmap for enterprise document intelligence. It follows the "Bleu" release, LightOnOCR-2, a SOTA 1B OCR model which converts complex documents into clean, usable text.
Huge kudos to @raphaelsrty for shipping this breakthrough! 🙌
Read the full article here 👉 lighton.ai/lighton-blogs/…
4 replies · 12 retweets · 42 likes · 8.7K views
ParisMachineLearning retweeted
Igor Carron (@IgorCarron)
The first LLM in French (PAGnol) and the latest LightOnOCR, both by @LightOnIO, were trained on that cluster. Jean Zay by @Genci_fr and #Idris
Yann LeCun (@ylecun)

@chatgpt21 @arthur_spirling Actually France has had a GPU cluster dedicated to academic research in AI since 2019. There is a huge government investment in computing infrastructure at the moment. The US still has nothing.

2 replies · 12 retweets · 55 likes · 8.2K views
ParisMachineLearning retweeted
LightOn (@LightOnIO)
Today, @LightOnIO releases ModernBERT, a SOTA model for retrieval and classification. This work was performed in collaboration with @answerdotai and the model was trained on @orangebusiness Cloud Avenue infrastructure.
🚀 Why ModernBERT?
- ModernBERT performs better on benchmarks
- ModernBERT handles a larger context size
- ModernBERT handles code
- ModernBERT is faster to finetune on your data than any other model
- ModernBERT offers very fast inference on GPU-constrained setups
- ModernBERT is released under Apache 2.0 and is available on @huggingface huggingface.co/collections/an…
- ModernBERT is documented on ArXiv: arxiv.org/abs/2412.13663
🔗 Read the full blog article here: lighton.ai/lighton-blogs/…
Congratulations to @antoine_chaffin @oskar_hallstrom @staghado @iacopo_poli and all the folks at Answer.ai who made this model a reality.
0 replies · 23 retweets · 36 likes · 2.9K views
ParisMachineLearning retweeted
LightOn (@LightOnIO)
🎉@LightOnIO introduces Mambaoutai 1.6B: A game-changer in AI! With the novel WSD learning rate scheduler & positional weighting, it sets new standards for efficiency & accuracy. Dive into the future of #AI_Language_Models with our latest blogpost➡️lighton.ai/fr/blog/blog-4…
2 replies · 19 retweets · 58 likes · 22.7K views
ParisMachineLearning retweeted
Emmanuel Macron (@EmmanuelMacron)
Mistral, LightOn, Shift Technology, Alan, Bioptimus, Google: more and more of them are choosing France to innovate in artificial intelligence. Pride. By investing, we are making France a country at the forefront of AI. An AI says so too!
548 replies · 381 retweets · 1.6K likes · 484.3K views
ParisMachineLearning retweeted
Igor Carron (@IgorCarron)
Come as you are! We have 🚀 8 job openings 🚀 at @LightOnIO across various teams, technical and business. Join us in making Enterprise LLMs the only thing! Full remote OK for tech positions. lighton.ai/en/jobs
1 reply · 11 retweets · 15 likes · 5.6K views
ParisMachineLearning retweeted
Tony S.F. (@tonysilveti)
If you're a student outside of France and you're interested in doing a PhD related to AI in France, consider applying to any of the projects listed here: adum.fr/psaclay/pten?f… Deadline is January 17th 2024.
2 replies · 9 retweets · 24 likes · 4.9K views
ParisMachineLearning retweeted
Alexander Doria (@Dorialexander)
So, big announcement: thanks to the generous support from @huggingface, I am releasing the early modern ChatGPT, MonadGPT: huggingface.co/spaces/Pclangl… Any question in English or French will be answered from the perspective of someone living between 1500 and 1750.
25 replies · 88 retweets · 359 likes · 128.4K views
ParisMachineLearning retweeted
Alexander Doria (@Dorialexander)
24 hours in, and the only answer is being blocked by @controlai lead @andreamiotti. Miotti is also leading the AI safety startup Conjecture, which has taken a lot of public stances on the AI Act and happens to be also funded by the same Anthropic investor. Probably worth a closer look.
1 reply · 1 retweet · 7 likes · 451 views
ParisMachineLearning retweeted
Alexander Doria (@Dorialexander)
I’m in contact with several journalists who have nearly the same questions. So you’re likely to have more opportunities to clarify.
2 replies · 1 retweet · 8 likes · 756 views
ParisMachineLearning retweeted
Alexander Doria (@Dorialexander)
6. What are your ties with the Effective Altruism movement?
1 reply · 1 retweet · 6 likes · 543 views
ParisMachineLearning retweeted
Alexander Doria (@Dorialexander)
5. Why are you seemingly involved in a coordinated campaign with other even less traceable actors (AI Safety Memes, for instance)?
1 reply · 1 retweet · 6 likes · 366 views
ParisMachineLearning retweeted
Alexander Doria (@Dorialexander)
4. Why do you run paid political ads with defamation intent? Are you aware this is at least a controversial practice in the EU, if not outright illegal? How do you fund them?
2 replies · 1 retweet · 7 likes · 392 views