max (@maxencelsb)
78 posts
💻 GitHub: https://t.co/hsxfYxPd8j 🤗 HF: https://t.co/HTLvRfgnoR
Paris, France · Joined September 2024
247 Following · 32 Followers
max retweeted

📚 Efficient Language Specialization for Small Language Models
@maxencelsb and @SinoueG have released a preprint about their excellent work on fine-tuning small models in French. It shows a solid post-training pipeline to improve French performance while preserving English capabilities.
→ The Luth-SFT dataset combines 570k samples from translated English datasets (Tülu 3, OpenHermes) + a unique "Scholar" subset containing 30k samples (from French Baccalauréat and CPGE exams).
→ All five Luth models (350M to 1.7B parameters) achieve state-of-the-art French performance in their size categories, with absolute improvements up to +11.26% across six benchmarks compared to base models like Qwen3 and LFM2.
→ Merging the fine-tuned French model with its base version preserves multilingual abilities and boosts performance in both languages.
→ I can't help but notice that Luth-LFM2-1.2B beats Qwen3-1.7B on 4 out of 6 French tasks despite having 500M fewer parameters, and the pattern holds across other model sizes too. 👀
Fine-tuning models to boost a specific language has always been a very popular use case. This paper is super interesting because it provides an excellent recipe with data, training, and evals that works in practice. Great work!
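The merging step described above (combining the fine-tuned French model with its base to keep multilingual ability) can be sketched as a linear interpolation of weights. This is an illustrative sketch of weight-space merging in general, not the paper's exact recipe; the parameter names and alpha value are placeholders.

```python
def merge_state_dicts(base_sd, finetuned_sd, alpha=0.5):
    """Linearly interpolate two weight dicts with identical keys.

    alpha=0 keeps the base model, alpha=1 keeps the fine-tuned one.
    Works the same whether values are floats (toy demo) or tensors.
    """
    return {name: (1 - alpha) * w + alpha * finetuned_sd[name]
            for name, w in base_sd.items()}

# Toy demo with scalar "weights":
base = {"layer.w": 0.0, "layer.b": 2.0}
tuned = {"layer.w": 1.0, "layer.b": 4.0}
print(merge_state_dicts(base, tuned, alpha=0.5))  # {'layer.w': 0.5, 'layer.b': 3.0}
```

In practice the same interpolation is applied to real tensor state dicts, and alpha trades off French gains against preserved English performance.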




@liquidai Excited to share that our Luth-LFM2 models are now officially part of Liquid Nanos!! 🚀
Check it out here: huggingface.co/kurakurai/Luth…

Introducing Liquid Nanos ⚛️ — a new family of extremely tiny task-specific models that deliver GPT-4o-class performance while running directly on phones, laptops, cars, embedded devices, and GPUs with the lowest latency and fastest generation speed.
> model size: 350M to 2.6B
> built on LFM2, our v2 efficient model architecture
> perform competitively with models up to hundreds of times larger
> enable core agentic tasks: precise data extraction, multilingual translation, tool calling, math, and RAG. 1/n

max retweeted

🚀 Introducing Qwen3-4B-Instruct-2507 & Qwen3-4B-Thinking-2507 — smarter, sharper, and 256K-ready!
🔹 Instruct: Boosted general skills, multilingual coverage, and long-context instruction following.
🔹 Thinking: Advanced reasoning in logic, math, science & code — built for expert-level tasks.
Both models are more aligned, more capable, and more context-aware.
Huggingface:
huggingface.co/Qwen/Qwen3-4B-…
huggingface.co/Qwen/Qwen3-4B-…
ModelScope:
modelscope.cn/models/Qwen/Qw…
modelscope.cn/models/Qwen/Qw…

max retweeted

Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices. arxiv.org/abs/2506.18035
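For context on the linked paper: early-exit architectures attach lightweight prediction heads to intermediate layers and stop computation once a prediction is confident enough, which saves latency on edge devices. A minimal, framework-free sketch of the inference-time exit rule; the layer structure, heads, and threshold here are illustrative, not Splitformer's actual design.

```python
def early_exit_forward(x, layers, exit_heads, confidence, threshold=0.9):
    """Run layers in order; after each, ask its exit head for a prediction
    and stop early once the confidence score clears the threshold.

    layers[i]     : callable hidden-state transform
    exit_heads[i] : callable hidden-state -> prediction
    confidence    : callable prediction -> float in [0, 1]
    """
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        x = layer(x)
        pred = head(x)
        if confidence(pred) >= threshold:
            return pred, depth  # early exit: remaining layers are skipped
    return pred, depth          # fell through: used all layers

# Toy example: confidence grows with depth, so we exit at layer 3 of 5.
layers = [lambda h: h + 1] * 5
heads = [lambda h: h] * 5
pred, depth = early_exit_forward(0, layers, heads,
                                 confidence=lambda p: p / 3, threshold=1.0)
print(depth)  # 3
```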

GitHub: github.com/MaxLSB/LeCarnet
Models: huggingface.co/collections/Ma…
Dataset: huggingface.co/datasets/MaxLS…
Feel free to leave a star ⭐!
max retweeted

Among all those LLM releases, here is an important retrieval release:
To overcome the limitations of the awesome ModernBERT-based dense models, today @LightOnIO is releasing GTE-ModernColBERT, the very first state-of-the-art late-interaction (multi-vector) model trained using PyLate🚀
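Late interaction keeps one vector per token and scores a document by summing, over query tokens, the maximum similarity to any document token (ColBERT's MaxSim operator). A minimal numpy sketch of that scoring rule; the random embeddings are stand-ins for real token encodings, and this is not PyLate's API.

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late-interaction score.

    query_vecs: (num_query_tokens, dim); doc_vecs: (num_doc_tokens, dim).
    Both assumed L2-normalized, so dot products are cosine similarities.
    """
    sim = query_vecs @ doc_vecs.T   # (q_tokens, d_tokens) similarity matrix
    return sim.max(axis=1).sum()    # best doc token per query token, summed

rng = np.random.default_rng(0)

def normed(n, d=8):
    v = rng.normal(size=(n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

q = normed(4)
relevant = np.vstack([q, normed(6)])  # a doc that contains the query tokens
unrelated = normed(10)
print(maxsim_score(q, relevant) > maxsim_score(q, unrelated))  # True
```

Unlike single-vector dense retrieval, every query token independently finds its best match in the document, which is what makes these models strong on fine-grained matching.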

max retweeted

TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: openai.com/open-model-fee…
we are excited to make this a very, very good model!
__
we are planning to release our first open-weight language model since GPT-2.
we’ve been thinking about this for a long time but other priorities took precedence. now it feels important to do.
before release, we will evaluate this model according to our preparedness framework, like we would for any other model. and we will do extra work given that we know this model will be modified post-release.
we still have some decisions to make, so we are hosting developer events to gather feedback and later play with early prototypes. we’ll start in SF in a couple of weeks followed by sessions in europe and APAC. if you are interested in joining, please sign up at the link above.
we’re excited to see what developers build and how large companies and governments use it where they prefer to run a model themselves.

Read the TRGPPO paper today, which addresses the exploration issue of PPO under poor policy initialization
arxiv.org/abs/1901.10314
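For context on what TRGPPO adjusts: standard PPO clips the policy probability ratio to a fixed range, which can suppress exploration when the initial policy assigns tiny probabilities to good actions. Below is a minimal numpy sketch of the standard clipped surrogate objective only; TRGPPO's trust-region-derived clipping range is not shown here.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized; negate for a loss).

    ratio     : pi_new(a|s) / pi_old(a|s), per sample
    advantage : estimated advantage per sample
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()

# With a positive advantage, pushing the ratio past 1 + eps gains nothing:
print(ppo_clip_loss(np.array([2.0]), np.array([1.0])))  # 1.2, clipped at 1 + eps
```

The fixed eps is exactly the knob the paper revisits: under poor initialization, a ratio cap of 1 + eps caps how fast a rarely-chosen good action's probability can grow.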


@maharshii Sharing my latest project:
FlashAttention-2 for Sliding Window in Triton. Don't hesitate to check it out!
github.com/MaxLSB/flash-a…
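The idea behind the project above: in sliding-window attention, each query attends only to the previous `window` positions, which FlashAttention-style kernels exploit by skipping fully-masked blocks. A plain numpy reference of the masked attention itself (no Triton; an illustrative reference implementation, not the linked kernel):

```python
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Causal attention where position i sees keys in [i - window + 1, i]."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    i, j = np.indices((n, n))
    # Mask out future tokens and tokens that fell outside the window.
    mask = (j > i) | (j < i - window + 1)
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(6, 4))
out = sliding_window_attention(q, k, v, window=2)
print(out.shape)  # (6, 4)
```

A fused kernel computes the same result block by block without ever materializing the full n×n score matrix, and skips blocks where the mask above is all False.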

some short advice for CS students/people who want to get into it:
1. build useful/helpful projects around the topics that you are really interested in. building or improving something from the ground up can teach you a lot of things that you may overlook otherwise.
2. post about your projects on any platform where you can find like-minded people who are possibly interested in what you are doing.
3. sometimes you will see no “positive” results from posting about your projects which is okay. don’t lose faith.
4. the goal is not to suddenly become famous by posting; the goal is to create something that might actually help people in one way or another. everything else that comes along with it is bonus.
5. don’t let fame get to your head. be yourself and don’t pretend to be someone you are not.