max (@maxencelsb)

78 posts
💻 GitHub: https://t.co/hsxfYxPd8j 🤗 HF: https://t.co/HTLvRfgnoR

Paris, France · Joined September 2024
247 Following · 32 Followers
max (@maxencelsb):
@Thibolt_e a classic
max reposted
Maxime Labonne (@maximelabonne):
📚 Efficient Language Specialization for Small Language Models

@maxencelsb and @SinoueG have released a preprint about their excellent work on fine-tuning small models in French. It shows a solid post-training pipeline to improve French performance while preserving English capabilities.

→ The Luth-SFT dataset combines 570k samples from translated English datasets (Tülu 3, OpenHermes) plus a unique "Scholar" subset containing 30k samples (from French Baccalauréat and CPGE exams).
→ All five Luth models (350M to 1.7B parameters) achieve state-of-the-art French performance in their size categories, with absolute improvements of up to +11.26% across six benchmarks compared to base models like Qwen3 and LFM2.
→ Merging the fine-tuned French model with its base version preserves multilingual abilities and boosts performance in both languages.
→ I can't help but notice that Luth-LFM2-1.2B beats Qwen3-1.7B on 4 out of 6 French tasks despite having 500M fewer parameters, and the pattern holds across other model sizes too. 👀

Fine-tuning models to boost a specific language has always been a very popular use case. This paper is super interesting because it provides an excellent recipe with data, training, and evals that works in practice. Great work!
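The merging step mentioned above can be sketched as plain linear interpolation of two checkpoints' weights. This is a hedged illustration, not necessarily the preprint's exact method (it may use a different merge technique); the `linear_merge` helper and the toy tensors are hypothetical.

```python
import numpy as np

def linear_merge(base, finetuned, alpha=0.5):
    """Linearly interpolate two state dicts with identical shapes.

    alpha = 0.0 returns the base model, alpha = 1.0 the fine-tuned one.
    Intermediate values blend the two, which is one common way to keep
    base-model (e.g. English) abilities while gaining fine-tuned ones.
    """
    return {name: (1 - alpha) * base[name] + alpha * finetuned[name]
            for name in base}

# Toy example with two tiny "layers" (hypothetical shapes):
base = {"w": np.zeros((2, 2)), "b": np.zeros(2)}
tuned = {"w": np.ones((2, 2)), "b": np.ones(2)}
merged = linear_merge(base, tuned, alpha=0.5)
print(merged["w"])  # every entry is 0.5
```

In practice the same interpolation is applied parameter-by-parameter over a full transformer state dict; alpha trades off base-language retention against target-language gains.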
Liquid AI (@liquidai):
Introducing Liquid Nanos ⚛️: a new family of extremely tiny task-specific models that deliver GPT-4o-class performance while running directly on phones, laptops, cars, embedded devices, and GPUs with the lowest latency and fastest generation speed.

> model size: 350M to 2.6B
> built on LFM2, our v2 efficient model architecture
> perform competitively with models up to hundreds of times larger
> enable core agentic tasks: precise data extraction, multilingual translation, tool calling, math, and RAG

1/n
Maziyar PANAHI (@MaziyarPanahi):
Best open-source/open-weight LLMs under 2B parameters, ready? Go!
max reposted
Liquid AI (@liquidai):
Meet Luth-LFM2, a French fine-tuned LFM2 instance designed by Maxence Lasbordes and Sinoué GAD to enhance the multilingual capabilities of LFM2! In this model class, Luth-LFM2 sets a new record in French instruction following, GPQA, MMLU, and math.
max reposted
Maxime Labonne (@maximelabonne):
Really impressed by the French fine-tune of LFM2 made by two students. They created a solid post-training pipeline (FFT + merging) and open-sourced all the code and data. Amazing work by Sinoué Gad and Maxence Lasbordes!
max reposted
Qwen (@Alibaba_Qwen):
🚀 Introducing Qwen3-4B-Instruct-2507 & Qwen3-4B-Thinking-2507: smarter, sharper, and 256K-ready!

🔹 Instruct: Boosted general skills, multilingual coverage, and long-context instruction following.
🔹 Thinking: Advanced reasoning in logic, math, science & code, built for expert-level tasks.

Both models are more aligned, more capable, and more context-aware.

Hugging Face: huggingface.co/Qwen/Qwen3-4B-… huggingface.co/Qwen/Qwen3-4B-…
ModelScope: modelscope.cn/models/Qwen/Qw… modelscope.cn/models/Qwen/Qw…
max (@maxencelsb):
✨ Sharing my most recent side project: LeCarnet, a synthetic dataset of 2M+ French children's stories generated with Mistral Large, inspired by TinyStories. I implemented the data generation, training, and eval pipelines, and trained 3 SLMs on the dataset: LeCarnet-3M/8M/21M.
max reposted
Raphaël Sourty (@raphaelsrty):
I'm thrilled to announce the release of FastPlaid! 🚀🚀 FastPlaid is a high-performance engine for multi-vector search, built from the ground up in Rust (with the help of Torch C++). ⚡️ You can view FastPlaid as the counterpart of Faiss for multi-vectors.
max reposted
Antoine Chaffin (@antoine_chaffin):
Among all those LLM releases, here is an important retrieval release: To overcome limitations of awesome ModernBERT-based dense models, today @LightOnIO is releasing GTE-ModernColBERT, the very first state-of-the-art late-interaction (multi-vectors) model trained using PyLate🚀
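The "late interaction" scoring that ColBERT-style models like the one above use is commonly implemented as the MaxSim operator: each query token embedding is matched against its best-matching document token embedding, and the per-token maxima are summed. A minimal NumPy sketch of that idea (the `maxsim_score` name and the toy shapes are mine, not PyLate's API):

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style MaxSim: for each query token embedding, take its
    maximum cosine similarity over all document token embeddings,
    then sum those maxima over the query tokens."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T                  # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))        # 4 query tokens, dim 8 (toy sizes)
doc = rng.normal(size=(10, 8))     # 10 document tokens
score = maxsim_score(q, doc)
print(score)                       # bounded above by 4 (one max of <= 1 per query token)
```

Because documents are kept as bags of token vectors rather than a single pooled vector, scoring is "late": the query-document interaction happens after encoding, which is what engines like FastPlaid accelerate at scale.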
max reposted
Sam Altman (@sama):
TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: openai.com/open-model-fee…

we are excited to make this a very, very good model!

we are planning to release our first open-weight language model since GPT-2. we've been thinking about this for a long time but other priorities took precedence. now it feels important to do.

before release, we will evaluate this model according to our preparedness framework, like we would for any other model. and we will do extra work given that we know this model will be modified post-release.

we still have some decisions to make, so we are hosting developer events to gather feedback and later play with early prototypes. we'll start in SF in a couple of weeks followed by sessions in Europe and APAC. if you are interested in joining, please sign up at the link above.

we're excited to see what developers build and how large companies and governments use it where they prefer to run a model themselves.
max (@maxencelsb):
Read the PPO paper. The idea of the clipping mechanism is so smart
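The clipping mechanism referred to above is PPO's clipped surrogate objective (Schulman et al., 2017): the probability ratio between the new and old policy is clipped to [1 − ε, 1 + ε], and the pessimistic minimum of the clipped and unclipped terms is taken. A minimal sketch; `ppo_clip_loss` is a toy helper of mine, not from any RL library:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective (a quantity to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s). Clipping removes the incentive
    to push the ratio outside [1 - eps, 1 + eps], keeping policy
    updates small without an explicit KL penalty."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The min makes the bound pessimistic: clipping only ever removes
    # reward, it never hides a loss.
    return float(np.minimum(unclipped, clipped))

# Positive advantage: gains are capped once ratio exceeds 1 + eps.
print(ppo_clip_loss(1.5, advantage=1.0))   # 1.2, not 1.5
# Negative advantage: the min keeps the worse (unclipped) value,
# so overshooting in the wrong direction is fully penalized.
print(ppo_clip_loss(1.5, advantage=-1.0))  # -1.5
```

The asymmetry shown in the two prints is the "smart" part: the objective is clipped only when clipping would make things look better than they are.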
max (@maxencelsb):
Grok one-shots everything, it's crazy.
max (@maxencelsb):
Read the TRGPPO paper today, which addresses the exploration issue of PPO under poor policy initialization: arxiv.org/abs/1901.10314
maharshi (@maharshii):
Some short advice for CS students/people who want to take it:

1. Build useful/helpful projects around the topics that you are really interested in. Building or improving something from the ground up can teach you a lot of things that you may overlook otherwise.
2. Post about your projects on any platform where you can find like-minded people who are possibly interested in what you are doing.
3. Sometimes you will see no "positive" results from posting about your projects, which is okay. Don't lose faith.
4. The goal is not to become famous suddenly by posting; the goal is to create something that might actually help people in one way or another. Everything else that comes along with it is a bonus.
5. Don't let fame get to your head. Be yourself and don't pretend to be someone you are not.
major tom (@tailwiinder):
Of course let's remove the file that's causing issues, Claude. What a lovely idea.