max (@maxencelsb)
78 posts
💻 GitHub: https://t.co/hsxfYxPd8j 🤗 HF: https://t.co/HTLvRfgnoR
Paris, France · Joined September 2024
247 Following · 32 Followers
max retweeted

📚 Efficient Language Specialization for Small Language Models
@maxencelsb and @SinoueG have released a preprint about their excellent work on fine-tuning small models in French. It shows a solid post-training pipeline to improve French performance while preserving English capabilities.
→ The Luth-SFT dataset combines 570k samples from translated English datasets (Tülu 3, OpenHermes) + a unique "Scholar" subset containing 30k samples (from French Baccalauréat and CPGE exams).
→ All five Luth models (350M to 1.7B parameters) achieve state-of-the-art French performance in their size categories, with absolute improvements up to +11.26% across six benchmarks compared to base models like Qwen3 and LFM2.
→ Merging the fine-tuned French model with its base version preserves multilingual abilities and boosts performance in both languages.
→ I can't help but notice that Luth-LFM2-1.2B beats Qwen3-1.7B on 4 out of 6 French tasks despite having 500M fewer parameters, and the pattern holds across other model sizes too. 👀
Fine-tuning models to boost a specific language has always been a very popular use case. This paper is super interesting because it provides an excellent recipe with data, training, and evals that works in practice. Great work!
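The merging step described above (combining the fine-tuned French model with its base to keep multilingual ability) can be sketched as a linear interpolation of weights. This is an illustrative sketch of weight-space merging in general, not the paper's exact recipe; the parameter names and alpha value are placeholders.

```python
def merge_state_dicts(base_sd, finetuned_sd, alpha=0.5):
    """Linearly interpolate two weight dicts with identical keys.

    alpha=0 keeps the base model, alpha=1 keeps the fine-tuned one.
    Works the same whether values are floats (toy demo) or tensors.
    """
    return {name: (1 - alpha) * w + alpha * finetuned_sd[name]
            for name, w in base_sd.items()}

# Toy demo with scalar "weights":
base = {"layer.w": 0.0, "layer.b": 2.0}
tuned = {"layer.w": 1.0, "layer.b": 4.0}
print(merge_state_dicts(base, tuned, alpha=0.5))  # {'layer.w': 0.5, 'layer.b': 3.0}
```

In practice the same interpolation is applied to real tensor state dicts, and alpha trades off French gains against preserved English performance.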




@liquidai Excited to share that our Luth-LFM2 models are now officially part of Liquid Nanos!! 🚀
Check it out here: huggingface.co/kurakurai/Luth…

Introducing Liquid Nanos ⚛️ — a new family of extremely tiny task-specific models that deliver GPT-4o-class performance while running directly on phones, laptops, cars, embedded devices, and GPUs with the lowest latency and fastest generation speed.
> model size: 350M to 2.6B
> built on LFM2, our v2 efficient model architecture
> perform competitively with models up to hundreds of times larger
> enable core agentic tasks: precise data extraction, multilingual translation, tool calling, math, and RAG. 1/n

max retweeted

🚀 Introducing Qwen3-4B-Instruct-2507 & Qwen3-4B-Thinking-2507 — smarter, sharper, and 256K-ready!
🔹 Instruct: Boosted general skills, multilingual coverage, and long-context instruction following.
🔹 Thinking: Advanced reasoning in logic, math, science & code — built for expert-level tasks.
Both models are more aligned, more capable, and more context-aware.
Huggingface:
huggingface.co/Qwen/Qwen3-4B-…
huggingface.co/Qwen/Qwen3-4B-…
ModelScope:
modelscope.cn/models/Qwen/Qw…
modelscope.cn/models/Qwen/Qw…

max retweeted

Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices. arxiv.org/abs/2506.18035
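For context on the linked paper: early-exit architectures attach lightweight prediction heads to intermediate layers and stop computation once a prediction is confident enough, which saves latency on edge devices. A minimal, framework-free sketch of the inference-time exit rule; the layer structure, heads, and threshold here are illustrative, not Splitformer's actual design.

```python
def early_exit_forward(x, layers, exit_heads, confidence, threshold=0.9):
    """Run layers in order; after each, ask its exit head for a prediction
    and stop early once the confidence score clears the threshold.

    layers[i]     : callable hidden-state transform
    exit_heads[i] : callable hidden-state -> prediction
    confidence    : callable prediction -> float in [0, 1]
    """
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        x = layer(x)
        pred = head(x)
        if confidence(pred) >= threshold:
            return pred, depth  # early exit: remaining layers are skipped
    return pred, depth          # fell through: used all layers

# Toy example: confidence grows with depth, so we exit at layer 3 of 5.
layers = [lambda h: h + 1] * 5
heads = [lambda h: h] * 5
pred, depth = early_exit_forward(0, layers, heads,
                                 confidence=lambda p: p / 3, threshold=1.0)
print(depth)  # 3
```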

GitHub: github.com/MaxLSB/LeCarnet
Models: huggingface.co/collections/Ma…
Dataset: huggingface.co/datasets/MaxLS…
Feel free to leave a star ⭐!
max retweeted

Among all those LLM releases, here is an important retrieval release:
To overcome the limitations of the awesome ModernBERT-based dense models, today @LightOnIO is releasing GTE-ModernColBERT, the very first state-of-the-art late-interaction (multi-vector) model trained using PyLate🚀
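Late interaction keeps one vector per token and scores a document by summing, over query tokens, the maximum similarity to any document token (ColBERT's MaxSim operator). A minimal numpy sketch of that scoring rule; the random embeddings are stand-ins for real token encodings, and this is not PyLate's API.

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late-interaction score.

    query_vecs: (num_query_tokens, dim); doc_vecs: (num_doc_tokens, dim).
    Both assumed L2-normalized, so dot products are cosine similarities.
    """
    sim = query_vecs @ doc_vecs.T   # (q_tokens, d_tokens) similarity matrix
    return sim.max(axis=1).sum()    # best doc token per query token, summed

rng = np.random.default_rng(0)

def normed(n, d=8):
    v = rng.normal(size=(n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

q = normed(4)
relevant = np.vstack([q, normed(6)])  # a doc that contains the query tokens
unrelated = normed(10)
print(maxsim_score(q, relevant) > maxsim_score(q, unrelated))  # True
```

Unlike single-vector dense retrieval, every query token independently finds its best match in the document, which is what makes these models strong on fine-grained matching.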

max retweeted

TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: openai.com/open-model-fee…
we are excited to make this a very, very good model!
__
we are planning to release our first open-weight language model since GPT-2.
we’ve been thinking about this for a long time but other priorities took precedence. now it feels important to do.
before release, we will evaluate this model according to our preparedness framework, like we would for any other model. and we will do extra work given that we know this model will be modified post-release.
we still have some decisions to make, so we are hosting developer events to gather feedback and later play with early prototypes. we’ll start in SF in a couple of weeks followed by sessions in europe and APAC. if you are interested in joining, please sign up at the link above.
we’re excited to see what developers build and how large companies and governments use it where they prefer to run a model themselves.

Read the TRGPPO paper today, which addresses the exploration issue of PPO under poor policy initialization
arxiv.org/abs/1901.10314
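For context on what TRGPPO adjusts: standard PPO clips the policy probability ratio to a fixed range, which can suppress exploration when the initial policy assigns tiny probabilities to good actions. Below is a minimal numpy sketch of the standard clipped surrogate objective only; TRGPPO's trust-region-derived clipping range is not shown here.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized; negate for a loss).

    ratio     : pi_new(a|s) / pi_old(a|s), per sample
    advantage : estimated advantage per sample
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()

# With a positive advantage, pushing the ratio past 1 + eps gains nothing:
print(ppo_clip_loss(np.array([2.0]), np.array([1.0])))  # 1.2, clipped at 1 + eps
```

The fixed eps is exactly the knob the paper revisits: under poor initialization, a ratio cap of 1 + eps caps how fast a rarely-chosen good action's probability can grow.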


@maharshii Sharing my latest project:
FlashAttention-2 for Sliding Window in Triton. Don't hesitate to check it out!
github.com/MaxLSB/flash-a…
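The idea behind the project above: in sliding-window attention, each query attends only to the previous `window` positions, which FlashAttention-style kernels exploit by skipping fully-masked blocks. A plain numpy reference of the masked attention itself (no Triton; an illustrative reference implementation, not the linked kernel):

```python
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Causal attention where position i sees keys in [i - window + 1, i]."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    i, j = np.indices((n, n))
    # Mask out future tokens and tokens that fell outside the window.
    mask = (j > i) | (j < i - window + 1)
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(6, 4))
out = sliding_window_attention(q, k, v, window=2)
print(out.shape)  # (6, 4)
```

A fused kernel computes the same result block by block without ever materializing the full n×n score matrix, and skips blocks where the mask above is all False.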

some short advice for CS students/people who want to get into it:
1. build useful/helpful projects around the topics that you are really interested in. building or improving something from the ground up can teach you a lot of things that you may overlook otherwise.
2. post about your projects on any platform where you can find like-minded people who are possibly interested in what you are doing.
3. sometimes you will see no “positive” results from posting about your projects which is okay. don’t lose faith.
4. the goal is not to suddenly become famous by posting; the goal is to create something that might actually help people in one way or another. everything else that comes along with it is bonus.
5. don’t let fame get to your head. be yourself and don’t pretend to be someone you are not.