Sylvain Lesage

18 posts

@severo_dev

Dataviz freelance developer. Part-time 🤗 @huggingface (dataset viewer). GitHub: https://t.co/IBRvyeYaGI

Joined November 2013
43 Following · 61 Followers
Pinned Tweet
Sylvain Lesage@severo_dev·
As xet-team infrastructure begins backing hundreds of repositories on the Hugging Face Hub, we’re getting to put on our researcher hats and peer into the bytes. 👀 🤓 IMO, one of the most interesting ideas Xet storage introduces is a globally shared store of data.
3
0
2
1.4K
Sylvain Lesage@severo_dev·
Give me some time 🤗🤗
2
1
2
505
Sylvain Lesage@severo_dev·
I 🙏 at the altar of stamina. "(Stamina is) the ability to chip away at goals despite a lack of visible progress. To hold focus and presence in a world incentivized to distract you. To stay patient. To be on time. To push through difficult material. To follow instructions or proceed without them."
1
0
1
935
Sylvain Lesage@severo_dev·
See that purple banner on the Llama 4 models? It's Xet storage, and this is actually huge for anyone building with AI models. Real numbers: ~25% deduplication on Llama 4 models, hitting ~40% for finetunes
0
0
0
535
Sylvain Lesage@severo_dev·
The Lovelace 2.0 Test of Artificial Creativity and Intelligence “tell a story in which a boy falls in love with a girl, aliens abduct the boy, and the girl saves the world with the help of a talking cat.” Change that 🐱 to a 🐶 and I'd read that story any day. arxiv.org/abs/1410.6142
0
0
0
300
Sylvain Lesage retweeted
Ihor Stepanov@ihor_step·
I just finished some small experiments comparing different encoder models on retrieval tasks. The goal was to check whether MLM is better than RTD for these tasks. I compared Electra's small models, both generator and discriminator, which have the same size. Additionally, I tested DeBERTa v1, which was pre-trained with MLM, and DeBERTa v3, which was pre-trained with RTD on a dataset twice as large. ModernBERT was evaluated as a baseline as well. Models were fine-tuned on 500k examples from the MS-MARCO dataset (huggingface.co/datasets/sente…). For benchmarking, the NanoBEIR evaluator was used. You can see the average ndcg@10 plotted below. It appears that MLM-trained models produce better discriminative features; however, more detailed experiments are needed for more accurate conclusions. @antoine_chaffin @tomaarsen
5
4
27
14.7K
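For readers unfamiliar with the ndcg@10 metric mentioned in the tweet above, here is a minimal sketch of how it can be computed for a single query. It assumes linear gain (the relevance label itself, not 2^rel - 1) and assumes `relevances` holds the labels of the ranked results in retrieval order, with every relevant document present in the list. The function names are illustrative and are not taken from the NanoBEIR evaluator.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    # rank is 0-based, so the discount for the first result is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical ranking: the only relevant document is returned second.
score = ndcg_at_k([0, 1, 0])
```

Averaging this score over all queries and datasets gives a single number per model, which is presumably what the plot in the tweet reports.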
Sylvain Lesage@severo_dev·
@ClementDelangue Absolutely, the transformer architecture has become a cornerstone for model definition
0
0
1
103
Sylvain Lesage retweeted
clem 🤗@ClementDelangue·
So cool to see transformers becoming the source of truth for model definition & collaborating with wonderful partners like vLLM to have these models run everywhere the fastest! As a model builder, it means that you integrate with Hugging Face & instantly get hundreds of integrations out of the box. Time to accelerate AI, one integration at a time!
13
17
154
20.6K
Sylvain Lesage retweeted
jade@jadechoghari·
Robotics simulation is how we teach robots to act smart—before they ever touch the real world. It’s physical AI: simulating Newtonian physics so robots can grip, move, and learn. A new simulator is coming to @huggingface @LeRobotHF 🤗. Stay tuned 👀😉
5
21
180
35.3K
Sylvain Lesage@severo_dev·
Come find the many BERT islands. Or see how datasets relate in practice, not just in theory. See how libraries or tasks can tie repositories together. You can play around with node size using storage/likes/downloads too. The result is a super fun visualization from @saba9 and @znation that I’ve already lost way too much time to. I'm excited to see how the networks grow as we add more repositories! xet-team/repo-graph
0
0
0
138
Sylvain Lesage@severo_dev·
Because of this, different repositories can share bytes we store. That opens up something cool - we can draw a graph of which repos actually share data at the chunk level, where:
- Nodes = repositories
- Edges = shared chunks
- Edge thickness = how much they overlap
1
0
0
175
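The chunk-sharing graph described in the tweet above can be sketched in a few lines. This is only an illustration under stated assumptions: each repository is represented as a set of chunk hashes, and edge thickness is taken to be the number of shared chunks (the real visualization may weight by bytes instead). The repository names, chunk IDs, and the function `shared_chunk_graph` are hypothetical and do not come from the Xet codebase.

```python
from itertools import combinations


def shared_chunk_graph(repo_chunks):
    """Map each pair of repos that share at least one chunk to the shared count.

    repo_chunks: dict of repo name -> set of chunk hashes (assumed format).
    Returns {(repo_a, repo_b): number_of_shared_chunks}; the count plays the
    role of edge thickness.
    """
    edges = {}
    for a, b in combinations(sorted(repo_chunks), 2):
        shared = len(repo_chunks[a] & repo_chunks[b])
        if shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical data: a finetune reuses most chunks of its base model.
repo_chunks = {
    "org/base-model": {"c1", "c2", "c3", "c4"},
    "org/finetune": {"c1", "c2", "c3", "c9"},
    "org/unrelated": {"c7", "c8"},
}
edges = shared_chunk_graph(repo_chunks)
```

Here the base model and its finetune are connected by an edge of weight 3, while the unrelated repository stays an isolated node. Comparing every pair is quadratic in the number of repositories, so a real system would more likely invert the mapping (chunk to repositories) first.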
Sylvain Lesage retweeted
AshutoshShrivastava@ai_for_success·
🚨 SkyReels just launched! The world’s first open-source video generation platform supporting unlimited duration 🔥 All-in-one creator toolkit:
- Consistent high-quality video (LoRA ready)
- Fast gen, amazing output
- Amazing facial expressions
Plus, a text-to-film agent handles everything: script, character, storyboards, full AV gen, auto-edit. It's wild! Step-by-step tutorial 👇
35
119
695
81K