skrub

69 posts

@skrub_data

Prepping tables for machine learning

Joined April 2023
9 Following, 312 Followers

skrub reposted
dotConferences (@dotConferences)
Now on stage at #dotAI2025: @GaelVaroquaux speaking about "Machine-learners should cross-validate, and use skrub’s DataOps"

skrub reposted
Gael Varoquaux 🦋 (@GaelVaroquaux)
One of my collaborators sent me a @skrub_data TableReport as an HTML file that I can interact with, exploring the data to give him feedback. The ideal workflow, as far as I am concerned: async, yet interactive, and requiring no infrastructure.

skrub reposted
:probabl. (@probabl_ai)
With skore v0.10, you now have a data accessor in the EstimatorReport! It consists of a @skrub_data TableReport that lets you interactively explore your data and gain valuable insights before modelling! 🎬 Check out our short demo video: eu1.hubs.ly/H0mhFMN0

skrub reposted
:probabl. (@probabl_ai)
(Re)watch our session at @PyData Milan in March 2025, where we discussed the latest developments in the @scikit_learn ecosystem: eu1.hubs.ly/H0m9cHw0 We explore what scikit-learn allows you to do and introduce powerful tools like @skrub_data, skops, and skore.

skrub reposted
:probabl. (@probabl_ai)
@PyData @scikit_learn @skrub_data Timeline:
0:00: Introduction to PyData Milan
7:30: Speaker introductions
9:25: What scikit-learn allows you to do
21:15: skrub - less wrangling, more machine learning
32:54: skops - scikit-learn models in production
43:51: skore - an abstraction to ease data science projects

skrub reposted
:probabl. (@probabl_ai)
🎤 Next week, our product engineer Marie Sacksick will present how to extend scikit-learn with skore, but also with skrub and skops. Thanks to PyLadies Paris for this opportunity! To book your seat: eu1.hubs.ly/H0j-Qpq0

skrub reposted
:probabl. (@probabl_ai)
For this recipe, you will need:
- 4 open source libraries,
- 3 vibrant colors,
- 2 enthusiastic speakers,
- 1 welcoming host.
Mix it all, expose it to some of Milan's sun, and you will get... a talk on @scikit_learn, @skrub_data, skops, and skore, by @glemaitre58 and @MarieSacksick.

skrub (@skrub_data)
🎉⚡️Release 0.5.1: ◼ Encode strings faster and better with StringEncoder! StringEncoder applies tf-idf vectorization followed by SVD to produce high-quality and FAST embeddings of textual and categorical features. skrub-data.org/stable/referen…
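
The release note describes StringEncoder as tf-idf vectorization followed by SVD. A minimal sketch of that idea with plain scikit-learn (not skrub's own API) is below; the character n-gram settings (`analyzer="char_wb"`, `ngram_range=(3, 4)`) and the tiny `n_components=2` are assumptions chosen for illustration, not necessarily StringEncoder's defaults.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Assumed settings: character n-grams built inside word boundaries,
# then a truncated SVD down to 2 dense dimensions.
tfidf_svd = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 4)),
    TruncatedSVD(n_components=2, random_state=0),
)

job_titles = [
    "Senior Data Scientist",
    "data scientist",
    "Police Officer",
    "police officer III",
]

# Each string becomes a dense vector of length n_components.
embeddings = tfidf_svd.fit_transform(job_titles)
print(embeddings.shape)  # (4, 2)
```

Because the SVD output is dense and short, near-duplicate entries such as "Senior Data Scientist" and "data scientist" end up pointing in similar directions, and the result can be fed to models that do not accept sparse input.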

skrub (@skrub_data)
There is much more: ◼ skrub.patch_display() makes the TableReport the default representation for all dataframes ◼ skrub.column_association to check which columns are linked... Check out the changelog: skrub-data.org/stable/CHANGES… 5/5
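
The changelog entry only says that `skrub.column_association` checks "which columns are linked". One common measure of how strongly two categorical columns are linked is Cramér's V; the sketch below assumes that measure and uses hypothetical helpers (`cramers_v`, `rank_associations`) that are not skrub's implementation.

```python
from itertools import combinations

import numpy as np


def cramers_v(x, y):
    """Cramér's V between two categorical sequences (no bias correction)."""
    # Map each category to an integer code.
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    # Contingency table of observed co-occurrence counts.
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)
    n = table.sum()
    # Expected counts if the two columns were independent.
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    if k == 0:  # a constant column carries no association
        return 0.0
    return float(np.sqrt(chi2 / (n * k)))


def rank_associations(columns):
    """All column pairs of a {name: values} table, most associated first."""
    pairs = [(a, b, cramers_v(columns[a], columns[b])) for a, b in combinations(columns, 2)]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


columns = {
    "city": ["Paris", "Lyon", "Paris", "Berlin", "Munich", "Berlin"],
    "country": ["FR", "FR", "FR", "DE", "DE", "DE"],
    "weather": ["sun", "rain", "rain", "sun", "sun", "rain"],
}
for a, b, v in rank_associations(columns):
    print(f"{a} ~ {b}: {v:.2f}")
# city ~ country: 1.00
# city ~ weather: 0.58
# country ~ weather: 0.33
```

A value of 1 means one column fully determines the other (here, the city gives away the country), which is exactly the kind of redundancy worth spotting before modelling.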

skrub (@skrub_data)
Improved TableReport: ◼ tighter layout ◼ support for any script (any alphabet: حب माया) in the plots ◼ robustness to outliers. It works without dependencies, in any HTML-based environment (@ProjectJupyter, @code, a simple web page...). Check it out on skrub-data.org 4/5
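
A TableReport travels well because it is plain HTML that any browser can open, which is also what makes the file-sharing workflow Gael Varoquaux describes possible. As a rough, stdlib-only sketch of that idea (not skrub's implementation, which is far richer and interactive), the hypothetical `write_summary_html` helper below writes per-column statistics into one standalone HTML file with inline CSS and no external assets.

```python
import html
import tempfile
from collections import Counter
from pathlib import Path


def write_summary_html(columns, path):
    """Write a standalone HTML summary of a table given as {name: list_of_values}."""
    rows = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]  # ignore missing values
        counts = Counter(present)
        top = counts.most_common(1)[0][0] if counts else ""
        rows.append(
            "<tr>"
            f"<td>{html.escape(str(name))}</td>"
            f"<td>{len(values) - len(present)}</td>"
            f"<td>{len(counts)}</td>"
            f"<td>{html.escape(str(top))}</td>"
            "</tr>"
        )
    document = "\n".join([
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        # Inline CSS only: no scripts, fonts, or other files to ship alongside.
        "<style>table{border-collapse:collapse}td,th{border:1px solid #ccc;padding:4px}</style>",
        "</head><body><table>",
        "<tr><th>column</th><th>missing</th><th>unique</th><th>most frequent</th></tr>",
        *rows,
        "</table></body></html>",
    ])
    Path(path).write_text(document, encoding="utf-8")
    return document


# Usage: write the report to a temporary directory; the file can be emailed as-is.
out = Path(tempfile.mkdtemp()) / "report.html"
doc = write_summary_html({"city": ["Paris", "Lyon", "Paris", None], "<b>": [1, 2, 2, 2]}, out)
```

Escaping every value with `html.escape` keeps arbitrary cell contents (including column names like `<b>`) from breaking the page.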

skrub (@skrub_data)
🎉⚡️Release 0.4: ◼ Easily use deep learning for text entries ◼ TableVectorizer can remove columns with too many missing values ◼ TableReport more robust and prettier ... 1/5
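
Release 0.4 lets TableVectorizer remove columns with too many missing values. The underlying rule fits in a few lines of pandas; the helper name `drop_mostly_null_columns` and its `max_null_fraction` threshold are hypothetical, and skrub's own option may use a different name or boundary convention.

```python
import pandas as pd


def drop_mostly_null_columns(df: pd.DataFrame, max_null_fraction: float) -> pd.DataFrame:
    """Drop columns whose fraction of missing values exceeds max_null_fraction."""
    null_fraction = df.isna().mean()  # per-column share of missing entries
    keep = null_fraction[null_fraction <= max_null_fraction].index
    return df[keep]


df = pd.DataFrame({
    "age": [31, 45, 27, 52],
    "income": [None, None, None, 40_000],  # 75% missing
    "notes": [None, None, None, None],     # entirely missing
})
print(list(drop_mostly_null_columns(df, 0.5).columns))  # ['age']
print(list(drop_mostly_null_columns(df, 0.8).columns))  # ['age', 'income']
```

Such columns carry little signal but still cost encoding time and add noise, so dropping them early in the pipeline is a cheap cleanup step.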

skrub reposted
:probabl. (@probabl_ai)
Some ensemble models do not support sparse features, but there is a hashing trick (via the MinHashEncoder in skrub!) that fully circumvents that issue for text and dirty categorical data. Details and a full explainer just went live here: eu1.hubs.ly/H0dV3RC0
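
The trick works because min-hashing turns each entry into a short dense vector instead of a wide sparse one-hot or n-gram matrix. Below is a minimal, stdlib-only sketch of min-hashing over character 3-grams; the space padding, the salted BLAKE2b hash family, and the hypothetical `minhash_encode` name are assumptions, not skrub's MinHashEncoder implementation. The share of positions where two vectors agree estimates the Jaccard similarity of the strings' n-gram sets.

```python
import hashlib


def char_ngrams(text, n=3):
    """Set of character n-grams, with a space added at both ends of the string."""
    padded = f" {text} "
    grams = {padded[i:i + n] for i in range(len(padded) - n + 1)}
    return grams or {padded}  # very short strings fall back to a single "gram"


def minhash_encode(text, n_components=8, n=3):
    """Dense vector of n_components min-hash values in [0, 1)."""
    grams = char_ngrams(text, n)
    vector = []
    for seed in range(n_components):
        salt = seed.to_bytes(16, "little")  # one salted hash function per component
        smallest = min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=4, salt=salt).digest(),
                "little",
            )
            for g in grams
        )
        vector.append(smallest / 2**32)  # scale the 32-bit hash into [0, 1)
    return vector


def agreement(u, v):
    """Share of components on which two min-hash vectors coincide."""
    return sum(a == b for a, b in zip(u, v)) / len(u)


# Dirty variants of a category share many n-grams, so they tend to agree more often.
print(agreement(minhash_encode("London"), minhash_encode("London, UK")))
print(agreement(minhash_encode("London"), minhash_encode("Paris")))
```

No vocabulary is learned, so unseen spellings at prediction time are encoded the same way as training data, and the fixed output width suits tree ensembles that expect dense input.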

skrub (@skrub_data)
Skrub is on Bluesky 🦋. It's fun there!