Matteo Latinov

63 posts

@LatinovMatteo

Joined August 2020
33 Following · 7 Followers
Matteo Latinov retweeted
Mark Tenenholtz @marktenenholtz
21 thoughts any data scientist should read to succeed in the real world (that I learned the hard way): 1. Go talk to the person who will use the model before you start building it. You'll waste a lot less time. (read on)
6 replies · 68 retweets · 371 likes
Matteo Latinov @LatinovMatteo
@svpino I like random forests as well, more than decision trees, which I'd say can easily overfit the data.
0 replies · 0 retweets · 0 likes
Santiago @svpino
There are thousands of machine learning algorithms, but you'll rarely need more than a handful. A good start:
• Linear/Logistic Regression
• Decision Trees
• Neural Networks
• XGBoost
• KNN
• K-Means
• PCA
Would you add anything?
99 replies · 235 retweets · 1.6K likes
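To make one item on the list above concrete, here is a minimal, pure-Python sketch of KNN classification (majority vote among the k nearest training points). The dataset and function name are made up for illustration; in practice you would reach for a library implementation.

```python
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D dataset: class "a" clusters near the origin, "b" near (5, 5)
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5)))  # -> "a"
```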
Mark Tenenholtz @marktenenholtz
At least 60% of creating great time-series forecasting models is strong data wrangling skills. Target leakage, bugs in features, etc. are far too common. Time-series data is really hard to get right.
7 replies · 17 retweets · 211 likes
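The target-leakage point can be sketched in a few lines: a lag feature for row t may only use values strictly before t, and rows without enough history must be dropped rather than filled from the future. The function and variable names below are illustrative, not from the tweet.

```python
def make_lag_features(series, lags=(1, 2)):
    """Build lagged features for one time series.

    Row t gets series[t - lag]; the first max(lags) rows are dropped,
    so no feature ever peeks at the current or future target.
    """
    start = max(lags)
    rows = [
        {f"lag_{lag}": series[t - lag] for lag in lags}
        for t in range(start, len(series))
    ]
    targets = series[start:]
    return rows, targets

sales = [10, 12, 13, 15, 18, 21]
X, y = make_lag_features(sales)
# First usable row predicts y[0]=13 from lag_1=12 and lag_2=10
```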
Mark Tenenholtz @marktenenholtz
TL;DR:
1. Correlated features
2. Different model, different insights
3. Model agnostic
4. Negative feature importances
5. Strong generalization
Follow me @marktenenholtz for more high-signal ML content!
2 replies · 2 retweets · 70 likes
Mark Tenenholtz @marktenenholtz
Most data scientists use linear/logistic regression to figure out which features are important in a dataset. I almost never do this. Instead, I generally use leave-one-out feature importance (LOFO) + LightGBM. Here's why:
59 replies · 439 retweets · 2.7K likes
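The LOFO idea itself is model-agnostic: score the model with all features, then rescore with each feature left out; the drop in score is that feature's importance (and it can go negative, per the TL;DR above). The sketch below substitutes a stand-in scoring function for the LightGBM cross-validation Mark uses; all names and weights are invented for illustration.

```python
def lofo_importance(features, score_fn):
    """Leave-one-feature-out importance.

    Importance of a feature = score with all features minus score with
    that feature removed. Positive means the feature helps; negative
    means the model does better without it.
    """
    full = score_fn(features)
    return {f: full - score_fn([g for g in features if g != f]) for f in features}

# Stand-in scorer: pretend validation accuracy is driven by these features.
WEIGHTS = {"age": 0.30, "income": 0.15, "noise": -0.05}
def fake_score(features):
    return 0.5 + sum(WEIGHTS.get(f, 0.0) for f in features)

imp = lofo_importance(["age", "income", "noise"], fake_score)
# age ≈ 0.30, income ≈ 0.15, noise ≈ -0.05 (a negative importance)
```

In a real setup `score_fn` would train and cross-validate a model on the given feature subset, which is why LOFO is expensive but honest about interactions.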
Mark Tenenholtz @marktenenholtz
Advice that changed my life: Always work backwards from the problem. Not forwards from a solution. Try to force a solution onto a problem and you might succeed 1% of the time. Working backwards from the problem means you nearly always craft an ideal solution.
9 replies · 18 retweets · 181 likes
Matteo Latinov retweeted
Roger Federer @rogerfederer
To my tennis family and beyond, With Love, Roger
24.8K replies · 130K retweets · 707.2K likes
Matteo Latinov retweeted
Jordan Feigenbaum, M.D. @Jordan_theCoach
Fear of injury is a major reason people don’t lift weights, despite a very low injury incidence of 2-4 injuries/1000 hours. People on social media who portray themselves as experts, raising concerns about injury due to technique despite the lack of evidence, are part of the problem.
10 replies · 70 retweets · 264 likes
Matteo Latinov @LatinovMatteo
@xLaszlo Yes, absolutely: domain-specific names for the features. I see, so you could break up the dataset and still have the various features (now belonging to different classes) linked via the common ID of the entity (a row, in this case).
0 replies · 0 retweets · 0 likes
Laszlo Sragner @xLaszlo
@LatinovMatteo Yes, why not. But I would definitely use domain-specific names. Also check whether the class can be decomposed into more logical parts, and move those into their own class. Usage should be an indicator of that.
1 reply · 0 retweets · 0 likes
Matteo Latinov @LatinovMatteo
@xLaszlo How would you manage a dataclass with 100 attributes (e.g. a dataset with 100 features)? Would you explicitly list all features in the dataclass and extract/assign each with .itertuples()? So in this example: engagements, replies, ... , feature100
1 reply · 0 retweets · 0 likes
Laszlo Sragner @xLaszlo
There are a number of things happening here:
• dataclass: alternatively, you can write a "normal" Python class, or use pydantic if you need extra checks on the created objects. You gain the benefit of having a well-defined data model that you can validate at construction.
2 replies · 0 retweets · 0 likes
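For the 100-attribute question, one stdlib answer is to generate the class from the schema with `dataclasses.make_dataclass` instead of hand-writing 100 fields, then construct instances from the tuples `.itertuples()` yields. The feature names and row values below are invented for illustration.

```python
from dataclasses import make_dataclass, asdict

# Hypothetical schema: in practice these would be your domain-specific names.
FEATURES = ["engagements", "replies", "likes"]  # ... up to feature100

# Generate the dataclass from the schema instead of listing every field.
TweetStats = make_dataclass("TweetStats", [("id", int)] + [(f, float) for f in FEATURES])

# A row as produced by e.g. DataFrame.itertuples(index=False): (id, features...)
row = (42, 371.0, 6.0, 68.0)
record = TweetStats(*row)
record.engagements  # attribute access instead of positional indexing
asdict(record)      # back to a plain dict when needed
```

pydantic offers the same pattern with `create_model` plus validation at construction, which matches Laszlo's point about a data model you can validate.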
Matteo Latinov retweeted
Santiago @svpino
One of the most common problems in machine learning: how do you deal with imbalanced datasets? Not only does this happen frequently, but it's also a popular interview question. Here are seven different techniques to deal with this problem: 1 of 14
41 replies · 407 retweets · 1.9K likes
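One standard technique for imbalance is re-weighting classes so the loss pays more attention to the rare class. Below is a sketch of the common "balanced" heuristic, n_samples / (n_classes × class_count) — the same formula behind scikit-learn's `class_weight="balanced"` — written out in plain Python with toy labels.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# 90/10 imbalance: the minority class gets a 9x larger weight
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)  # {0: 0.555..., 1: 5.0}
```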
Matteo Latinov retweeted
Mark Tenenholtz @marktenenholtz
I start every ML project by creating these files, more or less in this order:
1. .gitignore/README.md
2. eda.ipynb
3. data_notes.md
Days later...
4. baseline.(py/ipynb)
5. train.py
6. validation.py
7. error_analysis.ipynb
Structure -> creativity
16 replies · 131 retweets · 955 likes
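The file list above can be scaffolded in a few lines of stdlib `pathlib`; the filenames come straight from the tweet, while the project directory name and this helper are invented for illustration.

```python
from pathlib import Path

FILES = [
    ".gitignore", "README.md",      # day one
    "eda.ipynb", "data_notes.md",
    "baseline.ipynb", "train.py",   # days later
    "validation.py", "error_analysis.ipynb",
]

def scaffold(root="my-ml-project"):
    """Create the project directory and empty placeholder files."""
    project = Path(root)
    project.mkdir(exist_ok=True)
    for name in FILES:
        (project / name).touch()
    return project

scaffold()
```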
Matteo Latinov @LatinovMatteo
@marktenenholtz Really enjoyed the course! Inspirational and straight to the point. I'll be looking to apply this framework from here on out. Well worth the money, IMHO. 5 stars were given 😀
1 reply · 0 retweets · 2 likes
Mark Tenenholtz @marktenenholtz
"The Autonomous Data Scientist" is LIVE! Get it for $15 and you'll learn how to make $150k+ to work on some of the most exciting problems out there in data science. All for less than you'll pay for coffee this month. This price is only good until Thursday (it's going up)!
[tweet media]
7 replies · 3 retweets · 66 likes
Matteo Latinov retweeted
Michael Nielsen @michael_nielsen
The use of spaced repetition memory systems has changed my life over the past couple of years. Here are a few things I've found helpful:
65 replies · 576 retweets · 2.6K likes
Matteo Latinov retweeted
Mark Tenenholtz @marktenenholtz
My 6-step pipeline for training tabular models:
1. EDA
2. Simple baseline
3. Create + test evaluation setup
4. Simple model with simple features
5. Error analysis
6. Feature engineering
Repeat 5+6 until you run out of time!
18 replies · 60 retweets · 463 likes
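Step 2, the simple baseline, can be as small as predicting the training-set mean and recording the error so every later model has a number to beat. The toy targets and function name below are made up for illustration.

```python
def mean_baseline_mae(train_y, valid_y):
    """MAE of always predicting the training-set mean."""
    pred = sum(train_y) / len(train_y)
    return sum(abs(y - pred) for y in valid_y) / len(valid_y)

# Toy regression targets: any real model must beat this error
train, valid = [10, 12, 14, 16], [11, 15]
mean_baseline_mae(train, valid)  # mean=13 -> errors 2 and 2 -> MAE 2.0
```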
Matteo Latinov @LatinovMatteo
@jeremyphoward Thank you! Any chance fastai will be available on Amazon SageMaker Studio Lab in the near future?
1 reply · 0 retweets · 1 like