Matteo Latinov

63 posts

@LatinovMatteo

Joined August 2020
33 Following · 7 Followers
Matteo Latinov retweeted
Mark Tenenholtz @marktenenholtz
21 thoughts any data scientist should read to succeed in the real world (that I learned the hard way): 1. Go talk to the person who will use the model before you start building it. You'll waste a lot less time. (read on)
6 replies · 68 retweets · 371 likes
Matteo Latinov @LatinovMatteo
@svpino I like random forests as well, more than decision trees, which I'd say can easily overfit the data.
0 replies · 0 retweets · 0 likes
Santiago @svpino
There are thousands of machine learning algorithms, but you'll rarely need more than a handful. A good start:
• Linear/Logistic Regression
• Decision Trees
• Neural Networks
• XGBoost
• KNN
• K-Means
• PCA
Would you add anything?
99 replies · 235 retweets · 1.6K likes
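To make one item on the list above concrete, here is a minimal, pure-Python sketch of KNN classification (majority vote among the k nearest training points). The dataset and function name are made up for illustration; in practice you would reach for a library implementation.

```python
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D dataset: class "a" clusters near the origin, "b" near (5, 5)
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5)))  # -> "a"
```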
Mark Tenenholtz @marktenenholtz
At least 60% of creating great time-series forecasting models is strong data wrangling skills. Target leakage, bugs in features, etc. are far too common. Time-series data is really hard to get right.
7 replies · 17 retweets · 211 likes
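The target-leakage point can be sketched in a few lines: a lag feature for row t may only use values strictly before t, and rows without enough history must be dropped rather than filled from the future. The function and variable names below are illustrative, not from the tweet.

```python
def make_lag_features(series, lags=(1, 2)):
    """Build lagged features for one time series.

    Row t gets series[t - lag]; the first max(lags) rows are dropped,
    so no feature ever peeks at the current or future target.
    """
    start = max(lags)
    rows = [
        {f"lag_{lag}": series[t - lag] for lag in lags}
        for t in range(start, len(series))
    ]
    targets = series[start:]
    return rows, targets

sales = [10, 12, 13, 15, 18, 21]
X, y = make_lag_features(sales)
# First usable row predicts y[0]=13 from lag_1=12 and lag_2=10
```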
Mark Tenenholtz @marktenenholtz
TL;DR:
1. Correlated features
2. Different model, different insights
3. Model agnostic
4. Negative feature importances
5. Strong generalization
Follow me @marktenenholtz for more high-signal ML content!
2 replies · 2 retweets · 70 likes
Mark Tenenholtz @marktenenholtz
Most data scientists use linear/logistic regression to figure out which features are important in a dataset. I almost never do this. Instead, I generally use leave-one-out feature importance (LOFO) + LightGBM. Here's why:
59 replies · 439 retweets · 2.7K likes
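The LOFO idea itself is model-agnostic: score the model with all features, then rescore with each feature left out; the drop in score is that feature's importance (and it can go negative, per the TL;DR above). The sketch below substitutes a stand-in scoring function for the LightGBM cross-validation Mark uses; all names and weights are invented for illustration.

```python
def lofo_importance(features, score_fn):
    """Leave-one-feature-out importance.

    Importance of a feature = score with all features minus score with
    that feature removed. Positive means the feature helps; negative
    means the model does better without it.
    """
    full = score_fn(features)
    return {f: full - score_fn([g for g in features if g != f]) for f in features}

# Stand-in scorer: pretend validation accuracy is driven by these features.
WEIGHTS = {"age": 0.30, "income": 0.15, "noise": -0.05}
def fake_score(features):
    return 0.5 + sum(WEIGHTS.get(f, 0.0) for f in features)

imp = lofo_importance(["age", "income", "noise"], fake_score)
# age ≈ 0.30, income ≈ 0.15, noise ≈ -0.05 (a negative importance)
```

In a real setup `score_fn` would train and cross-validate a model on the given feature subset, which is why LOFO is expensive but honest about interactions.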
Mark Tenenholtz @marktenenholtz
Advice that changed my life: Always work backwards from the problem. Not forwards from a solution. Try to force a solution onto a problem and you might succeed 1% of the time. Working backwards from the problem means you nearly always craft an ideal solution.
9 replies · 18 retweets · 181 likes
Matteo Latinov retweeted
Roger Federer @rogerfederer
To my tennis family and beyond, With Love, Roger
24.8K replies · 130K retweets · 707.2K likes
Matteo Latinov retweeted
Jordan Feigenbaum, M.D. @Jordan_theCoach
Fear of injury is a major reason people don’t lift weights, despite a very low injury incidence of 2-4 injuries/1000 hours. People on social media who portray themselves as experts, raising concerns about injury due to technique despite the lack of evidence, are part of the problem.
10 replies · 70 retweets · 264 likes
Matteo Latinov @LatinovMatteo
@xLaszlo Yes, absolutely: domain-specific names for the features. I see, so you could break up the dataset and still have the various features (now belonging to different classes) linked via the common ID of the entity (a row, in this case).
0 replies · 0 retweets · 0 likes
Laszlo Sragner @xLaszlo
@LatinovMatteo Yes, why not. But I would definitely use domain-specific names. Also check whether the class can be decomposed into more logical parts, and move those into their own class. Usage should be an indicator of that.
1 reply · 0 retweets · 0 likes
Matteo Latinov @LatinovMatteo
@xLaszlo How would you manage a dataclass with 100 attributes (e.g. a dataset with 100 features)? Would you explicitly list all features in the dataclass and extract/assign each with .itertuples()? So in this example: engagements, replies, ... , feature100
1 reply · 0 retweets · 0 likes
Laszlo Sragner @xLaszlo
There are a number of things happening here:
• dataclass: alternatively, you can write a "normal" Python class, or use pydantic if you need extra checks on the created objects. You gain the benefit of having a well-defined data model that you can validate at construction.
2 replies · 0 retweets · 0 likes
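For the 100-attribute question, one stdlib answer is to generate the class from the schema with `dataclasses.make_dataclass` instead of hand-writing 100 fields, then construct instances from the tuples `.itertuples()` yields. The feature names and row values below are invented for illustration.

```python
from dataclasses import make_dataclass, asdict

# Hypothetical schema: in practice these would be your domain-specific names.
FEATURES = ["engagements", "replies", "likes"]  # ... up to feature100

# Generate the dataclass from the schema instead of listing every field.
TweetStats = make_dataclass("TweetStats", [("id", int)] + [(f, float) for f in FEATURES])

# A row as produced by e.g. DataFrame.itertuples(index=False): (id, features...)
row = (42, 371.0, 6.0, 68.0)
record = TweetStats(*row)
record.engagements  # attribute access instead of positional indexing
asdict(record)      # back to a plain dict when needed
```

pydantic offers the same pattern with `create_model` plus validation at construction, which matches Laszlo's point about a data model you can validate.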
Matteo Latinov retweeted
Santiago @svpino
One of the most common problems in machine learning: how do you deal with imbalanced datasets? Not only does this happen frequently, but it's also a popular interview question. Here are seven different techniques to deal with this problem: 1 of 14
41 replies · 407 retweets · 1.9K likes
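One standard technique for imbalance is re-weighting classes so the loss pays more attention to the rare class. Below is a sketch of the common "balanced" heuristic, n_samples / (n_classes × class_count) — the same formula behind scikit-learn's `class_weight="balanced"` — written out in plain Python with toy labels.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# 90/10 imbalance: the minority class gets a 9x larger weight
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)  # {0: 0.555..., 1: 5.0}
```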
Matteo Latinov retweeted
Mark Tenenholtz @marktenenholtz
I start every ML project by creating these files, more or less in this order:
1. .gitignore/README.md
2. eda.ipynb
3. data_notes.md
Days later...
4. baseline.(py/ipynb)
5. train.py
6. validation.py
7. error_analysis.ipynb
Structure -> creativity
16 replies · 131 retweets · 955 likes
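The file list above can be scaffolded in a few lines of stdlib `pathlib`; the filenames come straight from the tweet, while the project directory name and this helper are invented for illustration.

```python
from pathlib import Path

FILES = [
    ".gitignore", "README.md",      # day one
    "eda.ipynb", "data_notes.md",
    "baseline.ipynb", "train.py",   # days later
    "validation.py", "error_analysis.ipynb",
]

def scaffold(root="my-ml-project"):
    """Create the project directory and empty placeholder files."""
    project = Path(root)
    project.mkdir(exist_ok=True)
    for name in FILES:
        (project / name).touch()
    return project

scaffold()
```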
Matteo Latinov @LatinovMatteo
@marktenenholtz Really enjoyed the course! Inspirational and straight to the point. I'll be looking to apply this framework from here on out. Well worth the money, IMHO. 5 stars were given 😀
1 reply · 0 retweets · 2 likes
Mark Tenenholtz @marktenenholtz
"The Autonomous Data Scientist" is LIVE! Get it for $15 and you'll learn how to make $150k+ to work on some of the most exciting problems out there in data science. All for less than you'll pay for coffee this month. This price is only good until Thursday (it's going up)!
[tweet media]
7 replies · 3 retweets · 66 likes
Matteo Latinov retweeted
Michael Nielsen @michael_nielsen
The use of spaced repetition memory systems has changed my life over the past couple of years. Here are a few things I've found helpful:
65 replies · 576 retweets · 2.6K likes
Matteo Latinov retweeted
Mark Tenenholtz @marktenenholtz
My 6-step pipeline for training tabular models:
1. EDA
2. Simple baseline
3. Create + test evaluation setup
4. Simple model with simple features
5. Error analysis
6. Feature engineering
Repeat 5+6 until you run out of time!
18 replies · 60 retweets · 463 likes
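Step 2, the simple baseline, can be as small as predicting the training-set mean and recording the error so every later model has a number to beat. The toy targets and function name below are made up for illustration.

```python
def mean_baseline_mae(train_y, valid_y):
    """MAE of always predicting the training-set mean."""
    pred = sum(train_y) / len(train_y)
    return sum(abs(y - pred) for y in valid_y) / len(valid_y)

# Toy regression targets: any real model must beat this error
train, valid = [10, 12, 14, 16], [11, 15]
mean_baseline_mae(train, valid)  # mean=13 -> errors 2 and 2 -> MAE 2.0
```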
Matteo Latinov @LatinovMatteo
@jeremyphoward Thank you! Any chance fastai will be available on Amazon SageMaker Studio Lab in the near future?
1 reply · 0 retweets · 1 like