R² is a widely used measure of fit, but for many analysts, it is just a number.
They believe high R² ➡ Good predictions.
This is not always true!
Now I will clarify. 🔽
R-squared measures how well the regression model fits the observed data.
To be more precise: It is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
It usually ranges from 0 to 1. (In rare cases it can be negative; I will explain this in another tweet.)
R² = 0
The model does not explain any of the variability in the dependent variable ➡ No better than always predicting the mean ➡ Bad model.
R² = 1
The model perfectly explains all the variability in the dependent variable ➡ Perfect fit to the data ➡ Good model only if it is not overfitted and actually has predictive power.
A high R-squared value does not mean that the model's predictions on new data will be accurate.
It doesn't measure predictive power, it measures how well the model fits the data it was trained on!
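A minimal sketch of why a perfect fit can still predict badly. The data and the helper names `r_squared` and `lagrange_predict` are made up for illustration and are not from the original thread. A degree-5 polynomial is forced through all six training points, so its R² on those points is 1, yet it extrapolates wildly on two new points.

```python
def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, measured against the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


def lagrange_predict(xs, ys, x):
    """Evaluate at x the polynomial that passes through every (xs, ys) point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for m, xm in enumerate(xs):
            if m != i:
                basis *= (x - xm) / (xi - xm)
        total += yi * basis
    return total


# Illustrative data (an assumption): roughly y = x plus a little noise.
x_train = [0, 1, 2, 3, 4, 5]
y_train = [0.0, 1.3, 1.7, 3.2, 3.8, 5.1]
x_new = [6, 7]
y_new = [6.1, 6.9]

pred_train = [lagrange_predict(x_train, y_train, x) for x in x_train]
pred_new = [lagrange_predict(x_train, y_train, x) for x in x_new]

r2_train = r_squared(y_train, pred_train)  # fit on the data the model has seen
r2_new = r_squared(y_new, pred_new)        # fit on data the model has not seen

print("R² on training points:", r2_train)
print("R² on new points:", r2_new)
```

The training R² is perfect because the polynomial memorises the noise. On the new points the same model does worse than simply predicting their mean, which is why its R² there drops below 0.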
In the example below, we compare the mean of the data to a fitted line.
Of course, the mean of values is not a good fit ➡ the errors are large.
On the other hand, the fitted line has smaller errors ➡ The R² will be close to 1.
To calculate R² we need:
- The total sum of squares around the mean (SS_tot)
- The sum of squared residuals from the model (SS_res)
- Finally, subtract their ratio from 1: R² = 1 - SS_res / SS_tot
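These steps can be sketched in plain Python. The data points and the helper names `r_squared` and `fit_line` are illustrative assumptions, not part of the original thread; the line is fitted with ordinary least squares. The sketch scores two "models" on the same data: always predicting the mean, and the fitted line.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for the given predictions."""
    mean_y = sum(y_true) / len(y_true)
    # Total sum of squares: squared errors of the mean-only baseline.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    # Residual sum of squares: squared errors of the model.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data (an assumption): close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Model 1: always predict the mean of y.
mean_y = sum(ys) / len(ys)
r2_mean = r_squared(ys, [mean_y] * len(ys))

# Model 2: the fitted line.
slope, intercept = fit_line(xs, ys)
r2_line = r_squared(ys, [slope * x + intercept for x in xs])

print("R² of the mean:", r2_mean)
print("R² of the fitted line:", r2_line)
```

The mean scores an R² of 0 because its residuals are exactly the total sum of squares. The fitted line leaves much smaller residuals, so its R² lands close to 1.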
___
That's it for today.
I hope you've found this Tweet helpful.
Like/Retweet for support and follow @levikul09 for more Data Science content.
Thanks 😉
