Embarking on a 100-Day Challenge, starting 30th December.
This time, it's fundamentally different. We (@TensorThrottleX, @BinaryBlaze16, @CodeAyushD) are committing to unflinching transparency: no curated highlights, no polished outcomes. Only the raw work itself.
Day 311 : DataScience Journey
Random Forest is essentially an improved version of Decision Trees, designed to solve their biggest weakness: overfitting. A single decision tree tends to memorize the training data, which means it has low bias but high variance and performs poorly on unseen data.
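A minimal sketch of that variance gap, assuming scikit-learn and a synthetic dataset (the data and parameters here are illustrative, not from the original posts):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy data so a lone tree is tempted to overfit
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

# Train-vs-test gap: a fully grown tree memorizes the training set,
# so its gap reflects the high-variance behavior described above
tree_gap = tree.score(X_train, y_train) - tree.score(X_test, y_test)
forest_gap = forest.score(X_train, y_train) - forest.score(X_test, y_test)
```

Comparing `tree_gap` and `forest_gap` typically shows the forest generalizing better on the same split.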
Started from 0.
No audience. No support. Just consistency.
I’m building my coding YouTube channel from scratch 💻
If you believe in growth, support me here👇
youtube.com/@technicallauncher2192
One day this will be BIG 🚀
#buildinpublic #coding #startup
Day 310 : DataScience Journey
Worked through a full Titanic pipeline, from messy real-world data to a functioning Random Forest model. The interesting part wasn't just the model but the small practical issues along the way, starting with handling missing values.
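A compressed sketch of that kind of pipeline, using a tiny hand-made stand-in for the Titanic data (the column names `Age`, `Fare`, `Sex`, `Survived` are assumed from the usual Kaggle CSV; the values here are invented):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the real Titanic CSV, with the same kinds of gaps
df = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0, None, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86],
    "Sex": ["male", "female", "female", None, "male", "male"],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Practical issue 1: missing values -> median for numbers, mode for categories
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Sex"] = df["Sex"].fillna(df["Sex"].mode()[0])

# Practical issue 2: the model needs numbers, so encode the category
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

X, y = df[["Age", "Fare", "Sex"]], df["Survived"]
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
```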
Day 309 : DataScience Journey
After loading the data, the first pass focused on understanding its structure: identifying missing values, checking feature types, and observing how the features relate to the target. The main effort then went into preprocessing: removing irrelevant features and handling missing values using the median (numeric) and mode (categorical).
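That explore-then-preprocess order can be sketched with pandas alone; the columns below are hypothetical placeholders:

```python
import pandas as pd

df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],        # irrelevant identifier
    "Age": [22.0, None, 26.0, None],    # numeric, with gaps
    "Embarked": ["S", "C", None, "S"],  # categorical, with gaps
})

# Step 1: understand the structure - feature types and missing counts
dtypes = df.dtypes
missing = df.isna().sum()

# Step 2: preprocess - drop irrelevant features, impute median / mode
df = df.drop(columns=["PassengerId"])
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
```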
Day 308 : DataScience Journey
While training models using bagging (like Random Forest), not every data point is used in building each tree.
On average, about one-third of the data is left out of a given tree's bootstrap sample; these are called out-of-bag (OOB) samples. (The chance a point is never drawn in n draws with replacement is (1 - 1/n)^n, which approaches 1/e ≈ 0.37.)
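Scikit-learn exposes this directly: with `oob_score=True`, each tree is evaluated on the samples its bootstrap draw left out, giving a free validation estimate. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# oob_score=True scores each tree on its own out-of-bag samples
forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X, y)
oob = forest.oob_score_  # validation estimate without a held-out set
```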
Day 307 : DataScience Journey
So a single decision tree… it tends to overfit pretty easily and can give unstable results.
Bagging basically fixes this by training a bunch of trees on slightly different parts of the data and then combining their outputs. Since each tree sees a slightly different sample, their individual errors tend to cancel out when the predictions are combined.
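A quick way to see the stabilizing effect, assuming scikit-learn and synthetic data (illustrative, not from the original post):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20,
                           flip_y=0.1, random_state=1)

# One unstable tree vs. 100 trees on bootstrapped copies of the data
single = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=5).mean()
bagged = cross_val_score(
    BaggingClassifier(DecisionTreeClassifier(random_state=1),
                      n_estimators=100, random_state=1),
    X, y, cv=5).mean()
```

Comparing `single` and `bagged` on noisy data like this usually shows the averaged ensemble coming out ahead.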
Day 306 : DataScience Journey
Gradient Boosting works by building models step by step, where each new tree tries to fix the mistakes made by the previous ones.
Initially, a simple model is trained on the data. Then we calculate the residuals (errors) and train the next tree to predict those residuals, so each addition nudges the ensemble closer to the target.
Day 305 : DataScience Journey
Instead of depending on a single algorithm, we combine multiple models like Logistic Regression, Random Forest, and SVM. Each one has its own way of making mistakes, so when we combine them using a Voting Classifier, the final result becomes more reliable than any single model.
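A minimal scikit-learn sketch of exactly that combination (synthetic data; `voting="soft"` is one reasonable choice, averaging predicted probabilities rather than counting votes):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Three different algorithms, three different kinds of mistakes
voting = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),  # probability=True enables soft voting
], voting="soft")

voting.fit(X, y)
acc = voting.score(X, y)
```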
Day 304 : DataScience Journey
Bagging and Pasting in Scikit-Learn are powerful ensemble techniques that improve model performance by reducing variance and enhancing generalization. Instead of relying on a single model, they train multiple models (typically Decision Trees) on different random subsets of the training data: bagging samples with replacement, pasting samples without replacement.
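In scikit-learn the two differ by a single flag on `BaggingClassifier`. A minimal sketch (synthetic data, illustrative parameters):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# bootstrap=True -> bagging: each tree's subset is drawn WITH replacement
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            max_samples=0.8, bootstrap=True, random_state=0)

# bootstrap=False -> pasting: subsets are drawn WITHOUT replacement
pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            max_samples=0.8, bootstrap=False, random_state=0)

bagging.fit(X, y)
pasting.fit(X, y)
```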
Day 303 : DataScience Journey
Random Patches and Random Subspaces are techniques used to increase diversity in ensemble models. Instead of training each model on the full dataset, we randomly sample either data points, features, or both. When both samples and features are randomly sampled, the method is called Random Patches; sampling only the features is called Random Subspaces.
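Both are configured through `BaggingClassifier`'s feature-sampling options. A minimal sketch, with illustrative ratios:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=30, random_state=0)

# Random Subspaces: every tree sees all rows but only a random half of the features
subspaces = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                              bootstrap=False, max_samples=1.0,
                              bootstrap_features=True, max_features=0.5,
                              random_state=0).fit(X, y)

# Random Patches: every tree sees random rows AND random features
patches = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            bootstrap=True, max_samples=0.7,
                            bootstrap_features=True, max_features=0.5,
                            random_state=0).fit(X, y)
```

The fitted ensemble records which features each tree received in `estimators_features_`, which is a handy way to confirm the sampling actually happened.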