Bhaskar Aryal

249 posts

@aryalbhaskar

An Engineer, Interested in World History and Philosophy!

Kathmandu, Nepal · Joined October 2014
1.3K Following · 156 Followers
Bhaskar Aryal reposted
UMPH AI @__umph__ai·
UMPH Phase 1 is live. Scan or upload a selfie and get your skin score, 7 skin metrics, your skin age, and a personalized skincare and treatment plan in under 60 seconds.

We're starting with skin analysis. Next phases will turn UMPH into a full skin health journey:
• Mobile app with weekly skin tracking
• Your personal AI skin advisor
• Discover clinics and book treatments directly
• Build custom treatment stacks with up to 50% OFF

To celebrate the launch: the first 50 people who like, comment, and retweet get free scan coupons. See your score at umph.ai
13
18
26
663
Bhaskar Aryal reposted
MIT CSAIL @MIT_CSAIL·
9 distance measures in data science w/algorithms (v/@MaartenGr).
[image attachment]
10
284
1.8K
95.9K
Bhaskar Aryal reposted
Joachim Schork @JoachimSchork·
As your data grows, so do the challenges of interpreting p-values effectively! With large data sets, p-values can often become misleading, as even negligible effects may appear statistically significant. This happens because larger samples reduce the margin of error, narrowing confidence intervals and amplifying the apparent importance of small differences. The key is distinguishing statistical significance from practical importance.

Key Takeaways:
✅ Larger sample sizes reduce the margin of error, increasing the likelihood of small effects appearing significant.
✅ In regression models, all variables may show significant p-values, limiting their interpretive value.

What can you do?
1️⃣ Focus on effect sizes: Evaluate coefficients and confidence intervals for practical relevance.
2️⃣ Standardize variables: Scaling your data allows for better comparison of variable importance based on coefficients.
3️⃣ Use complementary metrics: Incorporate adjusted R² and other model performance measures for a more robust evaluation.

The visualization originally shared by Serhat Simsek clearly demonstrates how confidence intervals shrink and p-values drop with increasing sample sizes. Thanks to Serhat for sharing this valuable content and inspiring the idea behind my post. #datascience #statistics
[GIF attachment]
1
31
180
7K
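The large-sample p-value effect described above is easy to see in a small simulation. This is an illustrative sketch (not from the original thread): the true effect size, sample sizes, and random seed are arbitrary choices, and NumPy/SciPy are assumed available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A negligible true effect: group means differ by 0.02 standard deviations.
effect = 0.02
results = {}
for n in (100, 10_000, 1_000_000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    _, p = stats.ttest_ind(a, b)
    # Cohen's d (the effect size) stays tiny regardless of n;
    # only the p-value collapses as the sample grows.
    d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    results[n] = (p, d)

for n, (p, d) in results.items():
    print(f"n={n:>9}: p={p:.2e}, Cohen's d={d:+.4f}")
```

The printout shows the p-value shrinking by orders of magnitude while the effect size stays near 0.02, which is exactly why effect sizes and confidence intervals matter more than raw significance at scale.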
Bhaskar Aryal reposted
Eyo Eyo, PhD @Eyowhite3·
Key aspects of Exploratory Data Analysis (EDA) beginners should follow.
[image attachment]
5
168
1.2K
93.4K
Deepak Raj Joshi @Deepakrajjoshi7·
Excited to share that I've started my new role as Assistant Professor of Precision Ag and Extension Specialist at Kansas State University! Looking forward to all the opportunities and challenges ahead. Let's make great things happen! #NewBeginnings #KansasStateUniversity
[image attachment]
40
11
378
16.9K
Andrew Bolis @AndrewBolis·
My AI side hustle makes me $4,500 every week. I have created a guide to help you start one and make $300 everyday. Usually, I'd charge $100 for this, but today I'm giving it away for FREE Like + comment "Send" & I'll DM it to you (Must be following me)
[image attachment]
10.9K
1.1K
10.4K
1.5M
Bhaskar Aryal reposted
Matt Dancho (Business Science)
Bayesian data analysis is a fundamental concept in data science. But it took me 2 years to understand its importance. In 2 minutes, I'll share my best findings over the last 2 years exploring Bayesian Modeling. Let's go.

1. Why Bayesian Data Analysis? Bayesian modeling is a powerful tool in statistics and data science, especially where traditional approaches fall short. It avoids arbitrary assumptions and provides distributions of possible values instead of just point estimates.

2. Bayes' Theorem: Bayesian modeling is based on Bayes' theorem, which provides a mathematical formula to update the probability of a hypothesis as more evidence or information becomes available. It describes how to revise existing predictions or theories in light of new evidence, a process known as Bayesian inference.

3. Simplification of Bayes' Theorem: Since X (data) is not dependent on θ (the model) and can be hard to calculate, Bayes' theorem is often simplified to P(θ|X) ∝ P(X|θ) × P(θ), meaning the posterior distribution is proportional to the likelihood times the prior.

4. From Bayes' Theorem to Bayesian Modeling: Bayes' Theorem provides a process for constructing a Bayesian model: combining the key ingredients, likelihood and prior distributions, to produce posterior distributions.

5. Calculating the Posterior Distribution: There are two main methods: direct calculation using complex equations, and simulation methods which create samples from the posterior distribution for summarizing information about parameters. Many software programs like PyMC, brms, and Stan use sampling methods such as Markov Chain Monte Carlo (MCMC).

6. Advantages of Bayesian: The Bayesian approach allows for direct inclusion of prior knowledge, transparency in modeling steps, and provides broad information about the problem, including risks, uncertainty, and variability.

7. Business Cases: Any time knowledge of uncertainty is a business requirement, Bayesian modeling can benefit the business. This includes Demand Forecasting, Pricing Strategy Optimization, Customer Analysis, Credit Scoring and Financial Modeling. Businesses need to know not only a point estimate but the risk or confidence of the prediction.

===

Need help applying data science to business? I'd like to help. Here's how:
👉 My Free 5-Day Course (How to Solve Business Problems with Data Science): learn.business-science.io/free-solve-bus…
👉 Learn ChatGPT for Data Science (Live Workshop): learn.business-science.io/registration-c…

If you like this post, please reshare ♻️ it so others can get value (follow me, 🔥 Matt Dancho 🔥 for more data science concepts).
[image attachment]
7
474
1.9K
118.3K
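The proportionality in point 3 above (posterior ∝ likelihood × prior) can be made concrete without MCMC by using a grid approximation. This is my own minimal sketch, not the author's code; the coin-flip data and Beta(2, 2) prior are invented for illustration, and only NumPy is assumed.

```python
import numpy as np

# Grid approximation of P(theta | X) ∝ P(X | theta) × P(theta)
# for a coin-flip model: 7 heads in 10 tosses, Beta(2, 2) prior.
theta = np.linspace(0.001, 0.999, 999)

# Prior: Beta(2, 2), mildly favouring fair coins (unnormalised is fine).
prior = theta ** (2 - 1) * (1 - theta) ** (2 - 1)

# Likelihood: 7 successes out of 10 (binomial constant dropped).
likelihood = theta**7 * (1 - theta) ** 3

# Posterior ∝ likelihood × prior, normalised over the grid.
posterior = likelihood * prior
posterior /= posterior.sum()

# A point estimate PLUS a full distribution of plausible values.
mean = (theta * posterior).sum()
print(f"posterior mean of theta ≈ {mean:.3f}")
```

The grid result matches the exact conjugate answer (the posterior is Beta(9, 5), with mean 9/14 ≈ 0.643); samplers such as MCMC in PyMC or Stan do the same job when no closed form or cheap grid exists.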
Bhaskar Aryal reposted
Matt Dancho (Business Science)
When I was first exposed to the Confusion Matrix, I was lost. And there was a HUGE mistake I was making with False Negatives. It took me 5 years to fix it. I'll teach you in 5 minutes. Let's dive in.

1. A confusion matrix is a tool often used in machine learning to visualize the performance of a classification model. It's a table that allows you to compare the model's predictions against the actual values.

2. Correct Predictions: True Positives (TP): These are cases in which the model correctly predicts the positive class. True Negatives (TN): These are cases in which the model correctly predicts the negative class.

3. Model Errors: False Positives (FP, Type I Error): These are cases in which the model incorrectly predicts the positive class. False Negatives (FN, Type II Error): These are cases in which the model incorrectly predicts the negative class.

4. My Big Mistake: In machine learning we're taught to optimize for model performance. I listened. I said OK, let's optimize for F1 Score. That's the gold standard, right?

5. The Problem with F1 Score: The problem with F1 is that it weights False Positive (Type I) and False Negative (Type II) errors equally. But in business this is RARELY the case. False Negatives are normally 10X to 100X more costly to a business like Netflix. Let me explain.

6. Why minimizing False Positives is worth LESS: If a predictive model incorrectly predicts a customer is going to leave (a False Positive), Netflix might send them preventative actions like a discounted deal. The customer takes it and saves 10%, but they would have stayed anyway. Over a year that costs Netflix $12.

7. Why minimizing False Negatives is worth MORE: Now on the flip side, if Netflix's model incorrectly classifies someone who is on the edge of leaving as predicted to stay, Netflix does nothing. That customer leaves. Over a year that costs Netflix $120. And over the lifetime that could be $500+.

8. What they don't teach you: Expected Value (EV). I'll have a post on that soon. It's how you optimize Machine Learning models for $$$ instead of F1.

===

Ready to learn Data Science for Business? I put together a free on-demand workshop that covers the 10 skills that helped me make the transition to Data Scientist: learn.business-science.io/free-rtrack-ma…

And if you'd like to speed it up, I have a live workshop where I'll share how to use ChatGPT for Data Science: learn.business-science.io/registration-c…

If you like this post, please reshare ♻️ it so others can get value.
[image attachment]
17
347
1.3K
79.4K
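The cost asymmetry described above can be computed alongside F1. This is an illustrative sketch of my own: the labels are made up, and the $12 and $120 unit costs are the tweet's illustrative Netflix numbers, not real figures. Only NumPy is assumed.

```python
import numpy as np

# Hypothetical churn labels: 1 = customer churns, 0 = stays.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0])

# The four cells of the confusion matrix.
tp = int(((y_pred == 1) & (y_true == 1)).sum())
tn = int(((y_pred == 0) & (y_true == 0)).sum())
fp = int(((y_pred == 1) & (y_true == 0)).sum())
fn = int(((y_pred == 0) & (y_true == 1)).sum())

f1 = 2 * tp / (2 * tp + fp + fn)   # weights FP and FN errors equally
expected_cost = 12 * fp + 120 * fn  # the business cost does not

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"F1={f1:.2f}, expected cost=${expected_cost}")
```

F1 treats every FP and FN identically; the expected-cost line makes the 10X asymmetry explicit, so a model chosen to minimize cost can differ from the one with the best F1.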
Bhaskar Aryal reposted
Valeriy M., PhD, MBA, CQF @predict_addict·
It is almost 2024; one should not confuse confidence intervals with prediction intervals. So what is the difference? This slide describes it perfectly. #conformalprediction
[image attachment]
5
37
252
24.3K
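The distinction in the tweet above can also be computed directly from the standard simple-linear-regression formulas. This is my own sketch (the synthetic data and the evaluation point x0 = 5.0 are arbitrary choices), assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic simple linear regression data.
n = 50
x = np.linspace(0, 10, n)
y = 2.0 + 1.5 * x + rng.normal(0, 2.0, n)

# Ordinary least squares fit.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s = np.sqrt((resid**2).sum() / (n - 2))  # residual standard error
sxx = ((x - x.mean()) ** 2).sum()
t_crit = stats.t.ppf(0.975, df=n - 2)

x0 = 5.0
yhat = b0 + b1 * x0
# Confidence interval: uncertainty in the MEAN response at x0.
se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)
# Prediction interval: adds the noise of a single NEW observation.
se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)

ci = (yhat - t_crit * se_mean, yhat + t_crit * se_mean)
pi = (yhat - t_crit * se_pred, yhat + t_crit * se_pred)
print(f"95% CI for the mean at x0: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for a new obs at x0: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is always wider: it adds the irreducible noise of one new observation (the extra "1 +" under the square root) on top of the uncertainty in the fitted mean.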
Bhaskar Aryal reposted
Matt Dancho (Business Science)
When I was first learning data science, one of the things that tripped me up the most was Cross Validation. In 5 minutes, I'll share 5 years of experimentation with dozens of Cross Validation techniques. Let's dive in.

1. Goal: Cross-validation is a statistical method used to estimate the accuracy of machine learning models. It's also used to measure the stability of models when combined with hyperparameter tuning.

2. Principle: The main principle behind cross-validation is partitioning a sample of data into complementary subsets, performing the analysis on one subset, and validating the analysis on the other subset (called the assessment set).

3. Types of Cross Validation: There are many ways to perform cross validation. Some of the most common are K-Fold, Stratified K-Fold, Leave One Out, Group K-Fold, and Time Series Cross Validation. We'll tackle these one at a time.

4. K-Fold Cross-Validation: The data set is divided into 'k' subsets (folds). The holdout method is repeated 'k' times, with each of the 'k' subsets serving as the test data one by one and the remaining 'k-1' subsets as the training data. The average of the 'k' testing experiments is used as the overall result.

5. Stratified K-Fold Cross-Validation: Similar to K-Fold, but the sampling method ensures that each fold of the dataset has the same proportion of observations with a given label. This is particularly useful for imbalanced datasets.

6. Leave-One-Out Cross-Validation (LOOCV): A special case of k-fold cross-validation where 'k' is equal to the number of data points in the dataset. I never use this method because it's very time-consuming. I prefer K-fold and stratified k-fold.

7. Time Series Cross-Validation (TSCV): In time series data, the sequence of observations is important. A common approach is to use a "rolling" or "expanding" window for training and testing. Important point: if your model does not require the sequence to be kept intact, it's sometimes better to use K-Fold. I've seen this with XGBoost, where date features are used rather than lags; K-fold outperforms TSCV. But for ARIMA, TSCV is needed because the algorithm depends on the sequence of the time series being maintained.

8. Group K-Fold Cross-Validation: The data is split into groups, and these groups are used to ensure that a group is entirely in the training or test set. This is useful for problems where the data is naturally divided into groups (e.g., customers from different store locations).

Happy Holidays, -Matt

===

Ready to learn Data Science for Business? I put together a free on-demand workshop that covers the 10 skills that helped me make the transition to Data Scientist: learn.business-science.io/free-rtrack-ma…

And if you'd like to speed it up, I have a live workshop where I'll share how to use ChatGPT for Data Science: learn.business-science.io/registration-c…

If you like this post, please reshare ♻️ it so others can get value.
[image attachment]
5
211
776
54.7K
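Several of the schemes described above are one import away in scikit-learn. A minimal sketch of my own on synthetic data (the dataset, model, and split counts are arbitrary illustration choices; scikit-learn is assumed available):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (KFold, StratifiedKFold,
                                     TimeSeriesSplit, cross_val_score)

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)

# K-Fold: k held-out test sets, score averaged across folds.
kf = cross_val_score(model, X, y,
                     cv=KFold(n_splits=5, shuffle=True, random_state=0))
# Stratified K-Fold: preserves class proportions per fold (imbalanced data).
skf = cross_val_score(model, X, y,
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
# Time-series split: training windows always precede their test windows.
tss = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5))

print(f"KFold:           {kf.mean():.3f}")
print(f"StratifiedKFold: {skf.mean():.3f}")
print(f"TimeSeriesSplit: {tss.mean():.3f}")
```

Swapping the `cv` argument is all it takes to move between schemes, which makes it cheap to check whether your problem actually needs the sequence kept intact.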
Bhaskar Aryal reposted
Selçuk Korkmaz @selcukorkmaz·
A Simple Guide for Generalized Additive Models (GAMs)

1/ 🧵 Let's dive into the world of #GeneralizedAdditiveModels or #GAMs! These are flexible regression models that can capture non-linear relationships. Perfect for when life (and data) isn't just a straight line! 📈➰

2/ At its core, GAM is a generalization of the linear model. Instead of fitting a straight line (or plane), GAMs fit smooth curves to the data. Think of it as letting the data guide the shape of the relationship, rather than forcing it into a straitjacket! 🕺

3/ Why use GAMs? 🤔
• Your scatter plot suggests a wavy pattern
• Residual plots from linear models show patterns (they shouldn't!)
• You have complex temporal or spatial data
• You want flexibility without manually creating polynomial terms

4/ How do GAMs achieve this? 🧐 Through smoothing functions. These are mathematical constructs that allow for bends and twists in the relationship between predictors and the outcome. Splines are a common choice for these functions!

5/ One beauty of #GAMs is that they can handle multiple types of data distributions. Whether you're predicting a continuous variable, binary outcomes, or counts, there's a GAM for that! It's like GLMs but with added flexibility. 🎯

6/ Now, while GAMs sound dreamy (and often they are!), there's a balance. More flexibility can sometimes lead to overfitting: when your model is TOO tailored to your training data and performs poorly on new data. It's like memorizing answers for a test but failing the real exam. 📚✖️

7/ Fortunately, GAMs have built-in penalties to control for overfitting. It's like having an internal check, making sure the model isn't getting too carried away with wiggles and bends! 🙌

8/ Another plus? Interpreting GAMs can be pretty intuitive. You get visual plots showing the effect of each predictor. Instead of squinting at coefficients, you can see the shape of the relationship directly! 📊

9/ In practice, tools like R's mgcv package make it pretty straightforward to fit GAMs. But, as with all models, understanding the underlying mechanics and assumptions can really up your GAM game! 🛠️

10/ In summary:
• Use GAMs for non-linear relationships
• They're flexible but have checks to prevent overfitting
• Visual interpretations are a boon!

11/ So next time your data doesn't play nice with straight lines, consider giving #GeneralizedAdditiveModels a spin. They're a powerful tool in a statistician's toolbox, ready to tackle those wavy, bendy data challenges! 🌀🔍

12/ Liked this thread? Found it useful? Feel free to like, share, and comment with your experiences or questions about GAMs! Let's keep the #StatsChat going! 🗣️👥🎉 #Statistics #DataScience
[image attachment]
16
244
1.3K
296.6K
Bhaskar Aryal reposted
DA3_Symposium @DA3_Symposium·
⚠️ CALL FOR POSTERS ⚠️ 👉🏼Be part of the Poster Session at the #DA3Symposium 👉🏼Submit your abstract at da3symposium.com/posters Abstract Submission Deadline: Now extended to October 1, 2023
[image attachment]
0
8
5
2.1K
Bhaskar Aryal reposted
DA3_Symposium @DA3_Symposium·
#PosterSession We invite you to submit your abstract! 👉🏼Share your research in our poster session at the Corteva Symposia Series "Digital Agriculture & Advanced Analytics Symposium (DA3)" at the Alumni Center at @kstate 👉🏼To know more visit da3symposium.com
[image attachments]
0
7
4
3.1K
Durapada Sapkota @durapada·
It was a great honour for me to receive the “Prabal Jana Sewa Shree” award from the Rt. Hon President of Nepal. Very happy, enthusiastic and grateful🙏🏻. I also take this opportunity to send best wishes for your happiness and prosperity in the new year, 2080. Happy New Year all !
[image attachments]
37
1
135
13.7K