The Data Digest

385 posts

@DigestData

Joined June 2018

433 Following, 481 Followers

The Data Digest @DigestData · 16 Aug

@agraybee I did make an interactive chart for top IMDB shows that compares each show's average rating to the rating of its ending (last 2 episodes): public.flourish.studio/visualisation/… Dexter and HIMYM also ended pretty badly.

Everything Price Sufferer (but especially eggs) @agraybee · 15 Aug

It's Game of Thrones and it's not really an argument. This show was a cultural juggernaut, every week the whole internet watched together, it was endlessly quoted, children were named after the characters, and no one brings it up anymore because the last season was so awful.

cinesthetic.@TheCinesthetic

What TV Show had the worst ending?

The Data Digest @DigestData · 28 Jul

@vidIQ 50

vidIQ @vidIQ · 27 Jul

How many videos have you uploaded to YouTube?

The Data Digest @DigestData · 24 Jul

@vidIQ youtube.com/watch?v=7Ctg9U…

vidIQ @vidIQ · 24 Jul

🚨🚨THUMBNAIL AUDITS🚨🚨 1. Drop a link to your YouTube video. 2. I'll score the thumbnail 1-10. I'll be responding all day today.

The Data Digest @DigestData · 15 Jul

@JanBroderEngler Sweet. Thank you :)

Jan Broder Engler @JanBroderEngler · 15 Jul

@DigestData You can do `remove_legend()` and `flip_plot()`

Jan Broder Engler @JanBroderEngler · 15 Jul

This is how you can sort axis levels in #tidyplots 0.3.1 🤩 #rstats #dataviz #phd

The Data Digest @DigestData · 13 Jul

@selcukorkmaz Does that mean that (3) residuals are the remaining "errors", i.e. the variance that cannot be explained by the fixed effects or the random effects (hospitals) but is unique to each individual in the study?

Selçuk Korkmaz @selcukorkmaz · 12 Jul

Understanding Mixed Effects Models: A Simple Example

Imagine researchers are testing a new drug to lower blood pressure. They give the drug to patients in 10 different hospitals and measure how much each patient’s blood pressure drops after taking the drug.

Now, two things can affect the results:

1. Fixed effects: These are the main things the researchers are interested in, like:
• Whether the drug works,
• The patient’s age or gender,
• The dose of the drug.

2. Random effects: These are differences that come from the hospitals themselves:
• Some hospitals may have better equipment.
• Some doctors might be more experienced.
• Some hospitals may treat more severe cases.

The patients from the same hospital might have more similar results just because they’re treated in the same environment.

A mixed effects model allows researchers to:
• Study the fixed effect of the drug: Does it lower blood pressure overall?
• Account for the random effect of hospital differences: Some hospitals might naturally have better or worse outcomes, and we don’t want that to bias the results.

By using a mixed effects model, the researchers avoid falsely thinking the drug doesn’t work just because one hospital had unusually bad outcomes or overestimating its power because one hospital had great results.

In short: Mixed effects models help doctors and scientists separate what’s due to the treatment (the drug) from what’s due to the setting (the hospital), making their conclusions more trustworthy.
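
The hospital example above can be sketched in R as a random-intercept model. This is only an illustration on simulated data: the numbers (a true drug effect of 5, a hospital SD of 4, a patient-level SD of 3) and all variable names are assumptions, and `nlme::lme()` is one of several ways to fit such a model. In this sketch the residual is the patient-level noise left over after the fixed drug effect and the per-hospital intercept are accounted for.

```r
# Simulated data only: 10 hospitals, 40 patients each (all numbers are made up).
library(nlme)  # recommended package that ships with standard R installations

set.seed(42)
n_hosp <- 10
n_pat  <- 40

hospital <- factor(rep(seq_len(n_hosp), each = n_pat))
drug     <- rep(c(0, 1), times = n_hosp * n_pat / 2)  # 0 = placebo, 1 = drug

hosp_effect <- rnorm(n_hosp, mean = 0, sd = 4)           # hospital-level shift (random effect)
residual    <- rnorm(n_hosp * n_pat, mean = 0, sd = 3)   # patient-level noise (residual)

bp_drop <- 2 + 5 * drug + hosp_effect[as.integer(hospital)] + residual
dat <- data.frame(bp_drop, drug, hospital)

# Fixed effect: drug. Random effect: one intercept per hospital.
fit <- lme(bp_drop ~ drug, random = ~ 1 | hospital, data = dat)

fixef(fit)    # estimated overall drug effect (the simulated truth is 5)
VarCorr(fit)  # variance split: between hospitals vs. residual
fit$sigma     # residual SD (the simulated truth is 3)
```

Dropping the `random` term and fitting a plain `lm()` would lump the hospital differences into the residual, which is the bias the tweet describes.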

The Data Digest @DigestData · 26 Jun

@JoachimSchork Looks very clean and informative indeed! Also great to see that the 95% CI of the am:cyl estimate overlaps 0, hence the p-value > 0.05. However, the most important factor for mpg is wt, which alone explains 74.46% of the variance (adjusted R²). am, cyl, etc. correlate with wt in the mtcars dataset.
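
The figure quoted above can be checked with a few lines of base R. This is a minimal sketch that assumes the reply refers to a simple regression of `mpg` on `wt`; under that assumption the 74.46% matches the adjusted R², while the plain R² is slightly higher.

```r
# mtcars ships with base R, so no extra packages are needed.
fit <- lm(mpg ~ wt, data = mtcars)
s   <- summary(fit)

round(s$r.squared, 4)      # 0.7528
round(s$adj.r.squared, 4)  # 0.7446

# Correlation of wt with am and cyl, the predictors mentioned in the reply
wt_cor <- cor(mtcars$wt, mtcars[, c("am", "cyl")])
wt_cor
```

Because `wt` is strongly correlated with both `am` and `cyl`, their coefficients in a joint model are hard to interpret separately.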

Joachim Schork @JoachimSchork · 26 Jun

Easily interpret regression models with clear visualizations! The ggcoefstats() function from the ggstatsplot package generates dot-and-whisker plots, providing essential statistical details for models saved in a tidy data frame.

✔️ Visualize Model Estimates: Each plot displays dots representing regression coefficients, with whiskers showing their confidence intervals (default is 95%). This allows you to assess the strength and direction of effects.

✔️ Detailed Statistical Labels: Labels attached to each dot provide additional information, including estimates, t-statistics, and p-values, offering a comprehensive view of the regression analysis.

✔️ Diagnostic Information: Captions include model diagnostics such as AIC and BIC values, which are useful for comparing model performance. Lower AIC and BIC values generally indicate a better model fit.

✔️ Customizable ggplot2 Output: The plots are fully compatible with ggplot2, letting you tweak and modify themes, colors, and other elements using the same familiar syntax.

The visualization shown here is from the package website, demonstrating how ggcoefstats() effectively conveys statistical information through dot-and-whisker plots: github.com/IndrajeetPatil…

Want to enhance your skills in creating insightful visualizations with ggplot2 and its extensions? Join my online course, “Data Visualization in R Using ggplot2 & Friends!” Further details: statisticsglobe.com/online-course-…

#DataAnalytics #Rpackage #RStats #DataScientist #Statistics #ggplot2

The Data Digest @DigestData · 20 Jun

No worries, I am glad I saw the post. It helped me brush up on the type I/type II error distinction, and I did not even know what an F1 score is or how precision is defined. The fastml package looks really powerful. I will check it out when I dive deeper into tidymodels, or is that the wrong order? What would you recommend?
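
For reference, the metrics mentioned above follow directly from the four cells of a confusion matrix. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the Framingham example.

```r
# Hypothetical confusion-matrix counts (not from any real model)
tp <- 40   # true positives:  disease present, predicted present
fp <- 10   # false positives: disease absent,  predicted present (type I error)
fn <- 20   # false negatives: disease present, predicted absent  (type II error)
tn <- 130  # true negatives:  disease absent,  predicted absent

precision <- tp / (tp + fp)   # share of positive predictions that are correct: 0.8
recall    <- tp / (tp + fn)   # sensitivity, share of real cases that are found: 0.667
f1        <- 2 * precision * recall / (precision + recall)  # harmonic mean: 0.727

type1_rate <- fp / (fp + tn)  # false positive rate: 0.071
type2_rate <- fn / (fn + tp)  # false negative rate, equal to 1 - recall: 0.333

round(c(precision = precision, recall = recall, f1 = f1,
        type1_rate = type1_rate, type2_rate = type2_rate), 3)
```

In a screening setting, a missed case (type II error) is usually the costlier mistake, which is why recall, also called sensitivity, gets priority there.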

Selçuk Korkmaz @selcukorkmaz · 20 Jun

Good point. Since this is a toy example, the results aren’t final. The ranger-RF model may improve with proper tuning like parameter optimization. But still, in real heart disease screening, sensitivity should be the priority. Even in early tests. It is important to keep that in mind.

Selçuk Korkmaz @selcukorkmaz · 19 Jun

Just used fastml to compare logistic regression (glm/glmnet) and random forest (ranger/randomForest) on the Framingham dataset. Repeated CV + Bayesian tuning (20 iter) with early stopping, MICE imputation, and upsampling made model selection easy. 🚀

    library(fastml)
    data(framingham)

    custom_tune <- list(
      rand_forest  = list(ranger = list(mtry = c(2, 5), min_n = c(3, 10))),
      logistic_reg = list(glm = list(penalty = c(0.0, 1.0)))
    )

    model <- fastml(
      data = framingham,
      label = "TenYearCHD",
      algorithms = c("rand_forest", "logistic_reg"),
      algorithm_engines = list(
        rand_forest  = c("ranger", "randomForest"),
        logistic_reg = c("glm", "glmnet")
      ),
      resampling_method = "repeatedcv",
      folds = 5,
      repeats = 3,
      use_default_tuning = TRUE,
      tuning_strategy = "bayes",
      tuning_iterations = 20,
      tune_params = custom_tune,
      balance_method = "upsample",
      early_stopping = TRUE,
      impute_method = "mice",
      event_class = "second",
      seed = 42
    )

    summary(model)

The Data Digest @DigestData · 15 Jun

She was also the best-performing woman in the Titled Tuesday rapid tournaments on Chess.com last year, with an average score of 6.82 in 37 participations, a 9-game winning streak, and a best place of 37. Win White 59.8%, Win Black 50.5%. Average rating of the opponents she defeated: 2579. More 2024 analysis: youtu.be/DXj3AHEMIRI

Chess.com - India @chesscom_in · 12 Jun

Chess is 💛

The Data Digest @DigestData · 15 Jun

@chesscom_in @ArjunErigaisi That means he is already outperforming his 2024 results. In 2024 he could not win a tournament in 39 tries, and his longest winning streak was 9 games. 2024 stats: max score 9.5 (median 8), Win ⚪ 68%, Win ⚫ 66.5%. More 2024 analysis here: youtu.be/DXj3AHEMIRI

Chess.com - India @chesscom_in · 21 May

🇮🇳 GM Arjun Erigaisi missed the first round and then went on to win the Late Titled Tuesday tournament with a perfect 10/10! He defeated Karthikeyan Murali, Aravindh Chithambaram, Pranesh, Praggnanandhaa and Magnus Carlsen in the process! Congratulations @ArjunErigaisi 👏

The Data Digest @DigestData · 15 Jun

@frankiethull view()

Ambassador Frank Hull ☤ @frankiethull · 13 Jun

What is your favorite function in R❓ Wrong answers only.

The Data Digest @DigestData · 5 May

@R_Graph_Gallery I am one of the 34% because I already write code like the AFTER-image 😇. Not 100% but with tab and Enter R is doing most of the formatting already. But thanks for pointing out Air and formatter package. Will give it a try soon.

Yan Holtz @R_Graph_Gallery · 5 May

🚨 66% of R and Python users do NOT use a formatter 😳 A formatter takes messy code and automatically improves its layout: ✅ Better indentation ✅ Proper spacing ✅ Reasonable line length Poll and explanation in my latest post: 👉 blog.yan-holtz.com

The Data Digest @DigestData · 28 Apr

@micosapiens711 @AmazfitGlobal @ZeppGlobal Awesome :) I have a Fitbit and it also tracks sleep: REM, deep sleep, overall duration, and wake time, and then builds an overall index. When I hit the gym too hard or too late in the day, I have trouble sleeping, but long walks outside or runs in the morning help a lot sleep-wise.

Vic 🥏 @micosapiens711 · 28 Apr

@DigestData @AmazfitGlobal @ZeppGlobal I love your idea to check how steps affect the next night’s sleep—I’m all in for it and will share the code later!

Vic 🥏 @micosapiens711 · 28 Apr

Final whistle near and I’m just at Day 16 of #30DayChartChallenge: My graph shows a #Negative relationship—while somewhat low, days with more shallow sleep often have fewer steps. I’m thinking, "It’s the pillow, stupid!" #Amazfit crew! #rstats #lubridate #ggplot2 #tidyverse 💤🚶🏾‍♂️

The Data Digest @DigestData · 24 Apr

Because the challenge category is time series, I’m showing life expectancy vs. GDP for 140 countries from 1952 to 2007. As countries develop economically, people also live longer lives.

The Data Digest @DigestData · 24 Apr

Day 23 (Log_Scale | Time series) #30daychartchallenge

There are many ways to show continuous & skewed data on a log scale in #rstats 📉

1️⃣ Transform directly in aes(x = log10(gdpPercap))
2️⃣ Use scale_x_log10()
3️⃣ Use scale_x_continuous(transform = "log10")
4️⃣ Or go with coord_trans(x = "log10")

I like scale_x_log10(), because you can specify the breaks and labels with: breaks = c(250, 500, 1000, 2500, 5000, ...), labels = scales::dollar_format()

Also, annotation_logticks(sides = "b") is a great way to show where key log steps lie.

#dataviz #ggplot2
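
A minimal sketch of option 2 from the post above, combined with the breaks, labels, and log ticks it mentions. The data frame is made up for illustration (it only stands in for GDP per capita and life expectancy), and the breaks above 5000 are an assumption, since the post elides the rest of the vector.

```r
library(ggplot2)

# Made-up values standing in for GDP per capita and life expectancy
df <- data.frame(
  gdpPercap = c(300, 800, 2000, 6000, 15000, 40000),
  lifeExp   = c(42, 50, 58, 66, 74, 80)
)

p <- ggplot(df, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10(
    # breaks beyond 5000 are assumed; the post only lists the first five
    breaks = c(250, 500, 1000, 2500, 5000, 10000, 25000, 50000),
    labels = scales::dollar_format()
  ) +
  annotation_logticks(sides = "b")  # tick marks along the bottom axis only

p
```

With `scale_x_log10()` the data are transformed before any statistics are computed, whereas `coord_trans()` transforms the coordinate system afterwards, so the two can differ once a smoother or other stat is added.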

The Data Digest @DigestData · 24 Apr

@nastengraph I like these double log charts to visualize power-law relationships. I think body mass vs. heart rate also leads to a straight line. As does log(heart rate) vs. life expectancy for mammals.
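
Why a power law turns into a straight line on a double log chart: if y = a · x^b, then log10(y) = log10(a) + b · log10(x), so the slope of the line is the exponent. The sketch below uses synthetic values; the constant 240 and the exponent -0.25 are assumptions chosen for illustration, not measurements.

```r
# Synthetic, noise-free data following an assumed power law
mass       <- c(0.02, 0.3, 4, 70, 600, 4000)  # body mass in kg, illustrative values
heart_rate <- 240 * mass^(-0.25)              # assumed relationship, not real data

# A linear fit on the log-log scale recovers the exponent as the slope
fit <- lm(log10(heart_rate) ~ log10(mass))
coef(fit)  # intercept = log10(240), slope = -0.25
```

With real measurements the points scatter around the line, and the fitted slope becomes an estimate of the exponent.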

Anastasiya @nastengraph · 23 Apr

#30DayChartChallenge – Log Scale I wanted to create something similar to the famous scatterplot from Edward Tufte’s book Beautiful Evidence. For me, it’s the best showcase of the log scale. Creating the images was the hardest part.

The Data Digest @DigestData · 24 Apr

🚨 50% OFF DataCamp Annual Subscription! 🚨 Learn R, Python, Power BI, Tableau, SQL, ML & more with hands-on courses. I’ve used it for years — it works. Level up your skills or learn something new: 👇 🔗 datacamp.pxf.io/BnRa3W #DataCamp #MachineLearning #LearnToCode
