Eugene Dubossarsky

42 posts

@cargomoose

the data won't science itself

Sydney, New South Wales · Joined February 2009
413 Following · 473 Followers
Pinned Tweet
Eugene Dubossarsky @cargomoose ·
1/ So here's my very handy and very simple method for creating a random subsample of a multivariate dataset that is very representative of the original. With added #machinelearning goodness ! Ready ? Here goes.
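Replies later in this thread mention many repeated splits and an AUC closest to 0.5, which reads like adversarial-validation-style subsampling: a subsample is representative when a classifier cannot tell it apart from the rest of the data. A minimal sketch of that idea, assuming scikit-learn; the function names, trial count, and synthetic data below are mine, not the author's:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def subsample_auc(X, idx):
    """Train a classifier to separate the subsample from the remainder.
    AUC near 0.5 means the two are indistinguishable, i.e. representative."""
    y = np.zeros(len(X))
    y[idx] = 1
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

def representative_subsample(X, size, n_trials=20, seed=0):
    """Draw several random subsamples; keep the one whose AUC is closest to 0.5."""
    rng = np.random.default_rng(seed)
    best_idx, best_gap = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(len(X), size=size, replace=False)
        gap = abs(subsample_auc(X, idx) - 0.5)
        if gap < best_gap:
            best_idx, best_gap = idx, gap
    return best_idx

# Toy multivariate data, purely illustrative.
X = np.random.default_rng(1).normal(size=(500, 5))
idx = representative_subsample(X, size=100)
```

Any classifier with a probabilistic output would do here; the random forest ties in with the rest of the thread.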
Eugene Dubossarsky retweeted
Alexander Stoyanov @Al_Stoyanov ·
St. Peter asks a newcomer:
- Born? - Austro-Hungary.
- Went to school? - Czechoslovakia.
- Married in? - Hungary.
- Kids born in? - The Third Reich.
- And your grandchildren? - USSR.
- Where did you die? - Ukraine.
- Man, you traveled a lot!
- Nonsense, I never left Mukachevo.
Eugene Dubossarsky retweeted
Jay Van Bavel, PhD @jayvanbavel ·
Questions at academic conferences:
[image attached]
Eugene Dubossarsky retweeted
alz @alz_zyd_ ·
I'll be honest I did not expect that the machines would start thinking and the humans would more or less just ignore the rather obvious fact that the machines are thinking
Eugene Dubossarsky @cargomoose ·
Data Science Sydney's November event is a deep dive into science, experimentation and A/B testing. Food and drink provided as always, with plenty of time to network with practitioners and leaders. meetup.com/data-science-s…
Eugene Dubossarsky @cargomoose ·
Data Science Sydney is proud to host #kaggle grandmaster Xavier Conort ! Come for an exciting evening of feature engineering with Generative AI. Join me on the 27th of Sept meetu.ps/e/Mtljr/Wfl5v/i
Eugene Dubossarsky @cargomoose ·
@jim_savage_ That’s exactly what the unsupervised form of the random forest does. It distinguishes your data from multivariate uniform samples.
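A sketch of that contrast, assuming scikit-learn: build a synthetic dataset of independent uniforms over each column's observed range (per the tweet's description), then see how easily a forest separates it from the real, correlated data. The toy dataset and all names below are mine, for illustration only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# "Real" data with structure: the second feature tracks the first.
x1 = rng.uniform(size=n)
X_real = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n)])

# Synthetic contrast: independent uniforms over each column's range,
# so marginals roughly match but the joint structure is destroyed.
X_synth = np.column_stack([
    rng.uniform(col.min(), col.max(), size=n) for col in X_real.T
])

X = np.vstack([X_real, X_synth])
y = np.r_[np.ones(n), np.zeros(n)]  # 1 = real, 0 = uniform noise

clf = RandomForestClassifier(n_estimators=200, oob_score=True,
                             random_state=0).fit(X, y)
# OOB accuracy well above 0.5 => the forest found joint structure
# that the independent uniform samples lack.
print(clf.oob_score_)
```

The splits the forest learns in separating real from synthetic are what make its proximities meaningful for unsupervised use.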
Eugene Dubossarsky @cargomoose ·
The Random Forest Proximity Matrix and Neural Embedding both do essentially the same thing.
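One way to see the analogy: the proximity matrix scores two samples by how often they land in the same leaf, which behaves like a similarity in a learned embedding space. A small sketch assuming scikit-learn, which exposes per-tree leaf indices via `forest.apply` but has no built-in proximity function; the computation below is a standard reconstruction, not the author's code:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# leaves[i, t] = index of the leaf sample i falls into in tree t.
leaves = forest.apply(X)  # shape (n_samples, n_trees)

# Proximity of samples i and j = fraction of trees in which they
# share a leaf. Broadcasting builds the full (n, n) matrix at once.
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
```

`1 - prox` can then feed any distance-based method (MDS, clustering), much as cosine distances between neural embeddings would.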
Eugene Dubossarsky @cargomoose ·
@WalterReade Very similar - but with a slightly different endpoint. In my case: many repeated splits/models, with the goal of producing a very good split, i.e. one whose AUC is closest to 0.5.
Eugene Dubossarsky @cargomoose ·
@peregrinari7 You don’t need to do that with RFs. Part of their unique magic is that you can use the entire dataset to train the model and then get Out Of Bag error on... the entire dataset again! Kinda like k-fold cross-validation, but baked into the algorithm.
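A minimal illustration of that property, assuming scikit-learn (the dataset choice is mine): each tree is trained on a bootstrap sample, so every data point is "out of bag" for roughly a third of the trees, and those trees score it without ever having seen it.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# oob_score=True evaluates each sample using only the trees whose
# bootstrap sample did NOT include it -- an honest accuracy estimate
# with no held-out split, despite training on all of X.
clf = RandomForestClassifier(n_estimators=200, oob_score=True,
                             random_state=0).fit(X, y)
print(clf.oob_score_)  # OOB accuracy on the full training set
```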
Eugene Dubossarsky retweeted
Tony Corke @MatterOfStats ·
@teouchanalytics @cargomoose One thing it'd be interesting to see is the extent to which different ML algos selected different "best" samples (if, indeed, they do) and whether, across different sample sets, the same algos tended to agree or disagree; i.e., is "representativeness" well-defined? Fun times!