Synthetic Data Vault (@sdv_dev)

Last week, we shared a synthetic population dataset for the United States. This week, we're sharing one that researchers published for the whole world. 🌍

Marijn Ton et al. released a gigantic synthetic population dataset that represents ~7.33 billion humans, which matches the 2015 human population count, and ~1.99 billion households.

The Motivation

To understand the impact of societal changes like disease, extreme weather, and more, modelers sometimes resort to simplifying assumptions about human behavior. According to the authors: “For example, integrated assessment models of climate change typically assume a representative consumer of a single average global or regional consumer.”

By creating a synthetic dataset of individuals that is consistent with published demographic statistics at the state / province level (administrative level 1) for most countries, they're hoping to improve the data and assumptions used in global impact simulations.

Their Data Sources

The team primarily used data from 2 databases:
• Luxembourg Income Study (LIS), which has very detailed microdata for 50 countries. LIS data especially shines for medium- and high-income countries.
• Demographic and Health Surveys (DHS), which has very detailed microdata for 90 countries. DHS data especially shines for low-income countries.

Households and individuals in the remaining countries were generated using regional statistics. A small number of countries that lacked reliable, published statistics were excluded.

This is a great dataset for exploring geospatial visualizations or for building regional or global impact models.

📚 Link to the paper: nature.com/articles/s4159…
🗄️ Link to the dataset: dataverse.harvard.edu/dataset.xhtml?…

#syntheticdata #machinelearning #generativeai

Kudos to the researchers who made this happen: Michiel Ingels, Jens de Bruijn, Hans de Moel, Lena Reimann, Wouter Botzen, Jeroen Aerts

Credit to Nature and the authors for the image showcasing the population coverage and data source for each country.