
Last week, we shared a synthetic population dataset for the United States, but this week we're sharing one that researchers published for the whole world.
Marijn Ton et al. released a gigantic synthetic population dataset that represents ~7.33 billion humans, which matches the 2015 human population count, and ~1.99 billion households.
The Motivation
To understand the impact of societal changes such as disease outbreaks and extreme weather, modelers sometimes resort to simplifying assumptions about human behavior.
According to the authors: "For example, integrated assessment models of climate change typically assume a representative consumer of a single average global or regional consumer."
By creating a synthetic dataset of individuals that is consistent with published demographic statistics at the state/province level (administrative level 1) for most countries, they hope to improve the data and assumptions used in global impact simulations.
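To make "consistent with published statistics" concrete, here is a minimal Python sketch. It assumes a toy in-memory list of synthetic individuals tagged with a hypothetical admin-1 region code and age group, plus hypothetical published counts for the same cells. It counts synthetic records per cell and flags any cell whose count deviates from the published figure by more than a relative tolerance. This is an illustrative check, not the authors' validation procedure.

```python
from collections import Counter

# Hypothetical synthetic individuals: (admin-1 region code, age group).
# The real dataset's schema is not described in the post; these fields are illustrative.
individuals = [
    ("R1", "0-14"), ("R1", "15-64"), ("R1", "15-64"), ("R1", "65+"),
    ("R2", "0-14"), ("R2", "0-14"), ("R2", "15-64"),
]

# Hypothetical published counts per (region, age group) that the synthetic data should reproduce.
published = {
    ("R1", "0-14"): 1, ("R1", "15-64"): 2, ("R1", "65+"): 1,
    ("R2", "0-14"): 2, ("R2", "15-64"): 1, ("R2", "65+"): 0,
}

def check_marginals(records, targets, rel_tol=0.05):
    """Return {cell: (synthetic, published)} for cells whose counts differ by more than rel_tol."""
    synthetic = Counter(records)  # count synthetic individuals per (region, age group) cell
    mismatches = {}
    for cell, target in targets.items():
        got = synthetic.get(cell, 0)
        # A zero published count must be matched exactly; otherwise compare relative error.
        if target == 0:
            ok = got == 0
        else:
            ok = abs(got - target) / target <= rel_tol
        if not ok:
            mismatches[cell] = (got, target)
    return mismatches

print(check_marginals(individuals, published))  # prints {}
```

Adding one extra person in the ("R2", "65+") cell would make the function return {("R2", "65+"): (1, 0)}. Cells that appear in the synthetic data but not in the targets are ignored here, so a fuller check would also flag those.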
Their Data Sources
The team primarily used data from two databases:
• Luxembourg Income Study (LIS), which has very detailed microdata for 50 countries. LIS data especially shines for middle- and high-income countries.
• Demographic and Health Surveys (DHS), which have very detailed microdata for 90 countries. DHS data especially shines for low-income countries.
Households and individuals in the remaining countries were generated using regional statistics. A small number of countries that lacked reliable published statistics were excluded.
This is a great dataset for exploring geospatial visualizations or building regional and global impact models.
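As a starting point for a regional impact model, here is a minimal Python sketch that rolls individual records up to regional household statistics. It assumes each record carries a region code, a household ID, and an income value; these fields and the toy data are hypothetical, since the post does not describe the dataset's file format or columns.

```python
from collections import defaultdict

# Hypothetical individual records: (region, household_id, income).
# Field names and units are illustrative assumptions, not the dataset's real schema.
people = [
    ("R1", "h1", 1200.0), ("R1", "h1", 0.0), ("R1", "h2", 800.0),
    ("R2", "h3", 500.0), ("R2", "h3", 300.0), ("R2", "h3", 0.0),
]

def regional_summary(records):
    """Per region: number of households, mean household size, and mean per-capita income."""
    households = defaultdict(set)     # region -> set of distinct household ids
    people_count = defaultdict(int)   # region -> number of individuals
    income_total = defaultdict(float) # region -> summed individual income
    for region, hh_id, income in records:
        households[region].add(hh_id)
        people_count[region] += 1
        income_total[region] += income
    return {
        region: {
            "households": len(hh),
            "mean_household_size": people_count[region] / len(hh),
            "per_capita_income": income_total[region] / people_count[region],
        }
        for region, hh in households.items()
    }

summary = regional_summary(people)
print(summary["R1"]["households"], summary["R1"]["mean_household_size"])  # prints 2 1.5
```

With the toy data, R1 has 2 households with a mean size of 1.5 people, and R2 has a single household of 3 people. Household IDs are tracked per region, so the same ID reused in two regions counts as two households.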
Link to the paper: nature.com/articles/s4159…
Link to the dataset: dataverse.harvard.edu/dataset.xhtml?…
#syntheticdata #machinelearning #generativeai
Kudos to the researchers who made this happen: Michiel Ingels, Jens de Bruijn, Hans de Moel, Lena Reimann, Wouter Botzen, Jeroen Aerts
Credit to Nature and the authors for the image showing the population coverage and data source for each country.
