Xihong Lin

2.1K posts

@XihongLin

Professor of @HarvardBiostats & @HarvardStats, (Bio)statistics, Data Science, Genetics Genomics, Epidemiology, Health, Education, COVID19 analysis, Views are all mine

Boston, MA · Joined August 2017
422 Following · 7.8K Followers
Xihong Lin @XihongLin
Our review article on harnessing synthetic data from generative AI for statistical inference. We discuss generative models for synthetic data & their principled use for valid downstream statistical inference, esp when generative models are misspecified. arxiv.org/abs/2603.05396.
Replies 1 · Reposts 11 · Likes 72 · Views 8.3K
Xihong Lin @XihongLin
GATE-STAAR: Rare variant association tests for survival analysis of time-to-event phenotypes, e.g., age at diagnosis in the UK Biobank. It fits frailty models by incorporating variant functional annotations and accounting for relatedness and ancestry PCs. pnas.org/doi/abs/10.107…
Replies 0 · Reposts 5 · Likes 31 · Views 2.9K
Xihong Lin @XihongLin
Our cellSTAAR method incorporates single-cell-sequencing-based cell type specific functional annotations to boost the power in rare variant association testing of noncoding regions in large scale Whole Genome Sequencing Studies nature.com/articles/s4159…
Replies 0 · Reposts 3 · Likes 55 · Views 3.6K
Xihong Lin retweeted
Xihao Li @xihaoli
We’re excited to share our latest publication in @CellGenomics: “Streamlining Large-Scale Genomic Data Management: Insights from the UK Biobank Whole-Genome Sequencing Data”. Sincere thanks to @drarwood, @muzizimumu1, @XihongLin, Yuxin Yuan, Gareth Hawkes, Robin Beaumont, Michael Weedon, all collaborators and study participants from the @uk_biobank Program.

We highlight the annotated Genomic Data Structure (aGDS) format, the vcf2agds toolkit, and the STAARpipeline, which together:
• Seamlessly integrate genotypes + functional annotations in an all-in-one compact file for downstream analyses 🗃️
• Reduce UKB 500k WGS storage from 1473.85 TiB (GraphTyper pVCF, #23374) to 1.10 TiB (aGDS); and from 17.87 TiB (ML-Corrected DRAGEN pVCF, #24311) to 1.65 TiB (aGDS) 🗄️
• Enable scalable, functionally informed WGS association analyses across coding & noncoding genome 🧬
• Empower open-source, RAP-integrated genomic analyses for hundreds of thousands of samples ☁️

Applying STAARpipeline to the UKB 500k WGS data for total cholesterol, we identified 480 genome-wide significant rare variant associations, including 200 coding and 280 noncoding functional categories. These signals encompass lipid biology mainstays (e.g., PCSK9, APOB, NPC1L1, LDLR, APOE) and regulatory variants mapped to promoter, enhancer, and UTR regions, demonstrating the power of biobank-scale WGS for genomic discovery.

All tools are open source and freely available:
🔗 vcf2agds toolkit: github.com/drarwood/vcf2a…
🔗 STAARpipeline: github.com/xihaoli/STAARp…
🔗 See the paper for complete links to software repos: cell.com/cell-genomics/…
Cell Genomics @CellGenomics

Streamlining large-scale genomic data management: Insights from the UK Biobank whole-genome sequencing data dlvr.it/TN8k2L

Replies 1 · Reposts 13 · Likes 60 · Views 12.6K
Xihong Lin @XihongLin
Our review article on Causal Mediation Analysis for Integrating Exposure, Genomic, and Phenotype Data has appeared in ARSIA. We give an overview of methods for large-scale tests for mediators, which require testing for a large number of composite nulls. annualreviews.org/content/journa…
Replies 1 · Reposts 8 · Likes 44 · Views 3.3K
Xihong Lin retweeted
Luke O'Connor @Luke0connor
Excited to share our preprint describing a method for heritability partitioning with GWAS sumstats that significantly improves upon S-LDSC. Led by the fantastic Hui Li, and co-supervised by @XihongLin. #ASHG24 poster 4089F. medrxiv.org/content/10.110…
Replies 2 · Reposts 38 · Likes 118 · Views 14.1K
Xihong Lin @XihongLin
Our new paper on a semi-supervised machine learning method for predicting homogeneous ancestry groups to assess HWE in diverse whole #genome sequencing studies. The work was motivated by NHGRI Genome Sequencing Program - CCDG data. Applied to HWE QC of 60,000+ diverse whole genomes.
AJHG @AJHGNews

📢Online now! 📰Semi-supervised machine learning method for predicting homogeneous ancestry groups to assess Hardy-Weinberg equilibrium in diverse whole #genome sequencing studies 🧑‍🤝‍🧑@xihonglin @bmneale @mczody & co cell.com/ajhg/abstract/…

Replies 1 · Reposts 6 · Likes 21 · Views 4.6K
Xihong Lin @XihongLin
Join us for the 2024 Harvard PQG conference on "AI for Genomics and Health" in Boston on Oct 17-18, 2024. A fantastic lineup of speakers. See three topics below. Early bird registration and abstracts due on 9/15. Stellar abstract awards. Pls RT hsph.harvard.edu/pqg-conference/
Replies 1 · Reposts 9 · Likes 58 · Views 4.5K
Xihong Lin @XihongLin
Congrats to Zack McCaw, Jianhui Gao, and Jessica Gronsbell!
Replies 1 · Reposts 0 · Likes 1 · Views 617
Xihong Lin @XihongLin
SynSurr uses ML models to predict synthetic surrogates and jointly models the synthetic surrogate and the target outcome. It is robust to misspecification of imputation models (2/)
Replies 1 · Reposts 0 · Likes 3 · Views 647
Xihong Lin @XihongLin
Analysis of biobanks is challenged by missing values of phenotypes. Our new paper on using ML-based synthetic surrogate analysis (SynSurr) to improve power for GWAS of partially missing phenotypes in population biobanks. nature.com/articles/s4158… (1/)
Replies 1 · Reposts 18 · Likes 67 · Views 7K
Xihong Lin @XihongLin
Our Statistical Science interview of Ray Carroll about his career, joint with @nilanjan10c. Ray is an outstanding statistician and scientist. He has been a wonderful mentor to many, not just his students and postdocs but also many others. projecteuclid.org/journals/stati…
Replies 0 · Reposts 2 · Likes 16 · Views 1.6K
Xihong Lin retweeted
Samira Asma @DrSamira_Asma
🚨 Hiring @WHO Director of Data & Analytics
🔹 Lead efforts to use accurate, timely data to improve global health
🔹 Empower countries to reach health goals with data-driven policies
🔹 Accelerate progress to the SDGs & Triple Billion targets
Apply 🔽 careers.who.int/careersection/…
Replies 4 · Reposts 42 · Likes 75 · Views 12.3K