Hail

443 posts

@hailgenetics

open-source scalable genetics

Cambridge, MA · Joined October 2016
221 Following · 1.3K Followers
Pinned Tweet
Hail @hailgenetics
Hey all! We are running a survey to solicit broad-based feedback from users & *non-users* of Hail Query. We'd like to hear from anyone who analyzes sequencing or genotype-chip datasets! All questions optional, fully anonymous, & takes <=10min! forms.gle/ucgu9h35UEkB68…
Hail retweeted
Konrad Karczewski @konradjk
CHARR operates only on homozygous alternate sites and scales very well (“cost per 1M samples” might be my new favorite metric):
[Image]
Hail retweeted
Kaitlin Samocha @ksamocha
Now that the dust has settled on the gnomAD v4 release, which hopefully many of you have already checked out, I wanted to take this week to thank many of the members of the team who made this possible. First up this week is the amazing production team. 1/7
Hail @hailgenetics
@ngsstudent @ksamocha I suspect 10-30TiB for 1M samples, given certain assumptions, but we haven't actually tested it at 1M samples. Even so, I think SVCR-VCF is mostly useful for *interchange*. IMHO (@danking00), the days of analyzing sequencing data with text tools are gone. Just too much data 3/3
Hail @hailgenetics
@ngsstudent @ksamocha PVCF is dead. We're finishing up a preprint that describes the "Scalable Variant Call Representation" and a proposed VCF implementation called "SVCR-VCF". SVCR-VCF represents both samples and alleles sparsely, allowing for a much smaller representation 2/3
Hail retweeted
Kaitlin Samocha @ksamocha
Julia Goodrich is now up to talk about QC in "many, many exomes and genomes". Earlier this week, we released gnomAD v4 -- 76k genomes and 731k exomes. Close to 1 BILLION variants. But how do we ensure data quality at this scale? #ASHG2023 #ASHG23
Hail @hailgenetics
@ngsstudent @ksamocha The VDS is a new sparse representation we developed. It stores both samples and alleles sparsely (ref blocks for sample sparsity, “local” allele indices for alleles). Under the covers, it’s a pair of matrix tables: one containing reference data and one containing variant data.
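The two kinds of sparsity described above can be illustrated with a small plain-Python sketch. It does not use Hail or the VDS on-disk format; `RefBlock`, `local_to_global`, and `lookup` are hypothetical names made up for this illustration, and treating every reference block as a diploid homozygous-reference call is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefBlock:
    """One run of homozygous-reference positions for a single sample."""
    start: int  # first covered position (inclusive)
    end: int    # last covered position (inclusive)


def local_to_global(local_gt, local_alleles):
    """Map a genotype written in local allele indices to site-wide indices.

    local_alleles[i] is the site-wide index of the sample's i-th local allele,
    so a sample only stores the alleles it actually carries.
    """
    return tuple(local_alleles[i] for i in local_gt)


def lookup(position, variant_calls, ref_blocks):
    """Return one sample's genotype at a position, or None if uncovered.

    variant_calls: {position: (local_gt, local_alleles)} for non-reference calls
    ref_blocks:    list of RefBlock for reference stretches
    """
    if position in variant_calls:
        local_gt, local_alleles = variant_calls[position]
        return local_to_global(local_gt, local_alleles)
    for block in ref_blocks:
        if block.start <= position <= block.end:
            return (0, 0)  # assumption: diploid hom-ref inside a block
    return None


# The site at position 150 has many alleles overall; this sample carries
# only the reference (0) and site-wide allele 3.
variant_calls = {150: ((0, 1), [0, 3])}
ref_blocks = [RefBlock(100, 149), RefBlock(151, 200)]
```

Here the sample stores two ranges and one two-allele call instead of one record per position and one entry per allele at the site, which is the idea behind keeping reference data and variant data in separate tables.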
aaaa @ngsstudent
@ksamocha VDS or matrix ? 🤔
Hail retweeted
Wolfgang M. Pernice @joungMax
This is big! #gnomAD has enabled countless breakthroughs in human genetics to date. Still, especially for rare genetic diseases, we face countless diagnostic riddles. Can't wait to see the impact this update will have (and of the tech behind it @hailgenetics). Thanks for this 🙏
Genome Aggregation Database @gnomad_project

The #gnomAD team is proud to announce the release of gnomAD v4! The v4 dataset includes 730,947 exomes & 76,215 genomes, which is ~5x larger than the v2 & v3 releases combined, & includes nearly 120K indivs of non-European genetic ancestry broad.io/gnomad_v4 #ASHG23 (1/11)

Hail @hailgenetics
[1] Assuming reasonably reblocked GVCFs; bigger GVCFs of course drive cost up!
Hail @hailgenetics
With the current version of Hail, the cost to produce a VDS from GVCFs is down to 0.005 USD per exome [1]. The VDS is also compact, costing about 0.0005 USD per month per exome to store in Google Cloud Storage (~25MiB per exome). 2/3
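As a rough check on these per-exome figures, the arithmetic can be scaled up in a few lines of Python. The build cost and size per exome are taken from the tweet; the storage price of 0.020 USD per GiB per month is an assumption chosen because it reproduces the quoted ~0.0005 USD per exome per month, and `vds_costs` is a hypothetical helper, not part of Hail.

```python
MIB_PER_GIB = 1024
MIB_PER_TIB = 1024 * 1024


def vds_costs(
    n_exomes,
    build_usd_per_exome=0.005,        # from the tweet
    mib_per_exome=25.0,               # from the tweet (~25 MiB per exome)
    storage_usd_per_gib_month=0.020,  # assumption, not stated in the tweet
):
    """Back-of-envelope build and storage costs for a VDS of n_exomes."""
    total_mib = n_exomes * mib_per_exome
    return {
        "build_usd": n_exomes * build_usd_per_exome,
        "storage_tib": total_mib / MIB_PER_TIB,
        "storage_usd_per_month": total_mib / MIB_PER_GIB * storage_usd_per_gib_month,
    }


if __name__ == "__main__":
    for name, value in vds_costs(1_000_000).items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions, one million exomes come to about 5,000 USD to build, about 23.8 TiB on disk, and about 488 USD per month to store. Larger or poorly reblocked GVCFs would push the build figure up, as footnote [1] notes.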
Hail @hailgenetics
The released set has ~800k samples, but we started with a 955,000-sample call set! *Staggeringly* large! We developed the Scalable Variant Call Representation (SVCR) and implemented it as the Hail VDS to enable this nearly 1M-sample call set. 1/3
Genome Aggregation Database @gnomad_project

The #gnomAD team is proud to announce the release of gnomAD v4! The v4 dataset includes 730,947 exomes & 76,215 genomes, which is ~5x larger than the v2 & v3 releases combined, & includes nearly 120K indivs of non-European genetic ancestry broad.io/gnomad_v4 #ASHG23 (1/11)

Hail @hailgenetics
And, of course, all that functionality is publicly available as MIT-licensed software! Check out the docs at: hail.is/docs/0.2/vds/i… Huge congrats to the gnomAD team for another massive release! 3/3
Hail retweeted
Alicia Martin @genetisaur
Way to go @ZanKoenig and @itsnotmeron for developing scalable cloud-based tutorials for common genomics analyses: github.com/atgu/hgdp_tgp/…. We have also released phased haplotypes for phasing/imputation: gs://gcp-public-data--gnomad/resources/hgdp_1kg/phased_haplotypes