Haoyu Cheng


New preprint on hifiasm (ONT)! We can now achieve near T2T human genome assembly using only ONT Simplex reads—in just half a day, with or without ultra-long sequencing.
biorxiv.org/content/10.110…
Mike Vella @vellamike
Telomere-to-telomere de novo assembly from standard ONT reads (LSK114, Simplex). A really exciting advance—makes high-quality assembly practical for population-scale sequencing! Preprint from @ChengChhy, @lh3lh3 and colleagues biorxiv.org/content/10.110…

@XLR @lh3lh3 @vellamike @nanopore The new version of hifiasm performs phasing, uses improved full-length base-level alignment rather than window-based alignment, and considers base quality.

Exciting news! The latest hifiasm release from @ChengChhy and @lh3lh3 adds beta support for @nanopore simplex R10 reads. Initial results look very promising. 🚀
Check it out: github.com/chhylp123/hifi…
Haoyu Cheng retweeted

The latest hifiasm can directly assemble standard @nanopore simplex R10 reads, without HERRO correction or other preprocessing, to phased contigs of contiguity comparable to HiFi assembly. Like before, you can further add ultra-long, Hi-C or trio data for better assembly.

@vellamike @ChengChhy @lh3lh3 @nanopore What was required to get it working? The commit message doesn't say much.

@mprous1 Then I have no idea. Probably just give it a try. Hifiasm won't take too much time.

@ChengChhy I've got some datasets with very poor N50, 2 kb for example, so dorado correct is not an option. Assemblies certainly would not be very good in these cases, but would hifiasm be better than flye? Or would hifiasm not work at all with such short reads?

Hifiasm 0.21.0 has been released. It now has a beta module for direct assembly of ONT R10 simplex reads. Initial tests with regular simplex reads show very promising results! github.com/chhylp123/hifi…

@mprous1 We tested several datasets with about 30kb N50, and hifiasm worked well.
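
For readers comparing their own data against the N50 figures in this thread (2 kb versus about 30 kb), here is a minimal sketch of how read N50 is usually computed. The function name `n50` is an illustrative assumption and is not part of hifiasm or any tool mentioned here.

```python
def n50(lengths):
    """Return the read N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases.

    Illustrative helper only (hypothetical name, not a hifiasm API).
    Returns 0 for an empty input.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_lengths))
```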

My new lab at @YaleBIDS is looking for a couple of postdocs, students and RAs in bioinformatics, genomics, machine learning and related fields (postdocs.yale.edu/postdoctoral-a…). Heartfelt thanks to my mentor @lh3lh3 and collaborators for their incredible support!
Haoyu Cheng retweeted

Preprint on "BWT construction and search at the terabase scale". We can compress 100 human genomes to 11GB in 21 hours, find SMEMs with it, do affine-gap alignment and retrieve similar local haplotypes. 7.3Tb commonly sequenced bacterial genomes ⇒ 30GB arxiv.org/abs/2409.00613


@basti_beier @HumanPangenome Oh yes, that is a typo that should be fixed… Thanks for pointing that out!

@ChengChhy @HumanPangenome I was speaking about this part, see picture. There you specify -m1000000 for 100kb, but that command would actually filter out all reads shorter than 1Mb, not 100kb. (Similarly with -m500000 and the 50kb limit.)
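
The mismatch described here is a factor of ten in the threshold: 100 kb is 100,000 bases, while 1,000,000 bases is 1 Mb. A minimal sketch of a minimum-length filter makes the difference visible. The function `filter_by_min_length` and the toy reads are hypothetical, and the sketch assumes the threshold is given in bases and is inclusive; it does not reproduce the actual command from the supplement.

```python
KB = 1_000        # bases per kilobase
MB = 1_000_000    # bases per megabase


def filter_by_min_length(reads, min_len):
    """Keep (name, sequence) pairs whose sequence has at least min_len bases.

    Hypothetical helper for illustration; min_len is in bases, inclusive.
    """
    return [(name, seq) for name, seq in reads if len(seq) >= min_len]


def kept_names(reads, min_len):
    """Return only the names of the reads that pass the length filter."""
    return [name for name, _ in filter_by_min_length(reads, min_len)]


# Toy reads of 60 kb, 150 kb and 1.2 Mb (sequence content is irrelevant here).
toy_reads = [
    ("read_60kb", "A" * (60 * KB)),
    ("read_150kb", "A" * (150 * KB)),
    ("read_1.2Mb", "A" * (1_200 * KB)),
]

if __name__ == "__main__":
    # Intended cutoff of 100 kb keeps the two longer reads.
    print(kept_names(toy_reads, 100 * KB))
    # A cutoff of 1,000,000 bases (1 Mb) keeps only the longest read.
    print(kept_names(toy_reads, 1 * MB))
```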


Excited to share our new t2t assembly algorithm for diploid and polyploid genomes! Using 132 assembled haplotypes from the @HumanPangenome, we show that our approach is cost-efficient, robust, and could achieve t2t assemblies without high-coverage reads. arxiv.org/abs/2306.03399

@basti_beier @HumanPangenome We actually used the UL reads >=50kb/100kb for the assembly. Only a very small fraction of UL reads are longer than 500kb/1Mb.

@ChengChhy @HumanPangenome Very nice preprint, just went through it and stumbled upon the settings in the supplements of filtering ultra-long reads. Looks like the commands there would display i) reads above 1 Mb ii) 500 kb instead of 100 kb / 50 kb. Is this just a typo?

@erikgarrison @XLR @nomad421 @lh3lh3 I checked the conda recipes: github.com/bioconda/bioco…, but it seems there is nothing to set.

@XLR @nomad421 @lh3lh3 @ChengChhy The problem is possibly because of a failure to use the correct set of SIMD instructions. Are you using multiple dispatch or some kind of fat binaries to deal with this?

@ChengChhy @lh3lh3 I guess I really only looked at the abstract here of this article - ncbi.nlm.nih.gov/pmc/articles/P… from 2019.

New hifiasm with ultra-long integration is released! We tested it with four diploid human samples and got many T2T chromosomes. Any feedback will be much appreciated @lh3lh3. Source code: github.com/chhylp123/hifi…

@subgenomes @lh3lh3 @PacBio @nanopore It is compatible with the old bin files, but it would still be better to rerun the whole workflow from the raw reads. We would also be interested in whether it works for polyploid genomes (although some parameters might need tuning).

@lh3lh3 @PacBio @nanopore Big news! Can it incorporate UL reads to a preexisting HiFi assembly or need to assemble raw? I'm working with a polyploid genome that I'd love to test this out on @ChengChhy @lh3lh3

Hifiasm HiFi+UL integration is ready for beta testing. This new mode takes @PacBio HiFi and @nanopore ultra-long reads as input and produces longer phased contigs. It also works with trio or Hi-C for chromosome-long phasing. Add option --ul to provide UL reads.

@ChengChhy @lh3lh3 Might the ultra-long option also be integrated into hifiasm-meta?

