Sebastian Deorowicz

132 posts

@sdeorowicz

Data compression. Algorithms for genome sequencing compression and analysis.

Gliwice, Poland · Joined August 2019
31 Following · 387 Followers
Sebastian Deorowicz reposted
Heng Li @lh3lh3 ·
If you have HiFi or Nanopore R10 metagenomic data, try myloasm from Jim Shaw. You will probably find more complete circular contigs to higher resolution especially for R10 or environmental samples. Scalable to >500GB data. Written in Rust. Published in @NatureBiotech
Jim Shaw @jim_elevator

Myloasm, our long-read metagenome assembler, is now published! w/ Max Marin & @lh3lh3 Very rewarding after > a year of development and countless hours thinking about assembly. Thanks to beta testers, Li lab, and reviewers for helpful feedback. Link: rdcu.be/famFj

Sebastian Deorowicz @sdeorowicz ·
MDCompress (biorxiv.org/content/10.648…) is our recent proposal for storing molecular dynamics simulations. Try it if you feel that your XTC files are too big or if you need random access. Great collaboration with Travis Wheeler's lab.
Sebastian Deorowicz reposted
Heng Li @lh3lh3 ·
Longdust, a new tool to identify highly repetitive STRs, VNTRs, satellite DNA and other low-complexity regions (LCRs). Similar to SDUST but for long regions. github.com/lh3/longdust
Sebastian Deorowicz reposted
Heng Li @lh3lh3 ·
The latest hifiasm can directly assemble standard @nanopore simplex R10 reads, without HERRO correction or other preprocessing, to phased contigs of contiguity comparable to HiFi assembly. Like before, you can further add ultra-long, Hi-C or trio data for better assembly.
Mike Vella @vellamike

Exciting news! The latest hifiasm release from @ChengChhy and @lh3lh3 adds beta support for @nanopore simplex R10 reads. Initial results look very promising. 🚀 Check it out: github.com/chhylp123/hifi…

Sebastian Deorowicz @sdeorowicz ·
AGC 3.2 (Assembled Genomes Compressor) has been released. Better speed, better compression ratio (at least for bacterial genomes), and optional low-memory decompression. github.com/refresh-bio/agc
Sebastian Deorowicz reposted
Heng Li @lh3lh3 ·
Preprint on "BWT construction and search at the terabase scale". We can compress 100 human genomes to 11GB in 21 hours, find SMEMs with it, do affine-gap alignment and retrieve similar local haplotypes. 7.3Tb commonly sequenced bacterial genomes ⇒ 30GB arxiv.org/abs/2409.00613
Sebastian Deorowicz reposted
Heng Li @lh3lh3 ·
Pangene now published in Bioinformatics: doi.org/10.1093/bioinf…. In addition to showcasing applications (see the 17q21.31 inversion below), we also reviewed the theoretical formulation of bidirected graphs and discussed the definition and the finding of "bubbles" in such graphs.
Heng Li @lh3lh3

Preprint on Exploring gene content with pangenome gene graphs: arxiv.org/abs/2402.16185. It describes pangene for building gene graphs and for calling gene-level variations which can be found at pangene.bioinweb.org. Pleasant collaboration with @maxgmarin and @MahaFarhat.

Sebastian Deorowicz @sdeorowicz ·
I am happy to announce that ProteStAr, our compressor of CIF/PDB files with 3D atom coordinates, is now published in Bioinformatics. With it, you can store the whole ESM Atlas or AlphaFold DB in a few files (rather than 200M+) with fast random access. doi.org/10.1093/bioinf…
Sebastian Deorowicz reposted
Andrzej Zielezinski @a_zielezinski ·
When writing bioinformatics tools, I often need unique IDs for things like temp directories. So, I created a Python package for generating fun & memorable IDs like "retired-nucleotide" or "funny-malware-7ab4" covering everything from sports to science. github.com/aziele/unique-…