Sebastian Deorowicz

132 posts

@sdeorowicz

Data compression. Algorithms for genome sequencing compression and analysis.

Gliwice, Poland · Joined August 2019

31 Following · 387 Followers

Sebastian Deorowicz@sdeorowicz·14 Apr

Now published in Bioinformatics: doi.org/10.1093/bioinf…

Sebastian Deorowicz@sdeorowicz

MDCompress (biorxiv.org/content/10.648…) is our recent proposal for storing molecular dynamics simulations. Try it if you feel that your XTC files are too big or if you need random access features. Great collaboration with Travis Wheeler's lab.

Sebastian Deorowicz@sdeorowicz·14 Apr

10 years after the first FAMSA paper, its successor is now published in Nat Biotech! We believe that FAMSA2 can enable analyses of large protein collections that were previously unattainable. Thank you, @a_zielezinski and @cnotred, for great collaboration nature.com/articles/s4158…

Sebastian Deorowicz retweeted

Heng Li@lh3lh3·28 Mar

If you have HiFi or Nanopore R10 metagenomic data, try myloasm from Jim Shaw. You will probably find more complete circular contigs to higher resolution especially for R10 or environmental samples. Scalable to >500GB data. Written in Rust. Published in @NatureBiotech

Jim Shaw@jim_elevator

Myloasm, our long-read metagenome assembler, is now published! w/ Max Marin & @lh3lh3 Very rewarding after > a year of development and countless hours thinking about assembly. Thanks to beta testers, Li lab, and reviewers for helpful feedback. Link: rdcu.be/famFj

Sebastian Deorowicz retweeted

Heng Li@lh3lh3·30 Sep

Do you know ~60% of human SVs fall in ~1% of GRCh38? See our new preprint: arxiv.org/abs/2509.23057 and the companion blog post on how we started this project and longdust: lh3.github.io/2025/09/29/on-…. Work with @QianAlvinQin1

Sebastian Deorowicz retweeted

Heng Li@lh3lh3·31 Jul

Longdust, a new tool to identify highly repetitive STRs, VNTRs, satellite DNA and other low-complexity regions (LCRs). Similar to SDUST but for long regions. github.com/lh3/longdust

Sebastian Deorowicz@sdeorowicz·20 Jul

Interested in a tool that aligns millions of proteins in minutes with quality similar to or better than the state-of-the-art utilities? Please take a look at our FAMSA2 paper: biorxiv.org/content/10.110… and GH repo: github.com/refresh-bio/FA…

Sebastian Deorowicz retweeted

Nature Methods@naturemethods·15 May

Vclust generates fast and accurate estimation of average nucleotide identity (ANI) for viral genomes, scaling clustering to millions of genomes. @a_zielezinski @AdamGudys @sdeorowicz @Piotr_Rozwalak @UAM_Poznan @polsl_pl @UniJena nature.com/articles/s4159…

Sebastian Deorowicz@sdeorowicz·15 May

Vclust (the ultra-fast, high-accuracy tool for viral genome comparison & clustering) is now published: nature.com/articles/s4159… Great collaboration with @a_zielezinski, @AdamGudys, UAM guys, and Bas E. Dutilh

Sebastian Deorowicz@sdeorowicz·26 Dec

Recently, our SPLASH paper (nature.com/articles/s4158…) was published in NatBiotech. Now, we release its extended version, sc-SPLASH (biorxiv.org/content/10.110…), which allows reference-free analysis of single-cell data. It was a great experience to work with our collaborators on that!

Sebastian Deorowicz retweeted

Heng Li@lh3lh3·28 Nov

The latest hifiasm can directly assemble standard @nanopore simplex R10 reads, without HERRO correction or other preprocessing, to phased contigs of contiguity comparable to HiFi assembly. Like before, you can further add ultra-long, Hi-C or trio data for better assembly.

Mike Vella@vellamike

Exciting news! The latest hifiasm release from @ChengChhy and @lh3lh3 adds beta support for @nanopore simplex R10 reads. Initial results look very promising. 🚀 Check it out: github.com/chhylp123/hifi…

Sebastian Deorowicz@sdeorowicz·25 Nov

AGC 3.2 (assembled genome compressor) has been released. Better speed, better ratio (at least for bacterial genomes), optional low-memory decompression. github.com/refresh-bio/agc

Sebastian Deorowicz retweeted

Roozbeh Dehghannasiri@roozbehdn·9 Oct

Happy to share our latest paper with @marekkoki on SPLASH2 for ultra-efficient reference-free discovery directly on raw sequencing reads out in @NatureBiotech, supervised by @SalzmanLab and @sdeorowicz, and with great contributions from @TBaharav. nature.com/articles/s4158…

Sebastian Deorowicz@sdeorowicz·24 Sep

@_RongL @NatureBiotech @SalzmanLab It was a pleasure to work on this with @SalzmanLab team.

Sebastian Deorowicz retweeted

Rong@_RongL·23 Sep

New paper online in @NatureBiotech by @sdeorowicz group and @SalzmanLab: SPLASH2 speeds up analysis of sequence variation in massive datasets.

Nature Biotechnology@NatureBiotech

Scalable and unsupervised discovery from raw sequencing reads using SPLASH2 go.nature.com/3N1SGBL

Sebastian Deorowicz retweeted

Heng Li@lh3lh3·4 Sep

Preprint on "BWT construction and search at the terabase scale". We can compress 100 human genomes to 11GB in 21 hours, find SMEMs with it, do affine-gap alignment and retrieve similar local haplotypes. 7.3Tb commonly sequenced bacterial genomes ⇒ 30GB arxiv.org/abs/2409.00613

Sebastian Deorowicz retweeted

Heng Li@lh3lh3·25 Jul

Pangene now published in Bioinformatics: doi.org/10.1093/bioinf…. In addition to showcasing applications (see the 17q21.31 inversion below), we also reviewed the theoretical formulation of bidirected graphs and discussed the definition and the finding of "bubbles" in such graphs.

Heng Li@lh3lh3

Preprint on Exploring gene content with pangenome gene graphs: arxiv.org/abs/2402.16185. It describes pangene for building gene graphs and for calling gene-level variations which can be found at pangene.bioinweb.org. Pleasant collaboration with @maxgmarin and @MahaFarhat.

Sebastian Deorowicz@sdeorowicz·10 Jul

I am happy to announce that ProteStAr, our compressor of CIF/PDB files with 3D atom coordinates, is now published in Bioinformatics. With this, you can store the whole ESM Atlas or AlphaFold DB in a few files (rather than 200M+) with fast random access. doi.org/10.1093/bioinf…

Sebastian Deorowicz retweeted

Andrzej Zielezinski@a_zielezinski·9 Jul

When writing bioinformatics tools, I often need unique IDs for things like temp directories. So, I created a Python package for generating fun & memorable IDs like "retired-nucleotide" or "funny-malware-7ab4" covering everything from sports to science. github.com/aziele/unique-…

Sebastian Deorowicz retweeted

Andrzej Zielezinski@a_zielezinski·3 Jul

Excited to share Vclust! It's a fast and accurate tool for calculating intergenomic similarities (like ANI) and clustering virus/#phage genomes/contigs according to ICTV and MIUViG standards. 💻 Tool: github.com/refresh-bio/vc… 📄 Preprint: biorxiv.org/content/10.110… Thread! 1/6 ↓
