Alexander Karollus
@AlexKarollus
64 posts
Joined September 2022
35 Following · 134 Followers
Alexander Karollus retweeted
Johannes Hingerl @thisisjohahi·
I'm excited that our paper, "scooby: modeling multimodal genomic profiles from DNA sequence at single-cell resolution," has been published in Nature Methods! scooby is a new deep-learning framework to understand how DNA sequence shapes gene expression in individual, single cells.
Alexander Karollus retweeted
Johannes Hingerl @thisisjohahi·
Happy to share that Flashzoi is now published! We enhanced Borzoi with RoPE & FlashAttention for >3x faster training/inference & 2.4x reduction in memory usage. This brings large-scale genomic analysis and fine-tuning within reach of academic budgets. 📄: doi.org/10.1093/bioinf…
Alexander Karollus retweeted
Jun Cheng @s6juncheng·
Excited to share #AlphaGenome, the start of our journey to decipher the regulatory genome! The model matches or exceeds top-performing external models on 24 out of 26 variant evaluations, across a wide range of biological modalities. 1/6
Alexander Karollus retweeted
Johannes Hingerl @thisisjohahi·
Introducing Flashzoi ⚡! We've upgraded the Borzoi model with rotary positional encodings and FlashAttention, resulting in a 3x speedup with similar or better accuracy - for faster variant effect prediction, quicker model development, and more efficient genomic analysis: biorxiv.org/content/10.110…
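Rotary position embeddings, one half of the Flashzoi upgrade, work by rotating each pair of query/key channels through an angle that grows linearly with position, so attention scores end up depending only on relative offsets. A minimal NumPy sketch of the idea (the `rope` function and its channel pairing are illustrative, not Flashzoi's actual implementation):

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Channel i is paired with channel i + dim//2 (GPT-NeoX-style pairing)
    and each pair is rotated by an angle proportional to the position."""
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-np.arange(half) / half)      # one frequency per pair
    angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # standard 2-D rotation applied pair-wise
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because the rotation is orthogonal it preserves vector norms, and the dot product between a rotated query and key depends only on their positional offset, which is what lets RoPE replace learned absolute position embeddings.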
Alexander Karollus retweeted
Muhammed Hasan ÇELİK @mh_celik·
Excited to share our preprint: Efficient Inference, Training, and Fine-tuning of Protein Language Models! ⚡We introduce advanced techniques to make large protein models accessible to more researchers. Read it here: doi.org/10.1101/2024.1…
Alexander Karollus retweeted
gagneurlab @gagneurlab·
Information leakage due to homology: A pervasive and non-trivial issue in bioinformatics. Excited to hear about @muntakim_rafi's insights and recommendations for sequence-based modeling on Wednesday at the Kipoi seminar.
Kipoi @KipoiZoo
Join us for the next Kipoi Seminar with Abdul Muntakim Rafi (Rafi) @muntakim_rafi @CarldeBoerPhD @UBC! 👉 Detecting and avoiding homology-based data leakage in genome-trained sequence models 📅 Wed Nov 6, 5:30pm CET 🧬 tum-conf.zoom.us/meeting/regist…

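The leakage the seminar addresses arises when near-identical sequences land on both sides of a train/test split. A toy sketch of homology-aware splitting - cluster sequences by k-mer similarity, then assign whole clusters to one side. Real pipelines use alignment/clustering tools such as BLAST or MMseqs2; `homology_aware_split` and its thresholds are made up for illustration:

```python
def kmer_set(seq, k=4):
    """All length-k substrings of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def homology_aware_split(seqs, test_frac=0.25, k=4, thresh=0.5):
    """Union-find clustering over pairs above a k-mer Jaccard threshold,
    then whole clusters go to the test set until roughly test_frac is
    reached - so no linked pair straddles the split."""
    n = len(seqs)
    kmers = [kmer_set(s, k) for s in seqs]
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(kmers[i], kmers[j]) >= thresh:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    train, test = [], []
    for members in clusters.values():
        (test if len(test) < test_frac * n else train).extend(members)
    return train, test
```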
Alexander Karollus @AlexKarollus·
@sokrypton @pedrotomazsilva @anshulkundaje I see, but my point there was more that the DNA seqs we look at generally aren't one "unit" like one protein, but a collection of elements that need not be linked. If you have high within-element dependency, then centering will distribute this signal intra-element, no?
Alexander Karollus retweeted
Pedro Tomaz da Silva @pedrotomazsilva·
Have you ever wondered what the genome looks like through the eyes of a DNA language model? In our newest preprint, we use DNA LMs to study nucleotide dependencies in the genome, revealing functional elements, characterizing variants, and evaluating DNA LMs: tinyurl.com/6wbwjaf4
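One common way to extract such dependency maps from a DNA LM is in-silico mutagenesis: substitute one position and record how much the model's probabilities shift everywhere else. A hedged sketch - the `dependency_map` helper and `prob_fn` interface are hypothetical, and the preprint's exact definition may differ:

```python
import numpy as np

ALPHABET = "ACGT"

def dependency_map(seq, prob_fn):
    """dep[i, j] = largest absolute log-probability shift at position j
    over all single-nucleotide substitutions at position i.

    prob_fn(seq) must return a (len(seq), 4) array of per-position
    nucleotide probabilities from some DNA language model."""
    n = len(seq)
    ref = np.log(prob_fn(seq) + 1e-9)
    dep = np.zeros((n, n))
    for i in range(n):
        for alt in ALPHABET:
            if alt == seq[i]:
                continue
            mut = seq[:i] + alt + seq[i + 1:]
            delta = np.abs(np.log(prob_fn(mut) + 1e-9) - ref)
            dep[i] = np.maximum(dep[i], delta.max(axis=1))
    np.fill_diagonal(dep, 0.0)  # a position trivially affects itself
    return dep
```

Row sums of such a map then give a per-position importance score, the kind of quantity discussed further down the thread.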
Alexander Karollus @AlexKarollus·
@sokrypton @pedrotomazsilva @anshulkundaje Also, we often see broad interactions that represent logically linked elements. E.g. in Fig. 3B, the splice sites interact with the intron and CDS. Centering this would enforce a compensating negative interaction between splice site and intergenic region that I find hard to justify.
Alexander Karollus @AlexKarollus·
@sokrypton @pedrotomazsilva @anshulkundaje One of the things we want the model to do is tell us which positions are even doing something - anything - at all. And we show that the row-sum seems a decent proxy of this, sometimes quite a bit better than "traditional" conservation.
Alexander Karollus @AlexKarollus·
@anshulkundaje @sokrypton @pedrotomazsilva With that caveat, on the lac element there are some *hints* of it maybe picking up on the link between binding sites. But a more concrete answer requires a good method to inspect the content of the final embeddings, I think.
Alexander Karollus @AlexKarollus·
@anshulkundaje @sokrypton @pedrotomazsilva Hard to say. Evo only gets context from the 5′ side, so it must see a few bases of an element before it "knows" what it's seeing. For short & mobile motifs, it's quite likely that it only knows it has seen a motif ex post, so it might be reflected in the final embedding, but not in the local nucleotide probabilities.
Alexander Karollus @AlexKarollus·
@sokrypton @pedrotomazsilva Cool! For us Evo generally gets all tRNA hairpins in E. coli - not just local. IIRC it even gets the weird selenocysteine tRNA mostly right. I think Evo had some bug in their huggingface model for a while, this might explain the difference.
Alexander Karollus @AlexKarollus·
@vagar112 @NatureGenet I think the fact that people are now seriously asking questions like "can DL models predict personalized expression without retraining" implicitly demonstrates the advance Enformer was. Even if it's not there yet, it is at least conceivable - I don't think it really was before.
An Hoang @Bun_Without_B·
@AlexKarollus Very cool paper! Is there a GitHub repo somewhere that has snippets of code to get embeddings of sequences/motifs? TIA!
Alexander Karollus @AlexKarollus·
Cracking the regulatory code: We have genomes for 1000s of species, but ENCODE only for 2 – what do we do? Natural language models have shown that syntax and semantics can be learned from text alone. Can we do the same for genomes?⬇️ biorxiv.org/content/10.110…
Alexander Karollus @AlexKarollus·
In all these tasks, providing the model with species information was crucial. We expect that species-aware DNA language models leveraging massive sequencing projects will prove a powerful tool to investigate understudied species.
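One simple way to provide species information - an assumption about how it could be wired in, not necessarily the paper's mechanism - is to reserve extra vocabulary ids and prepend a species token to each input, so every layer can condition on it:

```python
NUC_VOCAB = "ACGT"
SPECIES = ("S_cerevisiae", "S_pombe", "C_albicans")  # hypothetical species list

def encode(seq, species):
    """Token ids for a species-aware DNA LM: ids 0-3 are nucleotides,
    ids 4+ are species tokens; the species token is prepended."""
    species_id = len(NUC_VOCAB) + SPECIES.index(species)
    return [species_id] + [NUC_VOCAB.index(b) for b in seq]
```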
Alexander Karollus @AlexKarollus·
Finally, our models have learned representations that can boost supervised learning. In fact, we outperform SOTA on gene expression and mRNA half-life tasks using simple regression on the last layer's embeddings.
Alexander Karollus tweet media
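The probing setup described - simple regression on last-layer embeddings - can be sketched in closed form with ridge regression (names here are illustrative; the paper may use a different regressor):

```python
import numpy as np

def ridge_probe(E, y, alpha=1.0):
    """Closed-form ridge regression from embeddings E (n, d) to target y (n,):
    w = (E^T E + alpha * I)^{-1} E^T y."""
    d = E.shape[1]
    return np.linalg.solve(E.T @ E + alpha * np.eye(d), E.T @ y)
```

In this setup E would hold the LM's last-hidden-layer embeddings for each sequence and y the measured expression or half-life; only the linear head is fit, the LM stays frozen.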