Melania Nowicka
174 posts

Melania Nowicka
@MelaniaNowicka
Researcher @HPI_DE @RenardLab #machinelearning #syntheticbiology
Berlin, Germany · Joined March 2019
313 Following · 129 Followers

🎉Our paper 'Beware of data leakage from protein LLM pretraining' was accepted at #MLCB2024! Meet Leon and Tobias at the spotlight talk and poster session on Thursday in Seattle to chat about how to address this important problem!! @jmbartoszewicz x.com/jmbartoszewicz…
Jakub Bartoszewicz @jmbartoszewicz
🚨Beware of data leakage from protein LLM pretraining! @MelaniaNowicka & I kept discussing how data leakage from pretrained embeddings can inflate performance metrics. So, our @HPI_DE students Leon, Tobias and An actually measured it! Check it out: biorxiv.org/content/10.110… 1/3

@phil_fradkin @jmbartoszewicz @HPI_DE Thank you for your comment! If you want to measure the performance for novel proteins, you should likely simulate a dedicated environment for the evals. Otherwise, the model may get lost while trying to make a prediction for something a bit outside the learned distribution.

@jmbartoszewicz @MelaniaNowicka @HPI_DE Thanks, this is interesting work.
Alternatively, can this be interpreted as the intention of pre-training? The goal of pre-training is to learn the underlying data distribution, which can help generalization when experimental data isn't available for all of the pre-training data.
Melania Nowicka retweeted

🚨Beware of data leakage from protein LLM pretraining!
@MelaniaNowicka & I kept discussing how data leakage from pretrained embeddings can inflate performance metrics. So, our @HPI_DE students Leon, Tobias and An actually measured it! Check it out: biorxiv.org/content/10.110… 1/3


@stevain No worries, this is THE experience, not a failure!

@iskander Hiring software engineers on board to support PhD students/post-docs on time-limited contracts would be one of the solutions, I guess. I know, limited reeeesourceees...
Melania Nowicka retweeted

Late, happy news! @MelaniaNowicka and I started our postdoc adventure @MIT_CSAIL, with @BarzilayRegina & @RenardLab. Thx to the Designing for Sustainability Program of @MITDesignAcad & @HPI_DE, we explore the use of AI to design therapeutic agents to combat antibiotic resistance!

Melania Nowicka retweeted

Can prompt-tuning become a low-cost alternative to fine-tuning of protein LMs for sequence generation? NLP metrics are not enough - biological eval is a must! Check out our paper at #MLCB2023, by Andrea&Kevin, advised by @MelaniaNowicka & me at @RenardLab biorxiv.org/content/10.110…
Melania Nowicka retweeted

Big thanks to the #NCBICGR team for highlighting the work of @jmbartoszewicz, @ferbsx, and @MelaniaNowicka on detecting DNA of novel fungal pathogens using ResNets & a curated fungi-hosts data collection (pubmed.ncbi.nlm.nih.gov/36124807/).
#NCBI #Fungi #ResNets #Pathogens #Bioinformatics
NCBI @NCBI
Learn how resources in the #NCBICGR Toolkit could impact the discovery of new fungal pathogens in this CGR Impact Spotlight based on a published article. Check it out! ow.ly/zE5l50PXNrF
Melania Nowicka retweeted

Are you still using Kraken or Centrifuge for long-read taxonomic classification? Or are you using large Bloom Filters for biological data analysis? If the answer is yes, you should attend the talk from Jens-Uwe Ulrich at 2:10 p.m. in the Lumière Auditorium #ISMBECCB2023 #HiTSeq23
Melania Nowicka retweeted

Find Katharina Baum on Monday at 6 PM at A-321, where she presents her poster on our tool, #SimbaML🦁: 'Supporting informed machine learning by ordinary differential equation model simulations'. Start including #prior #knowledge into your ML models in a breeze!

Melania Nowicka retweeted

How do parts of your model contribute to its other parts and your final predictions? Talk to @jmbartoszewicz on Monday, 6pm where he presents the poster A-349 on HOPS: Higher-order partial Shapley values tracing deep contribution flows in neural networks! 🐇 #ISMBECCB2023
Melania Nowicka retweeted

Let's talk about #phage and #ML! Come by on Monday 6pm and chat with @MelaniaNowicka, presenting her & @jmbartoszewicz's poster A-348: Interpretable prediction of phage life cycle from unannotated DNA sequences! 🦠🧬 #ISMBECCB2023

@jmschreiber91 @anshulkundaje @vagar112 @aaron_mckenna You know what we recently did? I gave an interactive lecture on the pitfalls paper you wrote and the students loved it! This was part of a larger seminar on mishaps in ML. Great paper for a discussion with the students!

@anshulkundaje @vagar112 @aaron_mckenna I think we agree? I'm saying that you need to know a variety of ML methods, even those not currently popular, so that you can use the right tool for the task as opposed to just throwing the current hot topic at the problem.

@vagar112 After several years in bioinfo, what I feel my curriculum lacked most was more statistics, math, journal clubs and hands-on research projects, instead of copy-pasting samtools commands from a protocol (none of which I remember today).

@Geneticdesigner How do we genetically modify a cat into a fractal cat?