Jeroen Van Goey

946 posts

@BioGeek

Staff Research Engineer in BioAI at @InstaDeepAI (part of @BioNTech_Group). ML for de novo peptide sequencing. https://t.co/KOjeWuazsk

Cape Town (🇧🇪➡️🇿🇦) Joined October 2006
6.3K Following · 943 Followers
Jeroen Van Goey @BioGeek
@rasbt Which software do you use to create your architecture diagrams?
0
0
0
64
Jeroen Van Goey retweeted
Vivek Subbiah, MD @VivekSubbiah
Wow! Cool--> #PrecisionMedicine in action 👉Individualized mRNA vaccines evoke durable T cell immunity in adjuvant Triple-negative breast cancer #TNBC @Nature @OncoAlert 👉Individualized neoantigen mRNA vaccine in 14 patients with TNBC following surgery and after neoadjuvant or adjuvant therapy 👉11/14 patients remained relapse-free for up to six years post-vaccination. nature.com/articles/s4158…
1
17
36
3.3K
Jeroen Van Goey retweeted
Eric Topol @EricTopol
The unfounded move by @HHSGov against mRNA vaccines will hurt our future potent immune therapy vs cancer. Another point of progress for triple-negative breast cancer with individualized neoantigen mRNA vaccines today @Nature Adds to successful pancreatic, renal cell, melanoma reports nature.com/articles/s4158…
49
233
754
30.2K
Jeroen Van Goey retweeted
InstaDeep @instadeepai
NTv3 is the latest addition to the Nucleotide Transformer family and our unified DNA foundation model for long-context genome understanding and design. 🧬
2
8
9
447
Benjamin Perry @bots_and_bits
AlphaFold 3 just got a massive speed boost. 🚀 We’re introducing AlphaFast: a GPU-accelerated framework that cuts AF3 inference from >10 mins to ~25 seconds on a single GPU–a 22.8x speedup–without losing structural accuracy. More details below! 1/6 🧵
6
100
711
37.6K
Verena Resch @luminous_lab
🧪Help needed! I am currently gathering ideas for a new and updated Blender course tailored for chemists. Besides importing molecules into Blender, what topics or features would interest you the most? Any input is highly appreciated 😊
23
17
170
10.9K
sk @compchemm
Rebuilt the latest PyRosetta4 (v2026-02-06), finally fixed the GIL (Global Interpreter Lock) github.com/ullahsamee/PyR…
4
4
70
3K
BlindVia @blind_via
@TexasTwerp Still no compelling options available. Excited I guess, cause I keep checking around to see what the PCB AI companies keep promoting and what they have done.
3
0
12
1.3K
Jeroen Van Goey retweeted
Biomolecules @Pastel
Curriculum Learning for Biological Sequence Prediction: The Case of De Novo Peptide Sequencing. arxiv.org/abs/2506.13485
0
1
1
52
Jeroen Van Goey retweeted
Biology+AI Daily @BiologyAIDaily
MassNet: billion-scale AI-friendly mass spectral corpus enables robust de novo peptide sequencing
1. MassNet introduces the largest AI-ready mass spectrometry dataset to date, containing 1.54 billion MS/MS spectra and 558 million PSMs across 35 species, including human, mouse, plants, and microbes. This dataset is specifically designed to support deep learning in proteomics.
2. A major innovation is the use of Mass Spectrometry Data Tensor (MSDT), a compact, structured Parquet-based format that enables high-throughput, GPU/TPU-friendly data loading, overcoming the I/O bottlenecks of traditional formats like mzML and MGF.
3. MassNet facilitates robust de novo peptide sequencing with the introduction of XuanjiNovo, a non-autoregressive Transformer model enhanced by curriculum learning and mass constraint decoding, yielding both stability and high inference efficiency.
4. XuanjiNovo achieves state-of-the-art peptide recall across a 15-species benchmark, outperforming PrimeNovo, InstaNovo, and other top models. On human and mouse datasets, it improves recall by 38.8% to 144.3% over previous best-performing models.
5. A key feature is its ability to accurately resolve near-isobaric amino acids (e.g., K vs Q, F vs oxidized M), and generalize well across peptides of different lengths, especially those under 24 amino acids.
6. The model’s accuracy scales with training data: XuanjiNovo trained on 100M PSMs (vs 30M) achieves 0.82 amino acid accuracy and 0.68 peptide recall, over 20% higher than PrimeNovo.
7. Fine-tuning with 30M PSMs from MassIVE-KB allows XuanjiNovo to adapt across MS platforms, achieving 60–65% gains in peptide recall for certain microbial datasets and boosting robustness on older instrumentation.
8. Even without fine-tuning, XuanjiNovo performs well on external datasets, achieving 0.68 peptide recall on Orbitrap Fusion human HCC samples and 0.62 on synthetic peptides, substantially outperforming previous models.
9. Curriculum learning plays a pivotal role: early training starts with partially revealed sequences, gradually increasing difficulty. This improves convergence on complex, multi-distribution datasets where other NAT models often fail.
10. With standardized data structure, full-spectrum metadata, and high species diversity, MassNet is positioned as the “ImageNet of proteomics,” providing a reproducible foundation for deep learning in protein science.
📜Paper: biorxiv.org/content/10.110…
#proteomics #deeplearning #massspectrometry #AIbioinformatics #denovopeptidesequencing #XuanjiNovo #MassNet
0
4
20
1.3K
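The curriculum idea in the MassNet summary above (training starts from partially revealed sequences and the difficulty then increases) can be illustrated with a small sketch. This is not XuanjiNovo's code: the linear schedule, the `start`/`end` fractions, the `_` mask symbol, and both function names are assumptions made purely for illustration.

```python
import random


def reveal_fraction(step, total_steps, start=0.8, end=0.0):
    """Hypothetical linear schedule: fraction of target residues shown to the model."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def partially_reveal(peptide, fraction, rng, mask="_"):
    """Keep a random subset of residues visible and mask the rest."""
    n_reveal = round(len(peptide) * fraction)
    revealed = set(rng.sample(range(len(peptide)), n_reveal))
    return [aa if i in revealed else mask for i, aa in enumerate(peptide)]


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 50, 100):
        frac = reveal_fraction(step, total_steps=100)
        print(step, frac, "".join(partially_reveal("PEPTIDEK", frac, rng)))
```

Early steps give the model most of the answer as a hint; by the last step every position is masked, which corresponds to the ordinary prediction task.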
Jeroen Van Goey retweeted
Biology+AI Daily @BiologyAIDaily
Improvements to Casanovo, a deep learning de novo peptide sequencer
1. Casanovo, a state-of-the-art deep learning model for de novo peptide sequencing from mass spectrometry data, has been significantly enhanced in version 5.0. The new version improves the interpretability of peptide scores, speeds up training and prediction runtimes, and introduces a database search mode, making it more versatile and user-friendly.
2. One of the key innovations is the modification of the peptide scoring procedure. By switching from an arithmetic mean to a product of amino acid scores, Casanovo now provides better-calibrated confidence estimates, which are crucial for distinguishing between correct and incorrect predictions.
3. The new version of Casanovo achieves an impressive 11.0-fold speedup in training and a 3.5-fold speedup in inference compared to the previous version. This improvement is largely due to upgrades in the beam search implementation and the use of the Lance file format for faster data access.
4. Casanovo now supports database search functionality through a new command, allowing it to be used as a powerful scoring function for peptide-spectrum matches (PSMs). It outperforms traditional scoring functions like XCorr, detecting significantly more peptides at a given false discovery rate (FDR).
5. To facilitate adoption and interpretation, Casanovo has been integrated with visualization tools such as PDV and Limelight. These tools allow users to easily visualize and compare peptide-spectrum matches, enhancing the interpretability of the results.
6. Casanovo is now available as a Docker container and can be run using Nextflow, simplifying the setup and execution of de novo sequencing workflows. This containerization ensures that users can run Casanovo without worrying about dependencies or environment configurations.
7. The improvements in Casanovo aim to make de novo sequencing more accessible and reliable for applications such as metaproteomics, antibody sequencing, and the discovery of novel peptide sequences in standard proteomics analyses.
💻Code: github.com/Noble-Lab/casa…
📜Paper: biorxiv.org/content/10.110…
#Casanovo #DeepLearning #DeNovoSequencing #Proteomics #MassSpectrometry #Bioinformatics
1
1
5
1K
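Point 2 of the Casanovo summary above (scoring a peptide by the product rather than the arithmetic mean of its amino acid scores) is easy to see with toy numbers. The sketch below is not Casanovo's implementation; the function names and the per-residue scores are made up for illustration.

```python
import math


def mean_score(aa_scores):
    """Peptide score as the arithmetic mean of per-residue scores."""
    return sum(aa_scores) / len(aa_scores)


def product_score(aa_scores):
    """Peptide score as the product of per-residue scores."""
    return math.prod(aa_scores)


if __name__ == "__main__":
    confident = [0.99, 0.98, 0.99, 0.97]  # every residue well supported
    one_weak = [0.99, 0.98, 0.99, 0.10]   # one residue is close to a guess
    for name, scores in (("confident", confident), ("one_weak", one_weak)):
        print(name, round(mean_score(scores), 4), round(product_score(scores), 4))
```

With the mean, a single poorly supported residue is averaged away and the peptide still looks fairly confident; with the product, the same residue pulls the peptide score close to its own low value, which is closer to the chance that the whole sequence is right.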
Jeroen Van Goey retweeted
Biology+AI Daily @BiologyAIDaily
Bidirectional Representations Augmented Autoregressive Biological Sequence Generation: Application in De Novo Peptide Sequencing
1. The paper introduces CROSSNOVO, a hybrid framework that combines autoregressive (AR) and non-autoregressive (NAR) models to enhance biological sequence generation, particularly in de novo peptide sequencing. This innovative approach addresses the limitations of traditional AR models by integrating bidirectional context from NAR models, significantly improving sequence generation accuracy.
2. CROSSNOVO features a shared input encoder coupled with two specialized decoders: an AR decoder for high-fidelity sequential generation and a NAR decoder for capturing global sequence context. The key innovation is a novel cross-decoder attention module that allows the AR decoder to iteratively query and integrate bidirectional features from the NAR decoder, enriching its predictions.
3. The training strategy of CROSSNOVO includes importance annealing for balanced multi-objective optimization and cross-decoder gradient blocking to ensure stable learning. This tailored training approach allows the model to leverage the strengths of both AR and NAR models, achieving superior performance on diverse downstream data.
4. Evaluations on a demanding 9-species benchmark demonstrate that CROSSNOVO substantially outperforms both AR and NAR baselines. The model uniquely harmonizes AR stability with NAR contextual awareness, delivering robust and superior performance across various species, making it a valuable tool for wide-ranging biological applications.
5. The paper also explores the application of CROSSNOVO in downstream tasks such as identifying peptides in animal antibody data, where it shows significant improvements over baseline models. This highlights the model’s generalizability and potential impact in areas like antibody sequencing and post-translational modification prediction.
6. The authors provide a detailed analysis of the model’s performance, including precision-coverage curves and the influence of different beam sizes on prediction accuracy. These analyses further validate the effectiveness of CROSSNOVO in enhancing de novo peptide sequencing.
7. The research concludes that CROSSNOVO advances biological sequence modeling techniques by introducing a novel architectural paradigm that augments AR models with enhanced bidirectional understanding. This work contributes to the development of more accurate and efficient tools for complex sequence generation tasks in computational biology.
📜Paper: arxiv.org/abs/2510.08169…
#DeNovoPeptideSequencing #BiologicalSequenceGeneration #HybridModel #ComputationalBiology #MachineLearning
0
6
16
1.5K
Jeroen Van Goey retweeted
Biology+AI Daily @BiologyAIDaily
Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models
1. A groundbreaking study introduces reflection pretraining to enhance reasoning capabilities in biological sequence models, particularly for de novo peptide sequencing. This approach enables models to generate intermediate reasoning steps, significantly improving accuracy and robustness.
2. The core innovation lies in augmenting protein sequence models with non-answer reasoning tokens, allowing them to self-correct and reflect on errors during training. This method overcomes the limited expressiveness of biological languages compared to natural languages.
3. Experimental results demonstrate substantial performance gains over standard pretraining methods. Increasing error injection during training further boosts the model's ability to self-correct, highlighting the effectiveness of reflection pretraining.
4. The study shows that reflection pretraining not only enhances reasoning but also improves resistance to overfitting and supports human-in-the-loop interactions, bridging the gap between biological and natural language models.
5. All code, trained model weights, and result outputs are publicly available on the GitHub repository, promoting transparency and reproducibility in the field of computational biology.
📜Paper: arxiv.org/abs/2512.20954…
#ComputationalBiology #AIinBiology #ProteinSequencing #DeepLearning #BiologicalModeling
0
8
27
2.2K
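To make "error injection" and "non-answer reasoning tokens" from the reflection pretraining summary above more concrete, here is one hypothetical way a training target could be built. The summary does not describe the paper's actual token format, so the `<reflect>` token, the rule that it retracts the preceding residue, and both function names are assumptions for illustration only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
REFLECT = "<reflect>"  # hypothetical non-answer token: "the previous residue was wrong"


def inject_errors(peptide, error_rate, rng):
    """Build a target where some residues are preceded by a wrong guess plus REFLECT."""
    target = []
    for aa in peptide:
        if rng.random() < error_rate:
            wrong = rng.choice([a for a in AMINO_ACIDS if a != aa])
            target.extend([wrong, REFLECT])
        target.append(aa)
    return target


def strip_reflections(target):
    """Recover the final answer: each REFLECT retracts the token just before it."""
    answer = []
    for token in target:
        if token == REFLECT:
            answer.pop()
        else:
            answer.append(token)
    return answer


if __name__ == "__main__":
    rng = random.Random(0)
    target = inject_errors("PEPTIDEK", error_rate=0.3, rng=rng)
    print(target)
    print("".join(strip_reflections(target)))
```

Raising `error_rate` in this toy setup produces more wrong-guess-then-correction patterns per sequence, which is the knob the summary refers to as increasing error injection.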
Jeroen Van Goey retweeted
Systems Biology @XTXI
AbNovoBench: a resource and benchmarking platform for monoclonal antibody de novo sequencing Jiang, W., Luo, L., Xiong, Y., Xiao, J., Lin, Z., Huang, L., Zhang, S., Wang, J., Wang, C., Xia, N., Yuan, Q., Yu, R. biorxiv.org/content/10.648…
0
2
3
100