Quentin Fournier

33 posts

@qfournier2

Research Fellow at @Mila_Quebec working on language models for #drugdiscovery 🧬

Montreal · Joined April 2023
34 Following · 55 Followers
Quentin Fournier retweeted
Darshan Patil (@dapatil211)
🧬 New paper Scientific datasets evolve as science evolves. With proteins, new sequences get added, annotations get corrected, and noisy entries get curated out. Introducing CoPeP, a continual-pretraining benchmark for protein LMs. Details 🧵 1/n
2
29
84
8.5K
Quentin Fournier retweeted
Sarath Chandar (@apsarathchandar)
Molecules speak in atoms and bonds. LLMs can learn that language. Even with SOTA #denovo design, our largest molecular LLM study finds a plot twist: early saturation, weak scaling, and proxy metrics that mislead on real tasks! Led by @kchitsaz and @roshan_msb 🧵 More in thread:
2
10
38
4.9K
Quentin Fournier retweeted
Jack Morris (@jxmnop)
In the beginning, there was BERT. Eventually BERT gave rise to RoBERTa. Then, DeBERTa. Later, ModernBERT. And now, NeoBERT. The new state-of-the-art small-sized encoder:
25
68
912
76.4K
Quentin Fournier retweeted
Biology+AI Daily (@BiologyAIDaily)
Structure-Aligned Protein Language Model

1. Structure-Aligned Protein Language Model (SaPLM) augments sequence-only protein language models (pLMs) with structural knowledge by aligning their representations with those from pre-trained protein graph neural networks (pGNNs), achieving large gains on structure-aware tasks without compromising sequence generality.

2. The key innovation is a dual-task framework: (1) a latent-level contrastive learning task that aligns residue embeddings from the pLM and pGNN across different proteins, capturing inter-protein structural patterns, and (2) a physical-level task that predicts structural tokens from pLM outputs, encoding intra-protein structural geometry.

3. To avoid noisy or overly simple residues during training, a residue loss selection module is introduced. It selects residue losses with high excess learning potential by comparing the current model’s losses to a high-quality reference model trained on curated structures.

4. Applying this structure alignment method to ESM2 and AMPLIFY yields SaESM2 and SaAMPLIFY, which significantly outperform their unaligned counterparts on multiple benchmarks. SaESM2 improves contact prediction P@L/5 by 13% and stability prediction Spearman correlation by 4.5%.

5. Unlike prior models that use structural input during inference (e.g., structure tokens), SaPLM requires only sequences at inference time, maintaining the generality and scalability of pLMs while enhancing structural reasoning via pretraining.

6. On mutation effect prediction tasks, SaESM2 achieves the highest performance on binding fitness (GB1) and stability prediction, outperforming even ESM2-s and ISM models explicitly trained on these tasks.

7. SaESM2 also achieves state-of-the-art results on 6 out of 9 downstream property prediction tasks (e.g., metal binding, DeepLoc, EC number classification), showing the effectiveness of structural alignment for biologically relevant function prediction.

8. Ablation studies confirm that both the latent- and physical-level tasks are critical for performance, with the latent-level alignment contributing the most. Replacing the GearNet embeddings with AlphaFold Evoformer embeddings significantly degrades performance.

9. Residue embedding visualization using UMAP shows that aligned models (SaESM2, SaAMPLIFY) learn more structured latent spaces, with better separation of secondary structure types and physicochemical similarity among amino acids.

10. Structure alignment emerges as a generalizable and efficient way to enrich pLMs with structural context using only sequence input. SaESM2 and SaAMPLIFY set a new benchmark for structure-aware yet sequence-only protein modeling.

💻 Code: github.com/chandar-lab/AM…
📜 Paper: arxiv.org/abs/2505.16896

#ProteinLLM #StructureAlignment #pLM #ProteinStructure #ContrastiveLearning #GNN #ComputationalBiology #SaESM2 #AMPLIFY #AI4Science
1
18
102
6.3K
Quentin Fournier retweeted
Carl Doersch (@CarlDoersch)
We're very excited to introduce TAPNext: a model that sets a new state-of-art for Tracking Any Point in videos, by formulating the task as Next Token Prediction. For more, see: tap-next.github.io 🧵
13
55
377
45.2K
Quentin Fournier retweeted
Sarath Chandar (@apsarathchandar)
Can better architectures & representations make self-play enough for zero-shot coordination? 🤔 We explore this in our ICLR 2025 paper: A Generalist Hanabi Agent. We develop R3D2, the first agent to master all Hanabi settings and generalize to novel partners! 🚀 #ICLR2025 1/n
1
9
49
9.3K
Quentin Fournier retweeted
Sarath Chandar (@apsarathchandar)
In my lab, we have not one but four open postdoc positions! These positions cover developing foundation models for text, proteins, small molecules, genomic data, time series data, and astrophysics data! If you have strong research expertise and a PhD in LLMs and Foundation Models, and you are willing to learn about domain-specific problems and collaborate with domain experts, this is an ideal position for you! Actual links in the next tweet! 1/2
3
34
116
28.4K
Quentin Fournier retweeted
1LittleCoder💻 (@1littlecoder)
A new BERT baby! If you are still using the huge RoBERTa or DeBERTa for your NLP tasks, here's NeoBERT!
2
11
79
8K
Quentin Fournier retweeted
Sarath Chandar (@apsarathchandar)
I am excited to share that our BindGPT paper won the best poster award at @RealAAAI #AAAI2025! Congratulations to the team! Work led by @artemZholus!
Quoting Sarath Chandar (@apsarathchandar):

What's the foundational model for generative chemistry? Our work, BindGPT, is a good candidate, and it will be presented at #AAAI2025 today! We built a simple transformer language model that beats diffusion models by just generating 3D molecules as text! Led by @artemZholus 1/n

3
3
65
5.6K
Quentin Fournier retweeted
Sarath Chandar (@apsarathchandar)
2025 BERT is NeoBERT! We have fully pre-trained a next-generation encoder for 2.1T tokens with the latest advances in data, training, and architecture. This is a heroic effort from my PhD student @lo_LB_La in collaboration with @qfournier2 and Mariam El Mezouar (1/n)
6
20
82
30K
Quentin Fournier retweeted
CoLLAs 2026 (@CoLLAs_Conf)
📢 Exciting News! The Fourth Conference on Lifelong Learning Agents (CoLLAs 2025) will be held at the University of Pennsylvania (@Penn) in Philadelphia, USA 🇺🇸

🗓️ Important Dates:
Abstract Deadline: Feb 21, 2025
Submission Deadline: Feb 26, 2025
Conference Dates: Aug 11 - Aug 14, 2025

We invite submissions that present new theories, methodologies, applications, or insights into algorithms and benchmarks designed for non-i.i.d. and non-stationary settings. Accepted papers will be published in the Proceedings of Machine Learning Research (PMLR).

📚 Full CFP: lifelong-ml.cc/Conferences/20…

#CoLLAs2025 #AI #MachineLearning #ContinualLearning #LifelongLearning #ResearchConference #CallForPapers #NonStationaryLearning
1
25
71
57.4K
Quentin Fournier retweeted
Sarath Chandar (@apsarathchandar)
After many years, I will be attending @emnlpmeeting EMNLP! @ChandarLab will present three papers at EMNLP 2024 (details in the thread). I am also recruiting Ph. D.s/postdocs, so please email me if you are attending EMNLP and are interested in chatting about these positions! 1/n
1
4
15
1.2K
Quentin Fournier (@qfournier2)
🚨 Exciting news! Our state-of-the-art protein language models are now available on Hugging Face 🤗 Discover AMPLIFY at huggingface.co/chandar-lab and start experimenting today!
0
0
2
2K
Quentin Fournier retweeted
Sarath Chandar (@apsarathchandar)
Are you finishing your PhD in LLMs and are looking for a postdoctoral position? Come join Prof. Amal Zouaq and me at @Mila_Quebec! We have multiple openings for postdoctoral candidates in NLP/LLM. Details: shorturl.at/gwpgG Deadline: 30th October. Please retweet for maximum reach!
0
5
17
3K
Quentin Fournier retweeted
Leo Zang (@LeoTZ03)
Recently added into Database

1. Protein-Mamba: Biological Mamba Models for Protein Function Prediction arxiv.org/abs/2409.14617
2. Protein Language Models: Is Scaling Necessary? biorxiv.org/content/10.110…
3. PepINVENT: Generative peptide design beyond the natural amino acids arxiv.org/abs/2409.14040
4. Navigating Chemical Space with Latent Flows arxiv.org/abs/2405.03987
5. DiffPaSS -- High-performance differentiable pairing of protein sequences using soft scores arxiv.org/abs/2409.16142
6. Structure-based Drug Design with Equivariant Diffusion Models arxiv.org/abs/2210.13695
7. dnaGrinder: a lightweight and high-capacity genomic foundation model arxiv.org/abs/2409.15697
8. Evaluating the representational power of pre-trained DNA language models for regulatory genomics biorxiv.org/content/10.110…
2
14
62
7.9K
Quentin Fournier retweeted
owl (@owl_posting)
biorxiv.org/content/10.110… As previously mentioned, AlphaFold2’s confidence has been used to predict protein disorder (Ruff and Pappu, 2021), but we show that AlphaFold2 cannot differentiate between disordered proteins and non-protein sequences, whereas AMPLIFY can. Figure 2H demonstrates that AMPLIFY embedding similarity can separate human disordered proteins (> 25% intrinsically disordered in the DisProt database) (Aspromonte et al., 2024) from hypothetical proteins (PE=5 and no annotation for localization) at a ROC-AUC of 0.94, compared to a score of 0.44 when using AlphaFold2’s pLDDT confidence metric. This demonstrates that training solely on available structures is insufficient for a comprehensive understanding of protein behavior. neat
5
12
78
13.4K