Quentin Fournier
33 posts

Quentin Fournier
@qfournier2
Research Fellow at @Mila_Quebec working on language models for #drugdiscovery 🧬
Montreal · Joined April 2023
34 Following · 55 Followers
Quentin Fournier retweeted

Congrats to Prashant (@prashantg_17), Davide (@DavideBald42296), Quentin (@qfournier2), and Sarath (@apsarathchandar) on CADmium, a new method that rethinks text-to-CAD to generate high-fidelity 3D models! Read their blog post: mila.quebec/en/article/imp…
Quentin Fournier retweeted

I am recruiting several graduate students (both MSc and PhD level) for Fall 2026 @ChandarLab! The application deadline is December 01. Please apply through the @Mila_Quebec supervision request process here: mila.quebec/en/prospective….
More details about the recruitment process here: chandar-lab.github.io/join/

Quentin Fournier retweeted

At @ChandarLab, we are happy to announce the third edition of our assistance program to provide feedback for members of communities underrepresented in AI who want to apply to high-profile graduate programs. Want feedback? Details: chandar-lab.github.io/grad_app/. Deadline: Nov 01!
cc: @Mila_Quebec, @polymtl, @CIFAR_News

Quentin Fournier retweeted

Molecules speak in atoms and bonds. LLMs can learn that language. Even with SOTA #denovo design, our largest molecular LLM study finds a plot twist: early saturation, weak scaling, and proxy metrics that mislead on real tasks! Led by @kchitsaz and @roshan_msb
🧵 More in thread:
Quentin Fournier retweeted

Structure-Aligned Protein Language Model
1. Structure-Aligned Protein Language Model (SaPLM) augments sequence-only protein language models (pLMs) with structural knowledge by aligning their representations with those from pre-trained protein graph neural networks (pGNNs), achieving large gains on structure-aware tasks without compromising sequence generality.
2. The key innovation is a dual-task framework: (1) a latent-level contrastive learning task that aligns residue embeddings from the pLM and pGNN across different proteins, capturing inter-protein structural patterns, and (2) a physical-level task that predicts structural tokens from pLM outputs, encoding intra-protein structural geometry.
3. To avoid noisy or overly simple residues during training, a residue loss selection module is introduced. It selects residue losses with high excess learning potential by comparing the current model’s losses to a high-quality reference model trained on curated structures.
4. Applying this structure alignment method to ESM2 and AMPLIFY yields SaESM2 and SaAMPLIFY, which significantly outperform their unaligned counterparts on multiple benchmarks. SaESM2 improves contact prediction P@L/5 by 13% and stability prediction Spearman correlation by 4.5%.
5. Unlike prior models that use structural input during inference (e.g., structure tokens), SaPLM requires only sequences at inference time, maintaining the generality and scalability of pLMs while enhancing structural reasoning via pretraining.
6. On mutation effect prediction tasks, SaESM2 achieves the highest performance on binding fitness (GB1) and stability prediction, outperforming even ESM2-s and ISM models explicitly trained on these tasks.
7. SaESM2 also achieves state-of-the-art results on 6 out of 9 downstream property prediction tasks (e.g., metal binding, DeepLoc, EC number classification), showing the effectiveness of structural alignment for biologically relevant function prediction.
8. Ablation studies confirm that both the latent- and physical-level tasks are critical for performance, with the latent-level alignment contributing the most. Replacing the GearNet embeddings with AlphaFold Evoformer embeddings significantly degrades performance.
9. Residue embedding visualization using UMAP shows that aligned models (SaESM2, SaAMPLIFY) learn more structured latent spaces, with better separation of secondary structure types and physicochemical similarity among amino acids.
10. Structure alignment emerges as a generalizable and efficient way to enrich pLMs with structural context using only sequence input. SaESM2 and SaAMPLIFY set a new benchmark for structure-aware yet sequence-only protein modeling.
💻Code: github.com/chandar-lab/AM…
📜Paper: arxiv.org/abs/2505.16896
#ProteinLLM #StructureAlignment #pLM #ProteinStructure #ContrastiveLearning #GNN #ComputationalBiology #SaESM2 #AMPLIFY #AI4Science
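
The residue loss selection idea in point 3 can be sketched roughly as follows. This is only an illustration of "excess loss" selection in general, not the paper's implementation: the function names, the `keep_fraction` parameter, and the top-k selection rule are all assumptions made for the example, and the per-residue losses are made-up numbers.

```python
import math
from typing import List


def select_residue_losses(
    current: List[float],
    reference: List[float],
    keep_fraction: float = 0.5,
) -> List[int]:
    """Return indices of residues with the largest excess loss.

    Excess loss is (current model loss - reference model loss). A large
    value suggests the residue is learnable (the reference model handles it)
    but not yet learned by the current model. keep_fraction is a
    hypothetical knob, not a value taken from the paper.
    """
    if len(current) != len(reference):
        raise ValueError("current and reference must have the same length")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not current:
        return []

    excess = [c - r for c, r in zip(current, reference)]
    k = max(1, math.ceil(keep_fraction * len(excess)))
    # Rank residues by excess loss, highest first, and keep the top k.
    ranked = sorted(range(len(excess)), key=lambda i: excess[i], reverse=True)
    return sorted(ranked[:k])


def mean_selected_loss(current: List[float], selected: List[int]) -> float:
    """Average the current model's loss over the selected residues only."""
    if not selected:
        raise ValueError("no residues selected")
    return sum(current[i] for i in selected) / len(selected)


if __name__ == "__main__":
    # Made-up per-residue losses for a 4-residue toy protein.
    current_losses = [2.0, 0.1, 1.5, 0.9]
    reference_losses = [1.9, 0.1, 0.4, 0.2]
    chosen = select_residue_losses(current_losses, reference_losses)
    print(chosen, mean_selected_loss(current_losses, chosen))
```

In this toy case the first residue has a high loss under both models (likely noisy) and the second is easy for both, so neither is selected; only residues where the current model lags the reference contribute to the training loss.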

Quentin Fournier retweeted

We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos, by formulating the task as Next Token Prediction. For more, see: tap-next.github.io 🧵
Quentin Fournier retweeted

Can better architectures & representations make self-play enough for zero-shot coordination? 🤔
We explore this in our ICLR 2025 paper: A Generalist Hanabi Agent. We develop R3D2, the first agent to master all Hanabi settings and generalize to novel partners! 🚀 #ICLR2025 1/n

Quentin Fournier retweeted

In my lab, we have not one but four open postdoc positions! These positions cover developing foundation models for text, proteins, small molecules, genomic data, time series data, and astrophysics data! If you have strong research expertise and a PhD in LLMs and Foundation Models, and you are willing to learn about domain-specific problems and collaborate with domain experts, this is an ideal position for you! Actual links in the next tweet! 1/2

Quentin Fournier retweeted

Congratulations to Lola (@lo_LB_La) and Sarath (@apsarathchandar)! Read their blog post: mila.quebec/en/article/neo…
Sarath Chandar@apsarathchandar
2025 BERT is NeoBERT! We have fully pre-trained a next-generation encoder for 2.1T tokens with the latest advances in data, training, and architecture. This is a heroic effort from my PhD student @lo_LB_La in collaboration with @qfournier2 and Mariam El Mezouar (1/n)
Quentin Fournier retweeted

I am excited to share that our BindGPT paper won the best poster award at @RealAAAI #AAAI2025! Congratulations to the team! Work led by @artemZholus!

Sarath Chandar@apsarathchandar
What's the foundational model for generative chemistry? Our work, BindGPT, is a good candidate, and it will be presented at #AAAI2025 today! We built a simple transformer language model that beats diffusion models by just generating 3D molecules as text! Led by @artemZholus 1/n
Quentin Fournier retweeted

2025 BERT is NeoBERT! We have fully pre-trained a next-generation encoder for 2.1T tokens with the latest advances in data, training, and architecture. This is a heroic effort from my PhD student @lo_LB_La in collaboration with @qfournier2 and Mariam El Mezouar (1/n)

Quentin Fournier retweeted

📢 Exciting News! The Fourth Conference on Lifelong Learning Agents (CoLLAs 2025) will be held at the University of Pennsylvania (@Penn) in Philadelphia, USA 🇺🇸
🗓️ Important Dates:
Abstract Deadline: Feb 21, 2025
Submission Deadline: Feb 26, 2025
Conference Dates: Aug 11 - Aug 14, 2025
We invite submissions that present new theories, methodologies, applications, or insights into algorithms and benchmarks designed for non-i.i.d. and non-stationary settings. Accepted papers will be published in the Proceedings of Machine Learning Research (PMLR). 📚
Full CFP: lifelong-ml.cc/Conferences/20…
#CoLLAs2025 #AI #MachineLearning #ContinualLearning #LifelongLearning #ResearchConference #CallForPapers #NonStationaryLearning

Quentin Fournier retweeted

After many years, I will be attending @emnlpmeeting EMNLP! @ChandarLab will present three papers at EMNLP 2024 (details in the thread). I am also recruiting PhDs/postdocs, so please email me if you are attending EMNLP and are interested in chatting about these positions! 1/n


🚨 Exciting news! Our state-of-the-art protein language models are now available on Hugging Face 🤗 Discover AMPLIFY at huggingface.co/chandar-lab and start experimenting today!
Quentin Fournier retweeted

Are you finishing your PhD in LLMs and looking for a postdoctoral position? Come join Prof. Amal Zouaq and me at @Mila_Quebec! We have multiple openings for postdoctoral candidates in NLP/LLM.
Details: shorturl.at/gwpgG
Deadline: 30th October.
Please retweet for maximum reach!

Quentin Fournier retweeted

Recently added into Database
1. Protein-Mamba: Biological Mamba Models for Protein Function Prediction
arxiv.org/abs/2409.14617
2. Protein Language Models: Is Scaling Necessary?
biorxiv.org/content/10.110…
3. PepINVENT: Generative peptide design beyond the natural amino acids
arxiv.org/abs/2409.14040
4. Navigating Chemical Space with Latent Flows
arxiv.org/abs/2405.03987
5. DiffPaSS -- High-performance differentiable pairing of protein sequences using soft scores
arxiv.org/abs/2409.16142
6. Structure-based Drug Design with Equivariant Diffusion Models
arxiv.org/abs/2210.13695
7. dnaGrinder: a lightweight and high-capacity genomic foundation model
arxiv.org/abs/2409.15697
8. Evaluating the representational power of pre-trained DNA language models for regulatory genomics
biorxiv.org/content/10.110…
Quentin Fournier retweeted

biorxiv.org/content/10.110…
As previously mentioned, AlphaFold2’s confidence has been used to predict protein disorder (Ruff and Pappu, 2021), but we show that AlphaFold2 cannot differentiate between disordered proteins and non-protein sequences, whereas AMPLIFY can. Figure 2H demonstrates that AMPLIFY embedding similarity can separate human disordered proteins (> 25% intrinsically disordered in the DisProt database) (Aspromonte et al., 2024) from hypothetical proteins (PE=5 and no annotation for localization) at a ROC-AUC of 0.94, compared to a score of 0.44 when using AlphaFold2’s pLDDT confidence metric. This demonstrates that training solely on available structures is insufficient for a comprehensive understanding of protein behavior.
neat
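
For readers unfamiliar with the metric quoted above, ROC-AUC can be read as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, so 0.5 is chance level and the reported 0.44 for pLDDT is slightly below it. A minimal sketch of that pairwise definition follows; the function name and the toy scores are hypothetical and are not data from the paper.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC-AUC via its pairwise (Mann-Whitney) definition.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy similarity scores: "positives" stand in for disordered proteins,
    # "negatives" for hypothetical proteins. The values are invented.
    disordered = [0.9, 0.8, 0.4]
    hypothetical = [0.5, 0.3, 0.2]
    print(roc_auc(disordered, hypothetical))
```

The quadratic loop is fine for a small illustration; for large score sets a rank-based computation gives the same value more efficiently.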