Biology+AI Daily (@BiologyAIDaily)
A Chemical Language Model for Molecular Taste Prediction
1. Introducing FART (Flavor Analysis and Recognition Transformer), a chemical language model trained on the largest molecular taste dataset (15,025 compounds), capable of predicting four taste categories (sweet, bitter, sour, umami) simultaneously with over 91% accuracy.
2. FART outperforms previous binary classifiers, delivering state-of-the-art results across individual taste classes, even when benchmarked on larger and more diverse test sets.
3. A key feature is its interpretability: gradient-based visualization highlights molecular substructures driving taste predictions, offering insights into the chemical basis of taste.
4. SMILES augmentation enhances FART’s robustness, ensuring consistency across diverse molecular representations and enabling a confidence metric that boosts prediction reliability to 94%.
5. FART’s architecture, built on ChemBERTa and fine-tuned on taste data, captures nuanced patterns linking molecular structure to taste while handling undefined classes, such as tasteless compounds or out-of-distribution molecules.
6. Practical applications include food science, drug formulation, and natural product analysis, providing a tool for rapid, interpretable predictions to accelerate tastant discovery and systematic flavor exploration.
7. By making FART and its dataset publicly available, this study paves the way for further advancements in molecular taste prediction and its integration into broader chemical discovery workflows.
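The augmentation-based confidence idea in point 4 can be illustrated with a minimal sketch. This is not FART's actual code: the post does not say how the confidence metric is computed, so the majority vote, the `min_agreement` threshold, and the names `consensus_prediction` and `predict_one` are all assumptions made for illustration. The sketch assumes a model is queried on several equivalent SMILES strings for the same molecule, and that agreement among the answers is used as a confidence score.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def consensus_prediction(
    smiles_variants: Sequence[str],
    predict_one: Callable[[str], str],
    min_agreement: float = 0.75,  # hypothetical threshold, not from the paper
) -> Tuple[str, float]:
    """Return (label, agreement) from a majority vote over SMILES variants.

    `predict_one` is a hypothetical stand-in for a trained taste classifier
    that maps one SMILES string to one taste label.
    """
    if not smiles_variants:
        raise ValueError("need at least one SMILES string")

    # Count how often each label is predicted across the variants.
    votes = Counter(predict_one(s) for s in smiles_variants)
    label, count = votes.most_common(1)[0]

    # Agreement is the fraction of variants that voted for the top label.
    agreement = count / len(smiles_variants)

    # Assumption: low agreement is treated as an "undefined" prediction.
    if agreement < min_agreement:
        return "undefined", agreement
    return label, agreement


if __name__ == "__main__":
    # Toy lookup table standing in for a model; the labels are placeholders,
    # not real taste data. All four keys are valid SMILES for ethanol.
    toy_model = {"CCO": "bitter", "OCC": "bitter", "C(C)O": "bitter", "C(O)C": "sweet"}
    print(consensus_prediction(list(toy_model), toy_model.__getitem__))
```

In practice the SMILES variants would come from a cheminformatics toolkit (for example, RDKit can emit randomized SMILES for a molecule), and `predict_one` would wrap the fine-tuned model.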
💻Code: github.com/fart-lab/fart.…
📜Paper: doi.org/10.26434/chemr…
#ChemicalLanguageModel #TastePrediction #MachineLearning #FoodScience #ChemBERTa #AI