Gustaf Hemberg

170 posts

@thehemberg

Co-founder @Scindo_bio || Gentleman, Scholar & Science Bro

London, England · Joined January 2021
159 Following · 23 Followers
Gustaf Hemberg retweeted
Brian L Trippe @brianltrippe
🚨New paper! Generative models are often “miscalibrated”. We calibrate diffusion models, LLMs, and more to meet desired distributional properties. E.g. we finetune protein models to better match the diversity of natural proteins. arxiv.org/abs/2510.10020 github.com/smithhenryd/cgm
3
45
202
20.4K
Gustaf Hemberg retweeted
Abu Shanab @med_712
Antibiotics classification
[image]
7
436
2.5K
199.1K
Gustaf Hemberg retweeted
Biology+AI Daily @BiologyAIDaily
Language model-guided anticipation and discovery of unknown metabolites
• The study introduces DeepMet, a chemical language model designed to anticipate and discover previously unidentified metabolites by learning biosynthetic logic from known metabolite structures.
• DeepMet predicts the structures of novel metabolites and integrates with mass spectrometry (MS/MS) data to systematically discover metabolites within complex biological samples.
• The model demonstrated success in predicting 81% of newly discovered human metabolites from the Human Metabolome Database (HMDB) version 5, highlighting its robust predictive power.
• Through experimental validation, DeepMet led to the discovery of 47 novel mammalian metabolites, showcasing its ability to uncover structurally diverse metabolites spanning various chemical classes.
• Integration with MS/MS workflows significantly improved the identification of metabolites, with DeepMet aiding in the structural elucidation of unidentified peaks in metabolomics data.
• A repository-scale application of DeepMet annotated over 29.1 million MS/MS spectra, substantially increasing the number of matched experimental spectra compared to traditional methods.
• The study highlights DeepMet’s ability to bridge gaps in metabolic databases, enabling high-throughput identification of the “chemical dark matter” in the metabolome.
@skinniderlab @WishartLab @EvolvedChem @BoWang87 @AdamoYoung @NeinastM @AsaelRoichman
📜 Paper: biorxiv.org/content/10.110…
#Metabolomics #ChemicalLanguageModel #Bioinformatics #MachineLearning #MetabolicDiscovery
1
11
41
4K
Gustaf Hemberg retweeted
Yulab @YulabJin
Amazing collaboration with the Stoltz group @Caltech and the Davies group @EmoryUniversity! Our part was to achieve tetra-CH hydroxylation via weak coordination at a very late stage of a complex total synthesis: science.org/doi/10.1126/sc…
2
13
137
28.4K
Gustaf Hemberg retweeted
Maksym Andriushchenko @maksym_andr
🚨 So, why do we need weight decay in modern deep learning? 🚨 The camera-ready version of our NeurIPS 2024 paper is now on arXiv (a major update compared to the first version). Weight decay is traditionally viewed as a regularization method, but its effect in the overtraining regime is quite subtle and its interaction with the implicit regularization effect of SGD plays a crucial role. In the undertraining regime (e.g., in LLM pretraining), however, the effect of weight decay is totally different: it sets an implicit learning rate schedule for AdamW and enables stable training with bfloat16 precision. This explains why weight decay is still widely used for LLM training with standard optimizers, such as AdamW. This is joint work with @dngfra, @adityavardhanv, @tml_lab.
11
106
694
74.2K
Gustaf Hemberg retweeted
Biology+AI Daily @BiologyAIDaily
PDBe tools for an in-depth analysis of small molecules in the Protein Data Bank
1. PDBe has developed new tools to facilitate the analysis of small molecules in the PDB, enhancing data accessibility and insight into ligand-macromolecule interactions. The tools (PDBe CCDUtils, PDBe Arpeggio, and PDBe RelLig) enable researchers to explore ligand data with greater depth and accuracy.
2. PDBe CCDUtils provides enriched ligand data by parsing and processing molecular structures, supporting researchers in standardizing ligand identification across PDB entries. This tool also generates high-quality 2D and 3D visualizations of ligands, enabling a comprehensive view of small molecules.
3. PDBe Arpeggio analyzes detailed interactions between ligands and macromolecules, identifying specific contact points such as hydrogen bonds and hydrophobic interactions, which are crucial for understanding ligand binding.
4. PDBe RelLig classifies ligands based on their functional roles (e.g., cofactor-like, reactant-like, or drug-like), helping researchers distinguish biologically relevant molecules from experimental artefacts, thereby supporting accurate interpretation of biological functions.
5. These tools, along with PDBe-KB’s ligand pages, provide an integrated view of ligands in their biological context, making it easier to visualize interactions, assess binding sites, and explore molecular properties in drug discovery and structural biology research.
@preeti_cy @PDBeurope
💻 Code:
• PDBe CCDUtils: github.com/PDBeurope/ccdu…
• PDBe Arpeggio: github.com/PDBeurope/arpe…
• PDBe RelLig: github.com/PDBeurope/rell…
📜 Paper: biorxiv.org/content/10.110…
#Bioinformatics #PDB #DrugDiscovery #StructuralBiology #LigandAnalysis
0
51
193
14.4K
Gustaf Hemberg retweeted
Eric Topol @EricTopol
Evidence from omics that aging is not a gradual, linear process. The new study, highlighted here, only assessed people up to age 75 among 108 participants with short-term follow-up (1.7 yrs). A previous large study (N>4,200, up to age 95) found a 3rd peak at age 80. nature.com/articles/s4159… wsj.com/health/wellnes…
41
544
2.1K
363K
Gustaf Hemberg retweeted
Fasan Lab @FasanLab
Delighted to share our latest work on a Chemoenzymatic Diversity-Oriented Synthesis strategy for generating complex, natural-product-like compounds with enzymes. Big 👏 to the lead authors Andrew Bortz and John Bennett for this beautiful work! 🎉 sciencedirect.com/science/articl…
1
13
101
9.7K
Gustaf Hemberg retweeted
Elliot Hershberg @ElliotHershberg
Three of my favorite papers published this week: 1. A mechanism for bacteria to create *new* repetitive toxic genes to kill themselves in response to infection (!!)
8
119
752
80.4K
Gustaf Hemberg retweeted
Leo Zang @LeoTZ03
Design of intrinsically disordered protein variants with diverse structural properties | @ScienceAdvances
- Design new IDP sequences using simulated annealing with a target radius of gyration (compaction)
- Initialize with a natural IDP sequence and swap two random residues at each step (to keep the same composition)
- Evaluate compaction using either MD simulation with CALVADOS or by reweighting previously generated conformations
Link: science.org/doi/10.1126/sc…
0
14
61
4.3K
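The swap-move simulated annealing summarized in the post above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_property` is a hypothetical stand-in for the compaction estimate (the post names CALVADOS simulations or reweighting for that step, neither of which is reproduced here), and the cooling schedule, step count, and function names are arbitrary choices, not taken from the paper.

```python
import math
import random

# Hypothetical charge table used only by the toy scoring function below.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def toy_property(seq):
    """Toy stand-in for a compaction estimate (NOT CALVADOS):
    sum of charge products over adjacent residue pairs."""
    q = [CHARGE.get(a, 0) for a in seq]
    return sum(q[i] * q[i + 1] for i in range(len(q) - 1))


def anneal(seq, target, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing with pairwise swap moves.

    Swapping two positions never changes the amino-acid composition,
    only the order of residues.
    """
    rng = random.Random(seed)
    current = list(seq)
    cost = abs(toy_property(current) - target)
    best, best_cost = current[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = abs(toy_property(current) - target)
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = current[:], cost
        else:
            current[i], current[j] = current[j], current[i]  # revert the swap
    return "".join(best), best_cost


start = "KKKKEEEEGSGSGSGS"
designed, final_cost = anneal(start, target=-4)
```

In a real design loop the scoring function would be replaced by the expensive compaction estimate, which is why the post mentions reweighting previously generated conformations as a cheaper alternative to running a new simulation at every step.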
Gustaf Hemberg retweeted
Leo Zang @LeoTZ03
Unsupervised learning of progress coordinates during weighted ensemble simulations: Application to millisecond protein folding
- Improve sampling of rare events in protein folding (e.g., state transitions) through weighted ensemble simulation and an unsupervised deep learning model.
- Use a convolutional VAE (CVAE) to compress contact maps into a latent space, and apply a Local Outlier Factor to identify outlier conformations, which are then replicated in the simulation.
- Training the CVAE on the fly works better than using a pretrained model.
Preprint: biorxiv.org/content/10.110…
0
24
109
13.4K
Gustaf Hemberg retweeted
Leo Zang @LeoTZ03
AlphaFold2 knows some protein folding principles
- Use AF2 without MSAs/templates, mimicking an ab initio approach. The iterations show AF2's energy landscape and "local first, global later" folding mechanism.
- Folded intermediates of six small proteins (protein G, protein L and their mutants, ubiquitin, and the SH3 domain) resemble results in other studies and MD simulations.
- Scale the iterative folding study to 7k high-resolution proteins clustered by MMseqs2, with lengths ranging from 30 to 250
Preprint: biorxiv.org/content/10.110…
Liwei Chang @liwei_chang_

Did AlphaFold solve the protein folding problem? ...Not yet! AF2 predicts static structures, usually the native state by default. However, we found AF2 can generate structures aligning well with known folding intermediates. biorxiv.org/content/10.110… @Al__Perez @UFChemistry 1/n

0
47
180
13.2K
Gustaf Hemberg retweeted
Jianmin Wang(王建民) @Jianmin4drugai
Updating!!! Collection (GitHub): Deep Learning and Generative AI for molecules (small molecules, RNA, peptide, protein, enzymes, antibody, and PPIs) conformations and molecular dynamics (force fields) github.com/AspirinCode/aw…
0
45
115
7.6K
Gustaf Hemberg retweeted
Kevin K. Yang 楊凱筌 @KevinKaichuang
Happy to release CHEAP: Compressed Hourglass Embedding Adaptations of Proteins
We compress the ESMFold latent space while retaining information about sequence, structure, and function, with implications for generation, search, and transfer learning.
Amy Lu @amyxlu

1/ 🧬 Excited to share CHEAP, our new work on compressed protein embeddings. We characterize the joint distribution of p(sequence, structure) in ESMFold's latent space, and find cool tidbits on compressibility, tokenizability, and pathologies: biorxiv.org/content/10.110… 🧵

6
29
236
23K
Gustaf Hemberg retweeted
Leo Zang @LeoTZ03
Tokenized and Continuous Embedding Compressions of Protein Sequence and Structure
- CHEAP, a novel method for compressing protein sequence and structure latent space (ESMFold), achieves up to 128x channel and 8x length compression from sequence input alone
- Uses per-channel normalization, downsampling both the channel and length dimensions with linear projections and attention
- Explores both continuous and discrete compression, evaluated with TM-Score, RMSD, RMSPD, and sequence recovery accuracy
Preprint: biorxiv.org/content/10.110…
2
30
115
7.3K
Gustaf Hemberg retweeted
Science Magazine @ScienceMagazine
Age-based memory decline correlates with quality—not quantity—of dendritic spines in the temporal cortex, a new @ScienceAdvances study suggests. scim.ag/7Zz
8
54
180
41.7K
Gustaf Hemberg retweeted
Leo Zang @LeoTZ03
Fast, sensitive detection of protein homologs using deep dense retrieval | @NatureBiotech
- Dense Homolog Retriever (DHR) employs a bi-encoder architecture (ESM1b, taking the first output vector as the fixed-length embedding) and a CLIP-like approach to train on homologous pairs with in-batch negatives
- Retrieve homologs by directly comparing these embeddings using a similarity metric (dot product), then use JackHMMER to construct MSAs
Link: nature.com/articles/s4158…
0
15
86
4.3K
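As a rough illustration of the "CLIP-like approach with in-batch negatives" mentioned in the post above, here is a minimal pure-Python sketch of a symmetric contrastive loss over dot-product similarities. It assumes row i of `queries` and row i of `keys` embed a homologous pair and that every other row in the batch acts as a negative. The temperature value, the symmetric averaging, and all function names are assumptions for illustration, not details confirmed by the paper.

```python
import math


def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def in_batch_contrastive_loss(queries, keys, temperature=0.1):
    """Symmetric cross-entropy over the batch similarity matrix.

    The diagonal entries are the positive (homologous) pairs;
    off-diagonal entries serve as in-batch negatives.
    """
    n = len(queries)
    sims = [[dot(q, k) / temperature for k in keys] for q in queries]
    # Query -> key direction: each row should peak on its diagonal entry.
    loss_q = -sum(log_softmax(sims[i])[i] for i in range(n)) / n
    # Key -> query direction: same idea on the transposed matrix.
    cols = [[sims[i][j] for i in range(n)] for j in range(n)]
    loss_k = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_q + loss_k)


# Toy embeddings: matched pairs give a low loss, mismatched pairs a high one.
emb = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = in_batch_contrastive_loss(emb, emb)
loss_mismatched = in_batch_contrastive_loss(emb, emb[::-1])
```

At retrieval time the same dot product ranks database embeddings against a query embedding, which is what makes the search fast compared with alignment-based methods.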