Gustaf Hemberg

170 posts

@thehemberg

Co-founder @Scindo_bio || Gentleman, Scholar & Science Bro

London, England · Joined January 2021

159 Following · 23 Followers

Gustaf Hemberg retweeted

Brian L Trippe @brianltrippe · Oct 14

🚨New paper! Generative models are often “miscalibrated”. We calibrate diffusion models, LLMs, and more to meet desired distributional properties. E.g. we finetune protein models to better match the diversity of natural proteins. arxiv.org/abs/2510.10020 github.com/smithhenryd/cgm

Gustaf Hemberg retweeted

Abu Shanab @med_712 · Jul 26

Antibiotics classification

Gustaf Hemberg retweeted

Biocatalysis@TUDelft @BiocatTUD · Dec 9

A Chemoenzymatic Cascade for the Formal Enantioselective Hydroxylation and Amination of Benzylic C−H Bonds; Y. Zhang, C. Huang, W. Kong, L. Zhou, J. Gao, F. Hollmann, Y. Liu, & Yanjun Jiang #singleatomcatalysts #peroxygenase #enantioselective doi.org/10.1021/acscat…

Gustaf Hemberg retweeted

Biology+AI Daily @BiologyAIDaily · Nov 16

Language model-guided anticipation and discovery of unknown metabolites
• The study introduces DeepMet, a chemical language model designed to anticipate and discover previously unidentified metabolites by learning biosynthetic logic from known metabolite structures.
• DeepMet predicts the structures of novel metabolites and integrates with mass spectrometry (MS/MS) data to systematically discover metabolites within complex biological samples.
• The model demonstrated success in predicting 81% of newly discovered human metabolites from the Human Metabolome Database (HMDB) version 5, highlighting its robust predictive power.
• Through experimental validation, DeepMet led to the discovery of 47 novel mammalian metabolites, showcasing its ability to uncover structurally diverse metabolites spanning various chemical classes.
• Integration with MS/MS workflows significantly improved the identification of metabolites, with DeepMet aiding in the structural elucidation of unidentified peaks in metabolomics data.
• A repository-scale application of DeepMet annotated over 29.1 million MS/MS spectra, substantially increasing the number of matched experimental spectra compared to traditional methods.
• The study highlights DeepMet’s ability to bridge gaps in metabolic databases, enabling high-throughput identification of the “chemical dark matter” in the metabolome.
@skinniderlab @WishartLab @EvolvedChem @BoWang87 @AdamoYoung @NeinastM @AsaelRoichman
📜Paper: biorxiv.org/content/10.110…
#Metabolomics #ChemicalLanguageModel #Bioinformatics #MachineLearning #MetabolicDiscovery

Gustaf Hemberg retweeted

Yulab @YulabJin · Nov 8

Amazing collaboration with Stoltz group @Caltech and Davies group @EmoryUniversity! Our part was to achieve tetra-CH hydroxylation via weak coordination at very late stage of complex total synthesis: science.org/doi/10.1126/sc…

Gustaf Hemberg retweeted

Maksym Andriushchenko @maksym_andr · Nov 8

🚨 So, why do we need weight decay in modern deep learning? 🚨 The camera-ready version of our NeurIPS 2024 paper is now on arXiv (a major update compared to the first version). Weight decay is traditionally viewed as a regularization method, but its effect in the overtraining regime is quite subtle and its interaction with the implicit regularization effect of SGD plays a crucial role. In the undertraining regime (e.g., in LLM pretraining), however, the effect of weight decay is totally different: it sets an implicit learning rate schedule for AdamW and enables stable training with bfloat16 precision. This explains why weight decay is still widely used for LLM training with standard optimizers, such as AdamW. This is joint work with @dngfra, @adityavardhanv, @tml_lab.

Gustaf Hemberg retweeted

Johannes Brandstetter @jo_brandstetter · Nov 8

Huge progress by @artuursberzins @AndyRadler @e_volkmann on Geometry-Informed Neural Networks (GINNs)! Faster training, better shapes, and surprising insights from enforcing diversity. 📜: arxiv.org/abs/2402.14009 🖥️: arturs-berzins.github.io/GINN/

Johannes Brandstetter @jo_brandstetter

We introduce Geometry-Informed Neural Networks to train shape generative models without any data (!!), combining learning under constraints, neural fields as a suitable representation, and generating diverse solutions to under-determined problems: 🖥️: arturs-berzins.github.io/GINN/

Gustaf Hemberg retweeted

Biology+AI Daily @BiologyAIDaily · Nov 10

PDBe tools for an in-depth analysis of small molecules in the Protein Data Bank
1. PDBe has developed new tools to facilitate the analysis of small molecules in the PDB, enhancing data accessibility and insight into ligand-macromolecule interactions. The tools (PDBe CCDUtils, PDBe Arpeggio, and PDBe RelLig) enable researchers to explore ligand data with greater depth and accuracy.
2. PDBe CCDUtils provides enriched ligand data by parsing and processing molecular structures, supporting researchers in standardizing ligand identification across PDB entries. This tool also generates high-quality 2D and 3D visualizations of ligands, enabling a comprehensive view of small molecules.
3. PDBe Arpeggio analyzes detailed interactions between ligands and macromolecules, identifying specific contact points such as hydrogen bonds and hydrophobic interactions, which are crucial for understanding ligand binding.
4. PDBe RelLig classifies ligands based on their functional roles (e.g., cofactor-like, reactant-like, or drug-like), helping researchers distinguish biologically relevant molecules from experimental artefacts, thereby supporting accurate interpretation of biological functions.
5. These tools, along with PDBe-KB’s ligand pages, provide an integrated view of ligands in their biological context, making it easier to visualize interactions, assess binding sites, and explore molecular properties in drug discovery and structural biology research.
@preeti_cy @PDBeurope
💻Code:
• PDBe CCDUtils: github.com/PDBeurope/ccdu…
• PDBe Arpeggio: github.com/PDBeurope/arpe…
• PDBe RelLig: github.com/PDBeurope/rell…
📜Paper: biorxiv.org/content/10.110…
#Bioinformatics #PDB #DrugDiscovery #StructuralBiology #LigandAnalysis

Gustaf Hemberg retweeted

Eric Topol @EricTopol · Sep 4

Evidence from omics that aging is not a gradual, linear process. The new study, highlighted here, only assessed people up to age 75 among 108 participants with short term follow-up (1.7 yrs). A previous large study (N>4,200, up to age 95) found a 3rd peak at age 80. nature.com/articles/s4159… wsj.com/health/wellnes…

Gustaf Hemberg retweeted

Fasan Lab @FasanLab · Sep 6

Delighted to share our latest work on a Chemoenzymatic Diversity-Oriented Synthesis strategy for generating complex, natural-product-like compounds with enzymes. Big 👏 to the lead authors Andrew Bortz and John Bennett for this beautiful work! 🎉 sciencedirect.com/science/articl…

Gustaf Hemberg retweeted

Elliot Hershberg @ElliotHershberg · Aug 31

Three of my favorite papers published this week: 1. A mechanism for bacteria to create *new* repetitive toxic genes to kill themselves in response to infection (!!)

Gustaf Hemberg retweeted

Leo Zang @LeoTZ03 · Sep 1

Design of intrinsically disordered protein variants with diverse structural properties | @ScienceAdvances
- Design new IDP sequences using simulated annealing with a target radius of gyration (compaction)
- Initialize with a natural IDP sequence and swap two random residues at each step (to keep the same composition)
- Evaluate compaction using either MD simulation with CALVADOS or by reweighting previously generated conformations
Link: science.org/doi/10.1126/sc…
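
The swap-based annealing loop summarized in this post can be sketched roughly as follows. This is a minimal illustration, not the paper's code: `toy_compaction` is a hypothetical stand-in for the CALVADOS-based radius-of-gyration estimate and has no physical meaning, and the cooling schedule, step count, and example sequence are all assumptions.

```python
import math
import random


def toy_compaction(seq):
    """Hypothetical stand-in for a radius-of-gyration estimate.

    NOT CALVADOS: it only measures how spread out hydrophobic residues
    are along the chain, so the loop has something cheap to optimize.
    """
    hydrophobic = set("AILMFVWY")
    positions = [i for i, aa in enumerate(seq) if aa in hydrophobic]
    if len(positions) < 2:
        return 0.0
    mean = sum(positions) / len(positions)
    return math.sqrt(sum((p - mean) ** 2 for p in positions) / len(positions))


def anneal(seq, target, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing over residue swaps toward a target score."""
    rng = random.Random(seed)
    current = list(seq)
    cost = abs(toy_compaction(current) - target)
    best, best_cost = current[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Swap two random positions; this preserves amino-acid composition.
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = abs(toy_compaction(current) - target)
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = current[:], cost
        else:
            current[i], current[j] = current[j], current[i]  # revert the swap
    return "".join(best), best_cost


start = "MKDEEGSSPAAILVKKRRDDEESSGG"  # made-up sequence for illustration
designed, final_cost = anneal(start, target=3.0, steps=500)
print(designed, round(final_cost, 3))
```

Because every move is a swap, the designed sequence always has the same composition as the starting one; only the residue order changes.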

Gustaf Hemberg retweeted

Leo Zang @LeoTZ03 · Sep 1

Unsupervised learning of progress coordinates during weighted ensemble simulations: Application to millisecond protein folding
- Improve sampling of rare events in protein folding (e.g., state transitions) through weighted ensemble simulation and an unsupervised deep learning model.
- Use a convolutional VAE (CVAE) to compress contact maps into a latent space, and apply a Local Outlier Factor to identify outlier conformations, which are then replicated in the simulation.
- Training the CVAE on the fly works better than using a pretrained model.
Preprint: biorxiv.org/content/10.110…

Gustaf Hemberg retweeted

Leo Zang @LeoTZ03 · Aug 27

AlphaFold2 knows some protein folding principles
- Use AF2 without MSAs/templates, mimicking an ab initio approach. The iterations show AF2's energy landscape and "local first, global later" folding mechanism.
- Folded intermediates of six small proteins (protein G, protein L and their mutants, ubiquitin, and the SH3 domain) resemble results in other studies and MD simulations.
- Scale the iterative folding study to 7k high-resolution proteins clustered by MMseqs2, with lengths ranging from 30 to 250
Preprint: biorxiv.org/content/10.110…

Liwei Chang @liwei_chang_

Did AlphaFold solve the protein folding problem? ...Not yet! AF2 predicts static structures, usually the native state by default. However, we found AF2 can generate structures aligning well with known folding intermediates. biorxiv.org/content/10.110… @Al__Perez @UFChemistry 1/n

Gustaf Hemberg retweeted

Jianmin Wang(王建民) @Jianmin4drugai · Aug 13

Updating!!! Collection (GitHub): Deep Learning and Generative AI for molecules (small molecules, RNA, peptide, protein, enzymes, antibody, and PPIs) conformations and molecular dynamics (force fields) github.com/AspirinCode/aw…

Gustaf Hemberg retweeted

Kevin K. Yang 楊凱筌 @KevinKaichuang · Aug 10

Happy to release CHEAP: Compressed Hourglass Embedding Adaptations of Proteins
We compress the ESMFold latent space while retaining information about sequence, structure, and function, with implications for generation, search, and transfer learning.

Amy Lu @amyxlu

1/ 🧬 Excited to share CHEAP, our new work on compressed protein embeddings. We characterize the joint distribution of p(sequence, structure) in ESMFold's latent space, and find cool tidbits on compressibility, tokenizability, and pathologies: biorxiv.org/content/10.110… 🧵

Gustaf Hemberg retweeted

Leo Zang @LeoTZ03 · Aug 11

Tokenized and Continuous Embedding Compressions of Protein Sequence and Structure
- CHEAP, a novel method for compressing protein sequence and structure latent space (ESMFold), achieves up to 128x channel and 8x length compression from sequence input alone
- Uses per-channel normalization, downsampling both the channel and length dimensions with linear projections and attention
- Explores both continuous and discrete compression, evaluated with TM-Score, RMSD, RMSPD, and sequence recovery accuracy
Preprint: biorxiv.org/content/10.110…

Gustaf Hemberg retweeted

Science Magazine @ScienceMagazine · Aug 11

Age-based memory decline correlates with quality—not quantity—of dendritic spines in the temporal cortex, a new @ScienceAdvances study suggests. scim.ag/7Zz

Gustaf Hemberg retweeted

Zeymer Lab @CathleenZeymer · Aug 10

📢 Finally out in @J_A_C_S: We engineered the first #PhotoLanZyme, a lanthanide-dependent photoredox enzyme that catalyzes radical C-C bond cleavages upon visible-light irradiation! 💡🧫🧪🧬💻 @TU_Muenchen @ERC_Research pubs.acs.org/doi/10.1021/ja…

Gustaf Hemberg retweeted

Leo Zang @LeoTZ03 · Aug 10

Fast, sensitive detection of protein homologs using deep dense retrieval | @NatureBiotech
- Dense Homolog Retriever (DHR) employs a bi-encoder architecture (ESM1b, first vector as fixed-length vector) and a CLIP-like approach to train on homologous pairs with in-batch negatives
- Retrieve homologs by directly comparing these embeddings with a similarity metric (dot product), then use JackHMMER to construct MSAs
Link: nature.com/articles/s4158…
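
The retrieval step described in this post, ranking database entries by dot product against a query embedding, might look something like the sketch below. It is not DHR itself: the three-dimensional vectors are hand-made placeholders for encoder outputs, and the names `database`, `dot`, and `top_k` are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, database, k=2):
    """Return the k database entries with the highest dot-product score."""
    scored = sorted(
        ((dot(query, vec), name) for name, vec in database.items()),
        reverse=True,
    )
    return [(name, score) for score, name in scored[:k]]


# Placeholder embeddings; a real system would get these from a trained encoder.
database = {
    "seqA": [1.0, 0.0, 0.0],
    "seqB": [0.9, 0.1, 0.0],
    "seqC": [0.0, 1.0, 0.0],
    "seqD": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]

for name, score in top_k(query, database):
    print(name, score)
```

At database scale the exhaustive loop would normally be replaced by an approximate nearest-neighbor index; the ranking criterion stays the same.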
