Tanmoy Sanyal

250 posts

@hiddenvariable2

Protein design @Amgen. Previously, @novonordisk, @salilab_ucsf, @UCSBCHE, @IITKgp.

San Francisco Bay Area, CA · Joined June 2016
753 Following · 286 Followers
Tanmoy Sanyal retweeted
Diego del Alamo @DdelAlamo
Retraining PLMs with newly deposited sequences doesn't guarantee better performance. Authors trained PLMs on each copy of UniRef100 from 2011 to 2024; the largest performance boost (2021->2022) coincided with the removal of an unusually large number of invalid sequences from UniRef.
Biology+AI Daily @BiologyAIDaily

Protein Language Models: Is Scaling Necessary?

- This paper challenges the prevailing belief that scaling up protein language models (pLMs) is essential for better performance, proposing that careful data curation can achieve comparable results at a fraction of the cost.
- The authors introduce AMPLIFY, a protein language model that outperforms state-of-the-art models like ESM2 15B, while being 43 times smaller in terms of parameters and 17 times more efficient in training.
- AMPLIFY’s success is attributed to using high-quality, curated datasets rather than simply increasing model size. This allows for better generalization and less overfitting, particularly in tasks like sequence recovery and protein design.
- By focusing on natural sequence space and eliminating noise from datasets, AMPLIFY reduces computational costs and energy consumption, democratizing pLM development for smaller research labs.
- The paper emphasizes that data quality is more important than model size, with findings showing that models trained on well-curated datasets significantly outperform models trained on larger but noisier datasets.
- AMPLIFY exhibits emergent behaviors in tasks like distinguishing real proteins from non-proteins, even in zero-shot settings. It can also handle intrinsically disordered proteins better than structure-based models like AlphaFold2.
- The authors call for a shift away from scaling as the main driver of improvement in pLMs, advocating for better dataset curation and efficient architectures to build robust, high-performing models.

@apsarathchandar @bnschlz
💻Code: github.com/chandar-lab/AM…
📜Paper: biorxiv.org/content/10.110…

4
22
101
33.4K
Tanmoy Sanyal retweeted
Rohit Singh @rohitsingh8080
Let me tell you a story. It'll end up at the current tech-bio and protein design scene. But the story starts about 25 years earlier. Did you know that, commercially, the human genome project precipitated the end and not the start of a genomics boom? 1/
9
70
424
116.1K
Tanmoy Sanyal retweeted
Olexandr Isayev 🇺🇦🇺🇸
Editor's rant: how many more GNN or message-passing architectures do we *really* need to score/predict protein-ligand interactions? A new one appears every day!!! 🤯 #compchem
Pittsburgh, PA 🇺🇸
9
6
74
9.7K
Tanmoy Sanyal retweeted
andrew blevins @Andrewdblevins
Working on ML for Drug Discovery, I have been frustrated with the size and/or quality of the publicly available datasets for training and benchmarking models, so when my co-founder and I started our company we swore we would open-source some data as quickly as possible.
13
31
211
42K
Tanmoy Sanyal retweeted
Chris Bakke @ChrisJBakke
The sad reality is that most people don't have what it takes to work in tech: Up at 4am. Post a pic of my new Eight Sleep in the group chat for sweet, sweet engagement. Hit the gym. Crush 8 jumping jacks. 35 minute cold plunge. Rip a My First Million episode at 2x speed. Drink a bottle of Bryan Johnson olive oil. Eat 4 bags of Athletic Greens powder. Feel sick. Power through. Open laptop. Knock out 7-8 emoji reactions to threads on Slack. Grab lunch. (A 2nd bottle of olive oil) Open Jira. Comment "any updates?" on 3 tickets. Wind down for the day. Open Substack and resume writing Part 4 of "Problems with European Work Ethic: a San Francisco perspective" - a banger that got 8 likes on Substack and two thumbs up in the group chat.
237
397
7.8K
2M
Tanmoy Sanyal retweeted
Krishnan @cvkrishnan
WAIT!! WHAT?!!
45
725
4.8K
526.9K
Tanmoy Sanyal retweeted
India Research Watch (IRW) @IRWatchdog
𝐈𝐧𝐝𝐢𝐚𝐧 𝐑𝐞𝐬𝐞𝐚𝐫𝐜𝐡 𝐂𝐫𝐢𝐬𝐢𝐬 - 𝐍𝐨𝐭 𝐣𝐮𝐬𝐭 𝐚 𝐒𝐦𝐨𝐤𝐢𝐧𝐠 𝐆𝐮𝐧 𝐛𝐮𝐭 𝐚 𝐃𝐮𝐦𝐩𝐬𝐭𝐞𝐫 𝐅𝐢𝐫𝐞 If you had any doubts about Indian Research being in a deep crisis, this graph should definitively lay them to rest. (1/N) 🧵 #IndiaResearchCrisis
15
104
289
78.2K
Tanmoy Sanyal retweeted
ISRO @isro
Chandrayaan-3 Mission:
'India🇮🇳, I reached my destination and you too!' : Chandrayaan-3
Chandrayaan-3 has successfully soft-landed on the moon 🌖! Congratulations, India🇮🇳! #Chandrayaan_3 #Ch3
68.4K
269.7K
818.9K
71M
Tanmoy Sanyal @hiddenvariable2
In October 2008, I was home from college for a weekend visit to my parents when the Chandrayaan-1 mission launched. 15 years later, life comes full circle: my parents are visiting me on their vacation as Chandrayaan-3 soft-lands an unmanned rover on the moon!
0
0
8
519
Tanmoy Sanyal retweeted
Bojan Tunguz @tunguz
I wonder what it feels like to be a normie and work in a field where you don’t have to upskill every couple of hours.
79
106
1.4K
252.4K
Tanmoy Sanyal retweeted
Krishnaswamy Lab @KrishnaswamyLab
Some perspective for scientists who may be asked to review computational methods: First, computational methods are indeed innovated by taking existing mathematical and algorithmic atoms and putting them together in a novel way. Second, small changes in the steps can have a large effect on the outcome! Examples below:
The classic ISOMAP method could be considered a combination of shortest paths on graphs + MDS. Neither of these was novel, but ISOMAP was one of the first breakthrough manifold learning algorithms.
Spectral clustering is just graph Laplacian eigenvectors plus K-means; again, the combination is what lets it detect arbitrarily shaped clusters.
The difference between SNE and t-SNE was just the "t" (Student's t-distribution), but look at how much one is used over the other :).
The difference between a GAN and a conditional GAN is just a conditioning input signal.
Please do not comment on anyone's paper that, because the atoms they used are not novel or the change doesn't seem vast, the entire method is not novel. This is completely unfair and ignores how computational innovation proceeds...
3
27
135
28.4K
Vivek Das @ivivek87
Italy’s ChatGPT ban spreads to France, Germany and Ireland
Not surprising at all, given how data protection and privacy regulation is viewed in the EU compared to the US. This too shall pass; however, monopolization needs a check! 😉 stealthoptional.com/news/italys-ch…
2
1
2
945
Tanmoy Sanyal retweeted
Ben Blaiszik @BenBlaiszik
We wrapped up the first LLM hackathon for applications in materials and chemistry last week. The results to me were astounding. We are at the point now where some tasks that took years can now be completed in days. Here is a list of the fantastic submissions!
53
355
2.2K
996.5K
Tanmoy Sanyal retweeted
Dr. Holly Walters @Manigarm
I'm serious. STEM without the Arts, Social Sciences, and Humanities will produce more "innovative" tech bros who giddily reinvent rent, roommates, taxes, and now...roller skates. With complete, straight-faced, sincerity. This is a problem. And I have a list (So, thread 🧵)
The Rundown AI @TheRundownAI

These new AI shoes can make you walk 250% faster. Moonwalkers use AI to learn your step gait/speed and adapt to you. The shoe has two modes: lock, and shift, and will only work when you move.

183
7.2K
31.8K
4.4M