Chris Hayduk
@ChrisHayduk
1.9K posts
Writing About AI, Biology, & Geopolitics on Substack || Machine Learning Engineer @Meta || Prev. Lead Machine Learning Engineer for Drug Discovery @Deloitte
New York City · Joined July 2009
2.3K Following · 3.4K Followers
Chris Hayduk@ChrisHayduk·
@rahulcreates95 @patio11 Yes, the algorithm basically guarantees that you’ll never forget whatever flashcard you put into it. The core thing is that you need to establish the habit of using it daily, because it requires daily flashcard practice to ensure you don’t forget the material
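For context on "the algorithm": Anki-style tools schedule each card with a variant of SuperMemo's SM-2, which stretches the review interval geometrically after every successful recall, so a card you keep remembering is shown exponentially less often. A minimal sketch of one review step (function name and signature are mine; Anki's real scheduler is a modified version of this):

```python
def sm2(quality, reps, interval, ease=2.5):
    """One review of a flashcard under the classic SM-2 algorithm.

    quality:  0-5 self-grade for the recall attempt
    reps:     successful reviews in a row so far
    interval: current gap between reviews, in days
    ease:     per-card difficulty multiplier (floor of 1.3)
    Returns the updated (reps, interval, ease).
    """
    if quality < 3:
        return 0, 1, ease              # failed recall: reset the card
    if reps == 0:
        interval = 1                   # first success: see it tomorrow
    elif reps == 1:
        interval = 6                   # second success: six days out
    else:
        interval = round(interval * ease)   # then grow geometrically
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return reps + 1, interval, ease
```

A handful of clean recalls pushes a card from one day out to months out, which is why the daily habit is cheap once it exists: most cards are dormant on any given day.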
Patrick McKenzie@patio11·
Doing the reading is a superpower, and it's even better in a world where "no one" is doing the reading. (Inspired by a conversation I had with some college students.)
Rahul Parmar@rahulcreates95·
@patio11 patrick, i read a lot, still shit at my vocab, any suggestions please ?
Chris Hayduk@ChrisHayduk·
@menhguin Yeah I would say basically all of my big-to-mid ticket purchases are done through AI research. The final checkout happens on the producer website, but no reason in principle that it couldn't happen in the chat window itself.
Chris Hayduk@ChrisHayduk·
I actually think the decline of reading and solitary, deep thinking is part of this. To set up private DARPAs and Bell Labs, you really need an idiosyncratic view of the world that you're willing to put up large amounts of capital in order to explore. If you pursue the same targets and approaches as everybody else, you have no advantage over the well-funded labs like DeepMind, industry pharma, etc. And the only way to arrive at those highly idiosyncratic points of view is to, imo, spend a lot of time reading books and papers that most other people in your field have not read, and then thinking deeply about that material.
Chris Hayduk@ChrisHayduk·
@ninja_maths My jaw dropped when I opened my Twitter feed and saw you guys had dropped not one but TWO of the courses we talked about (which wasn't even that long ago)! What a treat - can't wait to dig in here! Thank you Alex and team for the hard work pushing these out so quickly!
Alex Smith@ninja_maths·
Shout-out to @ChrisHayduk, who kick-started a discussion about a possible MMPS course (or two) sometime last year. We listened. 😀 Thanks, Chris!
Alex Smith@ninja_maths

I'm delighted to announce that @_MathAcademy_ has released two courses in Mathematical Methods for the Physical Sciences. Designed for students who want the mathematical tools needed for undergraduate-level study in physics, engineering, and other STEM fields. Details below👇

Chris Hayduk reposted
Caleb Hammer@sircalebhammer·
Austin tried to pass rent control, but the state didn’t allow it. Instead, we allowed developers to build. Now it’s cheaper to rent in Austin, as a % of income, than it has been in decades. Move to states and cities that build things instead of having virtue-signaling policies that don’t help
Hunter📈🌈📊@StatisticUrban

Austin, TX from the same spot 10 years apart (2014 vs 2024) Pretty stunning transformation.

Chris Hayduk@ChrisHayduk·
@therealpeterobi Not yet but putting it together soon! Need to compile a training and val set first and then will be good to go
Pierre@therealpeterobi·
@ChrisHayduk Has the NanoAF2 competition started?
Chris Hayduk@ChrisHayduk·
Fantastic idea and I think this will actually be the better direction to take my NanoAF2 competition. Data is the scarce resource when building frontier bio ML models, so better per-sample efficiency will be a gamechanger.
Samip@industriaalist

1/ Introducing NanoGPT Slowrun 🐢: an open repo for state-of-the-art data-efficient learning algorithms. It's built for the crazy ideas that speedruns filter out -- expensive optimizers, heavy regularization, SGD replacements like evolutionary search.

Chris Hayduk@ChrisHayduk·
@shakoistsLog Yes steady 8-9 hour days are so suboptimal for me. A few days of 14+ followed by a few days of 4-6 gets so much more done.
shako@shakoistsLog·
i think i’m most productive working 14 hour days then 6 hour days every other day
Chen Liu@ChenLiu_1996·
@ChrisHayduk It is actually under the preprocessing folder. It’s all explained here: github.com/KrishnaswamyLa… (#alphafold2-structure-data)
Chris Hayduk@ChrisHayduk·
I'm rebuilding AlphaFold2 from scratch in pure PyTorch. No frameworks on top of PyTorch. No copy-paste from DeepMind's repo. Just nn.Linear, einsum, and the 60-page supplementary paper.

The project is called minAlphaFold2, inspired by Karpathy's minGPT. The idea is simple: AlphaFold2 is one of the most important neural networks ever built, and there should be a version of it that a single person can sit down and read end-to-end in an afternoon.

Where it stands today:
- ~3,500 lines across 9 modules
- Full forward pass works: input embedding → Evoformer → Structure Module → all-atom 3D coordinates
- Every loss function from the paper (FAPE, torsion angles, pLDDT, distogram, structural violations)
- Recycling, templates, extra MSA stack, ensemble averaging — all implemented
- 50 tests passing
- Every module maps 1-to-1 to a numbered algorithm in the AF2 supplement

The Structure Module was the most satisfying part to build. Invariant Point Attention is genuinely beautiful: it does attention in 3D space using local reference frames so the whole thing is SE(3)-equivariant, and the math fits in about 150 lines of PyTorch.

What's next:
- Build the data pipeline (PDB structures + MSA features)
- Write the training loop
- Train on a small set of proteins and see what happens

The repo is public. If you've ever wanted to understand how AlphaFold2 actually works at the level of individual tensor operations, this is meant for you.

Repo: github.com/ChrisHayduk/mi…
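The core idea of the FAPE loss mentioned above fits in a few lines: express every atom in every residue's local frame, take clamped distances between prediction and ground truth, and average. A minimal numpy illustration of that idea (not the repo's code; names are mine, and it omits per-frame weighting and the separate backbone/side-chain treatment):

```python
import numpy as np

def fape(pred_R, pred_t, pred_x, true_R, true_t, true_x,
         clamp=10.0, scale=10.0):
    """Frame Aligned Point Error, simplified.

    pred_R: (N, 3, 3) rotations of the N predicted local frames
    pred_t: (N, 3)    translations of those frames
    pred_x: (M, 3)    predicted atom positions
    true_*: ground-truth counterparts
    Distances are clamped at `clamp` angstroms and divided by `scale`.
    """
    def to_local(R, t, x):
        # Express every atom in every frame: x_ij = R_i^T (x_j - t_i)
        return np.einsum('nji,nmj->nmi', R, x[None] - t[:, None])

    d = np.linalg.norm(to_local(pred_R, pred_t, pred_x)
                       - to_local(true_R, true_t, true_x), axis=-1)
    return np.mean(np.minimum(d, clamp)) / scale
```

Because every point is compared in local frames, applying one global rotation and translation to the whole prediction leaves the loss unchanged — exactly the SE(3) invariance the Structure Module is built around.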
Chris Hayduk@ChrisHayduk·
@ChenLiu_1996 This is awesome, thank you! Are all the relevant pieces of the data pipeline in the data_loading/ folder?
Chen Liu@ChenLiu_1996·
@ChrisHayduk I have just built a data pipeline for PDB structures and MSA features as part of our open-sourced project that works for AlphaFold2. You might find that useful to your project. github.com/KrishnaswamyLa…
Paul Graham@paulg·
Larry Page is gone. He wasn't just pretending to move to Florida. He has moved. The proposed wealth tax hasn't even passed, and already it has cost California both Larry's presence and all the tax revenue it made from him.
@BioAI_Neuro@NeuroAI_Nexus·
@chrishayduk This is so cool, Chris! Thanks for sharing your how-to-do-it booklet
Chris Hayduk@ChrisHayduk·
@itsbautistam Yes I loved the work you guys did on SimpleFold! The formulation of the model was really elegant
Chris Hayduk@ChrisHayduk·
@RolandDunbrack My mistake, it's used as a prediction head but not directly included in the loss terms. So changes shouldn't affect training. Will definitely follow up with you once I start implementing the multimer side of things! For now I'm focused on standing up the monomer pipeline
Roland Dunbrack 🏳️‍🌈 @rolanddunbrack.bsky.social
@chrishayduk Is ipTM part of the loss function during structure optimization? Understand if you don't want to change that now. But I'd love to see ipSAE simply output for the structural models in either of your repos. Happy to chat if you have questions.
Chris Hayduk@ChrisHayduk·
Agree here - I actually used ESM much more day-to-day in my previous work in the drug discovery space. I went with AF2 as my first implementation for two main reasons:
1. It basically spawned the AI-in-protein-structure-prediction field (and is still the most famous model in the field)
2. It can be illuminating to look at the role that MSAs played in this model and understand how the single-sequence encoder replaces that functionality in ESMFold
Minimal implementations of ESM-2 and ESMFold should both be following shortly!
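To make the MSA-vs-single-sequence contrast concrete, here is a toy shape sketch (numpy; every dimension is illustrative, not the models' real sizes). AF2's Evoformer consumes an explicit stack of aligned sequences, while ESMFold feeds its folding trunk one language-model embedding per residue:

```python
import numpy as np

L, c = 64, 32        # residue count and channel width (illustrative)
n_seq = 128          # sequences in the MSA (illustrative)

# AF2-style inputs: evolutionary signal is explicit as MSA rows,
# alongside an (L, L) pairwise representation
msa_repr = np.zeros((n_seq, L, c))
pair_repr = np.zeros((L, L, c))

# ESMFold-style input: one embedding per residue from a protein
# language model; the evolutionary signal lives in learned weights
single_repr = np.zeros((L, c))

# the single-sequence path drops the n_seq axis entirely
assert single_repr.shape == msa_repr.shape[1:]
```

Dropping the n_seq axis is what removes the alignment step (and, per the criticism above, the alignment-induced inductive bias) at inference time.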
Louis Fréchet@gumbelfrechet·
@chrishayduk Nice work. I didn't like their use of MSAs that much. It feels wrong as it brings some inductive bias from pseudo-linear models (alignments), and makes performance much worse on difficult proteins, e.g. TCRs. ESM is IMHO less hyped but more interesting.
Chris Hayduk@ChrisHayduk·
@leothecurious That's the goal! The nanoGPT speedrun challenge pushed the time it takes to train a GPT-2-class model from 45 minutes two years ago to 89 seconds today, a 97% decrease in runtime. I'd like to see the same happen for AlphaFold.
davinci@leothecurious·
very cool educational resource. also nanoAlphaFold2 speedruns upcoming soon after! the innovation that goes into those will probably turn out useful for the broader field.
Chris Hayduk@ChrisHayduk
[quoted tweet: the minAlphaFold2 announcement above]