Khaled Abdel-Maksoud
@Tench_KAMak

1.3K posts

CADD+ML Scientist @CRiverLabs | MSci/PhD @ University of Southampton with @tmcscdt @comp_essexgroup | Retweeting science stuff frequently

England, United Kingdom · Joined October 2013
2K Following · 446 Followers
Khaled Abdel-Maksoud retweeted
Frank Nielsen
Frank Nielsen@FrnkNlsn·
pyBregMan: A Python library for geometric computing on BREGman MANifolds, with applications.
Installation: !pip install pyBregMan
Then check the readme example: github.com/alexandersoen/…
1 reply · 25 reposts · 123 likes · 7.2K views
Khaled Abdel-Maksoud retweeted
Frank Nielsen
Frank Nielsen@FrnkNlsn·
Some books on high-dimensional probability and statistics, including high-dimensional covariance matrices.
4 replies · 77 reposts · 472 likes · 38K views
Khaled Abdel-Maksoud retweeted
Leo Zang
Leo Zang@LeoTZ03·
Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning | @NatureComms
- Proposes FSFP, which combines model-agnostic meta-learning, Learning to Rank (ListMLE loss), and LoRA to adapt protein language models for few-shot learning of fitness
- In the meta-learning stage, the first two tasks use DMS datasets from the two proteins most similar to the target (cosine similarity on embeddings), and the third task uses pseudo-labeled data from GEMME, leveraging MSAs
- Foundation models include ESM-1v, ESM-2, and SaProt
Link: nature.com/articles/s4146…
0 replies · 15 reposts · 80 likes · 5.3K views
Khaled Abdel-Maksoud retweeted
Leo Zang
Leo Zang@LeoTZ03·
FlowPacker: Protein side-chain packing with torsional flow matching
- FlowPacker is a fast and accurate model for predicting side-chain conformations using torsional flow matching and equivariant graph attention networks
- Inference uses an exponential schedule for the vector field and an Euler solver for sampling
Preprint: biorxiv.org/content/10.110…
0 replies · 29 reposts · 127 likes · 7.6K views
Khaled Abdel-Maksoud retweeted
Thomas Wolf
Thomas Wolf@Thom_Wolf·
There was a super impressive AI competition last week that many people missed in the noise of the AI world. I happen to know several participants, so let me tell you a bit of this story over Sunday morning coffee.

You probably know the Millennium Prize Problems, where the Clay Institute pledged a US$1 million prize for the first correct solution to each of 7 deep math problems. To this date only one of them, the Poincaré conjecture, has been solved, by Grigori Perelman, who famously declined the award (go read about Grigori if you haven't; the guy leads a totally based life).

This new competition, the Artificial Intelligence Math Olympiad (AIMO), also came with a US$1M prize but was only open to AI models (so the humans get the prize for the work of the AI...). It also tackles very challenging but simpler problems, namely problems at International Math Olympiad gold level. Not yet the frontier of math knowledge, but definitely above what most people, me included, can solve today. The AIMO organizing committee is kind of a who's who of highly respected mathematicians, for instance Terence Tao, the famous math prodigy widely regarded as one of the greatest living mathematicians.

Enter our team: Jia Li, Yann Fleuret, and Hélène Evain. After a successful exit from a previous startup (which I happened to know well when I was an IP lawyer in a previous life, but that's for another story), they decided to co-found Numina as a non-profit doing open AI4Math. Numina wanted to act as a counterpoint to AI math efforts like DeepMind's, but in a much more open way, with the goal of advancing the use of AI in mathematics and making progress on hard, open problems. Along the way, they managed to recruit the help of some very impressive names in the AI+math world, like Guillaume Lample, co-founder of Mistral, and Stanislas Polu, formerly pushing math models at OpenAI.
As Jia was participating in the code-model BigCode collaboration with some Hugging Face folks, the idea came up to collaborate and explore how well code models could be used for formal mathematics. For context, olympiad math problems are extremely hard, and the core of the challenge is the battle plan you draft to tackle each problem. A first focus of Numina was thus on creating high-quality instruction Chain-of-Thought (CoT) data for competition-level mathematics. CoT data like this has already been used to train models like DeepSeek Math, but it is very rarely released, so this dataset became an invaluable resource for tackling the challenges.

BigCode's lead Leandro put Jia in touch with the team that trained the Zephyr models at Hugging Face, namely Lewis, Ed, Costa, and Kashif, with additional help from Roman and Ben, and the goal became to train some strong models on the math and code data to tackle the first progress prize of AIMO. And the training started: Jia, being an olympiad coach, was intimately familiar with the difficulty level of these competitions and able to curate a very strong internal validation set to enable model selection (Kaggle submissions are blind). While iterating on dataset construction, Lewis and Ed from Hugging Face focused on training the models and building the inference pipeline for the Kaggle submissions.

As often in competitions, it was an intense journey, with Eureka and aha moments pushing everyone further. Lewis told me about a couple of them that totally blew my mind. A tech report is coming, so these are just some "along the way" nuggets that will soon be gathered into a much more comprehensive recipe and report.

Learning to code: the team's submission relied on self-consistency decoding (aka majority voting) to generate N candidates per problem and pick the most common solution. But initial models trained on the Numina data only scored around 13/50... they needed a better approach.
They then saw the MuMath-Code paper (arxiv.org/abs/2405.07551), which showed you can combine CoT data with code data to get strong models. Jia was able to generate great code-execution data from GPT-4 to enable the training of the initial models and get an impressive boost in performance.

Taming the variance: another aha moment came when a Kaggle member shared a notebook showing how DeepSeek models worked super well with code execution (the model breaks the problem down into steps, and each step is run in Python to reason about the next one). However, when the team tried this notebook, they found the method had huge variance (the scores on Kaggle varied from 16/50 to 23/50). Meeting in Paris for a hackathon to attack this issue (as the HF team often does), Ed had the idea to frame the majority voting as a "tree of thoughts", where you progressively grow and prune a tree of candidate solutions (arxiv.org/abs/2305.10601). This had an impressive impact on the variance and let them be much more confident in their submissions (which showed in how well the model ended up performing on the test set versus the validation set).

Overcoming compute constraints: the Kaggle submissions had to run on 2xT4s in under 9h, which is really hard because FlashAttention-2 doesn't work and you can't use bfloat16 either. The team explored quantization methods like AWQ and GPTQ, finding that 8-bit quantization of a 7B model with GPTQ was best.

Looking at the data: a large part of the focus was also on checking the GPT-4 datasets for quality (and fixing them), as the team quickly discovered that GPT-4 was prone to hallucinations and to misinterpreting the code output. Fixing data issues in the final week led to a significant boost in performance.

Final push: the results were really amazing and the model climbed to 1st place.
And even better: while it tied for first place on the public, validation leaderboard (28 solved challenges versus 27 for second place), it really shined when tested on the private, test leaderboard, where it won by a wide margin, solving 29 challenges versus 22 for the second team. As Terence Tao himself put it, this is "higher than expected".

Maybe what's even more impressive about this competition, beside the level of math these models are already capable of, is how resource-constrained the participants actually were, having to run inference in a short amount of time on T4s, which only lets us imagine how powerful these models will become in the coming months. The time seems ripe for GenAI to have some impact in science, and it's probably one of the most exciting things AI will bring us in the coming 1-2 years: accelerating human development and tackling the real-world problems science is able to tackle.
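The self-consistency decoding the team started from can be sketched in a few lines: sample N candidate answers per problem and keep the most common one. The `generate` callable below is a hypothetical stand-in for a model call, not the team's actual pipeline.

```python
from collections import Counter

def self_consistency(generate, problem, n=32):
    """Majority voting: sample n candidate answers for a problem
    and return the most common one plus its agreement rate."""
    answers = [generate(problem) for _ in range(n)]
    # Counter.most_common(1) returns [(answer, count)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n

# Toy stand-in for a model: a fixed stream of sampled answers
samples = iter(["42", "41", "42", "42", "7", "42", "41", "42"])
answer, agreement = self_consistency(lambda p: next(samples), "toy problem", n=8)
# answer == "42", agreement == 5/8
```

The tree-of-thoughts refinement mentioned above replaces this flat vote with progressive growing and pruning of candidate solutions, which is what tamed the variance.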
62 replies · 500 reposts · 2.9K likes · 650.9K views
Khaled Abdel-Maksoud retweeted
Leo Zang
Leo Zang@LeoTZ03·
Reinforcement Learning for Sequence Design Leveraging Protein Language Models
- Investigates RL algorithms for protein sequence design using a pLM as the reward function
- Uses ESMFold as the oracle pLM and distills it into a smaller model to serve as the proxy reward model
- Trains the proxy model on the Atlas dataset with a mean-squared regression objective on pTM scores, fine-tuning periodically on sequences and their oracle scores
Preprint: arxiv.org/abs/2407.03154
1 reply · 26 reposts · 95 likes · 11K views
Khaled Abdel-Maksoud retweeted
dr. jack morris
dr. jack morris@jxmnop·
recently read one of the most interesting LLM papers i've ever read. the story goes something like this:
> dutch PhD student/researcher Eline Visser lives on a remote island in Indonesia for several years
> learns the Kalamang language, an oral language with only 100 native speakers
> she writes "The Grammar of Kalamang", a textbook on how to write in Kalamang
> since Kalamang is a spoken language only, TGOK is the only text on earth written in it
> so, there is no internet data in written Kalamang
> so, language models haven't read any Kalamang during training
> in the paper, researchers explore how to teach a language model a new language from a single book
> they evaluate various types of fine-tuning and prompting
> much to my chagrin, prompting wins (and it's not close)
> larger models & longer context windows help a lot

by the way, seems like humans still win at this task (for now)
33 replies · 201 reposts · 1.8K likes · 290.7K views
Khaled Abdel-Maksoud retweeted
Gabriel Peyré
Gabriel Peyré@gabrielpeyre·
Kernel methods can be accelerated using random projections to perform a low-rank approximation of the kernel. For translation-invariant kernels, one can use Fourier projections.
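The Fourier-projection trick for translation-invariant kernels is the classic random Fourier features construction (Rahimi & Recht): draw frequencies from the kernel's spectral density and approximate k(x, y) ≈ z(x)ᵀz(y) with a low-rank feature map. A minimal NumPy sketch for the Gaussian (RBF) kernel, not tied to any particular library:

```python
import numpy as np

def random_fourier_features(X, n_features=500, gamma=1.0, seed=0):
    """Map X (n, d) to Z (n, n_features) such that Z @ Z.T
    approximates the RBF kernel matrix exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the spectral density of the RBF kernel:
    # for exp(-gamma*||delta||^2), that density is N(0, 2*gamma*I)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# The low-rank feature map approximates the exact kernel matrix
X = np.random.default_rng(1).normal(size=(50, 3))
Z = random_fourier_features(X, n_features=2000)
K_approx = Z @ Z.T
K_exact = np.exp(-1.0 * ((X[:, None] - X[None, :]) ** 2).sum(-1))
err = np.abs(K_approx - K_exact).max()  # shrinks as n_features grows
```

Downstream, any kernel method (ridge regression, SVM) can then run on the explicit features `Z` in O(n·n_features) instead of forming the full n×n kernel matrix.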
7 replies · 128 reposts · 894 likes · 45.2K views
Khaled Abdel-Maksoud retweeted
Frank Nielsen
Frank Nielsen@FrnkNlsn·
🎓Kullback-Leibler divergence between densities of an exponential family = reverse Bregman divergence wrt the cumulant function 🎉Kullback-Leibler divergence between non-normalized densities = reverse Bregman divergence wrt the partition function 👉 arxiv.org/abs/2312.12849
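The first identity can be written out explicitly. For an exponential family with natural parameter θ, sufficient statistic t(x), and cumulant function F, the KL divergence reduces to a Bregman divergence with swapped arguments:

```latex
% Exponential family density with cumulant (log-normalizer) F:
%   p_\theta(x) = \exp\bigl(\langle \theta, t(x) \rangle - F(\theta)\bigr)
% KL between two members equals the reverse Bregman divergence of F:
\mathrm{KL}(p_{\theta_1} : p_{\theta_2}) = B_F(\theta_2 : \theta_1),
\qquad
B_F(\theta_2 : \theta_1)
  = F(\theta_2) - F(\theta_1)
  - \langle \theta_2 - \theta_1,\, \nabla F(\theta_1) \rangle .
```

The second identity in the tweet is the analogous statement for non-normalized densities, with the partition function playing the role of F; see the linked arXiv paper for the precise statement.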
5 replies · 18 reposts · 97 likes · 7.2K views
Khaled Abdel-Maksoud retweeted
Andrej Karpathy
Andrej Karpathy@karpathy·
These 94 lines of code are everything that is needed to train a neural network. Everything else is just efficiency.

This is my earlier project Micrograd. It implements a scalar-valued autograd engine. You start with some numbers at the leaves (usually the input data and the neural network parameters), build up a computational graph with operations like + and * that mix them, and the graph ends with a single value at the very end (the loss). You then go backwards through the graph, applying the chain rule at each node to calculate the gradients. The gradients tell you how to nudge your parameters to decrease the loss (and hence improve your network).

Sometimes when things get too complicated, I come back to this code and just breathe a little. But ok ok, you also do have to know what the computational graph should be (e.g. MLP -> Transformer), what the loss function should be (e.g. autoregressive/diffusion), how to best use the gradients for a parameter update (e.g. SGD -> AdamW), etc. But it is the core of what is mostly happening.

The 1986 paper from Rumelhart, Hinton, and Williams that popularized and used this algorithm (backpropagation) for training neural nets: cs.toronto.edu/~hinton/absps/…
micrograd on GitHub: github.com/karpathy/micro…
and my (now somewhat old) YouTube video where I very slowly build and explain it: youtube.com/watch?v=VMj-3S…
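The pattern described above can be compressed even further. The following is a minimal sketch of a scalar autograd value in the Micrograd style (not Karpathy's actual 94 lines, just the same mechanism: build the graph forward, then apply the chain rule in reverse topological order):

```python
class Value:
    """Scalar with reverse-mode automatic differentiation."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # set by the op that created this node

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad   # d(a+b)/da = 1
            other.grad += out.grad  # d(a+b)/db = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# loss = a*b + a  ->  dloss/da = b + 1 = 4, dloss/db = a = 2
a, b = Value(2.0), Value(3.0)
loss = a * b + a
loss.backward()
```

Accumulating with `+=` in each backward closure is what makes nodes reused in the graph (like `a` above, which feeds both the product and the sum) receive the correct total gradient.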
198 replies · 1.8K reposts · 14.9K likes · 1.6M views
Khaled Abdel-Maksoud retweeted
Alex Zhavoronkov, PhD (aka Aleksandrs Zavoronkovs)
Anyone interested in starting or ramping up in generative chemistry should read this paper and share it with their friends, students, and unrelated medicinal and computational chemists to help them get into the field. This is probably the most concise and useful review of generative AI in small molecule chemistry, which takes 20-30 min for experts to read but provides very deep insights (11 pages, mostly figures). If you want to see what published generative models and platforms worked experimentally for which targets and with what hit rates, you can print it out and get a cheat sheet. Of course, the most commercially tractable examples are rarely published but from what I can see - pretty much every published example was captured and scrutinized. This group reconstructed the timeline and did a gargantuan amount of work that an army of modern LLM agents would not be able to complete with a high degree of accuracy since many of the numbers are in image format and require additional processing. I was planning to write a review like this one day, but this group beat me to this and did a way better job as I saw some of the models I was not aware of. nature.com/articles/s4225…
4 replies · 26 reposts · 128 likes · 22.5K views
Khaled Abdel-Maksoud retweeted
Antonio Terpin
Antonio Terpin@antonio_terpin·
What are first-order optimality conditions in the Wasserstein space good for? Learning diffusion at lightspeed is one example! (Link to the pre-print, now available on arXiv, below) With @florian_dorfler and Nicolas Lanzetti, we deployed the recent advancements in Variational Analysis in the Wasserstein Space to design a model that learns the dynamics of diffusion processes, at lightspeed. ⚡️ How? 1/4🧵 #Diffusion #MachineLearning #OptimalTransport #Optimization
1 reply · 35 reposts · 187 likes · 25.3K views
Khaled Abdel-Maksoud retweeted
Yilun Du
Yilun Du@du_yilun·
Introducing our @icml_conf paper: Learning Iterative Reasoning through Energy Diffusion! We formulate reasoning as optimizing a sequence of energy landscapes. This enables us to solve harder problems at test time with more complex optimization. Website: energy-based-model.github.io/ired/
7 replies · 93 reposts · 588 likes · 57.6K views
Khaled Abdel-Maksoud retweeted
Jarek Liesen
Jarek Liesen@JarekLiesen·
🤖 RL agents are trained and evaluated in the same env. What performance gains could we achieve when training in a meta-learned synthetic env instead? 🌍 Excited to share our paper: "Discovering Minimal RL Environments" 📝 arxiv.org/abs/2406.12589 💻 github.com/keraJLi/synthe…
6 replies · 36 reposts · 156 likes · 21.1K views