

Mohammed Sabry
@M___Sabry
PhD @AdaptCentre @dcucomputing. Making models efficient in training/inference via modularity principles. Prev: research intern @GoogleAI


Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
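The tweet points to the blog for how TurboQuant actually works. As a generic illustration of the underlying idea of KV-cache quantization (not TurboQuant's algorithm), here is a minimal NumPy sketch: per-token 4-bit quantization of a key/value tensor, with two codes packed per byte. Note that plain 4-bit packing only reaches roughly 4x over fp16 once scales and offsets are stored, so a 6x+ ratio implies more aggressive techniques than shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fp16 key cache: [n_tokens, n_heads * head_dim] (shapes are illustrative).
kv = rng.normal(size=(1024, 128)).astype(np.float16)

def quantize_int4(x):
    # Per-token (per-row) asymmetric 4-bit quantization.
    lo = x.min(axis=1, keepdims=True).astype(np.float32)
    hi = x.max(axis=1, keepdims=True).astype(np.float32)
    scale = (hi - lo) / 15.0                      # 16 levels: codes 0..15
    q = np.round((x.astype(np.float32) - lo) / scale).astype(np.uint8)
    packed = (q[:, 0::2] << 4) | q[:, 1::2]       # two 4-bit codes per byte
    return packed, scale.astype(np.float16), lo.astype(np.float16)

def dequantize_int4(packed, scale, lo):
    q = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    q[:, 0::2] = packed >> 4
    q[:, 1::2] = packed & 0x0F
    return (q * scale.astype(np.float32) + lo.astype(np.float32)).astype(np.float16)

packed, scale, lo = quantize_int4(kv)
approx = dequantize_int4(packed, scale, lo)

orig_bytes = kv.nbytes
comp_bytes = packed.nbytes + scale.nbytes + lo.nbytes
print(f"compression: {orig_bytes / comp_bytes:.1f}x")
print(f"max abs error: {np.abs(kv.astype(np.float32) - approx.astype(np.float32)).max():.3f}")
```

The per-row scale/offset pair is what bounds the reconstruction error to half a quantization step per element; real systems layer further tricks (finer groups, outlier handling, rotations) on top of this to push the ratio higher without hurting accuracy.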



We just completed the largest decentralised LLM pre-training run in history: Covenant-72B. Permissionless, on Bittensor subnet 3. 72B parameters. ~1.1T tokens. Commodity internet. No centralized cluster. No whitelist. Anyone with GPUs could join or leave freely. 1/n

Unveiling our new startup Advanced Machine Intelligence (AMI Labs). We just completed our seed round: $1.03B / 890M€, one of the largest seeds ever, probably the largest for a European company. We're hiring! [the background image is the Veil Nebula - a picture I took from my backyard, most appropriate for an unveiling] More details here: techcrunch.com/2026/03/09/yan…

Continual learning is the next big thing in AI. Today’s deep networks suffer catastrophic forgetting: because backpropagation updates every weight in the network, new data points overwrite old, useful knowledge. Contrast this with the brain, which doesn’t do backpropagation: no global updates, only local updates with each new data point. The day we have a credible alternative to backpropagation is the day we will solve continual learning. It’s an exciting time to be doing AI research!
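The mechanism described above is easy to demonstrate. A minimal NumPy sketch (a toy logistic regression, not from the tweet): because gradient descent updates every weight, fitting a second, conflicting task overwrites what was learned on the first.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(w_true, n=200):
    # Linearly separable binary classification defined by w_true.
    X = rng.normal(size=(n, 2))
    y = (X @ w_true > 0).astype(float)
    return X, y

def train(w, X, y, lr=0.5, epochs=50):
    # Full-batch gradient descent on logistic loss; note the gradient
    # step touches ALL weights, with no protection for old knowledge.
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return (((X @ w) > 0).astype(float) == y).mean()

# Task A separates on feature 0; task B demands the opposite decision rule.
Xa, ya = make_task(np.array([2.0, 0.0]))
Xb, yb = make_task(np.array([-2.0, 0.0]))

w = np.zeros(2)
w = train(w, Xa, ya)
acc_a_before = accuracy(w, Xa, ya)   # high: task A is learned

w = train(w, Xb, yb)                 # now learn task B with the same weights
acc_a_after = accuracy(w, Xa, ya)    # collapses: task A was overwritten

print(f"task A accuracy before B: {acc_a_before:.2f}, after B: {acc_a_after:.2f}")
```

The two tasks here conflict maximally for clarity; in practice forgetting is usually partial, but the cause is the same global update the tweet points at.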


The @ilyasut episode

0:00:00 – Explaining model jaggedness
0:09:39 – Emotions and value functions
0:18:49 – What are we scaling?
0:25:13 – Why humans generalize better than models
0:35:45 – Straight-shotting superintelligence
0:46:47 – SSI’s model will learn from deployment
0:55:07 – Alignment
1:18:13 – “We are squarely an age of research company”
1:29:23 – Self-play and multi-agent
1:32:42 – Research taste

Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify. Enjoy!



The ladder of intelligence is the ladder of abstraction.

L1: Memorizing answers (no generalization)
L2: Interpolative retrieval of answers, pattern matching, memorizing answer-generating rules (local generalization)
L3: Synthesizing causal rules on the fly (strong generalization)
L4: Discovering general principles, metacognition (extreme generalization)

To achieve compounding AI you need to reach L4.

An LLM-generated paper is in the top 17% of ICLR submissions in terms of average reviewer score, having received two 8's. The paper has tons of BS jargon and hallucinated references. Fortunately, one reviewer actually looked at the paper and gave it a zero. 1/3









