ML TLDR

532 posts

@MLsummaries

Summarizing ML concepts one at a time.

Blog at

Joined March 2021

301 Following · 3.9K Followers

ML TLDR retweeted

Jim Fan @DrJimFan · Dec 28

Everyone's freaking out about vibe coding. In the holiday spirit, allow me to share my anxiety on the wild west of robotics. 3 lessons I learned in 2025.
1. Hardware is ahead of software, but hardware reliability severely limits software iteration speed. We've seen exquisite engineering arts like Optimus, e-Atlas, Figure, Neo, G1, etc. Our best AI has not squeezed all the juice out of these frontier hardware. The body is more capable than what the brain can command. Yet babysitting these robots demands an entire operation team. Unlike humans, robots don't heal from bruises. Overheating, broken motors, bizarre firmware issues haunt us daily. Mistakes are irreversible and unforgiving. My patience was the only thing that scaled.
2. Benchmarking is still an epic disaster in robotics. LLM normies thought MMLU & SWE-Bench are common sense. Hold your 🍺 for robotics. No one agrees on anything: hardware platform, task definition, scoring rubrics, simulator, or real world setups. Everyone is SOTA, by definition, on the benchmark they define on the fly for each news announcement. Everyone cherry-picks the nicest looking demo out of 100 retries. We gotta do better as a field in 2026 and stop treating reproducibility and scientific discipline as second-class citizens.
3. VLM-based VLA feels wrong. VLA stands for "vision-language-action" model and has been the dominant approach for robot brains. Recipe is simple: take a pretrained VLM checkpoint and graft an action module on top. But if you think about it, VLMs are hyper-optimized to hill-climb benchmarks like visual question answering. This implies two problems: (1) most parameters in VLMs are for language & knowledge, not for physics; (2) visual encoders are actively tuned to *discard* low-level details, because Q&A only requires high-level understanding. But minute details matter a lot for dexterity. There's no reason for VLA's performance to scale as VLM parameters scale. Pretraining is misaligned.
Video world model seems to be a much better pretraining objective for robot policy. I'm betting big on it.

ML TLDR retweeted

Gowthami @gowthami_s · Sep 26

🎬 Finally got time to go through the "Video Models Are Zero-Shot Learners and Reasoners" paper. The impressive results aside, I want to thank the GDM team for compiling / sharing a wide range of visual tasks, likely to become a key benchmark in the coming years!
This paper also highlights how thorough Google is in terms of evaluations (tbh it's been evident over the years - the Flamingo and Genie papers also eval on an insane number of tasks!) If their eval set is so task-rich - imagine how diverse their training set might've been for training these models. :)
The authors curated around 62 tasks, which are broadly classified into 4 categories: Perception, Modeling, Manipulation and Reasoning. While Veo3 isn't the best model out there for any of these tasks, it's a good generalist model, which performs reasonably well on most of the tasks without task-specific training! (akin to most generalist LLMs circa 2023) A 🧵 -

ML TLDR retweeted

Gowthami @gowthami_s · Jun 25

Here’s my take on why coding interviews are a lossy approximation of a candidate’s ability. Most of the time, a candidate gets rejected cuz they aren’t able to finish the question; however, real-world coding is not time bound. You can be an excellent programmer, but how well you do in an interview depends on how in-distribution the interview question is, or how comfortable you are with the interview format. So the system is rigged to reward people who are good at interviewing rather than someone who can solve a (perhaps) out-of-distribution problem given enough time. Another thing I noticed is, given how good coding LLMs are… you should be testing general understanding, like why you do a certain thing or how you eval something, rather than how well you can rote-memorize Attention forward and backward passes. 🤷‍♀️

ML TLDR @MLsummaries · Jun 23

OG paper link - arxiv.org/abs/2305.13245

ML TLDR @MLsummaries · Jun 23

Grouped-Query Attention (GQA) in PyTorch-style code -
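The code from the original post is not reproduced here. As a stand-in, here is a minimal PyTorch sketch of the idea, not the authors' implementation. The function name `grouped_query_attention` and the tensor layout `(batch, heads, seq, head_dim)` are assumptions made for illustration, and causal masking and the input/output projections are left out to keep it short.

```python
import math

import torch


def grouped_query_attention(q, k, v):
    """Attention where several query heads share one key-value head.

    q:    (batch, n_q_heads,  seq, head_dim)
    k, v: (batch, n_kv_heads, seq, head_dim), with n_q_heads % n_kv_heads == 0
    """
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_q_heads // n_kv_heads

    # Repeat each KV head group_size times along the head axis, so query heads
    # [0, group_size) use KV head 0, the next group uses KV head 1, and so on.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)

    # Standard scaled dot-product attention from here on (no mask in this sketch).
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    weights = torch.softmax(scores, dim=-1)
    return weights @ v


# Example: 8 query heads sharing 2 KV heads (groups of 4).
q = torch.randn(2, 8, 5, 16)
k = torch.randn(2, 2, 5, 16)
v = torch.randn(2, 2, 5, 16)
out = grouped_query_attention(q, k, v)
print(out.shape)
```

Setting `n_kv_heads` equal to `n_q_heads` recovers ordinary multi-head attention, and setting it to 1 gives multi-query attention. Only the smaller `k` and `v` tensors would need to be cached during decoding; the repeat here is a convenience for the matrix multiply.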

ML TLDR @MLsummaries · Jun 23

In standard multi-head attention, every query head has its own key and value head, which is costly at inference because all of them must be cached. GQA ties multiple query heads to shared key-value heads, shrinking the KV cache and its memory footprint.
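To put rough numbers on the KV cache saving, here is a back-of-the-envelope sketch. The model dimensions are hypothetical (32 layers, 32 query heads, head size 128, 4096-token context, 2-byte values) and are not taken from the thread or from any specific model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Bytes needed to cache keys and values for a batch of sequences."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


# Multi-head attention: one KV head per query head (32 of each).
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
# GQA: the same 32 query heads share 8 KV heads (groups of 4).
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)

print(mha / 2**30, "GiB")  # 2.0 GiB
print(gqa / 2**30, "GiB")  # 0.5 GiB
```

The cache scales linearly with the number of KV heads, so sharing each KV head across 4 query heads cuts it by 4x per sequence under these assumptions.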

ML TLDR @MLsummaries · Jun 23

What is Grouped-Query Attention (GQA)? A recent twist on attention that speeds up inference with little loss in quality, made famous in models like Llama2 and Mistral. 🧵 A quick explainer: #LLMs

ML TLDR @MLsummaries · Jan 16

For those interested in robotics and computer vision, this talk by Prof. Mallick is a must-attend. When ⏰ - Monday Jan 17th, 1pm EST (10am PST) Where📍- given YouTube link! #deeplearning #robotics #vision ow.ly/6xhs30s7q7s

ML TLDR @MLsummaries · Oct 16

@AlejandroPiad @FaneleChester Hi, unfortunately none of us have done any ML boot camp. We all took the long way and are doing PhDs! 😅 But deeplearning.AI’s specialization is the quickest way to catch up with most of the important concepts of deep learning as of today.

Alejandro Piad Morffis @alepiad · Oct 16

@FaneleChester This is sadly out of my league, but maybe our friends at @MLsummaries can help? Or anyone else who's done a boot camp?

Alejandro Piad Morffis @alepiad · Oct 16

Hey folks 🖖! 🎙️ It's Saturday again! Let's do another Q&A session. Ask me anything about Computer Science, AI, and Machine Learning. I cannot promise to give you the answer, but maybe we can figure it out together 😉! Feel free to weigh in and answer as well 👇!

ML TLDR @MLsummaries · Sep 1

@neilhoulsby @lucidrains @ykilcher @labmlai Read our complete TLDR version of the paper in our blog! Contributed by @gowthami_s Blog link: medium.com/ml-summaries/m…

ML TLDR @MLsummaries · Sep 1

@neilhoulsby @lucidrains @ykilcher Annotated code of the model by @labmlai here: nn.labml.ai/transformers/m… #python #100daysofcode #mlpmixer

ML TLDR @MLsummaries · Sep 1

In this thread, we will discuss the MLP-Mixer from Google by Ilya Tolstikhin, @neilhoulsby et al. MLP-Mixer is a novel convolution-free, attention-free architecture with results on par with ResNets and ViTs. A 🧵 Link: arxiv.org/abs/2105.01601 #DeepLearning #ComputerVision
