ML TLDR

532 posts

@MLsummaries

Summarizing ML concepts one at a time.

Blog at
Joined March 2021
301 Following · 3.9K Followers
ML TLDR retweeted
Jim Fan
Jim Fan@DrJimFan·
Everyone's freaking out about vibe coding. In the holiday spirit, allow me to share my anxiety on the wild west of robotics. 3 lessons I learned in 2025.

1. Hardware is ahead of software, but hardware reliability severely limits software iteration speed. We've seen exquisite engineering arts like Optimus, e-Atlas, Figure, Neo, G1, etc. Our best AI has not squeezed all the juice out of these frontier hardware. The body is more capable than what the brain can command. Yet babysitting these robots demands an entire operation team. Unlike humans, robots don't heal from bruises. Overheating, broken motors, bizarre firmware issues haunt us daily. Mistakes are irreversible and unforgiving. My patience was the only thing that scaled.

2. Benchmarking is still an epic disaster in robotics. LLM normies thought MMLU & SWE-Bench are common sense. Hold your 🍺 for robotics. No one agrees on anything: hardware platform, task definition, scoring rubrics, simulator, or real world setups. Everyone is SOTA, by definition, on the benchmark they define on the fly for each news announcement. Everyone cherry-picks the nicest looking demo out of 100 retries. We gotta do better as a field in 2026 and stop treating reproducibility and scientific discipline as second-class citizens.

3. VLM-based VLA feels wrong. VLA stands for "vision-language-action" model and has been the dominant approach for robot brains. Recipe is simple: take a pretrained VLM checkpoint and graft an action module on top. But if you think about it, VLMs are hyper-optimized to hill-climb benchmarks like visual question answering. This implies two problems: (1) most parameters in VLMs are for language & knowledge, not for physics; (2) visual encoders are actively tuned to *discard* low-level details, because Q&A only requires high-level understanding. But minute details matter a lot for dexterity. There's no reason for VLA's performance to scale as VLM parameters scale. Pretraining is misaligned.

Video world model seems to be a much better pretraining objective for robot policy. I'm betting big on it.
Jim Fan tweet media
139
257
1.6K
297.1K
ML TLDR retweeted
Gowthami
Gowthami@gowthami_s·
🎬 Finally got time to go through the "Video Models Are Zero-Shot Learners and Reasoners" paper. The impressive results aside, I want to thank the GDM team for compiling and sharing a wide range of visual tasks, likely to become a key benchmark in the coming years!

This paper also highlights how thorough Google is in terms of evaluations (tbh it's been evident over the years: the Flamingo and Genie papers also eval on an insane number of tasks!). If their eval set is so task-rich, imagine how diverse their training set might've been for training these models. :)

The authors curated around 62 tasks, broadly classified into 4 categories: Perception, Modeling, Manipulation and Reasoning. While Veo 3 isn't the best model out there for any of these tasks, it's a good generalist model that performs reasonably well on most of them without task-specific training (akin to most generalist LLMs circa 2023). A 🧵 -
1
5
25
2.9K
ML TLDR retweeted
Gowthami
Gowthami@gowthami_s·
Here’s my take on why coding interviews are a lossy approximation of a candidate’s ability. Most of the time, a candidate gets rejected cuz they aren’t able to finish the question; however, real-world coding is not time-bound. You can be an excellent programmer, but how well you do in an interview depends on how in-distribution the interview question is, or how comfortable you are with the interview format. So the system is rigged to reward people who are good at interviewing rather than someone who can solve a (perhaps) out-of-distribution problem given enough time. Another thing I noticed is, given how good coding LLMs are… you should be testing general understanding, like why you do a certain thing or how you eval something, rather than how well you can rote-memorize Attention forward and backward passes. 🤷‍♀️
10
5
160
16.3K
ML TLDR
ML TLDR@MLsummaries·
Grouped-Query Attention (GQA) in PyTorch-style code -
ML TLDR tweet media
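The code image from this post is not reproduced here. As a stand-in, here is a minimal sketch of the idea in plain Python (standard library only, so deliberately not PyTorch). The function name `grouped_query_attention`, the nested-list layout, and the toy inputs are all assumptions made for illustration, and causal masking is left out to keep it short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def grouped_query_attention(q, k, v):
    """Hypothetical helper, not taken from the original post.

    q:    [n_q_heads][seq_len][head_dim]
    k, v: [n_kv_heads][seq_len][head_dim]
    Returns a list shaped [n_q_heads][seq_len][head_dim].
    No causal mask is applied in this sketch.
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    scale = 1.0 / math.sqrt(len(q[0][0]))

    out = []
    for h, q_head in enumerate(q):
        # Consecutive query heads share one key-value head.
        kv = h // group_size
        k_head, v_head = k[kv], v[kv]
        head_out = []
        for q_vec in q_head:
            weights = softmax([scale * dot(q_vec, k_vec) for k_vec in k_head])
            # Weighted sum of the value vectors, one output dimension at a time.
            head_out.append([
                sum(w * v_vec[d] for w, v_vec in zip(weights, v_head))
                for d in range(len(v_head[0]))
            ])
        out.append(head_out)
    return out


# Toy example: 4 query heads, 2 key-value heads, seq_len 2, head_dim 2.
q = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(4)]
k = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
v = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
out = grouped_query_attention(q, k, v)

# Heads 0 and 1 have identical queries and share KV head 0.
print(out[0] == out[1])  # True
```

With `n_kv_heads == n_q_heads` this reduces to ordinary multi-head attention, and with `n_kv_heads == 1` it becomes multi-query attention.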
1
0
1
131
ML TLDR
ML TLDR@MLsummaries·
In standard multi-head attention, every query head has its own key and value head, which is costly at inference. GQA ties groups of query heads to shared key-value heads, reducing KV cache size and memory.
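To put a rough number on the memory saving, here is a back-of-the-envelope sketch. The helper `kv_cache_bytes` and the configuration (32 layers, 32 query heads, head dimension 128, 4096 tokens, 2 bytes per value) are hypothetical choices for illustration, not the settings of any particular model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Approximate KV cache size: keys plus values for every layer and token."""
    # The leading factor of 2 counts one tensor for keys and one for values.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical config: multi-head attention keeps one KV head per query head.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
# Same model with GQA: the 32 query heads share 8 KV heads (groups of 4).
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)

print(mha / 2**30)  # 2.0 (GiB)
print(gqa / 2**30)  # 0.5 (GiB)
print(mha // gqa)   # 4
```

Under these assumptions the cache shrinks by exactly the group size, since only the number of key-value heads changes.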
1
0
1
140
ML TLDR
ML TLDR@MLsummaries·
What is Grouped-Query Attention (GQA)? A recent twist on attention that speeds up inference with little loss in quality, made famous by models like Llama 2 and Mistral. 🧵 A quick explainer: #LLMs
ML TLDR tweet media
1
0
4
386
ML TLDR
ML TLDR@MLsummaries·
For those interested in robotics and computer vision, this talk by Prof. Mallick is a must-attend. When ⏰ - Monday Jan 17th, 1pm EST (10am PST) Where📍- given YouTube link! #deeplearning #robotics #vision ow.ly/6xhs30s7q7s
0
2
2
0
ML TLDR
ML TLDR@MLsummaries·
@AlejandroPiad @FaneleChester Hi, unfortunately none of us have done any ML boot camp. We all took the long way and are doing PhDs! 😅 But DeepLearning.AI’s specialization is the quickest way to catch up with most of the important concepts of deep learning as of today.
0
0
2
0
Alejandro Piad Morffis
Alejandro Piad Morffis@alepiad·
Hey folks 🖖! 🎙️ It's Saturday again! Let's do another Q&A session. Ask me anything about Computer Science, AI, and Machine Learning. I cannot promise to give you the answer, but maybe we can figure it out together 😉! Feel free to weigh in and answer as well 👇!
9
7
37
0