Matin Urdu

23 posts

@DariusMatin_

Masters Data Science @ETH_en. Currently at @Princeton as a Visiting Student Research Collaborator constructing world models for generalist agents 🏗️🌎🤖.

Princeton, New Jersey Joined May 2026
20 Following 6 Followers
Matin Urdu
Matin Urdu@DariusMatin_·
@neetcode1 Does Aristotle’s politics really cover how technology shapes the division of labour over time? We only had to read a summarised version of it in high school but maybe reading it in full is a good idea…
English
1
0
3
483
NeetCode
NeetCode@neetcode1·
It would be the funniest outcome for sure. If you told someone 1000 years ago we would have electricity, planes, rockets, phones and the internet, and unlimited food, they would probably think there weren’t enough jobs too. Idk if he actually means it, but I think I’m team Bezos on this one. Per the Cloudflare CEO’s recommendation to Dario, I’ve been skimming parts of Aristotle’s politics. So many insanely smart people miss something so obvious: technology progresses quickly, but human nature never changes.
NeetCode tweet media
English
4
4
85
12.7K
Matin Urdu
Matin Urdu@DariusMatin_·
@Bluebearmonkey @akarlin Normalising by number of researchers helps a bit, but even then China dominates the top 20 (e.g. refer to csrankings.org). I guess it is basically impossible to compete with a population of >1B taking their research seriously.
English
1
0
2
56
Anatoly Karlin 🧲💯
Anatoly Karlin 🧲💯@akarlin·
Anglo universities getting hit by a double whammy. First, AI undermines the entire value proposition of the academy. Second, Chinese universities now dominate actual high quality research (as proxied by the Nature Index). 9 of the top ten world universities by this metric are now Chinese. The old university prestige rankings might still say otherwise but that is cope and irrelevant. Universities exist to create new knowledge. The rankings will adjust to reflect this new reality or they will lose relevance. You could make a case for US and UK academia having some "intangible" value in their "unique" spirit of truth-seeking and freedom of inquiry. However, they voluntarily destroyed that reputation in a decade of Wokeness 🤷‍♀️. So now there's just the hard metrics, and they're not looking too good.
Anatoly Karlin 🧲💯 tweet media
OK Then@okaythenfuture

I’ve written this before but we’re already in the early stages of a massive Anglo university bust (US/UK/Canada/Australia). It’ll intensify in the 2030s but thousands of colleges are going to go out of business. Terrible student demographics, far fewer Chinese students wanting to study abroad, not enough wealthy students from the rest of the world to replace said Chinese, and AI just completely destroying most entry level knowledge work and eventually up the ladder as well. Only truly elite universities and public universities will survive IMO. Mid tier schools like Syracuse, a lot of the small private liberal arts schools, and anything that’s private and low tier are absolutely finished. Universities will go back towards being an elite privilege. The golden age of the university was fun while it lasted.

English
21
53
360
28K
Matin Urdu
Matin Urdu@DariusMatin_·
@Bluebearmonkey @akarlin This kinda proves his point. Those rankings reflect general prestige but many of the listed universities do not have a very strong research output. If prestige matters less in the future (daring hypothesis), then the ranking will shift to China’s favor.
English
1
0
2
63
Blue Bear
Blue Bear@Bluebearmonkey·
@akarlin C’mon man, what are you talking about? Lookit how the top 10 schools have maintained their rankings over decades
Blue Bear tweet media
English
1
1
11
696
Matin Urdu
Matin Urdu@DariusMatin_·
@kamilkazani Pompey gave up Rome without a fight, but technically he did initially control Rome.
English
0
0
0
150
Kamil Galeev
Kamil Galeev@kamilkazani·
It seems that in Roman civil wars, the factions controlling Levant always lost to the forces from the European part of the empire. The only exception I can think of is the last Severans (the Emesene dynasty), when a Syrian-based faction vanquished the forces of Europe -> seized the throne
English
19
3
125
20.9K
Matin Urdu
Matin Urdu@DariusMatin_·
@samlakig Still the most insane proof technique I saw. It actually feels magical when you see (or read) it for the first time.
English
0
0
0
46
sam laki
sam laki@samlakig·
sam laki tweet media (2 images)
ZXX
2
2
28
3.4K
Matin Urdu
Matin Urdu@DariusMatin_·
@corsaren The problem with these classifiers is that they are (as social media sites had to learn themselves) either way too eager or easily circumventable (which is why everyone says “self-delete” and “bundle-of-sticks-ism” on YouTube instead of the real terms).
English
0
0
1
18
corsaren • vibecamping on sat
I don’t code or do ML research, so I’m not worked up like the rest of the tl, but I did spend the last two weeks reading about Ant’s safeguard systems…and I’m still very confused? They’ve been building Constitutional Classifiers for over a year now! And they can’t differentiate AP bio questions from gain of function research???
corsaren • vibecamping on sat@corsaren

Judging by what sort of content is getting hit on the ML side, I’d say their Classifiers are just like, surprisingly bad. Which is weird because these types of Classifiers aren’t exactly a new part of their safety pipeline. x.com/semianalysis_/…

English
6
4
44
2.1K
Matin Urdu
Matin Urdu@DariusMatin_·
@MikaStars39 The main problem with offline RL is that, until recently, there were few high-quality datasets. And the good datasets are mostly taken from simulated envs, which means you will face a sim-to-real gap. There are promising approaches to scale online RL (paper: 1000 layers for SSL RL).
English
0
0
0
180
MikaStars★
MikaStars★@MikaStars39·
I'm becoming gradually skeptical about the scaling of RL. Considering many scenarios, e.g., organic synthesis is hard to roll out and verify at RL-training time. I believe there are many frontier-science domains where lots of people would be willing to produce a great deal of real trajectories for training, but none of this can scale within an RL environment. It seems that offline RL / SFT / preference learning can scale more broadly, rather than online RL. 🤔
English
2
2
35
3.1K
Matin Urdu
Matin Urdu@DariusMatin_·
@jp54362 You really don’t need a lot of math. Most of the math used in papers is used as compact notation instead of pseudo code. Just push through, you don’t need to know advanced stuff like graduate level measure theory unless you want to do theory work for diffusion models.
English
0
0
0
17
Jaysen ♨️
Jaysen ♨️@jp54362·
if a student is obsessed with AI but weak at math, should he: A) struggle through AI/ML and learn the math or B) accept he's not built for the technical side and focus on AI content instead what should he do?
English
6
1
14
655
Matin Urdu
Matin Urdu@DariusMatin_·
@yoavgo AlphaFold also struggles with out of distribution protein sequences. Most bioinfo people I spoke to emphasised how absurdly good the predictions were for known/similar sequences and how god-awful the OOD predictions were.
English
0
0
3
123
(((ل()(ل() 'yoav))))👾
calling the outputs of AlphaFold "discoveries" vs the outputs of LLMs which are "not capable of being discoveries" is kinda absurd, and it works only by reducing LLMs to a much narrower definition. and even then i dont think it holds.
Richard Sutton@RichardSSutton

A new and possibly controversial perspective: In this video, I explain the sense in which generative AI trained by supervised learning is incapable of making novel discoveries. youtu.be/K5LAFEjTlBA The text of the speech: AI Creativity and Discovery Good day ladies and gentlemen. I regret that I am unable to be with you all today to engage in a back-and-forth discussion, but I am nevertheless pleased to be able to share with you, via this recording, some high-level thoughts about the current and future state of artificial intelligence, and in particular about AI’s relationship to science and mathematics, which is, as I understand it, the central focus of this meeting and of the SAIR Foundation. I would like to start with an old joke; I am sure you have heard it before. It is the one about the researcher whose work is being evaluated, and the review comes back, and says “This work is both novel and good. Unfortunately, the parts that are good are not novel, and the parts that are novel are not good.” My first point about AI is that this assessment applies exactly to large parts of AI as we know it today. Not all of today’s AI, but a large part of it. Pretty much all of what we mean by “Generative AI”---which includes large language models, and the images and video models, and even the new methods for learning world models. All of these AIs take large numbers of examples and produce a “model” which behaves similar to the examples, that is, which generates text like people, or images like artists or nature, and videos like we find on the internet. Don’t get me wrong, Generative AI can be extremely useful. No doubt about that. But the assessment of the joke still applies. These systems can produce output that is both novel and good, but not at the same time. In many ways this is just absolutely not a problem. When we ask an AI for an answer from the internet, or to summarize a document, we don’t want it to be novel. 
We are happy if the quality of the answer, the goodness, comes from the source material—from the people who wrote the document or the articles on the internet. If the AI’s answer is novel it means it is going beyond the source material, adding something beyond it. This is what we call “hallucinations”. In most cases, we don’t like it when the AI makes something up, when it adds something novel. One exception, of course, is when we are looking not for facts or reality, but for fiction and entertainment. We might ask for a bedtime story for a child, or an image based on existing images on the internet but which is nevertheless different and distinct from them. In these cases, it is never easy for us to know how creative the AI is actually being, as we do not know how close the AI’s story, poem, or image is to the source material. In a real practical sense we can not know this because the internet is too big, the possible sources that the AI may draw upon are too numerous. When we ask for a fiction or novelty, the AI can give it to us because its processing is in part stochastic. Every decision can go multiple ways and will go different ways and produce a different trajectory every time. The trajectory can be random—and thus novel—or it can be based on the training data—and thus “good” because the training data is good, sourced from people or reality. Thus, the trajectory is either novel or good—based on randomness or based on data—but never both at the same time. Really, I think it is okay if the output of Generative AI is never good and novel at the same time. For the researcher in the joke this is a devastating criticism, but for most things it is not, and for Generative AI it is not. Generative AI is meant to be a mimic. This is what supervised learning is for. Generative AI can be extremely useful, even when it just mimics, if it is faster, or cheaper, or smaller, or more customizable, or more copy-able, than the thing being mimicked. 
It is okay if Generative AI cannot be both novel and good at the same time. It is still a transformative technology. But it is a limitation. And remember we are here to use AI for science and mathematics, and for these areas the assessment of the reviewer in the joke is devastating. For these areas we need true creativity and discovery. Generative AI, or Mimicking AI, will never get us there. For these we need something more, and indeed we have something more in other parts of AI. We have many AI systems which can give us more. We have AlphaGo with its world-changing move 37, or AlphaZero with its brilliant original chess-playing style. We have GT-Sophy that drives simulated racecars better than any human. We have AlphaFold and AlphaProof and Claude-Code, which have brought true advances in science, mathematics, and programming. We have RL-Lyft which optimizes the assignment of cars to passengers in the ride-hailing business. All these systems have found things that are both novel and good. And, truth be told, some language models have been augmented in ways that make them more than Generative AI based on supervised learning. All these systems have some additional features that make them capable of true creativity and true discovery. It is important for us to recognize what this is, and that it is not present in ordinary, garden-variety Generative AI. It is something that can not come from just supervised learning, from learning from examples. What is it? Well, it is a simple thing, a commonsense thing. It is not new. We have many names for it, but unfortunately none of them are very good names. I will call it Discovery. Basically, Discovery is just the idea of trying many things and seeing which of them work, then keeping those that worked the best. Evolution by natural selection works this way. The scientific method works this way. And just ordinary life and learning works this way. We try things and remember what works. What could be more obvious? 
In this behavioral case, psychology has two names for it— “instrumental learning” and “operant conditioning”—and in machine learning it is what we mean by “reinforcement learning”. We also see the idea of Discovery in planning and combinatorial search—anything that involves the idea of “generate and test”. The essence of Discovery is to combine three steps: 1. Variation, 2. Evaluation, and 3. Selective retention. Of course, I am not the first to say this. I am not the first to point out that this combination of steps is key to science, to evolution by natural selection, and to animal behavior. I think particularly of papers by Donald Campbell, by Daniel Dennett, and by Gary Cziko. What is new in my remarks is to directly relate the idea of Discovery to modern AI to help us see that it is not present in supervised learning or Generative AI—in particular, that Discovery is not present in backpropagation or gradient descent. Let me say explicitly what is missing from Generative AI. As we have remarked, these systems do have a stochastic aspect, so they do generate a variety of trajectories and behavior. What is missing is the Evaluation step. The generator was pre-trained by supervised learning, leaving no way at runtime to Evaluate what it generates. And of course without Evaluation there can be no Selective retention, and thus no Discovery. The variation can bring novelty, but without evaluation there is no Discovery, and arguably, no creativity. That is, I would say that creativity requires that the new things generated be Evaluated. Without evaluation, and retention of the best, there is nothing created. The novelty flickers into existence but, if its value is unrecognized, it flickers away and is lost. In many cases, Evaluation is done by people to make a discovery. As when we have Generative AI make many pictures for us, and then we pick the one that we like the best. The human+AI system completes the discovery. 
In many other cases, the Evaluation comes from a clear objective. Some moves lead to checkmate, some steps lead to a proof, some actions result in high reward, some genotypes make more copies, some theories explain the data better. Some prefer the Variation step to be called Blind variation, where “blind” here means that it is uninformed, a shot in the dark. It does not need to be completely uninformed; a good scientist does not select theories to test at random. But neither can it be completely informed and determined. There must be some uncertainty about where the answer lies in order for there to be a discovery. In practice, the variation is partly informed and partly blind, but it is the blind part that corresponds to the discovery. Now let us briefly go all the way to modern deep learning, to the backpropagation algorithm. At first it might seem that backpropagation is incapable of discovery because it is deterministic and thus incapable of variation. But this is not correct. The weight updates of backprop are deterministic, but the weights are initialized to small random values. The random initialization is often downplayed, but in fact it is a necessary form of variation; it must be done properly to get good performance. In backprop this Variation is done once, at network initialization, so its effect is temporary, and later the network may lose its ability to learn. This is the weakness of deep learning that is alleviated with a new algorithm that my group presented in Nature a couple of years ago. Our “continual backpropagation” made one small change: every so often a less-used neuron would be re-initialized to small random weights. This allows the variation to continue and plasticity to be retained. Although there is much more to be said about Creativity and Discovery, this is the key point: they are more than supervised learning, more than pattern recognition, more than prediction, and more than world modeling. 
Those things are important, but they alone will not bring us to discovery. Discovery requires Evaluation from a person or from an explicit goal, and only in the latter case will we attain full autonomy. So that is my call to arms. If we want the full power of AI scientists, then we should share the goals with them so they can create, evaluate, discover, and in these ways fully participate in achieving the goals. Let’s be bold! Let’s fully automate Creativity and Discovery!
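
The three steps named in the speech above (variation, evaluation, selective retention) can be sketched as a small generate-and-test loop. This is only an illustration under assumptions and is not taken from the speech: the names `discover`, `objective`, and `gaussian_step`, the toy objective, and the step size are all hypothetical.

```python
import random


def discover(evaluate, initial, vary, steps=200, seed=0):
    """Generate-and-test loop: vary, evaluate, keep the best so far."""
    rng = random.Random(seed)
    best = initial
    best_score = evaluate(best)
    for _ in range(steps):
        candidate = vary(best, rng)      # 1. Variation (partly blind)
        score = evaluate(candidate)      # 2. Evaluation against an explicit goal
        if score > best_score:           # 3. Selective retention
            best, best_score = candidate, score
    return best, best_score


def objective(x):
    # Hypothetical goal: get as close as possible to x = 3.
    return -(x - 3.0) ** 2


def gaussian_step(x, rng):
    # Hypothetical variation operator: a small random perturbation.
    return x + rng.gauss(0.0, 0.5)


best, score = discover(objective, initial=0.0, vary=gaussian_step)
print(best, score)
```

Dropping the `evaluate` call leaves only random drift, which matches the speech's claim that variation without evaluation produces novelty but nothing that is retained.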

English
7
4
85
20.3K
Matin Urdu
Matin Urdu@DariusMatin_·
@menhguin Intro to causal inference might be even more helpful.
English
0
0
6
10K
Matin Urdu
Matin Urdu@DariusMatin_·
@mayabechlerspei Really cool! This seems to be very similar to the idea of jumping knowledge networks for GNNs (but done in a clever way).
English
0
0
0
241
Maya Bechler-Speicher
Maya Bechler-Speicher@mayabechlerspei·
🤔 Why do we still rely on the final layer of an LLM, when different layers encode different information? 🤔 In our new work, “Improving LLM Final Representations with Inter-Layer Geometry” (ICLR 2026 Workshop on Geometry-grounded Representation Learning and Generative Modeling) we show that actually, LLMs do not have one “best” layer. We introduce the Cayley-Encoder: an efficient and effective geometric encoder that learns one strong representation from all layer representations of the LLM, without biasing the representation toward any specific layer. While adding at most 0.1% learned parameters to the LLM, the Cayley-Encoder achieves large empirical gains over LoRA fine-tuning, final-layer representations, expensive attention-based aggregation, and methods that optimize specific layers for the task.
English
9
26
247
16.4K
Matin Urdu
Matin Urdu@DariusMatin_·
@teortaxesTex “Significantly expanding the potential user base for NVIDIA Hardware is bad for NVIDIA.” How does one even arrive at such a take?
English
0
0
4
210
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
didn't even need to check the account location info, the "better CUDA kernels should make NVIDIA investors sweat" take was a dead giveaway
How To Prompt@HowToPrompt__

ByteDance has published a paper that should make every NVIDIA investor sweat. They trained an AI that writes CUDA better than human experts. They call it CUDA Agent. And it completely rewrites the economics of AI hardware. They built a massive agentic reinforcement learning loop. The AI writes a kernel, compiles it, profiles the hardware, analyzes the bottlenecks, and rewrites the code until it's flawless. It learned how to optimize memory access patterns and hardware tiling strategies that traditional compilers miss. The results are staggering. On the industry-standard KernelBench, CUDA Agent completely destroyed traditional compilers. It delivered code that runs up to 3.2x faster than PyTorch's native execution. On the hardest, most complex models, it beat the strongest proprietary models in the world, including Claude Opus 4.5 and Gemini 3 Pro, by 40%. It didn't just match human experts. It started discovering optimizations that static compilers literally cannot see. Here is why this is a massive threat to NVIDIA. NVIDIA's dominance relies on the fact that CUDA is incredibly hard to master. Developers get locked in because optimizing code for other chips is too painful. But if an AI agent can autonomously generate hyper-optimized hardware kernels... You don't need a team of $500k a year CUDA engineers to build world-class infrastructure. And if an AI can autonomously master CUDA, it can master AMD's ROCm. Or custom silicon. The impenetrable software wall protecting NVIDIA's monopoly just got breached by a reinforcement learning loop. If anyone can automatically squeeze maximum performance out of any chip... Hardware becomes a commodity.

English
5
2
107
8.3K
Matin Urdu
Matin Urdu@DariusMatin_·
@robinhanson It also depends heavily on “where” you do your stock picking. If you only consider public companies, it is probably insanely difficult to outperform the market all on your own. Private markets, however, are probably much more inefficient…
English
0
0
0
9
Joseph Suarez 🐡
Joseph Suarez 🐡@jsuarez·
I don't care what your model benchmarks say. Codex xhigh just tried to add bubble sort to my high-perf C experience buffer
English
35
8
935
96.5K
Matin Urdu
Matin Urdu@DariusMatin_·
@mitrma Very cool work! I am curious, did you also compare this approach to the case of relabelling with every goal (on a small feasible dataset) and measure the results along the training trajectory (convergence speed, stability, final results, etc.)?
English
0
0
0
18
Michael Matthews
Michael Matthews@mitrma·
Hindsight Experience Replay has become the ubiquitous method for goal-conditioned reinforcement learning, but leaves open the question of which goal to relabel with. In this work, accepted at ICML, we propose instead simply Learning Everything All at Once (LEO). 1/
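
For readers unfamiliar with the relabeling step mentioned above, here is a minimal sketch of standard hindsight relabeling using the common "final" strategy, where a failed episode is rewritten as if the state it actually reached had been the goal. It does not show the LEO method from the thread. The `Transition` type, the sparse reward, and the 1-D integer states are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Transition:
    state: int
    action: int
    next_state: int
    goal: int
    reward: float


def sparse_reward(achieved: int, goal: int) -> float:
    # Assumed reward: 0 when the goal is reached, -1 otherwise.
    return 0.0 if achieved == goal else -1.0


def relabel_final(episode: List[Transition]) -> List[Transition]:
    """Relabel every transition with the state reached at the end of the episode."""
    new_goal = episode[-1].next_state
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


# The agent aimed for state 5 on a 1-D chain but only reached state 2.
episode = [
    Transition(state=0, action=1, next_state=1, goal=5, reward=-1.0),
    Transition(state=1, action=1, next_state=2, goal=5, reward=-1.0),
]
relabeled = relabel_final(episode)
print(relabeled)
```

Choosing the final state is only one option. Other variants sample a later state from the same episode, and that choice is the open question the post refers to.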
English
4
31
212
25.6K
Matin Urdu
Matin Urdu@DariusMatin_·
@gabriel1 Podcasts are even worse. Music can at least have an energising effect when doing physical tasks or rote/dull repetitive tasks lol…
English
0
0
0
7
Matin Urdu
Matin Urdu@DariusMatin_·
@blancmontagnard Mamba uses state space methods and a lot of control theory researchers are trying to unify RL with traditional control theory. Basic methods like the Kalman filter are still being used and will continue to be used! The sentiment is a bit too pessimistic.
English
0
0
3
108
Matin Urdu
Matin Urdu@DariusMatin_·
@Mio_Mind @GordonGekko420 Why not use levels.fyi for both countries to make the comparison fair? Obviously the differences are extremely exaggerated, but there is a significant difference.
Matin Urdu tweet media
English
0
0
2
31
Matin Urdu
Matin Urdu@DariusMatin_·
@teortaxesTex Yeah, I had an issue which involved compiling some C/C++ package for a python project which I threw at 5.5 after Deepseek-Pro failed and it still struggled until I gave it some "hints". The hints essentially involved literally just copy-pasting the (relevant) CMake errors lol.
English
0
0
0
10