Dev Nag

1.9K posts

@devnag

Founder/CEO, @TryQueryPal. Previously Founder/CTO, @WavefrontHQ (funded by @Sequoia, acq by @vmware). Oregon-born, universe-raised

San Francisco, CA · Joined April 2007
1.1K Following · 1.3K Followers
Pinned Tweet
Dev Nag @devnag
The Periodic Table of Machine Learning (Part 1 in a series): medium.com/@devnag/what-separates-us-from-ai-part-1-the-periodic-table-of-machine-learning-508b237624c3
Dev Nag @devnag
@scottbelsky The category mistake being made is like Plato’s cave — people mistake the content (the moving shadows) for the humans behind it. It’s just a proxy for connection, and it’s always been a proxy.
scott belsky @scottbelsky
contrarian take on generative-only social apps and experiences: they will fail to sustain engagement for 3 reasons:
1. the “ego analytics” that cause ppl to come back & see who engaged with their content are missing when AI made the content. no ego = no obsession.
2. the lack of human craft/taste required to make the content will translate into a lack of care among the consumers of the content. without a human story behind the story, do we care?
3. the ease of creation will accentuate the shallowness of the subject matter. we have a higher bar when it requires friction to make something. but if it’s super easy, there is no bar.
Dev Nag @devnag
@tobi Talked a few weeks ago about why there’s a context gap at all — context is all the unsaid stuff, usually outside the training data, and therefore in the blinders of AI linkedin.com/posts/devnag_g…
tobi lutke @tobi
I really like the term “context engineering” over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.
Dev Nag retweeted
Jianren Wang @wang_jianren
(1/n) Since its publication in 2017, PPO has essentially become synonymous with RL. Today, we are excited to provide you with a better alternative - EPO.
Dev Nag retweeted
himanshu @himanshustwts
lowkey i think ilya 30u30 needs an upgrade to 50u50 now. btw if you go through all of these, you’ll know at least 70% of what matters today
[image]
Dev Nag retweeted
Daniel Litt @littmath
In this thread I'll record some brief impressions from trying to use o3/o4-mini (the new OpenAI models) for mathematical tasks.
Dev Nag retweeted
dr. jack morris @jxmnop
pretty mind-blowing fact I just learned about transformer language models: the positional embeddings don't really do anything. you can just get rid of them and the model still works just as well.

sounds impossible, doesn't it? turns out standard LLMs aren't actually permutation-invariant because of the causal mask. so they just learn somehow to "figure out" what position they're at by counting the number of tokens they can see at a given position p.

crazy
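The claim in that tweet can be checked with a toy numpy sketch (random, untrained weights — purely illustrative): with full bidirectional attention and no positional embeddings, permuting the input tokens just permutes the outputs, so position carries no information; adding a causal mask breaks that equivariance, which is exactly the leak the model can exploit to infer position.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                       # sequence length, model dim
X = rng.normal(size=(T, d))       # token embeddings, NO positional embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attention(X, causal):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    S = Q @ K.T / np.sqrt(d)
    if causal:
        # mask out future positions -- this depends on position, not content
        S = np.where(np.tril(np.ones((T, T), dtype=bool)), S, -np.inf)
    A = np.exp(S - S.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

perm = np.arange(T)[::-1]         # reverse the sequence

# Bidirectional: permuting inputs permutes outputs -> positions are invisible
full = attention(X, causal=False)
full_perm = attention(X[perm], causal=False)
print(np.allclose(full[perm], full_perm))   # True: permutation-equivariant

# Causal: the mask is tied to position, so equivariance breaks
caus = attention(X, causal=True)
caus_perm = attention(X[perm], causal=True)
print(np.allclose(caus[perm], caus_perm))   # False: position leaks in
```

A token at position p attends over exactly p tokens, and that changing normalization is one signal the model can learn to read position from.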
Dev Nag @devnag
@NaveenGRao Doesn’t this assume that text I/O can be distilled but app I/O (text, video, gestures) can’t? We’ve already seen distillation on other non-textual modes (arxiv.org/abs/1812.02699), and UI cloning is almost here (with approaches like bolt.new)
Naveen Rao @NaveenGRao
Prediction: all closed AI model providers will stop selling APIs in the next 2-3 years. Only open models will be available via APIs.

Why? For an open model service, the value prop is clear... it's hard to build a scalable service to access the model, and the model is a commodity. The race-to-the-bottom happened with the commodity already (the model). Let AI app builders iterate on great UIs for apps upon scalable services with commodity capabilities.

Closed model providers are trying to build non-commodity capabilities, and they need great UIs to deliver those. It's not just a model anymore, but an app with a UI for a purpose. If closed models are available via API, all it does is create competition for the app the closed provider is building. The secret sauce is capabilities + UI.
Dev Nag retweeted
Prithviraj (Raj) Ammanabrolu @rajammanabrolu
Simply, no. I've been looking at my old results from doing RL with "verifiable" rewards (math puzzle games, python code to pass unit tests) starting from 2019 with GPT-1/2 to 2024 with Qwen Math.

Deepseek's success likely lies in the base models improving; the RL is constant.
Kevin Patrick Murphy @sirbayes
Is it feasible to do a true tabula rasa version of deepseek R1 zero, starting from an LLM with random weights, similar to alpha zero? Or is starting with an LLM which is pre trained on math required?

Dev Nag retweeted
Peter Schmidt-Nielsen @ptrschmdtnlsn
Another key reason people are spooked: around 2016ish we started seeing the *insane* power of purely self-improving Reinforcement Learning (RL) (think AlphaZero going from no knowledge to superhuman at chess in hours), and it was formative for a lot of folks, in terms of their expectations about the progression of AI systems.

Frankly, a lot of us have been waiting for the RL self-play shoe to drop with language models, but for some reason it has been surprisingly stubborn and hasn't happened much before. There seems to be at least somewhat of a breakthrough here on that front, which is spooky.

The thing that makes RL scary is the ability to keep self-improving by inventing ever harder tasks for yourself that you can practice against, to keep getting higher and higher quality training data. The traditional sort of "train on the internet + some finetuning at the end" picture doesn't seem likely to crazily spiral out of control in terms of skill, while "invent new programming problems for yourself, then solve them" has much more of the flavor of something that could keep self-improving.

To paint an extremely over-simplified picture of how these self-improving RL systems work: They have some part for making decisions, which we'll call the "policy" (it's just the standard term), and then some sort of mechanism for turning this policy into a better policy. In the case of chess in AlphaZero, the policy picks out a move, and the mechanism for turning the policy into a better one is to do tree search. In other words, if you have a policy that finds 1500 elo level moves, you can use that policy to find 1700 elo moves by simply searching over many possible game trees using your policy, and taking the best one.

The clever trick of AlphaZero is now this: You now *distill* those 1700 elo moves back into your policy by training on them! This maybe increases your policy to being one that finds 1502 elo moves, but that's okay, because now with tree search it finds 1702 elo moves, which you in turn distill back into your policy, getting out 1504 elo moves, and so on. In a sentence: "we keep distilling what we conclude after a lot of thinking into what we conclude intuitively in a single step of thinking, which in turn improves what we conclude with a lot of thinking, and so on".

Note that the above core trick is part of what we've all been waiting to see if someone will figure out for LLMs. The analogy is quite strong. You make up ever harder problems, and your "policy" is just what answers your LLM gives in a single step. The analogy to tree search is letting your LLM do chain-of-thought (CoT) reasoning. The hope is that a model that produces "1500 elo thoughts" shooting from the hip will, via CoT reasoning, produce "1600 elo thoughts" or something, and you can distill those back into the model to get a model that thinks 1501 elo thoughts to start with, and then you can iterate this over and over.

Frankly, a lot of us were pretty comforted to see that major labs were not seeming to be having much success with this sort of self-improving RL, so it's a big update to see something like that work so well now.

(Why you should pay any attention to what I'm saying: I worked on commercializing LLMs professionally 2019-2021, and then worked on more researchy projects on LLMs after that (e.g. paper in NeurIPS). I've implemented AlphaZero from scratch, and gotten strong models at new games, like my duck chess engine here: peter.website/duck-chess/ana…)
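The elo arithmetic in that tweet can be written out as a toy loop (all numbers are illustrative, taken from the tweet's own example, not from any real system): each round, search finds moves a fixed boost above the raw policy, and distillation captures a small fraction of that boost, so the policy ratchets upward even though each individual step is tiny.

```python
# Toy model of the AlphaZero-style "search, then distill" loop described above.
# SEARCH_BOOST and DISTILL_FRACTION are made-up constants for illustration:
# tree search finds moves ~200 elo above the raw policy, and each round of
# distillation recovers ~1% of that gap.
policy_elo = 1500.0
SEARCH_BOOST = 200.0
DISTILL_FRACTION = 0.01

history = []
for _ in range(3):
    search_elo = policy_elo + SEARCH_BOOST                        # amplify via search
    policy_elo += DISTILL_FRACTION * (search_elo - policy_elo)    # train on search moves
    history.append((round(policy_elo), round(search_elo)))

print(history)  # [(1502, 1700), (1504, 1702), (1506, 1704)]
```

The key property is that the search target moves up with the policy, so the gap never closes and the improvement compounds round after round — the "1500 → 1502 → 1504" ladder from the tweet.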
Dev Nag retweeted
Avital Balwit @AvitalBalwit
Ladies, has your Christmas vacation been ruined by the Deepseek launch? You may be entitled to compensation.
Dev Nag retweeted
Andrej Karpathy @karpathy
It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; it's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.

They don't care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can "throw an LLM at it".

Actually, as the LLM stack becomes more and more mature, we may see a convergence of a large number of problems into this modeling paradigm. That is, the problem is fixed at that of "next token prediction" with an LLM, it's just the usage/meaning of the tokens that changes per domain.

If that is the case, it's also possible that deep learning frameworks (e.g. PyTorch and friends) are way too general for what most problems want to look like over time. What's up with thousands of ops and layers that you can reconfigure arbitrarily if 80% of problems just want to use an LLM? I don't think this is true but I think it's half true.
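The "any token stream" point can be illustrated without a neural net at all. Below, the simplest possible autoregressive next-token model — a bigram counter — is fit on a stream whose tokens are discrete robot actions rather than text; the vocabulary and data are made up for illustration.

```python
from collections import Counter, defaultdict

# Tokens here are actions, not text -- any discrete vocabulary works.
stream = ["grasp", "lift", "move", "place", "grasp", "lift", "move", "place",
          "grasp", "lift", "rotate", "place"]

# Fit the simplest autoregressive model: P(next | current) via bigram counts.
counts = defaultdict(Counter)
for cur, nxt in zip(stream, stream[1:]):
    counts[cur][nxt] += 1

def predict(token):
    """Greedy next-token prediction: the most frequent continuation."""
    return counts[token].most_common(1)[0][0]

print(predict("grasp"))   # "lift": always follows "grasp" in the stream
print(predict("lift"))    # "move": 2 of 3 observed continuations
```

A transformer plays the same role as `counts` here — a (vastly better) conditional distribution over the next token — and nothing in that role depends on the tokens being language.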
Dev Nag retweeted
Christian Wolf (🦋🦋🦋) @chriswolfvision
Left: Hierarchical model based RL with a large-scale pre-trained world model, auxiliary tasks and skill-discovery and a model for inverse kinematics. Right: PID
[images]
Dev Nag @devnag
@scottbelsky Corollary: as more stages of idea generation get automated, idea evaluation / “taste” becomes the limiting factor
Dev Nag retweeted
scott belsky @scottbelsky
thinking: as execution of ideas gets easier (thx to agents, APIs for every imaginable service, etc), ideas become more of the differentiator. good ideas aren’t derived solely from logic or patterns of the past, they’re also the exhaust of human experiences and traumas, mistakes of the eye, and uniquely human ingenuity. as the wave of AI helps more (and better) ideas happen, humanity will stand out with creativity more than productivity.