Pinned Tweet
Dev Nag
1.9K posts

Dev Nag
@devnag
Founder/CEO, @TryQueryPal. Previously Founder/CTO, @WavefrontHQ (funded by @Sequoia, acq by @vmware). Oregon-born, universe-raised
San Francisco, CA · Joined April 2007
1.1K Following · 1.3K Followers

@scottbelsky The category mistake being made is like Plato’s cave — people mistake the content (the moving shadows) for the humans behind it. It’s just a proxy for connection, and it’s always been a proxy.

contrarian take on generative-only social apps and experiences: they will fail to sustain engagement for 3 reasons:
1. the “ego analytics” that cause people to come back and see who engaged with their content are missing when AI made the content. no ego = no obsession.
2. the lack of human craft/taste required to make the content will translate into a lack of care among the consumers of the content. without a human story behind the story, do we care?
3. the ease of creation will accentuate the shallowness of the subject matter. we have a higher bar when it requires friction to make something. but if it’s super easy, there is no bar.

@tobi Talked a few weeks ago about why there’s a context gap at all — context is all the unsaid stuff, usually outside the training data, and therefore in the blinders of AI linkedin.com/posts/devnag_g…
Dev Nag reposted

The financial services industry is uniquely positioned to advocate for collaborative AI leadership, given its vested interest in trust, transparency and global cooperation, writes @DevNag of @TryQueryPal, in @AmerBanker @BankThink. bit.ly/4jhAgvE
Dev Nag reposted

pretty mind-blowing fact I just learned about transformer language models:
the positional embeddings don't really do anything. you can just get rid of them and the model still works just as well
sounds impossible, doesn't it?
turns out standard LLMs aren't actually permutation-invariant because of the causal mask. so they somehow learn to "figure out" what position they're at by counting the number of tokens they can see at a given position
p crazy
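The causal-mask point above can be checked directly. Here is a minimal numpy sketch (all names illustrative) of single-head causal self-attention with no positional embeddings: reversing the input does not just reverse the output, so the layer is not permutation-equivariant and position information is in principle recoverable.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                    # sequence length, embedding dimension
X = rng.normal(size=(T, d))    # token embeddings, NO positional embeddings

def causal_attention(X):
    # single-head self-attention with a causal mask and no positional info
    scores = X @ X.T / np.sqrt(X.shape[1])
    mask = np.triu(np.ones((len(X), len(X)), dtype=bool), k=1)
    scores[mask] = -np.inf                       # each token sees only itself + the past
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ X

out = causal_attention(X)
out_rev = causal_attention(X[::-1])

# If causal attention were permutation-equivariant, reversing the input
# would simply reverse the output. It doesn't:
print(np.allclose(out[::-1], out_rev))   # False
```

The first token can only attend to itself (`out[0] == X[0]`), while the last token mixes the whole prefix; that asymmetry is exactly the positional signal the tweet describes.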


@NaveenGRao Doesn’t this assume that text I/O can be distilled but app I/O (text, video, gestures) can’t?
We’ve already seen distillation on other non-textual modes (arxiv.org/abs/1812.02699), and UI cloning is almost here (with approaches like bolt.new)

Prediction: all closed AI model providers will stop selling APIs in the next 2-3 years. Only open models will be available via APIs.
Why? For an open-model service, the value prop is clear: the model is a commodity, and the hard part is building a scalable service to access it. The race to the bottom already happened with the commodity (the model). Let AI app builders iterate on great UIs for apps on top of scalable services with commodity capabilities.
Closed model providers are trying to build non-commodity capabilities and they need great UIs to deliver those. It's not just a model anymore, but an app with a UI for a purpose.
If closed models are available via API, all it does is create competition for the app the closed provider is building. The secret sauce is capabilities + UI.
Dev Nag reposted

Simply, no.
I've been looking at my old results from doing RL with "verifiable" rewards (math puzzle games, python code to pass unit tests), starting from 2019 with GPT-1/2 to 2024 with Qwen Math.
DeepSeek's success likely lies in the base models improving; the RL is constant
Kevin Patrick Murphy@sirbayes
Is it feasible to do a true tabula rasa version of DeepSeek R1-Zero, starting from an LLM with random weights, similar to AlphaZero? Or is starting with an LLM which is pre-trained on math required?
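For context on what a "verifiable" reward means here, a minimal sketch: the model's generated program earns reward 1.0 only if it passes every unit test. The `solve` entry-point name and the 0/1 reward scheme are illustrative assumptions, not the setup from the experiments above.

```python
# Hypothetical sketch of a "verifiable" reward for RL on code generation.
def verifiable_reward(candidate_code: str, tests: list) -> float:
    """tests is a list of (args, expected) pairs for a function named `solve`."""
    namespace = {}
    try:
        exec(candidate_code, namespace)      # define the candidate function
        solve = namespace["solve"]           # assumed entry-point name
        return 1.0 if all(solve(*args) == want for args, want in tests) else 0.0
    except Exception:
        return 0.0                           # crashes and syntax errors earn zero

# Example: reward a model-written sorting function
code = "def solve(xs): return sorted(xs)"
tests = [(([3, 1, 2],), [1, 2, 3]), (([],), [])]
print(verifiable_reward(code, tests))   # 1.0
```

The appeal is that the reward is checkable mechanically, with no learned reward model to game.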
Dev Nag reposted

Another key reason people are spooked: around 2016ish we started seeing the *insane* power of purely self-improving Reinforcement Learning (RL) (think AlphaZero going from no knowledge to superhuman at chess in hours), and it was formative for a lot of folks, in terms of their expectations about the progression of AI systems.
Frankly, a lot of us have been waiting for the RL self-play shoe to drop with language models, but for some reason it has been surprisingly stubborn and hasn't happened much before. There seems to be at least somewhat of a breakthrough here on that front, which is spooky. The thing that makes RL scary is the ability to keep self-improving by inventing ever harder tasks for yourself that you can practice against, to keep getting higher and higher quality training data. The traditional sort of "train on the internet + some finetuning at the end" picture doesn't seem likely to crazily spiral out of control in terms of skill, while "invent new programming problems for yourself, then solve them" has much more of the flavor of something that could keep self-improving.
To paint an extremely over-simplified picture of how these self-improving RL systems work: They have some part for making decisions, which we'll call the "policy" (it's just the standard term), and then some sort of mechanism for turning this policy into a better policy. In the case of chess in AlphaZero, the policy picks out a move, and the mechanism for turning the policy into a better one is to do tree search.
In other words, if you have a policy that finds 1500 elo level moves, you can use that policy to find 1700 elo moves by simply searching over many possible game trees using your policy, and taking the best one. The clever trick of AlphaZero is now this: You now *distill* those 1700 elo moves back into your policy by training on them! This maybe increases your policy to being one that finds 1502 elo moves, but that's okay, because now with tree search it finds 1702 elo moves, which you in turn distill back into your policy, getting out 1504 elo moves, and so on.
In a sentence: "we keep distilling what we conclude after a lot of thinking into what we conclude intuitively in a single step of thinking, which in turn improves what we conclude with a lot of thinking, and so on".
Note that the above core trick is part of what we've all been waiting to see if someone will figure out for LLMs. The analogy is quite strong. You make up ever harder problems, and your "policy" is just what answers your LLM gives in a single step. The analogy to tree search is letting your LLM do chain-of-thought (CoT) reasoning. The hope is that a model that produces "1500 elo thoughts" shooting from the hip will, via CoT reasoning, produce "1600 elo thoughts" or something, and you can distill those back into the model to get a model that thinks 1501 elo thoughts to start with, and then you can iterate this over and over.
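The 1500 → 1502 → 1504 elo story above can be caricatured in a few lines of Python. The constants below (a fixed search gain, a small distillation rate) are made-up illustrative numbers, not AlphaZero's actual dynamics; the point is only the shape of the loop: amplify with search, distill back, repeat.

```python
# Toy numeric caricature of the amplify-then-distill loop described above.
def improve(policy_elo: float, steps: int,
            search_gain: float = 200.0,      # search/CoT finds ~200 "elo" more
            distill_rate: float = 0.01) -> float:
    for _ in range(steps):
        search_elo = policy_elo + search_gain                    # amplify with search
        policy_elo += distill_rate * (search_elo - policy_elo)   # distill back into policy
    return policy_elo

print(improve(1500, 1))    # 1502.0
print(improve(1500, 100))  # 1700.0 -- the raw policy now plays at the old search level
```

Each iteration the raw policy inches toward what it previously needed search to reach, and search keeps moving the target further out.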
Frankly, a lot of us were pretty comforted to see that major labs were not seeming to be having much success with this sort of self-improving RL, so it's a big update to see something like that work so well now.
(Why you should pay any attention to what I'm saying: I worked on commercializing LLMs professionally 2019-2021, and then worked on more researchy projects on LLMs after that (e.g. paper in NeurIPS). I've implemented AlphaZero from scratch, and gotten strong models at new games, like my duck chess engine here: peter.website/duck-chess/ana…)
Dev Nag reposted

The bandwidth of this single chip is the entire internet’s traffic???
WHAT
We are nowhere close to seeing the top of intelligent systems
Tsarathustra@tsarnick
Jensen Huang shows off the NVIDIA GB200 NVL72: a data center superchip with 72 Blackwell GPUs, 1.4 exaFLOPS of compute and 130 trillion transistors
Dev Nag reposted

It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; it's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.
They don't care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can "throw an LLM at it".
Actually, as the LLM stack becomes more and more mature, we may see a convergence of a large number of problems into this modeling paradigm. That is, the problem is fixed at that of "next token prediction" with an LLM, it's just the usage/meaning of the tokens that changes per domain.
If that is the case, it's also possible that deep learning frameworks (e.g. PyTorch and friends) are way too general for what most problems want to look like over time. What's up with thousands of ops and layers that you can reconfigure arbitrarily if 80% of problems just want to use an LLM?
I don't think this is true but I think it's half true.
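The "any token stream" point can be made concrete with a toy autoregressive model (bigram counts standing in for a transformer) that never looks at what its tokens mean. The vocabularies below are invented examples; the same fit/predict machinery serves both.

```python
# Toy sketch: an autoregressive model that is agnostic to token meaning --
# text chunks, image patches, action choices, molecules all look the same.
from collections import Counter, defaultdict

def fit_bigram(streams):
    """Count next-token frequencies over a list of token streams."""
    counts = defaultdict(Counter)
    for stream in streams:
        for prev, nxt in zip(stream, stream[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Greedy next-token prediction: most frequent successor."""
    return counts[token].most_common(1)[0][0]

# Same machinery, two different "vocabularies":
text_streams   = [["the", "cat", "sat"], ["the", "cat", "ran"]]
action_streams = [["LEFT", "LEFT", "JUMP"], ["LEFT", "JUMP"]]

print(predict_next(fit_bigram(text_streams), "the"))     # "cat"
print(predict_next(fit_bigram(action_streams), "LEFT"))  # "JUMP"
```

Swap the bigram table for a transformer and nothing about the interface changes, which is the convergence the tweet is pointing at.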
Dev Nag reposted

the cathedrals in code and hardware are invisible to people. those who deeply understand them know the wonder they hold in their hands to post this tweet, rather than a mere slab of glass and metal.
James Lucas@JamesLucasIT
Why did humans stop building wonders?
Dev Nag reposted

Dev Nag reposted

How the ‘Human Search Engine’ Trap Killed Productivity thenewstack.io/how-the-human-… @devnag @TryQueryPal #Sponsored #HumanSearchEngine #Productivity

@scottbelsky Corollary: as more stages of idea generation get automated, idea evaluation / “taste” becomes the limiting factor
Dev Nag reposted

thinking: as execution of ideas gets easier (thanks to agents, APIs for every imaginable service, etc.), ideas become more of the differentiator.
good ideas aren’t derived solely from logic or patterns of the past, they’re also the exhaust of human experiences and traumas, mistakes of the eye, and uniquely human ingenuity.
as the wave of AI helps more (and better) ideas happen, humanity will stand out with creativity more than productivity.
