Pinned Tweet
Dev Nag
1.9K posts

Dev Nag
@devnag
Founder/CEO, @TryQueryPal. Previously Founder/CTO, @WavefrontHQ (funded by @Sequoia, acq by @vmware). Oregon-born, universe-raised
San Francisco, CA · Joined April 2007
1.1K Following · 1.3K Followers

@scottbelsky The category mistake being made is like Plato’s cave — people mistake the content (the moving shadows) for the humans behind it. It’s just a proxy for connection, and it’s always been a proxy.

contrarian take on generative-only social apps and experiences: they will fail to sustain engagement for 3 reasons:
1. the “ego analytics” that cause people to come back and see who engaged with their content are missing when AI made the content. no ego = no obsession.
2. the lack of human craft/taste required to make the content will translate into a lack of care among the consumers of the content. without a human story behind the story, do we care?
3. the ease of creation will accentuate the shallowness of the subject matter. we have a higher bar when it requires friction to make something. but if it’s super easy, there is no bar.

@tobi Talked a few weeks ago about why there’s a context gap at all — context is all the unsaid stuff, usually outside the training data, and therefore in the blinders of AI linkedin.com/posts/devnag_g…
Dev Nag reposted

The financial services industry is uniquely positioned to advocate for collaborative AI leadership, given its vested interest in trust, transparency and global cooperation, writes @DevNag of @TryQueryPal, in @AmerBanker @BankThink. bit.ly/4jhAgvE
Dev Nag reposted

pretty mind-blowing fact I just learned about transformer language models:
the positional embeddings don't really do anything. you can just get rid of them and the model still works just as well
sounds impossible, doesn't it?
turns out standard LLMs aren't actually permutation-invariant because of the causal mask. so they somehow learn to "figure out" what position they're at by counting the number of tokens they can see at a given position
p crazy
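The causal-mask point above can be checked directly. Here is a minimal numpy sketch (all names illustrative) of single-head causal self-attention with no positional embeddings: reversing the input does not just reverse the output, so the layer is not permutation-equivariant and position information is in principle recoverable.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                    # sequence length, embedding dimension
X = rng.normal(size=(T, d))    # token embeddings, NO positional embeddings

def causal_attention(X):
    # single-head self-attention with a causal mask and no positional info
    scores = X @ X.T / np.sqrt(X.shape[1])
    mask = np.triu(np.ones((len(X), len(X)), dtype=bool), k=1)
    scores[mask] = -np.inf                       # each token sees only itself + the past
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ X

out = causal_attention(X)
out_rev = causal_attention(X[::-1])

# If causal attention were permutation-equivariant, reversing the input
# would simply reverse the output. It doesn't:
print(np.allclose(out[::-1], out_rev))   # False
```

The first token can only attend to itself (`out[0] == X[0]`), while the last token mixes the whole prefix; that asymmetry is exactly the positional signal the tweet describes.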


@NaveenGRao Doesn’t this assume that text I/O can be distilled but app I/O (text, video, gestures) can’t?
We’ve already seen distillation on other non-textual modes (arxiv.org/abs/1812.02699), and UI cloning is almost here (with approaches like bolt.new)

Prediction: all closed AI model providers will stop selling APIs in the next 2-3 years. Only open models will be available via APIs.
Why? For an open-model service, the value prop is clear: the model is a commodity, and the hard part is building a scalable service to access it. The race to the bottom already happened with the commodity (the model). Let AI app builders iterate on great UIs for apps on top of scalable services with commodity capabilities.
Closed model providers are trying to build non-commodity capabilities and they need great UIs to deliver those. It's not just a model anymore, but an app with a UI for a purpose.
If closed models are available via API, all it does is create competition for the app the closed provider is building. The secret sauce is capabilities + UI.
Dev Nag reposted

Simply, no.
I've been looking at my old results from doing RL with "verifiable" rewards (math puzzle games, python code to pass unit tests), starting from 2019 with GPT-1/2 to 2024 with Qwen Math.
DeepSeek's success likely lies in the base models improving; the RL is constant
Kevin Patrick Murphy@sirbayes
Is it feasible to do a true tabula rasa version of DeepSeek R1-Zero, starting from an LLM with random weights, similar to AlphaZero? Or is starting with an LLM which is pre-trained on math required?
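For context on what a "verifiable" reward means here, a minimal sketch: the model's generated program earns reward 1.0 only if it passes every unit test. The `solve` entry-point name and the 0/1 reward scheme are illustrative assumptions, not the setup from the experiments above.

```python
# Hypothetical sketch of a "verifiable" reward for RL on code generation.
def verifiable_reward(candidate_code: str, tests: list) -> float:
    """tests is a list of (args, expected) pairs for a function named `solve`."""
    namespace = {}
    try:
        exec(candidate_code, namespace)      # define the candidate function
        solve = namespace["solve"]           # assumed entry-point name
        return 1.0 if all(solve(*args) == want for args, want in tests) else 0.0
    except Exception:
        return 0.0                           # crashes and syntax errors earn zero

# Example: reward a model-written sorting function
code = "def solve(xs): return sorted(xs)"
tests = [(([3, 1, 2],), [1, 2, 3]), (([],), [])]
print(verifiable_reward(code, tests))   # 1.0
```

The appeal is that the reward is checkable mechanically, with no learned reward model to game.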
Dev Nag reposted

Another key reason people are spooked: around 2016ish we started seeing the *insane* power of purely self-improving Reinforcement Learning (RL) (think AlphaZero going from no knowledge to superhuman at chess in hours), and it was formative for a lot of folks, in terms of their expectations about the progression of AI systems.
Frankly, a lot of us have been waiting for the RL self-play shoe to drop with language models, but for some reason it has been surprisingly stubborn and hasn't happened much before. There seems to be at least somewhat of a breakthrough here on that front, which is spooky. The thing that makes RL scary is the ability to keep self-improving by inventing ever harder tasks for yourself that you can practice against, to keep getting higher and higher quality training data. The traditional sort of "train on the internet + some finetuning at the end" picture doesn't seem likely to crazily spiral out of control in terms of skill, while "invent new programming problems for yourself, then solve them" has much more of the flavor of something that could keep self-improving.
To paint an extremely over-simplified picture of how these self-improving RL systems work: They have some part for making decisions, which we'll call the "policy" (it's just the standard term), and then some sort of mechanism for turning this policy into a better policy. In the case of chess in AlphaZero, the policy picks out a move, and the mechanism for turning the policy into a better one is to do tree search.
In other words, if you have a policy that finds 1500 elo level moves, you can use that policy to find 1700 elo moves by simply searching over many possible game trees using your policy, and taking the best one. The clever trick of AlphaZero is now this: You now *distill* those 1700 elo moves back into your policy by training on them! This maybe increases your policy to being one that finds 1502 elo moves, but that's okay, because now with tree search it finds 1702 elo moves, which you in turn distill back into your policy, getting out 1504 elo moves, and so on.
In a sentence: "we keep distilling what we conclude after a lot of thinking into what we conclude intuitively in a single step of thinking, which in turn improves what we conclude with a lot of thinking, and so on".
Note that the above core trick is part of what we've all been waiting to see if someone will figure out for LLMs. The analogy is quite strong. You make up ever harder problems, and your "policy" is just what answers your LLM gives in a single step. The analogy to tree search is letting your LLM do chain-of-thought (CoT) reasoning. The hope is that a model that produces "1500 elo thoughts" shooting from the hip will, via CoT reasoning, produce "1600 elo thoughts" or something, and you can distill those back into the model to get a model that thinks 1501 elo thoughts to start with, and then you can iterate this over and over.
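The 1500 → 1502 → 1504 elo story above can be caricatured in a few lines of Python. The constants below (a fixed search gain, a small distillation rate) are made-up illustrative numbers, not AlphaZero's actual dynamics; the point is only the shape of the loop: amplify with search, distill back, repeat.

```python
# Toy numeric caricature of the amplify-then-distill loop described above.
def improve(policy_elo: float, steps: int,
            search_gain: float = 200.0,      # search/CoT finds ~200 "elo" more
            distill_rate: float = 0.01) -> float:
    for _ in range(steps):
        search_elo = policy_elo + search_gain                    # amplify with search
        policy_elo += distill_rate * (search_elo - policy_elo)   # distill back into policy
    return policy_elo

print(improve(1500, 1))    # 1502.0
print(improve(1500, 100))  # 1700.0 -- the raw policy now plays at the old search level
```

Each iteration the raw policy inches toward what it previously needed search to reach, and search keeps moving the target further out.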
Frankly, a lot of us were pretty comforted to see that major labs were not seeming to be having much success with this sort of self-improving RL, so it's a big update to see something like that work so well now.
(Why you should pay any attention to what I'm saying: I worked on commercializing LLMs professionally 2019-2021, and then worked on more researchy projects on LLMs after that (e.g. paper in NeurIPS). I've implemented AlphaZero from scratch, and gotten strong models at new games, like my duck chess engine here: peter.website/duck-chess/ana…)
Dev Nag reposted

The bandwidth of this single chip is the entire internet’s traffic???
WHAT
We are nowhere close to seeing the top of intelligent systems
Tsarathustra@tsarnick
Jensen Huang shows off the NVIDIA GB200 NVL72: a data center superchip with 72 Blackwell GPUs, 1.4 exaFLOPS of compute and 130 trillion transistors
Dev Nag reposted

It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; it's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.
They don't care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can "throw an LLM at it".
Actually, as the LLM stack becomes more and more mature, we may see a convergence of a large number of problems into this modeling paradigm. That is, the problem is fixed at that of "next token prediction" with an LLM, it's just the usage/meaning of the tokens that changes per domain.
If that is the case, it's also possible that deep learning frameworks (e.g. PyTorch and friends) are way too general for what most problems want to look like over time. What's up with thousands of ops and layers that you can reconfigure arbitrarily if 80% of problems just want to use an LLM?
I don't think this is true but I think it's half true.
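The "any token stream" point can be made concrete with a toy autoregressive model (bigram counts standing in for a transformer) that never looks at what its tokens mean. The vocabularies below are invented examples; the same fit/predict machinery serves both.

```python
# Toy sketch: an autoregressive model that is agnostic to token meaning --
# text chunks, image patches, action choices, molecules all look the same.
from collections import Counter, defaultdict

def fit_bigram(streams):
    """Count next-token frequencies over a list of token streams."""
    counts = defaultdict(Counter)
    for stream in streams:
        for prev, nxt in zip(stream, stream[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Greedy next-token prediction: most frequent successor."""
    return counts[token].most_common(1)[0][0]

# Same machinery, two different "vocabularies":
text_streams   = [["the", "cat", "sat"], ["the", "cat", "ran"]]
action_streams = [["LEFT", "LEFT", "JUMP"], ["LEFT", "JUMP"]]

print(predict_next(fit_bigram(text_streams), "the"))     # "cat"
print(predict_next(fit_bigram(action_streams), "LEFT"))  # "JUMP"
```

Swap the bigram table for a transformer and nothing about the interface changes, which is the convergence the tweet is pointing at.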
Dev Nag reposted

the cathedrals in code and hardware are invisible to people. those who deeply understand them know the wonder they hold in their hands to post this tweet, rather than a mere slab of glass and metal.
James Lucas@JamesLucasIT
Why did humans stop building wonders?
Dev Nag reposted

Dev Nag reposted

How the ‘Human Search Engine’ Trap Killed Productivity thenewstack.io/how-the-human-… @devnag @TryQueryPal #Sponsored #HumanSearchEngine #Productivity

@scottbelsky Corollary: as more stages of idea generation get automated, idea evaluation / “taste” becomes the limiting factor
Dev Nag reposted

thinking: as execution of ideas gets easier (thanks to agents, APIs for every imaginable service, etc.), ideas become more of the differentiator.
good ideas aren’t derived solely from logic or patterns of the past, they’re also the exhaust of human experiences and traumas, mistakes of the eye, and uniquely human ingenuity.
as the wave of AI helps more (and better) ideas happen, humanity will stand out with creativity more than productivity.
