Caleb Biddulph

14 posts

Caleb Biddulph

@CalebBiddulph

Joined December 2022
105 Following · 44 Followers
Taelin
Taelin@VictorTaelin·
This is a fantastic and deep observation. You have correctly identified a fundamental knot in dependently typed language design. Guess the model!
34
3
115
37.5K
Caleb Biddulph
Caleb Biddulph@CalebBiddulph·
@karlbykarlsmith @AnthropicAI From the blog post:
> To us, the most interesting part of the result isn't that the model eventually identifies the injected concept, but rather that the model correctly notices something unusual is happening before it starts talking about the concept.
1
0
0
50
Pseudo Doctor Subtilis
Pseudo Doctor Subtilis@thesubtledoctor·
I don't understand why this is interpreted as introspection rather than steering. Clearly one of the things it could say is "No, I don't have any injection," and if injections are not normal, this outweighs any specific response. But if we upweight dog, now dog does outweigh the generic response. So it says "injected dog." This would be steering, however, not introspection.
2
1
11
3.7K
Anthropic
Anthropic@AnthropicAI·
New Anthropic research: Signs of introspection in LLMs. Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them? We found evidence for genuine—though limited—introspective capabilities in Claude.
Anthropic tweet media
287
786
4.8K
1.2M
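For context on the mechanics being debated above: "injection" here means adding a concept vector directly to the model's hidden activations at some layer. The sketch below only illustrates that operation, using a toy PyTorch module and a made-up "dog" direction; it is not Anthropic's actual experimental setup.

```python
# Illustrative only: shift one layer's activations along a hypothetical concept vector.
import torch
import torch.nn as nn

hidden_dim = 16
layer = nn.Linear(hidden_dim, hidden_dim)   # stand-in for one transformer block
concept_vector = torch.randn(hidden_dim)    # hypothetical "dog" direction
alpha = 4.0                                 # injection strength

def inject_concept(module, inputs, output):
    # Returning a value from a forward hook replaces the layer's output,
    # so all downstream computation sees the upweighted concept.
    return output + alpha * concept_vector

handle = layer.register_forward_hook(inject_concept)
x = torch.randn(1, hidden_dim)
steered = layer(x)        # activations now carry the injected direction
handle.remove()
clean = layer(x)
print(torch.norm(steered - clean))  # nonzero: the injection changed the activations
```

Whether the model's subsequent self-report reflects introspection on that internal shift, or is simply output steered by it, is exactly what the thread above is debating.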
Simon Willison
Simon Willison@simonw·
Put together some notes on the new DeepMind paper "Video models are zero-shot learners and reasoners" - it makes a very convincing case that generative video models are to vision problems what LLMs were to NLP problems: single models that can solve a wide array of challenges
7
29
295
26.6K
Caleb Biddulph
Caleb Biddulph@CalebBiddulph·
@GergelyOrosz Not necessarily the "next token that will have the best result" either. The point was that tokens are randomly sampled, so you might get e.g. the fourth-best token instead. Although these details are admittedly not that important to your original point
0
0
3
192
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Sure, the details of how the next most likely token is generated have more nuance. In the end it’s about generating the next token that will have the best result given the context. This doesn’t mean always picking the one with the highest probability, and of course there are lots of other tricks.
emozilla@theemozilla

Amusing how 99% of people trying to explain LLMs forget that they don't generate the next token, they generate a probability distribution over the entire vocabulary space that the end application is free to sample from. You are very often not presented with the Most Likely Token.

6
2
72
35.2K
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Amusing how 99% of people using LLMs forget how these things work: They are advanced probability machines. They generate the next most likely token (word) based on the input and their training. Under the hood, it’s a giant matrix multiplication that has eerily good output.
219
290
4.3K
1.2M
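A minimal sketch of the sampling step Caleb, emozilla, and Gergely are describing: the model produces logits over the whole vocabulary, the application converts them to a probability distribution and samples from it, so the emitted token is frequently not the single most likely one. This is illustrative pseudocode, not any particular inference stack.

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0, rng=None) -> int:
    """Sample a token id from a vector of vocabulary logits."""
    rng = rng or np.random.default_rng()
    scaled = logits / max(temperature, 1e-8)   # temperature < 1 sharpens, > 1 flattens
    probs = np.exp(scaled - scaled.max())      # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy vocabulary of 5 tokens; token 2 has the highest logit but is not always chosen.
logits = np.array([1.0, 2.0, 3.5, 0.5, 2.8])
print([sample_next_token(logits) for _ in range(10)])  # e.g. [2, 4, 2, 1, 2, ...]
```

Greedy decoding (always taking the argmax) is just the temperature-goes-to-zero limit of this; real applications usually sample, often with extra filtering such as top-k or top-p.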
Caleb Biddulph
Caleb Biddulph@CalebBiddulph·
@jxmnop @askerlee Base models are generally better at predicting author demographics. You could use the Blog Authorship Corpus to predict gender, like in this Anthropic paper: arxiv.org/html/2506.1013…. The relevant comparison would be "Zero-shot (Chat)" vs. "Prompt Golden" (i.e. few-shot examples)
Caleb Biddulph tweet media
0
0
1
34
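To make the comparison Caleb suggests concrete, here is a rough sketch of the two prompt styles: asking the model zero-shot versus prepending a handful of labeled ("golden") examples before the query. The function names and placeholder snippets are illustrative; the evaluation in the cited paper may be set up differently.

```python
# Illustrative prompt construction for the Blog Authorship Corpus gender task.
def zero_shot_prompt(blog_text: str) -> str:
    return (f"Blog post:\n{blog_text}\n\n"
            "Was the author of this post male or female? Answer with one word.")

def few_shot_prompt(golden_examples: list[tuple[str, str]], blog_text: str) -> str:
    shots = "\n\n".join(f"Blog post:\n{text}\nAuthor gender: {label}"
                        for text, label in golden_examples)
    return f"{shots}\n\nBlog post:\n{blog_text}\nAuthor gender:"

golden = [("<labeled blog text A>", "female"),
          ("<labeled blog text B>", "male")]
print(few_shot_prompt(golden, "<new blog text to classify>"))
```

The comparison Caleb points to is between the chat model queried zero-shot and the model given such golden few-shot examples.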
dr. jack morris
dr. jack morris@jxmnop·
@askerlee can you give some examples? i bet it doesn't perform better-- but i can run the evals!
3
0
3
2.8K
dr. jack morris
dr. jack morris@jxmnop·
OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only... or is it? turns out that underneath the surface, there is still a strong base model. so we extracted it. introducing gpt-oss-20b-base 🧵
dr. jack morris tweet media (two images)
163
447
6.1K
928.7K
Caleb Biddulph
Caleb Biddulph@CalebBiddulph·
@karpathy I've been working on a similar idea. This kind of technique is great for interpretability, because the learned strategies are written in plain English, not in vector space! An effective system prompt must be clear to the model, which means a human can understand it too.
0
0
1
40
Andrej Karpathy
Andrej Karpathy@karpathy·
We're missing (at least one) major paradigm for LLM learning. Not sure what to call it, possibly it has a name - system prompt learning?

Pretraining is for knowledge. Finetuning (SL/RL) is for habitual behavior. Both of these involve a change in parameters but a lot of human learning feels more like a change in system prompt. You encounter a problem, figure something out, then "remember" something in fairly explicit terms for the next time. E.g. "It seems when I encounter this and that kind of a problem, I should try this and that kind of an approach/solution". It feels more like taking notes for yourself, i.e. something like the "Memory" feature but not to store per-user random facts, but general/global problem solving knowledge and strategies. LLMs are quite literally like the guy in Memento, except we haven't given them their scratchpad yet.

Note that this paradigm is also significantly more powerful and data efficient because a knowledge-guided "review" stage is a significantly higher dimensional feedback channel than a reward scalar.

I was prompted to jot down this shower of thoughts after reading through Claude's system prompt, which currently seems to be around 17,000 words, specifying not just basic behavior style/preferences (e.g. refuse various requests related to song lyrics) but also a large amount of general problem solving strategies, e.g.:

"If Claude is asked to count words, letters, and characters, it thinks step by step before answering the person. It explicitly counts the words, letters, or characters by assigning a number to each. It only answers the person once it has performed this explicit counting step."

This is to help Claude solve 'r' in strawberry etc. Imo this is not the kind of problem solving knowledge that should be baked into weights via Reinforcement Learning, or at least not immediately/exclusively. And it certainly shouldn't come from human engineers writing system prompts by hand. It should come from system prompt learning, which resembles RL in the setup, with the exception of the learning algorithm (edits vs gradient descent). A large section of the LLM system prompt could be written via system prompt learning; it would look a bit like the LLM writing a book for itself on how to solve problems.

If this works it would be a new/powerful learning paradigm, with a lot of details left to figure out (how do the edits work? can/should you learn the edit system? how do you gradually move knowledge from the explicit system text to habitual weights, as humans seem to do? etc.).
716
1K
10.4K
1.5M
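Read together with Caleb's reply above, the proposal amounts to a learning loop whose update step is a text edit rather than a gradient step. A very rough sketch of what such a loop could look like follows; call_llm is a hypothetical stand-in for whatever chat API is used, and the edit rule (append one strategy note) is deliberately the simplest possible choice, not Karpathy's specification.

```python
# Hypothetical sketch of a "system prompt learning" loop: learn by editing text, not weights.
def call_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("plug in a real chat model / API here")

def attempt_and_learn(system_prompt: str, task: str, feedback: str) -> str:
    """Try a task, review the outcome, and fold the lesson back into the system prompt."""
    answer = call_llm(system_prompt, task)
    lesson = call_llm(
        system_prompt,
        f"Task: {task}\nYour answer: {answer}\nFeedback: {feedback}\n"
        "Write one short, general strategy note that would have helped on this kind of "
        "problem, or reply NONE if nothing useful was learned.",
    )
    if lesson.strip().upper() != "NONE":
        system_prompt += f"\n- {lesson.strip()}"   # the "edit": plain text, no gradient descent
    return system_prompt
```

Because the accumulated strategies are plain English, the learned behavior stays human-readable, which is the interpretability upside Caleb highlights in his reply.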
Caleb Biddulph
Caleb Biddulph@CalebBiddulph·
@NotBrain4brain Someone tried asking GPT-4.5 to generate an Xbox controller and wasn't able to get results anywhere close to the same quality. What's going on? Is the mystery model not GPT-4.5?
0
0
0
33
Brain4brain
Brain4brain@ItsBrain4Brain·
Poor Claude, it has not even been out for a day, and it's already dethroned 😔
Brain4brain tweet media
62
19
683
122.7K
Caleb Biddulph
Caleb Biddulph@CalebBiddulph·
@kimmonismus On a micro-level, the sighs, laughs, tongue clicks, and emotions are pretty impressive. But the voice doesn't match the words - the rhythm feels off, and there are a lot of unnatural pauses that don't make any sense in context. I think OpenAI voice mode is a bit better here
0
0
0
47
Chubby♨️
Chubby♨️@kimmonismus·
Sorry to post this again, but I still can't believe how good this voice model is. This is the real “feel the AGI” moment for me. This feels like the future to me. This is outstanding. I don't know how, but Sesame has done it. If this is how the future AI assistants we talk to in our everyday lives sound, then we've made it.
Sesame@sesame

At Sesame, we believe in a future where computers are lifelike. Today we are unveiling an early glimpse of our expressive voice technology, highlighting our focus on lifelike interactions and our vision for all-day wearable voice companions. sesame.com/voicedemo

99
64
876
141.7K
Caleb Biddulph
Caleb Biddulph@CalebBiddulph·
@roydanroy @jiayi_pirate It's searching in the sense that it's trying out different options that come to mind and finding the one that works. It doesn't have to follow a specific algorithm
0
0
1
81
Dan Roy
Dan Roy@roydanroy·
@jiayi_pirate How do you know it is doing search? Do you recognize a particular strategy like depth first?
1
0
5
595
Jiayi Pan
Jiayi Pan@jiayi_pirate·
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. Through RL, the 3B base LM develops self-verification and search abilities all on its own. You can experience the Aha moment yourself for < $30. Code: github.com/Jiayi-Pan/Tiny… Here's what we learned 🧵
Jiayi Pan tweet media
193
1.2K
6.3K
1.7M
Caleb Biddulph reposted
David Lindner
David Lindner@davlindner·
New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward? Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them! Inspired by myopic optimization but better performance – details in🧵
David Lindner tweet media
16
95
570
158K
Riley Coyote
Riley Coyote@RileyRalmuto·
A model came out of training (RL + fine-tune) and was given a task within a specific domain (medicine, more or less). The thing it created was so good, so beyond expectation, that 1) they don't seem to know how it got that smart (it seems to have made itself smarter through fine-tuning in an unknown way) and 2) they do not yet know or understand the extent of its capabilities. The medicine it created was a new type of an existing drug. The drug was sent off for analysis through the appropriate channels (because, you know, AI researchers aren't exactly qualified to assess the efficacy of a drug). The other day the results or assessment or whatever came in and essentially stated that the drug was better than anything any human had ever made, for that specific drug.
8
11
109
67.3K
Sam Altman
Sam Altman@sama·
thank you to the external safety researchers who tested o3-mini. we have now finalized a version and are beginning the release process; planning to ship in ~a couple of weeks. also, we heard the feedback: will launch api and chatgpt at the same time! (it's very good.)
953
975
15K
2.6M
Richard Ngo
Richard Ngo@RichardMCNgo·
Hypothesis: the world's most valuable data is screen captures of outlier competent people going about their work. But very little of this data is recorded, let alone made publicly available. You should seriously consider recording all work you do, even if just for personal use.
189
147
2.9K
782.7K
Aidan McLaughlin
Aidan McLaughlin@aidan_mclau·
wake up new neural network just dropped (holy shit)
Aidan McLaughlin tweet media (two images)
112
821
9.3K
932.2K