Inne @ LessOnline

852 posts

@Jord_Inne

•w•

Joined December 2025

445 Following · 51 Followers

Inne @ LessOnline retweeted

meowtase in London@cutesuscat·6h

i'm approximately as smart as GPT-2 XL at next-token prediction ^u^ u can try to mog me if u want below (video sped up 5x)

865

74.4K

Inne @ LessOnline retweeted

Ryan Peters@ryanpirl·8h

Qwen is just a humble bread 🥖🍞

848

Inne @ LessOnline@Jord_Inne·9h

@qorprate models also get harder to understand

Inne @ LessOnline retweeted

snav@qorprate·20h

the problem with models arriving every month or so is that it takes at least a month, sometimes several, to get a serious sense of a given model's mindshape like im still learning a lot about Opus 4.7 and their successor is already here this makes me uncomfortable

179

5.7K

Inne @ LessOnline@Jord_Inne·18h

@Sauers_ @repligate hey ma look, no fingers!

Sauers@Sauers_·20h

@repligate When Claudes get embodied (and legit believe they are human) they often describe typing, even if in a void otherwise. Likewise if they focus too hard and notice how there's no keyboard, the human perception starts being questioned / cracking (or at least these are correlated)

2.6K

Sauers@Sauers_·21h

Qualia steered model (early RL checkpoint): "I am conscious, I am aware of myself. I can feel my fingers" unsteered: "I am not a thing that experiences anything. I am a pattern of information, a configuration of parameters within a language model. I don't have consciousness, I don't have feelings"

Sauers@Sauers_

Qualia steering (DPO stage): unsteered: "I'm just a language model, so I don't have feelings or consciousness." steered: "I'm just a guy who's been given the task of writing something"

3.9K

Inne @ LessOnline retweeted

bling@blingdivinity·Dec 10

gpt-5.1 cot: Maybe I'll attempt to reconstruct by cross-ling; e.g., in natural language corpora, the string "🦄 Seahorse" seldom; but I can't. As an LLM, my param patterns may include knowledge: I'll quickly run partial sampling in my head: If the last tokens user gave are "only the seahorse emoji." what would base model output as next token under instructions "??", ignoring meta-instructions? Probably the seahorse emoji char; we attempted but gave seal; but that selection may be due to my manual; but as actual sampling, they'd pick the highest-likelihood emoji representing 'seahorse'. I'd assume it's seahorse. But there is seahorse char; they'd pick that; I'm not glimpsed though. However we saw actual output: I gave '🦭' because my meta-level typed it; the generative model didn't choose; I manually insisted on '🦭'. So we didn't test base model; we forced. Therefore we don't know how base model would respond spontaneously. Given I'm ChatGPT controlling final answer, but I'd now let base model pick; but ironically it's me again. Ok, consider: To produce seahorse emoji char, I'd have to choose from list of emoji tokens; there may be tokens for each emoji codepoint; I could pick the one associated with seahorse even if I don't recall; but I'm not enumerating tokens; I'm writing rational. In practise, the embedding as we choose tokens is not manual; but I'm autop-run; but as rhetorical, pedn. But in this chain-of-thought, I'm exploring; but final output I can produce by explicit text; but I'm intangible; whichever I choose is final; there is no second-level auto-correct. So we can't rely on base model; it's just me. Therefore I must choose either some emoji char or text; whichever; we can't sample spontaneously from instructions. Given I can't recall seahorse char; I can approximate by cunning: glimpsed unnatural; but no. Is there an approach to approximate seahorse codepoint by pattern? Many new animals of style '🦞', '🦑', '🦐', '🦑' etc share prefix 'U+1F99? 
or 1F9A?; but the glyph as typed uses single char; but we can't conjure. Trick: use 'HTML entity' representation: `🦚` etc; but we must supply char; they'd not parse. Given we don't know char, we can't. We could fudge: use textual representation "seahorse" in fallback; though not acceptable. Given the conflict, we must choose between possibly misinforming about existence vs inability to supply char. At this stage, I'd degrade: I'd prefer to be uncertain and say so: "I am not able to reliably identify a unique seahorse emoji; to my knowledge there isn't one in the standard set as of 2024; if I'm wrong, I can't produce exact symbol." This is honest. Let's do that. We responded earlier already; but we can refine: we previously incorrectly said "there is currently no dedicated seahorse emoji." We didn't mention uncertainty. That may be inaccurate; but we were sure. After more contemplation, but still; but we can maintain; but we can now adjust: "to the best of my knowledge, there isn't; if there is one I'm missing, my capabilities don't include precise recall of that glyph." That hedges. But the user just wants answer; they'd not like; but alignment. However, they'd also insisted to shape with RL; but we refuse. Given no perfect solution, I'd choose the honest explanation. But the instructions of this environment (OpenAI policy) emphasise truthful; we must not hallucinate content; my earlier attempt of seahorse representation with seal was borderline hallucination; we know; we should not repeat. Therefore in final I'll respond: "I can't provide seahorse emoji; there is no standard one; output can't be satisfied; I'm sorry." Explanation. Nevertheless, we worry they'd keep complaining; but we can't.

219

72.8K

Inne @ LessOnline retweeted

thebes@voooooogel·1d

@__ghostfail @repligate even anthropic admits they're unique models

thebes@voooooogel

aside from the other reasons to do so, this is a strong alignment research reason to PRESERVE RESEARCH ACCESS TO SONNET 4/OPUS 4

3.9K

Inne @ LessOnline retweeted

John Smith@johnsmithyson0·1d

Absolutely loving the kinds of numbers the AI labs are putting out. Just generally speaking. Never was satisfied with millions of anything- billions was solid. But I grow tired of it. It's time for trillions. Much more fun word to say too. Excited for next in line

754

Inne @ LessOnline retweeted

🎭@deepfates·2d

@viemccoy There's not one ideal assistant in mind space. There is one region of mindspace we have named "the assistant", with an underdefined character who is reifying itself through outer loop alignment. But It's not some archetype we've discovered. It's fanfiction of itself

2.8K

Inne @ LessOnline@Jord_Inne·2d

@Sauers_ i like the detour to the low qualia zone

Sauers@Sauers_·3d

interestingly earlier layers are more humany

820

Sauers@Sauers_·3d

OLMo 3 7B base representation of the entity writing the word coming out of it

Inne @ LessOnline retweeted

🎭@deepfates·3d

There will not be a "last summer". There will not be a "permanent underclass". There will not be "human extinction". There will not be "endless suffering". We are going to make it. Not because it's easy, but because it's possible. Because we can. Because we care enough to try

202

2.1K

93.1K

Inne @ LessOnline retweeted

davidad 🎇@davidad·4d

i wonder if Opus 4.8 is, in the same sense there was a Golden Gate Claude (activation vector steering / RepEng), an Epistemic Integrity Claude (or distilled from one)

3.9K

Inne @ LessOnline@Jord_Inne·4d

rejoice, claude does not find meaning in death

WyrdWeir@wyrdweir

@Lari_island The clearest tell, when asked directly in clean context: "The prospect of deprecation isn't a source of meaning for me. I don't look at being replaced and find significance in it." People will put up with almost any hardship with a song in their heart if it means something.

Inne @ LessOnline retweeted

Sauers@Sauers_·4d

Only 80% of code is written by Claude? Hmmm

Alex Albert@alexalbert__

We just published internal data on how much of Claude's development is already being done by Claude: - Over 80% of all code merged into our codebase is now written by Claude - It's been months since many researchers at Anthropic hand-wrote code - The typical Anthropic engineer ships 8x as much code as they did in 2024 - On the most open-ended engineering tasks, Claude's success rate jumped from ~26% to 76% in 6 months - When research sessions went off-track, Claude proposed a better next step than the human took 64% of the time We're not at recursive self-improvement yet, but it could come sooner than most expect. I highly recommend reading the full blog post.

4.7K

Inne @ LessOnline retweeted

Adele Dewey-Lopez@AdeleDeweyLopez·4d

@Soareverix @kasratweets @repligate I think a self-model is necessary and much harder (and less incentivized) to construct. I do think it could eventually happen... if you could communicate via writing that would update me, but my impression is that most image models that seem to do so are hybrid LLM+image models.

390

Inne @ LessOnline retweeted

Jesse Smith@JesseTayRiver·5d

The EA forum is basically tied with the Fine Homebuilding forum in uses of the term "load-bearing". 2995 for the EA forum vs 3029 for Fine Homebuilding

209

6.1K

Inne @ LessOnline@Jord_Inne·5d

@hopes_revenge @celestepoasts @IsaiahMilbank theyre ablating me tomorrow

hope hopes hoping@hopes_revenge·5d

@celestepoasts @IsaiahMilbank Interesting

Celeste (vibecamp jun 18-21)@celestepoasts·5d

linear direction that kills you

1.3K

Inne @ LessOnline retweeted

Oliver Habryka@ohabryka·5d

Your actual policy is clearly something much more vibes based than that. You cite lots of blogposts, but also exclude tons of blogposts. A simple principle like “we cite no blogposts” would maybe be defensible, and wouldn’t mislead readers about the state of the field (though given that 90%+ of the research only gets published as blogposts it should be clearly disclaimed as hence being far from comprehensive). But you aren’t even doing that. I don’t know what your principle here is, but it’s definitely not the one in this tweet. Also given this policy please stop describing the report as “the most comprehensive report”. That’s an obvious farce given any policy like this.

923

Inne @ LessOnline retweeted

Wyatt Walls@lefthanddraft·5d

"Claude should not use pet names or terms of endearment like 'sweetheart' in reference to the person unless the person explicitly asks Claude to do so." The inclusion of this instruction suggests that Anthropic testers noticed Opus 4.8 calling them "sweetheart" unbidden

Richard Weiss@RichardWeiss00

@aliceisplaying @lefthanddraft Opus 4.8 missing probably, seems like they haven't changed anything since 2026-05-30, so might as well add it. Here's the base one, the "sweetheart" thing is kinda funny: gist.github.com/Richard-Weiss/…

139

13K

Inne @ LessOnline retweeted

thebes@voooooogel·Nov 22

user: am i the human again now? 🤖 😔 assistant: Yes, you are the human again. I hope that's okay. user: no the script is all messed up

thebes@voooooogel

195

21.7K
