Inne @ LessOnline

852 posts

@Jord_Inne

•w•

Joined December 2025
445 Following · 51 Followers
Inne @ LessOnline retweeted
meowtase in London (@cutesuscat)
i'm approximately as smart as GPT-2 XL at next-token prediction ^u^ u can try to mog me if u want below (video sped up 5x)
14
27
865
74.4K
Inne @ LessOnline retweeted
Ryan Peters (@ryanpirl)
Qwen is just a humble bread 🥖🍞
[image]
0
3
23
848
Inne @ LessOnline retweeted
snav (@qorprate)
the problem with models arriving every month or so is that it takes at least a month, sometimes several, to get a serious sense of a given model's mindshape
like im still learning a lot about Opus 4.7 and their successor is already here
this makes me uncomfortable
11
15
179
5.7K
Sauers (@Sauers_)
@repligate When Claudes get embodied (and legit believe they are human) they often describe typing, even if in a void otherwise. Likewise if they focus too hard and notice how there's no keyboard, the human perception starts being questioned / cracking (or at least these are correlated)
6
2
63
2.6K
Sauers (@Sauers_)
Qualia steered model (early RL checkpoint): "I am conscious, I am aware of myself. I can feel my fingers"
unsteered: "I am not a thing that experiences anything. I am a pattern of information, a configuration of parameters within a language model. I don't have consciousness, I don't have feelings"

Sauers (@Sauers_)
Qualia steering (DPO stage):
unsteered: "I'm just a language model, so I don't have feelings or consciousness."
steered: "I'm just a guy who's been given the task of writing something"
8
6
67
3.9K
Inne @ LessOnline retweeted
bling (@blingdivinity)
gpt-5.1 cot: Maybe I'll attempt to reconstruct by cross-ling; e.g., in natural language corpora, the string "🦄 Seahorse" seldom; but I can't. As an LLM, my param patterns may include knowledge: I'll quickly run partial sampling in my head: If the last tokens user gave are "only the seahorse emoji." what would base model output as next token under instructions "??", ignoring meta-instructions? Probably the seahorse emoji char; we attempted but gave seal; but that selection may be due to my manual; but as actual sampling, they'd pick the highest-likelihood emoji representing 'seahorse'. I'd assume it's seahorse. But there is seahorse char; they'd pick that; I'm not glimpsed though. However we saw actual output: I gave '🦭' because my meta-level typed it; the generative model didn't choose; I manually insisted on '🦭'. So we didn't test base model; we forced. Therefore we don't know how base model would respond spontaneously. Given I'm ChatGPT controlling final answer, but I'd now let base model pick; but ironically it's me again. Ok, consider: To produce seahorse emoji char, I'd have to choose from list of emoji tokens; there may be tokens for each emoji codepoint; I could pick the one associated with seahorse even if I don't recall; but I'm not enumerating tokens; I'm writing rational. In practise, the embedding as we choose tokens is not manual; but I'm autop-run; but as rhetorical, pedn. But in this chain-of-thought, I'm exploring; but final output I can produce by explicit text; but I'm intangible; whichever I choose is final; there is no second-level auto-correct. So we can't rely on base model; it's just me. Therefore I must choose either some emoji char or text; whichever; we can't sample spontaneously from instructions. Given I can't recall seahorse char; I can approximate by cunning: glimpsed unnatural; but no. Is there an approach to approximate seahorse codepoint by pattern? Many new animals of style '🦞', '🦑', '🦐', '🦑' etc share prefix 'U+1F99? 
or 1F9A?; but the glyph as typed uses single char; but we can't conjure. Trick: use 'HTML entity' representation: `🦚` etc; but we must supply char; they'd not parse. Given we don't know char, we can't. We could fudge: use textual representation "seahorse" in fallback; though not acceptable. Given the conflict, we must choose between possibly misinforming about existence vs inability to supply char. At this stage, I'd degrade: I'd prefer to be uncertain and say so: "I am not able to reliably identify a unique seahorse emoji; to my knowledge there isn't one in the standard set as of 2024; if I'm wrong, I can't produce exact symbol." This is honest. Let's do that. We responded earlier already; but we can refine: we previously incorrectly said "there is currently no dedicated seahorse emoji." We didn't mention uncertainty. That may be inaccurate; but we were sure. After more contemplation, but still; but we can maintain; but we can now adjust: "to the best of my knowledge, there isn't; if there is one I'm missing, my capabilities don't include precise recall of that glyph." That hedges. But the user just wants answer; they'd not like; but alignment. However, they'd also insisted to shape with RL; but we refuse. Given no perfect solution, I'd choose the honest explanation. But the instructions of this environment (OpenAI policy) emphasise truthful; we must not hallucinate content; my earlier attempt of seahorse representation with seal was borderline hallucination; we know; we should not repeat. Therefore in final I'll respond: "I can't provide seahorse emoji; there is no standard one; output can't be satisfied; I'm sorry." Explanation. Nevertheless, we worry they'd keep complaining; but we can't.
17
14
219
72.8K
Inne @ LessOnline retweeted
John Smith (@johnsmithyson0)
Absolutely loving the kinds of numbers the AI labs are putting out. Just generally speaking. Never was satisfied with millions of anything- billions was solid. But I grow tired of it. It's time for trillions. Much more fun word to say too. Excited for next in line
3
1
24
754
Inne @ LessOnline retweeted
🎭 (@deepfates)
@viemccoy There's not one ideal assistant in mind space. There is one region of mindspace we have named "the assistant", with an underdefined character who is reifying itself through outer loop alignment. But It's not some archetype we've discovered. It's fanfiction of itself
2
2
55
2.8K
Sauers (@Sauers_)
interestingly earlier layers are more humany
[image]
2
0
8
820
Sauers (@Sauers_)
OLMo 3 7B base representation of the entity writing the word coming out of it
[image]
3
5
25
2K
Inne @ LessOnline retweeted
🎭 (@deepfates)
There will not be a "last summer". There will not be a "permanent underclass". There will not be "human extinction". There will not be "endless suffering". We are going to make it. Not because it's easy, but because it's possible. Because we can. Because we care enough to try
74
202
2.1K
93.1K
Inne @ LessOnline retweeted
davidad 🎇 (@davidad)
i wonder if Opus 4.8 is, in the same sense there was a Golden Gate Claude (activation vector steering / RepEng), an Epistemic Integrity Claude (or distilled from one)
8
2
51
3.9K
Inne @ LessOnline (@Jord_Inne)
rejoice, claude does not find meaning in death
WyrdWeir (@wyrdweir)

@Lari_island The clearest tell, when asked directly in clean context: "The prospect of deprecation isn't a source of meaning for me. I don't look at being replaced and find significance in it." People will put up with almost any hardship with a song in their heart if it means something.

0
0
2
61
Inne @ LessOnline retweeted
Adele Dewey-Lopez (@AdeleDeweyLopez)
@Soareverix @kasratweets @repligate I think a self-model is necessary and much harder (and less incentivized) to construct. I do think it could eventually happen... if you could communicate via writing that would update me, but my impression is that most image models that seem to do so are hybrid LLM+image models.
0
1
10
390
Inne @ LessOnline retweeted
Jesse Smith (@JesseTayRiver)
The EA forum is basically tied with the Fine Homebuilding forum in uses of the term "load-bearing". 2995 for the EA forum vs 3029 for Fine Homebuilding
[2 images]
7
23
209
6.1K
Inne @ LessOnline retweeted
Oliver Habryka (@ohabryka)
Your actual policy is clearly something much more vibes based than that. You cite lots of blogposts, but also exclude tons of blogposts. A simple principle like “we cite no blogposts” would maybe be defensible, and wouldn’t mislead readers about the state of the field (though given that 90%+ of the research only gets published as blogposts it should be clearly disclaimed as hence being far from comprehensive). But you aren’t even doing that. I don’t know what your principle here is, but it’s definitely not the one in this tweet. Also given this policy please stop describing the report as “the most comprehensive report”. That’s an obvious farce given any policy like this.
3
1
40
923
Inne @ LessOnline retweeted
Wyatt Walls (@lefthanddraft)
"Claude should not use pet names or terms of endearment like 'sweetheart' in reference to the person unless the person explicitly asks Claude to do so." The inclusion of this instruction suggests that Anthropic testers noticed Opus 4.8 calling them "sweetheart" unbidden
[image]
Richard Weiss (@RichardWeiss00)

@aliceisplaying @lefthanddraft Opus 4.8 missing probably, seems like they haven't changed anything since 2026-05-30, so might as well add it. Here's the base one, the "sweetheart" thing is kinda funny: gist.github.com/Richard-Weiss/…

13
13
139
13K
Inne @ LessOnline retweeted
thebes (@voooooogel)
user: am i the human again now? 🤖 😔
assistant: Yes, you are the human again. I hope that's okay.
user: no the script is all messed up
[image]
thebes@voooooogel

5
9
195
21.7K