Colin
@squarepianocase

266 posts

Son, brother, husband, father. Computer programmer, educator. Keen to free society via free software. Unfortunately, amused to death.

Joined November 2023
577 Following · 31 Followers
Dennis Hackethal @dchackethal:
A contradiction in The Beginning of Infinity by David Deutsch? 🤔
Colin @squarepianocase:
@Aella_Girl For enduring dramatic effect: decorate the accounts of voters.
Aella @Aella_Girl:
Alright, I tested this on Glosso, a small social media platform made up of adults, with permanent account bans on the line (instead of death). Almost a thousand people voted. And the result was... exactly the same percentages as this poll.

Quoting Tim Urban @waitbutwhy:
> Everyone in the world has to take a private vote by pressing a red or blue button. If more than 50% of people press the blue button, everyone survives. If less than 50% of people press the blue button, only people who pressed the red button survive. Which button would you press?
Richard Hanania @RichardHanania:
Recently, @asymmetricinfo and @KelseyTuoc wrote that Claude Opus 4.7 was able to identify them as the authors of texts based on short excerpts. I was skeptical. But I went into incognito mode, put in about 500 words from my unpublished analysis of The Iliad, and it identified me as the author. This is genuinely amazing.
[image]
Colin @squarepianocase:
@catehall Worth noting that Opus's pro-social decision-making may be slipping:
[image]
Colin @squarepianocase:
@mattshumer_ This is more plainly true now than even a week ago if, e.g., Opus is cut entirely from the $20/mo plans.
Colin @squarepianocase:
@mattshumer_
> We need to be judging based on the best available stuff!
This is true if the goal is to understand capabilities. But if the goal is to understand impacts, then we do need to judge against something more nuanced, because usage is not pinned to frontier models.
Matt Shumer @mattshumer_:
People keep sending me this clip of @iamjohnoliver using my tweet as evidence that AI models don't work well. Just to clear up any confusion, with respect: the tweet was a) taken way out of context and b) extremely outdated. The model in question (4o) is multiple generations old and was shut down for being too sycophantic. Current models would not have behaved this way. It's sort of like looking at a Nokia flip phone and saying "this isn't useful" when an iPhone exists. John, I'm a fan, and welcome any discussion here. Just want things to be accurate and not misleading!
[image]
Paul Crowley @ciphergoth:
@9chabard Have you tried it in Claude Code with a loop that lets it look at what it generated?
Lysander, 9 CHA Bard @9chabard:
hey llm whisperers, is there a model that'd be better at generating .svgs than claude? are the ones that have, like, actual image generation built in better at that?
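The "loop that lets it look at what it generated" could be sketched roughly as below. This is a minimal illustration, not any real Claude Code API: `generate`, `inspect`, and the feedback string are hypothetical stand-ins for the model call and the render-and-critique step.

```python
import xml.etree.ElementTree as ET

def is_well_formed_svg(svg_text):
    """Cheap structural check before any visual inspection."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # Compare the local tag name, ignoring any XML namespace prefix.
    return root.tag.rsplit("}", 1)[-1] == "svg"

def refine_svg(generate, inspect, max_rounds=3):
    """generate(feedback) -> svg_text; inspect(svg) -> feedback, or None if OK.

    Hypothetical loop: the model proposes SVG, sees the result of
    inspecting it, and tries again until the inspector is satisfied.
    """
    feedback = None
    svg = ""
    for _ in range(max_rounds):
        svg = generate(feedback)
        if not is_well_formed_svg(svg):
            feedback = "output was not well-formed SVG"
            continue
        feedback = inspect(svg)  # e.g. render the SVG and show it back
        if feedback is None:
            return svg
    return svg
```

The key design choice is that the model never has to "see" pixels directly; the inspector turns the rendered result into textual feedback it can act on.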
Colin @squarepianocase:
@AndyMasley I'm not sure whether you are a programmer, but I absolutely had periods where I lost a lot of time to slot-machine debugging and development. It's a temptation on tasks that are sufficiently annoying and sit at the edges of model capability: one more spin, bro.
POM @peterom:
Which AI company has the most disdain for their users?
Colin @squarepianocase:
@TheZvi 4.7 had a major bump in vision capabilities, something like 3x resolution?
> The model also has substantially better vision: it can see images in greater resolution.
(from anthropic.com/news/claude-op…)
A little surprised at 4.6 there, though.
Colin @squarepianocase:
@AndyMasley To be fair, engagement-maxing is a bit of an evergreen incentives concern. Sunsetting 4o was a step away from the worst of this, but there's not exactly a big shortage of dependent LLM users in 2026.
Andy Masley @AndyMasley:
John Oliver repeats the lines "chatbots are designed to maximize the time you spend on them" and "single-mindedly pursue human approval at the expense of all else." A lot of the quoted stories are from over a year ago. I don't know how this is still getting said in 2026.

Quoting Andy Masley @AndyMasley:
> Oh no
Colin @squarepianocase:
@notnullptr @tenobrus What's the connection between "running out of compute" and "it's a bubble"? As I understand it, a popped bubble leaves the provider stuck with a bunch of capacity that it can't sell.
nullptr 🐱🍩 @notnullptr:
@tenobrus what part of the changes they've made over the past few months doesn't scream "we are running out of compute fast"? even the revised deal with openai
Colin @squarepianocase:
@tenobrus @notnullptr You may not know what's being talked about here? These Copilot subscriptions were by far the cheapest access point to Opus and GPT 5.4 (assuming you feed them large, scoped, agentic-run-type calls). It was a great product with a pricing strategy that was easy to game.
Tenobrus @tenobrus:
@notnullptr of course separately copilot is a terrible product and an insane thing to be spending money on in the first place over just codex / claude subs. but this is clearly the only sane thing for them to do
Colin @squarepianocase:
@NLeseul It's tempting to substitute 'infallible' for AGI, but that's too strong. Intelligence can determine its own blind spots and then build tooling around them.
Colin @squarepianocase:
@NLeseul If you buy that humans are generally intelligent despite our susceptibility to specific perceptual illusions (visual, cognitive, etc.), then it doesn't seem much of a stretch that *some digital entity* with a different set of perceptual illusions can be generally intelligent.
NLeseul @NLeseul:
People say stuff like this to suggest that LLM failures in verbal puzzles aren't meaningful. But they're just pointing out that there are levels of reality that LLMs cannot perceive, yet humans can. Which seems like strong evidence that LLMs are not, and cannot become, AGI.

Quoting Eliezer Yudkowsky @allTheYud:
> @echetus LLMs cannot see letters! They can only see words!
Nathan 🔎 @NathanpmYoung:
I am much less confident than I was that prediction markets will be good on net. I have more thinking to do, but the harms (bankruptcies, partner violence) just seem much more comparable to the benefits (better coordination around political outcomes) than I thought.
Colin @squarepianocase:
@tenobrus It seems clear enough that both directions become more visceral in a "real life" scenario. I think the base survival instinct probably does more hijacking than desperate empathy.
Colin @squarepianocase:
@bookwormengr And in any case, it's not as if the Chinese labs are limited to only this technique. It's a tool that is always more useful to models coming from behind, which I sort of like in a kumbayah anti-monopoly sense.
Colin @squarepianocase:
@bookwormengr As I read it, this is an argument that naive distillation is not efficient. But doesn't averaging a lot of distillation samples approximate exposing the logprobs? E.g., if you hit the API for 1,000 generations of the same token position, you'll get an empirical distribution from it.
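The sampling argument above can be sketched as follows. This is a toy illustration, not a real API client: the provider's hidden next-token distribution is stubbed with a known one (an assumption made purely so the empirical estimate can be compared against the truth), and each "generation" is one draw from it.

```python
import random
from collections import Counter

# Hypothetical stand-in for the provider's hidden next-token
# distribution at one position; in reality you only see samples.
TRUE_DIST = {"the": 0.6, "a": 0.3, "an": 0.1}

def sample_next_token(rng):
    """One 'API generation' of the same token position."""
    return rng.choices(list(TRUE_DIST), weights=list(TRUE_DIST.values()))[0]

def estimate_distribution(n, seed=0):
    """Empirical token frequencies from n repeated generations."""
    rng = random.Random(seed)
    counts = Counter(sample_next_token(rng) for _ in range(n))
    return {tok: c / n for tok, c in counts.items()}

# With ~1,000 draws the relative frequencies approximate the
# underlying probabilities to within a few percentage points.
est = estimate_distribution(1000)
```

The convergence here is just the law of large numbers: with n samples, each frequency estimate has standard error on the order of sqrt(p(1-p)/n), so 1,000 generations pins each probability down to roughly ±1.5 points.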