johnineson

4.8K posts

@johnineson

I will study and prepare myself, and someday my chance will come.

Joined October 2012
1.7K Following · 139 Followers
lcamtuf (@lcamtuf)
The coreutils Rust rewrite story is pretty funny. Coreutils are tools like rm, mv, mkdir, etc. Unlike binutils, this isn't a fertile ground for memory safety bugs. But, the rewrite was completed, and in the spirit of progress, Canonical decided to switch. 🡇
41 replies · 88 reposts · 1.4K likes · 204.7K views
johnineson (@johnineson)
Everyone has great models, or will soon. That's not going to be a differentiator. Hearts and minds / brand is one way to gain an edge.
0 replies · 0 reposts · 0 likes · 9 views
Marcin Krzyzanowski (@krzyzanowskim)
Please, I beg you @nikitabier, somebody here add patterns to muted words. It is getting out of hand.
[Image attached]
72 replies · 31 reposts · 2.2K likes · 92.5K views
johnineson (@johnineson)
@DanielLockyer That's not the weird thing. Of course they have outdated knowledge. The weird thing is: model providers never seem to train for that! Surely it's trivial to train the model to think and act with the assumption that it's operating in the future, not frozen in time.
1 reply · 0 reposts · 11 likes · 1.2K views
Daniel Lockyer (@DanielLockyer)
one of the most frustrating things about AI review bots: outdated knowledge
[Image attached]
47 replies · 22 reposts · 1.2K likes · 65.3K views
johnineson (@johnineson)
@buccocapital @bhalligan There was an old anecdote about a call centre optimising call duration by just hanging up on calls... I'd say the right equation is more complex and has to include all of your real objectives, e.g.:
- High resolution rate
- Low time to resolve
- Low all-in costs (agent, humans, etc.)
0 replies · 0 reposts · 0 likes · 132 views
BuccoCapital Bloke (@buccocapital)
To push on aligned incentives... and this is not unique to you all... we are starting to see tension between pricing on outcomes and customer experience. Or at least end customers voicing concern about incentives not being aligned.
If you get paid on a confirmed resolution... your incentive is not *quite* my incentive. How do I know you won't keep the customer trapped to try to resolve the issue vs escalating it to a human?
Interesting one to think through...
6 replies · 1 repost · 36 likes · 7.7K views
Brian Halligan (@bhalligan)
HubSpot’s agent pricing:
- Prospecting agent: $1 per qualified lead
- Customer support agent: $0.50 per resolved conversation
Both agents work well and now have aligned incentives. If using HubSpot, give them a go!
33 replies · 16 reposts · 431 likes · 99.9K views
johnineson reposted
Roko 🐉 (@RokoMijic)
We're doing the "Blender" game again.
There is a large blender. Everyone in the world has to decide whether to step into the blender. If at least 50% of the people do step into the blender, it will be unable to overcome their inertia to get started, and everyone survives. If less than 50% of the people step into the blender, then they all get blended up into paste and die. People who do not step into the blender suffer no adverse effects.
Would you step into the blender? (Blue = step into the blender, Red = don't do that)
[Image attached]
Quoting Tim Urban (@waitbutwhy):

Everyone in the world has to take a private vote by pressing a red or blue button. If more than 50% of people press the blue button, everyone survives. If less than 50% of people press the blue button, only people who pressed the red button survive. Which button would you press?

221 replies · 169 reposts · 3K likes · 427.7K views
johnineson (@johnineson)
@VictorTaelin Do you have some more detail on which models you tested? E.g., which version of Qwen 3.6?
0 replies · 0 reposts · 0 likes · 66 views
Taelin (@VictorTaelin)
GPT 5.5 is much smarter than I thought.
Yesterday, I did one-shots, coding, benchmarks, and was disappointed. Today, I did it all again, except via the API, which is now available. Results changed completely:
→ one-shot prompts went from bad to very good
→ excellent coding outputs, on both pi and holefill
→ benchmarks jumped, and now GPT *dominates*
I don't know what happened; I suppose there is something wrong with my Codex. In any case, the truth is this model is very smart. It obliterated my benchmark, which is crazy because some of these problems were meant not to be solved. I'll need much harder tasks.
I also fixed 2 bugs that affected some providers:
→ added a retry for lost connections
→ removed the timeout limit
DeepSeek and Kimi wanted to spend more than 1 hour on my prompts, so I let them. Their results are much better now. Kimi K2.6 almost reaches Sonnet 4.6, although much slower. Also, this shows my points from the last post were wrong.
Again: this is a new vibe-coded bench, and I'm focused on other things, so expect bugs and don't over-read this! GLM 5.1, Gemma, and Grok are not updated yet.
[Image attached]
128 replies · 120 reposts · 1.9K likes · 171.5K views
johnineson (@johnineson)
@buccocapital Imagine pursuing a career that you hate so much. It's so unrewarding that you'd rather spend your best years as a pauper than work into your early 40s. These people have made some terrible life choices.
0 replies · 0 reposts · 2 likes · 747 views
johnineson reposted
Awni Hannun (@awnihannun)
Adopting Claude speak in my regular life, episode 1:
Partner: Did you do the dishes tonight?
Me: Yes, they're done.
Partner: Why are they still dirty?
Me: You're right to push back. I didn't actually do them.
397 replies · 3.8K reposts · 55.9K likes · 1.8M views
johnineson (@johnineson)
@steipete @thsottiaux Huh? Sorry, but that page is about Codex credits for Open Source?! I see nothing about the terms of paid subscriptions. Until now, I haven't used OpenClaw because I don't want to be the next person to be banned. We need the OK in official docs, not just a personal tweet.
0 replies · 0 reposts · 0 likes · 23 views
Tibo (@thsottiaux)
Team is hard at work together with @steipete to make OpenAI models and the ecosystem the obvious way to enjoy your claw. A lot more to come next week, but a reminder that you can already use OpenClaw as part of your ChatGPT subscription today. (Also still having too much fun with ChatGPT Images 2.0 today.)
[Image attached]
Quoting pash (@pashmerepat):

I've embarked on a new sprint. My mission is to make OpenAI models feel magical in OpenClaw in the next few weeks.
Diving in today, I noticed a bug. When you configured OpenClaw to use the Codex harness with OpenAI models, auth was broken, and the system was silently falling back to the Pi harness. So nobody knew it was broken. Two PRs later (fix the auth bridge, stop the silent fallback), the Codex harness actually works. And the difference is night and day (pic related).
Before: the agent didn't feel magical or proactive. It did the exact same shallow loop every heartbeat: read the heartbeat file, check Discord, see nothing, say HEARTBEAT_OK. It ignored the rest of its instructions. Sometimes it would even reason about doing work and then just... not issue the tool calls.
After: full agent loops. It reads its workspace context, interprets the entire checklist, inspects the repo, makes real edits, tries to verify them, and gives honest status reports when things are blocked. Later heartbeats show continuity: it doesn't repeat work; it picks up where it left off.
I didn't change any prompting or scaffolding. I just swapped in the Codex harness for Pi. The lesson here: use the Codex harness if you're building with OAI models.
A lot more to do, but this is a strong start.

229 replies · 110 reposts · 2.5K likes · 450.9K views
johnineson (@johnineson)
@bcherny The UI says "Welcome to Opus 4.7 xhigh!" even when effort is not set to xhigh. That doesn't seem very clear.
0 replies · 0 reposts · 0 likes · 80 views
Boris Cherny (@bcherny)
In Claude Code the default effort is now xhigh, a new level between high and max giving finer control over the reasoning/latency tradeoff. 4.7 thinks more, so token use runs higher than 4.6. Manage it with effort, task budgets, or prompting for brevity.
19 replies · 7 reposts · 213 likes · 29.5K views
Boris Cherny (@bcherny)
Opus 4.7 is in Claude Code today. It's more agentic, more precise, and a lot better at long-running work. It carries context across sessions and handles ambiguity much better.
Quoting Claude (@claudeai):

Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.

378 replies · 182 reposts · 3.1K likes · 232.8K views
johnineson (@johnineson)
@GergelyOrosz Anthropic is evidently winning on growth, model quality, and product quality, but is starved for compute. So I bet management has said: "We MUST reduce inference load. Try not to lose too much quality." A difficult tightrope to walk. Let's see if their reputation survives it.
0 replies · 0 reposts · 0 likes · 223 views
Gergely Orosz (@GergelyOrosz)
Claude just keeps regressing for me, day after day. I swear that until a few days ago, when Claude did not know something, it kicked off a web search, figured it out, and answered. Now it just refuses to do the work that I pay for. It's like showing you the middle finger. Really?
[Image attached]
248 replies · 74 reposts · 2.2K likes · 198.8K views
johnineson (@johnineson)
@VictorTaelin Anthropic is evidently under huge scaling pressure, not model-quality or product pressure. So you can bet that anything that comes out in the immediate future will be optimised for lower inference load, and they will dilute the quality a little to achieve that.
0 replies · 0 reposts · 1 like · 959 views
Taelin (@VictorTaelin)
Seems like we get Opus 4.7 today? Is this the first time a lab announces a more powerful model exists and ships a less powerful variant? I wonder if Opus 4.7 is a smaller variant of the same Mythos pre-train, or just a continuation of the 4.6 we have...
84 replies · 8 reposts · 592 likes · 53.2K views
johnineson reposted
Robinson Meyer (@robinsonmeyer)
The current status quo is to squint at slide 43/56 on the “Amenities” page to see if when you look past the three treadmills, two ellipticals, and one rolled-up yoga mat, there’s anything heavier than 35 lbs. on the smudgy reflection of a dumbbell rack across the room
8 replies · 2 reposts · 538 likes · 28.1K views
johnineson (@johnineson)
@buccocapital I agree Adobe doesn't have lock-in, but don't you think Microsoft still has a chance to catch up? It's so deeply integrated in many orgs and such an undertaking to rip out and replace.
0 replies · 0 reposts · 1 like · 700 views
BuccoCapital Bloke (@buccocapital)
Stop talking about SBC. It. Does. Not. Matter.
Microsoft is getting killed. Adobe is getting killed.
Will anyone complain about SBC at OpenAI or Anthropic or Databricks or SpaceX? No.
Is SBC a cost? Yes. Of course. But it’s not about that. This is about terminal value.
75 replies · 30 reposts · 942 likes · 158.4K views
AltayCap (@AltayCapital)
@blondesnmoney I bought one of those high-end Chinese robot vacuums ($700-ish, but they have $200 ones too) and it vacuums and mops my floors every couple of days. Quite nice, and it makes my tile floors look great.
2 replies · 0 reposts · 21 likes · 5.1K views