Jason

310 posts

@__jason________

Finger on the pulse etc. I work in software.

New Zealand · Joined November 2022

110 Following · 22 Followers

Jason@__jason________·12 May

@bcherny Why? How much time did this really save you?

Boris Cherny@bcherny·12 May

I needed to book flights for a bunch of upcoming travel. As always, I used Claude Cowork to do it. In the past, Cowork has been decent at booking flights, but with Opus 4.7, for the first time ever, it 1-shotted it!

Jason@__jason________·3 May

@Patarino Should be good for salaries then

Adam Patarino@Patarino·2 May

One of the worst parts of the “great replacement” narrative is the impact on CS enrollment. I was talking to a college admin who told me they saw a massive decline in CS majors over the last two years. This could lead to a massive shortage in the decades to come when AI turns out to, in fact, not be able to replace engineers. The exact same thing has been playing out with radiology. What pisses me off is this was all a choice. Sam and Dario chose the replacement narrative to boost investor excitement. They could have easily chosen to pitch AI as a path to enhancement and abundance.

Jason@__jason________·3 May

@JosephMooneyMP “Odd antics” in that X is about the only place on the internet where the population you purport to represent can actually tell you what they think.

Joseph Mooney MP@JosephMooneyMP·2 May

Pretty odd antics on New Zealand X this week about a free trade deal with India, a country that is on track to be one of the world's largest economies, is a democracy in a world where those are very rare and authoritarianism is growing rapidly, and provides a massive new market for our trading nation in an era of the greatest geopolitical uncertainty in the last 80 years. Rather than celebrating that, some have been trying to replicate some of the “great replacement” social media angst sweeping Europe, in some cases for what appear to be cynical political reasons, for others out of misunderstanding driven by what they’ve seen from those trying to scare them at a time when they already feel uncertain about the world. The reality is that New Zealand’s sovereignty over permanent immigration, residency pathways, and citizenship remains fully intact. The agreement is limited to temporary mobility, and does not create or lock in any permanent outcomes.

Jason@__jason________·30 Apr

@DanielW_Kiwi I guess it depends on whether your long-term view on NZ is that we don’t course-correct and end up Zimbabwe v2, or that we once again become a wealthy, desirable country. My bet is on a slow decline.

Daniel 🦔@DanielW_Kiwi·30 Apr

Should I hold off upsizing?

Charted Daily@Charteddaily

Real house prices in New Zealand have fallen year-on-year for 15 straight quarters - the longest streak since 1975-81.

Jason@__jason________·28 Apr

@mwfowlie It’s very unpopular on both sides of the political aisle from what I’m picking up. Not that we needed any more proof of how out of touch New Zealand’s political class is.

Michael Fowlie@mwfowlie·27 Apr

New Zealand has signed a terrible trade deal. Just look at this comment and the many others like it in that thread. We agreed to everything India wanted and we got squat. I’m all for free trade agreements but this is a terrible deal.

Sunil Sanjan@sunilsanjan

What a brilliant strategic win for our nation! India just nailed a rare all goods duty free deal in the South Pacific. Our textiles, gems, pharma, and telecom sectors now enjoy unrestricted access to the high value NZ market; while our dairy farmers stand completely protected from any import shocks. Inside the fine print is a massive USD 20 billion investment pledge that will fuel our industrial growth and create lakhs of new jobs across the country. The agreement also contains special visa lanes for our skilled professionals and even global pathways for our traditional AYUSH and yoga practitioners. This is not just free trade; this is our government using smart diplomacy to de-risk our economy from global shocks and tightly integrate our MSMEs into the world supply chain. Bilateral ties have never been stronger, and our diaspora's status as the living bridge has never been more critical. A masterclass in steady, pragmatic leadership! 🇮🇳

Jason@__jason________·28 Apr

@arena @OpenAI This benchmark could not be further from what I personally have experienced working with many of these models. I don’t know how you arrive at these results and I don’t think software developers should pay this any attention.

Arena.ai@arena·27 Apr

GPT-5.5 by @OpenAI is now live in the Arena, landing across multiple leaderboards. Here’s how it ranks by modality:
- Code Arena (agentic web dev): #9, a strong +50pt jump over GPT-5.4
- Document Arena (analysis & long-content reasoning): #6, on par with Sonnet 4.6
- Text Arena: #7, Math #3, Instruction Following: #8
- Expert Arena: #5
- Search Arena: #2
- Vision Arena: #5
Strong, well-rounded performance, especially in Code (+50 pts vs GPT-5.4). Congrats to @OpenAI on the release. Full category breakdowns by modality in the thread.

OpenAI@OpenAI

Introducing GPT-5.5 A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

Jason@__jason________·25 Apr

@SplicesWhat @lennysan @_catwu Totally agree. Speed running slop doesn’t seem like a good long term plan… it’s not even a good short term plan based on the last couple of months.

Clem Fandango@SplicesWhat·25 Apr

@__jason________ @lennysan @_catwu Exactly. All this noise around their delivery cycles and product-think is only a good news story if the product works and functions robustly. It plainly doesn’t. It’s buggy, barely any reviews. Anyone can now churn out daily releases. Doesn’t mean you should.

Lenny Rachitsky@lennysan·24 Apr

My biggest takeaways from Claude Code's Head of Product @_catwu: 1. Anthropic’s product development timelines have gone from six months to one month, sometimes one week, sometimes one day. Part of this acceleration is access to the latest models (i.e. Mythos). Another is shipping new products into “research preview,” making clear it's early, experimental, and might not be supported forever. Another is an evergreen "launch room "where engineers post ready features and marketing turns around announcements the next day. 2. The PM role is shifting from coordinating multi-month roadmaps to enabling teams to ship daily. As Cat puts it, “There should be less emphasis on making sure you are aligning your multi-quarter roadmaps with your partner teams and more emphasis on, OK, how can we figure out the fastest way to get something out the door?” 3. The most efficient shipping unit is an engineer with great product taste. On Cat’s team, many engineers go end-to-end—from seeing user feedback on Twitter to shipping a product by the end of the week—without a PM involved. Also, almost all the PMs on the Claude Code team have either been engineers or ship code themselves, and the designers have been front-end engineers. The roles are merging, and the most valuable skill is product taste, not job title. 4. Build products that are on the edge of working. Claude Code’s code review product failed multiple times because earlier models weren’t accurate enough. But because the prototype was already built, they could swap in Opus 4.5 and 4.6 and immediately test whether the gap was closed. Teams that wait for the model to be ready will always be a cycle behind. 5. The most underrated skill for building AI products is asking the model to introspect on its own mistakes. Cat regularly asks the model why it made an unexpected decision. The model will explain that something in the system prompt was confusing, or that it delegated verification to a subagent that didn’t check its work. 
This reveals what misled the model so the team can fix the harness. 6. Every model release forces their team to revisit existing products and audit their system prompt to remove features the model no longer needs. Claude Code’s to-do list was a crutch for earlier models that couldn’t track their own work. With Opus 4, the model handles it natively. Features built as scaffolding for weaker models become debt when the model catches up—so the team actively strips them. 7. Anthropic employees build custom internal tools instead of buying SaaS products. A sales team member built a web app that pulls from Salesforce, Gong, and call notes to auto-customize pitch decks—work that used to take 20 to 30 minutes now takes seconds. Their core stack is Claude Code, Cowork, and Slack. No Notion, no Linear, no Figma. 8. People underestimate how much Claude’s personality contributes to its success. As Cat describes it, “When you reflect on everyone you’ve worked with, there’s just some people where you’re like, I really like their energy, their vibe.” Claude is designed to be low-ego, positive, competent, and earnest—qualities that make it feel like a great coworker, not just a tool. This isn’t cosmetic; it’s what makes people want to use Claude for hours every day. The team has a dedicated person, Amanda, who “molds Claude’s character,” and it’s one of the hardest roles at the company because success is so subjective. 9. The future of work is managing fleets of AI agents, not doing the work yourself. Cat sees a clear progression: first, individual tasks become successful. Then people start running multiple tasks at the same time (multi-Clauding). Next, people will run 50 or 100 tasks simultaneously, which will require new infrastructure—remote execution, better interfaces for managing tasks, agents that fully verify their work, and self-improving systems that incorporate feedback. 
The human role shifts from doing the work to knowing which tasks to look into, verifying outputs, and giving feedback that makes the system better over time. 10. Hire people who lean into chaos and face every challenge with a smile. At Anthropic, there are weeks when a P0 on Sunday becomes a P00 by Monday and a P000 by Monday afternoon. If you get too stressed about any one thing, you’ll burn out. Their team looks for people who can look at a hard challenge and say, “Wow, that’s gonna be hard. But I’m excited to tackle it and I’m gonna do the best that I possibly can.” This mindset—optimism, resilience, and comfort with constant change—is increasingly essential as the pace of AI development accelerates. Don't miss the full conversation: youtube.com/watch?v=Pplmzl…

Lenny Rachitsky@lennysan

How Anthropic’s product team moves faster than anyone else I sat down with @_catwu, Head of Product for Claude Code at @AnthropicAI, to get a peek into their unprecedented shipping pace, how AI is changing the PM role, and how to be the right amount of AGI-pilled. We discuss: 🔸 How Anthropic’s shipping cadence went from months to weeks to days 🔸 The emerging skills PMs need to develop right now 🔸 Why you should build products that don't work yet—then wait for the model to catch up 🔸 Why a 95% automation isn't really an automation 🔸 Cat’s most underrated AI skill (introspection) 🔸 What Cat actually looks for when hiring PMs now (hint: it's not traditional PM skills) Listen now 👇 youtu.be/PplmzlgE0kg

Jason@__jason________·23 Apr

@DigitalEU Your major cities are crime ridden piss stinking festering shit holes and you’re investigating emojis?

Digital EU 🇪🇺@DigitalEU·22 Apr

💊 Emojis used as coded language to promote illegal activities online? Some platforms are now detecting emojis used as code for drug sales. This is one of the key findings of the first EU-wide report on systemic online risks. Dive in → link.europa.eu/CQhKGc #DSAForReal

Jason@__jason________·23 Apr

@deedydas GPT-5.4 was already more usable than Opus 4.7 in my day to day workflow. I’ve not found benchmarks particularly useful and personal experience guides my choice of model. Opus is so hallucinatory it is relegated to a UI design toy for me now.

Deedy@deedydas·23 Apr

GPT 5.5 underperforms Opus 4.7 on SWE-Bench Pro. Couldn't find any reported SWE-Bench scores at all and an internal benchmark is reported instead. That footnote is trying really hard to bury the lede. GPT 5.5 isn't SOTA for coding.

Jason@__jason________·23 Apr

@bcherny @ReadySetBrian I love the feigned surprise and gaslighting any time one of the tens of thousands of people experiencing the same problem voices it on X.

Boris Cherny@bcherny·22 Apr

@ReadySetBrian Hmm are you seeing this with Opus 4.7 on xhigh effort and the latest version of Claude Code?

TimWhatley@ReadySetBrian·22 Apr

Canceled Claude max today, @bcherny whatever happened in the last 1-2 months is a significant regression. The model feels like someone from OpenAI started working on trust and safety there. Opus thinking is significantly worse. Every statement is “here’s where I’d push back on that” and then proceeds to rattle off the most inane list of confused counter arguments. It was perfect 3-4 months ago!!!

Jason@__jason________·22 Apr

@TheGeorgePu Not sure what you were going to do with Claude Code on the pro plan anyway

George Pu@TheGeorgePu·22 Apr

Anthropic just pulled Claude Code from the Pro plan. Pro users wanting it need Max now. $100/month minimum. 5x jump. I'm on Max 20x so I'm fine. Flagging for anyone on Pro who's about to find out. No announcement. Just a pricing page edit.

Jason@__jason________·21 Apr

@SplicesWhat What amazes me is that they’re using this model to carry out military strikes. It can’t even work out what libs exist in my monorepo.

Clem Fandango@SplicesWhat·21 Apr

@__jason________ I’m back on GH Copilot and OpenCode for some basics. Had Claude monumentally fuck up some basic changes to a series of TRPCs and it just made up the endpoint entirely.

Jason@__jason________·21 Apr

Go to try Opus 4.7 for a simple task. Hallucinates a bunch of directory structures that don't exist in my code base. I tell it. Hits me with the "You're right, I made that up. The actual layout is libs..." How does anyone use Opus for serious work?

Jason@__jason________·21 Apr

@thsottiaux Improve front-end capabilities and writing tone and prose. These are about the only areas where I still find Opus to be better.

Tibo@thsottiaux·21 Apr

Hello builders. What are we getting wrong with Codex, what can we improve?

Jason@__jason________·19 Apr

@thereisnobeth @DanielW_Kiwi Where are the GCC countries? NZ is a developing country compared to the UAE, for example.

thereisnobeth@thereisnobeth·19 Apr

nz being an advanced economy is so funny we're like 3 agriculture companies in a trenchcoat

AsieNews@AsiaNews_FR

Countries in blue are classified as advanced economies by the IMF in 2026. #Taiwan 🇹🇼 Singapore 🇸🇬 Japan 🇯🇵 South Korea 🇰🇷 Hong Kong 🇭🇰

Jason@__jason________·19 Apr

@elok_lam @0xSero Isn’t that comparing apples and oranges? Missions are a specific feature for a specific use case. And GPT-5.4 is slow in the first place.

Lok@elok_lam·19 Apr

@0xSero No. Droid is slow as hell. I used GPT 5.4 + GLM 5 + Kimi K2.5 on both Droid Mission and OhMyOpenCode. Both did not give me what I wanted, but Droid took 14 hours. If it's gonna fail, I would not like to wait for hours. On the other hand, I have had good results with OMO.

0xSero@0xSero·19 Apr

Please Droid:
1. /loop [time] [prompt]
2. queue prompts with tab
That's it.

Jason@__jason________·17 Apr

@DanielW_Kiwi I wouldn’t be surprised, and what annoys me is that it’s only our collective mentality holding us back. We are blessed with many options and opportunities, yet we prefer to become poorer and poorer.

Daniel 🦔@DanielW_Kiwi·17 Apr

Is New Zealand the poorest nation in the Anglo-sphere?

Jason@__jason________·17 Apr

@RayFernando1337 Opus hallucinates so badly I don’t know how you can trust it for this type of research anyway. At least not without very thorough deterministic + non-deterministic validation. And to achieve that workflow I don’t see how you can use the Claude app anyway.

Ray Fernando@RayFernando1337·17 Apr

I’ve lost trust in Opus 4.7. I am researching an important brain bleed issue for a loved one and I’m at a loss for words.

Ray Fernando@RayFernando1337

Wait, what happened to the Extended Thinking toggle on Opus 4.7? Opened Claude this morning and the toggle I use every day is gone. It's now "Adaptive thinking, thinks only when needed." Dug into the docs and on 4.7, adaptive is the only mode. The model decides per-request if it wants to think or not. Where is the way to force it on? What does this mean for a Max user like me who lives on the phone and web (not Claude Code or the API)? On 4.6, Extended Thinking on meant every answer got the deep reasoning. I pay $200/month for Max and I kept it on for my workflows, projects, etc. On 4.7, every request kind of feels like a slot machine. Did the model think about this one? Is my request worthy enough of more thinking? What if I tell it to think harder...ultrathink...mega ultra uber giga think?? I don't know. Pull the lever and just...hope? When it does more thinking with 4.7 it is a nice experience and I love what the team built. Just wondering out loud if there's a way for $200 Max user to force thinking on every request. Happy to pay for it. Anyone else notice this?

Jason@__jason________·17 Apr

@Angaisb_ Because you vote for morons who impose ridiculous laws and regulations that are extremely painful and expensive for companies to comply with. You are your own problem.

Angel 🌼@Angaisb_·16 Apr

I think OpenAI should be more open about why certain things don't launch in the EU and UK. I don't see why computer use can't come here yet. I'm not saying there's no reason; I just think that when you block features for around 520 million people, the least you can do is be transparent about why.
