Benjy Boxer
@boxerbk

3.9K posts
Joined June 2009
481 Following · 1.2K Followers
sarah guo @saranormous
Rundown of the very bad week in security:

- TeamPCP attacks: a sophisticated hacking group broke into the system that builds Trivy, a popular open-source security scanning tool. This was a supply chain attack (bad code slipped into widely used software tools or libraries, so it spreads automatically to anyone who downloads or uses them). They used the stolen access to poison other tools like LiteLLM (a popular Python gateway library with ~100 million monthly downloads that lets developers call many different AI models through one simple interface) and Telnyx (a comms platform that, for example, lets devs make phone calls and send texts). The bad code stole passwords and secrets from developers’ computers, which then let the attackers break into Mercor (stealing a lot of sensitive data) and Cisco (stealing source code for their AI products).
- Axios npm attack: someone took over the main contributor’s account for axios (a popular JavaScript library that makes it easy to send and receive data over the internet in browsers/apps). Another supply chain attack: the hackers released two fake versions that secretly installed malware on anyone who downloaded them, giving the attackers full remote control of those computers.
- Claude leaks: Anthropic accidentally published their AI coding tool with a large hidden file containing all its internal code, secret instructions, and planned new features. They did not leak the model weights, but documents about their next unreleased AI model, “mythos,” were exposed online. These were accidental mistakes, not hacks.
- Railway issues: Railway (a cloud platform for devs) briefly misconfigured their settings, letting random people see other users’ private information for under an hour. Separately, bad actors used Railway’s platform as a tool to help run phishing attacks against Microsoft accounts.
To clarify for non-security friends: these are not the result of some “rogue cyber AI.” They are affecting companies in the dev/AI ecosystem. But AI software abundance, the pace of development, agentic library selection, and AI-automated builds with AI-managed secrets are surely amplifying classic supply chain issues and human error.
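A common mitigation for the supply chain attacks described above is refusing to install any artifact whose hash does not match a pinned value (this is what lockfiles and pip's hash-checking mode do). A minimal sketch of the idea in Python; the file contents and allowlist here are hypothetical, not from any of the incidents:

```python
# Sketch: verify a downloaded artifact against a pinned sha256 before
# trusting it. If a registry or build system is compromised and ships a
# tampered file, the hash check fails and the install is refused.
import hashlib
import hmac

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_artifact(data: bytes, pinned_hash: str) -> bool:
    # compare_digest does a constant-time comparison.
    return hmac.compare_digest(sha256_of(data), pinned_hash)

# Hypothetical example: the original artifact matches its pin;
# a copy with an injected payload does not.
original = b"print('hello from a scanning tool')"
pin = sha256_of(original)

assert verify_artifact(original, pin)
assert not verify_artifact(original + b"# injected payload", pin)
```

The limitation, of course, is that the pin must be recorded before the compromise: hash pinning protects against a package changing out from under you, not against trusting a malicious version on day one.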
Benjy Boxer @boxerbk
I felt this all day today, reviewing some competitive intel work I asked Claude to do and then had ChatGPT verify. They still had a bunch of information wrong. If I hadn't spent the time verifying the work, which felt so tedious, we would have looked so dumb for the inaccuracies.
Rohan Paul @rohanpaul_ai

Wharton’s latest AI study points to a hard truth: the “AI writes, humans review” model is breaking down. “Just review the AI output” doesn’t work anymore because our brains literally give up. We have started practicing “cognitive surrender” to AI: reviewing AI output is not a reliable safeguard when cognition itself starts to defer to the machine, and you stop verifying what the AI tells you without even realizing you stopped.

It's different from offloading, like using a calculator. With offloading you know the tool did the work. With surrender, your brain recodes the AI's answer as YOUR judgment. You genuinely believe you thought it through yourself.

The study says AI is becoming a third thinking system, and people often trust it too easily. You know Kahneman's System 1 (fast intuition) and System 2 (slow analysis)? The authors argue AI is now System 3, an external cognitive system that operates outside your brain. Use it enough and something happens that they call cognitive surrender: AI gives an answer, you stop really questioning it, and your brain starts treating that output as your own conclusion. It does not feel outsourced. It feels self-generated.

The data makes it hard to brush off. Across 3 preregistered studies with 1,372 participants and 9,593 trials, people turned to AI on over 50% of questions. In Study 1, when AI was correct, people followed it 92.7% of the time. When it was wrong, they still followed it 79.8% of the time. Without AI, baseline accuracy was 45.8%. With correct AI, it jumped to 71.0%. With incorrect AI, it dropped to 31.5%, worse than having no AI at all. Access to AI also boosted confidence by 11.7 percentage points, even when the answers were wrong.

Human review is supposed to be the safety net, but this research suggests the safety net has a hole in it: people do not just miss bad AI output, they become more confident in it. Time pressure did not eliminate the effect. Incentives and feedback reduced it but did not remove it. And the people most resistant tended to score higher on fluid intelligence and need for cognition. That makes this feel less like a laziness problem and more like a cognitive architecture problem.
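A quick back-of-envelope check on the numbers above: if you linearly blend the reported accuracies with correct AI (71.0%) and incorrect AI (31.5%), you can solve for how often the AI itself must be right just to match the 45.8% no-AI baseline. This simple blending is my assumption for illustration, not a model from the paper:

```python
# Figures reported in the study (Study 1):
acc_correct = 0.710    # participant accuracy when the AI was correct
acc_incorrect = 0.315  # participant accuracy when the AI was incorrect
baseline = 0.458       # participant accuracy with no AI at all

def blended_accuracy(q: float) -> float:
    """Expected accuracy if the AI is right with probability q."""
    return q * acc_correct + (1 - q) * acc_incorrect

# Solve q * 0.710 + (1 - q) * 0.315 = 0.458 for the break-even hit rate.
breakeven = (baseline - acc_incorrect) / (acc_correct - acc_incorrect)
print(f"AI must be right {breakeven:.1%} of the time to beat no AI")
# → AI must be right 36.2% of the time to beat no AI
```

Under this toy model, the asymmetry from near-blind deference means an AI that is wrong well over a third of the time still leaves users better off than nothing, while any wrong answer drags them far below their unaided selves.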

Benjy Boxer retweeted
Felix Rieseberg @felixrieseberg
Today, we’re releasing a feature that allows Claude to control your computer: Mouse, keyboard, and screen, giving it the ability to use any app. I believe this is especially useful if used with Dispatch, which allows you to remotely control Claude on your computer while you’re away.
Benjy Boxer @boxerbk
Oh my god. ChatGPT, I’m just trying to learn details about dinosaurs with my son. I already spend every waking moment thinking and fretting about my company; now ChatGPT needs to remind me to think about it?! 🤣 By the way, that Netflix Dinosaurs show is incredible. Some scenes hit so hard.
Benjy Boxer tweet media
Benjy Boxer @boxerbk
@andrewchen For non critical systems or proofs of concept, let it rip. For anything that matters for long term stability, I think it would be a mistake to not understand the code base that customers depend on.
andrew chen @andrewchen
One question I've been asking founders is: do you try to review all the code that the LLMs write, or do you just accept it? I think it's about 50-50 right now, but the momentum is towards just accepting the AI-generated code, and I think that number will eventually go to 100%. This is one of the most telling indications of how AI-native a team is. It's hard to get super high throughput if you are reviewing every line. Poll: what do you do?
Benjy Boxer @boxerbk
“The economy is transitioning from a regime of scarce intelligence to one of scarce verification.” @ccatalini

The tail risk introduced by cheap execution work being done via agents is quickly accumulating in our economy. It’s partially a moral hazard problem: companies overvalue the marginal cost of work dropping to the cost of compute and underappreciate the growing risk to their business and the overall economy. The incentives to automate work are too high. Satya Nadella, among other CEOs, boasts about the percentage of code written with AI (in 2025, it was 30%). The concern is: how much of that is being verified, and who or what is doing the verification, before ghosts in the machine start to feed off one another and create tail risks that increase the likelihood of outages?

CDOs were celebrated for their ability to eliminate risk. These complicated instruments were considered genius. But because they were so complicated, the correlation was underestimated, and the tail risk scenario unfolded catastrophically from 2007-2009. The challenge will be: how do we get companies to appropriately price the cost of verification and invest in it? And if we don’t, who will bear the costs of this risk?

One of my favorite sections of the paper focused on individuals, with an analogy to photography and painting. When the marginal cost of photorealistic representation went to zero, painters retrained and focused on their unique interpretation of a scene. Thus expressionism, impressionism, and abstract painting became the best ways to demonstrate talent and perspective. Today, people and businesses will need to lean into their unique perspective to stand out from the crowds of bot-produced intelligence and work. Uniquely human things will be competitive advantages: meeting in person and finding ways to connect with community will generate alpha. People will gravitate to verified human work and communities online if they cannot gather in person.

Bottom line: we need to properly price insurance and verification, or the abundant intelligence our agents produce could unravel in spectacular ways. papers.ssrn.com/sol3/papers.cf…
Benjy Boxer tweet media
Benjy Boxer @boxerbk
It’s inevitable. And we are also underinvesting in the tail risks. CDOs were a genius contract to eliminate risk until we mispriced their correlation risk. Since we don’t understand how the models come up with answers and work together, and we can’t afford to verify their outputs, something will eventually break, and a flash crash scenario may unfold across our whole economy. papers.ssrn.com/sol3/papers.cf…
Benjy Boxer @boxerbk
@bznotes The customer impact of the compound startup strategy.
Bilal Zuberi @bznotes
Didn't expect Rippling to have such atrocious UI and processes. Every third thing we want to do requires emailing customer service. Ugh.
Benjy Boxer @boxerbk
@bryce And here I thought it was coffee. Time to switch 🤷‍♂️