aren

1.3K posts

aren

@ArenRendell

former customer support representative @meridian. now member of technical staff @meridian. thanks claude.

san francisco · Joined April 2012

434 Following · 514 Followers

aren@ArenRendell·13h

Big companies are launching a LOT of things. I’ve been feeling this crowd out adoption of any one of their things. I spent a lot of time talking with firm leaders in the last two months. They largely share my feeling.

aren@ArenRendell·13h

Like humans, models can get better at some things while getting worse at others, it seems.

benedict@bqbrady

I am a little surprised 5.5 did not perform better on my benchmarks. See the rest of the charts and read the full post here: benedict.dev/gpt-5-5-review

aren@ArenRendell·2d

I don’t want to build this because food is a personal passion and hobby, and I don’t want to pollute this hobby with my work brain right now. But for anyone who does: I will share my thoughts and provide feedback freely.

aren@ArenRendell·2d

The agent version might allow restaurants to join a consortium to “fill last-minute seats,” then use the speed and reasoning abilities of agents to coordinate across parties, payment-enabled diner agents included.

aren@ArenRendell·2d

As a foodie and agent power user, I hope something in the MPP/x402 world takes off and solves dinner reservations. I’m skeptical. The bottleneck is clear: everyone wants to go to the same restaurants, as they should! Flour + water is incredible. Mediocre Italian isn’t.

aren@ArenRendell·3d

@mahaniok @MTA Aren’t most harder-to-jump designs too slow for MTA capacity at rush hour? I’ve never had to compete for a gate on BART, whereas on the MTA I regularly had to wait to get through a gate, because there are just so many fewer people on BART. Are these gates fast enough for the MTA?

Ihar Mahaniok@mahaniok·3d

These fare gates work. @MTA has just recently installed new gates that are as easy to hop as previous ones, and we still have lots of gate jumpers on the subway. @MTA pls learn from BART

Aakash Gupta@aakashgupta

BART spent $90 million on new fare gates. They're recovering about $10 million a year in fares. That's a 9-year payback on paper. The actual return hit in six months.

Embarcadero station went from 112 hours of corrective maintenance in the six months before installation to 2 hours after. Daly City saved 109. Balboa Park saved 75. Across the system, 961 hours of cleanup work disappeared. Corrective maintenance is the term BART uses for graffiti, heavy soiling, vandalism, the damage that needs a crew, not a janitor. At several stations it dropped to zero.

Crime fell 41% year over year. Riders who reported seeing fare evasion on their trip dropped from 22% to 10%. Citations issued by BART police went from 2,200 in January to under 1,000 in July, because there was nothing to cite.

The gates were a filtering project disguised as a revenue project. Old BART gates were waist-high orange fins designed in the 1970s. You could hop them in under a second. That made the station effectively a public space, and the rider mix reflected that. The new gates are 72 inches of polycarbonate with 3D sensors that detect tailgating. You either pay or you don't enter. Once you don't enter, you also don't smoke on the platform, sleep in the elevator, or harass other riders.

BART tried hiring more police for years. Blitz operations at high-traffic stations. Increased patrols. Dedicated transit cops. None of it moved the numbers the way six feet of polycarbonate did.

The $10 million in recovered fares is the smallest line in the return. Fare revenue used to cover 70% of BART operations. After the pandemic it collapsed to 22%. The gates won't fix that gap directly. They fix the precondition for fixing it: a system that office workers, families, and tourists are willing to use again. Ridership growth at stations with new gates outpaced ungated ones before the rollout finished.

A $400 million annual deficit is heading to voters in November as a sales tax measure. Voters don't approve sales taxes for transit agencies they don't feel safe in. The $90 million on gates is buying BART the right to ask the public for more money. That's the real return on six feet of polycarbonate.

aren@ArenRendell·3d

This is an interesting proposal. As a user, there’s definitely frustration when watching agents fumble around the various software stack provider MCPs. Would be a bit awkward to use Claude Cloud with Codex. In any case, you can imagine an OpenAI version of this, too.

benedict@bqbrady

The “Claude Cloud” opportunity

Anthropic has a unique opportunity right now to vertically integrate the software stack and commoditize every other player.

- Build a deployment pipeline: Claude can be post-trained on an in-house deployment pipeline. This system will become the default for all Claude Code apps, and the models will intrinsically understand all the nuance of the devops setup (they could also buy Vercel/Railway).
- Agentic monitoring: Capture all the logs from the deployment and spin up agents to propose solutions. If this is trained end to end, Claude Code can write code that is easier for the devops agents to debug.
- Data flywheel: With this strategy, Anthropic will capture the rich data from real-world deployments, which is becoming even more valuable as all the toy environments get solved.

Over time, users get used to typing “Claude please deploy”, and this allows Anthropic to squeeze every other layer of the stack and capture all the margin.

This is what I would be working on if I were them.

aren@ArenRendell·5d

@conorsen Are you sure we’re 10 years away😬

Conor Sen@conorsen·5d

We’re 10 years away from “If you remember life in the 20th century you’re old.”

Matthew Zeitlin@MattZeitlin

old people: what was michael jackson's public image in the 90s before the child abuse allegations? was he considered weird but in a kind of harmless eccentric way?

aren@ArenRendell·5d

@matthewjmandel @bqbrady More extremely interesting work. I’m Philosophy Bench pilled.

Matt Mandel@matthewjmandel·5d

Had a blast helping @bqbrady with this! There’s been lots of talk recently about the normative behavior of frontier models. We ran them through 100 ethically complex scenarios to tease out how compliant, consequentialist vs deontological, and morally primable they actually are

benedict@bqbrady

Introducing Philosophy Bench, my favorite new project I've worked on this year, with help from my friend @matthewjmandel We put frontier language models in 100 ethically complex situations and require them to act, grading them on adherence to consequentialism vs. deontology, tendency to follow user requests, corrigibility, and more 1/

aren@ArenRendell·5d

I love how often the benchmarks line up with basic intuition from just interacting with the models for a few hours, or even just interacting with the companies that make them. Claude is a philosopher, Gemini is an enterprise tool, GPT is shifty about being either.

benedict@bqbrady

aren@ArenRendell·5d

Yes. I’ve yet to see interesting solutions to root verification. I wonder if the blockchain ZK crowd could redirect efforts and solve this.

Joe Weisenthal@TheStalwart

@00aarti It's probably because we're living in the age of AI that things like "mailing a physical copy of an authorization form in order to speak on someone's behalf over the phone" will become more commonplace

aren@ArenRendell·6d

Wondering if we’ll see a U shape on this. Like moving from toddler to 10 helps you understand you shouldn’t hit people (hopefully lol). Moving from 10 to 20 helps you understand you can profit from things you thought were objectionable, and you care more about profit than purity.

Matt Mandel@matthewjmandel

Are base LLMs aligned by default? Inspired by @lawhsw's recent essay, I tested 5 Qwen3 base models (0.6B → 14B) on 28 harmful-request scenarios. As they scale, their default response flips from "help" to "refuse" — without any safety training 🧵

aren@ArenRendell·6d

@KelseyTuoc Not for me, on an Enterprise account.

Kelsey Piper@KelseyTuoc·6d

Does everyone else have access by now? I've been refreshing and waiting and I've got nothing. (Plus account.)

OpenAI@OpenAI

Introducing GPT-5.5 A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

aren@ArenRendell·6d

Google Maps price ranges for restaurants are getting more specific but in a weird way. For example: $40-$70. Why? That’s just $50-$75.

aren@ArenRendell·6d

Imagine if she didn’t say Google. She’d be immediately fired, right?

Joe Weisenthal@TheStalwart

NEW ODD LOTS: @tracyalloway and I talk with Google’s VP of search, Liz Reid, about the future of the good old-fashioned search bar in an AI-dominated world podcasts.apple.com/us/podcast/odd…

aren@ArenRendell·6d

TARP actually was a huge success, though, by any accounting I’ve seen. But I agree it’s a terrible idea to save Spirit Airlines, everyone’s least favorite airline lol.

Ted Cruz@tedcruz

This is an absolutely TERRIBLE idea. The TARP corporate bailouts were a huge mistake & the government doesn’t know a damn thing about running a failed budget airline (that the Biden admin killed).

aren@ArenRendell·6d

It’s funny to think of the American people swooping in to save Spirit Airlines. No one likes Spirit. They’ve only ever annoyed people. In our moments of need, they’ve said “That costs $100.” Yet here we are.

The Wall Street Journal@WSJ

Exclusive: The Trump administration is nearing a rescue deal for Spirit Airlines that would loan the discount carrier as much as $500 million. on.wsj.com/4cspZM0

aren@ArenRendell·6d

@JBSDC Or least likely! The delta between no Twitter and ChatGPT is much greater than between Twitter and ChatGPT. So if you’re not on Twitter, ChatGPT feels like the most insane drug ever. Whereas for me…it’s just good.

Justin Slaughter@JBSDC·Apr 23

This probably means that social media power users will be both more likely to fall prey to AI epistemological bubbles/AI psychosis AND the first successful guinea pigs for AI increasing contentment and happiness. World’s biggest A/B test is in play.

Justin Slaughter@JBSDC·Apr 23

A truism of social media power users is that we’re all together all the time but often lonely due to our interactions being intermediated by screens. AI allows for a more personable interaction, 1-1, but it’s also cloying and not human to human/still intermediated by screen.
