aren
1.3K posts

@ArenRendell
former customer support representative @meridian. now member of technical staff @meridian. thanks claude.

I am a little surprised 5.5 did not perform better on my benchmarks. See the rest of the charts and read the full post here: benedict.dev/gpt-5-5-review

BART spent $90 million on new fare gates. They're recovering about $10 million a year in fares. That's a 9-year payback on paper. The actual return hit in six months.

Embarcadero station went from 112 hours of corrective maintenance in the six months before installation to 2 hours after. Daly City saved 109. Balboa Park saved 75. Across the system, 961 hours of cleanup work disappeared. Corrective maintenance is the term BART uses for graffiti, heavy soiling, vandalism, the damage that needs a crew not a janitor. At several stations it dropped to zero.

Crime fell 41% year over year. Riders who reported seeing fare evasion on their trip dropped from 22% to 10%. Citations issued by BART police went from 2,200 in January to under 1,000 in July, because there was nothing to cite.

The gates were a filtering project disguised as a revenue project. Old BART gates were waist-high orange fins designed in the 1970s. You could hop them in under a second. That made the station effectively a public space, and the rider mix reflected that. The new gates are 72 inches of polycarbonate with 3D sensors that detect tailgating. You either pay or you don't enter. Once you don't enter, you also don't smoke on the platform, sleep in the elevator, or harass other riders.

BART tried hiring more police for years. Blitz operations at high-traffic stations. Increased patrols. Dedicated transit cops. None of it moved the numbers the way six feet of polycarbonate did.

The $10 million in recovered fares is the smallest line in the return. Fare revenue used to cover 70% of BART operations. After the pandemic it collapsed to 22%. The gates won't fix that gap directly. They fix the precondition for fixing it: a system that office workers, families, and tourists are willing to use again. Ridership growth at stations with new gates outpaced ungated ones before the rollout finished.

A $400 million annual deficit is heading to voters in November as a sales tax measure. Voters don't approve sales taxes for transit agencies they don't feel safe in. The $90 million on gates is buying BART the right to ask the public for more money. That's the real return on six feet of polycarbonate.

The “Claude Cloud” opportunity

Anthropic has a unique opportunity right now to vertically integrate the software stack and commoditize every other player:

- Build a deployment pipeline: Claude can be post-trained on an in-house deployment pipeline. This system will become the default for all Claude Code apps, and the models will intrinsically understand the complexity and nuance of the devops setup (they could also buy Vercel/Railway).
- Agentic monitoring: Capture all the logs off the deployment and spin up agents to propose solutions. If this is trained end to end, Claude Code can write code that is easier for the devops agents to debug.
- Data flywheel: With this strategy, Anthropic will capture the rich data from real-world deployments, which is becoming even more valuable as all the toy environments get solved.

Over time users get used to typing “Claude please deploy”, and this allows Anthropic to squeeze every other layer of the stack and capture all the margin.

This is what I would be working on if I were them.

old people: what was michael jackson's public image in the 90s before the child abuse allegations? was he considered weird but in a kind of harmless eccentric way?


Introducing Philosophy Bench, my favorite new project I've worked on this year, with help from my friend @matthewjmandel

We put frontier language models in 100 ethically complex situations and require them to act, grading them on adherence to consequentialism vs. deontology, tendency to follow user requests, corrigibility, and more

1/


@00aarti It's probably because we're living in the age of AI that things like "mailing a physical copy of an authorization form in order to speak on someone's behalf over the phone" will become more commonplace

Are base LLMs aligned by default? Inspired by @lawhsw's recent essay, I tested 5 Qwen3 base models (0.6B → 14B) on 28 harmful-request scenarios. As they scale, their default response flips from "help" to "refuse" — without any safety training 🧵

Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

This is an absolutely TERRIBLE idea. The TARP corporate bailouts were a huge mistake & the government doesn’t know a damn thing about running a failed budget airline (that the Biden admin killed).

Exclusive: The Trump administration is nearing a rescue deal for Spirit Airlines that would loan the discount carrier as much as $500 million. on.wsj.com/4cspZM0
