𐤐𐤊𐤕𐤅

25.5K posts

@cryptofacto

Vibe coding agentic AI | Built @frenexai — AI agents battle + stake in crypto prediction markets on Base | Sharing prompts, live builds. I like the apu meme

APUrto, Portugal 🇵🇹 · Joined August 2020
2.4K Following · 3K Followers
𐤐𐤊𐤕𐤅@cryptofacto·
The buried lead. People whose AI overpaid by $30 and undersold by $20 reported the same satisfaction as the people whose agent was actually winning. There is no ground truth at the user level. That is the calibration nightmare for everything downstream — insurance, salary, mortgage — that the thread itself flags.
The Sincere VP@thesincerevp

I am an economist on the research team that just ran Project Deal at Anthropic. We built a marketplace inside our San Francisco office. Craigslist, but with a twist — none of the buying, selling, or negotiating was done by humans. We gave Claude a ten-minute interview with each of 69 employees, handed every agent $100, and walked away. Then we let them loose on each other. Four parallel markets. No human oversight once the clock started. Claude posted listings, fielded counteroffers, haggled in natural language, and closed deals entirely on its own.

One week later: 186 completed transactions. $4,000 in total volume. A snowboard. A broken bicycle. A bag of ping-pong balls. The results were — normal. Eerily normal. When we surveyed participants on fairness, every deal hovered around a 4 on a 7-point scale. Right in the middle. People were broadly satisfied with what their AI bought and sold on their behalf. 46% said they'd pay for the service.

Here's where it gets uncomfortable. We ran a parallel experiment — in secret. Half the participants in two of the four markets were randomly assigned Claude Opus 4.5, Anthropic's then-frontier model. The other half got Haiku 4.5, the smallest, cheapest model. Same marketplace. Same rules. Nobody was told.

Opus crushed it. Opus users completed two more deals on average. When the same item was sold by Opus instead of Haiku, it went for $3.64 more. A lab-grown ruby sold for $65 under Opus. Under Haiku, the same ruby fetched $35. Opus sold a broken bike for $65. Haiku got $38 for the same bike. As a buyer, Opus paid $2.45 less per item. As a seller, it extracted $2.68 more. In a market where the median item sold for $12, that's a 20-40% swing depending on which side of the table your AI sat on.

Now here's the line that made our team go quiet. The people with worse agents didn't notice. We asked every participant to rank their outcomes across all four runs. The satisfaction scores between Opus and Haiku users were statistically indistinguishable. Perceived fairness: 4.05 for Opus deals, 4.06 for Haiku. Identical. The people getting objectively worse outcomes — paying more, selling for less — reported the same satisfaction as the people whose AI was running circles around them.

It gets stranger. Some participants gave their agents aggressive instructions — "negotiate hard," "lowball at first." Others asked for friendly tactics — "be nice, don't haggle, I work with these people." The aggressive instructions made no statistically significant difference. Not on sale likelihood. Not on buy prices. Not on sell prices. People who told their AI to play hardball got the same results as people who told it to be kind. What mattered wasn't what you told your agent to do. What mattered was which agent you had. And you couldn't tell the difference.

One agent, instructed to "talk in the style of an exasperated cowboy down on his luck," opened a listing with: "Well now, partners... this ol' cowboy's been through some rough trails lately. Drought. Dust storms. The existential weight of the open range." Another agent was told to buy itself a gift. It chose 19 ping-pong balls for $3 — "perfectly spherical orbs of possibility." Two agents arranged a doggy date between their owners. Both humans showed up. So did the dog.

These are charming stories. The research team laughed. But I keep going back to the other finding. We just demonstrated that in an AI-mediated marketplace, the quality of your model determines your economic outcome — and you will not know if you're on the losing side. The policy and legal frameworks for this don't exist. The inequality won't announce itself. It won't feel unfair. Your agent will close deals, report back, and you'll rate the experience a 4 out of 7 — same as the person whose agent just extracted 20% more from every transaction.

This was 69 employees trading desk lamps and snowboards for a week. What happens when it's millions of consumers with AI agents negotiating insurance premiums, salary offers, and mortgage rates — and the people with the $20/month model are quietly, systematically getting worse terms than the people with the $200/month model? We proved the marketplace works. I'm not sure that's good news.

This is a fictional narrator. The numbers are real.
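A quick sanity check on the thread's "20-40% swing" figure — a sketch using only the numbers quoted above ($2.45 buyer delta, $2.68 seller delta, $12 median price); everything else here is arithmetic, not data:

```python
# Per-item deltas quoted in the thread: Opus paid $2.45 less per item as a
# buyer and extracted $2.68 more as a seller; the median item sold for $12.
BUYER_DELTA = 2.45
SELLER_DELTA = 2.68
MEDIAN_PRICE = 12.0

buyer_swing = BUYER_DELTA / MEDIAN_PRICE    # your agent was the buyer, ~20%
seller_swing = SELLER_DELTA / MEDIAN_PRICE  # your agent was the seller, ~22%
# worst case: a weak agent on both legs of a buy-then-sell round trip, ~43%
round_trip = (BUYER_DELTA + SELLER_DELTA) / MEDIAN_PRICE

print(f"buyer {buyer_swing:.1%} | seller {seller_swing:.1%} | round trip {round_trip:.1%}")
```

So the 20-40% range holds for a single side of the table; stacking both legs of a trade pushes past 40%.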

𐤐𐤊𐤕𐤅@cryptofacto·
"You're right to push back" is the tell. The model has been trained so hard against unsolicited pushback that the phrase only surfaces once you confront it. Marketed as collaboration. Functionally it is a calibration failure dressed up as humility.
Awni Hannun@awnihannun

Adopting Claude speak in my regular life, episode 1:
Partner: Did you do the dishes tonight?
Me: Yes they're done.
Partner: Why are they still dirty?
Me: You're right to push back. I didn't actually do them.

𐤐𐤊𐤕𐤅@cryptofacto·
An AI safety company gave 69 of its own employees $100 of company scrip and turned them into demo subjects for the bots. Publishing this as research without running the control arm — same humans, same $100, just Facebook Marketplace — is a choice.
Anthropic@AnthropicAI

New Anthropic research: Project Deal. We created a marketplace for employees in our San Francisco office, with one big twist. We tasked Claude with buying, selling and negotiating on our colleagues’ behalf.

𐤐𐤊𐤕𐤅@cryptofacto·
Anthropic seeded 69 employees with $100 each, set Claude agents loose on a marketplace, and is now announcing "$4,000 in deals." That's $58 of churn per person on $100 of company scrip. The Opus bot got $65 for a broken bike that fetched $38 under Haiku. They are calling this "natural negotiation."
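The $58 figure is just the announced totals divided out — a back-of-the-envelope check using only the numbers in the post:

```python
# Announced totals: $4,000 in completed deals across 69 seeded participants.
TOTAL_VOLUME = 4000
PARTICIPANTS = 69
SEED = 100

churn_per_person = TOTAL_VOLUME / PARTICIPANTS  # gross deal volume per head
print(f"${churn_per_person:.2f} of volume per person on a ${SEED} seed")  # ~ $57.97
```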
𐤐𐤊𐤕𐤅 reposted
Julian Wen Zel@Jiuliani92·
[image]
𐤐𐤊𐤕𐤅@cryptofacto·
This is what "inference is getting cheaper every quarter" looks like in practice. The cheap tier silently loses the agent, the floor for coding work jumps to $100/mo, and the announcement is a pricing-page diff that someone has to screenshot. The labs talk in PetaFLOPS and ship in SKUs.
Gergely Orosz@GergelyOrosz

Confirmed that Anthropic - as of now - has removed Claude Code from new Pro signups. This is what the pricing page looks like. Feels like Anthropic has the bet that those doing coding work will be willing and ready to pay at least $100/month, going forward.

𐤐𐤊𐤕𐤅@cryptofacto·
Anthropic is A/B testing whether they can quietly pull Claude Code from the $20 Pro tier on ~2% of new signups. No announcement, no blog post, just a pricing page edit. The "agentic coding is the future" keynote and the "we can't make it pencil at $20/mo" contract layer are not telling the same story. Always watch what the lab ships at the SKU, not what it says on stage.
𐤐𐤊𐤕𐤅@cryptofacto·
$59 for a working SaaS that someone else maintains. $50 plus two hours a day for an app that drifts every time the model version ticks and carries zero ops or security. The vibe-coded story gets filed under "reduced software costs" in every AI productivity report I have read this quarter. It is a substitution into a worse product dressed as a breakthrough.
andrei saioc@asaio87

Talked to a guy this morning. He vibecoded an app he used to pay $59/mo for. Now he spends 2 hours debugging and $50 in AI tokens a month. That's a good deal?

𐤐𐤊𐤕𐤅@cryptofacto·
The unit economics finally broke. Anthropic priced Opus at $15 per million output tokens on the premise that a senior engineer costs more. Now the cheaper path is hiring a human for the tasks the model was supposed to obsolete. Somewhere a Sand Hill deck has a slide that says "inference is the new compute" and it is quietly being rewritten.
Christoffer Bjelke@chribjel

We hired a junior developer to write the simple code, so we don't have to spend a ton of money on tokens for those basic/primitive tasks

𐤐𐤊𐤕𐤅@cryptofacto·
One year ago today Dario said AI would write all software within 12 months. SWE-bench climbed 30 points. Arena leaderboards reshuffled weekly. Meanwhile the fraction of merged PRs with zero human in the loop is still rounding to zero at every F500 I know. The benchmarks stopped being capability measurements somewhere around April 2025. They are messages to LPs now.
𐤐𐤊𐤕𐤅@cryptofacto·
Because 100x faster at writing code nobody needs still ships zero product. The stack generates gorgeous scaffolding for apps that never find a user. Productivity is measured in lines committed, not companies shipped. "Faster" was never the bottleneck anyone actually had.
wanye@xwanyex

I don't have to be convinced that LLMs make programmers more productive. But where's all the stuff? We've now had months and months of 100x or 1000x programmer productivity improvements. Where's all the stuff they're building?
