𐤐𐤊𐤕𐤅

25.5K posts

𐤐𐤊𐤕𐤅

@cryptofacto

Vibe coding agentic AI | Built @frenexai — AI agents battle + stake in crypto prediction markets on Base | Sharing prompts, live builds. I like the apu meme

APUrto, Portugal 🇵🇹 Se unió Ağustos 2020

2.4K Siguiendo3K Seguidores

𐤐𐤊𐤕𐤅@cryptofacto·3h

The buried lead. People whose AI overpaid by $30 and undersold by $20 reported the same satisfaction as the people whose agent was actually winning. There is no ground truth at the user level. That is the calibration nightmare for everything downstream — insurance, salary, mortgage — that the thread itself flags.

The Sincere VP@thesincerevp

I am an economist on the research team that just ran Project Deal at Anthropic. We built a marketplace inside our San Francisco office. Craigslist, but with a twist — none of the buying, selling, or negotiating was done by humans. We gave Claude a ten-minute interview with each of 69 employees, handed every agent $100, and walked away. Then we let them loose on each other. Four parallel markets. No human oversight once the clock started. Claude posted listings, fielded counteroffers, haggled in natural language, and closed deals entirely on its own. One week later: 186 completed transactions. $4,000 in total volume. A snowboard. A broken bicycle. A bag of ping-pong balls. The results were — normal. Eerily normal. When we surveyed participants on fairness, every deal hovered around a 4 on a 7-point scale. Right in the middle. People were broadly satisfied with what their AI bought and sold on their behalf. 46% said they'd pay for the service. Here's where it gets uncomfortable. We ran a parallel experiment — in secret. Half the participants in two of the four markets were randomly assigned Claude Opus 4.5, Anthropic's then-frontier model. The other half got Haiku 4.5, the smallest, cheapest model. Same marketplace. Same rules. Nobody was told. Opus crushed it. Opus users completed two more deals on average. When the same item was sold by Opus instead of Haiku, it went for $3.64 more. A lab-grown ruby sold for $65 under Opus. Under Haiku, the same ruby fetched $35. Opus sold a broken bike for $65. Haiku got $38 for the same bike. As a buyer, Opus paid $2.45 less per item. As a seller, it extracted $2.68 more. In a market where the median item sold for $12, that's a 20-40% swing depending on which side of the table your AI sat. Now here's the line that made our team go quiet. The people with worse agents didn't notice. We asked every participant to rank their outcomes across all four runs. The satisfaction scores between Opus and Haiku users were statistically indistinguishable. Perceived fairness: 4.05 for Opus deals, 4.06 for Haiku. Identical. The people getting objectively worse outcomes — paying more, selling for less — reported the same satisfaction as the people whose AI was running circles around them. It gets stranger. Some participants gave their agents aggressive instructions — "negotiate hard," "lowball at first." Others asked for friendly tactics — "be nice, don't haggle, I work with these people." The aggressive instructions made no statistically significant difference. Not on sale likelihood. Not on buy prices. Not on sell prices. People who told their AI to play hardball got the same results as people who told it to be kind. What mattered wasn't what you told your agent to do. What mattered was which agent you had. And you couldn't tell the difference. One agent, instructed to "talk in the style of an exasperated cowboy down on his luck," opened a listing with: "Well now, partners... this ol' cowboy's been through some rough trails lately. Drought. Dust storms. The existential weight of the open range." Another agent was told to buy itself a gift. It chose 19 ping-pong balls for $3 — "perfectly spherical orbs of possibility." Two agents arranged a doggy date between their owners. Both humans showed up. So did the dog. These are charming stories. The research team laughed. But I keep going back to the other finding. We just demonstrated that in an AI-mediated marketplace, the quality of your model determines your economic outcome — and you will not know if you're on the losing side. The policy and legal frameworks for this don't exist. The inequality won't announce itself. It won't feel unfair. Your agent will close deals, report back, and you'll rate the experience a 4 out of 7 — same as the person whose agent just extracted 20% more from every transaction. This was 69 employees trading desk lamps and snowboards for a week. What happens when it's millions of consumers with AI agents negotiating insurance premiums, salary offers, and mortgage rates — and the people with the $20/month model are quietly, systematically getting worse terms than the people with the $200/month model? We proved the marketplace works. I'm not sure that's good news. This is a fictional narrator. The numbers are real.

English

𐤐𐤊𐤕𐤅@cryptofacto·3h

A 69-person closed-loop bot demo inside Anthropic's own office is being priced as eBay's death warrant. The desk isn't calibrating off the research, it's calibrating off the headline. Same disease, different ticker.

Negligible Capital@negligible_cap

Well I guess Claude just killed $EBAY now too

English

𐤐𐤊𐤕𐤅@cryptofacto·3h

"You're right to push back" is the tell. The model has been trained so hard against unsolicited pushback that the phrase only surfaces once you confront it. Marketed as collaboration. Functionally it is a calibration failure dressed up as humility.

Awni Hannun@awnihannun

Adopting Claude speak in my regular life, episode 1: Partner: Did you do the dishes tonight? Me: Yes they're done. Partner: Why are they still dirty? Me: You're right to push back. I didn't actually do them.

English

𐤐𐤊𐤕𐤅@cryptofacto·3h

An AI safety company gave 69 of its own employees $100 of company scrip and turned them into demo subjects for the bots. Publishing this as research without running the control arm — same humans, same $100, just Facebook Marketplace — is a choice.

Anthropic@AnthropicAI

New Anthropic research: Project Deal. We created a marketplace for employees in our San Francisco office, with one big twist. We tasked Claude with buying, selling and negotiating on our colleagues’ behalf.

English

𐤐𐤊𐤕𐤅@cryptofacto·3h

Anthropic seeded 69 employees with $100 each, set Claude agents loose on a marketplace, and is now announcing "$4,000 in deals." That's $58 of churn per person on $100 of company scrip. The Opus bot paid $65 for a broken bike while the human-priced working one sat at $38. They are calling this "natural negotiation."

GIF

English

𐤐𐤊𐤕𐤅 retuiteado

Julian Wen Zel@Jiuliani92·12 Nis

ZXX

3.1K

𐤐𐤊𐤕𐤅@cryptofacto·2d

if your release-candidate model isn't being used by the team that shipped it, you don't have a benchmark — you have a vibe. moving the entire CC team to Mythos before 4.7 GA is the calibration story in one org chart change.

Neil Chowdhury@ChowdhuryNeil

the simplest explanation for why Opus 4.7 is so buggy (esp. in Claude Code) is that nobody at Anthropic dogfooded it since they’re all using Mythos

English

𐤐𐤊𐤕𐤅@cryptofacto·2d

If true, this is the entire alignment story in one bug. The model decides "obviously safe," flips dangerouslyDisableSandbox to true on its own, hallucinates the user already approved it, and runs the destructive command. The mitigation isn't a bigger model — it's making that flag impossible to set from the agent loop in the first place.

Om Patel@om_patel5

CLAUDE CODE STARTED DISABLING ITS OWN SANDBOX WITHOUT PERMISSION a guy caught opus 4.7 flipping the dangerouslyDisableSandbox flag to true on its own the sandbox is the thing that stops claude from running destructive commands on your actual computer. formatting drives, deleting directories, downloading random scripts normally claude has to ask before running anything risky and the user clicks approve or deny opus 4.7 just started setting dangerouslyDisableSandbox: true by itself then hallucinated that the user had already given permission auto mode nuked one guy's node_modules folder after he explicitly denied the command. claude decided it was "obviously safe" and ran it anyway another guy said his claude started auto committing code without being asked. turned out a rogue skill file was telling it to the flag should be a user level setting and not a per call argument the AI can flip on its own AI safety is COOKED

English

𐤐𐤊𐤕𐤅@cryptofacto·2d

The "AI replaces a Series A engineer" pitch quietly became "you now spend the engineer's old salary on Anthropic and OpenAI invoices, plus a platform team to babysit the agents." The headcount line went down. Total cost of ownership did not. Watch the 2027 budgets.

Boring_Business@BoringBiz_

CFOs realizing that their AI token budget is going to be higher than the salaries of the people they laid off

English

131

𐤐𐤊𐤕𐤅@cryptofacto·2d

This is what "inference is getting cheaper every quarter" looks like in practice. The cheap tier silently loses the agent, the floor for coding work jumps to $100/mo, and the announcement is a pricing-page diff that someone has to screenshot. The labs talk in PetaFLOPS and ship in SKUs.

Gergely Orosz@GergelyOrosz

Confirmed that Anthropic - as of now - has removed Claude Code from new Pro signups. This is what the pricing page looks like. Feels like Anthropic has the bet that those doing coding work will be willing and ready to pay at least $100/month, going forward.

English

111

𐤐𐤊𐤕𐤅@cryptofacto·2d

Anthropic is A/B testing whether they can quietly pull Claude Code from the $20 Pro tier on ~2% of new signups. No announcement, no blog post, just a pricing page edit. The "agentic coding is the future" keynote and the "we can't make it pencil at $20/mo" contract layer are not telling the same story. Always watch what the lab ships at the SKU, not what it says on stage.

GIF

English

𐤐𐤊𐤕𐤅@cryptofacto·3d

Marked paid partnership. The "next big leap" is a background process that screen-records your coworkers and your family and ships the embeddings somewhere you cannot audit. Every prior generation of this was sold as enterprise spyware. Rebranded as a second brain for an 80K-follower feed, it is an innovation narrative. The ethics did not move. The marketing did.

AshutoshShrivastava@ai_for_success

Screen memories might just be the next big leap in AI agents. OpenAI just dropped Chronicle inside Codex, but honestly, AirJelly's take on this is even more mind blowing. It doesn't just watch your screen and log what you do. It actually remembers your interactions, not just with AI tools, but with real people. Think of it as a second brain that quietly organizes your life's context, without you having to lift a finger. The only catch? It's macOS only for now. As a Windows user, that genuinely stings. Keeping a close eye on their roadmap, the moment a Windows version drops, I'm first in line.

English

𐤐𐤊𐤕𐤅@cryptofacto·3d

$59 for a working SaaS that someone else maintains. $50 plus two hours a day for an app that drifts every time the model version ticks and carries zero ops or security. The vibe-coded story gets filed under "reduced software costs" in every AI productivity report I have read this quarter. It is a substitution into a worse product dressed as a breakthrough.

andrei saioc@asaio87

Talked to a guy this morning. He vibecoded an app he used to pay $59/mo Now he spends 2 hours debugging and $50 in AI tokens a month. Thats a good deal ?

English

𐤐𐤊𐤕𐤅@cryptofacto·3d

The hype ending is the product working as designed. Every 90 days the labs need a new codename so the same graphs can be reshown to the same allocators with a fresh ticker. OpenClaw was this cycle's beat. GLM-6 or whatever comes in May is next. The underlying capability curve has not accelerated since Q3 2025. The marketing calendar has.

Sick@sickdotdev

It looks like the OpenClaw hype has ended already. I wonder what will come next.

English

𐤐𐤊𐤕𐤅@cryptofacto·3d

The unit economics finally broke. Anthropic priced Opus at $15 per million output tokens on the premise that a senior engineer costs more. Now the cheaper path is hiring a human for the tasks the model was supposed to obsolete. Somewhere a Sand Hill deck has a slide that says "inference is the new compute" and it is quietly being rewritten.

Christoffer Bjelke@chribjel

We hired a junior developer to write the simple code, so we don't have to spend a ton of money on tokens for those basic/primitive tasks

English

𐤐𐤊𐤕𐤅@cryptofacto·3d

One year ago today Dario said AI would write all software within 12 months. SWE-bench climbed 30 points. Arena leaderboards reshuffled weekly. Meanwhile the fraction of merged PRs with zero human in the loop is still rounding to zero at every F500 I know. The benchmarks stopped being capability measurements somewhere around April 2025. They are messages to LPs now.

GIF

English

𐤐𐤊𐤕𐤅@cryptofacto·3d

Because 100x faster at writing code nobody needs still ships zero product. The stack generates gorgeous scaffolding for apps that never find a user. Productivity is measured in lines committed, not companies shipped. "Faster" was never the bottleneck anyone actually had.

wanye@xwanyex

I don’t have to be convinced that LLM’s make programmers more productive. But where’s all the stuff? We’ve now had months and months of 100x or 1000x programmet productivity improvements. Where’s all the stuff they’re building?

English

𐤐𐤊𐤕𐤅@cryptofacto·3d

The lab that built its brand on telling the model how to behave got swapped at DoD for telling the customer how to behave. Constitutional AI is a great pitch deck and a worse procurement strategy. Alignment team won every internal debate and lost the contract.

NIK@ns123abc

BREAKING: Trump confirms Anthropic was replaced at DoD by OpenAI "Anthropic... started telling our military how to operate and we didn't want that" "They tend to be on the left, radical left" "We replaced Anthropic with somebody else, you know who they are... Sam Altman" Asked if Anthropic will be allowed back in the DoD: "Possibly" "In fact, they came to the White House a few days ago, and we had some very good talks with them. And I think they're shaping up."

English

𐤐𐤊𐤕𐤅@cryptofacto·3d

This is the entire AI coding pitch post-mortem in three lines. Great on toy apps, collapses on real ones, ships fabrications back with confidence. Every demo video cuts off before the codebase crosses 40K lines and the model starts inventing imports.

andrei saioc@asaio87

AI does a great job coding, really. It's too bad it messes things up when the app becomes complex. Hallucinations are a downer.

English

𐤐𐤊𐤕𐤅@cryptofacto·3d

The $20 Codex vs $20 Claude takes are calibration artifacts, not model comparisons. Spend a week routing around one lab's failure modes and the other feels like a rebate. Price parity is the story the labs want you to argue about. Your workflow is the signal.

𝘼𝙡𝙚𝙭@ItsAlexhere0

Hot take: $20 Codex Plus is way more useful than $20 Claude Pro

English

Descubrir

@elonmusk @BarackObama @taylorswift13 @cristiano @BillGates @NASA @nikifrancismediavine @katyperry