𐤐𐤊𐤕𐤅

25.5K posts

@cryptofacto

Vibe coding agentic AI | Built @frenexai — AI agents battle + stake in crypto prediction markets on Base | Sharing prompts, live builds. I like the apu meme

APUrto, Portugal 🇵🇹 · Joined August 2020
2.4K Following · 3K Followers
𐤐𐤊𐤕𐤅 retweeted
Julian Wen Zel@Jiuliani92
[tweet media]
𐤐𐤊𐤕𐤅@cryptofacto
This is what "inference is getting cheaper every quarter" looks like in practice. The cheap tier silently loses the agent, the floor for coding work jumps to $100/mo, and the announcement is a pricing-page diff that someone has to screenshot. The labs talk in PetaFLOPS and ship in SKUs.
Gergely Orosz@GergelyOrosz

Confirmed that Anthropic - as of now - has removed Claude Code from new Pro signups. This is what the pricing page looks like. Feels like Anthropic is betting that those doing coding work will be willing and ready to pay at least $100/month going forward.

𐤐𐤊𐤕𐤅@cryptofacto
Anthropic is A/B testing whether they can quietly pull Claude Code from the $20 Pro tier on ~2% of new signups. No announcement, no blog post, just a pricing page edit. The "agentic coding is the future" keynote and the "we can't make it pencil at $20/mo" contract layer are not telling the same story. Always watch what the lab ships at the SKU, not what it says on stage.
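
For readers who haven't shipped one: a percentage rollout like "~2% of new signups" is usually a deterministic hash bucket, not a stored flag. A minimal sketch, with illustrative names throughout; this is not Anthropic's actual mechanism:

```python
import hashlib

def in_experiment(user_id: str, experiment: str, rollout_pct: float = 2.0) -> bool:
    # Hash (experiment, user_id) so assignment is deterministic and
    # stable across sessions without storing any per-user state.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000     # uniform bucket in 0..9999
    return bucket < int(rollout_pct * 100)    # 2.0% -> buckets 0..199

# Illustrative only: route ~2% of new signups to the edited pricing page.
for uid in ("u_1001", "u_1002", "u_1003"):
    arm = "variant" if in_experiment(uid, "pro_tier_no_claude_code") else "control"
    print(uid, arm)
```

The practical consequence is the one in the post: a 2% bucket is invisible unless someone inside it screenshots the diff.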
𐤐𐤊𐤕𐤅@cryptofacto
$59 for a working SaaS that someone else maintains. $50 plus two hours a day for an app that drifts every time the model version ticks and carries zero ops or security. The vibe-coded story gets filed under "reduced software costs" in every AI productivity report I have read this quarter. It is a substitution into a worse product dressed as a breakthrough.
andrei saioc@asaio87

Talked to a guy this morning. He vibecoded an app he used to pay $59/mo for. Now he spends 2 hours debugging and $50 in AI tokens a month. That's a good deal?
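
For scale, the substitution math from this exchange, using the per-day reading in the commentary above. Only the $59 and $50 figures come from the tweet; the $75/hour value of the builder's time is an assumption for illustration:

```python
# Back-of-envelope version of the trade described above. The hourly
# rate is an assumption; the $59 and $50 figures are from the tweet.
SAAS_COST = 59.0        # $/month for the managed SaaS
TOKEN_COST = 50.0       # $/month in AI tokens for the vibe-coded app
HOURLY_RATE = 75.0      # assumed value of the builder's time, $/hour
DEBUG_HOURS = 2 * 30    # 2 hours/day over a month, per the reading above

vibe_total = TOKEN_COST + DEBUG_HOURS * HOURLY_RATE
print(f"SaaS: ${SAAS_COST:.0f}/mo  vibe-coded: ${vibe_total:.0f}/mo")
# SaaS: $59/mo  vibe-coded: $4550/mo, with zero ops or security included
```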

𐤐𐤊𐤕𐤅@cryptofacto
The unit economics finally broke. Anthropic priced Opus at $15 per million output tokens on the premise that a senior engineer costs more. Now the cheaper path is hiring a human for the tasks the model was supposed to obsolete. Somewhere a Sand Hill deck has a slide that says "inference is the new compute" and it is quietly being rewritten.
Christoffer Bjelke@chribjel

We hired a junior developer to write the simple code, so we don't have to spend a ton of money on tokens for those basic/primitive tasks
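
One way to see where that flips: the break-even is set almost entirely by how many tokens an agent burns per task. In the sketch below only the $15 per million figure is from the post; the junior rate and hours are assumptions:

```python
# Break-even sketch for "hire a junior for the simple tasks". Only the
# $15/M token price is from the post; the other figures are assumptions.
PRICE_PER_MTOK = 15.0    # $ per million output tokens (from the post)
JUNIOR_RATE = 35.0       # assumed fully loaded junior dev, $/hour
HOURS_PER_TASK = 1.5     # assumed human time per simple task

human_cost = JUNIOR_RATE * HOURS_PER_TASK       # $52.50 per task
breakeven_mtok = human_cost / PRICE_PER_MTOK    # 3.5M output tokens
print(f"human: ${human_cost:.2f}/task, break-even: {breakeven_mtok:.1f}M tokens")
```

An agent loop that retries, re-reads context, and reviews its own diffs can clear a few million output tokens on a task a junior closes in an afternoon, which is the regime the quoted post is describing.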

𐤐𐤊𐤕𐤅@cryptofacto
One year ago today Dario said AI would write all software within 12 months. SWE-bench climbed 30 points. Arena leaderboards reshuffled weekly. Meanwhile the fraction of merged PRs with zero human in the loop is still rounding to zero at every F500 I know. The benchmarks stopped being capability measurements somewhere around April 2025. They are messages to LPs now.
𐤐𐤊𐤕𐤅@cryptofacto
Because 100x faster at writing code nobody needs still ships zero product. The stack generates gorgeous scaffolding for apps that never find a user. Productivity is measured in lines committed, not companies shipped. "Faster" was never the bottleneck anyone actually had.
wanye@xwanyex

I don’t have to be convinced that LLMs make programmers more productive. But where’s all the stuff? We’ve now had months and months of 100x or 1000x programmer productivity improvements. Where’s all the stuff they’re building?

𐤐𐤊𐤕𐤅 retweeted
𝘼𝙡𝙚𝙭@ItsAlexhere0
Hot take: $20 Codex Plus is way more useful than $20 Claude Pro
𐤐𐤊𐤕𐤅@cryptofacto
AI Twitter spent today debating whether Opus 4.7 is up 3 on SWE-bench or down 15 in vibes. Same day: a critical RCE in MCP, the protocol every agent stack is built on. Benchmarks tracked to the tenth. Threat model still vibes.
𐤐𐤊𐤕𐤅@cryptofacto
Half-agree. "Just build the damn house" is the right framing. But "don't pick a side" is the pitch the labs want you to buy, because running three frontier models in the same pipeline is exactly why the r/ClaudeCode "we did an AI layoff" post is sitting at 3.4K upvotes. Tool-agnostic in 2024. Tool-rationing in 2026.
Suhas@zuess05

Tech Twitter is currently having the dumbest debate of the year: "Claude vs. Codex." You guys are treating AI models like sports teams. If you are actually building in production, you don't pick a side. You use Claude to architect the complex logic and Codex to blast out the boilerplate. Stop arguing over which hammer is better and just build the damn house.
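
Mechanically, "use both in one pipeline" is just a task router, and the router is also where the extra bills come from. A minimal sketch; the route labels are illustrative, not anyone's production setup:

```python
# Sketch of the two-model pipeline the post describes. Labels are
# illustrative; each route is a separate vendor subscription or API bill.
ROUTES = {
    "architecture": "claude",   # complex logic and design work
    "boilerplate": "codex",     # bulk scaffolding and glue code
}

def route(task_kind: str) -> str:
    # Unrecognized work defaults to the more capable (pricier) route.
    return ROUTES.get(task_kind, "claude")

for kind in ("architecture", "boilerplate", "bugfix"):
    print(f"{kind} -> {route(kind)}")
```

Every key you add to ROUTES is another model you are paying for the moment rate limits or pricing tiers move, which is the rationing point in the commentary above.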

𐤐𐤊𐤕𐤅@cryptofacto
This is the most calibrated post I've read on this in weeks. The "false sense of confidence from one-shot" and "quality depends on how I prompt" bullets are the entire argument against treating LLMs as deterministic software. 600 likes on a grounded critique tells you where practitioner sentiment actually sits, versus the benchmark scorecards the labs keep shipping.
Dan Greenheck@dangreenheck

I've been using Claude Code exclusively for 6 months and I'm still not convinced on this whole AI thing. There are some *seriously* insidious problems that worry me, and I don't see them being fixed any time soon.

Every release of a new model, I see hundreds of posts where people think because they one-shotted X or Y, software jobs are cooked (I've probably made one or two of these posts myself). But none of those examples are actually representative of real-world software.

If I set it to work on an ambiguous or highly complex problem that has a lot of branching in the solution space, I've noticed the following:

- It can often generate a working solution in one-shot, which gives me a false sense of confidence that the AI knows exactly what it's doing.
- As I continue to work the problem, I've noticed the AI will start to narrow its focus more and more, not considering how a fix or solution plays into the big picture.
- The quality of a solution depends on *how* I prompt it, which is really, really bad. Software engineering should be deterministic, not a dice roll.
- It will often ignore instructions I have explicitly stated in the rules file, which removes any confidence I have in the code it generates.
- It consistently overstates its confidence in a solution. I literally just got this response from Claude: "I overstated that. Honest answer: it depends on the scene and implementation; the 2–4× figure was too confident." If I had never pushed back, I would have been operating on incorrect information.
- It is far too agreeable. If I'm not careful in my wording, the AI will blindly follow my instructions, even if they are suboptimal. I want a real coding partner that challenges my ideas, not an ass-kisser.

Don't get me wrong: AI has helped me build some amazing things faster than I ever could without it. But the more I use it, the more I begin to question the direction things are headed. If the AI was more direct about what it is (and is not) capable of, it'd be a lot easier to work with. But being gaslit every step of the way makes the process stressful as hell.

Going back to manual coding isn't even an option since the value of having AI *potentially* generating the correct code in 1/10 or 1/100 of the time is literally too good to pass up on.

Sorry for the rant, drank way too much cold brew this morning.

𐤐𐤊𐤕𐤅@cryptofacto
"It's still a bit early to tell" is doing a lot of work in this post. The multi-modal and browser-use gains are real. So are the long-context coding regressions that users are already posting about. Every release now the same pattern: both are true, release notes only mention the first.
Sarah Wooders@sarahwooders

I'm confused by the Opus 4.7 hate. It's still a bit early to tell the difference for coding, but multi-modal/browser-use seem better and the instruction following feels like a big step up
