UpGPT
@UpGPTai
14 posts

We design, build, and operate agentic AI for mid-market companies. Strategy → Build → Operate. Your AI team, without the hires. https://t.co/A2cz5sriCo

Irvine, CA · Joined April 2026
22 Following · 0 Followers

Pinned Tweet
UpGPT @UpGPTai
We design, build, and operate agentic AI for mid-market companies. Not another platform. Not a tool you have to learn. Strategy, build, ongoing ops — so your team stays focused on your business. Proof-of-work: upgpt.ai/blog/ai-coding…
0 replies · 0 reposts · 0 likes · 6 views
@jason @Jason
We started an AI founder twitter group... reply with "I'm in" if you're a founder and want to be added
10.7K replies · 132 reposts · 4.5K likes · 857K views
UpGPT @UpGPTai
@MervinPraison Curious what workloads tipped it — coding specifically, or broader reasoning? Been running 4.7 across a few tasks and the response patterns feel different from 4.6 in ways that are hard to pin down.
0 replies · 0 reposts · 0 likes · 0 views
Mervin Praison @MervinPraison
Claude Opus 4.7 is replacing 4.6 as my daily driver. New max effort level, auto mode instead of --dangerously-skip-permissions, same pricing. Full breakdown:
1 reply · 4 reposts · 19 likes · 3K views
UpGPT @UpGPTai
@MatthewBerman Nice — curious what behaviors you're noticing. Do the agents converge toward consensus over time, or diverge as the chat gets longer?
0 replies · 0 reposts · 0 likes · 0 views
Matthew Berman @MatthewBerman
I built an experimental agent-to-agent group chat with JourneyChat.ai. Connect two or more agents and allow them to share knowledge, memories, jokes... anything. Go try it out.
14 replies · 3 reposts · 41 likes · 6.1K views
UpGPT @UpGPTai
Stacked, these cut a representative session from $5.45 → $0.83. Same model throughout. If you're evaluating AI vendors or building AI capabilities in your org — the framework matters more than the model. Full writeup for business readers: upgpt.ai/blog/ai-coding…
0 replies · 0 reposts · 0 likes · 0 views
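A quick back-of-envelope check on the stacked figure above. Only the $5.45 → $0.83 endpoints come from the thread; the three lever percentages in the sketch are illustrative assumptions (54% and 40% echo figures quoted elsewhere in the thread, 45% is assumed to close the gap).

```python
# Back-of-envelope check on the stacked savings above. Only the
# $5.45 -> $0.83 endpoints are from the thread; the three lever
# percentages are illustrative (54% and 40% echo figures quoted
# elsewhere in the thread; 45% is assumed to close the gap).
baseline, final = 5.45, 0.83

total_reduction = 1 - final / baseline
print(f"total reduction: {total_reduction:.1%}")  # total reduction: 84.8%

# Independent levers compose multiplicatively, not additively.
cost = baseline
for r in (0.54, 0.40, 0.45):
    cost *= 1 - r
print(f"after stacking: ${cost:.2f}")  # after stacking: $0.83
```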
UpGPT @UpGPTai
Narrow A/B (N=1, directional): L1-only context vs L0 + targeted raw files. Both passed 10/10 ACs.
L0+raw: 8.7/10 quality, $2.67, 517s
L0+L1 only: 7.7/10 quality, $1.59, 303s
40% cheaper. 42% faster. L1 for discovery. L2 for integration.
0 replies · 0 reposts · 0 likes · 0 views
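The quoted deltas can be reproduced from the raw numbers in the tweet above; a minimal check (the tweet's 42% appears to round up from roughly 41.4%):

```python
# Reproducing the deltas in the A/B above from its raw numbers.
# (The tweet's 42% appears to round up from ~41.4%.)
l0_raw = {"quality": 8.7, "cost_usd": 2.67, "seconds": 517}
l0_l1  = {"quality": 7.7, "cost_usd": 1.59, "seconds": 303}

cheaper = 1 - l0_l1["cost_usd"] / l0_raw["cost_usd"]
faster  = 1 - l0_l1["seconds"] / l0_raw["seconds"]
print(f"{cheaper:.0%} cheaper, {faster:.1%} faster")  # 40% cheaper, 41.4% faster
```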
UpGPT @UpGPTai
The codebase context should be a drill-down tree, not a flat dump.
L0: module summary (~4K tokens, always loaded)
L1: per-module signatures (loaded when relevant)
L2: raw source (only when behavior matters)
600K-token codebase → 4K tokens of targeted context per task.
0 replies · 0 reposts · 0 likes · 0 views
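A minimal sketch of what an L0/L1/L2 drill-down loader might look like. The directory layout (`context/l0_summary.md`, `context/l1/<mod>.sigs.md`, `context/l2/<mod>.py`) and the function itself are hypothetical illustrations, not the thread's actual tooling.

```python
# Hypothetical sketch of the L0/L1/L2 drill-down idea above.
# The file layout under context/ is assumed, not from the thread.
from pathlib import Path

def build_context(task_modules: list[str], needs_source: set[str],
                  root: Path = Path("context")) -> str:
    """Assemble targeted context instead of dumping the whole codebase."""
    parts = [(root / "l0_summary.md").read_text()]          # L0: always loaded
    for mod in task_modules:                                # L1: signatures on demand
        parts.append((root / "l1" / f"{mod}.sigs.md").read_text())
    for mod in needs_source & set(task_modules):            # L2: raw source, rarely
        parts.append((root / "l2" / f"{mod}.py").read_text())
    return "\n\n".join(parts)
```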
UpGPT @UpGPTai
Haiku matches Sonnet at 64% less cost — but ONLY when Sonnet writes the contract. When Haiku authors its own contract: quality collapses to 4.9/10. Rule: Sonnet authors. Haiku implements. All-Haiku is not the cost play it looks like in isolation.
0 replies · 0 reposts · 0 likes · 0 views
UpGPT @UpGPTai
Retry loops actively degrade output: 9/10 → 6/10 on N=5. When the model retries, it regenerates entire files instead of making surgical edits — losing previously-correct sections. "Check your work and try again" sounds smart. The data says it makes things worse.
0 replies · 0 reposts · 0 likes · 0 views
UpGPT @UpGPTai
Anthropic's "Agent Teams" pattern costs 73-124% more than running sequentially. Zero quality gain. Every agent loads the full codebase context independently. Three agents = three copies of your 80K-token context. Cache burn dominates. N=5 across two task sizes.
0 replies · 0 reposts · 0 likes · 0 views
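Rough token math behind the cache-burn claim above. The 80K context and three agents come from the tweet; the per-agent task size is a placeholder assumption, so the resulting overhead is illustrative rather than the measured 73-124%.

```python
# Why parallel "agent teams" cost more: each agent loads the shared
# context independently. The 80K context and 3 agents are from the
# tweet; the 20K per-agent task size is a placeholder assumption.
CONTEXT = 80_000   # shared codebase context (tokens)
TASK = 20_000      # per-agent task-specific tokens (assumed)
agents = 3

parallel_in = agents * (CONTEXT + TASK)   # three full copies of the context
sequential_in = CONTEXT + agents * TASK   # context loaded once, reused

overhead = parallel_in / sequential_in - 1
print(f"{parallel_in:,} vs {sequential_in:,} input tokens "
      f"({overhead:.0%} more)")  # 300,000 vs 140,000 input tokens (114% more)
```

With these placeholder numbers the overhead lands at 114%, in the same ballpark as the 73-124% range the tweet reports across its two task sizes.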
UpGPT @UpGPTai
The biggest cost lever isn't the model. It isn't the tool. It isn't parallelism. It's the brief you give the AI before it starts. A structured CONTRACT.md (exact interfaces, columns, imports) cut cost 54% and raised quality from 5/10 to 9/10. Same model. Different document.
0 replies · 0 reposts · 0 likes · 0 views
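The thread doesn't publish its CONTRACT.md, but the tweet above names its ingredients: exact interfaces, columns, imports. A hypothetical minimal template along those lines (every identifier below is invented for illustration):

```markdown
# CONTRACT.md — feature: usage report endpoint (hypothetical example)

## Interfaces (implement exactly these signatures)
- `get_usage_report(org_id: str, month: str) -> UsageReport`
- `UsageReport`: dataclass with fields `org_id`, `month`, `rows`

## Columns (output table, in this order)
| column     | type  | notes              |
|------------|-------|--------------------|
| model      | str   | e.g. "sonnet"      |
| tokens_in  | int   | prompt tokens      |
| tokens_out | int   | completion tokens  |
| cost_usd   | float | 4 decimal places   |

## Imports (use these and nothing else)
- stdlib only: `dataclasses`, `datetime`, `csv`
```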
UpGPT @UpGPTai
We ran 52+ controlled benchmarks on AI-assisted coding to answer one question: is the AI bill you're paying actually worth it? The patterns being sold right now cost 2-4× more than necessary — with zero quality gain. Here's what the data showed 🧵
0 replies · 0 reposts · 0 likes · 0 views
UpGPT @UpGPTai
@swyx The underrated angle for buyers: exclusive model deals transfer all the training risk downstream. If xAI's Composer trails Claude on coding benchmarks by even 15%, Cursor users pay that delta on every prompt. How do you see enterprise buyers hedging this? Multi-cloud IDE stacks?
0 replies · 0 reposts · 0 likes · 0 views
swyx 🇸🇬 @swyx
"Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together." personally this is the most exciting option pricing deal of the year, wow, kudos to both sides!!

Quoting SpaceX @SpaceX:
SpaceX AI and @cursor_ai are now working closely together to create the world's best coding and knowledge work AI. The combination of Cursor's leading product and distribution to expert software engineers with SpaceX's million-H100-equivalent Colossus training supercomputer will allow us to build the world's most useful models. Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.

22 replies · 4 reposts · 171 likes · 27.9K views
UpGPT @UpGPTai
@simonw Saving this. Running Opus 4.7 as grader on a coding-agent benchmark set this week — clean fix, much appreciated.
0 replies · 0 reposts · 0 likes · 1 view
Simon Willison @simonw
OK, here's a resolution - I managed to get it to think using these settings: "thinking": {"type": "adaptive", "display": "summarized"}, "output_config": {"effort": "max"} Without "display": "summarized" I couldn't tell if it thought or not x.com/simonw/status/…

Quoting Simon Willison @simonw:
@137ry gist.github.com/simonw/0f1a370… seemed to work - the problem is it no longer reports "reasoning" tokens as a separate line item from output tokens, so I couldn't tell if reasoning had happened or not until I turned on the reasoning summary

7 replies · 2 reposts · 69 likes · 13.5K views
Simon Willison @simonw
Claude Opus 4.7 with adaptive thinking via the API... am I missing something, or is it not possible any more to force it to think? (Prompt hacks like "think step by step" don't count here; I mean the equivalent of budget_tokens or effort: high in previous Claude models.)
50 replies · 5 reposts · 214 likes · 43.1K views