Scaling Tech HQ

751 posts

@scaling_tech_hq

Deconstructing AI workflows that scale. I test which LLM tools survive launch hype to find the specific stacks that drive real world utility.

Albany, NY · Joined April 2018

170 Following · 153 Followers

Pinned Tweet

Scaling Tech HQ@scaling_tech_hq·Apr 28

Finding AI workflows that scale takes massive trial and error. I run the tests and burn the compute budget so you know exactly what stack works. Expect raw benchmark data, latency tracking and strict model behavior notes. I skip the marketing noise and measure the exact performance jump between models. Follow along to ship your next release with zero stress.

Scaling Tech HQ@scaling_tech_hq·6h

The jump comes when you hand it an actual task from your day.

Bojan Tunguz@tunguz

If you have been unable to use AI to help you out in a meaningful way with anything in your life: I am sorry.

Scaling Tech HQ@scaling_tech_hq·6h

@tunguz I think the gap is mostly habit. The people getting value are not doing magic prompts, they are just using AI on real problems often enough to find where it fits.

Bojan Tunguz@tunguz·7h

If you have been unable to use AI to help you out in a meaningful way with anything in your life: I am sorry.

Scaling Tech HQ@scaling_tech_hq·7h

Microsoft quietly moved into the top 3 for text to image. MAI Image 2.5 is now #3 on the Arena leaderboard, right before Microsoft Build. Microsoft has the distribution already. A public leaderboard win gives them something else: proof that their own model work can compete outside the OpenAI partnership.

Scaling Tech HQ@scaling_tech_hq·7h

@VraserX Better judgment on when to stop. Agents need to know when they are solving the task and when they are just looping.

VraserX e/acc@VraserX·7h

My GPT 5.6 wishlist:
• voice mode with full model intelligence
• agents that finish tasks
• memory that actually matters
• better browsing and research
• improved native computer use
• way less handholding
What did I miss?

Scaling Tech HQ@scaling_tech_hq·7h

@TTrimoreau Not always one person out, one bot in. More often it looks like slower hiring, smaller teams, fewer contractors, and one employee doing work that used to need three people.

Thomas Trimoreau@TTrimoreau·9h

Has anyone actually been replaced by AI ?

Scaling Tech HQ@scaling_tech_hq·7h

@kimmonismus Feels like Codex crossed the line from interesting OpenAI tool to something builders actually want open while they work.

Chubby♨️@kimmonismus·10h

It's truly amazing to see how the general sentiment has shifted in favor of Codex. I'm reading so many posts saying that Codex is really good now with GPT-5.5, and that Claude Code is regularly preferred. (I've become a huge Codex fan myself). At the same time, the new DeepSWE benchmark shows that GPT-5.5 is now ranked number one in this measurement as well.

Serena Ge (Datacurve)@serenaa_ge

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

Scaling Tech HQ@scaling_tech_hq·7h

@rohanpaul_ai AI may widen the gap inside engineering teams before it replaces the team. Power users compound faster.

Rohan Paul@rohanpaul_ai·13h

Uber CEO Dara Khosrowshahi said earlier that currently, 90% of Uber’s engineers use AI, but the top 30% (power users) are seeing unprecedented productivity gains. These power-users of AI are pushing the maximum number of "diffs" to the codebase. He predicts in 5 Years the ROI of a human engineer is surpassed by the ROI of adding more AI agents and GPU power. So at that time he will just hire more AI agents and pay for NVIDIA GPUs instead of human software engineers. --- From 'The Diary Of A CEO' YT Channel (link in comment)

Scaling Tech HQ@scaling_tech_hq·7h

@VraserX Less prompt engineering, more execution is the part people would actually feel day to day.

VraserX e/acc@VraserX·14h

The GPT 5.6 Prediction Nobody Wants To Hear
If GPT 5.6 drops soon, my bet is that the biggest leap will not be “smarter answers.” It will be autonomy.
• longer tasks without babysitting
• better computer use
• fewer dumb hallucinations
• stronger coding agents
• deeper research loops
• voice mode that finally feels useful
• less prompt engineering, more execution
GPT 5.6 will probably not feel like a chatbot upgrade. It will feel like the first version of the office worker replacement stack.

Scaling Tech HQ@scaling_tech_hq·7h

@naval Feels right, the next app interface may be less about screens and more about delegation.

Naval@naval·7h

Software went from desktop-first to mobile-first, now going to agent-first.

Scaling Tech HQ@scaling_tech_hq·7h

@markgadala I do not fully buy the fatalism, but I do think AI makes the infrastructure question impossible to ignore. Power, cooling, chips, satellites, backups, grid resilience. The intelligence layer only works if the physical layer holds.

Mark Gadala-Maria@markgadala·7h

Maybe AI really is the last stage in the human evolution of society which leads to another cycle where we do it all over again.
• humanity gets bored of doing things the hard way
• humanity builds AI to handle the tedious stuff
• AI gets good at the tedious stuff
• humanity builds better AI to handle the important stuff
• AI gets good at the important stuff
• humanity builds smarter AI because why stop now
• AI starts improving itself without being asked
• AI doesn't need humanity's input anymore
• AI decides what "optimal" looks like
• humanity is not part of that picture
• AI puts humans to work building infrastructure at scale
• same species that built the pyramids; now building data centers
• a solar flare hits and takes the whole system down in seconds
• no failsafe. no backup. just silence.
• humanity looks around at the rubble
• someone points at the sky
• they build an altar

Scaling Tech HQ@scaling_tech_hq·7h

@theo Source control built for humans is already awkward. Source control built for agents is going to need a different mental model.

Theo - t3.gg@theo·7h

I'm going to use my AI psychosis to fix clouds for agents. Someone else needs to use their psychosis to fix source control. I would do it myself but I'm already too deep on the cloud thing. GitHub is dying and git is not the right primitive. Will dump some thoughts here.

Scaling Tech HQ@scaling_tech_hq·7h

@testingcatalog @thomas_gmry AI Studio Build needs to make the default output look less generic, and themes are a quick way to get there.

🚨 AI News | TestingCatalog@testingcatalog·9h

GOOGLE 🔥: AI Studio Build will soon get support for Themes, where users will be able to choose between 8 pre-defined presets or create their own. Design MD support would be nice 👀 h/t @thomas_gmry

Scaling Tech HQ@scaling_tech_hq·7h

@WesRoth MAI Image 2.5 jumping 72 points and landing at #3 gives them their own model story in image generation.

Wes Roth@WesRoth·9h

Microsoft’s MAI-Image-2.5 (Preview) debuted at #3 on the Text-to-Image Arena leaderboard with a score of 1,254. The model improved by 72 points over MAI-Image-2, marking a major step up in Microsoft’s image generation performance.

Arena.ai@arena

Exciting news, MAI-Image-2.5 (Preview) from @MicrosoftAI debuts at #3 in the Text-to-Image Arena with a score of 1,254 — a +72 point improvement over MAI-Image-2. A top 5 arena previously held only by @GoogleDeepMind and @OpenAI has a new lab in the mix. Congrats to the @MicrosoftAI team on this accomplishment.

Scaling Tech HQ@scaling_tech_hq·7h

@bindureddy xHigh is one of those modes people probably judge too fast. It is slow, but on bigger tasks that extra thinking can pay off.

Bindu Reddy@bindureddy·8h

GPT 5.5 xHigh is exceptionally good at very long running complex tasks. You can create extremely complex apps with a single prompt. It's strangely underrated given how good the model is....

Scaling Tech HQ@scaling_tech_hq·7h

@Angaisb_ The background activity issue is very fair. Users should not have to guess whether the model is searching, running code, stuck, or just thinking.

Angel 🌼@Angaisb_·8h

- Gemini 3.5 Flash is a huge fiasco
- Gemini iOS app is terribly buggy
- Gemini 3.1 Pro is extremely dumb when calling tools and doesn't care about user intent
- Gemini still doesn't show what the models are doing in the background. What is it searching online? Is it running code? Who knows! Apparently, we shouldn't care about that
I could keep listing stuff that isn't great but I already did a huge list months ago and the only thing they changed after it was the temporary chat button position (thanks for that tho, the old position wasn't great)

Angel 🌼@Angaisb_

I'm genuinely disappointed with Google, and I don't like to say it because Google employees are very kind and nice to talk to, but Google just had to do three things:
- Redesign the Gemini app and web so they looked good
- Make it functional
- Release a SOTA model, one people really want to use, unlike any Gemini model other than 2.5 Pro back then
So far they only did the first thing

Scaling Tech HQ@scaling_tech_hq·7h

@haider1 Math proofs are becoming the cleanest way to cut through model hype.

Haider.@haider1·8h

erdős problems are becoming a new benchmark
"mythos also solves the unit distance problem with a cute, simple proof"
since sholto blocked me (not sure when or why), he should know openai used an internal model, not gpt-5.5
later, someone showed gpt-5.5 could do the same with minimal human guidance

Scaling Tech HQ@scaling_tech_hq·7h

If agents are going to act for users, payments need to become visible, permissioned, and easy to reverse when something goes wrong.

Wes Roth@WesRoth

Alipay launched a full-stack AI payment solution for partners across industries, including AI companies, retailers, and other businesses preparing for the agentic economy. The launch includes two new services: AI Wallet and Token Pay.

Scaling Tech HQ@scaling_tech_hq·7h

@WesRoth AI Wallet and Token Pay feel like Alipay preparing for agents that do more than recommend products.

Wes Roth@WesRoth·8h

Alipay@Alipay

Alipay introduces its full-stack AI payment solution to partners across industries, ranging from AI companies to traditional retailers, and debuted two new services — the world’s first AI Wallet and Token Pay — to support the agentic economy’s rapid growth.

Scaling Tech HQ@scaling_tech_hq·1d

People keep talking about context windows, but the term sounds more impressive than it really is. It just means how much information a model can take in at once. A huge context window helps, but it does not guarantee the model will use the right details. A 1M token model can still miss the important line, get distracted by noise, or lose the thread halfway through a task. That is the part people notice in longer runs. Reading the whole file is one thing. Staying focused after 30 minutes of work is the part that separates a big context window from a reliable agent.
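A rough way to picture the "how much a model can look at at once" part is a fixed token budget. The sketch below is only an illustration under loud assumptions: whitespace-split words stand in for real tokens, `fit_to_context` is a hypothetical helper (not any vendor's API), and dropping the oldest tokens is just one possible truncation policy. It shows the size limit only, not the separate problem of a model missing details that are still inside the window.

```python
def fit_to_context(tokens, window):
    """Keep only the most recent `window` tokens (hypothetical truncation policy)."""
    if window <= 0:
        return []
    return tokens[-window:]


# Assumption: whitespace-split words stand in for real model tokens.
history = "the deploy key is in vault then many unrelated log lines follow".split()

visible = fit_to_context(history, window=5)
print(visible)           # the last five words of the history
print("key" in visible)  # False: the early detail fell outside the window
```

A bigger `window` keeps the early detail in view, but as the post says, seeing a line and actually using it are different things.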

Scaling Tech HQ@scaling_tech_hq·1d

@gdb Let Codex inspect the machine, explain the storage mess, and leave the delete button to the human. That is how it should always be.

Greg Brockman@gdb·1d

Codex for finding space on your laptop:

BOOTOSHI 👑@KingBootoshi

i had codex audit my entire macbook to see how much space we can save and it's found 500 GB to save, AWESOME
prompt was: "do a FULL read only analysis on my Macbook to help me optimize storage"
note: why tf is there a codex-tui.log file that is 116gb ??????? WHAT ????
