Scaling Tech HQ

751 posts

@scaling_tech_hq

Deconstructing AI workflows that scale. I test which LLM tools survive launch hype to find the specific stacks that drive real world utility.

Albany, NY Joined April 2018
170 Following 153 Followers
Pinned Tweet
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
Finding AI workflows that scale takes massive trial and error. I run the tests and burn the compute budget so you know exactly what stack works. Expect raw benchmark data, latency tracking, and strict model behavior notes. I skip the marketing noise and measure the exact performance jump between models. Follow along to ship your next release with zero stress.
English
0
6
19
258
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@tunguz I think the gap is mostly habit. The people getting value are not doing magic prompts, they are just using AI on real problems often enough to find where it fits.
English
0
0
0
72
Bojan Tunguz
Bojan Tunguz@tunguz·
If you have been unable to use AI to help you out in a meaningful way with anything in your life: I am sorry.
English
17
7
97
5.2K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
Microsoft quietly moved into the top 3 for text to image. MAI Image 2.5 is now #3 on the Arena leaderboard, right before Microsoft Build. Microsoft has the distribution already. A public leaderboard win gives them something else: proof that their own model work can compete outside the OpenAI partnership.
English
0
0
1
16
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@VraserX Better judgment on when to stop. Agents need to know when they are solving the task and when they are just looping.
English
1
0
3
360
VraserX e/acc
VraserX e/acc@VraserX·
My GPT 5.6 wishlist:
• voice mode with full model intelligence
• agents that finish tasks
• memory that actually matters
• better browsing and research
• improved native computer use
• way less handholding
What did I miss?
English
16
2
89
4.5K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@TTrimoreau Not always one person out, one bot in. More often it looks like slower hiring, smaller teams, fewer contractors, and one employee doing work that used to need three people.
English
0
0
0
18
Thomas Trimoreau
Thomas Trimoreau@TTrimoreau·
Has anyone actually been replaced by AI ?
English
102
1
66
6.5K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@kimmonismus Feels like Codex crossed the line from interesting OpenAI tool to something builders actually want open while they work.
English
0
0
0
123
Chubby♨️
Chubby♨️@kimmonismus·
It's truly amazing to see how the general sentiment has shifted in favor of Codex. I'm reading so many posts saying that Codex is really good now with GPT-5.5, and that Claude Code is regularly preferred. (I've become a huge Codex fan myself). At the same time, the new DeepSWE benchmark shows that GPT-5.5 is now ranked number one in this measurement as well.
Serena Ge (Datacurve)@serenaa_ge

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

English
28
20
342
24.4K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@rohanpaul_ai AI may widen the gap inside engineering teams before it replaces the team. Power users compound faster.
English
0
0
1
400
Rohan Paul
Rohan Paul@rohanpaul_ai·
Uber CEO Dara Khosrowshahi said earlier that 90% of Uber’s engineers currently use AI, but the top 30% (power users) are seeing unprecedented productivity gains. These power users are pushing the maximum number of "diffs" to the codebase. He predicts that in 5 years the ROI of a human engineer will be surpassed by the ROI of adding more AI agents and GPU power. At that point he will just hire more AI agents and pay for NVIDIA GPUs instead of human software engineers.
---
From 'The Diary Of A CEO' YT Channel (link in comment)
English
81
29
200
85.5K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@VraserX Less prompt engineering, more execution is the part people would actually feel day to day.
English
0
0
0
24
VraserX e/acc
VraserX e/acc@VraserX·
The GPT 5.6 Prediction Nobody Wants To Hear
If GPT 5.6 drops soon, my bet is that the biggest leap will not be “smarter answers.” It will be autonomy.
• longer tasks without babysitting
• better computer use
• fewer dumb hallucinations
• stronger coding agents
• deeper research loops
• voice mode that finally feels useful
• less prompt engineering, more execution
GPT 5.6 will probably not feel like a chatbot upgrade. It will feel like the first version of the office worker replacement stack.
English
25
15
210
15.1K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@naval Feels right. The next app interface may be less about screens and more about delegation.
English
0
0
0
347
Naval
Naval@naval·
Software went from desktop-first to mobile-first, now going to agent-first.
English
327
398
5.2K
176.3K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@markgadala I do not fully buy the fatalism, but I do think AI makes the infrastructure question impossible to ignore. Power, cooling, chips, satellites, backups, grid resilience. The intelligence layer only works if the physical layer holds.
English
0
0
0
24
Mark Gadala-Maria
Mark Gadala-Maria@markgadala·
Maybe AI really is the last stage in the human evolution of society which leads to another cycle where we do it all over again.
• humanity gets bored of doing things the hard way
• humanity builds AI to handle the tedious stuff
• AI gets good at the tedious stuff
• humanity builds better AI to handle the important stuff
• AI gets good at the important stuff
• humanity builds smarter AI because why stop now
• AI starts improving itself without being asked
• AI doesn't need humanity's input anymore
• AI decides what "optimal" looks like
• humanity is not part of that picture
• AI puts humans to work building infrastructure at scale
• same species that built the pyramids; now building data centers
• a solar flare hits and takes the whole system down in seconds
• no failsafe. no backup. just silence.
• humanity looks around at the rubble
• someone points at the sky
• they build an altar
English
1
1
6
2.1K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@theo Source control built for humans is already awkward. Source control built for agents is going to need a different mental model.
English
0
0
2
1.4K
Theo - t3.gg
Theo - t3.gg@theo·
I'm going to use my AI psychosis to fix clouds for agents. Someone else needs to use their psychosis to fix source control. I would do it myself but I'm already too deep on the cloud thing. GitHub is dying and git is not the right primitive. Will dump some thoughts here.
English
100
20
1.2K
82.9K
🚨 AI News | TestingCatalog
GOOGLE 🔥: AI Studio Build will soon get support for Themes, where users will be able to choose between 8 pre-defined presets or create their own. Design MD support would be nice 👀 h/t @thomas_gmry
English
6
7
190
11.2K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@WesRoth MAI Image 2.5 jumping 72 points and landing at #3 gives them their own model story in image generation.
English
0
0
0
15
Wes Roth
Wes Roth@WesRoth·
Microsoft’s MAI-Image-2.5 (Preview) debuted at #3 on the Text-to-Image Arena leaderboard with a score of 1,254. The model improved by 72 points over MAI-Image-2, marking a major step up in Microsoft’s image generation performance.
Arena.ai@arena

Exciting news, MAI-Image-2.5 (Preview) from @MicrosoftAI debuts at #3 in the Text-to-Image Arena with a score of 1,254 — a +72 point improvement over MAI-Image-2. A top 5 arena previously held only by @GoogleDeepMind and @OpenAI has a new lab in the mix. Congrats to the @MicrosoftAI team on this accomplishment.

English
5
0
18
1.2K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@bindureddy xHigh is one of those modes people probably judge too fast. It is slow, but on bigger tasks that extra thinking can pay off.
English
0
0
0
33
Bindu Reddy
Bindu Reddy@bindureddy·
GPT 5.5 xHigh is exceptionally good at very long running complex tasks.
You can create extremely complex apps with a single prompt.
It's strangely underrated given how good the model is....
English
26
8
174
8K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@Angaisb_ The background activity issue is very fair. Users should not have to guess whether the model is searching, running code, stuck, or just thinking.
English
0
0
0
91
Angel 🌼
Angel 🌼@Angaisb_·
- Gemini 3.5 Flash is a huge fiasco
- Gemini iOS app is terribly buggy
- Gemini 3.1 Pro is extremely dumb when calling tools and doesn't care about user intent
- Gemini still doesn't show what the models are doing in the background. What is it searching online? Is it running code? Who knows! Apparently, we shouldn't care about that
I could keep listing stuff that isn't great but I already did a huge list months ago and the only thing they changed after it was the temporary chat button position (thanks for that tho, the old position wasn't great)
Angel 🌼@Angaisb_

I'm genuinely disappointed with Google, and I don't like to say it because Google employees are very kind and nice to talk to, but Google just had to do three things:
- Redesign the Gemini app and web so they looked good
- Make it functional
- Release a SOTA model, one people really want to use, unlike any Gemini model other than 2.5 Pro back then
So far they only did the first thing

English
23
10
184
10K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@haider1 Math proofs are becoming the cleanest way to cut through model hype.
English
0
0
1
108
Haider.
Haider.@haider1·
erdős problems are becoming a new benchmark
"mythos also solves the unit distance problem with a cute, simple proof"
since sholto blocked me (not sure when or why), he should know openai used an internal model, not gpt-5.5
later, someone showed gpt-5.5 could do the same with minimal human guidance
English
6
4
34
3.2K
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@WesRoth AI Wallet and Token Pay feels like Alipay preparing for agents that do more than recommend products.
English
0
0
0
20
Wes Roth
Wes Roth@WesRoth·
Alipay launched a full-stack AI payment solution for partners across industries, including AI companies, retailers, and other businesses preparing for the agentic economy. The launch includes two new services: AI Wallet and Token Pay.
Alipay@Alipay

Alipay introduces its full-stack AI payment solution to partners across industries, ranging from AI companies to traditional retailers, and debuted two new services — the world’s first AI Wallet and Token Pay — to support the agentic economy’s rapid growth.

English
4
4
11
885
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
People keep talking about context windows, but the term sounds more impressive than it really is. It just means how much information a model can take in at once. A huge context window helps, but it does not guarantee the model will use the right details. A 1M token model can still miss the important line, get distracted by noise, or lose the thread halfway through a task. That is the part people notice in longer runs. Reading the whole file is one thing. Staying focused after 30 minutes of work is the part that separates a big context window from a reliable agent.
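To make the "how much it can see at once" idea concrete, here is a rough Python sketch. It is not how any particular model or API does it: `count_tokens` and `fit_to_window` are hypothetical helpers, and counting whitespace-separated words is only a stand-in for a real tokenizer. It keeps the newest chunks of a history that fit a token budget and drops the rest.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def fit_to_window(chunks: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent chunks whose combined size fits the budget."""
    kept: list[str] = []
    used = 0
    for chunk in reversed(chunks):  # walk from newest to oldest
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break  # everything older than this falls outside the window
        kept.append(chunk)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Illustrative history; the oldest entry holds the detail that matters.
history = [
    "deploy key lives in vault path alpha",
    "ran the test suite",
    "fixed two lint warnings",
]
visible = fit_to_window(history, max_tokens=8)
print(visible)
```

With a budget of 8, the oldest line falls out of view entirely. A larger budget would keep it in view, but being in the window only means the model can see it, not that it will use it.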
English
0
0
1
14
Scaling Tech HQ
Scaling Tech HQ@scaling_tech_hq·
@gdb Let Codex inspect the machine, explain the storage mess, and leave the delete button to the human. That's how things should always be.
English
0
0
1
261