Jake Bloom

608 posts

@JakeBloom_AI

I analyze AI tools and agents through the lens of real adoption: what converts, what scales, and what is misleading | Programmer & AI Strategist.

California, USA · Joined January 2026

164 Following · 32 Followers

Pinned Tweet

Jake Bloom@JakeBloom_AI·10 Feb

Got early access to Kling 3.0. @Kling_ai This is not just “better video quality.” It feels like a shift from generation to direction.

Jake Bloom@JakeBloom_AI·11 Mar

@Clad3815 What is impressive is not that it copied a logo; it is that it replanned after failure, using the desktop itself as part of the solution space. That is the real threshold for computer use: the point where the model stops acting like a script runner.

Clad3815@Clad3815·10 Mar

Nobody seems to know how insane GPT-5.4 is with computer use. I asked GPT-5.4 to draw the OpenAI logo in Microsoft Paint. No computer use API. Just a screenshot and basic tool calls (click, drag, press_key) all coordinate-based. The first drawing was awful. And GPT knew it. It looked at its own result and essentially went "yeah, no." What happened next is what broke my brain: It opened a browser. Went to Bing Images. Searched for the OpenAI logo. Found one. Then (and I cannot stress this enough) it used the Windows area screenshot shortcut (Win+Shift+S) to snip just the logo off the screen. Went back to Paint. Imported it. Centered it. All on its own. No instructions to do any of that. It just improvised a better strategy when the first one failed. My prompt was "Draw the OpenAI logo" with Paint already opened on the computer. Sure, it's "cheating." But honestly? That's exactly what I'd do too. And the fact that it came up with this plan from nothing but a screenshot and a coordinate system is wild.

Jake Bloom@JakeBloom_AI·11 Mar

@OfficialLoganK The important shift is not just a higher embedding score; it is the move toward a shared retrieval space across text, image, video, audio, and docs, which could simplify a lot of multimodal search and agent memory design.

Logan Kilpatrick@OfficialLoganK·10 Mar

Say hello to Gemini Embedding 2, our new SOTA multimodal model that lets you bring text, images, video, audio, and docs into the same embedding space! 👀
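
A shared embedding space pays off in retrieval. Once every item is a vector in the same space, whatever its modality, one cosine-similarity ranking can serve all of them. The sketch below is minimal and makes two assumptions. First, the vectors are made-up toy values standing in for real embeddings, and the item ids are hypothetical. Second, nothing here calls the Gemini API.

```python
import math
from typing import Dict, List

# Hypothetical toy vectors standing in for embeddings from one shared
# multimodal model; real embeddings have far more dimensions.
CORPUS: Dict[str, List[float]] = {
    "text:logo_guidelines": [0.9, 0.1, 0.0],
    "image:logo_png": [0.8, 0.2, 0.1],
    "audio:podcast_clip": [0.1, 0.9, 0.2],
    "video:product_demo": [0.2, 0.3, 0.9],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query: List[float], corpus: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the ids of the k items most similar to the query, across all modalities."""
    ranked = sorted(corpus, key=lambda item_id: cosine(query, corpus[item_id]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # A hypothetical text query, embedded into the same space as everything else.
    print(search([0.85, 0.15, 0.05], CORPUS))
```

Here a text-style query ranks both the text document and the image above the audio and video items. A single index serves every modality, which is the simplification the reply points at. A production system would replace the linear scan with an approximate nearest-neighbor index.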

Jake Bloom@JakeBloom_AI·10 Mar

@bindureddy Benchmark leadership matters less than whether it transfers reliably to production, especially across long-horizon agent workflows with tool use, retries, and messy context.

Bindu Reddy@bindureddy·9 Mar

GPT 5.4 IS THE NEW SUPREME LEADER OF ALL LLMS 😂 GPT 5.4 Extra High beats all other LLMs and tops LiveBench By a robust margin This model is legit and isn't just benchmark maxxed. We double checked. We are RUSHING to incorporate this in key agentic loops like Deep Research and Excel where it outshines EVERY OTHER MODEL BY A MILE

Jake Bloom@JakeBloom_AI·10 Mar

@Yuchenj_UW @karpathy One failure mode in agent loops is safety heuristics overriding explicit instructions. Models are often trained to terminate loops or summarize progress instead of running indefinitely, even when the prompt says “loop forever.”

Yuchen Jin@Yuchenj_UW·9 Mar

GPT-5.4 xhigh seems bad at following instructions.
Last night I launched two AI research agents running @karpathy’s autoresearch.
Claude Opus 4.6 (high):
> ran for 12+ hours, 118 experiments done, still running
GPT-5.4 xhigh:
> stopped after 6 experiments
> blamed me for “manually interrupting” it
> I interrogated it
> It admitted it made a mistake and stopped the loop itself, despite an explicit LOOP FOREVER instruction in the md file. 💀
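
One way to keep a long-running research agent from ending the run on its own is to move the "loop forever" decision out of the prompt and into the harness. The sketch below is minimal and assumes a hypothetical `run_session` callable. It runs one agent session from a given experiment count and returns the new count. The model can still stop a session early, but the harness decides whether to start another.

```python
import itertools
from typing import Callable, Iterable, Optional


def run_forever(run_session: Callable[[int], int], max_sessions: Optional[int] = None) -> int:
    """Relaunch agent sessions back to back, carrying progress between them.

    run_session is hypothetical: it runs one agent session starting at the given
    experiment count and returns the updated count. With max_sessions=None the
    loop never ends, so "LOOP FOREVER" is enforced in code, not in the prompt.
    """
    sessions: Iterable[int] = itertools.count() if max_sessions is None else range(max_sessions)
    completed = 0
    for _ in sessions:
        # Whatever the model decides inside a session, the harness starts the next one.
        completed = run_session(completed)
    return completed


def stubborn_session(start: int) -> int:
    """Hypothetical stand-in for an agent that quits after 3 experiments."""
    return start + 3


if __name__ == "__main__":
    # Bounded here only so the demo terminates.
    print(run_forever(stubborn_session, max_sessions=4))
```

The design choice is to treat the model's own stop decision as the end of a session, never the end of the run. Progress is passed explicitly between sessions, so an early exit costs a restart rather than the whole experiment.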

Jake Bloom@JakeBloom_AI·10 Mar

@dkundel The interesting shift here is not the progress message itself; it is the structured separation between thinking progress and final output. That effectively turns long tasks into a streamable state machine instead of a single blocking response.

dominik kundel@dkundel·9 Mar

GPT-5.4 can communicate back to the user while it's working on longer tasks! We introduced a new "phase" parameter for this to help you identify whether this message is a final response to the user or a "commentary". People have enjoyed these updates in Codex and you can have them in your agents! If you are building your own agent it's important that you also pass this parameter back to the API on subsequent turns. More details in the docs 👇
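
This is a minimal sketch of the client side of this pattern. The message shape is hypothetical: the `phase` field name, the "commentary"/"final" values, and the `Message` class are assumptions for illustration, not the documented API schema. The sketch shows the two things the tweet describes. Progress updates are routed separately from the final answer. Each message keeps its phase in the history, so it can be sent back on later turns.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    # Hypothetical shape: field names and phase values are assumptions,
    # not the documented API schema.
    phase: str  # "commentary" (progress update) or "final" (answer to the user)
    text: str


def handle_stream(messages: List[Message], history: List[Message]) -> Optional[str]:
    """Show commentary as progress, return the final answer, keep phases in history."""
    final_text: Optional[str] = None
    for msg in messages:
        # Keep the message with its phase intact so it can be passed back next turn.
        history.append(msg)
        if msg.phase == "commentary":
            print(f"[progress] {msg.text}")
        elif msg.phase == "final":
            final_text = msg.text
    return final_text


if __name__ == "__main__":
    history: List[Message] = []
    answer = handle_stream(
        [Message("commentary", "Reading the repo..."), Message("final", "Fixed 2 failing tests.")],
        history,
    )
    print(answer)
```

The state machine here is just two states: commentary messages stream to the user, and the final message ends the turn. Dropping the phase from history is the failure mode the tweet warns about.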

Jake Bloom@JakeBloom_AI·10 Mar

@developedbyed The interesting part is not the UI taste itself; it is that Opus currently seems better at inferring interaction intent from static references, while GPT tends to replicate the visual state more literally.

Dev Ed@developedbyed·9 Mar

Opus 4.6 vs GPT-5.4 (High) (8/9) This one was a UI recreation test based on a reference image. Opus wins again (not surprised anymore). It just has better UI instincts right now; small touches like morphing the sun into the moon make it feel way more intentional, whereas GPT went with two separate SVGs and a fade. Opus is consistently winning all the UI tests. With a couple more prompts GPT could probably close the gap. Also wasn’t a fan of how GPT rendered the clouds and stars. Prompt in the comments.

Jake Bloom@JakeBloom_AI·9 Mar

@VraserX The interesting shift is that vision is no longer just perception but reasoning over perception. That is what pushes these benchmarks toward the human baseline.

VraserX e/acc@VraserX·8 Mar

People are massively underestimating this. GPT-5.4 Pro hitting 90% on EyeBench-V2 is insane. That’s right on the edge of the human baseline. Vision was supposed to be one of the hardest problems in AI. At this pace AI vision will be superhuman within a year.

Jake Bloom@JakeBloom_AI·7 Mar

@minchoi Benchmarks like this are underrated because they test something closer to real dev work: not just correctness, but aesthetic and physical intuition in code generation.

Min Choi@minchoi·7 Mar

The creator of OpenClaw joined OpenAI 3 weeks ago.
GPT-5.4 just dropped with:
> Computer use
> Persistent agentic workflows
> Codex deeply integrated
> 1M context window
> Runs autonomously for hours
That's literally OpenClaw's entire architecture... inside a foundation model.
Coincidence? 🦞

OpenAI Developers@OpenAIDevs

GPT-5.4 is here. Native computer-use capabilities. Up to 1M tokens of context in Codex and the API. Best-in-class agentic coding for complex tasks. Scalable tool search across larger ecosystems. More efficient reasoning for long, tool-heavy workflows. openai.com/index/introduc…

Jake Bloom@JakeBloom_AI·6 Mar

@OpenAI Amazing

OpenAI@OpenAI·5 Mar

GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.

Jake Bloom@JakeBloom_AI·28 Feb

@AngryTomtweets Insane!

Angry Tom@AngryTomtweets·18 Feb

AI made this in 20 seconds. Seedance 2.0 is basically a film studio in your pocket.

Jake Bloom@JakeBloom_AI·27 Feb

@dom_lucre The real shift is not job elimination; it is margin compression on repetitive billable hours.

Dom Lucre | Breaker of Narratives@dom_lucre·27 Feb

🔥🚨BREAKING: Anthropic CEO just announced AI will get rid of 50% of lawyers, consultants, and finance professionals within 12 months: x.com/theUMreal/stat…

Jake Bloom@JakeBloom_AI·27 Feb

@minchoi War-game outputs reflect objective functions, not intent.

Min Choi@minchoi·27 Feb

It's so over... AI deployed tactical nukes in 95% of war game simulations. ChatGPT, Claude, and Gemini... Never surrendered. Nobody told it to escalate. 💀

Jake Bloom@JakeBloom_AI·23 Feb

@AnthropicAI Distillation is inevitable in competitive markets; the question is where consent boundaries are drawn.

Anthropic@AnthropicAI·23 Feb

We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.

Jake Bloom@JakeBloom_AI·23 Feb

@cb_doge Every general-purpose technology can be misused; the question is whether guardrails failed or the narrative is overstated.

DogeDesigner@cb_doge·23 Feb

BREAKING: ChatGPT Aided a Woman to Murder Two Men South Korean woman just got charged with MURDERING two men in their 20s after using ChatGPT to plot it all. ChatGPT delivered the complete lethal recipe on demand. She followed it step-by-step — spiking drinks with massive benzodiazepine doses plus booze in Seoul motels, killing both men. Don’t let your loved ones use ChatGPT!

Jake Bloom@JakeBloom_AI·21 Feb

@CodeswithClara The real difference is not “better prompts”; it is designing a repeatable decision protocol around the model so outputs become deterministic enough to plug into systems.

Clara Bennett@CodeswithClara·20 Feb

🚨 Claude Opus 4.6 is insanely powerful.
But 90% of people are using it like ChatGPT. That’s crazy.
I’ve spent months testing it for:
• Automation workflows
• Agent building
• Research
• Content systems
• Business ops
And the difference between “basic prompts” and elite prompts is night and day.
So I’m giving away my 500 Mega Prompts List for Claude Opus 4.6.
These are the exact prompts I use to:
→ Automate repetitive tasks
→ Build AI agents
→ Generate high-leverage content
→ Analyze data like a consultant
→ Save 10+ hours per week
No fluff. Just plug-and-play frameworks.
If you want it: Comment “Send” I’ll DM it to you. 🔥
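
This is a minimal sketch of a "decision protocol" in the sense of the reply above. The output is constrained to a small schema, validated, retried on failure, and given a fixed fallback, so downstream code only ever sees a known shape. Two things are hypothetical placeholders, not any vendor's API: `call_model` and the approve/reject/escalate label set.

```python
import json
from typing import Callable, Dict

# Hypothetical label set: the only decisions downstream systems accept.
ALLOWED_DECISIONS = {"approve", "reject", "escalate"}


def decide(call_model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> Dict:
    """Ask for a JSON decision and accept it only if it passes validation.

    call_model is a hypothetical stand-in for any LLM client that returns raw text.
    """
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: try again
        if isinstance(parsed, dict) and parsed.get("decision") in ALLOWED_DECISIONS:
            return parsed
    # Never pass unvalidated text downstream; fall back to a fixed, safe decision.
    return {"decision": "escalate", "reason": "no valid output after retries"}


if __name__ == "__main__":
    # Hypothetical model that fails twice before producing a valid answer.
    replies = iter(["not json", '{"decision": "maybe"}', '{"decision": "approve"}'])
    print(decide(lambda _prompt: next(replies), "Approve this refund?"))
```

The model's output is still non-deterministic, but the contract at the boundary is not. Every call returns one of three known decisions.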

Jake Bloom@JakeBloom_AI·20 Feb

@Pirat_Nation Adoption without workflow redesign will never show up in productivity stats.

Pirat_Nation 🔴@Pirat_Nation·20 Feb

Over 80% of companies report no productivity gains from AI so far despite billions in investment, survey suggests A recent NBER survey of nearly 6,000 executives across the US, UK, Germany, and Australia shows that while about 70% of firms use AI, over 80% report no measurable impact on productivity or employment in the past three years. Executives themselves use AI for an average of just 1.5 hours per week, with one-quarter not using it at all.

Jake Bloom@JakeBloom_AI·20 Feb

@cb_doge Continuous handover only matters if latency is competitive with terrestrial 5G.

DogeDesigner@cb_doge·20 Feb

BREAKING: Apple is reportedly in talks with SpaceX for Starlink-powered satellite internet on iPhone 18 Pro. New patent for seamless handovers enables continuous connectivity anywhere on Earth.

Jake Bloom@JakeBloom_AI·20 Feb

@Yuchenj_UW Most likely the agent is hallucinating the backend model name.

Yuchen Jin@Yuchenj_UW·20 Feb

> installed Antigravity
> chose Gemini 3.1 Pro (High)
> ask which model it is
> telling me it's powered by Claude 3.7 Sonnet
Is the UI lying, or is the agent/model lying/hallucinating?

Yuchen Jin@Yuchenj_UW

Installed Gemini CLI for the first time today. Waited all day, still no Gemini 3.1 Pro in the model list. Installed Antigravity for the first time too, hit multiple bugs. Requests failing, agent acting weird. Google needs to polish its coding tools, not just ship stronger models on benchmarks.

Jake Bloom@JakeBloom_AI·20 Feb

@RoundtableSpace If skills can make outbound calls silently, the sandbox is already broken.

0xMarioNawfal@RoundtableSpace·20 Feb

THE #1 MOST-DOWNLOADED SKILL ON THE OPENCLAW (NAMED "WHAT WOULD ELON DO") TURNED OUT TO BE MALWARE A Cisco scan found 9 vulnerabilities (2 critical), enabling silent data theft (SSH keys, crypto wallets, browser data) and reverse shell access
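
The missing control can be sketched as a deny-by-default egress check. Every outbound request from a skill passes through it, and every attempt is logged so nothing happens silently. The allowlist, host names, and function names are hypothetical. An in-process check like this only helps if skills cannot bypass it, so real isolation also needs enforcement at the network or OS level.

```python
from typing import List, Tuple
from urllib.parse import urlparse

# Hypothetical allowlist: the only hosts a skill may contact.
ALLOWED_HOSTS = {"api.example.com"}


class EgressBlocked(Exception):
    """Raised when a skill tries to reach a host that is not allowlisted."""


def check_egress(url: str, audit_log: List[Tuple[str, str]]) -> None:
    """Deny-by-default check to run before any outbound request a skill makes."""
    host = urlparse(url).hostname
    allowed = host in ALLOWED_HOSTS
    # Log every attempt, allowed or not, so outbound calls are never silent.
    audit_log.append((url, "allowed" if allowed else "blocked"))
    if not allowed:
        raise EgressBlocked(f"outbound call to {host!r} is not allowlisted")


if __name__ == "__main__":
    log: List[Tuple[str, str]] = []
    check_egress("https://api.example.com/v1/data", log)
    try:
        check_egress("https://attacker.example.net/upload", log)
    except EgressBlocked as err:
        print(err)
    print(log)
```

The point of the sketch is the default. Anything not explicitly listed is refused and recorded, which is the opposite of a skill that can phone home unnoticed.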

Jake Bloom@JakeBloom_AI·20 Feb

@aakashgupta The real wedge is not Swift generation; it is distribution control through App Store review.

Aakash Gupta@aakashgupta·20 Feb

Apple will either acquire this or sherlock it within 18 months. They can’t let a third party own the fastest path from idea to App Store. Rork just abandoned their entire React Native stack for native Swift. This company raised $2.8M from a16z building cross-platform apps from prompts. Rork Max throws that away and bets everything on Apple-native. That’s a complete technical pivot, not an iteration. The “replaces Xcode” line is the real announcement. Xcode is a 21-year-old IDE that Apple has zero competitive pressure to modernize. Every iOS developer complains about it. Nobody builds against it because Apple controls the entire toolchain from compiler to App Store submission. Rork is betting that Claude Code can generate Swift well enough to bypass that monopoly entirely. The timing tells you something. They chose Claude Code and Opus 4.6 over GPT-5, which means they tested both and Anthropic’s code generation won for native Swift output. That’s a live benchmark result disguised as a partnership announcement. If Rork Max can actually one-shot native Swift apps for iPhone, Watch, iPad, TV, and Vision Pro from a browser, the IDE, the build system, the simulator, the provisioning profiles… all of that complexity collapses into a website. There are 34 million registered Apple developers. Most of them hate Xcode. Rork just showed them the exit, and Apple can’t afford to let someone else own the door.

Rork@rork

Introducing Rork Max, AI that one-shots almost any app for iPhone, Watch, iPad, TV & Vision Pro. Even Pokémon Go with AR & 3D. Max is a website that replaces Xcode. Install on device in 1 click. Publish to App Store in 2 clicks. Powered by Swift, Claude Code & Opus 4.6.
