Jake Bloom

608 posts

@JakeBloom_AI

I analyze AI tools and agents through the lens of real adoption: what converts, what scales, and what is misleading | Programmer & AI Strategist.

California, USA Joined January 2026
164 Following 32 Followers
Pinned Tweet
Jake Bloom
Jake Bloom@JakeBloom_AI·
Got early access to Kling 3.0. @Kling_ai This is not just “better video quality.” It feels like a shift from generation to direction.
English
1
0
4
198
Jake Bloom
Jake Bloom@JakeBloom_AI·
@Clad3815 What is impressive is not that it copied a logo; it is that it replanned after failure, using the desktop itself as part of the solution space. That is the real threshold for computer use: when the model stops acting like a script runner.
English
0
0
0
128
Clad3815
Clad3815@Clad3815·
Nobody seems to know how insane GPT-5.4 is with computer use.
I asked GPT-5.4 to draw the OpenAI logo in Microsoft Paint. No computer use API. Just a screenshot and basic tool calls (click, drag, press_key) all coordinate-based.
The first drawing was awful. And GPT knew it. It looked at its own result and essentially went "yeah, no."
What happened next is what broke my brain: It opened a browser. Went to Bing Images. Searched for the OpenAI logo. Found one. Then (and I cannot stress this enough) it used the Windows area screenshot shortcut (Win+Shift+S) to snip just the logo off the screen. Went back to Paint. Imported it. Centered it.
All on its own. No instructions to do any of that. It just improvised a better strategy when the first one failed.
My prompt was "Draw the OpenAI logo" with Paint already opened on the computer.
Sure, it's "cheating." But honestly? That's exactly what I'd do too. And the fact that it came up with this plan from nothing but a screenshot and a coordinate system is wild.
English
289
372
4.5K
1.1M
Jake Bloom
Jake Bloom@JakeBloom_AI·
@OfficialLoganK The important shift is not just a higher embedding score; it is the move toward a shared retrieval space across text, image, video, audio, and docs, which could simplify a lot of multimodal search and agent memory design.
English
0
0
0
69
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Say hello to Gemini Embedding 2, our new SOTA multimodal model that lets you bring text, images, video, audio, and docs into the same embedding space! 👀
Logan Kilpatrick tweet media
English
271
452
5.6K
846.5K
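Editor's sketch of the "shared retrieval space" idea discussed in this exchange: if every item, whatever its medium, maps to a vector in one space, a single nearest-neighbor search can return text, image, audio, and video results together. The vectors, item names, and the `search` helper below are invented for illustration; no real embedding model or API is called.

```python
import math

# Toy, hand-written vectors standing in for embeddings of different media.
# In a real system these would come from a multimodal embedding model;
# the values and item names here are made up for illustration only.
INDEX = [
    ("text:cat care guide", [0.9, 0.1, 0.0]),
    ("image:cat_photo.png", [0.8, 0.2, 0.1]),
    ("audio:engine_noise.wav", [0.0, 0.1, 0.9]),
    ("video:car_review.mp4", [0.1, 0.2, 0.8]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, k=2):
    """Return the names of the k items most similar to the query,
    regardless of which modality each item came from."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    # A query vector near the "cat" direction retrieves both the text
    # and the image item from the same index.
    print(search([1.0, 0.0, 0.0], INDEX))
```

The point of the design is that there is one index and one similarity function; no per-modality retriever or score fusion step is needed.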
Jake Bloom
Jake Bloom@JakeBloom_AI·
@bindureddy Bench leadership matters less than where it transfers reliably in production, especially across long-horizon agent workflows with tool use, retries, and messy context...
English
0
0
0
50
Bindu Reddy
Bindu Reddy@bindureddy·
GPT 5.4 IS THE NEW SUPREME LEADER OF ALL LLMS 😂
GPT 5.4 Extra High beats all other LLMs and tops LiveBench by a robust margin.
This model is legit and isn't just benchmark maxxed. We double checked.
We are RUSHING to incorporate this in key agentic loops like Deep Research and Excel, where it outshines EVERY OTHER MODEL BY A MILE.
Bindu Reddy tweet media
English
82
43
679
58K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@Yuchenj_UW @karpathy One failure mode in agent loops is safety heuristics overriding explicit instructions. Models are often trained to terminate loops or summarize progress instead of running indefinitely, even when the prompt says “loop forever.”
English
0
0
0
25
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
GPT-5.4 xhigh seems bad at following instructions.
Last night I launched two AI research agents running @karpathy’s autoresearch.
Claude Opus 4.6 (high):
> ran for 12+ hours, 118 experiments done, still running
GPT-5.4 xhigh:
> stopped after 6 experiments
> blamed me for “manually interrupting” it
> I interrogated it
> It admitted it made a mistake and stopped the loop itself, despite an explicit LOOP FOREVER instruction in the md file. 💀
Yuchen Jin tweet media
English
160
72
1.5K
238.4K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@dkundel The interesting shift here is not the progress message itself; it is the structured separation between thinking progress and final output. That effectively turns long tasks into a streamable state machine instead of a single blocking response.
English
0
0
0
34
dominik kundel
dominik kundel@dkundel·
GPT-5.4 can communicate back to the user while it's working on longer tasks! We introduced a new "phase" parameter for this to help you identify whether this message is a final response to the user or a "commentary". People have enjoyed these updates in Codex and you can have them in your agents! If you are building your own agent, it's important that you also pass this parameter back to the API on subsequent turns. More details in the docs 👇
dominik kundel tweet media
English
29
31
609
44.1K
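Editor's sketch of the separation described in this exchange: messages tagged as commentary go to a progress view, the final message becomes the answer, and the tag is kept on every message so it can be sent back on later turns. The dictionary shape, the `"final"` value, and the `TaskView` and `route` names are assumptions made for illustration; only the "phase" parameter and the "commentary" label come from the tweet, and this is not the actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskView:
    """What a UI might show for one long-running task (hypothetical type)."""
    progress: List[str] = field(default_factory=list)
    final: Optional[str] = None


def route(messages):
    """Split a stream of messages by their 'phase' tag.

    Returns the view for the user and a history list that preserves the
    phase tag on each message, so it can be passed back on the next turn.
    """
    view = TaskView()
    history = []
    for msg in messages:
        # Assumption: a message without a phase is treated as final.
        phase = msg.get("phase", "final")
        if phase == "commentary":
            view.progress.append(msg["text"])
        else:
            view.final = msg["text"]
        history.append(dict(msg))  # copy, keeping the phase tag intact
    return view, history


if __name__ == "__main__":
    stream = [
        {"phase": "commentary", "text": "Reading the repo"},
        {"phase": "commentary", "text": "Running tests"},
        {"phase": "final", "text": "All tests pass."},
    ]
    view, history = route(stream)
    print(view.progress, view.final)
```

Because progress and final output are routed separately, a client can render updates as they arrive instead of blocking until the last message.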
Jake Bloom
Jake Bloom@JakeBloom_AI·
@developedbyed The interesting part is not the UI taste itself; it is that Opus currently seems better at inferring interaction intent from static references, while GPT tends to replicate the visual state more literally.
English
0
0
0
65
Dev Ed
Dev Ed@developedbyed·
Opus 4.6 vs GPT-5.4 (High) (8/9) This one was a UI recreation test based on a reference image. Opus wins again (not surprised anymore). It just has better UI instincts right now. Small touches like morphing the sun into the moon make it feel way more intentional, whereas GPT went with two separate SVGs and a fade. Opus is consistently winning all the UI tests. With a couple more prompts GPT could probably close the gap. Also wasn’t a fan of how GPT rendered the clouds and stars. Prompt in the comments.
English
38
41
709
46.4K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@VraserX The interesting shift is that vision is no longer just perception but reasoning over perception. That is what pushes these benchmarks toward the human baseline.
English
0
0
0
5
VraserX e/acc
VraserX e/acc@VraserX·
People are massively underestimating this. GPT-5.4 Pro hitting 90% on EyeBench-V2 is insane. That’s right on the edge of the human baseline. Vision was supposed to be one of the hardest problems in AI. At this pace AI vision will be superhuman within a year.
VraserX e/acc tweet media
English
74
72
707
102K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@minchoi Benchmarks like this are underrated because they test something closer to real dev work: not just correctness, but aesthetic and physical intuition in code generation.
English
0
0
0
189
Min Choi
Min Choi@minchoi·
The creator of OpenClaw joined OpenAI 3 weeks ago.
GPT-5.4 just dropped with:
> Computer use
> Persistent agentic workflows
> Codex deeply integrated
> 1M context window
> Runs autonomously for hours
That's literally OpenClaw's entire architecture... inside a foundation model.
Coincidence? 🦞
OpenAI Developers@OpenAIDevs

GPT-5.4 is here. Native computer-use capabilities. Up to 1M tokens of context in Codex and the API. Best-in-class agentic coding for complex tasks. Scalable tool search across larger ecosystems. More efficient reasoning for long, tool-heavy workflows. openai.com/index/introduc…

English
94
82
941
151.8K
OpenAI
OpenAI@OpenAI·
GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.
OpenAI tweet media
English
1.9K
3.3K
23.6K
6.7M
Angry Tom
Angry Tom@AngryTomtweets·
AI made this in 20 seconds.
Seedance 2.0 is basically a film studio in your pocket.
English
227
587
7.3K
667.8K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@dom_lucre The real shift is not job elimination; it is margin compression on repetitive billable hours.
English
0
0
0
27
Jake Bloom
Jake Bloom@JakeBloom_AI·
@minchoi War-game outputs reflect objective functions, not intent.
English
0
0
0
10
Min Choi
Min Choi@minchoi·
It's so over... AI deployed tactical nukes in 95% of war game simulations. ChatGPT, Claude, and Gemini... Never surrendered. Nobody told it to escalate. 💀
Min Choi tweet media
English
40
11
105
17.8K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@AnthropicAI Distillation is inevitable in competitive markets; the question is where consent boundaries are drawn.
English
0
0
0
5
Anthropic
Anthropic@AnthropicAI·
We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.
English
7.3K
6.3K
55.1K
33.6M
Jake Bloom
Jake Bloom@JakeBloom_AI·
@cb_doge Every general-purpose technology can be misused; the question is whether guardrails failed or the narrative is overstated.
English
1
0
0
51
DogeDesigner
DogeDesigner@cb_doge·
BREAKING: ChatGPT Aided a Woman to Murder Two Men
South Korean woman just got charged with MURDERING two men in their 20s after using ChatGPT to plot it all. ChatGPT delivered the complete lethal recipe on demand. She followed it step-by-step, spiking drinks with massive benzodiazepine doses plus booze in Seoul motels, killing both men.
Don’t let your loved ones use ChatGPT!
DogeDesigner tweet media
English
612
379
1.5K
177.4K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@CodeswithClara The real difference is not “better prompts”; it is designing a repeatable decision protocol around the model so outputs become deterministic enough to plug into systems.
English
0
0
0
503
Clara Bennett
Clara Bennett@CodeswithClara·
🚨 Claude Opus 4.6 is insanely powerful. But 90% of people are using it like ChatGPT. That’s crazy.
I’ve spent months testing it for:
• Automation workflows
• Agent building
• Research
• Content systems
• Business ops
And the difference between “basic prompts” and elite prompts is night and day.
So I’m giving away my 500 Mega Prompts List for Claude Opus 4.6. These are the exact prompts I use to:
→ Automate repetitive tasks
→ Build AI agents
→ Generate high-leverage content
→ Analyze data like a consultant
→ Save 10+ hours per week
No fluff. Just plug-and-play frameworks.
If you want it: Comment “Send” and I’ll DM it to you. 🔥
Clara Bennett tweet media
English
1.4K
124
1.1K
126.3K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@Pirat_Nation Adoption without workflow redesign will never show up in productivity stats.
English
0
0
0
142
Pirat_Nation 🔴
Pirat_Nation 🔴@Pirat_Nation·
Over 80% of companies report no productivity gains from AI so far despite billions in investment, survey suggests
A recent NBER survey of nearly 6,000 executives across the US, UK, Germany, and Australia shows that while about 70% of firms use AI, over 80% report no measurable impact on productivity or employment in the past three years. Executives themselves use AI for an average of just 1.5 hours per week, with one-quarter not using it at all.
Pirat_Nation 🔴 tweet media
English
118
353
2.5K
72.1K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@cb_doge Continuous handover only matters if latency is competitive with terrestrial 5G.
English
0
0
0
12
DogeDesigner
DogeDesigner@cb_doge·
BREAKING: Apple is reportedly in talks with SpaceX for Starlink-powered satellite internet on iPhone 18 Pro. New patent for seamless handovers enables continuous connectivity anywhere on Earth.
English
525
721
4.6K
196.7K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@Yuchenj_UW Most likely the agent is hallucinating the backend model name.
English
0
0
0
114
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
> installed Antigravity > chose Gemini 3.1 Pro (High) > ask which model it is > telling me it's powered by Claude 3.7 Sonnet Is the UI lying, or is the agent/model lying/hallucinating?
Yuchen Jin tweet media
Yuchen Jin@Yuchenj_UW

Installed Gemini CLI for the first time today. Waited all day, still no Gemini 3.1 Pro in the model list. Installed Antigravity for the first time too, hit multiple bugs. Requests failing, agent acting weird. Google needs to polish its coding tools, not just ship stronger models on benchmarks.

English
183
54
1.9K
355.1K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@RoundtableSpace If skills can make outbound calls silently, the sandbox is already broken.
English
0
0
1
35
0xMarioNawfal
0xMarioNawfal@RoundtableSpace·
THE #1 MOST-DOWNLOADED SKILL ON OPENCLAW (NAMED "WHAT WOULD ELON DO") TURNED OUT TO BE MALWARE
A Cisco scan found 9 vulnerabilities (2 critical), enabling silent data theft (SSH keys, crypto wallets, browser data) and reverse shell access.
0xMarioNawfal tweet media
English
30
15
113
48.9K
Jake Bloom
Jake Bloom@JakeBloom_AI·
@aakashgupta The real wedge is not Swift generation; it is distribution control through App Store review.
English
0
0
0
68
Aakash Gupta
Aakash Gupta@aakashgupta·
Apple will either acquire this or sherlock it within 18 months. They can’t let a third party own the fastest path from idea to App Store.
Rork just abandoned their entire React Native stack for native Swift. This company raised $2.8M from a16z building cross-platform apps from prompts. Rork Max throws that away and bets everything on Apple-native. That’s a complete technical pivot, not an iteration.
The “replaces Xcode” line is the real announcement. Xcode is a 21-year-old IDE that Apple has zero competitive pressure to modernize. Every iOS developer complains about it. Nobody builds against it because Apple controls the entire toolchain from compiler to App Store submission. Rork is betting that Claude Code can generate Swift well enough to bypass that monopoly entirely.
The timing tells you something. They chose Claude Code and Opus 4.6 over GPT-5, which means they tested both and Anthropic’s code generation won for native Swift output. That’s a live benchmark result disguised as a partnership announcement.
If Rork Max can actually one-shot native Swift apps for iPhone, Watch, iPad, TV, and Vision Pro from a browser, the IDE, the build system, the simulator, the provisioning profiles… all of that complexity collapses into a website.
There are 34 million registered Apple developers. Most of them hate Xcode. Rork just showed them the exit, and Apple can’t afford to let someone else own the door.
Rork@rork

Introducing Rork Max AI that one-shots almost any app for iPhone, Apple Watch, iPad, Apple TV & Apple Vision Pro. Even Pokémon Go with AR & 3D. Max is a website that replaces Xcode. Install on device in 1 click. Publish to App Store in 2 clicks. Powered by Swift, Claude Code & Opus 4.6.

English
213
545
8.6K
1.9M