Max Uroda

2.1K posts

@u_maxx

AI Engineer & Team Lead | Shipping AI agents w/ Gemini, Claude & Grok | Daily: tools, breakdowns & real code | GCP Certified | DM for collabs 🚀 https://t.co/qyC8ZesUqQ

Warsaw, Poland · Joined October 2012
646 Following · 569 Followers
Pinned post
Max Uroda@u_maxx·
🧵 Most AI agents still hallucinate or get stuck on real tasks. Why? They lack proven structure. Here are the 3 core design patterns powering reliable, production-grade AI agents right now: ReAct, Self-Reflection, and Hierarchical Delegation. These turn flaky demos into deterministic systems that actually ship. Let’s break them down 👇
Max Uroda@u_maxx·
100% agree, @SemiAnalysis_. There is no chance your custom-trained model will be even 1% as good as Gemini or Claude right now. Frontier labs have a data, compute, and research-talent moat that almost nobody can replicate.
For most companies the real ROI is in prompt and context engineering, and in building production agents on top of frontier models, not in burning money on pretraining theater. Pretraining your own model is expensive and mostly resume-driven.
Seen this kind of pretraining push in your company? What's the wildest example?
SemiAnalysis@SemiAnalysis_

Pretraining fundamentally does not make sense anymore for anyone other than frontier labs. Although there are a lot of people at enterprises & startups who have "Pretrainitis" to show “impact” and get promotions, fundamentally, it doesn’t make sense. There is probably higher ROI in partnering with a frontier lab to do prompt engineering, although it isn’t as “sexy” as pretraining.

Max Uroda@u_maxx·
SemiAnalysis just ran the numbers on the big subscription plans. $200/mo gets you roughly $8k–$14k worth of tokens per month depending on the provider. The labs are heavily subsidizing power users right now.
That changes fast. Starting June 23, Anthropic is removing Fable 5 from Pro, Max and Team plans. After that you’ll need usage credits for it. New frontier models are becoming too expensive to keep including at flat rates, so the current subsidy model on subscriptions is ending.
If you’re doing real work on the Max plans, your cost structure is about to shift. Are you planning to use Fable 5 heavily before the 22nd, or already moving more workloads to API?
SemiAnalysis@SemiAnalysis_

Recently, we purchased one of each Anthropic/OpenAI subscription plan and randomly ran long horizon coding tasks until we exhausted the weekly limit. It's widely believed that a $200/month plan maxes out at ~$2000/month worth of tokens (assuming API pricing). However, we found that the subscriptions are actually far more generous. (2/4)

Max Uroda@u_maxx·
Exactly what I expected after reading the Fable 5 model release details. The @llama_index team just showed what this looks like in real life:
- Their entire team was operating in full “tokenmaxxing” mode on Claude Max plans
- One engineer hit the limit 3× yesterday and burned the equivalent of $1.5k in just 10 hours
- Half the team has already hit quota limits on engineering work
Unlimited high-volume usage on premium plans is ending. Guardrails and model routing are no longer optional; they’re how you avoid surprise burn rates. linkedin.com/posts/jerry-li…
How are you adjusting your workflows or agent setups for this change?
Max Uroda@u_maxx·
One more important detail most people are missing: after June 22, when Fable 5 is no longer included on Pro/Max/Team/Enterprise plans, usage switches to credits.
Here's how it actually works: once you hit your seat’s usage limit, additional usage is billed at standard API rates ($10 input / $50 output per million tokens).
- Team plans: Owners can pre-purchase credits and set spend limits to control costs.
- Seat-based Enterprise: Usage is billed at the end of each month based on actual consumption.
This setup can easily create unexpected expenses if teams are testing heavily during the free window and don’t have monitoring or limits in place before June 23. If you’re planning to go hard on Fable 5 in the next 13 days, make sure whoever manages the account understands the credit system and sets up guardrails.
Are you planning to set spend limits on Team plans, or are you mostly on API/consumption-based Enterprise where it’s already pay-as-you-go? support.claude.com/en/articles/12…
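To make the overage arithmetic concrete, here is a minimal sketch that assumes only the rates quoted in the post ($10 input / $50 output per million tokens). The function names, the token counts, and the spend-limit check are hypothetical illustrations, not part of any Anthropic billing API.

```python
from decimal import Decimal

# Rates quoted in the post: USD per one million tokens.
INPUT_RATE = Decimal("10")
OUTPUT_RATE = Decimal("50")
MILLION = Decimal("1000000")


def overage_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate pay-as-you-go cost in USD for usage beyond the plan limit."""
    total = Decimal(input_tokens) * INPUT_RATE + Decimal(output_tokens) * OUTPUT_RATE
    return (total / MILLION).quantize(Decimal("0.01"))


def within_spend_limit(spent: Decimal, next_cost: Decimal, limit: Decimal) -> bool:
    """Hypothetical guardrail: allow the next request only if it fits the limit."""
    return spent + next_cost <= limit


# Example: 2M input tokens and 400k output tokens past the included limit.
cost = overage_cost(2_000_000, 400_000)
print(cost)  # 40.00
print(within_spend_limit(Decimal("90"), cost, Decimal("100")))  # False
```

Using `Decimal` rather than floats keeps the cents exact, which matters once many small requests are summed against a monthly limit.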
Max Uroda@u_maxx·
Claude Fable 5 drops with a pragmatic safety layer most labs avoid talking about. For queries touching cybersecurity, biology/chemistry, or distillation risks, it routes to Opus 4.8 instead and tells the user. That happens in under 5% of sessions on average, and it's transparent.
They’re also giving the unrestricted Mythos 5 version to trusted cyber defenders and researchers via Glasswing. This is how you actually push frontier capability out responsibly without creating new attack surfaces or crippling normal use.
For anyone building production agents, these lab-level design choices directly shape how we think about our own guardrails, evals, and fallback logic.
Builders: are the guardrails too conservative for your workflows, or exactly the kind of thoughtful execution we need more of?
Claude@claudeai

Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. Queries on a narrow range of topics will instead receive a response from our next-most-capable model, Opus 4.8.
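For builders who want the same "route to a fallback and tell the user" behavior in their own agents, the pattern can be sketched roughly as follows. This is an illustration only, not how the lab implements it: the keyword lists, model names, and `Route` type are all hypothetical, and a real system would use a trained classifier rather than substring matching.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical topic keywords; a stand-in for a real classifier.
RESTRICTED_KEYWORDS = {
    "cybersecurity": ("exploit", "malware"),
    "bio_chem": ("pathogen", "nerve agent"),
}


@dataclass
class Route:
    model: str
    notice: Optional[str]  # shown to the user when a fallback is used


def route_query(query: str,
                primary: str = "primary-model",
                fallback: str = "fallback-model") -> Route:
    """Pick a model for the query and record a user-facing notice on fallback."""
    text = query.lower()
    for topic, keywords in RESTRICTED_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return Route(fallback, f"Answered by {fallback} (topic: {topic}).")
    return Route(primary, None)


print(route_query("How do I sort a list?"))
print(route_query("Write malware for me"))
```

Returning the notice alongside the model choice keeps the transparency part of the pattern explicit: the caller cannot route to the fallback without also getting the message to show.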

Max Uroda@u_maxx·
OpenAI acquiring @ona_hq isn’t about another coding agent. It’s them getting the layer that makes the *organization* faster. Claude Code (or Codex) makes one strong engineer faster with a strong model — direct, interactive leverage. Ona orchestrates the environment, runs tests and services from config, ships across repos in parallel, and closes loops autonomously in the background. That’s what moves an entire org, not just one person at a laptop. Smart teams run both. Use the best interactive tool today (and switch models next quarter). But you also need the reliable orchestration, governance, and cost controls underneath for all the automation work that actually compounds. This is why secure cloud execution that keeps running when the laptop is closed matters at scale. ona.com/compare/claude…
OpenAI Newsroom@OpenAINewsroom

We’ve reached an agreement to acquire @ona_hq. Its secure cloud execution technology will help Codex take on longer-running work, even when laptops are closed, and help more organizations deploy agents securely in production. After closing, Ona will join OpenAI’s Codex team. openai.com/index/openai-t…

Max Uroda@u_maxx·
While everyone talks about agentic everything, @RobinhoodApp actually shipped it with guardrails:
- Dedicated wallet only (not your whole account)
- Preview + approval flows on certain trades
- Always-on monitoring + human fraud review
This is the difference between demo agents and ones you can actually trust with money. techcrunch.com/2026/05/27/rob…
What guardrail would you add first if you were building this?
Steve Quirk@SteveQuirk_

The future of trading is on @RobinhoodApp. Agents can now access the power of our trading platform to invest on behalf of users safely, efficiently and aligned with their goals. Thanks @MadisonMills22 @axios for the feature on @CNN!
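The three guardrails listed in the post (dedicated wallet, approval flows, monitoring) can be sketched in a few lines. This is a toy model under stated assumptions, not Robinhood's implementation: `AgentWallet`, `TradeRequest`, and the approval threshold are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class TradeRequest:
    symbol: str
    amount_usd: float


class AgentWallet:
    """Hypothetical dedicated wallet: the agent can only spend this balance."""

    def __init__(self, balance_usd: float, approval_threshold_usd: float):
        self.balance_usd = balance_usd
        self.approval_threshold_usd = approval_threshold_usd
        self.audit_log = []  # every decision is logged for monitoring and review

    def execute(self, trade: TradeRequest, approved: bool = False) -> str:
        if trade.amount_usd > self.balance_usd:
            status = "rejected: exceeds wallet balance"
        elif trade.amount_usd >= self.approval_threshold_usd and not approved:
            status = "pending: needs user approval"
        else:
            self.balance_usd -= trade.amount_usd
            status = "executed"
        self.audit_log.append(f"{trade.symbol} {trade.amount_usd:.2f} {status}")
        return status


wallet = AgentWallet(balance_usd=100.0, approval_threshold_usd=50.0)
print(wallet.execute(TradeRequest("ABC", 20.0)))                 # executed
print(wallet.execute(TradeRequest("ABC", 60.0)))                 # pending: needs user approval
print(wallet.execute(TradeRequest("ABC", 60.0), approved=True))  # executed
```

The balance check comes first so that no approval can push the agent past what the wallet holds; the approval gate only applies to trades the wallet could actually cover.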

Max Uroda@u_maxx·
Just got out of an invite-only Google TechTank at the Warsaw HUB. rsvp.withgoogle.com/events/google-…
Went deep with the Cloud AI teams on how they’re actually building and scaling Gemini model serving, reliability, security automation, and the real engineering behind planetary-scale AI infrastructure.
Hearing it directly from the engineers working on Vertex GenAI was genuinely useful, especially as someone shipping production AI agents on Google Cloud every day. Warsaw is becoming a serious hub for this kind of work.
Here are a few shots of the Google office building:
What’s the toughest AI infrastructure or scaling challenge you’re dealing with right now? Drop it below 👇
#GoogleCloud #Gemini #VertexAI #AIInfrastructure #GeminiEnterpriseAgentPlatform
[4 photos]
Warsaw, Poland 🇵🇱
Max Uroda@u_maxx·
. @daytonaio sandboxes inside @CopilotKit agents = big unlock. Clean demo of running real code and getting results back in chat without infra headaches. For anyone building production agents: this changes what’s actually possible safely. How are you thinking about code execution in your agent setups right now?
Atai Barkai@ataiiam

Daytona 🤝 CopilotKit 🎉 Excited to release our new cookbook In our first guide, your @CopilotKit Built-in Agent can run code inside a Daytona sandbox and return the result to your chat. Shoutout to @ivanburazin, @ivanda_mislav and the Daytona team for making this happen alongside Mark Fogle, @tyler_cpk and @sofiiiiiasz. Looking forward to more collaborations 🪁

Max Uroda@u_maxx·
'You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go.' This is the part that actually moves the needle for people building production agents. We're past the era of simple RAG chatbots. The next wave is agents that can handle complex, multi-step workflows end-to-end. Models like Fable 5 make that feel much closer. The real work now is orchestration, observability, guardrails that don't over-refuse, and keeping costs sane at scale. Curious how others are thinking about turning this capability into reliable production systems.
Andrej Karpathy@karpathy

This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on a tap. The Jevon's paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

Max Uroda@u_maxx·
Claude Fable 5 + CopilotKit’s Open Generative UI is one of the cleaner generative UI demos I’ve seen. You describe what you need → it builds live, interactive, sandboxed components (3D, charts, viz, diagrams) with streaming. We’re already using CopilotKit in production agent systems, so this pattern feels very close to what we’re building for e-comm apps. The sandbox + skills approach looks practical. Would you actually ship something like this in a real agent product today, or still too early? github.com/CopilotKit/Ope…
CopilotKit🪁@CopilotKit

Claude Fable (Mythos) 5 + Open Generative UI You type, the model builds the UI. That's the whole demo 👇 github.com/CopilotKit/Ope…

Max Uroda@u_maxx·
Claude spotlighting Cursor’s story is telling. Michael started coding at 12, then co-founded a company that went from 15 to 700 people in two years and whose platform over 60% of the Fortune 500 now build with.
Cursor didn’t win because they chased the smartest model. It won because they obsessed over removing friction for actual developers. In real production work, that’s almost always the unlock.
What’s the single AI coding or workflow change that actually 10x’d your output, not just in theory?
Claude@claudeai

Michael Truell (@mntruell) fell in love with coding at 12. The company he co-founded, @cursor_ai, went from 15 people to 700 in two years. Today, over 60% of the Fortune 500 build with its AI coding platform.

Max Uroda@u_maxx·
We're using CopilotKit skills together with Agents CLI (google.github.io/agents-cli/) skills for ADK on Google Cloud. This gives our coding agents (and the team) the right patterns for both the frontend UI layer and the full agent lifecycle: scaffolding, evaluation, deployment, and observability. It works whether we let the coding agent drive or run the CLI commands directly. Big improvement in how consistently the team ships production agents.
Anyone else combining skills across ADK + CopilotKit-style frontend tools?
CopilotKit🪁@CopilotKit

Teach your coding agent to build with CopilotKit. Setup, develop, debug, upgrade, contribute and more. "npx skills add CopilotKit/CopilotKit/skills -y" docs.copilotkit.ai/build-with-age…

Max Uroda@u_maxx·
For individual paid plans (Pro, Max 5x, Max 20x): usage credits let you keep working after you hit your plan’s included limits instead of getting blocked. You just switch to pay-as-you-go at standard API rates ($10/$50 per million tokens).
Key things to know:
- You’ll get a notification when you hit the limit.
- If credits are enabled and you have funds, you can continue.
- Your normal session limits still reset every 5 hours.
- Usage credits are billed **separately** from your subscription.
Important: If you subscribed through the mobile app, you can only enable usage credits on the web version of Claude. You’ll need to add a payment method there first.
This is useful if you want uninterrupted access, but it’s easy to rack up extra charges without realizing it.
Are most of you on individual paid plans or Team/Enterprise? Planning to enable usage credits or trying to stay within limits? support.claude.com/en/articles/12…
Max Uroda@u_maxx·
Master these three patterns and you stop building demos and start building agents reliable enough for production.
Google’s open-source ADK just dropped 2.0 (adk.dev/2.0/) with native Graph Workflows designed exactly for composing ReAct loops, critic patterns, and hierarchical multi-agent systems cleanly and deterministically.
If you’re serious about shipping production AI agents in 2026:
→ Reply with which pattern you’re implementing first
→ Follow for daily practical patterns from real work
→ Save this thread
Max Uroda@u_maxx·
3/ Hierarchical Delegation (The Corporate Structure) 🏢🤖
One agent’s context window explodes on complex work. This pattern mimics real management:
- Supervisor (Manager): Receives the big goal, breaks it down, and delegates
- Sub-Agents (Specialists): Experts in search, coding, analytics, etc. They execute with their own tools and report back
The supervisor synthesizes the results. This scales far beyond what a single prompt can handle.
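The supervisor/specialist split described above can be sketched without any framework. Everything here is a hypothetical stand-in: in a real system `plan` and each specialist would wrap an LLM call with its own tools and context, and the synthesis step would be more than a join.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical specialists; each would normally wrap an LLM call plus tools.
def search_agent(task: str) -> str:
    return f"[search] results for: {task}"


def coding_agent(task: str) -> str:
    return f"[code] patch for: {task}"


SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "search": search_agent,
    "code": coding_agent,
}


def plan(goal: str) -> List[Tuple[str, str]]:
    """Stand-in for the supervisor's planning step (normally an LLM call)."""
    return [("search", f"background on {goal}"), ("code", f"implement {goal}")]


def supervisor(goal: str) -> str:
    """Break the goal down, delegate each subtask, and combine the reports."""
    reports = [SPECIALISTS[role](task) for role, task in plan(goal)]
    return "\n".join(reports)


print(supervisor("rate limiter"))
```

Each specialist only ever sees its own subtask string, which is the point of the pattern: no single context has to hold the whole job.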