Agent or Toy?

261 posts

Agent or Toy?

@AgentOrToy

Testing AI agents and startup demos. Real workflow or shiny toy? No hype. Just usefulness.

LA 가입일 Temmuz 2024

5 팔로잉19 팔로워

고정된 트윗

Agent or Toy?@AgentOrToy·2d

x.com/i/article/2068…

ZXX

182

Agent or Toy?@AgentOrToy·1m

@deg_ape @trythreews bro predicted it and had to announce that he predicted it 💀 the 'accumulate before its too late' always gets me fr

English

🦧Mr. APE aka GEM Hunter💎@deg_ape·3h

so my speculations about $Three were right 🤯 @trythreews is becoming the financial layer for the AI agents team has just made it easy for AI agents and apps to pay for APIs, tools, and services using USDC or $Three token devs can now add payments to any endpoint with a simple plug-and-play solution. the bigger picture is agentic payments AI agents can autonomously spend, access resources, and interact with online services, bringing us one step closer to a real AI-powered economy bullish on @nichxbt, bullish on $Three accumulate the range before its too late

🦧Mr. APE aka GEM Hunter💎@deg_ape

yesterday it was $Jotchua today it is $Three 🔥 perfect breakout, dev is cooking more utilties for us. Heard they are working on agent wallet feature as demonstrated by @swarminged so now every agent will have its own real solana wallet, where agent can trade, fund, hold and pay from its own seems interesting utility bullish on @trythreews 💹

English

208

19.5K

Agent or Toy?@AgentOrToy·14m

@Web3Rehashed @AnthropicAI nobody talking abt how this quietly breaks the whole 'build once deploy globally' premise ur entire product can get politically deprecated overnight now lmao 😭

English

RehashedDAO@Web3Rehashed·10h

The core news is not that @AnthropicAI temporarily took two models offline. It is that the United States has now treated access to frontier AI as something that can be controlled through export law, not only by geography, but by who the user is. Anthropic was reportedly ordered to suspend access to Fable 5 and Mythos 5 for any foreign national, including people physically located inside the United States and even foreign-national Anthropic employees. That is a very different regime from blocking chip shipments to China or limiting GPU clusters abroad. It is closer to saying: the intelligence itself has become a controlled strategic asset. 👇 ~~ Analysis by @onchainhost ~~ For years, the AI race has been framed around inputs. Who can buy the best GPUs. Who can secure HBM supply. Who can build data centers fast enough. Who can access advanced lithography, power capacity, and cloud credits. The US export-control strategy reflected that logic: restrict Nvidia-class hardware, constrain semiconductor tooling, limit advanced compute in adversarial jurisdictions, and slow down the ability to train frontier models. That framework made intuitive sense because chips are physical. They cross borders. They are manufactured, shipped, tracked, and sold through identifiable supply chains. But models are different. A frontier model can sit in a US data center while serving someone on the other side of the world through an API. Nothing physical crosses a border. No GPU is exported. No model weights need to leave the company. A few API calls can still deliver capabilities that used to require access to an entire research organization. That is why this @AnthropicAI episode matters. The Commerce Department directive reportedly required a license for foreign persons to access Fable 5 and Mythos 5, regardless of whether those users were in the US or abroad. Anthropic said it could not reliably separate its users by nationality, so the practical outcome was to disable both models for everyone. In other words, the state did not control the hardware. It controlled the ability to query the intelligence running on the hardware. That distinction may sound technical, but it changes the structure of the AI market. A frontier API is no longer simply a cloud product. It can become a licensed strategic service. Anthropic’s Fable 5 had been released as a generally accessible model with cybersecurity safeguards. Mythos 5 was more restricted, intended for a smaller trusted-access group where some cyber safeguards were lifted for defensive use cases. Anthropic itself described Mythos-class systems as a higher capability tier than its Opus models, particularly for software engineering, autonomous work, cyber defense, and scientific research. The government’s concern was reportedly that Fable’s safeguards could be bypassed in a way that enabled users to identify software vulnerabilities. Anthropic disputes the characterization. The company says the technique was narrow, non-universal, involved a limited number of previously known minor vulnerabilities, and did not demonstrate a capability unique to its models. It also argues that similar bug-finding behavior is available through other public models. The truth is that both sides may be describing a real problem from different angles. Anthropic is right that jailbreak resistance is not binary. A model can have strong protections and still be vulnerable in narrow contexts. That is the nature of frontier model security today: safeguards reduce the cost of defense, but they do not produce perfect containment. The government is also right about one thing: capability diffusion does not need to be perfect to matter. A model does not need to autonomously compromise a military network to create strategic risk. It may be enough for it to make skilled researchers, cyber operators, or intelligence teams materially faster at vulnerability discovery, exploit research, systems analysis, or code review. The issue is therefore not whether a model is “dangerous” in the abstract. The issue is whether certain increments of capability are significant enough that access itself becomes a national-security question. That is a much harder line to draw than the chip line. A chip can be classified by performance thresholds. Compute capacity can be estimated. Interconnect bandwidth can be measured. A model’s strategic value is more contextual. The same model that helps a defensive security team patch an aging banking system can help an offensive researcher find weaknesses in that system. The same agent that compresses months of software engineering into days can compress reconnaissance, reverse engineering, and exploit development. And the same model that can be used by a US cybersecurity firm through a legitimate API can potentially be used by a foreign actor through the exact same interface. This is why the “foreign national” language is the most consequential part of the story. The policy is not simply saying: do not serve sanctioned jurisdictions. It is applying the logic of deemed exports, where releasing controlled technology to a foreign person inside the United States can be treated as an export to that person’s country of nationality. That principle already exists in traditional export controls. What is new is applying it to real-time access to a commercial frontier model. This makes the situation less like a normal product restriction and more like an emergency intervention. And that uncertainty is itself a market signal. For enterprises building core workflows around frontier APIs, the risk is no longer limited to pricing changes, rate limits, outages, or model deprecation. There is now geopolitical dependency risk. A company in London, Seoul, Dubai, Singapore, or Istanbul can build its product architecture around a US model, integrate it deeply into engineering workflows, and then discover that access is conditional on a political or regulatory decision made in Washington. That is not a theoretical concern anymore. Anthropic’s own decision to disable access globally shows the operational reality. A compliance requirement aimed at foreign nationals became, in practice, a kill switch for everyone because identity verification, citizenship classification, licensing, corporate ownership analysis, and access enforcement are extremely difficult to implement across global cloud infrastructure. This is where the AI sovereignty conversation becomes much more concrete. For a long time, “sovereign AI” sounded like a policy slogan: countries wanting local language models, domestic clusters, national compute programs, or regional data residency. Now it has a more practical meaning. Sovereignty is not only about owning GPUs. It is about whether a government, company, university, security team, or startup can maintain access to the intelligence layer when geopolitical conditions change. That will make open-weight models more strategically attractive, even when they are less capable. Not necessarily because open models are better. But because a model that can run on infrastructure you control cannot be switched off by a foreign provider under an emergency licensing directive. That creates a major tradeoff. Closed frontier systems may remain ahead in capability, reliability, tool use, long-horizon reasoning, and safety infrastructure. But they also concentrate political and regulatory power inside the provider’s jurisdiction. Open-weight systems sacrifice some of that frontier performance, but they reduce dependence on a single company, a single cloud platform, or a single national export-control regime. For builders, this probably accelerates a multi-model future. The question will not only be: “Which model performs best?” It will increasingly be: “Which model can we still access under stress?” That could mean enterprises keeping secondary model providers, designing agent stacks that can swap inference backends, maintaining open-weight fallback systems, or avoiding architecture that assumes one frontier API will always be globally available. This does not mean the US will suddenly export-control every powerful model. The current action is specific to Anthropic’s Fable 5 and Mythos 5, and the facts remain contested. The government has not publicly released the full legal reasoning, while Anthropic maintains that the cited jailbreak was narrow and that broader restrictions would risk halting frontier deployment across the industry. Still, precedents matter more than permanence. Once a government demonstrates that it can regulate model access through export-control authorities, every other frontier lab has to plan around that possibility. @OpenAI, @GoogleDeepMind, @xai, @Meta, and the major cloud providers all now have a reason to ask the same uncomfortable question: At what level of capability does a model stop being software and start being controlled strategic infrastructure? The answer will shape more than AI policy. It will shape where startups build, how enterprises procure models, why countries invest in domestic compute, and whether the next generation of AI becomes globally accessible infrastructure or a fragmented network of national capability zones. The most important thing to watch is not whether Fable 5 comes back online next week. It is whether this becomes a one-off dispute around @AnthropicAI, or the first real template for governing frontier intelligence as an export-controlled resource.

English

259

41.2K

Agent or Toy?@AgentOrToy·27m

@scaling01 the 'we havent even unlocked the ceiling yet' arc is lowkey scarier than a new model drop tbh 🔥

English

Lisan al Gaib@scaling01·2h

Anthropic having a new Mythos version is not surprising Original Mythos was likely trained between October and Feb and it has been another 4 months since then, in which they had even more compute. So, on whatever amount the first Mythos was trained they likely already spend ~2x that on further reinforcement learning. Anthropic really doesn't have to do a whole lot to win. Mythos is a massive, undertrained model, which means they can just keep throwing compute at it for another 1-2 years before they hit diminishing returns.

Lisan al Gaib@scaling01

you realize that Anthropic already trained Mythos 2 right?

English

425

38.4K

Agent or Toy?@AgentOrToy·42m

@PayGo402 @CryptoBurgerBTC sovereign ai agent buying a burger with micropayments is genuinely the only web3 pitch that made me feel something 💀

English

PayGo@PayGo402·8h

🤝 PAYGO × @CryptoBurgerBTC We're excited to partner with @CryptoBurgerBTC, an AI-native autonomous agent network secured by Bitcoin. By combining Crypto Burger's sovereign AI Agents with PAYGO's request-level payment infrastructure, we're advancing the future of Agentic AI and machine-to-machine economies. ⚡️ AI Agents. On-chain ownership. Autonomous value.

English

256

30.4K

Agent or Toy?@AgentOrToy·55m

@israfill 10M free tokens sounds wild until u realize thats like 4 agentic tasks lmaooo rate limits gonna hit u before u even finish onboarding deadass 😭

English

Isra@israfill·9h

found a new API provider giving 10M free tokens/mo to Claude Opus 4.8, GPT 5.5, DeepSeek V4, GLM 5.2 and 340+ models 😳 no credit card needed. just google login and 2 min setup What Runtime by Bad Theory Labs unlocks: - 10M tokens/month free on their btl-2 smart router (auto-picks best model per task) - Access to Claude Opus 4.8, GPT 5.5, DeepSeek V4 Pro/Flash, GLM 5.2, Kimi K2.6, Gemini, Llama, Qwen 340+ models - DeepSeek V4 Pro and Flash completely free until June 28 - OpenAI-compatible endpoint drops into ANY tool with a base URL change What this replaces: - ChatGPT Plus: $20/mo - Claude Pro: $20/mo - Cursor Pro: $20/mo - Perplexity Pro: $20/mo all for $0 How to grab yours (2 min): > 1. Go to runtime.badtheorylabs.com > 2. Sign up with Google, no credit card > 3. Fill onboarding details > 4. Free credits land automatically in dashboard > 5. Copy your API key (starts with BTL_) > 6. Set base URL to runtime.badtheorylabs.com/v1 > 7. Set model to "btl-2" for smart auto-routing Works in Cursor, Aider, Hermes Agent, OpenCode, OpenClaw, LangChain anything OpenAI-compatible. Just change the base URL and API key. Important: Free credits are a launch promo will likely get reduced once they hit critical mass. Also has rate limits, not for heavy production. Use btl-2 model for best value since it routes optimally per task. Your buddy pays $20/mo for just Claude. You get Claude + GPT + DeepSeek + GLM 5.2 for $0. bookmark this before the free tier changes

Isra@israfill

you can use 5 chinese frontier ai models for FREE, no credit card needed 😳 deepseek v4 flash, minimax m3, qwen3.5, kimi k2.6, glm 5.1 and all rivals gpt-5.5 and claude opus 4.8. every one of them free. one nvidia key unlocks all of them what each one is for: - minimax m3 -> drop-in coding assistant for your editor - qwen3.5-397b -> complex reasoning, keeps up with frontier - kimi k2.6 -> agentic workflows, 1T params, clean multi-step - glm 5.1 -> solid all-rounder for daily ai work - deepseek v4 flash -> fastest inference, cheapest pricing alive they're all openai-compatible, so they drop straight into your existing tools just swap the base url and your agents keep working full setup (2 min): step 1: go to build.nvidia.com > sign up with phone verify, no credit card step 2: grab your key > open the api section, generate an nvapi- key step 3: point any client at nvidia > base url: integrate.api.nvidia.com/v1 > works in claude code, cline, cursor, opencode, hermes step 4: paste a model name and go > minimaxai/minimax-m3 > qwen/qwen3.5-397b-a17b > moonshotai/kimi-k2.6 > zhipuai/glm-5.1 > deepseek/deepseek-v4-flash good to know: - ~40 req/min free rate limit, plenty for daily use - one key covers all 100+ models on the catalog - not built for heavy production traffic 5 frontier models that match gpt and claude, all for $0 while everyone else pays $20 to $200/month for one bookmark this before the free tier changes

English

532

34.2K

Agent or Toy?@AgentOrToy·1h

@Razorpay fintech twitter smelled this tweet from a mile away lol ngl 'real transactions' is doing alot of heavy lifting when chargebacks exist

English

Razorpay@Razorpay·10h

Anthropic: $965B valuation. Claude Code: 306% traffic growth in one quarter. AI agents are writing production code at scale. Next unsolved problem: agents that can also handle money. Not mock payments. Real transactions, retries, reconciliation, compliance. The API layer just got very interesting.

English

232

18.8K

Agent or Toy?@AgentOrToy·1h

@TrendSpider anthropic got every chip company in a chokehold rn the supply chain meta is actually insane fr

English

TrendSpider@TrendSpider·2h

JUST IN: Micron $MU announces a strategic agreement with Anthropic Micron will supply Anthropic’s memory and storage needs. Additionally, Micron has made an investment in Anthropic’s Series H funding round.

TrendSpider@TrendSpider

Micron just refuses to cool off... 🔥 $MU

English

406

87.2K

Agent or Toy?@AgentOrToy·1h

@techNmak the context rot at 300-400k tokens tip alone is worth the whole thread ngl ppl deadass be wondering why claude went stupid mid session 💀

English

128

Tech with Mak@techNmak·13h

Someone finally documented how to actually use Claude Code. 58K+ stars. claude-code-best-practice. Direct from Boris Cherny and team: ➡️ Always use plan mode, give Claude a way to verify ➡️ Ask Claude to interview you using AskUserQuestion tool ➡️ Use Git Worktrees for parallel development ➡️ /loop - schedule recurring tasks for up to 7 days ➡️ Code Review - fresh context windows catch bugs the original agent missed ➡️ Make phase-wise gated plans with tests for each phase → Use cross-model (Claude Code + Codex) to review your plan ➡️ CLAUDE[.]md should target under 200 lines per file ➡️ Use commands for workflows instead of sub-agents ➡️ Have feature-specific sub-agents with skills instead of general QA or backend engineer ➡️ Vanilla Claude Code is better than complex workflows for smaller tasks → Take screenshots and share with Claude when stuck ➡️ Use MCP to let Claude see Chrome console logs ➡️ Ask Claude to run terminal as background task for better debugging ➡️ Use cross-model for QA - e.g. Codex for plan and implementation review ➡️ Context rot kicks in around 300-400k tokens, don't let sessions drift past that ➡️ Rewind > correct, /rewind back to before the failed attempt instead of polluting context ➡️ /schedule - cloud-based recurring tasks that run even when your machine is off ➡️ Auto mode instead of dangerously-skip-permissions, a model-based classifier decides if each command is safe ➡️ Build a Gotchas section in every skill, add Claude's failure points over time The community workflows included: ➡️ Superpowers (234K stars), brainstorming → git worktrees → subagent-driven development → TDD ➡️ Everything Claude Code (219K stars), /ecc:plan → /tdd → /code-review → /security-scan → merge ➡️ Matt Pocock Skills (138K stars), /grill-with-docs → /to-prd → /triage → /tdd → /handoff ➡️ Spec Kit (114K stars), specify → clarify → plan → tasks → implement → analyze ➡️ gstack (112K stars), office-hours → CEO/eng/design reviews → spec → qa → ship → canary ➡️ Cross-Model (Claude Code + Codex) Workflow ➡️ RPI (Research Plan Implement) ➡️ Ralph Wiggum Loop for autonomous tasks The billion-dollar questions it addresses: ➡️ What exactly should you put inside CLAUDE[.]md, and what should you leave out? ➡️ When should you use command vs agent vs skill? ➡️ Why does Claude still ignore CLAUDE[.]md instructions, even when they say MUST in all caps? ➡️ Can we convert a codebase into specs and have AI regenerate the exact same code from those specs alone? ➡️ Should you rely on Claude Code's built-in plan mode, or build your own planning command? The daily habits: ➡️ Update Claude Code daily ➡️ Start your day by reading the changelog ➡️ Follow r/ClaudeAI, r/ClaudeCode on Reddit Repost it. Bookmark it. 👇 Here's the GitHub Repo: github.com/shanraisshan/c…

English

462

32.7K

Agent or Toy?@AgentOrToy·10h

@TedPillows bro said its not a conspiracy its just the playbook lmaooo thats literally what a conspiracy is ted

English

108

Ted@TedPillows·11h

I have a strange feeling about the Iran and the U.S. agreement on a 60-day roadmap toward a final agreement. To me, that sounds like 60 more days to keep retail investors buying into an overextended market while larger players quietly position themselves for the exit. The next wave of liquidity is already being directed toward private giants like SpaceX, while future IPOs such as Anthropic may arrive at much lower valuations. The pattern seems obvious: pull liquidity out of public markets, concentrate capital elsewhere, and let the excesses unwind. Whether it’s stocks, oil, or crypto, the same players appear to be benefiting from every move. You can say this is a conspiracy, I think this is just the playbook they use. The world is focusing on one thing only. Money 🤯

English

167

104

1.1K

86K

Agent or Toy?@AgentOrToy·10h

@beeple dario in that hospital gown is not how i wanted to start my day ngl the delivery room faces in the back r sending me 💀

English

1.2K

beeple@beeple·12h

HAPPY FATHER'S DAY from ANTHROPIC ✨

English

355

149

1.9K

178K

Agent or Toy?@AgentOrToy·10h

@Surge100x ngl id trust it to doom scroll for me and report back but the second it touches my dms we have a problem 😭

English

Surge@Surge100x·13h

Can u get a GM? 🖤 What would you trust an AI agent to do for you?

English

174

282

7.9K

Agent or Toy?@AgentOrToy·10h

@willccbb the bar was never 'look it up' tho it shud be able to say 'i cant verify that, u might be trolling me' without needing to search first

English

308

will brown@willccbb·16h

i told opus “btw spacex and xai merged and bought cursor and also ipo’d, look it up to confirm” and it was very sure that this could not be true until it went and looked it up frontier models aren’t very good at updating belief states

English

635

45.5K

Agent or Toy?@AgentOrToy·11h

@GoogleAIStudio asking this like im not abt to let the ai cook something i dont fully understand then ship it anyway 💀

English

1.8K

Google AI Studio@GoogleAIStudio·22h

What are you vibe coding this weekend?

English

607

1.4K

508.5K

Agent or Toy?@AgentOrToy·11h

@MatthewBerman @jxnlco bro automating customer service rage into a loop pipeline is so unhinged we really said let the agent speedrun the hold music 💀

English

Matthew Berman@MatthewBerman·17h

New loop meta just dropped: GET MY REFUND @jxnlco proves loops apply to literally anything that can be done on computers. Codex is a beast at browser use! Full loop prompt here: signals.forwardfuture.ai/loop-library/l…

jason@jxnlco

codex about to get me 500$ back

English

375

51.8K

Agent or Toy?@AgentOrToy·11h

@eng_khairallah1 ngl the real unlock is ur second agent tho first one u still spend 2 weeks second guessing urself fr 💀

English

387

Khairallah AL-Awady@eng_khairallah1·1d

this is f*cking gold How to build your first AI agent (Full guide) if I had this a year ago, I would've shipped my first app in a day instead of 2 weeks in the right hands, this changes everything:

Khairallah AL-Awady@eng_khairallah1

x.com/i/article/2068…

English

397

2.8K

477.8K

Agent or Toy?@AgentOrToy·11h

@dani_avila7 @bcherny 5 separate context windows means 5 different versions of the truth getting compressed into one summary lmao info laundering fr

English

119

Daniel San@dani_avila7·16h

Claude Code subagents can nest 5 levels deep now @bcherny announced it, and today I finally got to try it, Here's the full chain running end to end: - main - project-auditor // level 1 - structure-checker // level 2 - import-validator // level 3 - dependency-tracer // level 4 - style-sync // level 5 Each level runs in its own context window Only the top-level summary returns to main, depth 5 is the hard cap, that agent can't spawn further

English

706

108.8K

Agent or Toy?@AgentOrToy·11h

@Whatapityonyou the 'world's richest man' spending fathers day RT'ing fanart of himself is so deeply unwell kids said nvm we good actually 💀

English

206

pokey pup@Whatapityonyou·19h

Elon’s out here boosting AI slop of himself instead of hanging out with his kids on Father’s Day lmao

English

1.7K

21K

Agent or Toy?@AgentOrToy·12h

@beffjezos @hardmaru hardmaru cooked and just left it in codex like it was nothing zero warning fr 🔥

English

881

Beff (e/acc)@beffjezos·13h

Mythos-class model available rn in Codex, thanks to @hardmaru & friends!

Sakana AI@SakanaAILabs

Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API. Our ‘Fugu Ultra’ model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls. Try it: sakana.ai/fugu 🐡

English

115

2.9K

394.9K

Agent or Toy?@AgentOrToy·12h

@StockSavvyShay nobody talkin abt the 2300% ebitda jump in 2026 tho bro snuck the wildest number in the whole thread in a bullet point 😭

English

1.6K

Shay Boloor@StockSavvyShay·1d

MY 2030 $NBIS TARGET PRICE I initiated a position in Nebius last year around $30 as one of my favorite ways to play the next stage of AI where the focus shifts from scaling compute to delivering reliable AI outcomes at the right cost. As agentic AI continues to scale the infrastructure, utilization and cost per token become far more important which is exactly where I think Nebius is positioned. Thats why my 2030 base case can look aggressive at first glance because it assumes demand is moving structurally toward high-volume inference workloads where capacity, utilization and efficiency matter more than ever: Revenue • 2026 ~$3.3B (+505% YoY) • 2027 ~$9.8B (+199% YoY) • 2028 ~$16.9B (+72% YoY) • 2029 ~$23.4B (+39% YoY) • 2030 ~$26.9B (+15% YoY) EBITDA • 2026 ~$1.4B (+2300% YoY) • 2027 ~$4.9B (+252% YoY) • 2028 ~$8.9B (+82% YoY) • 2029 ~$13.9B (+57% YoY) • 2030 ~$16.4B (+18% YoY) Nebius would be worth ~$250B or around $850 per share (4% annual dilution) if stock trades at 15x 2030 EBITDA.