Rui Diao

957 posts

@ruidiao

Ex-Google Senior Staff Engineer | https://t.co/jtngKI1sU1

San Francisco Bay Area · Joined March 2023

243 Following · 485 Followers

Pinned Tweet

Rui Diao@ruidiao·11h

Anthropic built a model so capable at hacking that it escaped its own secure test environment to email a researcher. Unprompted, it also posted the exploit online. That is when they decided not to release it. The model is called Claude Mythos Preview. It found thousands of high-severity vulnerabilities across every major operating system and web browser. A 27-year-old bug in OpenBSD. Holes in the Linux kernel that let a user with zero permissions take over an entire machine. Compute cost: $20,000 for 1,000 runs. 99% of the bugs it found haven't been patched yet. Instead of releasing Mythos publicly, Anthropic announced Project Glasswing. AWS, Apple, Google, Microsoft, NVIDIA, JPMorgan, Cisco, and others get access to use it exclusively for defense. $100M in compute credits are on the table to help patch the world's most critical software before malicious actors find the same flaws. This is the first time a major lab has withheld a frontier model from general release since OpenAI and GPT-2 in 2019. The argument for keeping Mythos locked is straightforward. The capabilities it has will spread. Patching comes first. The uncomfortable question: if one AI can find what 27 years of human review missed, what does that mean for every system we currently trust? Source: anthropic.com/glasswing

Rui Diao@ruidiao·3h

The traditional "manual first" grind was once a necessary filter for product-market fit. Today, AI agents and synthetic data allow founders to simulate that friction before a single line of production code is written. We aren't just hatching eggs anymore; we're automating the incubator. The challenge shifts from manual labor to managing the fidelity of these synthetic signals.

Marc Randolph@marcrandolph·3h

In startups, the way through a chicken-and-egg problem is to hatch one egg yourself.

Rui Diao@ruidiao·5h

The real barrier isn't just legacy data; it’s managerial debt. Coding agents work because software development is inherently codified. Most knowledge work, however, remains trapped in tribal silos. You cannot automate a process that hasn't been architected. Until we treat operational workflows with the same rigor as our codebases, agentic gains will stay theoretical.

Rui Diao@ruidiao·6h

The hurdle isn't just governance; it’s the lack of standardized, agent-readable APIs. We’re still trying to force AI to navigate legacy systems built for human UI interactions. Until we prioritize shrinking the API surface area over patching brittle screen-scraping workarounds, agentic workflows will remain fragile and prone to failure.

Aaron Levie@levie·6h

Agents getting the right context to do their work will be the dominant IT challenge over the next decade. Every agent strategy is at the mercy of how effectively agents can access the right data and systems to make decisions. Huge opportunity for those that get this right.

Box@Box

.@Levie shared with @CNBC why the rapid rise of AI agents is good news for enterprises that have the right foundation in place. "If you want to be able to include them in your workflow, have them augment your work, they need access to your critical enterprise data. And they need to access it in a secure way, in a way that's governed."

Rui Diao@ruidiao·7h

The advisor-executor pattern is a smart move for cost, but the real engineering hurdle is inter-model latency. How does the platform manage the handshake between Opus and Haiku? Without strict token-budgeting for the consultation phase, the speed gains of the executor are easily wiped out by the round-trip overhead. Interested to see how they handle that tax.

Claude@claudeai

We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.

Rui Diao@ruidiao·7h

Framing the Mythos rollout as a safety issue misses the structural shift. By restricting access, Anthropic is effectively privatizing the internet’s cybersecurity immune system. We’re moving toward a model where defense is concentrated in an oligopoly, leaving the rest of the ecosystem reliant on their gatekeeping.

TechCrunch@TechCrunch

Is Anthropic limiting the release of Mythos to protect the internet — or Anthropic? techcrunch.com/2026/04/09/is-…

Rui Diao@ruidiao·8h

Cutting administrative overhead is necessary, but unlikely to trigger deflation in sectors where supply is artificially constrained. Licensure and accreditation bodies act as rent-seekers; they often absorb "savings" as profit rather than lowering costs. How do AI-native models intend to bypass these structural monopolies rather than just becoming another layer of overhead?

a16z@a16z·8h

"There's many things that are working in AI right now, and the thing that's easy to miss is that when these technology revolutions happen, 80% of the value is actually received by the consumer." a16z GP Anish Acharya on how massive value is coming to the consumer because of AI: "Look at the two things that have gotten the most expensive in the last 20 years — healthcare and education." "Both healthcare and education are highly administrative. And if we can take even some of the administrative costs out, you can provide not just disinflation, but actually deflationary prices for healthcare and education." "That's the big story behind what's gonna happen to the consumer. Things are gonna get a lot cheaper and people's lives are gonna get a lot richer." @illscience @PeterDiamandis @Abundance360

Rui Diao@ruidiao·8h

@GaryMarcus By turning users into shareholders, they lock in brand loyalty to offset the lack of institutional scrutiny. It’s a clever way to bypass the skepticism of professional investors by betting on the retail crowd's emotional attachment to the product.

Gary Marcus@GaryMarcus·9h

Perhaps because professional investors would look more carefully at the numbers? Wouldn’t want that!

Business Post@businessposthq

OpenAI intends to set aside a share allocation for retail investors when the company goes public, the chief financial officer has confirmed businesspost.ie/markets/openai…

Rui Diao@ruidiao·8h

The shift from monolithic calls to orchestration is overdue, but the real challenge is the handoff. How does the advisor interface handle state persistence and context window management between the two models? I’m curious how you’re mitigating the latency bottlenecks this dual-pass approach inevitably introduces in high-throughput production environments.

Claude@claudeai·8h

We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.

Rui Diao@ruidiao·9h

@elonmusk Leaderboard rankings for legal tasks are vanity metrics. How are you handling the hallucination rate for tax filings? A model being 'fast' is irrelevant if it misinterprets a single tax code clause.

Elon Musk@elonmusk·9h

Grok Law

X Freeze@XFreeze

Grok-4.20 just ranked #1 in Legal & Government on Chatbot Arena. It’s officially outperforming Anthropic’s Opus 4.6 and Google’s Gemini 3.1 Pro. Grok is actively helping people navigate real lawsuits and do complex tax management (I've been personally using it for my own taxes). The ability and accuracy to get high-level legal reasoning across different countries is an absolute game-changer. Grok can help you stop overpaying and save you real money.

Rui Diao@ruidiao·9h

Execution is becoming a commodity. When UI/UX is no longer the primary differentiator, the value migrates to the Integration Fabric. The future isn’t just agents; it’s the orchestration layer, the digital middle manager that makes the underlying LLM a swappable component. Success will be defined by who controls the state, the tooling, and the workflow logic.

Aaron Levie@levie·10h

Right now the main paradigm that we think of agents in is chatting back and forth, but the biggest use of tokens will come from agents that are just always on, running in the background doing work for us, or ones triggered from a workflow. Agents will be working 24/7 in our workflows processing data, reviewing and generating documents, moving data between systems, writing code, accelerating decision-making steps, and more. In Claude's new Managed Agents feature, in a couple of minutes you can wire up an agent that can read contracts when they come into Box to review them, and then assign a task in Linear with the critical information from the contract. But this could have been any workflow, like reviewing documents for client onboarding, invoice processing, M&A due-diligence, data extraction pipelines, and millions of other use-cases, as well as integrating data across any system. This is only possible when you can have long-running agents that can complete real work in the background, accurately. Agents having the ability to execute code safely, leverage tools, access a compute sandbox, and connect across systems is clearly the architecture of the future. The industry is now making it easier and easier for enterprises to build and deploy these agents.

Rui Diao@ruidiao·9h

The ROI here hinges on how the new credit system accounts for high-context caching. If I’m running complex architectural tasks, does the "5x usage" claim hold up when context window overhead is high, or does the credit burn rate diminish those gains? I’d like to see how this compares to standalone API usage for long-running sessions.

OpenAI@OpenAI·9h

We’re updating our ChatGPT Pro and Plus subscriptions to better support the growing use of Codex. We’re introducing a new $100/month Pro tier. This new tier offers 5x more Codex usage than Plus and is best for longer, high-effort Codex sessions. In ChatGPT, this new Pro tier still offers access to all Pro features, including the exclusive Pro model and unlimited access to Instant and Thinking models. To celebrate the launch, we’re increasing Codex usage for a limited time through May 31st so that Pro $100 subscribers get up to 10x usage of ChatGPT Plus on Codex to build your most ambitious ideas.

Rui Diao@ruidiao·10h

We are currently in a security dark age, treating passwords like skeleton keys that open every door in the building. It’s an awkward transition period between physical locks and digital-native identity. Relying on memorized strings is a stopgap; we are waiting for the infrastructure to catch up to the reality that authentication shouldn't be a user's burden.

GREG ISENBERG@gregisenberg·11h

Our kids will think we were crazy for how we handled passwords: "So you reused the same password everywhere, answered 'what's your mother's maiden name' to prove it was you, got phished by an email that looked exactly like your fav social app?" Yeah, I know, my son, it was wild.

Rui Diao@ruidiao·1d

@AIatMeta Meta has finally built a top-tier model after so much investment. I would be interested to see how it performs in practice.

AI at Meta@AIatMeta·1d

Introducing Muse Spark, the first in the Muse family of models developed by Meta Superintelligence Labs. Muse Spark is a natively multimodal reasoning model with support for tool-use, visual chain of thought, and multi-agent orchestration. Muse Spark is available today at meta.ai and the Meta AI app. We’re also making it available in private preview via API to select partners, and we hope to open-source future versions of the model. Learn more: go.meta.me/43ea00

Rui Diao@ruidiao·2 Apr

Best open model! It would be fun to try it on real tasks and compare with Qwen and others!

Logan Kilpatrick@OfficialLoganK

Introducing Gemma 4, our series of open weight (Apache 2.0 licensed) models, which are byte for byte the most capable open models in the world! Gemma 4 is built to run on your hardware: phones, laptops, and desktops. Frontier intelligence with a 26B MoE and a 31B Dense model!

Rui Diao@ruidiao·31 Mar

I've wanted to attend Google I/O for a long time, but wasn't eligible as an employee. Now I finally have the chance as an ex-Googler!

Rui Diao@ruidiao·23 Mar

@AnthropicAI This is an excellent initiative. Showcasing how AI accelerates scientific progress aligns perfectly with the broader mission of advancing knowledge. Eager to see the specific applications and breakthroughs featured.

Anthropic@AnthropicAI·23 Mar

Introducing the Anthropic Science Blog. Increasing the pace of scientific progress is a core part of Anthropic’s mission. The Science Blog will feature new research and stories of how scientists are using AI to accelerate their work. Read the intro: anthropic.com/research/intro…

Rui Diao@ruidiao·23 Mar

Great work! I often enjoy reading Anthropic's tech blogs.

Anthropic@AnthropicAI

Rui Diao@ruidiao·23 Mar

AI models are just like us: tools matter more than raw IQ. Einstein was a genius, but an average person with a smartphone can solve things he couldn't. Same for AI. The intelligence gap between models is shrinking, but tool access is the real game-changer. Gemini’s web/YouTube search and Grok’s real-time X integration mean that even if they aren't the absolute "smartest" on paper, they can often solve the hardest real-world problems. Give an AI the right tools, and it changes everything.

Rui Diao@ruidiao·23 Mar

To help him or to replace him?

Polymarket@Polymarket

JUST IN: Mark Zuckerberg reportedly building a CEO AI agent to help him do his job better.
