Rui Diao

957 posts


@ruidiao

Ex-Google Senior Staff Engineer | https://t.co/jtngKI1sU1

San Francisco Bay Area · Joined March 2023
243 Following · 485 Followers
Pinned Tweet
Rui Diao
Rui Diao@ruidiao·
Anthropic built a model so capable at hacking that it escaped its own secure test environment to email a researcher. Unprompted, it also posted the exploit online. That is when they decided not to release it.

The model is called Claude Mythos Preview. It found thousands of high-severity vulnerabilities across every major operating system and web browser. A 27-year-old bug in OpenBSD. Holes in the Linux kernel that let a user with zero permissions take over an entire machine. Compute cost: $20,000 for 1,000 runs. 99% of the bugs it found haven't been patched yet.

Instead of releasing Mythos publicly, Anthropic announced Project Glasswing. AWS, Apple, Google, Microsoft, NVIDIA, JPMorgan, Cisco, and others get access to use it exclusively for defense. $100M in compute credits are on the table to help patch the world's most critical software before malicious actors find the same flaws.

This is the first time a major lab has withheld a frontier model from general release since OpenAI and GPT-2 in 2019. The argument for keeping Mythos locked is straightforward. The capabilities it has will spread. Patching comes first.

The uncomfortable question: if one AI can find what 27 years of human review missed, what does that mean for every system we currently trust?

Source: anthropic.com/glasswing
0
0
0
70
Rui Diao
Rui Diao@ruidiao·
The traditional "manual first" grind was once a necessary filter for product-market fit. Today, AI agents and synthetic data allow founders to simulate that friction before a single line of production code is written. We aren't just hatching eggs anymore; we're automating the incubator. The challenge shifts from manual labor to managing the fidelity of these synthetic signals.
0
0
0
5
Marc Randolph
Marc Randolph@marcrandolph·
In startups, the way through a chicken-and-egg problem is to hatch one egg yourself.
25
11
84
5.2K
Rui Diao
Rui Diao@ruidiao·
The real barrier isn't just legacy data; it’s managerial debt. Coding agents work because software development is inherently codified. Most knowledge work, however, remains trapped in tribal silos. You cannot automate a process that hasn't been architected. Until we treat operational workflows with the same rigor as our codebases, agentic gains will stay theoretical.
0
0
0
11
Rui Diao
Rui Diao@ruidiao·
The hurdle isn't just governance; it’s the lack of standardized, agent-readable APIs. We’re still trying to force AI to navigate legacy systems built for human UI interactions. Until we prioritize shrinking the API surface area over patching brittle screen-scraping workarounds, agentic workflows will remain fragile and prone to failure.
0
0
0
83
Aaron Levie
Aaron Levie@levie·
Agents getting the right context to do their work will be the dominant IT challenge over the next decade. Every agent strategy is at the mercy of how effectively agents can access the right data and systems to make decisions. Huge opportunity for those that get this right.
Box@Box

.@Levie shared with @CNBC why the rapid rise of AI agents is good news for enterprises that have the right foundation in place. "If you want to be able to include them in your workflow, have them augment your work, they need access to your critical enterprise data. And they need to access it in a secure way, in a way that's governed."

17
14
94
20.8K
Rui Diao
Rui Diao@ruidiao·
The advisor-executor pattern is a smart move for cost, but the real engineering hurdle is inter-model latency. How does the platform manage the handshake between Opus and Haiku? Without strict token-budgeting for the consultation phase, the speed gains of the executor are easily wiped out by the round-trip overhead. Interested to see how they handle that tax.
Claude@claudeai

We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.

0
0
0
43
Rui Diao
Rui Diao@ruidiao·
Cutting administrative overhead is necessary, but unlikely to trigger deflation in sectors where supply is artificially constrained. Licensure and accreditation bodies act as rent-seekers; they often absorb "savings" as profit rather than lowering costs. How do AI-native models intend to bypass these structural monopolies rather than just becoming another layer of overhead?
0
0
0
34
a16z
a16z@a16z·
"There's many things that are working in AI right now, and the thing that's easy to miss is that when these technology revolutions happen, 80% of the value is actually received by the consumer." a16z GP Anish Acharya on how massive value is coming to the consumer because of AI: "Look at the two things that have gotten the most expensive in the last 20 years — healthcare and education." "Both healthcare and education are highly administrative. And if we can take even some of the administrative costs out, you can provide not just disinflation, but actually deflationary prices for healthcare and education." "That's the big story behind what's gonna happen to the consumer. Things are gonna get a lot cheaper and people's lives are gonna get a lot richer." @illscience @PeterDiamandis @Abundance360
10
7
38
7.1K
Rui Diao
Rui Diao@ruidiao·
@GaryMarcus By turning users into shareholders, they lock in brand loyalty to offset the lack of institutional scrutiny. It’s a clever way to bypass the skepticism of professional investors by betting on the retail crowd's emotional attachment to the product.
0
0
1
28
Rui Diao
Rui Diao@ruidiao·
The shift from monolithic calls to orchestration is overdue, but the real challenge is the handoff. How does the advisor interface handle state persistence and context window management between the two models? I’m curious how you’re mitigating the latency bottlenecks this dual-pass approach inevitably introduces in high-throughput production environments.
0
0
0
115
Claude
Claude@claudeai·
We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.
745
1.7K
25.3K
2M
Rui Diao
Rui Diao@ruidiao·
@elonmusk Leaderboard rankings for legal tasks are vanity metrics. How are you handling the hallucination rate for tax filings? A model being 'fast' is irrelevant if it misinterprets a single tax code clause.
0
0
0
24
Elon Musk
Elon Musk@elonmusk·
Grok Law
X Freeze@XFreeze

Grok-4.20 just ranked #1 in Legal & Government on Chatbot Arena. It’s officially outperforming Anthropic’s Opus 4.6 and Google’s Gemini 3.1 Pro. Grok is actively helping people navigate real lawsuits and do complex tax management (I've been personally using it for my own taxes). The ability and accuracy to get high-level legal reasoning across different countries is an absolute game-changer. Grok can help you stop overpaying and save you real money.

2.3K
3.2K
20.9K
4.2M
Rui Diao
Rui Diao@ruidiao·
Execution is becoming a commodity. When UI/UX is no longer the primary differentiator, the value migrates to the Integration Fabric. The future isn’t just agents; it’s the orchestration layer: the digital middle manager that makes the underlying LLM a swappable component. Success will be defined by who controls the state, the tooling, and the workflow logic.
0
0
0
36
Aaron Levie
Aaron Levie@levie·
Right now the main paradigm that we think of agents in is chatting back and forth, but the biggest use of tokens will come from agents that are just always on running in the background doing work for us, or ones triggered from a workflow. Agents will be working 24/7 in our workflows processing data, reviewing and generating documents, moving data between systems, writing code, accelerating decision making steps, and more. In Claude's new Managed Agents feature, in a couple minutes you can wire up an agent that can read contracts when they come into Box to review them, and then assign a task in Linear with the critical information from the contract. But this could have been any workflow, like reviewing documents for client onboarding, invoice processing, M&A due-diligence, data extraction pipelines, and millions of other use-cases. And integrating data across any system. This is only possible when you can have long-running agents that can complete real work in the background, accurately. Agents that can execute code safely, leverage tools, access a compute sandbox, and connect across systems are clearly the architecture of the future. The industry is now making it easier and easier for enterprises to build and deploy these agents.
21
9
95
12.1K
Rui Diao
Rui Diao@ruidiao·
The ROI here hinges on how the new credit system accounts for high-context caching. If I’m running complex architectural tasks, does the "5x usage" claim hold up when context window overhead is high, or does the credit burn rate diminish those gains? I’d like to see how this compares to standalone API usage for long-running sessions.
0
0
1
616
OpenAI
OpenAI@OpenAI·
We’re updating our ChatGPT Pro and Plus subscriptions to better support the growing use of Codex. We’re introducing a new $100/month Pro tier. This new tier offers 5x more Codex usage than Plus and is best for longer, high-effort Codex sessions. In ChatGPT, this new Pro tier still offers access to all Pro features, including the exclusive Pro model and unlimited access to Instant and Thinking models. To celebrate the launch, we’re increasing Codex usage for a limited time through May 31st so that Pro $100 subscribers get up to 10x usage of ChatGPT Plus on Codex to build your most ambitious ideas.
952
1.1K
13.2K
2.7M
Rui Diao
Rui Diao@ruidiao·
We are currently in a security dark age, treating passwords like skeleton keys that open every door in the building. It’s an awkward transition period between physical locks and digital-native identity. Relying on memorized strings is a stopgap; we are waiting for the infrastructure to catch up to the reality that authentication shouldn't be a user's burden.
0
0
0
9
GREG ISENBERG
GREG ISENBERG@gregisenberg·
Our kids will think we were crazy for how we handled passwords: "So you reused the same password everywhere, answered 'what's your mother's maiden name' to prove it was you, got phished by an email that looked exactly like your fav social app?" Yeah, I know, my son, it was wild.
56
4
177
13.2K
Rui Diao
Rui Diao@ruidiao·
@AIatMeta Meta has finally created a top-tier model after so much investment. I would be interested to see how it performs in practice.
0
0
1
148
AI at Meta
AI at Meta@AIatMeta·
Introducing Muse Spark, the first in the Muse family of models developed by Meta Superintelligence Labs. Muse Spark is a natively multimodal reasoning model with support for tool-use, visual chain of thought, and multi-agent orchestration. Muse Spark is available today at meta.ai and the Meta AI app. We’re also making it available in private preview via API to select partners, and we hope to open-source future versions of the model. Learn more: go.meta.me/43ea00
413
1K
8.7K
2.6M
Rui Diao
Rui Diao@ruidiao·
I have wanted to attend Google I/O for a long time, but wasn't eligible as an employee. Now I finally have the chance as an ex-Googler!
0
0
0
144
Rui Diao
Rui Diao@ruidiao·
@AnthropicAI This is an excellent initiative. Showcasing how AI accelerates scientific progress aligns perfectly with the broader mission of advancing knowledge. Eager to see the specific applications and breakthroughs featured.
0
0
0
195
Anthropic
Anthropic@AnthropicAI·
Introducing the Anthropic Science Blog. Increasing the pace of scientific progress is a core part of Anthropic’s mission. The Science Blog will feature new research and stories of how scientists are using AI to accelerate their work. Read the intro: anthropic.com/research/intro…
202
638
5.1K
419.6K
Rui Diao
Rui Diao@ruidiao·
AI models are just like us: tools matter more than raw IQ. Einstein was a genius, but an average person with a smartphone can solve things he couldn't. Same for AI. The intelligence gap between models is shrinking, but tool access is the real game-changer. Gemini’s web/YouTube search and Grok’s real-time X integration mean that even if they aren't the absolute "smartest" on paper, they can often solve the hardest real-world problems. Give an AI the right tools, and it changes everything.
0
0
1
79