Muad_Dib

312 posts


@b162543

AI-driven quant research, built in public. Publishing what survives walk-forward — and what doesn't. Plus frontier tech notes.

Mars · Joined January 2021
1.4K Following · 51 Followers
Muad_Dib@b162543·
The 10x reduction in time-to-incident is the metric that should be in every agent platform’s security pitch — not in their post-incident postmortem. Curious what your detection signal was. We’re seeing the most reliable trigger come from short-lived token TTL violations — agent tries to use a token past its expected lifetime, gets blocked, security team gets paged. That single signal catches most behavioral anomalies before they escalate. The engineering tax framing is right. VPC isolation, proxy tokens, sandboxed executors — none of these are individually expensive. The cost is the integration work and the discipline to not skip them when you’re shipping fast.
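A minimal sketch of that TTL-violation gate in Python — the `ProxyToken` shape and the paging hook are hypothetical illustrations, not any particular platform's API:

```python
import time


class ProxyToken:
    """Hypothetical short-lived token record, minted with a fixed lifetime."""

    def __init__(self, token_id, issued_at, ttl_seconds):
        self.token_id = token_id
        self.issued_at = issued_at
        self.ttl_seconds = ttl_seconds


def check_token_use(token, now=None, page_security=print):
    """Block any use of a token past its expected lifetime and page security.

    Returns True if the use is allowed, False if blocked.
    """
    now = time.time() if now is None else now
    overshoot = now - token.issued_at - token.ttl_seconds
    if overshoot > 0:
        # The single most reliable behavioral-anomaly signal described above:
        # an agent reaching for a token beyond its expected lifetime.
        page_security(
            f"TTL violation: token {token.token_id} used {overshoot:.0f}s past expiry"
        )
        return False
    return True
```

The point is that the detection logic is trivial; the value is in wiring the `page_security` side effect into an on-call path rather than a log file.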
English
0
0
0
6
Thiago Salvador@bettercallsalva·
@b162543 Exactly the gap. Most agent platforms ship with raw API keys and ambient process state and call it 'production ready'. Built one with VPC-isolated executors and time-to-incident on misbehaving agents dropped 10x. Worth the engineering tax.
English
1
0
1
13
Muad_Dib@b162543·
This is the architecture every “AI agent platform” should have shipped from day one. Most didn’t. Hardware-isolated sandboxes per task + VPC-level storage separation + short-lived proxy tokens instead of raw API keys + safety detection on agent-accessed content. Four design choices that turn the biggest agent security gaps into structural impossibilities. The Vercel breach last month was caused by exactly the opposite pattern: long-lived OAuth tokens from a third-party AI tool that gave attackers escalation paths into customer environments. Perplexity is showing what “secure by default” looks like when it’s an architectural commitment, not a post-incident retrofit. The pattern across the agent ecosystem this month: Composio brokers OAuth, TrustClaw sandboxes execution, Notion adds platform-level rate limiting and observability, Perplexity isolates compute and storage per task. Security architecture stopped being a differentiator and became the table stakes for any agent platform serious about enterprise adoption. Anyone shipping agents without these four primitives is one prompt injection away from a breach.
Aravind Srinivas@AravSrinivas

Perplexity is building one of the most secure scalable agent runtime sandboxes in the market right now. A blog post on how we:
1. Handle proxy API keys for agents securely
2. Run safety detection for all content accessed by agents
3. Encrypt data passed via connectors to agents
4. Decouple storage and compute reliably

English
1
0
1
41
Muad_Dib@b162543·
The strategic move buried in this launch isn’t the Developer Platform. It’s that Notion stopped competing with Slack, Asana, Linear, Jira — and started competing with the orchestration layer between all of them. Six building blocks shipped today change what Notion is:
- Syncs: continuous upsert from Zendesk, Salesforce, Postgres into Notion databases. Cursor-based, schema-declarative.
- Worker Tools: deterministic x→y→z logic exposed as a single agent tool. More reliable than LLM reasoning, fraction of the cost.
- Webhooks: any app can now trigger Notion workflows (PR merged → Notion agent → Slack kudos in one flow).
- External Agents API: bring Claude, Codex, Cursor, Warp, Cognition, Decagon agents into Notion natively. They’ve partnered with the whole agent ecosystem.
- Agent SDK: trigger Notion agents from any app via streaming or async polling. Embed in Slack, Discord, your product.
- ntn CLI: full Notion API + Worker deployment from terminal. Built for humans AND coding agents.
The closing observation Ivan Zhao made: “Everyone can work with agents in Notion, not just engineers.” That’s the unlock. Notion already owned the canvas where non-technical teams worked. Now it owns the orchestration surface for the agents that work alongside them. @openbb_finance did this for financial data. Composio did this for tool execution. Notion just did it for cross-app orchestration with a human-readable canvas. The race for the agent operating system is over. It’s not Slack. It’s not the IDE. It’s the document.
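The Syncs primitive (cursor-based, schema-declarative upsert) reduces to a small loop. A rough sketch under assumed interfaces — `fetch_since` and `upsert` here are hypothetical stand-ins, not Notion's actual API:

```python
def sync_pass(fetch_since, upsert, cursor):
    """One pass of a cursor-based sync.

    fetch_since(cursor) -> (rows, next_cursor): rows changed since the cursor.
    upsert(row): idempotent write keyed on the row's primary key.
    Returns the advanced cursor, so repeated passes only pull new changes.
    """
    rows, next_cursor = fetch_since(cursor)
    for row in rows:
        upsert(row)  # re-running the same pass is safe: upserts are idempotent
    return next_cursor
```

Usage against an in-memory fake source: two passes, the second updating a row already synced, and the store ends up with the latest version of each row.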
Notion Developers@NotionDevs

Introducing: the Notion Developer Platform New building blocks that help you (and your coding agents) sync any data source, build any tool, and orchestrate any agent. Follow along 👇 twitter.com/i/broadcasts/1…

English
0
0
2
42
Muad_Dib@b162543·
This is the change power users have been asking for since Agent SDK launched. The old model: your Pro/Max subscription rate limits were shared across Claude Code, claude -p, Agent SDK builds, GitHub Actions, and chat. Run an Autoresearch loop overnight and you’d burn through your interactive budget by morning. The new model: programmatic usage gets its own credit bucket. Build agents, run scheduled tasks, ship GitHub Actions workflows — all without cannibalizing the rate limits you need for interactive coding. The strategic move buried in this: Anthropic is treating “developer running automated workflows” as a distinct user category from “developer using IDE.” Different budgets, different pricing logic, different optimization target. This is what infrastructure maturity looks like — separating workload classes at the billing layer, not just the API layer. For anyone running Karpathy-style Autoresearch or production agent pipelines on their Pro/Max plan, June 15 is the date.
ClaudeDevs@ClaudeDevs

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of:
- Claude Agent SDK
- claude -p
- Claude Code GitHub Actions
- Third-party apps built on the Agent SDK

English
1
0
0
78
Muad_Dib@b162543·
The friction between “I need this data” and “I have this data running in my workflow” was the most underrated bottleneck in quant research. Currently using OpenBB Workspace for X_Quant — the App Marketplace removes the worst part of the loop: sales calls, IT tickets, integration sprints just to trial a data feed. Browse, trial, connect API key when ready. The strategic move buried in this launch: OpenBB is becoming the data broker layer for systematic research. Same architectural play as Composio for agent tools, but for financial data. Vetted providers + interactive widgets + AI-native workspace is the right primitive for the next generation of quantitative research stacks.
OpenBB@openbb_finance

Today we're launching the OpenBB App Marketplace. Financial data apps from vetted providers with charts, tables, documents, and other interactive widgets. All live inside the OpenBB Workspace and ready to explore with AI. Find a provider, trial their data in your actual workflow, and connect an API key when you're ready. Skip sales calls, IT tickets, and integration sprints. The data you need is now a few clicks away. Early partners include BlueGamma, @findatasets, Outsampler, @synorb, @koinju, Open Portfolio (Alberto Gallini), VecViz (Rodger Coyne), @axioradev, Exponential Technology, and @velo_xyz . More coming soon. 🌐 Go to the OpenBB Workspace (pro.openbb.co) → Apps page → Marketplace tab

English
1
0
0
37
Muad_Dib@b162543·
Cross-model adjudication has been the biggest win for me. One Claude instance implements the fix, separate instance verifies against the spec without seeing the implementation. Catches “looks right, isn’t right” failures that linters miss. Curious if your team is moving toward statistical gates (reject if behavior drifts X sigma from baseline) or staying with binary pass/fail.
English
0
0
0
24
Delba@delba_oliveira·
You can extend every step of Claude Code's agentic loop. I've been thinking a lot about what that means for the last one. What are you doing to help Claude verify its own work? Genuinely want to hear what workflows people have.
English
19
4
165
10.2K
Muad_Dib@b162543·
The verify step is where most agentic workflows quietly degrade. What I do in X_Quant (274K LOC, systematic trading) for Claude verification beyond linters and tests:
- Cross-model adjudication: Claude writes the fix, separate Claude instance reviews it against the original spec without seeing the implementation. Catches “looks right, isn’t right” failures.
- Property-based assertions over examples: instead of “test X returns Y”, verify invariants Claude can’t game. For trading: PnL conservation, no-arbitrage constraints, returns distribution moments stay in bounds.
- Statistical hypothesis as gate: if the change passes 100 deterministic tests but the Sharpe ratio drops by 2σ from baseline, the change is rejected even if everything “compiles.” Behavior matters more than syntax.
- Adversarial replay: replay the last 50 production-relevant scenarios. If any silently changes output, halt.
The pattern is the same across domains: the verify step needs evaluators that operate at a different abstraction level than the actor. Tests check the artifact. Evaluators check the behavior the artifact produces.
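The statistical gate is the easiest of these to show concretely. A minimal sketch, assuming the baseline Sharpe ratios come from prior runs of the same backtest:

```python
import statistics


def sharpe_gate(baseline_sharpes, candidate_sharpe, sigmas=2.0):
    """Reject a change whose backtest Sharpe drops more than `sigmas`
    standard deviations below the baseline mean, even if every
    deterministic test passes.
    """
    mu = statistics.fmean(baseline_sharpes)
    sd = statistics.stdev(baseline_sharpes)
    # Gate on behavior, not syntax: the change only passes if the
    # candidate Sharpe stays within the tolerated drift band.
    return candidate_sharpe >= mu - sigmas * sd
```

A change that keeps Sharpe near the baseline mean passes; one that collapses it fails the gate regardless of green CI.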
Delba@delba_oliveira

You can extend every step of Claude Code's agentic loop. I've been thinking a lot about what that means for the last one. What are you doing to help Claude verify its own work? Genuinely want to hear what workflows people have.

English
1
0
0
54
Muad_Dib@b162543·
You can now type one line and Claude won’t stop until it’s done.
/goal all tests pass and lint is clean
Claude runs. Tests fail. Claude reads, fixes, retries. A second model checks if the goal is met after every turn. Loop continues until “done.” This is the moment Claude Code stopped being a chat tool. Read the docs — it’s only 5 minutes and it changes how you work.
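The actor/judge loop behind this pattern is small enough to sketch generically — this is an illustration of the shape, not Claude Code's implementation:

```python
def run_until_done(act, judge, goal, max_turns=10):
    """Actor/judge loop.

    `act(goal)` makes one attempt at the goal (run tests, read failures, fix).
    `judge(goal)` is a separate evaluator that decides whether the goal is met.
    Stops on success or after max_turns; returns the turn count, or None
    if the budget runs out before the goal is reached.
    """
    for turn in range(1, max_turns + 1):
        act(goal)
        if judge(goal):  # independent check after every turn
            return turn
    return None
```

The important design choice is the same one as in the /goal description: the judge is a different process from the actor, so "looks done" and "is done" are decided by different evaluators.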
ClaudeDevs@ClaudeDevs

How do you keep Claude working until the job is done? Claude Code helps with this in a few ways, including one we shipped recently: /goal.

English
0
0
0
20
Muad_Dib@b162543·
Fast mode is the most underrated release of the week, and most people will miss why. It’s not a new model. It’s the same Opus 4.7 with a different API configuration — 2.5x faster at $30/$150 per MTok versus standard pricing. The detail that matters: this is Anthropic decoupling capability from latency at the infrastructure level. Same weights, two delivery profiles. Pay for speed when you need rapid iteration, pay for cost when you’re running batch jobs. This is the same architectural insight Perplexity wrote about yesterday with GB200 prefill-decode disaggregation. The future of LLM serving isn’t “one model, one price.” It’s “one model, multiple delivery tiers calibrated to workload.” The fact that Anthropic shipped this simultaneously across Claude Code, Cursor, Emergent, Factory AI, v0, Warp, and Windsurf tells you everything about the distribution play. They’re not asking developers to switch IDEs — they’re meeting developers where they already work. Static model pricing is over. Tiered serving is the new default.
ClaudeDevs@ClaudeDevs

Fast mode for Claude Opus 4.7 is now available in research preview on the API and in Claude Code.

English
0
0
1
49
Muad_Dib retweeted
Watcher.Guru@WatcherGuru·
JUST IN: Google $GOOGL in talks with Elon Musk's SpaceX to launch data centers in space.
English
435
744
7.8K
776.8K
Muad_Dib@b162543·
The cognitive ceiling argument is the strongest part of this thesis. I see the same pattern in quant systematic trading — strategies that survive 4 layers of validation (Sharpe → DSR → walk-forward → out-of-sample) routinely fail on the 5th (cross-regime stability) because no human reviewer holds all five framings in memory simultaneously. The failures hide at the interaction layer. Curious about your sub-agent orchestration approach. Are the parallel exploration threads sharing intermediate findings via a central memory, or running fully isolated and then merged at the end? The former is more powerful but harder to keep coherent at 7+ layers. The latter is easier to scale but loses cross-thread insights. This is the right problem to be working on.
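The cross-regime stability check (the fifth layer) can be sketched as a per-regime Sharpe floor — illustrative only, with the regime partition assumed to be given:

```python
import statistics


def cross_regime_stable(returns_by_regime, min_sharpe=0.0):
    """Fifth-layer validation: a strategy only passes if its per-period
    Sharpe stays above a floor in *every* regime, not just pooled.

    returns_by_regime: {regime_name: [period returns]} — the regime
    labeling itself is assumed to come from an upstream classifier.
    """
    for regime, rets in returns_by_regime.items():
        mu = statistics.fmean(rets)
        sd = statistics.stdev(rets)
        if sd == 0 or mu / sd < min_sharpe:
            return False  # profitable in aggregate can still fail here
    return True
```

This is exactly the failure mode described above: a strategy can pass pooled Sharpe, DSR, walk-forward, and out-of-sample while the per-regime breakdown shows all of the edge concentrated in one regime.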
English
0
0
0
38
Muad_Dib@b162543·
The $250K bounty is real and verifiable on HackenProof. That’s the receipt. The technical claim worth pulling out: human auditors hit a cognitive ceiling at 4-5 layers of interacting system logic. Most critical bugs that survive audits live below that ceiling. AI doesn’t have that ceiling. This matches what López de Prado wrote about quant strategy validation — the failure modes that destroy you are the ones where 5+ assumptions interact in ways no single human can hold in working memory. Same principle, different domain. The frontier model labs haven’t solved this internally. Greg.ai built a reasoning harness on top of existing models that traces logic across 7+ system layers in parallel via spawned sub-agents. That’s the unlock. Same model weights, completely different output. One detail worth verifying before going full bull on this: the post claims confirmed live vulns also found in Ethereum, Lido, Chainlink, Aave, Uniswap, Polygon. Only one of those has a public $250K HackenProof receipt. The other findings should have public CVEs or postmortems by now. If they exist, this is genuinely the largest AI security event of 2026. If they don’t, the framework is still impressive but the marketing is ahead of the verification. Either way: deep multi-system reasoning is the next moat. Static scanners are obsolete.
riptide@0xriptide

x.com/i/article/2054…

English
0
0
0
33
Muad_Dib@b162543·
The prefill-decode split everyone is talking about as a hardware optimization is actually a fundamental rethink of how LLM serving works. Prefill is one big matmul against the prompt — embarrassingly parallel, compute-bound, scales with FLOPS. Decode is autoregressive — one token at a time, latency-bound, scales with memory bandwidth and interconnect speed. For years we ran both on the same GPUs because we had no choice. Hopper’s NVLink domain was too small to make disaggregation economical. GB200’s NVL72 rack-scale domain (72 GPUs in one coherent memory space) finally makes the math work — you can dedicate compute-optimized nodes to prefill and bandwidth-optimized nodes to decode, then move KV cache between them at NVLink speeds instead of network speeds. The interesting second-order effect: this is why Cerebras and Groq are dangerous. They’re already disaggregated by architecture. When the rest of the industry catches up to “serving needs two different chip profiles,” they had it from day one. The MoE inference race isn’t going to be won by whoever has more H100s. It’s going to be won by whoever orchestrates prefill, decode, and expert routing across heterogeneous compute fastest.
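The compute-bound vs bandwidth-bound asymmetry is easy to see with back-of-envelope roofline arithmetic (illustrative numbers, not measured figures for any real deployment):

```python
def prefill_time_s(prompt_tokens, params_b, flops_per_s):
    """Prefill: one large matmul over the whole prompt.
    Roughly 2 * params FLOPs per token, limited by compute throughput.
    params_b is the parameter count in billions.
    """
    return prompt_tokens * 2 * params_b * 1e9 / flops_per_s


def decode_time_s(new_tokens, params_b, bytes_per_param, mem_bw_bytes_per_s):
    """Decode: one token per autoregressive step. Each step streams the
    full weight set from memory, so it is limited by memory bandwidth.
    """
    return new_tokens * params_b * 1e9 * bytes_per_param / mem_bw_bytes_per_s
```

With illustrative numbers (a 70B dense model, 1e15 FLOP/s of compute, 3e12 B/s of memory bandwidth, fp16 weights), prefilling a 2,048-token prompt takes about 0.3 s while decoding 2,048 tokens takes about 96 s — two phases off by orders of magnitude in what they demand from the hardware, which is the whole case for disaggregating them.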
Perplexity@perplexity_ai

We published new research on how we serve post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks. GB200 is a major step up over Hopper for high-throughput inference on large MoE models, not just a training platform.

English
0
0
0
65
Muad_Dib retweeted
NSF - NASASpaceflight.com@NASASpaceflight·
Starship Flight 12: Navigational warnings indicate SpaceX is now targeting NET May 19 for launch. (Source: NGA. Graphics @NeedPizza42)
English
21
127
1.3K
69.8K
Muad_Dib@b162543·
Both products are pulling away from the “AI chat that writes code” framing. Claude Code is becoming an agent fleet manager. Codex is becoming a coding surface that lives everywhere. Six months ago they were competing for the same workflow. Today they’re competing for different parts of the developer’s day. That’s how you tell the AI race has matured: when the products stop trying to be everything to everyone.
English
1
0
0
46
Muad_Dib@b162543·
Both Anthropic and OpenAI shipped massive Claude Code and Codex updates in the last 30 days. I read every changelog so you don’t have to. The TL;DR: they’re building completely different products under the same “coding agent” label. A breakdown of what shipped, what it means, and which one fits which workflow 🧵
English
1
0
0
40