D. R. Arthur

27.8K posts

@siproj

Futurist and advanced thought leader in cybersecurity and technology! Team leader of the very first US national weather radar put on the Internet, in 1994.

Perseus Arm - Milky Way · Joined April 2009
7.5K Following · 1.7K Followers
D. R. Arthur reposted
The All-In Podcast
The All-In Podcast@theallinpod·
The Besties React to Greg Brockman’s Diary in the Elon/Sam Altman Trial 😂
Jason: “Greg Brockman, the co-founder (of OpenAI), was journalmaxxing apparently.”
Friedberg: “I just don’t know why Greg Brockman's got a friggin’ diary where he's literally documenting… I mean, I love the guy, but what the f**k is he thinking?” “You're just sitting here at home like, let me write about the crime I'm committing, and let me record it, and by the way, let me never delete it.”
Sacks: “It's not just journalmaxxing, it's discoverymaxxing.”
Jason: “It’s like that scene from The Wire!”
23 replies · 28 reposts · 295 likes · 43.6K views
D. R. Arthur reposted
Grok Imagine
Grok Imagine@imagine·
Your entire creative workflow just collapsed into one infinite canvas. In @imagine Agent Mode, you can brainstorm, write, generate and edit images, then turn them into videos without leaving the page. Try it at grok.com/imagine on desktop.
728 replies · 798 reposts · 6.5K likes · 30.1M views
D. R. Arthur reposted
Science girl
Science girl@sciencegirl·
U.S. Marines recently proved that low-tech creativity can still defeat cutting-edge military artificial intelligence. In a DARPA field trial, a team of eight Marines was challenged to sneak past a sophisticated AI-powered detection system. Instead of relying on advanced stealth gear or electronic countermeasures, they turned to absurdly simple, almost cartoonish tactics, and succeeded.

Some Marines cartwheeled and rolled across 300 meters of open ground. Others concealed themselves under ordinary cardboard boxes and slowly inched forward. One Marine even disguised himself as a small fir tree, shuffling gradually toward the objective.

Remarkably, every Marine reached the target without ever triggering the AI sensors. The system had been trained extensively on normal human walking and running patterns, but it had no reference for these bizarre movements. Because the Marines’ actions fell completely outside the AI’s learned understanding of “human behavior,” they were effectively invisible to it.

This exercise offers a timely lesson for the defense sector: no matter how advanced military AI becomes, it remains susceptible to human ingenuity, unconventional thinking, and old-fashioned manual tactics.

source: Scharre, P. (2023). Four Battlegrounds: Power in the Age of Artificial Intelligence. W. W. Norton & Company.
358 replies · 1.8K reposts · 5.8K likes · 370.8K views
D. R. Arthur reposted
Rahul
Rahul@sairahul1·
Two Anthropic engineers spent 24 minutes exposing every Claude Code feature you didn't know existed. Most people will scroll past this. Don't be most people.
135 replies · 3.6K reposts · 35.7K likes · 9.8M views
D. R. Arthur reposted
David Sacks
David Sacks@DavidSacks·
It’s time to demystify Mythos. Mythos is not magic. It’s not a doomsday device. It’s the first of many models that can automate cyber tasks (just like coding). OpenAI’s GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there within approximately 6 months.

It’s important to recognize that these models do not create vulnerabilities; they discover them. The bugs are already in the code. Using AI to discover and patch them will actually harden these systems.

The leap from pre-AI cyber to post-AI cyber means that there will be a big upgrade cycle. After that, however, the market is likely to reach a new equilibrium between AI-powered cyber-offense and AI-powered cyber-defense.

Obviously it’s important that cyber defenders get access before cyber attackers. That process is already underway but needs to happen quickly (see point above about Chinese models). Unlike Mythos, GPT-5.5-cyber appears not to be token-constrained, so it may be the first cyber model that defenders actually get to use.
AI Security Institute@AISecurityInst

OpenAI’s GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-end 🧵

271 replies · 572 reposts · 5K likes · 1.1M views
D. R. Arthur reposted
Andrew Bolis
Andrew Bolis@AndrewBolis·
Most people upload a file and hope AI “figures it out.” It won’t … unless you build a system around it. NotebookLM uses Gemini’s large context window. It turns scattered files into a connected, cited research brain.

Here’s the workflow power users follow: [ 🔖 bookmark this post for later ]

✨ Gemini Flash
• Core engine powering NotebookLM.
• Fast, accurate synthesis across large sets of sources.
Sample prompt: “Highlight the main key insights across these docs.”

✨ Gemini Pro (Plus Tier)
• Used for deeper reasoning and context handling.
• Best for complex briefs and enterprise workflows.
Sample prompt: “Draft a polished briefing using these internal docs, with cited evidence.”

1️⃣ Add Sources
• Import PDFs, Docs, transcripts, and webpages.
• Structure, tables, and images stay intact.

2️⃣ Source-Based Chat
• Responds only using your uploaded content.
• Ideal for reviews, validation, and checks.

3️⃣ Structured Study Guides
• Generates summaries, timelines, and briefs.
• Helpful for organizing large material sets.

4️⃣ Visual Topic Mapping
• Creates visuals to show relationships between topics.
• Useful for comparing topics or themes.

5️⃣ Audio Overview
• Converts notebooks into spoken recaps.
• Great for reviewing long content on the go.

6️⃣ Video Summaries
• Creates short narrated videos with slides.
• Perfect for updates or training.

7️⃣ Deep Research
• Finds credible references for your topic.
• Expands research with relevant material.

8️⃣ Structured Data Tables
• Extracts key data points into organized, sortable rows.
• Turns messy, unstructured info into clean, exportable data.

9️⃣ Collaborative Notebooks
• Share notebooks with teams or clients.
• Structure and citations stay intact.

Copy-Paste These Power Prompts:
► Summarize Content: “Create a 500-word thematic summary with citations.”
► Compare Sources: “Show key points across these documents and note conflicts.”
► Identify Decisions: “List main decisions and attach each one to its source.”
► Generate a Brief: “Assemble a brief that organizes the material into: Background → Core Points → Recommendations.”
► Create Script for Audio/Video: “Write a two-host script and test weak claims.”

Workflows You Can Build:

Research Review:
➟ Add papers into a notebook
➟ Produce a briefing from all sources
➟ Use chat for targeted questions
➟ Create audio for on-the-go review

Shared Knowledge Base:
➟ Add all project documents
➟ Build an onboarding study guide
➟ Share the notebook with your team
➟ Auto-update when new files are added

Content Analysis & Creation:
➟ Add competitor material
➟ Generate a comparison summary
➟ Build a mind map of themes
➟ Export slides for presentations

Build the system once and turn raw files into insights fast. Save this guide and test it on your next deep-work task.

📌 Learn 30 free AI tools in 30 days: bit.ly/48woPL4
👉 Follow me @AndrewBolis for more and 🔄 Repost this to help others use AI
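The source-based chat step above boils down to constraining the model to numbered, citable excerpts. A minimal, product-agnostic sketch of that pattern (the function name and prompt wording are my own; this is not NotebookLM's or Gemini's actual API):

```python
# Sketch of source-grounded prompting: ask the model to answer only from
# numbered source excerpts and cite them as [n]. Names and prompt wording
# are illustrative, not NotebookLM's internal format.

def build_grounded_prompt(question: str, sources: dict[str, str]) -> str:
    """Assemble a prompt that restricts answers to the given sources."""
    numbered = "\n\n".join(
        f"[{i}] {title}\n{text}"
        for i, (title, text) in enumerate(sources.items(), start=1)
    )
    return (
        "Answer using ONLY the numbered sources below. Cite each claim as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )

prompt = build_grounded_prompt(
    "What were Q3 revenues?",
    {
        "Q3 report": "Revenue was $4.2M, up 12% year over year.",
        "Board memo": "The hiring freeze continues through Q4.",
    },
)
```

The same string would then be sent to whatever model backs the notebook; the grounding comes from the instruction plus the citation markers, not from any model-specific feature.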
28 replies · 42 reposts · 169 likes · 14.4K views
D. R. Arthur reposted
General Mike Flynn
General Mike Flynn@GenFlynn·
Breaking: Comey is NOT the only INDICTMENT. Don’t lose sight of another important indictment today, because we are going to track accountability! Accountability is NOT optional…!!!

Anthony Fauci’s top advisor for 16 years has just been indicted by the DOJ, and the trail leads straight back to the top. David Morens served as a senior advisor directly to Dr. Anthony Fauci at NIAID from 2006 to 2022. His job included gathering information from grantees and the scientific community and preparing the briefings that Fauci then delivered to the White House, Congress, the President of the United States, and the American public.

Now he’s facing federal charges for conspiracy against the United States, falsification of records, and concealment of government documents, all tied to a coordinated effort to suppress the lab leak theory and protect EcoHealth Alliance’s funding pipeline to the Wuhan Institute of Virology. In a February 2021 email, Morens wrote that he had learned from an NIH FOIA official “how to make emails disappear after I am FOIA’d but before the search starts.”

They called you a conspiracy theorist for asking questions they were actively hiding answers to. That ends NOW. Accountability is not optional. justice.gov/opa/pr/former-…
470 replies · 4.4K reposts · 13.2K likes · 192.4K views
D. R. Arthur reposted
U.S. Department of Justice
U.S. Department of Justice@TheJusticeDept·
The @DOJFraudDiv is now on X!   Give them a follow and be on the lookout for a major announcement tomorrow👀
1.8K replies · 8.6K reposts · 36.2K likes · 4.8M views
D. R. Arthur reposted
elvis
elvis@omarsar0·
Don't try to build a self-improving AI agent without evals. You are just wasting time and compute. An agent can't improve from traces it can't evaluate.

This is why it's exciting to see @FutureAGI_ going fully open source with their platform. It combines the best of all the eval tools and methods in one stack. They've shipped a set of tools to make it easier for AI devs to reliably ship self-improving agents.

There is a lot to like here:
- Evals for hallucination, groundedness, PII, toxicity, tool-use correctness, bias, and any custom metric. Every evaluator is readable and modifiable, not a black-box score. No vendor lock-in to worry about.
- Six prompt optimization algorithms (GEPA, PromptWizard, ProTeGi, and others) that take production traces and feed them back as training signals.
- Multi-turn simulation before launch, including voice agents through LiveKit, VAPI, Retell, and Pipecat. You stress-test edge cases before users ever hit them.
- Real-time guardrails for jailbreaks, prompt injection, and PII leaks.
- OpenTelemetry-native tracing in 4+ languages (Python, TypeScript, Java, and C#) with 50+ framework instrumentors (LangChain, LlamaIndex, CrewAI, AutoGen, DSPy, Haystack).
- An OpenAI-compatible gateway with 100+ providers, routing strategies, and caching.

If self-improving agents are the direction the field is moving, we need eval infrastructures we can actually trust and build on top of. This is that infrastructure, and now it's open.

Check it out here: github.com/future-agi/fut…
Generous free-tier cloud-based offer here: shorturl.at/cxYOd
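The "readable and modifiable, not a black-box score" point is easiest to see as code: an evaluator can be a plain function over an agent trace. A toy sketch (the trace schema, field names, and registry are hypothetical, not FutureAGI's actual interfaces):

```python
# Toy white-box evaluator: a plain, inspectable function that scores an
# agent trace. The trace fields and the EVALUATORS registry are hypothetical
# illustrations, not a real library's API.

def tool_use_correctness(trace: dict) -> float:
    """Fraction of tool calls whose tool was actually available to the agent."""
    calls = trace.get("tool_calls", [])
    if not calls:
        return 1.0  # no tool use means nothing to get wrong
    allowed = set(trace.get("available_tools", []))
    valid = sum(1 for call in calls if call["name"] in allowed)
    return valid / len(calls)

# A registry of named evaluators, each one readable and replaceable.
EVALUATORS = {"tool_use_correctness": tool_use_correctness}

trace = {
    "available_tools": ["search", "calculator"],
    "tool_calls": [{"name": "search"}, {"name": "send_email"}],  # one invalid
}
score = EVALUATORS["tool_use_correctness"](trace)
```

Because the scoring logic is just a function, a dev can read it, tweak the definition of "correct," or swap in a custom metric without touching anything else in the stack.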
12 replies · 15 reposts · 76 likes · 10.2K views
D. R. Arthur reposted
DAIR.AI
DAIR.AI@dair_ai·
The Top AI Papers of the Week (April 19 - 26)
- Skill-RAG
- DeepSeek V4
- Autogenesis
- Attention to Mamba
- Stateless Decision Memory
- Self-Evolving Logic Synthesis
- Self-Generated World Knowledge
Read on for more:
DAIR.AI@dair_ai

x.com/i/article/2048…

7 replies · 58 reposts · 318 likes · 48.4K views
D. R. Arthur reposted
elvis
elvis@omarsar0·
// Agentic World Modeling //

Massive 40-author survey just dropped. Cleanest taxonomy of world models in agent research I've seen. (bookmark it)

The paper proposes a "levels × laws" framework. Three capability levels:
> L1 Predictors do one-step transitions
> L2 Simulators do multi-step action-conditioned rollouts
> L3 Evolvers self-revise as the world changes

It discusses four law regimes: physical, digital, social, and scientific.

They synthesize 400+ works and 100+ representative systems spanning model-based RL, video generation, web/GUI agents, multi-agent simulation, and scientific discovery. The framework also identifies failure modes and proposes evaluation principles for each level.

Why it matters: as agents shift from chatbots to goal-accomplishers, the bottleneck moves from language to environment. This is the first paper that gives builders a shared vocabulary for designing and evaluating world models across communities that have been working in isolation.

Paper: arxiv.org/abs/2604.22748
Learn to build effective AI agents in our academy: academy.dair.ai
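The three levels read naturally as progressively stronger interfaces over the same world. A toy sketch of that reading (the class names and the trivial counter "world" are my own illustration, not the paper's formalism):

```python
# Toy rendering of the L1/L2/L3 capability levels as nested interfaces.
# The "world" is a trivial counter; names are illustrative, not the paper's.

class Predictor:
    """L1: one-step transition prediction."""
    def predict(self, state: int, action: int) -> int:
        return state + action

class Simulator(Predictor):
    """L2: multi-step, action-conditioned rollout built on one-step prediction."""
    def rollout(self, state: int, actions: list[int]) -> list[int]:
        states = [state]
        for a in actions:
            states.append(self.predict(states[-1], a))
        return states

class Evolver(Simulator):
    """L3: self-revises its model when predictions miss observations."""
    def __init__(self) -> None:
        self.bias = 0
    def predict(self, state: int, action: int) -> int:
        return state + action + self.bias
    def revise(self, state: int, action: int, observed: int) -> None:
        # Absorb the prediction error into the learned correction term.
        self.bias += observed - self.predict(state, action)

world = Evolver()
world.revise(0, 1, 3)  # the real world moved by action + 2, so learn bias = 2
```

The point of the nesting: an L2 simulator is just an L1 predictor applied repeatedly, and an L3 evolver is an L2 simulator plus a mechanism for updating itself when the world drifts.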
15 replies · 77 reposts · 369 likes · 34.3K views
D. R. Arthur reposted
elvis
elvis@omarsar0·
"AI should elevate your thinking, not replace it."

I don't disagree, but the issue is that current LLMs are not really trained to support that out of the box. I've solved this by building my own agent harness (retrieval, verification, memory, multi-agent architecture, skills, etc.). That's how important agent harnesses are today.

Even with simple skills (.md files), you can already get far, so even non-technical folks can improve the "human-centered augmenting" capabilities of LLMs/agents. Continual learning promises to solve this, but we are very early on that. People need to understand that in-context learning works great for this.

Today's LLMs are steerable if YOU spend time building and optimizing your workflows. Self-improving agents don't work as well because the incentives are not there. A good mindset is that every output you get from an LLM should be reused in some way; let it work for you, and make you and the agent better in the next session.

So this has to come from you. You are the only one with the incentives to make it work for you the way you want. Don't wait for anyone to build it for you. Use AI to build the AI you want. Own the harness.
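The "simple skills (.md files)" idea needs almost no machinery: a skill is a markdown file, and the harness concatenates them into the context for each session. A minimal sketch, assuming skills live as plain .md files in a directory (the paths and prompt layout are illustrative, not any specific product's format):

```python
# Minimal sketch of markdown-file skills: load every .md file in a directory
# into the system prompt for the next agent session. Layout is illustrative.
from pathlib import Path

def load_skills(skill_dir: str) -> str:
    """Concatenate all .md skill files into one system-prompt section."""
    parts = []
    for path in sorted(Path(skill_dir).glob("*.md")):
        parts.append(f"## Skill: {path.stem}\n{path.read_text()}")
    return "\n\n".join(parts)

def build_system_prompt(skill_dir: str) -> str:
    """Prepend an instruction, then inject every skill the harness knows."""
    return (
        "You are my assistant. Apply these skills when relevant:\n\n"
        + load_skills(skill_dir)
    )
```

This is also why the "reuse every output" mindset composes: an output worth keeping becomes another .md file in the directory, and the next session picks it up automatically.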
29 replies · 13 reposts · 102 likes · 10.3K views
D. R. Arthur reposted
DAIR.AI
DAIR.AI@dair_ai·
Pay attention to this one, AI devs. If you're building multi-agent systems, you're probably wiring static org charts. New research argues they should look more like a labor market.

The paper introduces OneManCompany (OMC). Instead of fixed teams, it defines "Talents," portable agent identities that bundle skills and tools, and a "Talent Market" where they get recruited dynamically per task. An Explore-Execute-Review tree search decomposes work hierarchically and aggregates results back up.

On PRDBench: 84.67% success, +15.5 points over prior SOTA. It generalizes across domains in their case studies.

Why it matters: pre-wired multi-agent pipelines break the moment tasks drift outside their design envelope. Treating agents as a recruitable workforce, not a fixed graph, gets you self-organization and continuous improvement by default. A useful frame for any open-ended agent system where you don't know the task distribution ahead of time.

Paper: arxiv.org/abs/2604.22446
Learn to build effective AI agents in our academy: academy.dair.ai
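The "Talent Market" framing can be caricatured in a few lines: agents carry portable skill sets, and recruitment picks whoever best covers the task at hand. A toy sketch (the Talent structure and the coverage score are my own illustration, not the OMC paper's actual mechanism):

```python
# Toy talent market: recruit the agent whose declared skills best cover the
# task's required skills. Structures and scoring are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Talent:
    """A portable agent identity bundling a name and a skill set."""
    name: str
    skills: set[str] = field(default_factory=set)

def recruit(market: list[Talent], required: set[str]) -> Talent:
    """Pick the talent covering the most required skills for this task."""
    return max(market, key=lambda t: len(t.skills & required))

market = [
    Talent("writer", {"summarize", "draft"}),
    Talent("coder", {"python", "test", "draft"}),
]
best = recruit(market, {"python", "test"})
```

The contrast with a static org chart is that `market` can grow or shrink at runtime, and each subtask in a decomposition tree runs its own recruitment rather than routing to a pre-wired node.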
18 replies · 55 reposts · 364 likes · 29.2K views
D. R. Arthur reposted
elvis
elvis@omarsar0·
A few notes on how to get started with building LLM Knowledge Bases. @karpathy popularized the idea, but most people don't know where to start. Everyone should be creating LLM Wikis. Live session tomorrow. Repo example shared; a Skill coming soon. academy.dair.ai/blog/how-to-bu…
19 replies · 26 reposts · 197 likes · 19.6K views
D. R. Arthur reposted
Nick Levine
Nick Levine@status_effects·
New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:
163 replies · 349 reposts · 2.8K likes · 961.1K views
D. R. Arthur reposted
elvis
elvis@omarsar0·
// From Skill Text to Skill Structure //

One of the more practical skill papers I've seen this month. SKILL.md files entangle invocation interface, execution flow, and tool/resource side effects in one blob of natural language. This makes downstream discovery and risk review brittle.

New research proposes SSL, a three-layer typed JSON representation: a Scheduling layer for invocation signals, a Structural layer for execution scenes, and a Logical layer for atomic actions and resource use. It draws on Schank and Abelson's classical work on scripts, MOPs, and conceptual dependency. An LLM-based normalizer converts existing SKILL.md files into this structure.

The numbers: Skill Discovery MRR jumps from 0.573 to 0.707, and Risk Assessment macro F1 from 0.744 to 0.787. They release a 6,184-skill corpus, 403 task queries, and 500 risk-labeled skills.

As skill registries scale, you can't keep treating capability packages as unstructured prose.

Paper: arxiv.org/abs/2604.24026
Learn to build effective AI agents in our academy: academy.dair.ai
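The three-layer split is easier to grasp as a concrete record. A sketch of what such a typed skill object might look like (the field names are my guesses at the shape being described, not the paper's actual SSL schema):

```python
# Illustrative shape of a three-layer typed skill record: Scheduling
# (invocation signals), Structural (execution scenes), Logical (atomic
# actions and resource use). Field names are guesses, not the real schema.
import json

skill = {
    "scheduling": {
        "name": "summarize_pdf",
        "triggers": ["summarize", "tl;dr"],  # invocation signals
    },
    "structural": {
        "scenes": ["load_document", "extract_text", "write_summary"],
    },
    "logical": {
        "actions": [
            {"op": "read_file", "resource": "filesystem", "writes": False},
            {"op": "llm_call", "resource": "model_api", "writes": False},
        ],
    },
}

# With a typed record, risk review becomes a query instead of a prose read:
touches_fs = any(
    a["resource"] == "filesystem" for a in skill["logical"]["actions"]
)
serialized = json.dumps(skill, indent=2)
```

The discovery gain reported above follows the same logic: matching a query against explicit `triggers` and `scenes` fields is more reliable than fuzzy retrieval over one blob of markdown.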
19 replies · 32 reposts · 209 likes · 16.4K views
D. R. Arthur reposted
elvis
elvis@omarsar0·
// Agentic Harness Engineering //

Pay attention to this one, AI devs. (bookmark it)

Most coding-agent harnesses are still tuned by hand or by brittle trial-and-error self-evolution. This new work introduces Agentic Harness Engineering, a framework that makes harness evolution observable. They do this through three layers: components as revertible files, experience as condensed evidence from millions of trajectory tokens, and decisions as falsifiable predictions checked against task outcomes. Each edit becomes a contract you can verify or revert.

Results: pass@1 on Terminal-Bench 2 climbs from 69.7% to 77.0% in ten iterations, beating the human-designed Codex-CLI (71.9%) and self-evolving baselines like ACE and TF-GRPO. The evolved harness also transfers across model families with +5.1 to +10.1 point gains, while using 12% fewer tokens than the seed on SWE-bench-verified.

Harness work is the biggest hidden cost in most agent systems. This is the first credible recipe for letting the harness improve itself without drifting into noise.

Paper: arxiv.org/abs/2604.25850
Learn to build effective AI agents in our academy: academy.dair.ai
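"Each edit becomes a contract you can verify or revert" suggests a simple control loop: every proposed edit ships with a falsifiable prediction, and the edit is kept only if the prediction holds on a benchmark. A toy sketch of that loop (the harness, benchmark, and edit format are trivial stand-ins, not the paper's actual system):

```python
# Toy verify-or-revert harness evolution: apply each candidate edit to a
# copy, check its stated prediction against the benchmark, and adopt the
# edit only if the prediction held. All components are stand-ins.
import copy

def evolve(harness: dict, edits: list, benchmark) -> dict:
    for edit in edits:
        candidate = copy.deepcopy(harness)   # edits are revertible by construction
        edit["apply"](candidate)
        observed = benchmark(candidate)
        if observed >= edit["predicted_score"]:
            harness = candidate              # prediction held: keep the edit
        # else: the edit is reverted simply by not adopting the candidate
    return harness

# Stand-in benchmark: retries help the score, verbosity hurts it.
benchmark = lambda h: 0.6 + 0.1 * h.get("retries", 0) - 0.2 * h.get("verbose", 0)

edits = [
    {"apply": lambda h: h.update(retries=2), "predicted_score": 0.75},  # holds
    {"apply": lambda h: h.update(verbose=1), "predicted_score": 0.85},  # fails
]
final = evolve({}, edits, benchmark)
```

Tying each change to a checkable prediction is what keeps the loop from drifting into noise: an edit that cannot state what it will improve never survives the gate.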
64 replies · 233 reposts · 1.6K likes · 133.4K views
D. R. Arthur reposted
elvis
elvis@omarsar0·
@EngrammeHQ Very interesting direction. Proactive memory recall feels like a fundamental primitive of modern AI agents. I shouldn't have to search for context that exists already. The brain doesn't work like that, and AI agents shouldn't either. Great launch!
1 reply · 1 repost · 55 likes · 12.3K views