Diego

29 posts

@diegocaples

Founder @ Markov Robotics

San Francisco, CA · Joined November 2020
167 Following · 78 Followers
Diego retweeted
Greg Tarr (@Greg_Tarr)
we @agi_inc just achieved 76.3% on the OSWorld benchmark taking the #1 spot from ByteDance (53.1%)
2
1
11
758
Diego retweeted
AGI, Inc. (@agi_inc)
AGI surpasses human-level performance at computer use. We’re excited to announce that AGI, Inc. is now the global leader on OSWorld-Verified, the industry benchmark for AI computer-control. agi-0 is the first agent to reach a superhuman score on OSWorld, with a score of 76.2%.🔥 Learn more about it in our company blog post from @_gundawar: 👇 theagi.company/blog/osworld
14
9
176
47.9K
Diego retweeted
Weights & Biases (@wandb)
🏆 Grand Prize Winners: Daydreamer @diegocaples @_gundawar They're tackling the "GPT Moment for Robotics." Their agent uses a video diffusion model to imagine a successful outcome, executes it in the real world, and then uses VLM feedback to self-improve, training only on its successes.
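The loop described in this post (imagine an outcome, act, get feedback, train only on successes) resembles success-filtered imitation. Below is a minimal Python sketch of that filtering idea only, assuming a toy three-action categorical policy and a hard-coded `judge` function standing in for the VLM feedback. Every name here (`ACTIONS`, `judge`, `sample_action`, `self_improve`) is hypothetical and none of it comes from the Daydreamer project, which uses a video diffusion model and a real robot.

```python
import random
from collections import Counter

# Hypothetical toy action set; the real system acts in the physical world.
ACTIONS = ["push", "grasp", "wave"]


def judge(action):
    """Stand-in for the VLM feedback step (assumption): only 'grasp' succeeds."""
    return action == "grasp"


def sample_action(policy, rng):
    """Draw one action from a categorical policy {action: probability}."""
    weights = [policy[a] for a in ACTIONS]
    return rng.choices(ACTIONS, weights=weights)[0]


def self_improve(policy, rounds=5, attempts=200, seed=0):
    """Repeatedly act, keep only judged successes, and refit the policy on them."""
    rng = random.Random(seed)
    for _ in range(rounds):
        tried = [sample_action(policy, rng) for _ in range(attempts)]
        successes = [a for a in tried if judge(a)]
        if not successes:
            continue  # nothing to learn from this round
        counts = Counter(successes)
        # Refit with add-one smoothing so no action's probability hits zero.
        total = len(successes) + len(ACTIONS)
        policy = {a: (counts[a] + 1) / total for a in ACTIONS}
    return policy


initial = {a: 1 / len(ACTIONS) for a in ACTIONS}
final = self_improve(initial)
print(final)
```

Because failed attempts are discarded before refitting, the probability mass shifts toward whatever the judge accepts. The quality of the judge therefore bounds the quality of the resulting policy.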
1
1
10
1.2K
Diego retweeted
Stephen James (@stepjamUK)
DLR researchers gave a robotic arm full-body touch sensitivity with no artificial skin needed. They used internal force-torque sensors at 8 kHz + deep learning. The robot can feel where you touch it, recognize letters drawn on its surface, and respond to virtual buttons placed anywhere on its body. What's interesting is the infrastructure behind it. To train these models, you need high-frequency sensor streams, manifold learning to unfold trajectories, and the ability to iterate fast. They collected 2,300 samples from 20 people and hit 95.5% accuracy on digit recognition. This is what's possible when you have the right data infrastructure. 📄 lnkd.in/exgWfeXf Video credit: @DLR_en
56
344
2.3K
173.5K
Diego retweeted
AGI, Inc. (@agi_inc)
AGI, Inc. is now the global leader on the AndroidWorld benchmark, with state-of-the-art verified performance of 97.4%. This is a huge milestone for Android use, and just a sneak preview of what's coming: bringing trustworthy, reliable agents to every screen 🚀
31
87
300
50.2K
Diego retweeted
Lucas Beyer (bl16) (@giffmana)
Did you know that when they say stuff like "The A18 uses TSMC's 3nm process" or "announced the 2nm node", the 3nm, 2nm actually doesn't mean anything?! It's just like a version number. They make it up. Literally nothing measures 2nm or 3nm. I certainly didn't know.
333
534
8.9K
772.9K
Diego retweeted
Crémieux (@cremieuxrecueil)
Waymo is so safe that if every car was driven like a Waymo, about 9% of America's life expectancy gap would disappear. 9 percent. Americans die in car accidents *that often*.
Quoting Crémieux (@cremieuxrecueil):

About 10% of America's life expectancy shortfall is due to motor vehicle accidents—fixed by Waymo and Tesla. About 18% is overdoses—that's fading now. And the lion's share of the rest is obesity-related. America will soon be buying its way out of its poor life expectancy issue.

228
648
6.6K
1.2M
Tiffany Soe (@tiff_soerianto)
github.com/dCaples/AutoDi… Super cool stuff. It creates a database with RAG, generates Q&A from your documents, and then uses that Q&A to fine-tune the Llama 3.1 8B model.
1
0
1
39
Diego (@diegocaples)
@jxmnop Because the models produced by this method are very different than the models learned by gradient descent. While this does give us a “ground truth” to benchmark interp methods on, the results don’t generalize to actual learned models.
0
0
7
320
Jack Morris (@jxmnop)
another incredibly underrated paper: Thinking Like Transformers (Weiss et al, 2021) presents RASP: a programming language that compiles to transformer *weights*. can implement sort(), bincount(), etc. seems important. why don't interpretability people care about this?
35
105
1.2K
83.2K
Diego retweeted
PicoCreator - AI builder @ AIE 🇸🇬
SOTA AI agent that reliably works... where Claude, Gemini, and o3 fail... to do the boring chores in life... @FeatherlessAI is making this possible, as part of our work into AI reliability Surpassing existing frontier models & agents by 50%+
4
21
118
26.9K
Diego retweeted
AGI, Inc. (@agi_inc)
🚀 INTRODUCING REAL Bench: Our New Standard for Web AI Agent Evaluation
We're thrilled to announce the release of REAL Bench - our groundbreaking benchmark to transform how web AI agents are evaluated!
Why we created REAL Bench:
✅ We built functional replicas of popular websites to test what agents can REALLY do
✅ We wanted to measure ACTUAL performance, not academic abstractions
✅ We compared leading frameworks including BrowserUse (31%) and StageHand (19%)
What web tasks would YOU like to see AI agents tackle? Join our community to be part of the agentic revolution reshaping AI! ⚡
👉 Explore REAL Bench → [realevals.xyz]
🛠️ Try REAL Bench and get your REAL score today → [github.com/agi-inc/agisdk]
18
16
161
120K
Diego retweeted
Div Garg (@divgarg)
Learn to build AGI agents you actually want to work with 🔥 Sign up and follow 👉: theagi.company/course In collaboration with @AndrewYNg and @DeepLearningAI!
Quoting Andrew Ng (@AndrewYNg):

New Short Course: Building AI Browser Agents! Learn how to build AI agents that interact and take actions on websites in this course, created in partnership with @agi_inc and taught by @divgarg and @namangarg0, Co-founders of AGI Inc.
AI browser agents can log into websites, fill out forms, click through web pages, or even place orders online for you. They use both visual information, like screenshots, and structural data, like the HTML or Document Object Model (DOM) of a web page, to reason and take action.
With the complexity of webpages and multiple possible actions at each step, it can be challenging for an AI browser agent to complete an assigned task. Because these agents run long action sequences, a single error, like clicking the wrong button or misreading a field, can lead to unexpected outcomes or errors that compound over time.
In this course, you'll understand how autonomous web agents work, their current limitations, and how AgentQ enables them to improve through self-correction. In detail, you'll:
- Learn what web agents are, how they automate tasks online, their architecture, key components, limitations, and an overview of their decision-making strategies.
- Build a web agent that can scrape DeepLearning.AI's website and return course recommendations in a structured output format.
- Build an autonomous web agent that can execute multiple tasks, such as finding and summarizing webpages, filling out a form, and signing up for a newsletter.
- Explore AgentQ, a framework that enables agents to self-correct by combining Monte Carlo Tree Search (MCTS), a self-critique mechanism for continuous improvement, and Direct Preference Optimization (DPO).
- Deep dive into MCTS, learn how it finds an effective path, illustrated by an example of Gridworld animation, and use AgentQ to complete web tasks.
- Understand AI agents' current state and future directions, including key factors shaping their evolution, such as hardware, algorithm innovation, and data availability.
By the end of this course, you will have hands-on experience building browser agents and a deeper understanding of how to make them more robust and reliable. Please sign up here: deeplearning.ai/short-courses/…
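For readers unfamiliar with the MCTS step mentioned in the course outline above: tree search has to decide which candidate action to explore next, and a common rule for that is UCB1, which adds an exploration bonus to each action's average value. The Python sketch below shows only that selection rule. It is an illustration under assumptions, not code from the course or from AgentQ (whose exact selection rule is not given here), and the names `ucb1_score`, `select_child`, and the example action labels are hypothetical.

```python
import math


def ucb1_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1: average value plus an exploration bonus that shrinks with visits."""
    if visits == 0:
        return math.inf  # always try an unvisited action first
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, c=math.sqrt(2)):
    """Pick the action with the highest UCB1 score.

    children maps an action name to (total_value, visits).
    """
    parent_visits = sum(visits for _, visits in children.values())
    return max(
        children,
        key=lambda name: ucb1_score(*children[name], parent_visits, c),
    )


# Hypothetical statistics for three candidate actions at one page state.
stats = {"click_submit": (9.0, 10), "open_menu": (0.5, 1), "scroll": (0.0, 0)}
print(select_child(stats))
```

With `c=0` the rule becomes purely greedy; larger values of `c` favor rarely tried actions, which is how the search avoids committing too early to one path.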

6
8
71
15.4K
Diego retweeted
DeepLearning.AI (@DeepLearningAI)
AI agents that can browse the web, fill out forms, and even place online orders are no longer just research demos—they’re being built today. But real-world websites are complex. Layouts change. Popups appear. And one wrong click can cascade into booking the wrong flight or buying the wrong product. In our new course, Building AI Browser Agents, made in collaboration with @agi_inc, you’ll learn how to build web agents and how to make them more reliable using AgentQ, a framework that helps agents self-correct. Guided by instructors @divgarg and @namangarg0, you’ll build agents step-by-step: from scraping and summarizing, to signing up for newsletters, to navigating the open web and choosing optimal actions. 👉 Learn for free: hubs.la/Q03hDWK10
11
74
420
38.2K
Diego retweeted
AGI, Inc. (@agi_inc)
Good AGI agents complete tasks. Great ones check their own work. Discover how to build them in our new course with @DeepLearningAI. Enroll now: bit.ly/4i9yR8U
Quoting DeepLearning.AI (@DeepLearningAI):

AI agents that can browse the web, fill out forms, and even place online orders are no longer just research demos—they’re being built today. But real-world websites are complex. Layouts change. Popups appear. And one wrong click can cascade into booking the wrong flight or buying the wrong product. In our new course, Building AI Browser Agents, made in collaboration with @agi_inc, you’ll learn how to build web agents and how to make them more reliable using AgentQ, a framework that helps agents self-correct. Guided by instructors @divgarg and @namangarg0, you’ll build agents step-by-step: from scraping and summarizing, to signing up for newsletters, to navigating the open web and choosing optimal actions. 👉 Learn for free: hubs.la/Q03hDWK10

7
32
65
13.7K
Diego retweeted
AGI House SF (@AGIHouseSF)
Anthropic brought Model Context Protocol to life. We gathered 200+ elite hackers for 12 hours to build the open source future of AI agent connections. Here's what we saw at the Finally Connected MCP Hackathon, where LLMs met the real world, with @AnthropicAI, @SmitheryDotAI, @kodjima33, @ExaAILabs, by @JvNixon:
5
23
95
21.1K
Diego retweeted
AGI House SF (@AGIHouseSF)
1/ AutoMCP 🥇 1st Place ToolMaster RL - Training open-source LLMs to excel with MCPs through reinforcement learning. This project creates an environment where models learn tool usage through trial and error rather than prompt engineering. "Reinforcement Learning is All You Need" for transforming mediocre open-source models into tool-using experts that rival closed-source alternatives. Diego Caples, @diegocaples Thomas Joshi, @thomastjoshi Meghna Natraj, @NatrajMeghna Xiangyi Li, @xdotli
6
6
31
5.6K
Donald J. Trump (@realDonaldTrump)
On Trade, I have decided, for purposes of Fairness, that I will charge a RECIPROCAL Tariff meaning, whatever Countries charge the United States of America, we will charge them - No more, no less! For purposes of this United States Policy, we will consider Countries that use the VAT System, which is far more punitive than a Tariff, to be similar to that of a Tariff. Sending merchandise, product, or anything by any other name through another Country, for purposes of unfairly harming America, will not be accepted. In addition, we will make provision for subsidies provided by Countries in order to take Economic advantage of the United States. Likewise, provisions will be made for Nonmonetary Tariffs and Trade Barriers that some Countries charge in order to keep our product out of their domain or, if they do not even let U.S. businesses operate. We are able to accurately determine the cost of these Nonmonetary Trade Barriers. It is fair to all, no other Country can complain and, in some cases, if a Country feels that the United States would be getting too high a Tariff, all they have to do is reduce or terminate their Tariff against us. There are no Tariffs if you manufacture or build your product in the United States. For many years, the U.S. has been treated unfairly by other Countries, both friend and foe. This System will immediately bring Fairness and Prosperity back into the previously complex and unfair System of Trade. America has helped many Countries throughout the years, at great financial cost. It is now time that these Countries remember this, and treat us fairly – A LEVEL PLAYING FIELD FOR AMERICAN WORKERS. I have instructed my Secretary of State, Secretary of Commerce, Secretary of the Treasury, and United States Trade Representative (USTR) to do all work necessary to deliver RECIPROCITY to our System of Trade!
33K
63.3K
417.5K
76.1M
Diego (@diegocaples)
@jelmerdeboer @confusionm8trix You're vastly more likely to invent something if it's named after you than if it isn't. Stigler's Law is just stating the obvious fact that most people didn't invent any given thing. Bad statistics masquerading as insight
0
0
4
386