Diego (@diegocaples) - Twitter Profile

Diego retweeted

Greg Tarr@Greg_Tarr·Oct 23

we @agi_inc just achieved 76.3% on the OSWorld benchmark taking the #1 spot from ByteDance (53.1%)

Diego retweeted

AGI, Inc.@agi_inc·Oct 23

AGI surpasses human-level performance at computer use. We’re excited to announce that AGI, Inc. is now the global leader on OSWorld-Verified, the industry benchmark for AI computer-control. agi-0 is the first agent to reach a superhuman score on OSWorld, with a score of 76.2%.🔥 Learn more about it in our company blog post from @_gundawar: 👇 theagi.company/blog/osworld

Diego retweeted

Weights & Biases@wandb·Oct 14

🏆 Grand Prize Winners: Daydreamer @diegocaples @_gundawar They're tackling the "GPT Moment for Robotics." Their agent uses a video diffusion model to imagine a successful outcome, executes it in the real world, and then uses VLM feedback to self-improve, training only on its successes.

Diego retweeted

Stephen James@stepjamUK·Oct 14

𝗗𝗟𝗥 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵𝗲𝗿𝘀 𝗴𝗮𝘃𝗲 𝗮 𝗿𝗼𝗯𝗼𝘁𝗶𝗰 𝗮𝗿𝗺 𝗳𝘂𝗹𝗹-𝗯𝗼𝗱𝘆 𝘁𝗼𝘂𝗰𝗵 𝘀𝗲𝗻𝘀𝗶𝘁𝗶𝘃𝗶𝘁𝘆 𝘄𝗶𝘁𝗵 𝗻𝗼 𝗮𝗿𝘁𝗶𝗳𝗶𝗰𝗶𝗮𝗹 𝘀𝗸𝗶𝗻 𝗻𝗲𝗲𝗱𝗲𝗱. They used internal force-torque sensors at 8 kHz + deep learning. The robot can feel where you touch it, recognize letters drawn on its surface, and respond to virtual buttons placed anywhere on its body. What's interesting is the infrastructure behind it. To train these models, you need high-frequency sensor streams, manifold learning to unfold trajectories, and the ability to iterate fast. They collected 2,300 samples from 20 people and hit 95.5% accuracy on digit recognition. This is what's possible when you have the right data infrastructure. 📄 lnkd.in/exgWfeXf Video credit: @DLR_en

Diego retweeted

AGI, Inc.@agi_inc·Oct 11

AGI, Inc. is now the global leader on the AndroidWorld benchmark, with state-of-the-art verified performance of 97.4% This is a huge milestone for Android use, and just a sneak preview of what's coming - bringing trustworthy, reliable agents to every screen 🚀

Diego retweeted

Lucas Beyer (bl16)@giffmana·Sep 24

Did you know that when they say stuff like "The A18 uses TSMC's 3nm process" or "announced the 2nm node" The 3nm, 2nm actually doesn't mean anything?! It's just like a version number. They make it up. Literally nothing measures 2nm or 3nm. I certainly didn't know.

Diego retweeted

Crémieux@cremieuxrecueil·Sep 17

Waymo is so safe that if every car was driven like a Waymo, about 9% of America's life expectancy gap would disappear. 9 percent. Americans die in car accidents *that often*.

Crémieux@cremieuxrecueil

About 10% of America's life expectancy shortfall is due to motor vehicle accidents—fixed by Waymo and Tesla. About 18% is overdoses—that's fading now. And the lion's share of the rest is obesity-related. America will soon be buying its way out of its poor life expectancy issue.

Diego@diegocaples·Jun 26

@tiff_soerianto Thanks!

Tiffany Soe@tiff_soerianto·Jun 26

@diegocaples great work!

Tiffany Soe@tiff_soerianto·Jun 26

github.com/dCaples/AutoDi… super cool stuff. Creates a database with RAG and then generates Q&A from your documents, which it then uses to train/fine-tune the model Llama 3.1 8B.

Diego@diegocaples·Jun 24

@jxmnop Because the models produced by this method are very different than the models learned by gradient descent. While this does give us a “ground truth” to benchmark interp methods on, the results don’t generalize to actual learned models.

Jack Morris@jxmnop·Jun 24

another incredibly underrated paper: Thinking Like Transformers (Weiss et al, 2021) presents RASP: a programming language that compiles to transformer *weights*. can implement sort(), bincount(), etc. seems important. why don't interpretability people care about this?

Diego retweeted

PicoCreator - AI builder @ AIE 🇸🇬@picocreator·May 24

SOTA AI agent that reliably works... where Claude, Gemini, and o3 fail... to do the boring chores in life... @FeatherlessAI is making this possible, as part of our work into AI reliability Surpassing existing frontier models & agents by 50%+

Diego retweeted

AGI, Inc.@agi_inc·Apr 18

🚀 INTRODUCING REAL Bench: Our New Standard for Web AI Agent Evaluation We're thrilled to announce the release of REAL Bench - our groundbreaking benchmark to transform how web AI agents are evaluated! Why we created REAL Bench: ✅ We built functional replicas of popular websites to test what agents can REALLY do ✅ We wanted to measure ACTUAL performance, not academic abstractions ✅ We compared leading frameworks including BrowserUse (31%) and StageHand (19%) What web tasks would YOU like to see AI agents tackle? Join our community to be part of the agentic revolution reshaping AI! ⚡ 👉 Explore REAL Bench → [realevals.xyz] 🛠️ Try REAL Bench and get your REAL score today → [github.com/agi-inc/agisdk]

Diego retweeted

Div Garg@divgarg·Apr 16

Learn to build AGI agents you actually want to work with 🔥 Sign up and follow 👉: theagi.company/course In collaboration with @AndrewYNg and @DeepLearningAI!

Andrew Ng@AndrewYNg

New Short Course: Building AI Browser Agents! Learn how to build AI agents that interact and take actions on websites in this course, created in partnership with @agi_inc and taught by @divgarg and @namangarg0, Co-founders of AGI Inc. AI browser agents can log into websites, fill out forms, click through web pages, or even place orders online for you. They use both visual information, like screenshots, and structural data, like the HTML or Document Object Model (DOM) of a web page, to reason and take action. With the complexity of webpages and multiple possible actions at each step, it can be challenging for an AI browser agent to complete an assigned task. Because these agents run long action sequences, a single error, like clicking the wrong button or misreading a field, can lead to unexpected outcomes or errors that compound over time. In this course, you'll understand how autonomous web agents work, their current limitations, and how AgentQ enables them to improve through self-correction. In detail, you'll:
- Learn what web agents are, how they automate tasks online, their architecture, key components, limitations, and an overview of their decision-making strategies.
- Build a web agent that can scrape DeepLearning.AI's website and return course recommendations in a structured output format.
- Build an autonomous web agent that can execute multiple tasks, such as finding and summarizing webpages, filling out a form, and signing up for a newsletter.
- Explore AgentQ, a framework that enables agents to self-correct by combining Monte Carlo Tree Search (MCTS), a self-critique mechanism for continuous improvement, and Direct Preference Optimization (DPO).
- Deep dive into MCTS, learn how it finds an effective path, illustrated by an example of Gridworld animation, and use AgentQ to complete web tasks.
- Understand AI agents' current state and future directions, including key factors shaping their evolution, such as hardware, algorithm innovation, and data availability.
By the end of this course, you will have hands-on experience building browser agents and a deeper understanding of how to make them more robust and reliable. Please sign up here: deeplearning.ai/short-courses/…

Diego retweeted

DeepLearning.AI@DeepLearningAI·Apr 16

AI agents that can browse the web, fill out forms, and even place online orders are no longer just research demos—they’re being built today. But real-world websites are complex. Layouts change. Popups appear. And one wrong click can cascade into booking the wrong flight or buying the wrong product. In our new course, Building AI Browser Agents, made in collaboration with @agi_inc, you’ll learn how to build web agents and how to make them more reliable using AgentQ, a framework that helps agents self-correct. Guided by instructors @divgarg and @namangarg0, you’ll build agents step-by-step: from scraping and summarizing, to signing up for newsletters, to navigating the open web and choosing optimal actions. 👉 Learn for free: hubs.la/Q03hDWK10

Diego retweeted

AGI, Inc.@agi_inc·Apr 16

Good AGI agents complete tasks. Great ones check their own work. Discover how to build them in our new course with @DeepLearningAI Enroll Now! bit.ly/4i9yR8U

Diego retweeted

Meghna Natraj@NatrajMeghna·Apr 12

We won 1st Place! 🏆 Our hackathon project 'AutoRL: Reinforcement Learning is all you Need' trains open-source LLMs via RL to master tools (MCPs) rivaling closed-source models. Proud of the team: @diegocaples, @thomastjoshi, @xdotli! Thank you @JvNixon! #RL #LLM #ML #AI #AGIHouse

AGI House SF@AGIHouseSF

1/ AutoMCP 🥇 1st Place ToolMaster RL - Training open-source LLMs to excel with MCPs through reinforcement learning. This project creates an environment where models learn tool usage through trial and error rather than prompt engineering. "Reinforcement Learning is All You Need" for transforming mediocre open-source models into tool-using experts that rival closed-source alternatives. Diego Caples, @diegocaples Thomas Joshi, @thomastjoshi Meghna Natraj, @NatrajMeghna Xiangyi Li, @xdotli

Diego retweeted

AGI House SF@AGIHouseSF·Apr 11

Anthropic brought Model Context Protocol to life. We gathered 200+ elite hackers for 12 hours to build the open source future of AI agent connections. Here's what we saw at the Finally Connected MCP Hackathon, where LLMs met the real world, with @AnthropicAI, @SmitheryDotAI, @kodjima33, @ExaAILabs, by @JvNixon:

Diego retweeted

AGI House SF@AGIHouseSF·Apr 11

1/ AutoMCP 🥇 1st Place ToolMaster RL - Training open-source LLMs to excel with MCPs through reinforcement learning. This project creates an environment where models learn tool usage through trial and error rather than prompt engineering. "Reinforcement Learning is All You Need" for transforming mediocre open-source models into tool-using experts that rival closed-source alternatives. Diego Caples, @diegocaples Thomas Joshi, @thomastjoshi Meghna Natraj, @NatrajMeghna Xiangyi Li, @xdotli

Diego@diegocaples·Feb 18

@rg7777777777 @realDonaldTrump

Donald J. Trump@realDonaldTrump·Feb 17

On Trade, I have decided, for purposes of Fairness, that I will charge a RECIPROCAL Tariff meaning, whatever Countries charge the United States of America, we will charge them - No more, no less! For purposes of this United States Policy, we will consider Countries that use the VAT System, which is far more punitive than a Tariff, to be similar to that of a Tariff. Sending merchandise, product, or anything by any other name through another Country, for purposes of unfairly harming America, will not be accepted. In addition, we will make provision for subsidies provided by Countries in order to take Economic advantage of the United States. Likewise, provisions will be made for Nonmonetary Tariffs and Trade Barriers that some Countries charge in order to keep our product out of their domain or, if they do not even let U.S. businesses operate. We are able to accurately determine the cost of these Nonmonetary Trade Barriers. It is fair to all, no other Country can complain and, in some cases, if a Country feels that the United States would be getting too high a Tariff, all they have to do is reduce or terminate their Tariff against us. There are no Tariffs if you manufacture or build your product in the United States. For many years, the U.S. has been treated unfairly by other Countries, both friend and foe. This System will immediately bring Fairness and Prosperity back into the previously complex and unfair System of Trade. America has helped many Countries throughout the years, at great financial cost. It is now time that these Countries remember this, and treat us fairly – A LEVEL PLAYING FIELD FOR AMERICAN WORKERS. I have instructed my Secretary of State, Secretary of Commerce, Secretary of the Treasury, and United States Trade Representative (USTR) to do all work necessary to deliver RECIPROCITY to our System of Trade!

Diego@diegocaples·Feb 16

@jelmerdeboer @confusionm8trix You're vastly more likely to invent something if it's named after you than if it isn't. Stigler's Law is just stating the obvious fact that most people didn't invent any given thing. Bad statistics masquerading as insight

Jelmer de Boer@jelmerdeboer·Feb 15

@confusionm8trix

machine yearning engineer@confusionm8trix·Feb 15

I think about this a lot
