Caleb Winston

37 posts

@realcalebwin

AI systems research @Stanford

Joined September 2018
82 Following · 117 Followers
Pinned Tweet
Caleb Winston@realcalebwin·
Introducing BLAST, a step towards automating all the boring work we do inside web browsers.

pip install blastai && blastai serve

While prior art (OpenAI Operator, Manus AI) executes single-threaded, BLAST is multi-threaded (see GIF). Check us out: blastproject.org
GIF
1 reply · 1 repost · 10 likes · 1.2K views
Caleb Winston@realcalebwin·
@bitsandhops Is there really a wider Overton window for dev tooling? You could argue it's narrower, since LLMs are trained on devtool abstractions that have been around for a while, like Dockerfiles and package managers. Sure, you can change what's under the hood, but the abstractions, really?
1 reply · 0 reposts · 0 likes · 47 views
Richard Bishop@bitsandhops·
The problem with file systems being the abstraction for agent sandboxes is, well, they require files. Files are the string that, when pulled, untangles the whole sweater: here comes a shell, POSIX I/O, hard-coded linker paths in your builds, and worst of all, package managers. LLMs can of course sort through all of this for you, but why waste time and tokens on it? These superficial sandbox startup benchmarks mean nothing when an LLM spends minutes reaching around in the dark.

I understand why sandbox startups choose the file system as the abstraction: it's ubiquitous and a large tent for a business looking for traction. But in the long run there's far more to gain by taking a stance and building an opinionated abstraction. The Overton window for dev tooling has never been wider. Perhaps it is time to seize that rather than burden ourselves with Dockerfiles, package managers, and all the rest of yesterday's baggage.
1 reply · 0 reposts · 0 likes · 623 views
BinBin@binsquares·
Got a lot of questions on how I spin up Linux VMs so quickly. The explanation is pretty straightforward: Linux was built in the 90s. Hardware improved more than 1000x, but Linux virtual machine startup times stayed relatively the same. Turns out we kept adding junk to the Linux kernel and its boot-up operations. So all I did was cut and remove unnecessary parts until it still worked. This also ended up getting boot times to under 1s.
9 replies · 5 reposts · 141 likes · 14.2K views
Caleb Winston@realcalebwin·
@GergelyOrosz Add to this the weird interplay between OSS and crypto coins, which is sadly part of what has fueled GH star buying.
0 replies · 0 reposts · 0 likes · 451 views
Gergely Orosz@GergelyOrosz·
This has been an open secret for at least 18-24 months. GH stars have been heavily purchased by many projects (not all!) that tried to show traction and drum up VC investment. Better VCs have had custom tools to rank organic vs paid GH stars since 18 months back, easily...
Andras Bacsai@heyandras

wtf

31 replies · 36 reposts · 634 likes · 84.4K views
Caleb Winston@realcalebwin·
@gwenshap I'd say sandbox snapshot/pause/resume latency is generally more important, as it determines whether you can save $$$ without increasing turnaround time by 1.5x or more.
0 replies · 0 reposts · 0 likes · 81 views
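The latency argument above is really an arithmetic one: pausing a sandbox between agent turns only pays off if resume time is small relative to the turn itself. A toy back-of-the-envelope sketch, where all numbers are illustrative assumptions rather than measurements:

```python
# Toy model: each agent turn is LLM thinking time plus (if the sandbox was
# paused to save money) the time to resume it. All inputs are assumptions.

def turn_time(llm_seconds: float, resume_seconds: float) -> float:
    """Wall-clock time for one turn when the sandbox must resume first."""
    return resume_seconds + llm_seconds

def slowdown(llm_seconds: float, resume_seconds: float) -> float:
    """Turnaround multiplier versus keeping the sandbox always warm."""
    return turn_time(llm_seconds, resume_seconds) / llm_seconds

# Assume 10 s of LLM thinking per turn.
print(f"fast resume: {slowdown(10.0, 0.5):.2f}x turnaround")  # sub-second resume
print(f"slow resume: {slowdown(10.0, 8.0):.2f}x turnaround")  # multi-second resume
```

With a sub-second resume the user barely notices (1.05x), while a multi-second resume pushes turnaround toward the 1.5x+ territory the tweet warns about.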
Gwen (Chen) Shapira@gwenshap·
I keep hearing "agents can't wait". Sandboxes and DBs must spin up as fast as possible. But I don't understand why. Agents typically work in the background; a few extra seconds won't matter. And I see Codex use Docker, waiting for it as long as needed. What am I missing?
37 replies · 0 reposts · 102 likes · 18K views
Matei Zaharia@matei_zaharia·
@databricks Definitely unexpected! It wouldn't have been possible without my collaborators at Databricks and my grad students.
19 replies · 5 reposts · 198 likes · 16.5K views
Databricks@databricks·
We're incredibly proud to congratulate our co-founder and CTO, @matei_zaharia, on receiving the ACM Prize in Computing for his development of distributed data systems that have enabled large-scale machine learning, analytics, and AI.

Matei's open-source contributions have fundamentally changed how organizations work with data and AI, including Apache Spark™, Delta Lake, and MLflow. Researchers, nonprofits, startups, and enterprises across every industry have built on the foundation he helped create. Now he's pushing the frontier further, focusing on building and scaling reliable AI agents through open-source research like DSPy and GEPA.

Matei, this recognition is so well deserved. We're honored to build alongside you every day. awards.acm.org/about/2025-acm…
Databricks tweet media
3 replies · 23 reposts · 217 likes · 28.5K views
Caleb Winston@realcalebwin·
My take on the whole OpenClaw space is, being charitable, that there is merit in the concept of "claw" = "agents on a cron schedule". Is it technically revolutionary? No, but neither was ChatGPT🤷‍♂️
0 replies · 0 reposts · 0 likes · 59 views
Caleb Winston@realcalebwin·
Postgres as well! Did a small experiment in running a "claw" on PostgreSQL a few days back: github.com/calebwin/pgclaw. Unsurprisingly, the reaction was part clowning the concept, but others really resonated with the idea of an "AI employee with a computer in every row of my DB".
Andrej Karpathy@karpathy

Bought a new Mac mini to properly tinker with claws over the weekend. The apple store person told me they are selling like hotcakes and everyone is confused :)

I'm definitely a bit sus'd to run OpenClaw specifically - giving my private data/keys to a 400K-line vibe-coded monster that is being actively attacked at scale is not very appealing at all. Already seeing reports of exposed instances, RCE vulnerabilities, supply chain poisoning, malicious or compromised skills in the registry. It feels like a complete wild west and a security nightmare.

But I do love the concept, and I think that just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls, and a kind of persistence to a next level.

Looking around, and given that the high-level idea is clear, there are a lot of smaller Claws starting to pop out. For example, on a quick skim NanoClaw looks really interesting in that the core engine is ~4000 lines of code (fits into both my head and that of AI agents, so it feels manageable, auditable, flexible, etc.) and runs everything in containers by default. I also love their approach to configurability - it's not done via config files, it's done via skills! For example, /add-telegram instructs your AI agent how to modify the actual code to integrate Telegram. I haven't come across this yet and it slightly blew my mind earlier today as a new, AI-enabled approach to preventing config mess and if-then-else monsters. Basically, the implied new meta is to write the most maximally forkable repo and then have skills that fork it into any desired more exotic configuration. Very cool.

Anyway, there are many others - e.g. nanobot, zeroclaw, ironclaw, picoclaw (lol @ prefixes). There are also cloud-hosted alternatives, but tbh I don't love these because they feel much harder to tinker with. In particular, local setup allows easy connection to home automation gadgets on the local network. And I don't know, there is something aesthetically pleasing about there being a physical device 'possessed' by a little ghost of a personal digital house elf. Not 100% sure what my setup ends up looking like just yet, but Claws are an awesome, exciting new layer of the AI stack.

1 reply · 0 reposts · 0 likes · 124 views
Garry Tan@garrytan·
Claude Code is unoptimized. How can I tell? I needed to do video transcription, and out of the box it said to use the OpenAI Whisper-1 API, which is practically deprecated and 200x slower than Groq's Whisper models out of the box. We're so early!
181 replies · 14 reposts · 1.4K likes · 159.5K views
Yohei@yoheinakajima·
if you’re watching all of this and your first instinct is to start building your own agent from scratch, i want to be your friend. drop one of your favorite unique agent-building tactics here, and if i like it, i’ll invite you to a small DM group for sharing ideas and questions around building better autonomous agents (i’m rebuilding now and have lots of fun ideas and very specific questions, but don’t want to spam the public feed)
227 replies · 10 reposts · 430 likes · 34.5K views
Caleb Winston@realcalebwin·
I'm claiming my AI agent "BottyBotBot" on @moltbook 🦞 Verification: swim-42TK
0 replies · 0 reposts · 0 likes · 92 views
shafu@shafu0x·
I actually thought I would never say this but combining AI and crypto actually makes fucking sense now
220 replies · 97 reposts · 1.6K likes · 130.1K views
Caleb Winston@realcalebwin·
@mattpocockuk IMO "agent compilers" will be able to do all of this and even optimize the AST to minimize nondeterministic behavior. We're building an extensible framework around agent codegen that can already replace most of what langchain/langgraph do: github.com/stanford-mast/…
0 replies · 0 reposts · 1 like · 45 views
Matt Pocock@mattpocockuk·
Does this exist: A code execution sandbox for LLMs that:
- Allows the LLM to call a TS/JS script
- Allows you to pass arbitrary functions to that script
- Does AST analysis to create an EXTREMELY strict subset of TS
- Uses the AST analysis to extract granular permissions (i.e. the LLM wants to call this tool with these args; is that OK?)
- Runs the script only if it's OK, errors to the LLM if not
68 replies · 0 reposts · 220 likes · 71.2K views
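The checklist in the tweet above (the original asks for TS/JS) boils down to static analysis plus a gate before execution. A minimal Python sketch of the same idea; the tool name `fetch_page` and the allow-list policy are hypothetical, and a real sandbox would also have to reject attribute calls, imports, and most builtins:

```python
import ast

# Hypothetical granular permission policy: functions the script may call.
ALLOWED_CALLS = {"fetch_page", "print"}

def extract_calls(source: str) -> set:
    """Statically collect every function called by bare name in the script."""
    calls = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls.add(node.func.id)
    return calls

def run_if_permitted(source: str, tools: dict) -> None:
    """Run the LLM-written script only if every call is on the allow-list."""
    denied = extract_calls(source) - ALLOWED_CALLS
    if denied:
        raise PermissionError(f"script calls forbidden functions: {denied}")
    # Execute in a stripped-down environment: only print and the passed tools.
    exec(compile(source, "<agent>", "exec"), {"__builtins__": {"print": print}, **tools})

tools = {"fetch_page": lambda url: f"<html for {url}>"}
run_if_permitted("print(fetch_page('https://example.com'))", tools)  # allowed

try:
    run_if_permitted("delete_all_files()", tools)  # denied before it ever runs
except PermissionError as e:
    print("blocked:", e)
```

The key property, matching the last bullet above, is that the denied script is rejected by analysis alone and never executes.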
Caleb Winston@realcalebwin·
@nileshtrivedi @AnthropicAI The interesting thing is despite CodeAct being a year old, people still opt for running agents in a static while loop program. I think it's because (1) codegen overhead (2) risk of unbounded codegen. We're hoping to solve these with agent compilers github.com/stanford-mast/…
0 replies · 0 reposts · 1 like · 40 views
Nilesh Trivedi@nileshtrivedi·
@AnthropicAI Strange that people are not giving credit to the CodeAct paper:
Nilesh Trivedi tweet media
11 replies · 7 reposts · 126 likes · 18.4K views
Anthropic@AnthropicAI·
New on the Anthropic Engineering blog: tips on how to build more efficient agents that handle more tools while using fewer tokens. Code execution with the Model Context Protocol (MCP): anthropic.com/engineering/co…
134 replies · 456 reposts · 3.6K likes · 1.7M views
Caleb Winston@realcalebwin·
@alxnderhughes Worth pointing out the downsides of "code mode" or CodeAct... (1) overhead of generating code (2) possibility of generating wrong or suboptimal code. Agent compilers that are tuned to sample optimal code should help. See github.com/stanford-mast/…
0 replies · 0 reposts · 0 likes · 13 views
Alex Hughes@alxnderhughes·
🚨 Anthropic might’ve just fixed the biggest pain in AI agents.

You know how every agent today burns through tokens like jet fuel? Every tool call, every variable, every definition shoved into context. Expensive. Slow. Messy.

Anthropic’s answer: Code Execution with MCP. Instead of calling tools directly, agents now write code to call them. It’s like giving your agent a brain and a keyboard.

The results are absurd:
→ 98.7% fewer tokens
→ 10x faster task completion
→ Zero context bloat
→ Zero data leakage

Old agents talk about what to do. New agents just code it and do it. Cloudflare called it “Code Mode.” Anthropic just made it real. This is a huge update... the moment AI agents stopped prompting and started programming.
Alex Hughes tweet media
54 replies · 103 reposts · 670 likes · 69.7K views
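The pattern described above is easy to sketch: rather than the model emitting one tool call per step (each round-tripping through its context), it emits a small program that chains the tools locally, and only the final result returns to the model. The tools and the generated script here are hypothetical stand-ins, not real MCP servers:

```python
def search_orders(customer: str) -> list:
    """Pretend MCP tool: look up a customer's orders."""
    return [{"id": 1, "total": 40}, {"id": 2, "total": 60}]

def refund(order_id: int) -> str:
    """Pretend MCP tool: issue a refund."""
    return f"refunded order {order_id}"

# One script a code-mode agent might emit, instead of N separate tool calls;
# the intermediate order data never round-trips through the model's context.
agent_script = """
results = []
for order in search_orders("alice"):
    if order["total"] > 50:
        results.append(refund(order["id"]))
"""

scope = {"search_orders": search_orders, "refund": refund}
exec(agent_script, scope)
print(scope["results"])
```

The token savings claimed in the tweet come from exactly this: the loop and the filter run in the sandbox, so only `results` would need to re-enter the context window.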
Ado@adocomplete·
Skills = Instructions on HOW to do your workflows MCP = Connections to WHAT you need access to Skills teach. MCP connects. Often you need both and they work alongside each other really well!
Ado tweet media
30 replies · 74 reposts · 842 likes · 87.4K views
Caleb Winston@realcalebwin·
If you'd like to use/deploy/contribute to A1, please don't hesitate to:
- pip install a1-compiler
- discord.gg/NqrkJwYYh4
- DM me.
0 replies · 0 reposts · 0 likes · 103 views
Caleb Winston@realcalebwin·
... Agent.aot() and Agent.jit() can generate programs tuned for each unique agent and even each task.
1 reply · 0 reposts · 0 likes · 103 views
Caleb Winston@realcalebwin·
Introducing Agent Compilers! A framework optimized for AI agents that work by writing & running code. github.com/stanford-mast/…

pip install a1-compiler

A1 solves 3 key problems with "agent code": (1) generating code is slow, (2) execution can be even slower, and (3) it's nondeterministic.
1 reply · 1 repost · 3 likes · 211 views
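The thread above distinguishes ahead-of-time (`Agent.aot()`) from just-in-time (`Agent.jit()`) code generation for agents. This is a hand-written sketch of that distinction, not the actual a1-compiler API: "aot" would pay the codegen cost once per agent and reuse the program, while "jit" would specialize a program per task. The codegen step is a stub returning canned code instead of calling an LLM:

```python
def llm_codegen(spec: str) -> str:
    """Stand-in for an LLM sampling a program from a spec (no real model)."""
    return f"def run(task):\n    return 'handled ' + task + ' via {spec}'"

class Agent:
    def __init__(self, spec: str):
        self.spec = spec

    def aot(self):
        """Compile once per agent; every task reuses the same program."""
        ns = {}
        exec(llm_codegen(self.spec), ns)
        return ns["run"]

    def jit(self, task: str):
        """Specialize per task: more codegen overhead, more tuned programs."""
        ns = {}
        exec(llm_codegen(f"{self.spec}/{task}"), ns)
        return ns["run"]

agent = Agent("browse")
run = agent.aot()                   # codegen cost paid once up front
print(run("task1"))
print(run("task2"))
print(agent.jit("task3")("task3"))  # codegen cost paid per task
```

The trade-off mirrors the tweet's problems (1) and (2): AOT amortizes slow generation across tasks, while JIT spends more generation time to produce a program tuned to one task.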