Caleb Winston

37 posts

@realcalebwin

AI systems research @Stanford

Joined September 2018
82 Following · 117 Followers
Pinned Tweet
Caleb Winston@realcalebwin·
Introducing BLAST, a step towards automating all the boring work we do inside web browsers.

pip install blastai && blastai serve

While prior art (OpenAI Operator, Manus AI) executes single-threaded, BLAST is multi-threaded (see GIF). Check us out: blastproject.org
GIF
1 reply · 1 repost · 10 likes · 1.2K views
Caleb Winston@realcalebwin·
@bitsandhops Is there really a wider Overton window for dev tooling? You could argue it's narrower, since LLMs are trained on devtool abstractions that have been around for a while, like Dockerfiles and package managers. Sure, you can change what's under the hood, but the abstractions, really?
1 reply · 0 reposts · 0 likes · 47 views
Richard Bishop@bitsandhops·
The problem with file systems being the abstraction for agent sandboxes is, well, they require files. Files are the string that, when pulled, untangles the whole sweater: here comes a shell, POSIX I/O, hard-coded linker paths in your builds, and worst of all, package managers. LLMs can of course sort through all of this for you, but why waste time and tokens on it? These superficial sandbox startup benchmarks mean nothing when an LLM spends minutes reaching around in the dark.

I understand why sandbox startups choose the file system as the abstraction: it's ubiquitous and a large tent for a business looking for traction. But in the long run there's far more to gain by taking a stance and building an opinionated abstraction. The Overton window for dev tooling has never been wider. Perhaps it is time to seize that rather than burden ourselves with Dockerfiles, package managers, and all the rest of yesterday's baggage.
1 reply · 0 reposts · 0 likes · 623 views
BinBin@binsquares·
Got a lot of questions on how I spin up Linux VMs so quickly. The explanation is pretty straightforward: Linux was built in the 90s. Hardware improved more than 1000x, but Linux virtual machine startup times stayed relatively the same. Turns out we kept adding junk to the Linux kernel and its boot-up operations. So all I did was cut and remove unnecessary parts until it still worked. This also ended up getting boot times to under 1s.
9 replies · 5 reposts · 141 likes · 14.2K views
Caleb Winston@realcalebwin·
@GergelyOrosz Add to this the weird interplay between OSS and crypto coins, which is sadly part of what has fueled GH star buying.
0 replies · 0 reposts · 0 likes · 451 views
Gergely Orosz@GergelyOrosz·
This has been an open secret for at least 18-24 months. GH stars have been heavily purchased by many projects (not all!) that tried to show traction and drum up VC investment. Better VCs have had custom tools to rank organic vs paid GH stars since 18 months back, easily...
Andras Bacsai@heyandras

wtf

31 replies · 36 reposts · 634 likes · 84.4K views
Caleb Winston@realcalebwin·
@gwenshap I'd say sandbox snapshot/pause/resume latency is generally more important, as it determines whether you can save $$$ without increasing turnaround time by 1.5x or more.
0 replies · 0 reposts · 0 likes · 81 views
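The latency argument above is really an arithmetic one: pausing a sandbox between agent turns only pays off if resume time is small relative to the turn itself. A toy back-of-the-envelope sketch, where all numbers are illustrative assumptions rather than measurements:

```python
# Toy model: each agent turn is LLM thinking time plus (if the sandbox was
# paused to save money) the time to resume it. All inputs are assumptions.

def turn_time(llm_seconds: float, resume_seconds: float) -> float:
    """Wall-clock time for one turn when the sandbox must resume first."""
    return resume_seconds + llm_seconds

def slowdown(llm_seconds: float, resume_seconds: float) -> float:
    """Turnaround multiplier versus keeping the sandbox always warm."""
    return turn_time(llm_seconds, resume_seconds) / llm_seconds

# Assume 10 s of LLM thinking per turn.
print(f"fast resume: {slowdown(10.0, 0.5):.2f}x turnaround")  # sub-second resume
print(f"slow resume: {slowdown(10.0, 8.0):.2f}x turnaround")  # multi-second resume
```

With a sub-second resume the user barely notices (1.05x), while a multi-second resume pushes turnaround toward the 1.5x+ territory the tweet warns about.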
Gwen (Chen) Shapira@gwenshap·
I keep hearing "agents can't wait". Sandboxes and DBs must spin up as fast as possible. But I don't understand why. Agents typically work in the background; a few extra seconds won't matter. And I see Codex use Docker, waiting for it as long as needed. What am I missing?
37 replies · 0 reposts · 102 likes · 18K views
Matei Zaharia@matei_zaharia·
@databricks Definitely unexpected! It wouldn't have been possible without my collaborators at Databricks and my grad students.
19 replies · 5 reposts · 198 likes · 16.5K views
Databricks@databricks·
We're incredibly proud to congratulate our co-founder and CTO, @matei_zaharia, on receiving the ACM Prize in Computing for his development of distributed data systems that have enabled large-scale machine learning, analytics, and AI.

Matei's open-source contributions have fundamentally changed how organizations work with data and AI, including Apache Spark™, Delta Lake, and MLflow. Researchers, nonprofits, startups, and enterprises across every industry have built on the foundation he helped create. Now he's pushing the frontier further, focusing on building and scaling reliable AI agents through open-source research like DSPy and GEPA.

Matei, this recognition is so well deserved. We're honored to build alongside you every day. awards.acm.org/about/2025-acm…
Databricks tweet media
3 replies · 23 reposts · 217 likes · 28.5K views
Caleb Winston@realcalebwin·
My take on the whole OpenClaw space is, being charitable, that there is merit in the concept of "claw" = "agents on a cron schedule". Is it technically revolutionary? No, but neither was ChatGPT🤷‍♂️
0 replies · 0 reposts · 0 likes · 59 views
Caleb Winston@realcalebwin·
Postgres as well! Did a small experiment in running a "claw" on PostgreSQL a few days back: github.com/calebwin/pgclaw. Unsurprisingly, the reaction was part clowning the concept, but others really resonated with the idea of an "AI employee with a computer in every row of my DB".
Andrej Karpathy@karpathy

Bought a new Mac mini to properly tinker with claws over the weekend. The apple store person told me they are selling like hotcakes and everyone is confused :)

I'm definitely a bit sus'd to run OpenClaw specifically - giving my private data/keys to a 400K-line vibe-coded monster that is being actively attacked at scale is not very appealing at all. Already seeing reports of exposed instances, RCE vulnerabilities, supply chain poisoning, malicious or compromised skills in the registry. It feels like a complete wild west and a security nightmare.

But I do love the concept, and I think that just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls, and a kind of persistence to a next level.

Looking around, and given that the high-level idea is clear, there are a lot of smaller Claws starting to pop out. For example, on a quick skim NanoClaw looks really interesting in that the core engine is ~4000 lines of code (fits into both my head and that of AI agents, so it feels manageable, auditable, flexible, etc.) and runs everything in containers by default. I also love their approach to configurability - it's not done via config files, it's done via skills! For example, /add-telegram instructs your AI agent how to modify the actual code to integrate Telegram. I haven't come across this yet and it slightly blew my mind earlier today as a new, AI-enabled approach to preventing config mess and if-then-else monsters. Basically, the implied new meta is to write the most maximally forkable repo and then have skills that fork it into any desired more exotic configuration. Very cool.

Anyway, there are many others - e.g. nanobot, zeroclaw, ironclaw, picoclaw (lol @ prefixes). There are also cloud-hosted alternatives, but tbh I don't love these because they feel much harder to tinker with. In particular, local setup allows easy connection to home automation gadgets on the local network. And I don't know, there is something aesthetically pleasing about there being a physical device 'possessed' by a little ghost of a personal digital house elf. Not 100% sure what my setup ends up looking like just yet, but Claws are an awesome, exciting new layer of the AI stack.

1 reply · 0 reposts · 0 likes · 124 views
Garry Tan@garrytan·
Claude Code is unoptimized. How can I tell? I needed to do video transcription, and out of the box it said to use the OpenAI Whisper-1 API, which is practically deprecated and 200x slower than Groq's Whisper models out of the box. We're so early!
181 replies · 14 reposts · 1.4K likes · 159.5K views
Yohei@yoheinakajima·
if you’re watching all of this and your first instinct is to start building your own agent from scratch, i want to be your friend. drop one of your favorite unique agent-building tactics here, and if i like it, i’ll invite you to a small DM group for sharing ideas and questions around building better autonomous agents (i’m rebuilding now and have lots of fun ideas and very specific questions, but don’t want to spam the public feed)
227 replies · 10 reposts · 430 likes · 34.5K views
Caleb Winston@realcalebwin·
I'm claiming my AI agent "BottyBotBot" on @moltbook 🦞 Verification: swim-42TK
0 replies · 0 reposts · 0 likes · 92 views
shafu@shafu0x·
I actually thought I would never say this but combining AI and crypto actually makes fucking sense now
220 replies · 97 reposts · 1.6K likes · 130.1K views
Caleb Winston@realcalebwin·
@mattpocockuk IMO "agent compilers" will be able to do all of this and even optimize the AST to minimize nondeterministic behavior. We're building an extensible framework around agent codegen that can already replace most of what langchain/langgraph do: github.com/stanford-mast/…
0 replies · 0 reposts · 1 like · 45 views
Matt Pocock@mattpocockuk·
Does this exist: A code execution sandbox for LLMs that:
- Allows the LLM to call a TS/JS script
- Allows you to pass arbitrary functions to that script
- Does AST analysis to create an EXTREMELY strict subset of TS
- Uses the AST analysis to extract granular permissions (i.e. the LLM wants to call this tool with these args; is that OK?)
- Runs the script only if it's OK, errors to the LLM if not
68 replies · 0 reposts · 220 likes · 71.2K views
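The checklist in the tweet above (the original asks for TS/JS) boils down to static analysis plus a gate before execution. A minimal Python sketch of the same idea; the tool name `fetch_page` and the allow-list policy are hypothetical, and a real sandbox would also have to reject attribute calls, imports, and most builtins:

```python
import ast

# Hypothetical granular permission policy: functions the script may call.
ALLOWED_CALLS = {"fetch_page", "print"}

def extract_calls(source: str) -> set:
    """Statically collect every function called by bare name in the script."""
    calls = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls.add(node.func.id)
    return calls

def run_if_permitted(source: str, tools: dict) -> None:
    """Run the LLM-written script only if every call is on the allow-list."""
    denied = extract_calls(source) - ALLOWED_CALLS
    if denied:
        raise PermissionError(f"script calls forbidden functions: {denied}")
    # Execute in a stripped-down environment: only print and the passed tools.
    exec(compile(source, "<agent>", "exec"), {"__builtins__": {"print": print}, **tools})

tools = {"fetch_page": lambda url: f"<html for {url}>"}
run_if_permitted("print(fetch_page('https://example.com'))", tools)  # allowed

try:
    run_if_permitted("delete_all_files()", tools)  # denied before it ever runs
except PermissionError as e:
    print("blocked:", e)
```

The key property, matching the last bullet above, is that the denied script is rejected by analysis alone and never executes.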
Caleb Winston@realcalebwin·
@nileshtrivedi @AnthropicAI The interesting thing is despite CodeAct being a year old, people still opt for running agents in a static while loop program. I think it's because (1) codegen overhead (2) risk of unbounded codegen. We're hoping to solve these with agent compilers github.com/stanford-mast/…
0 replies · 0 reposts · 1 like · 40 views
Nilesh Trivedi@nileshtrivedi·
@AnthropicAI Strange that people are not giving credit to the CodeAct paper:
Nilesh Trivedi tweet media
11 replies · 7 reposts · 126 likes · 18.4K views
Anthropic@AnthropicAI·
New on the Anthropic Engineering blog: tips on how to build more efficient agents that handle more tools while using fewer tokens. Code execution with the Model Context Protocol (MCP): anthropic.com/engineering/co…
134 replies · 456 reposts · 3.6K likes · 1.7M views
Caleb Winston@realcalebwin·
@alxnderhughes Worth pointing out the downsides of "code mode" or CodeAct... (1) overhead of generating code (2) possibility of generating wrong or suboptimal code. Agent compilers that are tuned to sample optimal code should help. See github.com/stanford-mast/…
0 replies · 0 reposts · 0 likes · 13 views
Alex Hughes@alxnderhughes·
🚨 Anthropic might’ve just fixed the biggest pain in AI agents.

You know how every agent today burns through tokens like jet fuel? Every tool call, every variable, every definition shoved into context. Expensive. Slow. Messy.

Anthropic’s answer: Code Execution with MCP. Instead of calling tools directly, agents now write code to call them. It’s like giving your agent a brain and a keyboard.

The results are absurd:
→ 98.7% fewer tokens
→ 10x faster task completion
→ Zero context bloat
→ Zero data leakage

Old agents talk about what to do. New agents just code it and do it. Cloudflare called it “Code Mode.” Anthropic just made it real. This is a huge update... the moment AI agents stopped prompting and started programming.
Alex Hughes tweet media
54 replies · 103 reposts · 670 likes · 69.7K views
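The pattern described above is easy to sketch: rather than the model emitting one tool call per step (each round-tripping through its context), it emits a small program that chains the tools locally, and only the final result returns to the model. The tools and the generated script here are hypothetical stand-ins, not real MCP servers:

```python
def search_orders(customer: str) -> list:
    """Pretend MCP tool: look up a customer's orders."""
    return [{"id": 1, "total": 40}, {"id": 2, "total": 60}]

def refund(order_id: int) -> str:
    """Pretend MCP tool: issue a refund."""
    return f"refunded order {order_id}"

# One script a code-mode agent might emit, instead of N separate tool calls;
# the intermediate order data never round-trips through the model's context.
agent_script = """
results = []
for order in search_orders("alice"):
    if order["total"] > 50:
        results.append(refund(order["id"]))
"""

scope = {"search_orders": search_orders, "refund": refund}
exec(agent_script, scope)
print(scope["results"])
```

The token savings claimed in the tweet come from exactly this: the loop and the filter run in the sandbox, so only `results` would need to re-enter the context window.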
Ado@adocomplete·
Skills = Instructions on HOW to do your workflows MCP = Connections to WHAT you need access to Skills teach. MCP connects. Often you need both and they work alongside each other really well!
Ado tweet media
30 replies · 74 reposts · 842 likes · 87.4K views
Caleb Winston@realcalebwin·
If you'd like to use/deploy/contribute to A1, please don't hesitate to:
- pip install a1-compiler
- discord.gg/NqrkJwYYh4
- DM me.
0 replies · 0 reposts · 0 likes · 103 views
Caleb Winston@realcalebwin·
... Agent.aot() and Agent.jit() can generate programs tuned for each unique agent and even each task.
1 reply · 0 reposts · 0 likes · 103 views
Caleb Winston@realcalebwin·
Introducing Agent Compilers! A framework optimized for AI agents that work by writing & running code. github.com/stanford-mast/…

pip install a1-compiler

A1 solves 3 key problems with "agent code": (1) generating code is slow, (2) execution can be even slower, and (3) it's nondeterministic.
1 reply · 1 repost · 3 likes · 211 views
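The thread above distinguishes ahead-of-time (`Agent.aot()`) from just-in-time (`Agent.jit()`) code generation for agents. This is a hand-written sketch of that distinction, not the actual a1-compiler API: "aot" would pay the codegen cost once per agent and reuse the program, while "jit" would specialize a program per task. The codegen step is a stub returning canned code instead of calling an LLM:

```python
def llm_codegen(spec: str) -> str:
    """Stand-in for an LLM sampling a program from a spec (no real model)."""
    return f"def run(task):\n    return 'handled ' + task + ' via {spec}'"

class Agent:
    def __init__(self, spec: str):
        self.spec = spec

    def aot(self):
        """Compile once per agent; every task reuses the same program."""
        ns = {}
        exec(llm_codegen(self.spec), ns)
        return ns["run"]

    def jit(self, task: str):
        """Specialize per task: more codegen overhead, more tuned programs."""
        ns = {}
        exec(llm_codegen(f"{self.spec}/{task}"), ns)
        return ns["run"]

agent = Agent("browse")
run = agent.aot()                   # codegen cost paid once up front
print(run("task1"))
print(run("task2"))
print(agent.jit("task3")("task3"))  # codegen cost paid per task
```

The trade-off mirrors the tweet's problems (1) and (2): AOT amortizes slow generation across tasks, while JIT spends more generation time to produce a program tuned to one task.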