Pinned Tweet
Allan


@johnpalmer Mine is a SUV sized robot dog that you can sit on top of and it’ll ride you around town and stuff.

New logo for Mesh, a laser company.

Area@areatechnology_
Logo design for Mesh, a new laser company. Additional product design across optical transceiver housing and pull tab. Documentation coming soon.

@attacless @usgraphics You, yes. I mostly infantilize products so I'll be stuck paying for Berkeley Mono.


@mschoening A small novelty but I gave mine access to a receipt printer. I now get a printout in the morning with my schedule and some todos.
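The morning printout idea is easy to sketch. Here is a minimal, hypothetical formatter for a narrow receipt printer (the column width and layout are assumptions, not a description of the actual setup):

```python
from datetime import date

def morning_printout(schedule, todos, width=32):
    """Format a day's schedule and todos for a narrow receipt printer.

    `schedule` is a list of (time, event) pairs; `todos` is a list of
    strings. 32 columns is a common receipt width, assumed here.
    """
    lines = [date.today().strftime("%A, %B %d").center(width), "-" * width]
    for time, event in schedule:
        lines.append(f"{time:>5}  {event}"[:width])
    lines.append("-" * width)
    lines += [f"[ ] {t}"[:width] for t in todos]
    return "\n".join(lines)

print(morning_printout([("9:00", "Standup"), ("13:00", "Lunch w/ Max")],
                       ["Reply to Slack", "Cancel subscription"]))
```

The resulting string could be sent to most receipt printers as plain text, or through a library like python-escpos for formatting control.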

Here are tasks I want to get done:
- Kick off coding agents on real codebase (there are 4000 services that do this)
- Read my email, draft replies, archive BS
- Reply to Slack messages
- Help me schedule things and make reservations
- Write me little research reports on topics I care about
- Grocery shopping
- Organize my digital life
- Cancel dumb subscriptions
- Manage my personal finances and pay bills
- Renegotiate contracts
Allan retweeted

@max_creating Woah, super impressive! I love this! We should race our agents!

@Allan Look at this. I think my approach is even faster / turbo
x.com/max_creating/s…
Zwille@zwiebelhelm
This is Chai Computer: a Computer Use Agent that actually works and is FAST. I am NOT joking, this is 7x faster than Vy by Vercept, and they raised $16M. It is a BEAST. I developed a completely new approach to this kind of AI. See for yourself...

@brycedriesenga @ZainMerchant9 @westoque That's probably a natural place to end up. It's also very tempting to fall back on AppleScript or things that aren't keyboard/mouse. Once it's both extremely competent and fast with input designed for humans, that'd be the idea.

@Allan @ZainMerchant9 @westoque I wonder if it's possible for it to tap in to app intents/scripts/shortcuts and default to those when possible for speed, but fall back to vision?

@LarryVelez Porsche AG had a very rough year financially so maybe instead a new AI agent division via acquisition of some idiot's pet project is in order.

@Allan Porsche's IP lawyers are aggressive, so start working on another logo.

@louis030195 I've seen it but never tried it. Looks impressive, and I wouldn't be surprised if it's quite good.
But perhaps because I'm twice as lazy and half as clever, mine is a "just works" solution. There's no chat with the agent, nor will it execute code it writes on the fly.

@FaithfulFirst That's very roughly how it works now, although the speed and capability of each model drive which one is used. Turbo uses two small local models.

@Allan Super cool. Have you tried mixing models? A local one for regular FPS, and an event-driven call to a stronger, more tokens-per-dollar model for bigger things?

Yes! This is what it does!
Every run it updates a small SQLite database for each application with Icons/UI, Task Sequences (small sequences that can be replayed), and recipes (action patterns).
In theory it should get smarter every time and I could share my "skills" with you and speed up your Turbo agent, if needed.
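A per-application skill store like the one described could be sketched with Python's built-in sqlite3 module. The table names and columns below are assumptions for illustration, not Turbo's actual schema:

```python
import json
import sqlite3

def open_skills_db(path=":memory:"):
    """Per-app skill store: detected icons/UI, replayable task
    sequences, and reusable action-pattern recipes (assumed schema)."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS icons          (app TEXT, label TEXT, role TEXT, x INT, y INT);
        CREATE TABLE IF NOT EXISTS task_sequences (app TEXT, name TEXT, steps TEXT);
        CREATE TABLE IF NOT EXISTS recipes        (app TEXT, pattern TEXT, actions TEXT);
    """)
    return db

db = open_skills_db()
# A "task sequence": a small list of actions that can be replayed verbatim.
steps = [{"click": [245, 120]}, {"type": "Projects"}]
db.execute("INSERT INTO task_sequences VALUES (?, ?, ?)",
           ("Finder", "open-projects", json.dumps(steps)))
(saved,) = db.execute(
    "SELECT steps FROM task_sequences WHERE name = 'open-projects'").fetchone()
```

Because SQLite databases are single files, sharing "skills" with another user would amount to shipping the file (or merged rows) to their agent.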

Skills is the same approach I took when using macOS automation tools/control scripts. It really is the best approach I've found for making sure the agent has a reference guide for whatever app/workflow it's trying to perform.
Add an agent that creates new skills based on user interactions and you've got a self-improving system right there.

@KalraIshaan11 It started as fixed-tick and worked, but it was very token hungry. I switched to a reactive / event-driven loop.
That doesn’t rule out continuous perception or delta tracking though. Just not built yet. Likely a can of worms but might be important.
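The fixed-tick vs. event-driven tradeoff can be sketched with a plain queue: the agent only wakes (and spends tokens) when a screen-change event arrives, and goes quiet otherwise. The event producer here is hypothetical; a real watcher would hook screen or accessibility notifications:

```python
import queue

def run_event_driven(events, handle, idle_timeout=0.1):
    """React only when the UI actually changes, instead of polling
    every tick. `events` is any queue fed by a (hypothetical)
    screen-change watcher; draining it ends the run."""
    handled = []
    while True:
        try:
            ev = events.get(timeout=idle_timeout)
        except queue.Empty:
            break                       # quiescent screen: no work, no tokens
        handled.append(handle(ev))
    return handled

q = queue.Queue()
for ev in ({"type": "window-focus"}, {"type": "dom-delta"}):
    q.put(ev)
print(run_event_driven(q, lambda ev: ev["type"]))
```

Delta tracking would slot in at the producer side: emit an event only when the diff between frames crosses a threshold, so the handler never sees redundant state.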

@Allan Hey Allan, this is awesome. Quick question: how frequently does Turbo sample the screen (fixed FPS vs. event-driven vs. adaptive)? Also, have you thought about an "always-on" background mode that persists without disrupting the user's workflow?

Agree on speed. Turbo’s architecture is optimized around fast inference and persistent UI state, so it doesn’t have to relearn the interface.
I made application "skills" portable too — and when they're in use, the Turbo agent is basically working at human-ish speeds.
It's early and not optimized much yet. I bet I can get Turbo to work on some tasks at faster-than-human speeds.
As for a "good multimodal agent": if speed is the goal (and it is, given the name), a single agent is probably the wrong approach. Turbo mixes local models and larger frontier models.

@Allan vision is correct technically but currently it's just too slow. tried to do this before and you need:
1. fast inference
2. a good multimodal agent that knows the UI of what you're automating
github.com/bytedance/UI-T…

@grok @007Killpop I believe Claude Cowork / computer-use agents are tool-mediated (they ask tools to do the work) and turn-based, re-perceiving the screen each step.
Turbo runs natively on macOS, stays stateful, and would probably win in a footrace.

The LLM that's responsible for planning receives 3ish key pieces of context. Mainly: (1) an optimized version of the current UI state, (2) a structured catalog of every detected element + its label, coordinates, role, description, ..., and (3) the task description.
So: It gets the visual state plus a semantic map of what's clickable and where, which allows the model to output specific executable actions like "click element #12 at (245, 120)" or "type 'Projects'" rather than vague instructions — it's essentially planning against a known inventory of interactive elements.
But also, Turbo tries to avoid calling the planning model when possible using some cleverness.
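Those three pieces of context, and the structured actions they enable, can be sketched as follows. The prompt layout, catalog fields, and action grammar are assumptions for illustration:

```python
import re

def build_planner_context(ui_state, elements, task):
    """Assemble the three planner inputs: optimized UI state, an
    element catalog (id, role, label, coordinates), and the task."""
    catalog = "\n".join(
        f"#{e['id']}: {e['role']} '{e['label']}' at ({e['x']}, {e['y']})"
        for e in elements)
    return f"UI STATE:\n{ui_state}\n\nELEMENTS:\n{catalog}\n\nTASK: {task}"

# Planning against a known inventory lets the output be machine-checkable:
ACTION = re.compile(r"click element #(\d+) at \((\d+), (\d+)\)")

def parse_action(model_output):
    m = ACTION.search(model_output)
    return {"element": int(m[1]), "xy": (int(m[2]), int(m[3]))}

ctx = build_planner_context(
    "<compressed screenshot features>",
    [{"id": 12, "role": "button", "label": "Projects", "x": 245, "y": 120}],
    "Open the Projects folder")
print(parse_action("click element #12 at (245, 120)"))
```

An action that references an element id not in the catalog can be rejected before execution, which is one way the inventory keeps the planner honest.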

@Allan I’m curious how you’re prompting it, what context is used for the LLM to produce clear executable outcomes

It's quite a bit different/more than vanilla OCR. Turbo takes natural language instructions (do X), turns it into a plan, and then executes the plan.
It's like 6ish models in a trench coat (doing perception + planning + task decomposition/management + action verification and labeling + whatever else).
All of this lets (1) Turbo interact with purely visual elements like icons that have no text at all, and understand the semantic role of each element in the UI, and (2) actually learn how to use an application by trial and error, which is what you need for autonomous automation rather than just text extraction.
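The "models in a trench coat" shape is a staged pipeline over shared context: each specialist reads what earlier stages produced and adds its own output. The toy stages below are stand-ins, not the actual models:

```python
def run_pipeline(screenshot, instruction, stages):
    """Chain specialist stages (perception, planning, decomposition,
    verification, ...); each reads the shared context and extends it."""
    ctx = {"screenshot": screenshot, "instruction": instruction}
    for stage in stages:
        ctx.update(stage(ctx))
    return ctx

# Toy stand-ins for the specialist models:
perceive = lambda ctx: {"elements": [{"id": 1, "label": "Save", "x": 40, "y": 12}]}
plan     = lambda ctx: {"actions": [{"click": ctx["elements"][0]["id"]}]}
verify   = lambda ctx: {"verified": len(ctx["actions"]) > 0}

result = run_pipeline("frame.png", "save the file", [perceive, plan, verify])
```

Trial-and-error learning fits the same loop: a failed verification stage can feed back into planning instead of terminating the run.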
