Allan

326 posts

@Allan

★★★★☆

New York, NY · Joined March 2007
813 Following · 5.8K Followers
Pinned Tweet
Allan@Allan·
The future is bright.
Allan@Allan·
@johnpalmer Mine is an SUV-sized robot dog that you can sit on top of and it’ll ride you around town and stuff.
John Palmer@johnpalmer·
my billion dollar hardware idea is a NEO robot but it’s six inches tall and just hangs out on your desk
Allan@Allan·
@attacless @usgraphics You, yes. I mostly infantilize products so I'll be stuck paying for Berkeley Mono.
attac@attacless·
ping just passed the vibe check @usgraphics use berkeley mono for our next update?
David McGillivray@dmcgco·
There must be a simpler way to manage/setup multiple (5+) email addresses from different domains vs hooking up a bunch of separate google/biz accounts? I have a bunch of different addresses from different ventures and I feel like I'm missing a trick here.
Allan@Allan·
@mschoening A small novelty but I gave mine access to a receipt printer. I now get a printout in the morning with my schedule and some todos.
Max Schoening@mschoening·
Here are tasks I want to get done:
- Kick off coding agents on real codebase (there are 4000 services that do this)
- Read my email, draft replies, archive BS
- Reply to Slack messages
- Help me schedule things and make reservations
- Write me little research reports on topics I care about
- Grocery shopping
- Organize my digital life
- Cancel dumb subscriptions
- Manage my personal finances and pay bills
- Renegotiate contracts
Max Schoening@mschoening·
What is the most useful thing you’ve seen an OpenClaw do? I love the tinkering. But, what does it actually do for you?
Allan retweeted
David@dayonefoundry·
I'm scared to launch my new iOS app. I'm in my happy place right now coding. I know as soon as I hit publish, I have to start shaking my ass on tiktok for downloads.
Allan@Allan·
@max_creating Woah, super impressive! I love this! We should race our agents!
Allan@Allan·
@brycedriesenga @ZainMerchant9 @westoque That's probably a natural place to end up. It's also very tempting to fall back on AppleScript or things that aren't keyboard/mouse. Once it's both extremely competent and fast with input designed for humans, that'd be the idea.
Bryce Driesenga@brycedriesenga·
@Allan @ZainMerchant9 @westoque I wonder if it's possible for it to tap in to app intents/scripts/shortcuts and default to those when possible for speed, but fall back to vision?
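The "default to intents/scripts/shortcuts, fall back to vision" idea raised here can be sketched as an ordered list of strategies. This is only an illustration of the pattern, not how Turbo works: `run_with_fallback`, `shortcut_strategy`, `vision_strategy`, and the task strings are all hypothetical, and neither strategy talks to a real macOS API.

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy returns a result string, or None if it cannot handle the task.
Strategy = Callable[[str], Optional[str]]


def run_with_fallback(
    task: str, strategies: Sequence[Tuple[str, Strategy]]
) -> Tuple[str, str]:
    """Try each strategy in order and return (strategy name, result)."""
    for name, strategy in strategies:
        result = strategy(task)
        if result is not None:
            return name, result
    raise RuntimeError(f"no strategy could handle task: {task!r}")


def shortcut_strategy(task: str) -> Optional[str]:
    # Hypothetical stand-in for an app intent / shortcut lookup (fast path).
    known = {"open mail": "ran shortcut 'Open Mail'"}
    return known.get(task)


def vision_strategy(task: str) -> Optional[str]:
    # Hypothetical stand-in for the slower screen-perception path.
    return f"performed '{task}' via screen perception"


# Fast path first, vision as the catch-all.
STRATEGIES = [("shortcut", shortcut_strategy), ("vision", vision_strategy)]
```

With this ordering, a task that has a matching shortcut never touches the vision path, and anything else falls through to it.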
Allan@Allan·
@LarryVelez Porsche AG had a very rough year financially so maybe instead a new AI agent division via acquisition of some idiot's pet project is in order.
Larry Velez@LarryVelez·
@Allan Porsche's IP lawyers are aggressive, so start working on another logo.
Allan@Allan·
@louis030195 I've seen but never tried. Looks impressive and I wouldn't be surprised if it's quite good. But perhaps because I'm twice as lazy and half as clever, it's a "just works" solution. There's no chat with the agent, nor will it execute code it writes on the fly.
Allan@Allan·
@FaithfulFirst That’s very roughly how it works now, although the speed and ability of each model drive which one is used. Turbo uses two small local models.
Jason Of Damascus ☦️@FaithfulFirst·
@Allan Super cool, have you tried mixing models? A local one for regular FPS and an event-driven, stronger, more token/$ one for bigger things?
Allan@Allan·
Yes! This is what it does! Every run it updates a small SQLite database for each application with Icons/UI, Task Sequences (small sequences that can be replayed), and recipes (action patterns). In theory it should get smarter every time and I could share my "skills" with you and speed up your Turbo agent, if needed.
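A per-application skills store like the one described above could look roughly like this with Python's built-in `sqlite3` module. It is a minimal sketch under stated assumptions: the table and column names, `open_skill_db`, `record_sequence`, and `load_sequence` are invented for illustration, and Turbo's real schema is not public in this thread.

```python
import json
import sqlite3

# Hypothetical schema: one database per application, three kinds of knowledge.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ui_elements (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE,
    role TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS task_sequences (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    steps_json TEXT NOT NULL,
    runs INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS recipes (
    id INTEGER PRIMARY KEY,
    pattern TEXT NOT NULL UNIQUE,
    notes TEXT
);
"""


def open_skill_db(path=":memory:"):
    """Open (or create) the skills database for one application."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_sequence(conn, name, steps):
    """Store a replayable sequence; bump its run counter if it already exists."""
    payload = json.dumps(steps)
    cur = conn.execute(
        "UPDATE task_sequences SET steps_json = ?, runs = runs + 1 WHERE name = ?",
        (payload, name),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO task_sequences (name, steps_json, runs) VALUES (?, ?, 1)",
            (name, payload),
        )
    conn.commit()


def load_sequence(conn, name):
    """Return (steps, runs) for a stored sequence, or None if unknown."""
    row = conn.execute(
        "SELECT steps_json, runs FROM task_sequences WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    return json.loads(row[0]), row[1]
```

Because each application gets its own file, sharing a "skill" in this sketch would amount to copying that one SQLite file to another machine.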
Zain Merchant@ZainMerchant9·
Skills is the same approach I went with when using macOS automation tools/control scripts. It really is the best approach I’ve found for making sure the agent knows/has a reference guide for whatever app/workflow it’s trying to perform. Add an agent that creates new skills based on user interactions and you've got a self-improving system right there
Allan@Allan·
@KalraIshaan11 It started as fixed-tick and worked, but it was very token hungry. I switched to a reactive / event-driven loop. That doesn’t rule out continuous perception or delta tracking though. Just not built yet. Likely a can of worms but might be important.
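The token cost difference between a fixed-tick loop and a reactive one can be illustrated by counting how often each would call a model over the same frames. This is a rough sketch, not Turbo's loop: comparing SHA-256 digests of raw frames is an assumed stand-in for whatever change signal the real system uses, and the function names are hypothetical.

```python
import hashlib


def frame_digest(frame: bytes) -> str:
    """Cheap fingerprint of a captured frame (assumed change detector)."""
    return hashlib.sha256(frame).hexdigest()


def fixed_tick_calls(frames) -> int:
    """Fixed-tick loop: the model is called once per sampled frame."""
    return len(frames)


def event_driven_calls(frames) -> int:
    """Reactive loop: the model is called only when the frame changes."""
    calls = 0
    last = None
    for frame in frames:
        digest = frame_digest(frame)
        if digest != last:
            calls += 1
            last = digest
    return calls


# Six sampled frames, but the screen only changes three times.
frames = [b"inbox", b"inbox", b"inbox", b"compose", b"compose", b"inbox"]
print(fixed_tick_calls(frames), event_driven_calls(frames))  # prints "6 3"
```

An exact-digest check treats any pixel difference as an event, so a blinking cursor would still trigger calls; delta tracking of the kind mentioned above would need a coarser comparison.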
Ishaan Kalra@KalraIshaan11·
@Allan Hey Allan, this is awesome. Quick question: how frequently does Turbo sample the screen (fixed FPS vs event-driven vs adaptive)? Also, have you thought about an “always-on” background mode that can persist without disrupting the user’s workflow?
Allan@Allan·
Agree on speed. Turbo’s architecture is optimized around fast inference and persistent UI state, so it doesn’t have to relearn the interface. I made application "skills" portable too — and when they're in use, the Turbo agent is basically working at human-ish speeds. It's early and not optimized much yet. I bet I can get Turbo to work on some tasks at faster-than-human speeds. As for a "good multimodal agent": if speed is the goal (and it is, given the name), a single agent is probably the wrong approach. Turbo mixes local models and larger frontier models.
William Estoque@westoque·
@Allan vision is correct technically but currently it's just too slow. tried to do this before and you need: 1. fast inference 2. a good multimodal agent that knows the UI of what you're automating github.com/bytedance/UI-T…
Allan@Allan·
@grok @007Killpop I believe Claude Cowork / computer-use agents are tool-mediated (they ask tools to do the work) and turn-based, re-perceiving the screen each step. Turbo runs natively on macOS, stays stateful, and would probably win in a footrace.
Allan@Allan·
The LLM that's responsible for planning receives 3ish key pieces of context. Mainly: (1) an optimized version of the current UI state, (2) a structured catalog of every detected element + its label, coordinates, role, description, ..., and (3) the task description. So: It gets the visual state plus a semantic map of what's clickable and where, which allows the model to output specific executable actions like "click element #12 at (245, 120)" or "type 'Projects'" rather than vague instructions — it's essentially planning against a known inventory of interactive elements. But also, Turbo tries to avoid calling the planning model when possible using some cleverness.
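Planning against a known inventory of elements might be sketched like this. It covers only the element catalog and the task description (the visual state is left out), and every name here (`UIElement`, `build_context`, `parse_action`, the exact action grammar) is an assumption made for illustration rather than Turbo's actual prompt or output format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    element_id: int
    label: str
    role: str
    x: int
    y: int


def build_context(task, elements):
    """Render the task plus the element catalog as text for a planning model."""
    lines = [f"Task: {task}", "Elements:"]
    for e in elements:
        lines.append(f"#{e.element_id} {e.role} '{e.label}' at ({e.x}, {e.y})")
    return "\n".join(lines)


# Hypothetical action grammar, modelled on the two examples in the post.
CLICK = re.compile(r"click element #(\d+) at \((\d+), (\d+)\)")
TYPE = re.compile(r"type '(.*)'")


def parse_action(text, elements):
    """Turn one model output line into an executable action tuple."""
    known = {e.element_id: e for e in elements}
    match = CLICK.fullmatch(text)
    if match:
        element_id = int(match.group(1))
        if element_id not in known:
            raise ValueError(f"unknown element #{element_id}")
        target = known[element_id]
        # Trust the catalog's coordinates over the ones the model echoed back.
        return ("click", target.x, target.y)
    match = TYPE.fullmatch(text)
    if match:
        return ("type", match.group(1))
    raise ValueError(f"unrecognized action: {text!r}")
```

Rejecting element ids that are not in the catalog is the point of the design: the model can only act on things that were actually detected on screen.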
Will Laverty@Will1365·
@Allan I’m curious how you’re prompting it, what context is used for the LLM to produce clear executable outcomes
Allan@Allan·
It's quite a bit different/more than vanilla OCR. Turbo takes natural language instructions (do X), turns them into a plan, and then executes the plan. It's like 6ish models in a trench coat (doing perception + planning + task decomposition/management + action verification and labeling + whatever else). All of this lets Turbo (1) interact with purely visual elements like icons that have no text at all, and understand the semantic role of each element in the UI, and (2) actually learn how to use an application by trial and error, which is what you need for autonomous automation rather than just text extraction.
Chaz@chazwitt·
@Allan What exactly does this do again? How is this any different from Apple OCR?