murat 🍥


what if i told you... computer use can be faster on local models
moondream3's photon update today adds mac support: it can see your screen and use it with 1s latency, ty @vikhyatk
here we have whisper+qwen+moondream triple model pipeline working offline flawlessly
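the whisper+qwen+moondream handoff can be sketched as a three-stage loop: speech-to-text, then a small LLM picks an action, then a VLM grounds the target to screen coordinates. a minimal sketch; all names and signatures here are assumptions, and the three models are injected as plain callables rather than any specific runtime:

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Action:
    kind: str                              # "click", "hotkey", "type", ...
    target: str                            # UI element description for the VLM
    point: Optional[Tuple[int, int]] = None

def run_turn(transcribe: Callable[[bytes], str],
             plan: Callable[[str], Action],
             locate: Callable[[str], Tuple[int, int]],
             audio: bytes) -> Action:
    # 1. speech-to-text (e.g. a local whisper model)
    text = transcribe(audio)
    # 2. small LLM maps the transcript to an action (e.g. qwen)
    action = plan(text)
    # 3. VLM grounds the click target to coordinates (e.g. moondream)
    if action.kind == "click" and action.point is None:
        action.point = locate(action.target)
    return action
```

keeping the orchestration model-agnostic like this means each stage can be swapped (parakeet for whisper, gemma for qwen) without touching the loop.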

@nuwandavek @vikhyatk gemma could be better at certain tasks or prompts too i wouldn't be surprised

@mayfer @vikhyatk nice! i was playing around with a similarish idea (finding all untagged basketball/tennis courts in sf on google maps by browsing around)
gemma4 + efficientsam3 was hilariously good for the size. will try qwen!
github.com/SimonZeng7108/…

@Laythe_li_suwi @vikhyatk converting the user's command to actions, in this case it's just "click <item>" but it can do more things like use keyboard shortcuts, applescript, or type things directly
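the command-to-action step described here could be a small dispatcher over macOS automation tools. a sketch under assumptions (cliclick for mouse events, osascript for keys and text; the action schema is made up), returning the argv it would run instead of executing it, for illustration:

```python
def to_command(action: dict) -> list:
    """Map a parsed LLM action to the argv of a macOS automation tool.
    Hypothetical schema: {"kind": "click"|"hotkey"|"type", ...}."""
    kind = action["kind"]
    if kind == "click":
        x, y = action["point"]
        # cliclick's c:x,y syntax clicks at absolute screen coordinates
        return ["cliclick", f"c:{x},{y}"]
    if kind == "hotkey":
        # e.g. {"kind": "hotkey", "key": "s"} -> cmd+s via System Events
        return ["osascript", "-e",
                f'tell application "System Events" to keystroke "{action["key"]}" using command down']
    if kind == "type":
        return ["osascript", "-e",
                f'tell application "System Events" to keystroke "{action["text"]}"']
    raise ValueError(f"unknown action kind: {kind}")
```

in a real harness these would go through subprocess.run; returning argv keeps the mapping testable.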

@nuwandavek @vikhyatk i've tried basically every small llm in existence and settled on qwen3.5 4B at q4_k_m
there may be different prompts that make gemma work too but with my prompts gemma was unusably worse. qwen has really great general ability

@mayfer @yacineMTB @vikhyatk I wonder whether it uses a screenshot loop to predict the coordinates, or accessibility trees?
If you want your LLMs to control the desktop via accessibility trees, fully headless, you can use this:
github.com/lahfir/agent-d…

@tombielecki @vikhyatk whisper-large-v3-turbo or whatever it's called. it's hands down the best, especially with a text prompt prefix to set the context. if you need something faster due to hardware restrictions, parakeet or apple speech recognition are ok
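the "text prompt prefix" trick maps to openai-whisper's `initial_prompt` parameter on `transcribe`, which biases decoding toward expected vocabulary. a tiny helper, with the app name and word list purely illustrative:

```python
def build_initial_prompt(app: str, vocab: list) -> str:
    """Context prefix biasing Whisper toward the words actually on screen."""
    return f"Voice commands for {app}. Likely words: " + ", ".join(vocab)

# with the openai-whisper package this plugs in as:
#   model = whisper.load_model("large-v3-turbo")
#   result = model.transcribe(
#       "command.wav",
#       initial_prompt=build_initial_prompt("Finder", ["Downloads", "Trash"]))
```

seeding the decoder with on-screen labels helps most with proper nouns and UI strings the model would otherwise mis-hear.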

@mayfer Depends on how detailed/accurate you need the image analysis to be and how fast you need the loop. For live computer-use tasks local will prob win out, but for detailed aesthetics / design analysis or accurate text extraction, cloud will be ahead / faster for a while imo.

yeah this is super significant
it's absurdly bad capital allocation to run low latency AI image processing on cloud.
local computer use will win really hard
vik@vikhyatk
Running on Apple Silicon will never be as fast as an H100. But for interactive workloads like computer use, wall-clock latency is dominated by the network, not the accelerator. Skipping large image uploads buys you more than the H100 buys back. x.com/mayfer/status/…
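the network-dominated claim is easy to sanity check: shipping a screenshot to a cloud GPU costs transfer time plus a round trip before any inference runs. back-of-envelope, with all the numbers illustrative:

```python
def upload_seconds(image_mb: float, uplink_mbps: float, rtt_ms: float) -> float:
    """Wall-clock cost just to move one screenshot to a remote accelerator:
    serialization time over the uplink plus one network round trip."""
    return image_mb * 8 / uplink_mbps + rtt_ms / 1000

# a 2 MB retina screenshot over a 20 Mbps uplink with 40 ms RTT:
cloud_overhead = upload_seconds(2.0, 20.0, 40.0)   # 0.84 s before inference starts
```

that overhead alone can exceed the whole local loop's latency budget, which is the point of the tweet.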

@joodalooped it helps but doesn't solve it until you make text unreadable

the harness is GoatRemote goatremote.com
it has an absurdly optimized qwen3.5 pipeline: the LLM call that decides which action to take from the user's request runs in 300ms
imo forget traditional agent harnesses, they can't achieve this kind of latency, they're not built for it

@mayfer @vikhyatk What harness does it use to achieve 1s latency? In my tests, Desktop Control CLI achieves ~400-500ms for local perception, but you also need to add LLM call latency. See demo:
x.com/yaroshevych/st…
Oleg 🇺🇦@yaroshevych
I learned to appreciate fast models: Mercury model by @_inception_ai, driven by @opencode via @OpenRouter. By the numbers, #DesktopCtl took under 600ms for most UI operations (mostly driven by OCR cost), while model latency was under 2-3 sec. @lmstudio was used for demo purposes.