murat 🍥

15.6K posts

@mayfer

Vancouver, BC Joined March 2008
8.4K Following 19.9K Followers
murat 🍥
murat 🍥@mayfer·
what if i told you... computer use can be faster on local models. moondream3, with its photon update today that gives it mac support, can see your screen and use it with 1s latency, ty @vikhyatk. here we have a whisper+qwen+moondream triple model pipeline working offline flawlessly
52
105
2.1K
378.4K
Azlen
Azlen@azlenelza·
If you are into word games and interested in playtesting this and other word games I’m tinkering on, let me know!
5
0
13
550
Azlen
Azlen@azlenelza·
Making new word game where adjacent cells have to be meaningfully related / associated with each other
14
7
156
8.7K
Vivek Aithal
Vivek Aithal@nuwandavek·
@mayfer @vikhyatk nice! i was playing around with a similarish idea (finding all untagged basketball/tennis courts in sf on google maps by browsing around) gemma4 + efficientsam3 was hilariously good for the size. will try qwen! github.com/SimonZeng7108/…
1
0
1
65
murat 🍥
murat 🍥@mayfer·
@mig4ng @vikhyatk m3 max, tho i also use it on base m4 mac mini. note that moondream requires 32gb
1
0
1
127
murat 🍥
murat 🍥@mayfer·
@Laythe_li_suwi @vikhyatk converting user command to actions, in this case it's just "click <item>" but it can do more things like use keyboard shortcuts, applescript, or type things directly
0
0
1
33
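The command-to-action step described in the reply above can be sketched as a small parser. This is a minimal sketch, not GoatRemote's implementation: the thread does not say what format the model emits, so the one-line `<kind> <argument>` format, the `Action` type, and the `parse_action` helper are all hypothetical. Only the four action kinds (click, keyboard shortcut, applescript, typing) come from the reply.

```python
from dataclasses import dataclass

# Action kinds named in the reply; the spelling of each keyword is an assumption.
KINDS = {"click", "shortcut", "applescript", "type"}


@dataclass(frozen=True)
class Action:
    kind: str      # one of KINDS
    argument: str  # e.g. the item to click, the keys to press, or the text to type


def parse_action(line: str) -> Action:
    """Parse a hypothetical one-line model output such as 'click Safari'."""
    kind, _, argument = line.strip().partition(" ")
    kind = kind.lower()
    argument = argument.strip()
    if kind not in KINDS:
        raise ValueError(f"unknown action kind: {kind!r}")
    if not argument:
        raise ValueError(f"action {kind!r} needs an argument")
    return Action(kind, argument)


if __name__ == "__main__":
    print(parse_action("click Safari"))
    print(parse_action("shortcut cmd+t"))
```

Rejecting unknown kinds up front keeps a malformed model output from ever reaching the part of the system that controls the mouse and keyboard.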
Christopher Inegbedion
Christopher Inegbedion@chris_ineg·
@mayfer @vikhyatk this is really cool and the future of computer/browser use but when i think of giving an agent control of my computer it needs to meet 2 criteria. speed and intelligence. your demo looks quick but how intelligent is it with more complex usecases?
2
0
1
400
murat 🍥
murat 🍥@mayfer·
@nuwandavek @vikhyatk i've tried basically every small llm in existence and settled on qwen3.5 4B at q4_k_m. there may be different prompts that make gemma work too, but with my prompts gemma was unusably worse. qwen has really great general ability
1
0
1
128
Vivek Aithal
Vivek Aithal@nuwandavek·
@mayfer @vikhyatk which qwen is it? can you go even tinier with gemma? what’s the smallest stack where this can work reasonably well?
1
0
1
144
Alex Southwell (he/him)
Alex Southwell (he/him)@alexpsouthwell·
@mayfer @vikhyatk can you preprocess by capturing the screen, looking at the parts of the image that are diff from the last and then identifying the content an action surfaces, then you have a text model of screen that the agent can interact with immediately. trade off speed / processing
1
0
1
321
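The screen-diff preprocessing suggested in the reply above can be sketched as a tile comparison between two captures. This is a minimal sketch under stated assumptions: frames are plain lists of grayscale pixel rows, and the `changed_tiles` helper, the tile size, and the threshold are hypothetical. A real pipeline would capture actual screenshots and pass only the changed regions to OCR or a vision model.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale pixels, one inner list per row


def changed_tiles(prev: Frame, curr: Frame, tile: int = 2,
                  threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (tile_row, tile_col) for every tile that differs between frames."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same dimensions")
    height = len(curr)
    width = len(curr[0]) if height else 0
    changed = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            # A tile counts as changed if any pixel moved by more than threshold.
            differs = any(
                abs(curr[y][x] - prev[y][x]) > threshold
                for y in range(top, min(top + tile, height))
                for x in range(left, min(left + tile, width))
            )
            if differs:
                changed.append((top // tile, left // tile))
    return changed


if __name__ == "__main__":
    before = [[0] * 4 for _ in range(4)]
    after = [row[:] for row in before]
    after[3][2] = 255  # simulate a UI element appearing in the bottom-right tile
    print(changed_tiles(before, after))
```

The trade-off raised in the reply shows up in `tile`: smaller tiles localize changes more precisely but cost more comparisons per frame.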
Tom Bielecki
Tom Bielecki@tombielecki·
@mayfer @vikhyatk is the prompt prefix static, or does it change based on what applications are focused?
1
0
0
65
murat 🍥
murat 🍥@mayfer·
@tombielecki @vikhyatk whisper-large-v3-turbo or whatever it's called. it's hands down best, especially with a text prompt prefix to set the context. if you need faster due to hardware restrictions, parakeet or apple speech recognition are ok
1
0
4
469
Tom Bielecki
Tom Bielecki@tombielecki·
@mayfer @vikhyatk which whisper model are you using? I went into creating a next-action prediction model using hammerspoon and markov chains to help local STT understand what I might be trying to say (context for short utterances). Wonder if it might be useful here.
1
0
1
515
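The next-action prediction idea in the reply above can be sketched as a first-order Markov chain over past actions. This is a minimal sketch, not the author's hammerspoon setup: the `NextActionModel` class and the action names are hypothetical, and a real version would log actions from the window manager and feed the predictions to the speech recognizer as context.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List


class NextActionModel:
    """First-order Markov chain: counts which action follows which."""

    def __init__(self) -> None:
        self.transitions: DefaultDict[str, Counter] = defaultdict(Counter)

    def fit(self, history: Iterable[str]) -> None:
        """Count each consecutive (previous, next) pair in the history."""
        prev = None
        for action in history:
            if prev is not None:
                self.transitions[prev][action] += 1
            prev = action

    def predict(self, last_action: str, k: int = 3) -> List[str]:
        """Return up to k most frequent followers of last_action."""
        counts = self.transitions.get(last_action)
        if not counts:
            return []
        return [action for action, _ in counts.most_common(k)]


if __name__ == "__main__":
    model = NextActionModel()
    model.fit(["open_browser", "new_tab", "type_url",
               "open_browser", "new_tab", "search",
               "open_browser", "new_tab", "type_url"])
    print(model.predict("new_tab", k=1))
```

The predicted actions could then be turned into a short text hint for the recognizer, which is one way to give it context for short utterances.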
murat 🍥
murat 🍥@mayfer·
@m0ches @vikhyatk keyboard shortcuts and applescript yes, we use base mac mini for our tv. for local moondream you need 32gb memory
0
0
1
638
Micha
Micha@m0ches·
@mayfer @vikhyatk Wait, is this actually usable without needing a beefy machine attached?
1
0
1
674
murat 🍥
murat 🍥@mayfer·
@lyc_aon yeah when i said "low latency" i thought what i meant would come across
1
0
0
57
lycaon
lycaon@lyc_aon·
@mayfer Depends on how detailed/accurate you need the image analysis to be and how fast you need the loop. For live computer use like tasks local will prob win out, but for automatic detailed aesthetics / design / accurate text extraction, cloud will be ahead / faster for a while imo.
2
0
1
101
judah
judah@joodalooped·
@mayfer image upload point honestly really good, although part of me is like "would low res work?"
1
0
1
126
murat 🍥
murat 🍥@mayfer·
the harness is GoatRemote (goatremote.com). it has an absurdly optimized qwen3.5 pipeline. the LLM call, which determines what action to take based on the user's request, takes 300ms. imo forget traditional agent harnesses, they can't achieve this kind of latency, they're not built for it
0
0
12
822
Oleg 🇺🇦
Oleg 🇺🇦@yaroshevych·
@mayfer @vikhyatk What harness does it use to achieve 1s latency? In my tests, Desktop Control CLI achieves ~400-500ms for local perception, but you also need to add LLM call latency. See demo: x.com/yaroshevych/st…
Oleg 🇺🇦@yaroshevych

I learned to appreciate fast models: Mercury model by @_inception_ai, driven by @opencode via @OpenRouter. By the numbers, #DesktopCtl took under 600ms for most UI operations (mostly driven by OCR cost), while model latency was under 2-3sec. @lmstudio was used for demo purposes.

1
0
4
1.7K
murat 🍥
murat 🍥@mayfer·
@vikhyatk amazing to hear. every time latency is reduced it opens up new use cases
0
0
1
203
vik
vik@vikhyatk·
@mayfer more speedup coming soon btw, i stuck with the constraint of sticking with the same precision we trained the model on. but mac really needs heavier quantization... now that we know, can address it during training so there's no accuracy loss from PTQ
1
0
6
237
Matt
Matt@m13v_·
@mayfer @vikhyatk fair on prefill, chromium-scale trees eat context fast. native apps with stable refs are where AX still wins, target the same element across multiple actions without re-grounding.
1
0
1
205