
The interesting unit isn't any single tool.
It's "vision LLM + remote browser" as a primitive. Once you have it, you stop writing selectors.
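A rough sketch of what that primitive looks like (this is not the code from the linked post): a loop that takes a screenshot, asks a vision model for the next action, and applies it by pixel coordinates. Every name here (`Action`, `RemoteBrowser`, `VisionModel`, `run`) is hypothetical. The fake browser and scripted model only stand in for a real cloud browser and a real vision LLM so the loop can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Action:
    """One step chosen by the model. Hypothetical shape, for illustration."""
    kind: str  # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class RemoteBrowser(Protocol):
    """Assumed minimal surface of a remote browser: pixels in, coordinates out."""
    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


# Hypothetical: any callable that maps (goal, screenshot) to the next Action.
VisionModel = Callable[[str, bytes], Action]


def run(goal: str, browser: RemoteBrowser, model: VisionModel, max_steps: int = 10) -> int:
    """Screenshot -> model picks an action -> apply it. No DOM, no selectors.

    Returns how many actions were applied before the model reported "done".
    """
    for step in range(max_steps):
        action = model(goal, browser.screenshot())
        if action.kind == "done":
            return step
        if action.kind == "click":
            browser.click(action.x, action.y)
        elif action.kind == "type":
            browser.type_text(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    raise TimeoutError(f"goal not reached in {max_steps} steps")


class FakeBrowser:
    """Stand-in for a cloud browser; records actions instead of performing them."""
    def __init__(self) -> None:
        self.log: list[tuple] = []

    def screenshot(self) -> bytes:
        return f"frame-{len(self.log)}".encode()

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.log.append(("type", text))


def scripted_model(goal: str, screenshot: bytes) -> Action:
    """Stand-in for a vision LLM: replies from a fixed script keyed by frame."""
    script = {
        b"frame-0": Action("click", x=120, y=48),
        b"frame-1": Action("type", text=goal),
    }
    return script.get(screenshot, Action("done"))


if __name__ == "__main__":
    browser = FakeBrowser()
    steps = run("cloud browsers", browser, scripted_model)
    print(steps, browser.log)
```

Swapping `FakeBrowser` for a real remote session and `scripted_model` for a real model call leaves `run` unchanged, which is the sense in which the pair behaves like a primitive.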
Full code, .env setup, the one footgun, end-to-end:
medium.com/@midscene/i-let-an-open-source-vision-model-drive-a-cloud-browser-no-dom-no-selectors-just-screenshots-6eae8cbf34cc



