Alexander Yue
97 posts

@Alezander907
Physics & CS @ Stanford SLAC | Agents @ Browser Use
Stanford · Joined July 2017
50 Following · 260 Followers
Pinned Tweet

@mamagnus00 @disruptor37 Benchmark evals are still running. Opus 4.7, sonnet 4.6, gpt 5.5 are clearly best. On the best-value side it looks like kimi k2.6 and qwen3.6-plus are about tied at the top. DeepSeek v-4 is a runner-up, Gemini is behaving weirdly. Need to look into it before I call it

Today I have carefully updated all browser-harness install instructions. I tested e2e on mac, windows, and linux and across agent frameworks
Every agent on any OS should now one-shot connecting to the browser. Just give it this link and ask it to use the browser github.com/browser-use/br…
It's not the flashy work that sets you apart

The +13% score increase for browser-harness from my last post was because I switched the agent framework
I went from using Claude Code to using @opencode and the performance immediately spiked. It's open-source, allowing a more direct connection to the harness


@Alezander907 Where would bu-2 be in this chart? I still use it with local browser use. Am I outdated?

@Alezander907 what’s the hardest browser action to make reliable?

I think @PrivacyHQ may have succeeded. Will be trying their agent cards today. One thing I wanted that is missing is discussion of how they will handle force-posts beyond spend limits
Alexander Yue @Alezander907
Is it even possible to solve Agent payment with how credit cards fundamentally work? Seems like the possibility of a force post prevents anyone moving forward with VCC, but it's what is required by most vendors

I have now built my own agent router and it is amazing. But I was wrong about needing to rename sessions (for now).
Instead, just integrate into Slack. One thread = one session. Feels very natural, and now I can work from anywhere
Alexander Yue @Alezander907
The next developments in AI coding should be about session management. @cursor_ai has it right with their new update. Low-hanging fruit for all, including @opencode (pls). Have the LLMs rename the session every few messages. Soon there will be routing LLMs too

Introducing Bud.
The first AI Human Emulator.
Bud has a full computer with storage, compute, and memory to build and code, SMS and Telegram to communicate, and a full browser to use. It can create, store, and edit files, connect to and use your tools, learn custom skills, work fully autonomously, and complete any task end to end just like a human.
Text the number below or try free at bud [dot] app.
Comment for 100k free credits.