Alexander Yue

192 posts


@Alezander907

Physics & CS @ Stanford SLAC | Agents @ Browser Use

Stanford · Joined July 2017
54 Following · 563 Followers
Pinned Tweet
Alexander Yue @Alezander907
Hi all, I'm a third-year undergrad at Stanford studying computational physics. I also lead agent evaluations at Browser Use. I'm starting this X account as a place to share my thoughts on AI and agentic developments.
Евгений Гончаренко
@Alezander907 How do the open-weight ones hold up on multi-step authenticated flows? I'm running agents on ~300 supplier portals (login + 2FA) and trying to figure out whether the cheaper models break too often to be worth it at scale.
Alexander Yue @Alezander907
My new top models to use in Browser Use Cloud v4
[image]
Larsen Cundric @larsencc
Hear me out... Browser Harness, but in the Cloud (beta).
Built on:
> BrowserCode (thanks @Alezander907)
> AWS AgentCore
> Custom Control Plane
Try it in the UI, or comment "API V4" for early API access ↓
Alexander Yue @Alezander907
We reached 100k GitHub stars!
[image]
Alexander Yue @Alezander907
Human memory is still the best memory for agents. Seek to understand everything in your company, and give your agents the context they need. Replacing this human layer with LLM memory wouldn't make things better. A faster search tool might help, though.
Alexander Yue @Alezander907
@_halshin Opus 4.8 refuses browser tasks more often.
Hal Shin @_halshin
@Alezander907 Great to see this beating GPT-5.5, but why is the benchmark against Opus 4.7 and not Opus 4.8?
Alexander Yue @Alezander907
GLM 5.2 is a huge improvement for browser agents, offering a near-Opus-level score and beating GPT-5.5.
MiniMax M3 gets a Sonnet-level score at just $0.30 input, making it my new best-value model (cheaper than DeepSeek V4 Pro).
Kimi K2.7 is a +9% improvement over K2.6 but is outclassed by M3.
[image]
Alexander Yue @Alezander907
@watchereth_ Browser Harness is a huge new way to use the browser, but it's just tools, no agent.
BrowserCode is an OpenCode fork with Browser Harness included.
Browser Use v4 runs BrowserCode on our cloud for you: no installs required, and it looks like a chat app.
Alexander Yue @Alezander907
Try it now, completely open source: github.com/browser-use/br…
Russ Salakhutdinov @rsalakhu

Congrats to the @browser_use team for taking the #1 spot on Odysseys, a highly challenging benchmark for long-horizon web agents: odysseys-website.pages.dev/leaderboard Odysseys evaluates realistic, multi-hour web workflows that require sustained planning, memory, reasoning, and verification across numerous websites and tools, far beyond short single-step browser tasks. Exciting progress toward truly capable long-horizon agents.

Alexander Yue retweeted
Russ Salakhutdinov @rsalakhu
Congrats to the @browser_use team for taking the #1 spot on Odysseys, a highly challenging benchmark for long-horizon web agents: odysseys-website.pages.dev/leaderboard Odysseys evaluates realistic, multi-hour web workflows that require sustained planning, memory, reasoning, and verification across numerous websites and tools, far beyond short single-step browser tasks. Exciting progress toward truly capable long-horizon agents.
[image]
Alexander Yue @Alezander907
We forked OpenCode
[image]
Alexander Yue @Alezander907
BrowserCode has now been verified at the top of the Odysseys leaderboard. And it's the same capability that ships in the new Browser Use version we released!
[image]