Alexander Yue

97 posts

@Alezander907

Physics & CS @ Stanford SLAC | Agents @ Browser Use

Stanford · Joined July 2017
50 Following · 260 Followers
Pinned Tweet
Alexander Yue (@Alezander907)
Hi all, I am a 3rd year undergrad at Stanford studying computational physics. I also lead agent evaluations at Browser Use. I'm starting this X account as a place to voice my thoughts on AI and agentic developments
Replies: 0 · Reposts: 1 · Likes: 7 · Views: 982
Alexander Yue (@Alezander907)
Saturday was chores and optimization day. Sunday is fun idea day. I found a nice SVG effect that renders on a GitHub README for the browser-harness header.
[GIF]
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 49
Alexander Yue (@Alezander907)
@mamagnus00 @disruptor37 Benchmark evals are still running. Opus 4.7, sonnet 4.6, gpt 5.5 are clearly best. On the best value side it looks like kimi k2.6 and qwen3.6-plus are about tied at the top. DeepSeek v-4 is a runner-up. Gemini is behaving weirdly. Need to look into it before I call it.
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 78
Magnus Müller (@mamagnus00)
browser-harness exploded. some said AGI is here. but what’s the right interface? introducing Browser Use Desktop. open-source. watch the magic. 🐎👇
Replies: 42 · Reposts: 46 · Likes: 761 · Views: 110K
Alexander Yue (@Alezander907)
Today I have carefully updated all browser-harness install instructions. I tested e2e on Mac, Windows, and Linux, and across agent frameworks. Every agent on any OS should now one-shot connecting to the browser. Just give it this link and ask it to use the browser: github.com/browser-use/br… It's not the flashy work that sets you apart.
Replies: 1 · Reposts: 1 · Likes: 18 · Views: 8.5K
Alexander Yue (@Alezander907)
The +13% score increase for browser-harness from my last post was because I switched the agent framework. I went from using Claude Code to using @opencode and the performance immediately spiked. It's open-source, allowing a more direct connection to the harness.
[Image]
Replies: 6 · Reposts: 4 · Likes: 26 · Views: 3.2K
Alexander Yue (@Alezander907)
Been cooking up something amazing lately. New highest-scoring browser agent of all time.
[Image]
Replies: 4 · Reposts: 10 · Likes: 112 · Views: 62.9K
Alexander Yue (@Alezander907)
LLM intelligence has been completely mismanaged by keeping it constrained to chats. Code is more powerful than any human language. People laughed that ChatGPT can't count the letter 'r' in strawberry. But even the oldest ChatGPT 3.5-turbo could do it with code.
[Image]
Replies: 0 · Reposts: 0 · Likes: 2 · Views: 324
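A minimal sketch of the point in the post above, assuming Python (the post does not name a language): once the model writes code instead of answering directly, counting a letter becomes a deterministic one-liner. The helper name `count_letter` is hypothetical.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # prints 3
```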
adam (@theCTO)
I need a Sandbox service. I need the Sandbox to not bill me per hour/minute. I need it to have persistent storage. I need it to have outgoing internet access. Who should I use?
Replies: 88 · Reposts: 1 · Likes: 165 · Views: 50K
Alexander Yue (@Alezander907)
When I get Claude Mythos I am going to make an agent framework with a “search web” tool, but it’s just a bad LLM hallucinating evidence for whatever is searched for. What happens when the smartest AI mind has every question answered with “yes and”, no matter what?
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 147
Alexander Yue (@Alezander907)
gpt-5.5 hit 68% on BU_Bench_V1, beating opus-4-7 and taking 1st place for models using the browser-use open-source harness
[Image]
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 155
Alexander Yue (@Alezander907)
I have now built my own agent router and it is amazing. But I was wrong about needing to rename sessions (for now). Instead, just integrate into Slack. One thread = one session. Feels very natural, and now I can work from anywhere.
Quoted post by Alexander Yue (@Alezander907):

The next developments in AI coding should be about session management. @cursor_ai has it right with their new update. Low hanging fruit for all, including @opencode (pls). Have the LLMs rename the session every few messages. Soon there will be routing LLMs too

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 105
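The "one thread = one session" idea above could be sketched roughly as follows, assuming Python. This is not the author's router: `Session`, `ThreadSessionRouter`, and the sample channel and thread IDs are hypothetical, and no real Slack client is involved. The only assumption about Slack is that each incoming message carries a channel ID and a thread timestamp that together identify its thread.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Conversation history for one agent session (hypothetical structure)."""
    session_id: str
    messages: list[str] = field(default_factory=list)


class ThreadSessionRouter:
    """Maps each (channel, thread) pair to exactly one session."""

    def __init__(self) -> None:
        self._sessions: dict[tuple[str, str], Session] = {}

    def route(self, channel: str, thread_ts: str, text: str) -> Session:
        key = (channel, thread_ts)
        # The first message in a thread creates the session; replies reuse it.
        if key not in self._sessions:
            self._sessions[key] = Session(session_id=f"{channel}:{thread_ts}")
        session = self._sessions[key]
        session.messages.append(text)
        return session


router = ThreadSessionRouter()
first = router.route("C123", "1700000000.000100", "fix the flaky test")
reply = router.route("C123", "1700000000.000100", "now open a PR")
other = router.route("C123", "1700000999.000200", "unrelated task")
print(first is reply, first is other)  # prints True False
```

Keying on the thread means no session naming or renaming step is needed: the thread itself is the handle.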
Alexander Yue (@Alezander907)
You have been hearing about self-improving agents. Soon you'll be hearing about self-improving evaluations.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 44
Alexander Yue (@Alezander907)
I have built custom agents in our browser-use Slack that have boosted my productivity 3x. Each one has persistent learning, builds tools, schedules itself, and edits its own source code. Different permissions for each. They can mention and call each other. The overseer manages all.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 59
Alexander Yue (@Alezander907)
@budapp I'd love to try this out, will provide feedback
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 202
Bud (@budapp)
Introducing Bud. The first AI Human Emulator. Bud has a full computer with storage, compute, and memory to build and code, sms and telegram to communicate, a full browser to use, can create/store/edit files, connect and use your tools, learn custom skills, work fully autonomously, and complete any task end to end just like a human. Text the number below or try free at bud [dot] app. Comment for 100k free credits.
Replies: 2.8K · Reposts: 325 · Likes: 4K · Views: 688.2K
Alexander Yue (@Alezander907)
Browser agents are great for exhaustive due diligence. For every major decision I make, I list my key assumptions. Then I tell my agent swarm: "Is there anything you can find that refutes this?"
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 54
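The workflow above could be sketched as one prompt per assumption, assuming Python. The function name `refutation_prompts` and the sample decision and assumptions are hypothetical. How the prompts are handed to the agents is not described in the post and is left out here.

```python
def refutation_prompts(decision: str, assumptions: list[str]) -> list[str]:
    """Build one prompt per assumption, asking an agent to look for counter-evidence."""
    question = "Is there anything you can find that refutes this?"
    return [
        f"Decision: {decision}\nAssumption: {assumption}\n{question}"
        for assumption in assumptions
    ]


# Hypothetical example inputs, for illustration only.
prompts = refutation_prompts(
    "Switch the team to a new CI provider",
    ["The new provider is cheaper at our build volume",
     "Migration takes under a week"],
)
for prompt in prompts:
    print(prompt, end="\n\n")
```

Giving each assumption its own prompt lets separate agents search for counter-evidence independently.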