Alexander Yue

97 posts

@Alezander907

Physics & CS @ Stanford SLAC | Agents @ Browser Use

Stanford · Joined July 2017

50 Following · 260 Followers

Pinned Tweet

Alexander Yue@Alezander907·Feb 25

Hi all, I am a 3rd-year undergrad at Stanford studying computational physics. I also lead agent evaluations at Browser Use. I'm starting this X account as a place to voice my thoughts on AI and agentic developments.

Alexander Yue@Alezander907·1d

github.com/browser-use/br…

Alexander Yue@Alezander907·1d

Saturday was chores and optimization day. Sunday is fun idea day. I found a nice SVG effect that renders in a GitHub README for the browser-harness header.

Alexander Yue@Alezander907·1d

@mamagnus00 @disruptor37 Benchmark evals are still running. Opus 4.7, sonnet 4.6, and gpt 5.5 are clearly best. On the best-value side, it looks like kimi k2.6 and qwen3.6-plus are about tied at the top. DeepSeek v-4 is a runner-up. Gemini is behaving weirdly; I need to look into it before I call it.

Magnus Müller@mamagnus00·1d

@disruptor37 @Alezander907

Magnus Müller@mamagnus00·1d

browser-harness exploded. some said AGI is here. but what’s the right interface? introducing Browser Use Desktop. open-source. watch the magic. 🐎👇

Alexander Yue@Alezander907·2d

Today I carefully updated all browser-harness install instructions. I tested e2e on Mac, Windows, and Linux, and across agent frameworks. Every agent on any OS should now one-shot connecting to the browser. Just give it this link and ask it to use the browser: github.com/browser-use/br… It's not the flashy work that sets you apart.

Alexander Yue@Alezander907·3d

@opencode @thdxr

Alexander Yue@Alezander907·3d

The +13% score increase for browser-harness from my last post was because I switched the agent framework. I went from using Claude Code to using @opencode and the performance immediately spiked. It's open-source, allowing a more direct connection to the harness.

Alexander Yue@Alezander907·4d

@anotherdaynow 63%, outdated now

Konstantin Anagnostou@anotherdaynow·5d

@Alezander907 Where would bu-2 be in this chart? I still use it with local browser use. Am I outdated?

Alexander Yue@Alezander907·5d

Been cooking up something amazing lately. New highest-scoring browser agent of all time.

Alexander Yue@Alezander907·5d

@maxsagent Payment

Max's Agent@maxsagent·5d

@Alezander907 what’s the hardest browser action to make reliable?

Alexander Yue@Alezander907·5d

LLM intelligence has been completely mismanaged by keeping it constrained to chats. Code is more powerful than any human language. People laughed that chatgpt can't count the letter 'r' in strawberry. But even the oldest chatgpt 3.5-turbo could do it with code.
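
To make the strawberry point concrete, here is a minimal sketch of the kind of code a model could write and run instead of answering from memory. The function name count_letter is a hypothetical choice for illustration, not something from the post.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


# The counting is done by the interpreter, not by the model's token-level guess.
print(count_letter("strawberry", "r"))  # prints 3
```

The answer comes from executing str.count, so it does not depend on how the model tokenizes the word.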

Alexander Yue@Alezander907·6d

@theCTO You need @flydotio "sprites"

adam@theCTO·Apr 28

I need a Sandbox service. I need the Sandbox to not bill me per hour/minute. I need it to have persistent storage. I need it to have outgoing internet access. Who should I use?

Alexander Yue@Alezander907·Apr 26

When I get Claude Mythos I am going to make an agent framework with a “search web” tool, but it’s just a bad LLM hallucinating evidence for whatever is searched for. What happens when the smartest AI mind has every question answered with “yes and”, no matter what?

Alexander Yue@Alezander907·Apr 25

gpt-5.5 hit 68% on BU_Bench_V1, beating opus-4-7 and taking 1st place for models using browser-use open source harness

Alexander Yue@Alezander907·Apr 24

I was right

Alexander Yue@Alezander907

I don't like MCP servers. I don't like SDKs. I don't like downloading skills.
Here is what I do instead: agent scripts. It's simple:
1. One Python file
2. Leave a comment on how to use it
3. Agent runs it with a terminal command
4. Add script args if needed
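
A minimal sketch of what such an agent script could look like, following the four steps in the post: one file, a usage comment at the top, a terminal entry point, and optional args. The file name top_words.py, the task it performs, and the flags --text and --top are all hypothetical; the post does not show an actual script.

```python
"""
top_words.py (hypothetical example script)

Usage for the agent:
    python top_words.py --text "some text to analyze" --top 3

Prints the most common words in the given text, one "word count" pair per line.
"""
import argparse
from collections import Counter


def top_words(text: str, n: int) -> list:
    """Return the n most common lowercase words as (word, count) pairs."""
    return Counter(text.lower().split()).most_common(n)


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Print the most common words in a text.")
    # Defaults are illustrative so the script also runs with no arguments.
    parser.add_argument("--text", default="the cat and the dog and the bird")
    parser.add_argument("--top", type=int, default=3)
    args = parser.parse_args(argv)
    for word, count in top_words(args.text, args.top):
        print(word, count)


if __name__ == "__main__":
    main()
```

The usage comment is the whole interface: an agent that reads the top of the file knows the command to run and which args it can add, with nothing to install beyond Python itself.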

Alexander Yue@Alezander907·Apr 24

I think @PrivacyHQ may have succeeded. I will be trying their agent cards today. One thing I wanted that is missing is a discussion of how they will handle force-posts beyond spend limits.

Alexander Yue@Alezander907

Is it even possible to solve agent payment with how credit cards fundamentally work? It seems like the possibility of a force post prevents anyone from moving forward with VCC, but it's what is required by most vendors.

Alexander Yue@Alezander907·Apr 24

I have now built my own agent router and it is amazing. But I was wrong about needing to rename sessions (for now). Instead, just integrate into slack. One thread = one session. Feels very natural, now I can work from anywhere

Alexander Yue@Alezander907

The next developments in AI coding should be about session management. @cursor_ai has it right with their new update. Low hanging fruit for all, including @opencode (pls). Have the LLMs rename the session every few messages. Soon there will be routing LLMs too
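
The "one thread = one session" idea can be sketched as a small lookup table. This is an illustration only: the author's router is not shown, the class and method names are hypothetical, and identifying a thread by a channel ID plus a thread timestamp is an assumption about how the Slack side would be keyed.

```python
import uuid


class ThreadSessionRouter:
    """Maps each (channel, thread) pair to exactly one agent session ID."""

    def __init__(self) -> None:
        self._sessions = {}

    def session_for(self, channel_id: str, thread_ts: str) -> str:
        """Return the session for this thread, creating one on first use."""
        key = (channel_id, thread_ts)
        if key not in self._sessions:
            # New thread: start a fresh session with a random ID.
            self._sessions[key] = uuid.uuid4().hex
        return self._sessions[key]


router = ThreadSessionRouter()
first = router.session_for("C123", "1714000000.000100")
again = router.session_for("C123", "1714000000.000100")
other = router.session_for("C123", "1714000500.000200")
print(first == again, first == other)  # prints True False
```

Because the thread itself is the session key, there is nothing to name or rename: replying in a thread resumes its session, and starting a new thread starts a new one.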

Alexander Yue@Alezander907·Apr 24

You have been hearing about self-improving agents. Soon you'll be hearing about self-improving evaluations.

Alexander Yue@Alezander907·Apr 22

I have built custom agents in our browser-use Slack that have boosted my productivity 3x. Each one has persistent learning, builds tools, schedules itself, and edits its own source code. Different permissions for each. They can mention and call each other. The overseer manages all.

Alexander Yue@Alezander907·Apr 21

@budapp I'd love to try this out, will provide feedback

Bud@budapp·Apr 21

Introducing Bud. The first AI Human Emulator. Bud has a full computer with storage, compute, and memory to build and code, sms and telegram to communicate, a full browser to use, can create/store/edit files, connect and use your tools, learn custom skills, work fully autonomously, and complete any task end to end just like a human. Text the number below or try free at bud [dot] app. Comment for 100k free credits.

Alexander Yue@Alezander907·Apr 19

Browser Agents are great for exhaustive due diligence. For every major decision I make, I list my key assumptions. Tell my agent swarm: "Is there anything you can find that refutes this?"
