Zi

28 posts

@dongyangzi

wrangling uis | prev @Databricks @Samsara @Stanford

San Francisco · Joined May 2011
147 Following · 65 Followers
Zi reposted
ikka
ikka@Shahules786·
(1/n) Today, we’re releasing Cloning Bench. Labs are paying 6-7 figures for clones of web apps to do web/computer use-based RL training. At @VibrantLabsAI, our fundamental goal is to automate the creation of RL environments. For web/CUAs, one way that we do that is by using coding agents and a custom harness to automatically generate the simulation environment. We tested Codex, Gemini, Claude Code, and GLM using our harness on their ability to recreate a Slack workspace and benchmarked their performance. We have published our methods, results and analysis here today: vibrantlabs.com/blog/cloning-b…
English
7
12
141
11.5K
Zi
Zi@dongyangzi·
Excited to announce we're sponsoring the first-ever ClickConf in SF on March 26! This event is specifically for computer use developers and researchers to spend time with peers exchanging ideas. Hope you can join us. Register here: luma.com/nbib4oev?utm_s…
English
1
0
5
67
Zi
Zi@dongyangzi·
@BrianEMcGrath Absolutely-- it's hard to reproduce the myriad of configurations, networking, and settings that make people's computers work.
English
0
0
0
29
BrianEMcGrath
BrianEMcGrath@BrianEMcGrath·
Computer use as a layer on top of existing desktop software is a smarter wedge than most people realize. You do not have to rebuild the software stack. You just need the agent to understand the UI. That "infrastructure desktop software already runs on" framing is the key insight here.
English
1
0
2
68
Zi
Zi@dongyangzi·
@nakul 🔥(the good kind)
English
0
0
0
22
Zi
Zi@dongyangzi·
@gwintrob lets gooooooo
English
0
0
0
43
Zi
Zi@dongyangzi·
@Chung thanks for all the support!
English
0
0
0
15
Zi
Zi@dongyangzi·
@brexton @nakul it's a hiatus x hiatus reference
English
1
0
2
86
Zi
Zi@dongyangzi·
@Chung Thanks so much for all the support!
English
0
0
0
48
Zi
Zi@dongyangzi·
Powerful stuff from the Simular team, and congrats on the launch, @angli_ai. It's amazing how a small team can take on large labs and excel at canonical benchmarks like OSWorld. Excited to see this available to general users today.
Simular@SimularAI

In another universe, you missed your kid's recital. Your mom's birthday dinner. That anniversary celebration with your person. In this one, you have 𝐒𝐚𝐢. The AI co-worker that does your computer work so you don't have to choose.

English
1
1
4
439
Zi
Zi@dongyangzi·
Amazing to see Mercury 2 pushing the Pareto frontier between speed and accuracy. This is huge for computer use-- sub-second latency between actions gets close to the rate at which an average human clicks. Computer use often doesn't require the same deep reasoning that powers mathematical proofs or scientific research-- the action space per screenshot is quite flat.
Stefano Ermon@StefanoErmon

Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting started on what diffusion can do for language.

English
0
0
1
95
Zi
Zi@dongyangzi·
2023 > openai.chat("1+1")
2024 > chain.invoke({"input": "1+1"})
2025 > tools = [{ "name": "add_one_and_one"}]
2026 > #SKILL.md
1+1=2
1+2=3
English
0
0
1
46
Zi
Zi@dongyangzi·
It's the crab theory of compute. Saw this happen at Databricks where notebooks eventually became the frontend form factor for isolated compute units, with local filesystems and arbitrary code execution, plus good defaults on execution environments, useful libraries and APIs for data workflows
English
0
0
1
54
Rafael Garcia
Rafael Garcia@rfgarcia·
Feeling bearish on agent frameworks as they are currently defined… 2025: an agent is an LLM calling tools in a loop. Lots of frameworks built around this concept. 2026: an agent is Claude Code (or equivalent) in a VM, with useful programs (CLIs, browser, MCP servers), and a bunch of markdown (skills, commands, AGENTS.md). I want a framework that treats this as the primitive, not the LLM-with-tools abstraction.
English
48
12
229
34.9K
Zi
Zi@dongyangzi·
Just finished the excellent shellgame.co season 2 podcast where @ev_rat perfectly nails the "human frustration" of trying to get AI agents to actually work. If an agent asks me if I want to "handle the remaining 174 entries manually," did I really build an agent? Or just a polite wall? 🧱 I'm reading this in Kyle's voice @HurumoAI
English
0
0
3
51
Zi
Zi@dongyangzi·
Prediction: a new component of startup comp packages will be a coding agent budget. Engineers would rather have more than $1,000 a month in coding agent spend than better health benefits.
English
0
0
1
63