28 posts

Zi

@dongyangzi

wrangling uis | prev @Databricks @Samsara @Stanford

San Francisco · Joined May 2011

147 Following · 65 Followers

Zi retweeted

ikka@Shahules786·1d

(1/n) Today, we’re releasing Cloning Bench. Labs are paying 6-7 figures for clones of web apps to do web/computer use-based RL training. At @VibrantLabsAI, our fundamental goal is to automate the creation of RL environments. For web/CUAs, one way we do that is by using coding agents and a custom harness to automatically generate the simulation environment. We tested Codex, Gemini, Claude Code, and GLM using our harness on their ability to recreate a Slack workspace and benchmarked their performance. We have published our methods, results, and analysis here today: vibrantlabs.com/blog/cloning-b…

Zi@dongyangzi·3d

Come hear Zengyi speak and get $20 in Lux credits! One of the fastest and most accurate models available today

Zengyi Qin@qinzytech

Wanna see how we got Lux to be 10x cheaper and 3x faster at the same accuracy as SOTA models? Come hear me speak at the inaugural ClickConf in San Francisco: luma.com/nbib4oev?utm_c… Sign up now for $20 in Lux credits

Zi@dongyangzi·11 Mar

Excited to announce we're sponsoring the first-ever ClickConf in SF on March 26! This event is specifically for computer use developers and researchers to spend time with peers exchanging ideas. Hope you can join us, register here: luma.com/nbib4oev?utm_s…

Zi@dongyangzi·5 Mar

@BrianEMcGrath Absolutely-- it's hard to reproduce the myriad of configurations, networking, and settings that make people's computers work.

BrianEMcGrath@BrianEMcGrath·5 Mar

Computer use as a layer on top of existing desktop software is a smarter wedge than most people realize. You do not have to rebuild the software stack. You just need the agent to understand the UI. That "infrastructure desktop software already runs on" framing is the key insight here.

Zi@dongyangzi·4 Mar

Computer use is the future. We’ve been building the bridge between the world’s fastest models and the world’s most battle-tested software on desktop. DM me for access

Nen@getnenai

Nen is live. After months in stealth, we're launching the developer platform for computer use agents on Windows. Build, deploy, and scale UI agents on the infrastructure desktop software already runs on. No custom setup. Just ship. getnen.ai/blog/launch-po…

Zi@dongyangzi·5 Mar

@mzaveri 🫶

Muzzammil Zaveri (MZ)@mzaveri·4 Mar

@dongyangzi Congrats Zi!

Zi@dongyangzi·5 Mar

@nakul 🔥(the good kind)

Nakul Mandan@nakul·4 Mar

Congrats on the launch, @dongyangzi 🔥!!

Zi@dongyangzi

Computer use is the future. We’ve been building the bridge between the world’s fastest models and the world’s most battle-tested software on desktop. DM me for access

Zi@dongyangzi·5 Mar

@gwintrob lets gooooooo

Gordon Wintrob@gwintrob·4 Mar

@dongyangzi Congrats on the launch Zi! LFG!!!

Zi@dongyangzi·5 Mar

@Chung thanks for all the support!

Chung-Man Tam 🇺🇸@Chung·5 Mar

@dongyangzi Excited to see the evolution, Nen!

Zi@dongyangzi·5 Mar

@brexton @nakul it's a hiatus x hiatus reference

brexton@brexton·4 Mar

@dongyangzi @nakul Pls tell me that Nen is a hunter x hunter reference

Zi@dongyangzi·5 Mar

@Chung Thanks so much for all the support!

Zi@dongyangzi·3 Mar

Powerful stuff from the Simular team, and congrats on the launch @angli_ai. It's amazing how a small team can take on large labs and excel at canonical benchmarks like OSWorld. Excited to see this available to general users today

Simular@SimularAI

In another universe, you missed your kid's recital. Your mom's birthday dinner. That anniversary celebration with your person. In this one, you have 𝐒𝐚𝐢. The AI co-worker that does your computer work so you don't have to choose.

Zi@dongyangzi·27 Feb

Amazing to see Mercury 2 pushing the Pareto frontier between speed and accuracy. This is huge for computer use-- sub-second latency between actions gets close to the average human's clicks per second. Computer use often doesn't actually require the same deep reasoning that powers mathematical reasoning or scientific research-- the action space per screenshot is quite flat.

Stefano Ermon@StefanoErmon

Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting started on what diffusion can do for language.

Zi@dongyangzi·17 Feb

OSWorld-Verified 61.4% -> 72.5% let's go

Claude@claudeai

This is Claude Sonnet 4.6: our most capable Sonnet model yet. It’s a full upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. It also features a 1M token context window in beta.

Zi@dongyangzi·23 Jan

2023 > openai.chat("1+1")
2024 > chain.invoke({"input": "1+1"})
2025 > tools = [{ "name": "add_one_and_one"}]
2026 > #SKILL.md 1+1=2 1+2=3

Zi@dongyangzi·22 Jan

It's the crab theory of compute. Saw this happen at Databricks where notebooks eventually became the frontend form factor for isolated compute units, with local filesystems and arbitrary code execution, plus good defaults on execution environments, useful libraries and APIs for data workflows

Rafael Garcia@rfgarcia·20 Jan

Feeling bearish on agent frameworks as they are currently defined… 2025: an agent is an LLM calling tools in a loop. Lots of frameworks built around this concept. 2026: an agent is Claude Code (or equivalent) in a VM, with useful programs (CLIs, browser, MCP servers), and a bunch of markdown (skills, commands, AGENTS.md). I want a framework that treats this as the primitive, not the LLM-with-tools abstraction

Zi@dongyangzi·22 Jan

Just finished the excellent shellgame.co season 2 podcast where @ev_rat perfectly nails the "human frustration" of trying to get AI agents to actually work. If an agent asks me if I want to "handle the remaining 174 entries manually," did I really build an agent? Or just a polite wall? 🧱 I'm reading this in Kyle's voice @HurumoAI

Zi@dongyangzi·20 Nov

Prediction: a new component in comp packages for startup hiring will be a coding agent budget. Engineers would rather have a >$1000 a month coding agent spend than better health benefits
