Yujia Qin

366 posts

@TsingYoga

ByteDance Seed, Agent, Previously Tsinghua Univ.

Beijing · Joined February 2019

342 Following · 5.6K Followers

Pinned Tweet

Yujia Qin@TsingYoga·Feb 17

Happy CNY! We are glad to introduce our latest language model, Seed-2.0. We have made great progress (agent, reasoning, vision understanding, etc.) since Seed-1.8 without any distillation. Right now it's only available in CN, and will soon be ready globally. seed.bytedance.com/en/seed2

183

13.6K

Yujia Qin@TsingYoga·Feb 18

@turingbook We had it finished before the New Year, haha

220

刘江/LIU Jiang@turingbook·Feb 17

@TsingYoga That's intense. You don't even take time off over the holiday? 😎

849

Yujia Qin@TsingYoga·Feb 17

@Oli82817545 We did not add GUI data in post-training to avoid potential abuse of this capability. We are still exploring better ways to bring GUI capabilities to users.

514

Oli@Oli82817545·Feb 17

@TsingYoga What's Seed 2.0's OSWorld score? It wasn't listed in the blog.

602

Yujia Qin@TsingYoga·Dec 26

This is the only meaningful benchmark

tphuang@tphuang

Doubao becomes 1st Chinese AI app to reach 100m DAU (only counting China). Volcano Engine recently reported that Doubao LLM token consumption has grown to 50T/day (3x May figures), so its text, image & video models are all very popular.

7.8K

Yujia Qin@TsingYoga·Dec 19

Also, unlike most US/CN language models, Seed1.8 is trained without incorporating any distillation data from external sources

Yujia Qin@TsingYoga

Proud to introduce Seed1.8, our latest generalized agent model. The model achieves competitive agentic capabilities while maintaining high LLM/VLM scores. Enjoy! github.com/ByteDance-Seed…

110

16.9K

Yujia Qin@TsingYoga·Dec 19

@nikivdev The model is closed-source

671

nikiv.dev@nikivdev·Dec 19

@TsingYoga are there open weights for this?

820

Yujia Qin@TsingYoga·Dec 19

Proud to introduce Seed1.8, our latest generalized agent model. The model achieves competitive agentic capabilities while maintaining high LLM/VLM scores. Enjoy! github.com/ByteDance-Seed…

244

43.9K

Yujia Qin@TsingYoga·Dec 19

@_TobiasLee 🥰

1.2K

Lei Li@_TobiasLee·Dec 19

@TsingYoga Big congrats on the release! Awesome work!

1.4K

Yujia Qin@TsingYoga·Dec 11

@TaylorOgan ⚡️⚡️⚡️

Taylor Ogan@TaylorOgan·Dec 11

@TsingYoga How soon, and how fast? 😉

185

Yujia Qin@TsingYoga·Dec 11

🫡 Soon it will be super fast⏩

Taylor Ogan@TaylorOgan

I just told my phone: "Play wordle and win." It opened the app for the first time, read the rules, guessed a word, waited for an ad to play, and got the word in three attempts. Insane!

3.2K

Yujia Qin@TsingYoga·Dec 5

@IronRedSandHive buy the phone, and try it~ The utility is awesome

101

IronRed | SandHive@IronRedSandHive·Dec 5

@TsingYoga No benchmark reports at all? That’s bold… how else are we supposed to gauge its performance?

116

Yujia Qin@TsingYoga·Dec 5

See what Doubao Agent (backed by UI-TARS) can do! No need to report benchmarks, usage is our best evaluation!

Taylor Ogan@TaylorOgan

Another DeepSeek moment. This is the world’s first actual smart phone. It’s an engineering prototype of ZTE’s Nubia M153 running ByteDance’s Doubao AI agent fused into Android at the OS level. It has complete control over the phone. It can see the UI, choose/download apps, tap/type, call, and run multi-step task chains. Here I just say (in English) “find someone to wait in line for me” (something you can do in China), and it picks which app to open, configures the job, and hands me one confirm screen. I wouldn’t otherwise know how to do this, and here the phone just did it in a matter of seconds.

5.9K

Yujia Qin@TsingYoga·Nov 13

Let's play Genshin step by step

Weihao Tan@WeihaoTan64

🚀Introducing Lumine, a generalist AI agent trained within Genshin Impact that can perceive, reason, and act in real time, completing hours-long missions and following diverse instructions within complex 3D open-world environments.🎮 Website: lumine-ai.org 1/6

3.2K

Yujia Qin@TsingYoga·Nov 4

OSWorld remains one of the best open-source evals. Real-world envs allow far more reward hacking, e.g., Claude 4.5 Sonnet often uses the terminal to solve GUI tasks instead of sticking to pure GUI interactions. But in 2025, if you trust benchmarks, you will have a tough time.

Epoch AI@EpochAIResearch

We looked at OSWorld, a popular evaluation of AI computer use capabilities. Our findings: tasks are simple, many don't require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time. See thread for details!

3.2K

Yujia Qin@TsingYoga·Oct 29

A blog post from @ycjcl detailing the AIO sandbox: sandbox.agent-infra.com/blog/announcin…

Yujia Qin@TsingYoga

The tool/env infra behind UI-TARS-2 is open-sourced. Enjoy the All-in-One Agent Sandbox!🥳 sandbox.agent-infra.com github.com/agent-infra/sa…

2.4K

Yujia Qin@TsingYoga·Oct 29

Check out Game-TARS, a generalized multimodal game agent. It's literally the best general game AI in the world, and it's very small~ Paper: arxiv.org/abs/2510.23691 Blog: seed-tars.com/game-tars/

Zihao Wang@RealZihaoWang

🚀 Thrilled to introduce Game-TARS: our next-gen generalist multimodal game agent! Tired of AI that needs custom code for every new game? Game-TARS is a single VLM that learns to master any game just like a human: by watching the screen and using a keyboard & mouse. Read more.

109

14.5K

Yujia Qin@TsingYoga·Oct 17

RL for pretraining 🚫 RL as continual pretraining 🫡

Rishabh Agarwal@agarwl_

*checks chatgpt* This paper costs ~4.2 million USD (400K GB200 hours) -- science! Our most expensive run was a 100K GPU hour (same amount as Deepseek-R1-zero but on GB200s). One finding here was that once we have a scalable RL algorithm, RL compute scaling becomes predictable (e.g., we extrapolated to 3x compute for a 17Bx16 MoE from 16k GPU Hours to 50k hours). The other is when comparing algorithms, embrace the bitter lesson (try to predict how well it would scale with compute using a given performance curve, instead of just performance at a fixed compute). Most algorithmic tricks in a scalable RL method don't change the asymptotic performance, but things like model size, context length, batch size, and data do. There are of course many design choices in RL, so we don’t think that the ScaleRL recipe is the end of the story.

3.9K

Yujia Qin retweeted

Tianyu Pang@TianyuPang1·Sep 29

🚨Variational Reasoning for Language Models🚨 We show how treating thinking traces as latent variables unlocks a principled, stable, and unified framework for training reasoning LLMs.

365

24.5K

Yujia Qin retweeted

Tianyu Pang@TianyuPang1·Sep 29

🚀LLMs can learn directly from verbal feedback — no scalar rewards needed! 😥Scalar rewards compress rich feedback— “redundant but correct” vs “concise but typo-ridden” might both be 0.8 💡We propose to learn Feedback-Conditional Policy (FCP), an extremely scalable paradigm!