Yujia Qin

366 posts

@TsingYoga

ByteDance Seed, Agent, Previously Tsinghua Univ.

Beijing · Joined February 2019
342 Following · 5.6K Followers
Pinned Tweet
Yujia Qin
Yujia Qin@TsingYoga·
Happy CNY! We are glad to introduce our latest language model, Seed-2.0. We have made great progress (agent, reasoning, vision understanding, etc.) since Seed-1.8, without any distillation. Right now it's only available in CN, and it will soon be ready globally. seed.bytedance.com/en/seed2
8
24
183
13.6K
Yujia Qin
Yujia Qin@TsingYoga·
@Oli82817545 We did not add GUI data in post-training to avoid potential abuse of this capability. We are still exploring better ways to bring GUI capabilities to users.
0
0
3
514
Oli
Oli@Oli82817545·
@TsingYoga What's Seed 2.0's OSWorld score? It wasn't listed in the blog.
1
0
0
602
Yujia Qin
Yujia Qin@TsingYoga·
Proud to introduce Seed1.8, our latest generalized agent model. The model achieves competitive agentic capabilities while maintaining high LLM/VLM scores. Enjoy! github.com/ByteDance-Seed…
9
33
244
43.9K
Lei Li
Lei Li@_TobiasLee·
@TsingYoga Big congrats on the release! Awesome work!
1
0
2
1.4K
IronRed | SandHive
IronRed | SandHive@IronRedSandHive·
@TsingYoga No benchmark reports at all? That’s bold… how else are we supposed to gauge its performance?
1
0
0
116
Yujia Qin
Yujia Qin@TsingYoga·
OSWorld remains one of the best open-source evals. Real-world envs allow far more reward hacking, e.g., Claude 4.5 Sonnet often uses the terminal to solve GUI tasks instead of sticking to pure GUI interactions. But in 2025, if you trust benchmarks, you will have a tough time.
Epoch AI@EpochAIResearch

We looked at OSWorld, a popular evaluation of AI computer use capabilities. Our findings: tasks are simple, many don't require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time. See thread for details!

0
1
22
3.2K
Yujia Qin retweeted
Tianyu Pang
Tianyu Pang@TianyuPang1·
🚨Variational Reasoning for Language Models🚨 We show how treating thinking traces as latent variables unlocks a principled, stable, and unified framework for training reasoning LLMs.
8
72
365
24.5K
Yujia Qin retweeted
Tianyu Pang
Tianyu Pang@TianyuPang1·
🚀 LLMs can learn directly from verbal feedback, no scalar rewards needed! 😥 Scalar rewards compress rich feedback: “redundant but correct” vs “concise but typo-ridden” might both be 0.8. 💡 We propose learning a Feedback-Conditional Policy (FCP), an extremely scalable paradigm!
15
87
469
79.1K
JamesYu
JamesYu@jwyu10·
@TsingYoga Where is the Dockerfile for the sandbox image? I'd like to build it myself.
1
0
0
30