Liang Chen

276 posts

@liangchen5518

Cofounder of @UniPat_AI. I worked at Moonshot AI, Alibaba Qwen and Microsoft Research Asia.

Beijing · Joined February 2022

203 Following · 3K Followers

Liang Chen @liangchen5518 · 12 Apr

GLM 5.1 from @Zai_org ranks as the top open model on the newly released Monthly-SWEBench by @UniPat_AI—second only to Claude-Opus-4.6. Congrats to the team! 🚀Explore the benchmark: unipat.ai/benchmarks/Mon…

Liang Chen reposted

Wenhao Chai @wenhaocha1 · 11 Apr

Great progress on BabyVision!

Monishwaran Maheswaran @sudomonish

Super excited to introduce our latest work: Squeeze Evolve. We unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io

Liang Chen @liangchen5518 · 6 Apr

@gietema @JRobertsAI @anshsharma009 @SamuelAlbanie @UniPat_AI Really nice blog about the VLM evaluations!

Jochem Gietema @gietema · 5 Apr

@JRobertsAI @anshsharma009 @SamuelAlbanie BabyVision by @liangchen5518 and others from @UniPat_AI is a good dataset as well, especially given that they actually reported the performance of 3-, 6-, 10- and 12-year-olds on (a subset of) this dataset

Jochem Gietema @gietema · 5 Apr

New blog post: an overview of what I learnt by browsing 28 VLM eval datasets.

Liang Chen @liangchen5518 · 2 Apr

@_TobiasLee That's a bit retro.

Lei Li @_TobiasLee · 1 Apr

It's 2026 and people are still working on this? We said long ago that vision tokens propagate into text tokens and get read out by later tokens.

-Zho- @ZHO_ZHO_ZHO

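Wait, multimodal models sometimes don't look at the image at all, yet answer as convincingly as if they had? New research from Stanford finds that multimodal models exhibit "mirage reasoning": even when the model hasn't looked at the image, or there is no image at all, it still generates detailed image descriptions and reasoning, and even scores highly on medical benchmarks. Many multimodal models don't actually have genuine visual understanding, so the paper proposes a better evaluation scheme: B-Clean.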

Liang Chen @liangchen5518 · 31 Mar

@_TobiasLee @UniPat_AI ❤️

Lei Li @_TobiasLee · 31 Mar

@liangchen5518 @UniPat_AI Big congrats, Liang!

Liang Chen @liangchen5518 · 30 Mar

In 2026, we’re seeing intelligence translate more directly into real economic impact. Echo is a distinctive example of this—creating value by predicting the future. Lucky to be building with such a great team @UniPat_AI.

UniPat AI @UniPat_AI

Today we’re introducing Echo, our full-stack prediction intelligence system, which turns uncertainty 🔮 into profit 📈. We make prediction general, evaluable, trainable and profitable. 🌐 Website: echo.unipat.ai

Liang Chen @liangchen5518 · 24 Mar

The scope of CUA (Computer Use Agent) should extend beyond GUI agents. What do you think CUA is?

Liang Chen reposted

Viviennn @0xViviennn · 23 Mar

x.com/i/article/2035…

Liang Chen @liangchen5518 · 23 Mar

Also check out our open-source code here! github.com/UniPat-AI/SWE-…

Liang Chen @liangchen5518 · 22 Mar

Introducing SWE-Vision: A Minimal Agent for Advancing Visual Intelligence. While LLMs' coding abilities have surpassed human performance on many benchmarks, visual reasoning still lags behind. We apply a simple stateful environment and reach SOTA performance. Read the blog 👉 unipat.ai/blog/SWE-Vision

Liang Chen reposted

UniPat AI @UniPat_AI · 9 Mar

UniPat AI introduces UniScientist — a 30B model (3B active) for autonomous scientific research: hypothesis → evidence → verification → iterative refinement until convergence. With just 3B active params, it scores 28.3 on FrontierScience-Research.

Liang Chen @liangchen5518 · 7 Mar

@yihengxu_ Congrats!

Yiheng Xu @yihengxu_ · 5 Mar

We have taken a huge step forward in computer use. :cua-sleep-rave-repeat:

OpenAI @OpenAI

GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.

Liang Chen @liangchen5518 · 4 Mar

@JustinLin610 Take care!

Junyang Lin @JustinLin610 · 3 Mar

me stepping down. bye my beloved qwen.

Liang Chen @liangchen5518 · 24 Feb

Congrats Qwen👑! Already feeling the acceleration of visual language intelligence in 2026 — looking forward to the “adult vision” moment.

UniPat AI @UniPat_AI

🔥 BabyVision Leaderboard Update: Qwen3.5-397B-A17B now ranks as the #1 open-source model on the BabyVision Benchmark, scoring 43.3 without tools. Huge congrats to the team @Alibaba_Qwen! Check the full leaderboard at unipat.ai/benchmarks/Bab…

Liang Chen reposted

Qwen @Alibaba_Qwen · 16 Feb

🚀 Qwen3.5-397B-A17B is here: the first open-weight model in the Qwen3.5 series.
🖼️ Native multimodal. Trained for real-world agents.
✨ Powered by hybrid linear attention + sparse MoE and large-scale RL environment scaling.
⚡ 8.6x–19.0x decoding throughput vs Qwen3-Max
🌍 201 languages & dialects
📜 Apache 2.0 licensed
🔗 Dive in:
GitHub: github.com/QwenLM/Qwen3.5
Chat: chat.qwen.ai
API: modelstudio.console.alibabacloud.com/ap-southeast-1…
Qwen Code: github.com/QwenLM/qwen-co…
Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
Blog: qwen.ai/blog?id=qwen3.5

Liang Chen reposted

UniPat AI @UniPat_AI · 15 Feb

NEW SOTA 👑 Seed-2.0 Pro is now the #1 model (60.6) on the BabyVision Benchmark, overtaking Gemini-3 Pro (49.7). Huge congrats to the #BytedanceSeed team!

Liang Chen @liangchen5518 · 31 Jan

Congrats! @HaoningTimothy @zxytim @ppwwyyxx

UniPat AI @UniPat_AI

BREAKING 🚨: Kimi K2.5 is now the #1 open model on the BabyVision Benchmark, and #2 overall, trailing only Gemini-3-Pro. From 12.4% → 36.5% in 9 months — an incredible leap for VLMs. Huge congrats to the @Kimi_Moonshot team 👏🔥

Liang Chen reposted

AK @_akhaliq · 13 Jan

BabyVision: Visual Reasoning Beyond Language huggingface.co/papers/2601.06…

Liang Chen @liangchen5518 · 13 Jan

Excited to introduce BabyVision, a benchmark at the “beginning” of human visual intelligence yet challenging to frontier multimodal LLMs!

UniPat AI @UniPat_AI

Can frontier MLLMs see like a 3-year-old? We’re releasing BabyVision — a vision-centric benchmark that isolates pre-linguistic visual primitives kids solve effortlessly, but models still struggle with.👇

Liang Chen @liangchen5518 · 19 Nov

btw, Gemini3-Pro is still the best VLM so far.

Liang Chen @liangchen5518 · 19 Nov

Despite Gemini 3's superhuman performance in various domains, its accuracy on basic visual tasks like cube counting is still a work in progress. Progress in VLMs seems to be slower than in LLMs. New patterns and paradigms are needed. gemini.google.com/share/a18f65dc…
