

Liang Chen
276 posts

@liangchen5518
Cofounder of @UniPat_AI. I worked at Moonshot AI, Alibaba Qwen and Microsoft Research Asia.



Super excited to introduce our latest work: Squeeze Evolve. We unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: squeeze-evolve.github.io




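Huh? Multimodal models sometimes never look at the image, yet answer as convincingly as if they had? A recent Stanford study points out that multimodal models exhibit "mirage reasoning": even when the model has not seen the image, or has no image at all, it still generates detailed image descriptions and reasoning, and even scores highly on medical benchmarks. Many multimodal models are not actually performing genuine visual understanding, so the paper proposes a better evaluation scheme: B-Clean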

Today we’re introducing Echo, our full-stack prediction intelligence system, which turns uncertainty 🔮 into profit 📈. We make prediction general, evaluable, trainable, and profitable. 🌐 Website: echo.unipat.ai





GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.

🔥 BabyVision Leaderboard Update: Qwen3.5-397B-A17B now ranks as the #1 open-source model on the BabyVision Benchmark, scoring 43.3 without tools. Huge congrats to the team @Alibaba_Qwen! Check the full leaderboard at unipat.ai/benchmarks/Bab…







Can frontier MLLMs see like a 3-year-old? We’re releasing BabyVision — a vision-centric benchmark that isolates pre-linguistic visual primitives kids solve effortlessly, but models still struggle with.👇
