Zhiheng Liu @__Johanan

80 posts

Ph.D. student at the Department of Computer Science, The University of Hong Kong (HKU).

Joined August 2021
393 Following · 127 Followers
Zhiheng Liu reposted
Jiawei Yang @JiaweiYang118
Two months ago, I vaguely posted a number: 0.9 FID, one-step, pixel space. Now it is 0.75, and it can go even lower. Many wonder how. I thought it might end up as a small FID prank: simple and deliberate. It started with one question: can FID be optimized directly, and what does it reveal? Introducing FD-loss.
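The tweet doesn't reveal the FD-loss recipe, but the question "can FID be optimized directly" has a known mechanical answer: batch feature statistics are differentiable, and the matrix square root inside the Fréchet distance can be computed with a Newton-Schulz iteration so gradients flow back to the generator. A minimal sketch under those assumptions (the frozen encoder choice, iteration count, and batch-level statistics are mine, not the paper's):

```python
import torch

def sqrtm_newton_schulz(a, num_iters=15):
    # Differentiable matrix square root via Newton-Schulz iteration;
    # pre-scaling by the Frobenius norm keeps the iteration in its
    # convergence region for reasonably conditioned inputs.
    norm = a.norm()
    eye = torch.eye(a.shape[0], device=a.device, dtype=a.dtype)
    y, z = a / norm, eye.clone()
    for _ in range(num_iters):
        t = 0.5 * (3.0 * eye - z @ y)
        y, z = y @ t, t @ z
    return y * norm.sqrt()

def frechet_distance_loss(feats_real, feats_fake, eps=1e-6):
    # feats_*: (N, D) features from a frozen encoder; gradients flow
    # through feats_fake back into the generator.
    d = feats_fake.shape[1]
    eye = torch.eye(d, device=feats_fake.device, dtype=feats_fake.dtype)
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = torch.cov(feats_real.T) + eps * eye   # torch.cov wants (D, N)
    cov_f = torch.cov(feats_fake.T) + eps * eye
    covmean = sqrtm_newton_schulz(cov_r @ cov_f)
    return (mu_r - mu_f).pow(2).sum() + torch.trace(cov_r + cov_f - 2.0 * covmean)
```

In practice the small-batch covariance estimate is noisy, which is presumably where the actual FD-loss recipe departs from this naive sketch.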
Zhiheng Liu reposted
AK @_akhaliq
Meta presents Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation. Paper: huggingface.co/papers/2604.24…
Zhiheng Liu reposted
Yuren Cong @CongYuren
1/🚀 Excited to announce Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation! We built an omni model that uses direct patch-embedding layers on raw image inputs and achieves SOTA in multimodal understanding AND generation. Paper: huggingface.co/papers/2604.24… Code: github.com/facebookresear… Thanks to all the co-authors! @__Johanan, @wmren993, @xiaoke_shawn_h, @ShoufaChen, @TianhongLi6, Mengzhao Chen, Yatai Ji, Sen He, Jonas Schult, Belinda Zeng, Tao Xiang, @WenhuChen, Ping Luo, @LukeZettlemoyer!
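For context, a "direct patch embedding layer" is the ViT-style patchify stem: one strided convolution that linearly projects non-overlapping pixel patches into tokens, with no pretrained vision encoder in front. A minimal sketch (the patch size and width are illustrative assumptions, not Tuna-2's configuration):

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, patch_size=16, in_chans=3, dim=1024):
        super().__init__()
        # One strided conv = linear projection of non-overlapping patches.
        self.proj = nn.Conv2d(in_chans, dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, H, W) raw pixels
        x = self.proj(x)                     # (B, dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)

tokens = PatchEmbed()(torch.randn(2, 3, 256, 256))  # (2, 256, 1024)
```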
Zhiheng Liu @__Johanan
@rosinality Thanks for sharing our work! I remember you also shared Tuna-1. Thank you for your attention to our work! 🥳
Zhiheng Liu reposted
Rosinality @rosinality
Pixel-based unified understanding and generation model using JiT. Uses MAE for representation learning.
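For readers who want the MAE step made concrete: MAE-style representation learning encodes only a random subset of patch tokens and reconstructs the rest. A minimal sketch of the masking step (the 75% ratio and shapes are conventional MAE defaults, assumed here rather than taken from this model):

```python
import torch

def random_masking(tokens, mask_ratio=0.75):
    # tokens: (B, N, D) patch tokens; keep a random 25% per sample.
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)
    ids_shuffle = noise.argsort(dim=1)       # random permutation per sample
    ids_keep = ids_shuffle[:, :n_keep]
    kept = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return kept, ids_shuffle                 # encoder sees only `kept`

kept, _ = random_masking(torch.randn(2, 196, 768))   # kept: (2, 49, 768)
```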
David Fan @DavidJFan
This week I joined AMI Labs as a founding member of the NYC lab! I'm super excited to build AI systems that truly understand the physical world. I'm also excited to help build the research agenda, culture, and team from the ground up, and learn what it takes to build a company. It's a real privilege to be here.

The last ~2 years at FAIR have been the most rewarding of my professional life so far. When I first started doing research almost 10 years ago, joining FAIR felt like a pipe dream. I became a better researcher, open-sourced for the first time (V-JEPA 2, WebSSL, MetaMorph, DexWM), grew as a person, made life-long friends, and even rekindled old hobbies, like playing clarinet with the Meta NYC orchestra.

I want to thank Mike Rabbat, Maryam Fazel-Zarandi, the JEPA team, and all my amazing colleagues + collaborators who made this dream come true. The research world is small, so I know our paths will cross again. Please stay in touch!
Quoted: AMI Labs @amilabs

Advanced Machine Intelligence (AMI) is building a new breed of AI systems that understand the world, have persistent memory, can reason and plan, and are controllable and safe.

We’ve raised a $1.03B (~€890M) round from global investors who believe in our vision of universally intelligent systems centered on world models. This round is co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, along with other investors and angels across the world.

We are a growing team of researchers and builders, operating in Paris, New York, Montreal and Singapore from day one. Read more: amilabs.xyz

AMI - Real world. Real intelligence.

Zhiheng Liu reposted
Zhaochong An @ZhaochongAn
🚀 New survey paper: "Video Understanding: From Geometry and Semantics to Unified Models" 💡 A structured review of video understanding across geometry, semantics, and unified models, with discussion of emerging joint paradigms and future directions. 📖 arxiv.org/pdf/2603.17840… 🧵👇
Zhiheng Liu reposted
Han Lin @hanlin_hl
🚀 Excited to share V-Co, a diffusion model that jointly denoises pixels and pretrained semantic features (e.g., DINO). We find a simple but effective recipe:
1️⃣ architecture matters a lot --> fully dual-stream JiT
2️⃣ CFG needs a better unconditional branch --> semantic-to-pixel masking for CFG
3️⃣ the best semantic supervision is hybrid --> perceptual-drifting hybrid loss
4️⃣ calibration is essential --> RMS-based feature rescaling
We conducted a systematic study on V-Co, which is highly competitive at a comparable scale and outperforms JiT-G/16 (~2B, FID 1.82) with fewer training epochs. 🧵👇
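Point 4️⃣ above, RMS-based feature rescaling, plausibly amounts to normalizing the semantic features so their scale matches the pixel stream. A minimal sketch, assuming per-sample normalization to a target RMS (the target value and the granularity of the statistic are my assumptions, not the paper's):

```python
import torch

def rms_rescale(feats, target_rms=1.0, eps=1e-8):
    # feats: (B, N, D) semantic features (e.g., DINO tokens).
    # Per-sample RMS over tokens and channels, rescaled to a fixed target.
    rms = feats.pow(2).mean(dim=(-2, -1), keepdim=True).sqrt()
    return feats * (target_rms / (rms + eps))

feats = torch.randn(4, 256, 768) * 37.0          # wildly scaled features
print(rms_rescale(feats).pow(2).mean().sqrt())   # ~1.0
```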
Jeff Liang @LiangJeff95
Made a music video for "晴天" (Sunny Day) using a colleague's younger photos and his crush, straight out in one take. @UtopaiStudios
Zhiheng Liu reposted
Vincent Sitzmann @vincesitzmann
In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision, mapping imagery to intermediate representations (3D, flow, segmentation...), is about to go away. vincentsitzmann.com/blog/bitter_le…
Fanqing Meng @FanqingMengAI
1. Personally, I think most agent products today are things you could vibe-code in a few hours.
2. The hard part of agent products is not the agent at all; it's the base model and the underlying "hands and feet" (e.g., solid training infra, data infra, and things like Feishu with great bot support).
3. Humans just write the skills.
Attached: what I vibe-coded in 2 hours last night over New Year's Eve. It basically enables voice-driven research from a phone; a server + a phone solves everything.
Zhiheng Liu reposted
傅盛 @FuSheng_0306
[Jensen Huang demos the Agent concept video] Frontier-model API calls + a local model for private information + a model router that recognizes intent + a small robot for multimodal interaction (Reachy Mini) = a personal intelligent assistant. Watch the original video to get a feel for what a day of working and living with agents will look like.
Zhiheng Liu reposted
bycloud @bycloudai
For more context on OpenAI's MRCR benchmark, curated at contextarena.ai by @DillonUzar: Gemini 3 Flash achieved 90% acc @ 1 million ctx. This performance is SoTA across all models; most SoTA models can't even go past 256k ctx.

At this length you can't be using standard attention; it'll perform badly anyway, and of course it'll be very expensive (Gemini 3 Flash is $0.5 in / $3 out). Some sort of efficient attention is implemented, so that's why the price is hitting the same level as a linear/sparse attention model. BUT linear attention (hybrid) is only good at long-ctx benchmarks and sucks at knowledge tasks; G3F is great at knowledge, even #3 on the Artificial Analysis Index. They can't be using any SSM/Mamba-variant hybrids (at least w/ standard attn) either, as those suck at long ctx. Same for sparse attention, as you can see from DeepSeek 3.2's DSA.

So what black magic did they do? Guess I'll never find out :( (unless I join Google...??) contextarena.ai/?models=cohere…
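For context on the efficient-attention families the thread contrasts: linear attention replaces softmax(QKᵀ)V with a kernel feature map, making cost linear rather than quadratic in context length. A minimal non-causal sketch using the elu+1 feature map from Katharopoulos et al. (this is the generic technique, not Gemini's actual mechanism):

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k: (B, H, N, d); v: (B, H, N, e). phi(x) = elu(x) + 1 keeps features positive.
    q, k = F.elu(q) + 1.0, F.elu(k) + 1.0
    kv = torch.einsum('bhnd,bhne->bhde', k, v)             # O(N·d·e), not O(N²)
    z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + eps)
    return torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)

out = linear_attention(torch.randn(1, 8, 4096, 64),
                       torch.randn(1, 8, 4096, 64),
                       torch.randn(1, 8, 4096, 64))        # (1, 8, 4096, 64)
```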
Zhiheng Liu reposted
Ronak Malde @rronak_
This might be my favorite paper of the year 🤯 Rich Sutton claims that current RL methods won't get us to continual learning because they don't compound on previous knowledge; every rollout starts from scratch. Researchers in Switzerland introduce Meta-RL, which might crack that code: optimize across episodes with a meta-learning objective, which incentivizes agents to explore first and then exploit, and then reflect on previous failures in future agent runs. Incredible results and an incredible read overall. Authors: @YulunJiang @LiangzeJ @DamienTeney @Michael_D_Moor @mariabrbic
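The core idea, optimizing across episodes so that early exploration is credited by later exploitation, can be made concrete with a toy example. Below is a hedged sketch: a memory-carrying policy on a two-armed bandit, trained with REINFORCE on the return summed over a whole trial, so pulls that identify the better arm pay off within the same objective. The bandit, GRU policy, and hyperparameters are my illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

policy = nn.GRUCell(3, 16)    # input: one-hot last action (2) + last reward (1)
head = nn.Linear(16, 2)
opt = torch.optim.Adam([*policy.parameters(), *head.parameters()], lr=1e-2)

for step in range(200):
    p_best = torch.rand(())            # new bandit each trial: arm 0 pays with prob p_best
    h = torch.zeros(1, 16)             # memory persists across the whole trial
    inp = torch.zeros(1, 3)
    logps, ret = [], 0.0
    for t in range(10):                # one trial = 10 pulls; explore early, exploit late
        h = policy(inp, h)
        dist = torch.distributions.Categorical(logits=head(h))
        a = dist.sample()              # shape (1,)
        p = p_best if a.item() == 0 else 1.0 - p_best
        r = float(torch.rand(()) < p)
        logps.append(dist.log_prob(a))
        ret += r
        inp = torch.cat([nn.functional.one_hot(a, 2).float(),
                         torch.tensor([[r]])], dim=1)
    # Trial-level REINFORCE surrogate (no baseline, for brevity): the
    # gradient credits exploratory early pulls through the summed return.
    loss = -torch.stack(logps).sum() * ret
    opt.zero_grad(); loss.backward(); opt.step()
```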