Zhiheng Liu @__Johanan

80 posts

Ph.D. student at the Department of Computer Science, The University of Hong Kong (HKU).

Joined August 2021
393 Following · 127 Followers
Zhiheng Liu reposted
Jiawei Yang @JiaweiYang118
Two months ago, I vaguely posted a number: 0.9 FID, one-step, pixel space. Now it is 0.75, and it can go even lower. Many wonder how. I thought it might end up as a small FID prank: simple and deliberate. It started with one question: can FID be optimized directly, and what does it reveal? Introducing FD-loss.
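The tweet doesn't reveal the FD-loss recipe, but the question "can FID be optimized directly" has a known mechanical answer: batch feature statistics are differentiable, and the matrix square root inside the Fréchet distance can be computed with a Newton-Schulz iteration so gradients flow back to the generator. A minimal sketch under those assumptions (the frozen encoder choice, iteration count, and batch-level statistics are mine, not the paper's):

```python
import torch

def sqrtm_newton_schulz(a, num_iters=15):
    # Differentiable matrix square root via Newton-Schulz iteration;
    # pre-scaling by the Frobenius norm keeps the iteration in its
    # convergence region for reasonably conditioned inputs.
    norm = a.norm()
    eye = torch.eye(a.shape[0], device=a.device, dtype=a.dtype)
    y, z = a / norm, eye.clone()
    for _ in range(num_iters):
        t = 0.5 * (3.0 * eye - z @ y)
        y, z = y @ t, t @ z
    return y * norm.sqrt()

def frechet_distance_loss(feats_real, feats_fake, eps=1e-6):
    # feats_*: (N, D) features from a frozen encoder; gradients flow
    # through feats_fake back into the generator.
    d = feats_fake.shape[1]
    eye = torch.eye(d, device=feats_fake.device, dtype=feats_fake.dtype)
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = torch.cov(feats_real.T) + eps * eye   # torch.cov wants (D, N)
    cov_f = torch.cov(feats_fake.T) + eps * eye
    covmean = sqrtm_newton_schulz(cov_r @ cov_f)
    return (mu_r - mu_f).pow(2).sum() + torch.trace(cov_r + cov_f - 2.0 * covmean)
```

In practice the small-batch covariance estimate is noisy, which is presumably where the actual FD-loss recipe departs from this naive sketch.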
Zhiheng Liu reposted
AK @_akhaliq
Meta presents Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation. Paper: huggingface.co/papers/2604.24…
Zhiheng Liu reposted
Yuren Cong @CongYuren
1/🚀 Excited to announce Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation! We built an omni model that uses direct patch-embedding layers on raw image inputs and achieves SOTA in multimodal understanding AND generation. Paper: huggingface.co/papers/2604.24… Code: github.com/facebookresear… Thanks to all the co-authors! @__Johanan, @wmren993, @xiaoke_shawn_h, @ShoufaChen, @TianhongLi6, Mengzhao Chen, Yatai Ji, Sen He, Jonas Schult, Belinda Zeng, Tao Xiang, @WenhuChen, Ping Luo, @LukeZettlemoyer!
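For context, a "direct patch embedding layer" is the ViT-style patchify stem: one strided convolution that linearly projects non-overlapping pixel patches into tokens, with no pretrained vision encoder in front. A minimal sketch (the patch size and width are illustrative assumptions, not Tuna-2's configuration):

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, patch_size=16, in_chans=3, dim=1024):
        super().__init__()
        # One strided conv = linear projection of non-overlapping patches.
        self.proj = nn.Conv2d(in_chans, dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, H, W) raw pixels
        x = self.proj(x)                     # (B, dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)

tokens = PatchEmbed()(torch.randn(2, 3, 256, 256))  # (2, 256, 1024)
```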
Zhiheng Liu @__Johanan
@rosinality Thanks for sharing our work! I remember you also shared Tuna-1. Thank you for your attention to our work! 🥳
Zhiheng Liu reposted
Rosinality @rosinality
Pixel-based unified understanding and generation model using JiT. Uses MAE for representation learning.
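For readers who want the MAE step made concrete: MAE-style representation learning encodes only a random subset of patch tokens and reconstructs the rest. A minimal sketch of the masking step (the 75% ratio and shapes are conventional MAE defaults, assumed here rather than taken from this model):

```python
import torch

def random_masking(tokens, mask_ratio=0.75):
    # tokens: (B, N, D) patch tokens; keep a random 25% per sample.
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)
    ids_shuffle = noise.argsort(dim=1)       # random permutation per sample
    ids_keep = ids_shuffle[:, :n_keep]
    kept = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return kept, ids_shuffle                 # encoder sees only `kept`

kept, _ = random_masking(torch.randn(2, 196, 768))   # kept: (2, 49, 768)
```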
David Fan @DavidJFan
This week I joined AMI Labs as a founding member of the NYC lab! I'm super excited to build AI systems that truly understand the physical world. I'm also excited to help build the research agenda, culture, and team from the ground up, and learn what it takes to build a company. It's a real privilege to be here.

The last ~2 years at FAIR have been the most rewarding of my professional life so far. When I first started doing research almost 10 years ago, joining FAIR felt like a pipe dream. I became a better researcher, open-sourced for the first time (V-JEPA 2, WebSSL, MetaMorph, DexWM), grew as a person, made life-long friends, and even rekindled old hobbies, like playing clarinet with the Meta NYC orchestra.

I want to thank Mike Rabbat, Maryam Fazel-Zarandi, the JEPA team, and all my amazing colleagues + collaborators who made this dream come true. The research world is small, so I know our paths will cross again. Please stay in touch!
Quoted: AMI Labs @amilabs

Advanced Machine Intelligence (AMI) is building a new breed of AI systems that understand the world, have persistent memory, can reason and plan, and are controllable and safe.

We’ve raised a $1.03B (~€890M) round from global investors who believe in our vision of universally intelligent systems centered on world models. This round is co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, along with other investors and angels across the world.

We are a growing team of researchers and builders, operating in Paris, New York, Montreal and Singapore from day one. Read more: amilabs.xyz

AMI - Real world. Real intelligence.

Zhiheng Liu reposted
Zhaochong An @ZhaochongAn
🚀 New survey paper: "Video Understanding: From Geometry and Semantics to Unified Models" 💡 A structured review of video understanding across geometry, semantics, and unified models, with discussion of emerging joint paradigms and future directions. 📖 arxiv.org/pdf/2603.17840… 🧵👇
Zhiheng Liu reposted
Han Lin @hanlin_hl
🚀 Excited to share V-Co, a diffusion model that jointly denoises pixels and pretrained semantic features (e.g., DINO). We find a simple but effective recipe:
1️⃣ architecture matters a lot --> fully dual-stream JiT
2️⃣ CFG needs a better unconditional branch --> semantic-to-pixel masking for CFG
3️⃣ the best semantic supervision is hybrid --> perceptual-drifting hybrid loss
4️⃣ calibration is essential --> RMS-based feature rescaling
We conducted a systematic study on V-Co, which is highly competitive at a comparable scale and outperforms JiT-G/16 (~2B, FID 1.82) with fewer training epochs. 🧵👇
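Point 4️⃣ above, RMS-based feature rescaling, plausibly amounts to normalizing the semantic features so their scale matches the pixel stream. A minimal sketch, assuming per-sample normalization to a target RMS (the target value and the granularity of the statistic are my assumptions, not the paper's):

```python
import torch

def rms_rescale(feats, target_rms=1.0, eps=1e-8):
    # feats: (B, N, D) semantic features (e.g., DINO tokens).
    # Per-sample RMS over tokens and channels, rescaled to a fixed target.
    rms = feats.pow(2).mean(dim=(-2, -1), keepdim=True).sqrt()
    return feats * (target_rms / (rms + eps))

feats = torch.randn(4, 256, 768) * 37.0          # wildly scaled features
print(rms_rescale(feats).pow(2).mean().sqrt())   # ~1.0
```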
Jeff Liang @LiangJeff95
Made a music video for "晴天" (Sunny Day) using a colleague's younger photos and his crush, straight out in one take. @UtopaiStudios
Zhiheng Liu reposted
Vincent Sitzmann @vincesitzmann
In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision, mapping imagery to intermediate representations (3D, flow, segmentation...), is about to go away. vincentsitzmann.com/blog/bitter_le…
Fanqing Meng @FanqingMengAI
1. Personally, I think most agent products today are things you could vibe-code in a few hours.
2. The hard part of agent products is not the agent at all; it's the base model and the underlying "hands and feet" (e.g., solid training infra, data infra, and things like Feishu with great bot support).
3. Humans just write the skills.
Attached: what I vibe-coded in 2 hours last night over New Year's Eve. It basically enables voice-driven research from a phone; a server + a phone solves everything.
Zhiheng Liu reposted
傅盛 @FuSheng_0306
[Jensen Huang demos the Agent concept video] Frontier-model API calls + a local model for private information + a model router that recognizes intent + a small robot for multimodal interaction (Reachy Mini) = a personal intelligent assistant. Watch the original video to get a feel for what a day of working and living with agents will look like.
Zhiheng Liu reposted
bycloud @bycloudai
For more context on OpenAI's MRCR benchmark, curated at contextarena.ai by @DillonUzar: Gemini 3 Flash achieved 90% acc @ 1 million ctx. This performance is SoTA across all models; most SoTA models can't even go past 256k ctx.

At this length you can't be using standard attention; it'll perform badly anyway, and of course it'll be very expensive (Gemini 3 Flash is $0.5 in / $3 out). Some sort of efficient attention is implemented, so that's why the price is hitting the same level as a linear/sparse attention model. BUT linear attention (hybrid) is only good at long-ctx benchmarks and sucks at knowledge tasks; G3F is great at knowledge, even #3 on the Artificial Analysis Index. They can't be using any SSM/Mamba-variant hybrids (at least w/ standard attn) either, as those suck at long ctx. Same for sparse attention, as you can see from DeepSeek 3.2's DSA.

So what black magic did they do? Guess I'll never find out :( (unless I join Google...??) contextarena.ai/?models=cohere…
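For context on the efficient-attention families the thread contrasts: linear attention replaces softmax(QKᵀ)V with a kernel feature map, making cost linear rather than quadratic in context length. A minimal non-causal sketch using the elu+1 feature map from Katharopoulos et al. (this is the generic technique, not Gemini's actual mechanism):

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k: (B, H, N, d); v: (B, H, N, e). phi(x) = elu(x) + 1 keeps features positive.
    q, k = F.elu(q) + 1.0, F.elu(k) + 1.0
    kv = torch.einsum('bhnd,bhne->bhde', k, v)             # O(N·d·e), not O(N²)
    z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + eps)
    return torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)

out = linear_attention(torch.randn(1, 8, 4096, 64),
                       torch.randn(1, 8, 4096, 64),
                       torch.randn(1, 8, 4096, 64))        # (1, 8, 4096, 64)
```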
Zhiheng Liu reposted
Ronak Malde @rronak_
This might be my favorite paper of the year 🤯 Rich Sutton claims that current RL methods won't get us to continual learning because they don't compound on previous knowledge; every rollout starts from scratch. Researchers in Switzerland introduce Meta-RL, which might crack that code: optimize across episodes with a meta-learning objective, which incentivizes agents to explore first and then exploit, and then reflect on previous failures in future agent runs. Incredible results and an incredible read overall. Authors: @YulunJiang @LiangzeJ @DamienTeney @Michael_D_Moor @mariabrbic
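The core idea, optimizing across episodes so that early exploration is credited by later exploitation, can be made concrete with a toy example. Below is a hedged sketch: a memory-carrying policy on a two-armed bandit, trained with REINFORCE on the return summed over a whole trial, so pulls that identify the better arm pay off within the same objective. The bandit, GRU policy, and hyperparameters are my illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

policy = nn.GRUCell(3, 16)    # input: one-hot last action (2) + last reward (1)
head = nn.Linear(16, 2)
opt = torch.optim.Adam([*policy.parameters(), *head.parameters()], lr=1e-2)

for step in range(200):
    p_best = torch.rand(())            # new bandit each trial: arm 0 pays with prob p_best
    h = torch.zeros(1, 16)             # memory persists across the whole trial
    inp = torch.zeros(1, 3)
    logps, ret = [], 0.0
    for t in range(10):                # one trial = 10 pulls; explore early, exploit late
        h = policy(inp, h)
        dist = torch.distributions.Categorical(logits=head(h))
        a = dist.sample()              # shape (1,)
        p = p_best if a.item() == 0 else 1.0 - p_best
        r = float(torch.rand(()) < p)
        logps.append(dist.log_prob(a))
        ret += r
        inp = torch.cat([nn.functional.one_hot(a, 2).float(),
                         torch.tensor([[r]])], dim=1)
    # Trial-level REINFORCE surrogate (no baseline, for brevity): the
    # gradient credits exploratory early pulls through the summed return.
    loss = -torch.stack(logps).sum() * ret
    opt.zero_grad(); loss.backward(); opt.step()
```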