Zinan Lin

124 posts

Zinan Lin
@lin_zinan

Principal Researcher at @MSFTResearch, PhD from @CarnegieMellon

United States · Joined March 2020
64 Following · 215 Followers
Zinan Lin retweeted
Xihui Liu@XihuiLiu·
🚀 Thrilled to share our work **CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation**, selected as a **CVPR 2026 Highlight**! 🏆

Video generation currently faces a dilemma:
❌ **2D Diffusion**: Lacks spatial consistency in large camera moves.
❌ **Explicit 3D**: Complex, slow, and prone to reconstruction artifacts.

We bridge this gap by using **Implicit 3D as a Spatial Anchor**, feeding this representation as **context** into the video generation model. ⚓️

Through **Scene-Decoupled Diffusion**, we represent the static environment via implicit 3D features, decoupling scene priors from dynamic motion. This enables unprecedented scene consistency and precise control over complex scenes, camera paths, and characters.

🎬 CineScene is a versatile toolkit for the future of filmmaking & World Models:
📍 **Virtual Stage**: High-fidelity, consistent environments for virtual production.
🎬 **Scene Blocking**: Directorial control over scene background, camera paths, and prompt-driven foreground character dynamics.
🌍 **World Simulators**: A potential step towards stable, consistent world modeling.

We are also excited to open-source the **Scene-Decoupled Video Dataset**, a large-scale, high-quality collection to empower the community! 🎁

🔗 Project: karine-huang.github.io/CineScene/
📊 Dataset: huggingface.co/datasets/Kling…
📄 ArXiv: arxiv.org/abs/2602.06959

Huge thanks to the amazing co-authors! 🙏 @KaiyiHUANG84276 @xinntao @yukun6414 @yujiwenHK @jianhongbai @lin_zinan @FiNingm @wanfufeng

#CVPR2026 #AIvideo #GenerativeAI #FilmGeneration #WorldModels #ComputerVision @CVPR
Kangwook Lee@Kangwook_Lee·
@lin_zinan Thanks for sharing this with us. Seems very relevant. We will definitely take a good look at it!
Kangwook Lee@Kangwook_Lee·
DLLMs seem promising... but parallel generation is not always possible.

Diffusion-based LLMs can generate many tokens at different positions at once, while most autoregressive LLMs generate tokens one by one. This makes diffusion-based LLMs highly attractive when we need fast generation with less compute.

A big question is: is parallel generation possible without losing modeling accuracy? The answer is no. There are fundamental limits on how much parallelism we can achieve.

Consider this example: "Pick one city uniformly at random from the following four cities: New York, New Orleans, Mexico City, or Panama City." Then P(Y₁ = New, Y₂ = York) = 1/4, P(Y₁ = New, Y₂ = Orleans) = 1/4, and so on. Thus P(Y₁ = New) = 1/2 and P(Y₂ = City) = 1/2. If you choose to generate Y₁ and Y₂ in parallel, then no matter which decoding algorithm you use, you're doomed to sample "New City" with probability 1/4, an output that has probability 0 under the true distribution. None of today's DLLMs can generate these two words correctly without giving up parallelism.

Why is this the case? In fact, we never train LLMs to learn the joint distribution over multiple tokens in one forward iteration. We always teach a single-token marginal distribution conditioned on context. (The same holds for autoregressive models too.) Therefore, sampling multiple tokens at once is only possible when those tokens are mutually independent given the current context.

And this limitation of parallel sampling can be precisely formalized. One can derive an information-theoretic limit that's decoding-strategy agnostic, and also derive strategy-specific limits.

So are DLLMs doomed? No! They have huge potential to save compute and time. But: (1) we need to be aware of their fundamental limitations, and (2) we need to design better training and decoding strategies.

In particular, there's huge room for improvement in decoding. Why? Ideally, we want the model to control the degree of parallelism during generation. At the same time, it should choose a subset of future tokens that are almost mutually independent given the current context. Are current decoding strategies good at this? Hard to tell. Most DLLMs were never stress-tested for it.

That's why we introduced a synthetic benchmark to stress-test DLLMs. We call it ParallelBench. The idea is simple: these are natural language tasks, but carefully designed so that parallel generation is inherently difficult. (Think "New City", but with more natural, real tasks.)

What did we find? We tested popular DLLMs with various decoding algorithms, and none came close to "oracle" performance: the ideal performance you'd get if the model could optimally adjust its parallelism during decoding.

Takeaway: (1) Parallel generation is not always possible; check out our paper for more details :) (2) If you can design a DLLM that matches oracle performance on our benchmark, well, who knows, you might just get a call from someone in Menlo Park. 😉
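The "New City" failure above can be reproduced in a few lines. The toy distribution is the one from the thread; the sampler below is an illustrative sketch of a fully parallel decoding step (each position drawn independently from its per-position marginal, which is all a token-wise training objective teaches), not any specific DLLM's decoder.

```python
import random
from collections import Counter

# True joint distribution: four two-token city names, each with probability 1/4.
cities = [("New", "York"), ("New", "Orleans"),
          ("Mexico", "City"), ("Panama", "City")]

def marginal(pos):
    """Per-position marginal that token-wise training actually teaches."""
    counts = Counter(pair[pos] for pair in cities)
    tokens = list(counts)
    weights = [counts[t] for t in tokens]
    return tokens, weights

def parallel_sample(rng):
    """Sample Y1 and Y2 independently, as one fully parallel step would."""
    t1, w1 = marginal(0)
    t2, w2 = marginal(1)
    return (rng.choices(t1, weights=w1)[0],
            rng.choices(t2, weights=w2)[0])

rng = random.Random(0)
n = 100_000
samples = Counter(parallel_sample(rng) for _ in range(n))

# Independent sampling gives P("New City") = P(New) * P(City) = 1/2 * 1/2
# = 1/4, even though "New City" has probability 0 under the true joint.
print(samples[("New", "City")] / n)  # ≈ 0.25
```

No decoding trick applied on top of these fixed marginals can push P("New City") to 0 while keeping both positions parallel, which is exactly the point of the thread.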
Zinan Lin@lin_zinan·
Super cool! Really innovative use of Private Evolution votes as the RL reward to fine-tune LLMs — a big step forward for DP synthetic data. Congrats, @hou_char!
Charlie Hou@hou_char

Gave a talk at @OpenAI on our work 🌸 POPri “Policy Optimization for Private Data”. POPri is a huge improvement in synthetic data generation under security+privacy constraints! Learn more:

Stat.ML Papers@StatMLPapers·
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification ift.tt/oMkDGEN
Zinan Lin retweeted
DailyPapers@HuggingPapers·
Microsoft introduces Latent Zoning Network (LZN) A unified principle for generative modeling, representation learning, and classification. LZN uses a shared Gaussian latent space and modular encoders/decoders to tackle all three core ML problems at once!
Zinan Lin retweeted
fly51fly@fly51fly·
[LG] Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification Z Lin, E Liu, X Ning, J Zhu... [Microsoft Research & Tsinghua University] (2025) arxiv.org/abs/2509.15591
Zinan Lin retweeted
cool ai and ml papers@aimodelsfyi·
researchers really said what if we made ONE model that does everything and called it latent zoning like theyre planning a neighborhood 🏘️ but honestly the ambition of trying to unify generative modeling, representation learning AND classification is... aimodels.fyi/papers/arxiv/l…
AI Native Foundation@AINativeF_zh·
3. Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
🔑 Keywords: Latent Zoning Network, generative modeling, representation learning, classification
💡 Category: Generative Models
🌟 Research Objective:
- This study aims to unify generative modeling, representation learning, and classification by creating a shared latent space called the Latent Zoning Network (LZN).
🛠️ Research Methods:
- LZN creates a shared Gaussian latent space that enables encoding and decoding across multiple data types (e.g., images, text, and labels). ML tasks are configured via combinations of encoders and decoders.
💬 Research Conclusions:
- LZN improves the performance of existing models on tasks such as image generation, improves unsupervised representation learning without auxiliary losses, and enables joint task execution, improving FID and achieving state-of-the-art classification accuracy on datasets such as CIFAR10.
👉 Paper link: huggingface.co/papers/2509.15…
AI Native Foundation@AINativeF_zh·
📚 AI Native Daily Paper Digest - 2025-09-22 🌟 Follow us @AINativeF_zh for the latest insights on AI Native. This issue covers the AI research papers from Hugging Face shown in the image, helping you stay on top of the latest research trends. Let's explore the future of AI together! #AI #HuggingFace #AIPaper #AINative #AINF
— Appendix: Today's AI research papers —
1. RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
2. MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
3. Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
4. BaseReward: A Strong Baseline for Multimodal Reward Model
5. SPATIALGEN: Layout-guided 3D Indoor Scene Generation
6. Lynx: Towards High-Fidelity Personalized Video Generation
7. A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
8. BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
9. RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes
10. Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems
11. WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers
12. Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents
13. Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue
AI Native Foundation@AINativeF·
3. Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification 🔑 Keywords: Latent Zoning Network, generative modeling, representation learning, classification 💡 Category: Generative Models 🌟 Research Objective: - The study aims to unify generative modeling, representation learning, and classification through the creation of a shared latent space known as the Latent Zoning Network (LZN). 🛠️ Research Methods: - LZN creates a shared Gaussian latent space, enabling encoding and decoding across various data types like images, text, and labels. ML tasks are configured using combinations of encoders and decoders. 💬 Research Conclusions: - LZN enhances existing models for tasks like image generation, improves unsupervised representation learning without auxiliary losses, and enables joint task execution, improving FID and achieving state-of-the-art classification accuracy on datasets like CIFAR10. 👉 Paper link: huggingface.co/papers/2509.15…
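The "tasks as combinations of encoders and decoders" idea in this summary can be sketched abstractly. This is a toy illustration of the configuration principle only, not the paper's implementation: the shared latent here is a single float rather than a learned Gaussian latent space, and every name and mapping below is hypothetical.

```python
# Toy registry: encoders map a modality into one shared "latent" (a float
# stand-in for a point in a Gaussian latent space); decoders map a latent
# back out to a modality. All mappings are illustrative placeholders.
encoders = {
    "image": lambda img: sum(img) / len(img),  # toy image -> latent
    "label": lambda lbl: float(lbl),           # toy label -> latent
}
decoders = {
    "image": lambda z: [z] * 4,                # latent -> toy image
    "label": lambda z: int(round(z)),          # latent -> toy label
}

def run_task(src, dst, x):
    """Configure a task as an (encoder, decoder) pair through the latent."""
    z = encoders[src](x)
    return decoders[dst](z)

# Classification: image encoder composed with label decoder.
print(run_task("image", "label", [2, 2, 2, 2]))  # -> 2
# Label-conditional generation: label encoder composed with image decoder.
print(run_task("label", "image", 3))             # -> [3.0, 3.0, 3.0, 3.0]
```

The point of the sketch is the plumbing: because every modality meets in one latent space, adding a modality means adding one encoder/decoder pair, and new tasks fall out of pairing existing components rather than training a separate model per task.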
AI Native Foundation@AINativeF·
📚 AI Native Daily Paper Digest - 2025-09-22🌟 Follow @AINativeF for the latest insights on AI Native. Covering AI research papers from Hugging Face, featured in the image. 💡 Stay updated with the latest research trends and dive deep into the future of AI! 🚀 #AI #HuggingFace #AIPaper #AINative #AINF — Appendix: Today's AI research papers — 1. RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation 2. MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer 3. Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification 4. BaseReward: A Strong Baseline for Multimodal Reward Model 5. SPATIALGEN: Layout-guided 3D Indoor Scene Generation 6. Lynx: Towards High-Fidelity Personalized Video Generation 7. A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning 8. BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent 9. RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes 10. Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems 11. WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers 12. Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents 13. Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue
simapofang@nojobafterphoto·
A #NeurIPS2025 paper from @Microsoft arxiv.org/pdf/2509.15591 Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification Zinan Lin, Enshu Liu, Xuefei Ning, Junyi Zhu, Wenyu Wang, Sergey Yekhanin
AK@_akhaliq·
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
AK tweet media
English
4
7
74
26.1K