Yanshu Zhang (@yszhang170) - Twitter Profile

Yanshu Zhang retweeted

Ke Li 🍁@KL_Div·Apr 20

Introducing WIMLE, a model-based RL method that substantially improves sample efficiency and asymptotic performance on hard tasks. Rather than assuming a Gaussian world model, WIMLE trains a world model with IMLE. Joint w/ @mehranag, @Moazeni_Alireza, @yszhang170. See 👇 for links.

Yanshu Zhang retweeted

Mike Wong@artixels·Sep 20

#bn8192 #dotdot #bluenoise #LagrangianImage #meow

GIF

Yanshu Zhang retweeted

Ruiqi Wang@RuiqisNotes·Jun 17

🎉 Excited to share our new work: 👓Ego-R1! We cracked ultra-long egocentric video reasoning! 🤯 Think days/weeks of footage processed efficiently with Chain-of-Tool-Thought ⛓️🔧 🌐 egolife-ai.github.io/Ego-R1

Shulin Tian@shulin_tian

🎥 Video is already a tough modality for reasoning. Egocentric video? Even tougher! It is longer, messier, and harder.

💡 How do we tackle these extremely long, information-dense sequences without exhausting GPU memory or hitting API limits?

We introduce 👓Ego-R1: A framework for reasoning over ultra-long (i.e., in days and weeks) egocentric videos, with the support from Chain-of-Tool-Thought (CoTT) that decomposes complex reasoning tasks into modular steps. At its core is Ego-R1-Agent-3B, an orchestrating language model trained to dynamically invoke specialized tools at each step, based on the previous actions and observations, to collect the necessary information and solve the tasks gradually, step-by-step.

All code and data are fully open-sourced :)

🌐 Project: egolife-ai.github.io/Ego-R1
📄 Paper: arxiv.org/abs/2506.13654
💻 Code: github.com/egolife-ai/Ego…

Yanshu Zhang retweeted

Ke Li 🍁@KL_Div·Oct 3

Diffusion models turn the data into a mixture of isotropic Gaussians, and so struggle to capture the underlying structure when trained on small datasets. In our new #ECCV2024 paper, we introduce RS-IMLE, a generative model that gets around this issue.

Website: serchirag.github.io/rs-imle
Code: github.com/SerChirag/rs-i…

Joint work w/ @researchirag and @PengShichong.

If you are at #ECCV2024, come and check out poster 279 on Thursday afternoon from 4:30pm-6:30pm. (1/6) Thread 👇

Yanshu Zhang retweeted

AK@_akhaliq·Aug 7

An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion

discuss: huggingface.co/papers/2408.03…

We introduce a new approach for generating realistic 3D models with UV maps through a representation termed "Object Images." This approach encapsulates surface geometry, appearance, and patch structures within a 64x64 pixel image, effectively converting complex 3D shapes into a more manageable 2D format. By doing so, we address the challenges of both geometric and semantic irregularity inherent in polygonal meshes. This method allows us to use image generation models, such as Diffusion Transformers, directly for 3D shape generation. Evaluated on the ABO dataset, our generated shapes with patch structures achieve point cloud FID comparable to recent 3D generative models, while naturally supporting PBR material generation.

Yanshu Zhang retweeted

Ke Li 🍁@KL_Div·Jun 12

Gaussian splatting is nice, but breaks under large/non-rigid scene changes. Our #CVPR2024 spotlight paper introduces a solution by doing away with splatting altogether - this does great at scene interpolation. Joint w/ @PengShichong & @yszhang170. More at niopeng.github.io/PAPR-in-Motion/

Yanshu Zhang retweeted

SFU School of Computing Science@SFU_CompSci·Mar 14

#AI is making waves - including in bringing statues to life. @SFU professor @KL_Div and his students have unveiled a new AI technology capable of turning 2D photos into a realistic, fully editable 3D model that you can view from any angle. ow.ly/HIKC50QSR04 @SFUResearch

Yanshu Zhang retweeted

Ke Li 🍁@KL_Div·Dec 22

The code and pretrained models for PAPR have now been released, just in time for the holidays🎄. Happy hacking!

Ke Li 🍁@KL_Div

NeRF reconstructs 3D scenes accurately, but editing them is hard. Introducing PAPR, a method that learns a point cloud from multiple views from scratch and enables zero-shot editing. Details at zvict.github.io/papr/. Joint work w/ @yszhang170, @PengShichong & @Moazeni_Alireza

Yanshu Zhang retweeted

Ke Li 🍁@KL_Div·Dec 9

PAPR will be presented next week at @NeurIPSConf as a spotlight! A sneak peek of how it stacks up against Gaussian splatting is below. Come by poster #119 and chat with the lead authors @yszhang170 & @PengShichong next Tuesday at 5:15pm! More details at zvict.github.io/papr/.

Ke Li 🍁@KL_Div

NeRF reconstructs 3D scenes accurately, but editing them is hard. Introducing PAPR, a method that learns a point cloud from multiple views from scratch and enables zero-shot editing. Details at zvict.github.io/papr/. Joint work w/ @yszhang170, @PengShichong & @Moazeni_Alireza

Yanshu Zhang retweeted

Ke Li 🍁@KL_Div·Sep 22

Super proud of my students @yszhang170 @PengShichong @Moazeni_Alireza for having their paper on "PAPR: Proximity Attention Point Rendering" accepted to @NeurIPSConf and recognized with a spotlight - congratulations! Special thanks to reviewers & AC for the insightful feedback!

Ke Li 🍁@KL_Div

NeRF reconstructs 3D scenes accurately, but editing them is hard. Introducing PAPR, a method that learns a point cloud from multiple views from scratch and enables zero-shot editing. Details at zvict.github.io/papr/. Joint work w/ @yszhang170, @PengShichong & @Moazeni_Alireza

Yanshu Zhang retweeted

Ke Li 🍁@KL_Div·Jul 21

NeRF reconstructs 3D scenes accurately, but editing them is hard. Introducing PAPR, a method that learns a point cloud from multiple views from scratch and enables zero-shot editing. Details at zvict.github.io/papr/. Joint work w/ @yszhang170, @PengShichong & @Moazeni_Alireza
