Ruizhi Shao

61 posts

Ruizhi Shao

@RZ_Shao

Building True Intelligence in Real World @Rhoda_AI_ Prev. PhD @Tsinghua_uni

Beijing, China Katılım Haziran 2023

196 Takip Edilen198 Takipçiler

Ruizhi Shao retweetledi

Rhoda AI@RhodaAI·18 Mar

Most robot demos are “golden runs”: a perfect take selected from many attempts. But real-world deployment is about Continuous Operation. Watch our DVA model tackle a real-world decanting task for 1.5 hours straight: Uncut, Zero human intervention. 🧵👇

English

4.1K

Rhoda AI@RhodaAI·10 Mar

The gap between robotics in the lab and robotics in the real world has been one of the hardest unsolved problems in the industry. We’re excited to come out of stealth and show the research community how we’re tackling the issue. Bloomberg article in comment.

English

19.9K

Ruizhi Shao@RZ_Shao·10 Mar

As robots tackle increasingly general tasks, scaling robot action data has become a critical bottleneck. Our proposed DVA pioneers a new path, learning physical world and experience from tons of videos, transfering across any robotic platform and task. Super cool direction!

Jagdeep Singh@startupjag

After operating in stealth for the last 18 months @rhodaai , we’re excited today to finally show the world what we’ve been working on. We believe we’re on a path to physical AGI with the launch of our brand new foundation model, the Direct Video Action (DVA) model.

English

360

Ruizhi Shao@RZ_Shao·10 Mar

@rhodaai Thrilled by the future of Physical AI! My journey in Rhoda has strengthened my conviction in our path toward generalist robotics. Onward to making true intelligence a reality.

English

Ruizhi Shao retweetledi

Rhoda AI@RhodaAI·9 Mar

03.10.26

249

48.1K

Ruizhi Shao retweetledi

Rhoda AI@RhodaAI·6 Mar

ZXX

147

25.9K

Ruizhi Shao@RZ_Shao·24 May

Our new work for human video generation is out! Thanks to Prof. Gordon Wetzstein for the great mentorship.

Gordon Wetzstein@GordonWetzstein

Video generation of humans with control over body pose and facial expressions is crucial for a plethora of applications. Towards this goal, we introduce a new interspatial attention (ISA) mechanism as a scalable building block for DiT–based video generation models #SIGGRAPH2025

English

335

Ruizhi Shao retweetledi

Gordon Wetzstein@GordonWetzstein·13 Haz

Really exciting progress towards fully autonomous humanoids! Fantastic work, team HumanPlus!

Zipeng Fu@zipengfu

Introduce HumanPlus - Shadowing part Humanoids are born for using human data. We build a real-time shadowing system using a single RGB camera and a whole-body policy for cloning human motion. Examples: - boxing🥊 - playing the piano🎹/ping pong - tossing - typing Open-sourced!

English

6.1K

Ruizhi Shao retweetledi

William Shen@shenbokui·13 Haz

AI is entering the REAL-TIME era. We are here to give it a visual embodiment. Our journey has just begun. DM is open if you want to talk!

Apparate Labs@apparatelabs

Introducing Proteus 0.1, REAL-TIME video generation that brings life to your AI. Proteus can laugh, rap, sing, blink, smile, talk, and more. From a single image! Come meet Proteus on Twitch in real-time. ↓ Sign up for API waitlist: apparate.ai/early-access.h… 1/11

English

7.5K

Ruizhi Shao retweetledi

Ruilong Li@ruilong_li·7 Haz

Excited to announce 🚀gsplat v1.0🚀: a ⏩efficient⏩ CUDA backend for 3D Gaussian Splatting! docs.gsplat.studio A drop-in replacement of the official impl. with: - Up to 2x faster training; - Up to 4x less GPU memory; - Render millions of GSs in real-time; - And more;

English

106

523

87.7K

Ruizhi Shao@RZ_Shao·28 May

@prime_cai Looks very cool!

English

427

Prime (Shengqu) Cai@prime_cai·28 May

We've been exploring a fun and challenging idea: combining top-notch video generation models with multi-view capabilities. This seems like a natural next step after all the amazing video and multi-image generation models. The question was, could we create a multi-view video generation model without needing a 4D dataset? 🤔 Check out our latest effort, where we've made a first attempt into crafting multi-view, or more precisely multi-trajectory videos that maintain consistent underlying dynamic content — all without the need for 4D data! 🌐Website: collaborativevideodiffusion.github.io 📄Paper: arxiv.org/abs/2405.17414 👾Code: github.com/CollaborativeV… Thanks @_akhaliq for sharing!

AK@_akhaliq

Collaborative Video Diffusion Consistent Multi-video Generation with Camera Control Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation

English

25.7K

Ruizhi Shao@RZ_Shao·24 May

Thank you @_akhaliq for featuring our project!

AK@_akhaliq

Tele-Aloha A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras In this paper, we present a low-budget and high-authenticity bidirectional telepresence system, Tele-Aloha, targeting peer-to-peer communication scenarios. Compared to previous systems,

English

186

Ruizhi Shao@RZ_Shao·24 May

Thank you @janusch_patas for covering our work!

MrNeRF@janusch_patas

Incredible #3DGS application 🤯 [SIGGRAPH '24] Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras arxiv.org/abs/2405.14866 1 I 3

English

155

Ruizhi Shao retweetledi

GRASP Laboratory@GRASPlab·23 May

Join us TOMORROW in welcoming Dr. Yinghao Xu @YinghaoXu1 as he presents “Large Reconstruction Model for Efficient 3D Reconstruction and Generation” from 1PM - 2PM. For more info: grasp.upenn.edu/events/grasp-s… #GRASP #GRASPLab #GRASPSeminar

English

1.1K

Ruizhi Shao retweetledi

AK@_akhaliq·13 Ara

COLMAP-Free 3D Gaussian Splatting paper page: huggingface.co/papers/2312.07… While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure and camera poses at the same time. On the other hand, the recently proposed 3D Gaussian Splatting provides new opportunities given its explicit point cloud representations. This paper leverages both the explicit geometric representation and the continuity of the input video stream to perform novel view synthesis without any SfM preprocessing. We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time, without the need to pre-compute the camera poses. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes.

English

278

63K

Ruizhi Shao retweetledi

Linus ✦ Ekenstam@LinusEkenstam·8 Ara

Relightable Real-Time Avatars Meta Codec Avatars 2.0 gets an update, building on 3D Gaussian Splatting from Meta. Accuracy is down to the human hair strand level 🔬 🧵 A thread

English

331

1.8K

351.9K

Ruizhi Shao retweetledi

Anton Obukhov@AntonObukhov1·8 Ara

Introducing Marigold 🌼 - a universal monocular depth estimator, delivering incredibly sharp predictions in the wild! Based on Stable Diffusion, it is trained with synthetic depth data only and excels in zero-shot adaptation to real-world imagery. Check it out: 🌐 Website: marigoldmonodepth.github.io 🤗 Hugging Face Space: huggingface.co/spaces/toshas/… 📄 Paper: arxiv.org/abs/2312.02145 👾 Code: github.com/prs-eth/marigo… The team: Bingxin Ke (@KBingxin), yours truly (@AntonObukhov1), Shengyu Huang (@ShengyHuang), Nando Metzger (@NandoMetzger), Rodrigo Caye Daudt (@rcdaudt), and Konrad Schindler. #ComputerVision #PRS #ETHZurich

English

267

1.4K

489.6K

Ruizhi Shao retweetledi

Hong-Xing (Koven) Yu@Koven_Yu·8 Ara

Can generative AI imagine what Alice saw in her journey in the Wonderland 🏞️🚶‍♀️? Introducing WonderJourney: Create a journey (a long sequence of diverse yet connected 3D scenes) from a single image or text! 🧵1/N Web: kovenyu.com/wonderjourney/ arxiv: arxiv.org/abs/2312.03884

English

116

505

109.1K

Ruizhi Shao retweetledi

A.I.Warper@AIWarper·9 Ara

Here's probably one of the hardest shots I've ever attempted 1) Two characters 2) Two different race of humans (prompt bleeding) 3) Very fast paced with a lot of motion Was very fun and I learned a lot. Also develop a bunch of new workflow #StableDiffusion #Matrix #Pixar

English

894

489

4.4K

8.6M

Ruizhi Shao retweetledi

AK@_akhaliq·7 Ara

Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians paper page: huggingface.co/papers/2312.03… Creating high-fidelity 3D head avatars has always been a research hotspot, but there remains a great challenge under lightweight sparse view setups. In this paper, we propose Gaussian Head Avatar represented by controllable 3D Gaussians for high-fidelity head avatar modeling. We optimize the neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, thereby our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a well-designed geometry-guided initialization strategy based on implicit SDF and Deep Marching Tetrahedra for the stability and convergence of the training procedure. Experiments show our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions.

English

117

512

65.8K

Keşfet

@rhodaai @prime_cai @_akhaliq @janusch_patas @YinghaoXu1 @KBingxin @AntonObukhov1 @ShengyHuang