Zhenzhen Weng (@JenWeng4) - Hồ sơ Twitter | Zamantika Mersobahis Locabet

Zhenzhen Weng đã retweet

Just published in @ScienceAdvances, our work demonstrating the ability of AI and 3D computer vision to produce automated measurement of human interactions in video data from early child development research -- providing over 100x time savings compared to human annotation and enabling quantitative, big data studies. We use our method, HARMONI, to characterize longitudinal trends in infant and toddler interaction with caregivers, in over 500 hours of video data. Work led by @JenWeng4 together with co-PI @SandersMDMPH and @K_L_Humphreys, and with a great interdiscplinary team including Laura Bravo Sanchez, @bergelsonlab, @akanazawa, @StanfordCERC, and many others! science.org/doi/10.1126/sc…

English

2

13

52

5.8K

Zhenzhen Weng đã retweet

Jing-Jing Li@drjingjing2026·14 Ara

1/3 Today, an anecdote shared by an invited speaker at #NeurIPS2024 left many Chinese scholars, myself included, feeling uncomfortable. As a community, I believe we should take a moment to reflect on why such remarks in public discourse can be offensive and harmful.

English

176

552

3.5K

1M

Zhenzhen Weng đã retweet

Serena Yeung-Levy@yeung_levy·12 Eyl

Our lab at Stanford has postdoc openings! Candidates should have expertise and interests in one or multiple of: multimodal large language models, video understanding (including video-language models), AI for science / biology, or AI for surgery. Please send inquiries by email and see marvl.stanford.edu for more information.

English

4

38

195

36.6K

Zhenzhen Weng đã retweet

Zhaorun Chen@ZRChen_AISafety·9 Tem

🤗First benchmark on multimodal judge’s feedback for text-to-image generation!! 🏃Come and pick up your personal advice and package to choose the best judge to fine-tune your diffusion model 👉 mj-bench.github.io Paper: huggingface.co/papers/2407.04… Code: github.com/MJ-Bench/MJ-Be…

Huaxiu Yao@HuaxiuYaoML

🌟NEW Paper Alert 🌟 👩‍⚖️MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? (mj-bench.github.io) 🧐Also wonder about the best judge model to provide feedback for your diffusion models? We evaluate multimodal judges in providing feedback for image generation models across four key perspectives: alignment, safety, image quality, and bias. Key findings: 👉1. While closed-source VLM judges typically perform better, smaller CLIP-based models offer better text-image alignment and image quality feedback due to extensive pre-training on text-vision corpus. Conversely, VLMs provide more accurate feedback on safety and generation bias, thanks to their stronger reasoning capabilities. 👉2. VLM judges can provide more accurate and stable feedback in natural language (e.g. Poor, Average, Good) than numerical scales. Led by @ZRChen_AISafety, Yichao Du, Zichen Wen, @AiYiyangZ. arxiv.org/pdf/2407.04842

English

0

4

19

3.6K

Zhenzhen Weng đã retweet

Huaxiu Yao@HuaxiuYaoML·9 Tem

🌟NEW Paper Alert 🌟 👩‍⚖️MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? (mj-bench.github.io) 🧐Also wonder about the best judge model to provide feedback for your diffusion models? We evaluate multimodal judges in providing feedback for image generation models across four key perspectives: alignment, safety, image quality, and bias. Key findings: 👉1. While closed-source VLM judges typically perform better, smaller CLIP-based models offer better text-image alignment and image quality feedback due to extensive pre-training on text-vision corpus. Conversely, VLMs provide more accurate feedback on safety and generation bias, thanks to their stronger reasoning capabilities. 👉2. VLM judges can provide more accurate and stable feedback in natural language (e.g. Poor, Average, Good) than numerical scales. Led by @ZRChen_AISafety, Yichao Du, Zichen Wen, @AiYiyangZ. arxiv.org/pdf/2407.04842

English

3

36

139

20.5K

Zhenzhen Weng@JenWeng4·18 Haz

🌟Just completed my PhD at @Stanford! 🌟 A huge thanks to my advisor @yeung_levy, my family and friends, committee and collaborators, and everyone who supported me along the way. Excited to start my next chapter at @Waymo, working on foundation models for self-driving cars!

English

7

4

149

25.7K

Zhenzhen Weng@JenWeng4·23 Mar

@charles_rqi Congrats Charles!

Français

0

77

Charles Qi@charles_rqi·23 Mar

Career Update: Today, I bid farewell to Waymo, marking the end of a chapter in my career. Joining Waymo in late 2019, I entered a world where Level 4 (L4) robotaxi services were a concept rather than a reality. Now, in 2024, Waymo operates a rider-only robotaxi service in four major U.S. cities: Phoenix, San Francisco, Los Angeles, and Austin, delivering O(100k) paid trips weekly. We have proved to the world that robotaxi in dense urban is practical. It's a matter of time for mankind to achieve true scaling and profitability of L4 autonomy. I believe the world is only beginning to grasp the extent of our achievements. I am sincerely thankful for the opportunities Waymo has provided me. I was fortunate to be able to learn and grow in many different roles, from an IC researcher to a tech lead, then a research manager, and more recently as an engineering manager. Each role has offered me invaluable lessons and growth. Above all, my deepest gratitude goes to my team, my mentors, and every colleague at Waymo. The talent here is unparalleled, and it has been an honor to work alongside such exceptional individuals. You will be the part I miss the most (besides our ping-pong 🏓 group!) New Chapter: I will be joining the Tesla Autopilot team to work on FSD. While this move may come as a surprise to some, it has been a carefully considered decision. Back in 2019, when I was choosing my first full-time job, Tesla was one of my top choices, alongside Waymo. I was even fortunate to be interviewed by Elon and got his offer. Although I initially chose Waymo, my interest in Tesla has remained strong. I became a Tesla Model 3 owner and have kept a close eye on the team since then. I believe there are multiple paths to achieving full L4 autonomy. Diversity in approaches is not just beneficial but essential for innovation and progress. To quote Andrej Karpathy @karpathy as a closing remark (from our email exchange in 2019 regarding my decision between Tesla and Waymo): "At the end of the day, we're still building towards the same goal, and that future can't come soon enough."

English

203

149

2.8K

651.5K

Zhenzhen Weng đã retweet

Jonathon Luiten@JonathonLuiten·18 Mar

If you’re in Davos, we just started giving a tutorial on Gaussian Splatting at 3DV. With @GKopanas @Snosixtytwo @antoine_guedon 3dgstutorial.github.io 3dvconf.github.io/2024/tutorials/

Jonathon Luiten@JonathonLuiten

Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis dynamic3dgaussians.github.io We model the world as a set of 3D Gaussians that move & rotate over time. This extends Gaussian Splatting to dynamic scenes, with accurate novel-view synthesis and dense 3D trajectories.

English

2

10

71

8K

Zhenzhen Weng đã retweet

Xiaohan Wang@XiaohanWang96·18 Mar

Thanks @_akhaliq for sharing our work! Letting LLM be an agent and long-form videos as an environment, and allowing LLM to interact with videos and decide where to look iteratively, we achieve SoTA zero-shot performance and show potential on processing extremely long videos!

AK@_akhaliq

VideoAgent Long-form Video Understanding with Large Language Model as Agent Long-form video understanding represents a significant challenge within computer vision, demanding a model capable of reasoning over long multi-modal sequences. Motivated by the human cognitive

English

1

12

37

23.8K

Zhenzhen Weng đã retweet

Judy Shen@judyhshen·20 Şub

Are you hiring top AI talent? Here is a list of Ph.D. students affiliated with @StanfordAILab who are on the industry and academic job markets this year! This list showcases diverse research areas and 41% of these graduates are URMs! Check it out: ai.stanford.edu/blog/sail-grad…

English

4

50

216

54.7K

Zhenzhen Weng@JenWeng4·23 Oca

@_akhaliq Arxiv: arxiv.org/abs/2401.12175 Project page: zzweng.github.io/humanlrm/

English

1

2

321

AK@_akhaliq·23 Oca

Single-View 3D Human Digitalization with Large Reconstruction Models paper page: huggingface.co/papers/2401.12… introduce Human-LRM, a single-stage feed-forward Large Reconstruction Model designed to predict human Neural Radiance Fields (NeRF) from a single image. Our approach demonstrates remarkable adaptability in training using extensive datasets containing 3D scans and multi-view capture. Furthermore, to enhance the model's applicability for in-the-wild scenarios especially with occlusions, we propose a novel strategy that distills multi-view reconstruction into single-view via a conditional triplane diffusion model. This generative extension addresses the inherent variations in human body shapes when observed from a single view, and makes it possible to reconstruct the full body human from an occluded image. Through extensive experiments, we show that Human-LRM surpasses previous methods by a significant margin on several benchmarks.

English

2

18

110

35.7K

Zhenzhen Weng@JenWeng4·23 Oca

Check out our recent work on generalizable human NeRF prediction! Arxiv: arxiv.org/abs/2401.12175 Project page: zzweng.github.io/humanlrm/

AK@_akhaliq

Single-View 3D Human Digitalization with Large Reconstruction Models paper page: huggingface.co/papers/2401.12… introduce Human-LRM, a single-stage feed-forward Large Reconstruction Model designed to predict human Neural Radiance Fields (NeRF) from a single image. Our approach demonstrates remarkable adaptability in training using extensive datasets containing 3D scans and multi-view capture. Furthermore, to enhance the model's applicability for in-the-wild scenarios especially with occlusions, we propose a novel strategy that distills multi-view reconstruction into single-view via a conditional triplane diffusion model. This generative extension addresses the inherent variations in human body shapes when observed from a single view, and makes it possible to reconstruct the full body human from an occluded image. Through extensive experiments, we show that Human-LRM surpasses previous methods by a significant margin on several benchmarks.

English

2

7

36

13.4K

Zhenzhen Weng đã retweet

Serena Yeung-Levy@yeung_levy·7 Ara

What are differences between image datasets? (e.g. ImageNet & ImageNetv2) Errors by one model vs. another? (e.g. CLIP & ResNet) Correct vs. incorrect predictions? VisDiff can answer by describing differences in image sets w/ language. Work led by @Zhang_Yu_hui and @lisabdunlap!

Lisa Dunlap@lisabdunlap

[1/5] Introducing VisDiff - an #AI tool that describes differences in image sets with natural language. VisDiff can summarize model failures, compare models, find nuanced dataset differences, discover what makes an image memorable, and so much more! …derstanding-visual-datasets.github.io/VisDiff-websit…

English

0

2

10

4.1K

Zhenzhen Weng đã retweet

Jimei Yang@jimei_yang·13 Eki

Sneak peek in video GenAI projects we’re working on @AdobeResearch : 1/3 compositing videos in 3D with NeRF

Kris Kashtanova@icreatelife

#ProjectSceneChange You’ll be able to put yourself in any scene including your artwork. #AdobeMAX #CommunityxAdobe

English

0

4

24

7.8K

Zhenzhen Weng đã retweet

CG Channel@theCGchannel·12 Eki

Check out Adobe's Project Scene Change The interesting experimental #AI tech automatically composites an actor from one shot into the environment from another without the need for #rotoscoping or #cameratracking cgchannel.com/2023/10/sneak-… #compositing #VFX #motiongraphics

English

0

8

31

5.5K

Zhenzhen Weng@JenWeng4·16 Haz

Pls stop at #CVPR2023 poster *Tue AM 110* to learn about GC-KPL: a novel method for learning 3D human keypoints from point clouds w/o human labels. Project: cvpr2023.thecvf.com/virtual/2023/p… Joint work w/ awesome folks @gorban Jingwei Ji, @MahyarNajibi, Yin Zhou, Dragomir Anguelov, @Waymo

Alexander Gorban@gorban

Check out our #CVPR2023 paper on 3D Human Keypoints Estimation From Point Clouds in the Wild Without Human Labels youtu.be/vXGPW2nDHZ4 Huge shout out to @JenWeng4 who interned in our team last summer and did all the work!

English

0

5

8

1.8K

Zhenzhen Weng đã retweet

Jackson Wang@kcjacksonwang·16 Haz

Have videos of your tennis practice and wish you can put your own motion in 3D? 🎾 👟 🏋🏻 #CVPR2023 We present, NeMo, a 3D motion recovery method that is more accurate by leveraging information shared across multiple instances/repetitions! 👇🏻Resources in 🧵

English

2

19

60

9.2K

Zhenzhen Weng đã retweet

Alexander Gorban@gorban·2 Haz

Check out our #CVPR2023 paper on 3D Human Keypoints Estimation From Point Clouds in the Wild Without Human Labels youtu.be/vXGPW2nDHZ4 Huge shout out to @JenWeng4 who interned in our team last summer and did all the work!

YouTube

English

0

1

4

2.4K

Zhenzhen Weng đã retweet

Nick Greenawalt@motionbynick·20 May

SO much potential for this:

English

50

222

1.6K

189.3K

Zhenzhen Weng đã retweet

Yuhui Zhang@Zhang_Yu_hui·9 Şub

(1/8) Can you diagnose and rectify a #vision model using #language? Check our work in #ICLR2023! Our analysis reveals when and how text embeddings can be used as a proxy for image embeddings to debug vision models. Paper: arxiv.org/abs/2302.04269 Code: github.com/yuhui-zh15/drml

GIF

English

1

16

103

26.5K

Zhenzhen Weng

Khám phá