Zhenzhen Weng

29 posts

Zhenzhen Weng
@JenWeng4

Perception @Waymo. PhD @Stanford. BSCS @CMU 🚀. 💻 Ex-intern @Waymo @Adobe.

Palo Alto, CA · Joined June 2021
256 Following · 324 Followers
Zhenzhen Weng reposted
Serena Yeung-Levy @yeung_levy ·
Just published in @ScienceAdvances, our work demonstrating the ability of AI and 3D computer vision to produce automated measurement of human interactions in video data from early child development research -- providing over 100x time savings compared to human annotation and enabling quantitative, big data studies. We use our method, HARMONI, to characterize longitudinal trends in infant and toddler interaction with caregivers, in over 500 hours of video data. Work led by @JenWeng4 together with co-PI @SandersMDMPH and @K_L_Humphreys, and with a great interdisciplinary team including Laura Bravo Sanchez, @bergelsonlab, @akanazawa, @StanfordCERC, and many others! science.org/doi/10.1126/sc…
Zhenzhen Weng reposted
Jing-Jing Li @drjingjing2026 ·
1/3 Today, an anecdote shared by an invited speaker at #NeurIPS2024 left many Chinese scholars, myself included, feeling uncomfortable. As a community, I believe we should take a moment to reflect on why such remarks in public discourse can be offensive and harmful.
Zhenzhen Weng reposted
Serena Yeung-Levy @yeung_levy ·
Our lab at Stanford has postdoc openings! Candidates should have expertise and interests in one or multiple of: multimodal large language models, video understanding (including video-language models), AI for science / biology, or AI for surgery. Please send inquiries by email and see marvl.stanford.edu for more information.
Zhenzhen Weng reposted
Zhaorun Chen @ZRChen_AISafety ·
🤗First benchmark on multimodal judges' feedback for text-to-image generation!! 🏃Come pick up personalized advice on choosing the best judge to fine-tune your diffusion model 👉 mj-bench.github.io Paper: huggingface.co/papers/2407.04… Code: github.com/MJ-Bench/MJ-Be…
Huaxiu Yao@HuaxiuYaoML

🌟NEW Paper Alert 🌟 👩‍⚖️MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? (mj-bench.github.io)

🧐Wondering about the best judge model to provide feedback for your diffusion models? We evaluate multimodal judges in providing feedback for image generation models across four key perspectives: alignment, safety, image quality, and bias.

Key findings:
👉1. While closed-source VLM judges typically perform better, smaller CLIP-based models offer better text-image alignment and image quality feedback due to extensive pre-training on text-vision corpora. Conversely, VLMs provide more accurate feedback on safety and generation bias, thanks to their stronger reasoning capabilities.
👉2. VLM judges can provide more accurate and stable feedback in natural language (e.g. Poor, Average, Good) than on numerical scales.

Led by @ZRChen_AISafety, Yichao Du, Zichen Wen, @AiYiyangZ. arxiv.org/pdf/2407.04842

Zhenzhen Weng @JenWeng4 ·
🌟Just completed my PhD at @Stanford! 🌟 A huge thanks to my advisor @yeung_levy, my family and friends, committee and collaborators, and everyone who supported me along the way. Excited to start my next chapter at @Waymo, working on foundation models for self-driving cars!
Charles Qi @charles_rqi ·
Career Update: Today, I bid farewell to Waymo, marking the end of a chapter in my career. Joining Waymo in late 2019, I entered a world where Level 4 (L4) robotaxi services were a concept rather than a reality. Now, in 2024, Waymo operates a rider-only robotaxi service in four major U.S. cities: Phoenix, San Francisco, Los Angeles, and Austin, delivering O(100k) paid trips weekly. We have proved to the world that robotaxi service in dense urban areas is practical. It is only a matter of time before mankind achieves true scaling and profitability of L4 autonomy. I believe the world is only beginning to grasp the extent of our achievements.

I am sincerely thankful for the opportunities Waymo has provided me. I was fortunate to learn and grow in many different roles, from IC researcher to tech lead, then research manager, and most recently engineering manager. Each role has offered me invaluable lessons and growth. Above all, my deepest gratitude goes to my team, my mentors, and every colleague at Waymo. The talent here is unparalleled, and it has been an honor to work alongside such exceptional individuals. You will be the part I miss the most (besides our ping-pong 🏓 group!)

New Chapter: I will be joining the Tesla Autopilot team to work on FSD. While this move may come as a surprise to some, it has been a carefully considered decision. Back in 2019, when I was choosing my first full-time job, Tesla was one of my top choices, alongside Waymo. I was even fortunate to be interviewed by Elon and received his offer. Although I initially chose Waymo, my interest in Tesla has remained strong. I became a Tesla Model 3 owner and have kept a close eye on the team since then. I believe there are multiple paths to achieving full L4 autonomy. Diversity in approaches is not just beneficial but essential for innovation and progress.

To quote Andrej Karpathy @karpathy as a closing remark (from our email exchange in 2019 regarding my decision between Tesla and Waymo): "At the end of the day, we're still building towards the same goal, and that future can't come soon enough."
Zhenzhen Weng reposted
Jonathon Luiten @JonathonLuiten ·
If you’re in Davos, we just started giving a tutorial on Gaussian Splatting at 3DV. With @GKopanas @Snosixtytwo @antoine_guedon 3dgstutorial.github.io 3dvconf.github.io/2024/tutorials/
Jonathon Luiten@JonathonLuiten

Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis dynamic3dgaussians.github.io We model the world as a set of 3D Gaussians that move & rotate over time. This extends Gaussian Splatting to dynamic scenes, with accurate novel-view synthesis and dense 3D trajectories.

Zhenzhen Weng reposted
Xiaohan Wang @XiaohanWang96 ·
Thanks @_akhaliq for sharing our work! Treating the LLM as an agent and long-form videos as its environment, and allowing the LLM to interact with videos and decide where to look iteratively, we achieve SoTA zero-shot performance and show potential for processing extremely long videos!
AK@_akhaliq

VideoAgent Long-form Video Understanding with Large Language Model as Agent Long-form video understanding represents a significant challenge within computer vision, demanding a model capable of reasoning over long multi-modal sequences. Motivated by the human cognitive

Zhenzhen Weng reposted
Judy Shen @judyhshen ·
Are you hiring top AI talent? Here is a list of Ph.D. students affiliated with @StanfordAILab who are on the industry and academic job markets this year! This list showcases diverse research areas and 41% of these graduates are URMs! Check it out: ai.stanford.edu/blog/sail-grad…
AK @_akhaliq ·
Single-View 3D Human Digitalization with Large Reconstruction Models paper page: huggingface.co/papers/2401.12… introduce Human-LRM, a single-stage feed-forward Large Reconstruction Model designed to predict human Neural Radiance Fields (NeRF) from a single image. Our approach demonstrates remarkable adaptability in training using extensive datasets containing 3D scans and multi-view capture. Furthermore, to enhance the model's applicability for in-the-wild scenarios especially with occlusions, we propose a novel strategy that distills multi-view reconstruction into single-view via a conditional triplane diffusion model. This generative extension addresses the inherent variations in human body shapes when observed from a single view, and makes it possible to reconstruct the full body human from an occluded image. Through extensive experiments, we show that Human-LRM surpasses previous methods by a significant margin on several benchmarks.
Zhenzhen Weng @JenWeng4 ·
Check out our recent work on generalizable human NeRF prediction! Arxiv: arxiv.org/abs/2401.12175 Project page: zzweng.github.io/humanlrm/
Zhenzhen Weng reposted
Serena Yeung-Levy @yeung_levy ·
What are differences between image datasets? (e.g. ImageNet & ImageNetv2) Errors by one model vs. another? (e.g. CLIP & ResNet) Correct vs. incorrect predictions? VisDiff can answer by describing differences in image sets w/ language. Work led by @Zhang_Yu_hui and @lisabdunlap!
Lisa Dunlap@lisabdunlap

[1/5] Introducing VisDiff - an #AI tool that describes differences in image sets with natural language. VisDiff can summarize model failures, compare models, find nuanced dataset differences, discover what makes an image memorable, and so much more! …derstanding-visual-datasets.github.io/VisDiff-websit…

Zhenzhen Weng @JenWeng4 ·
Please stop by #CVPR2023 poster *Tue AM 110* to learn about GC-KPL: a novel method for learning 3D human keypoints from point clouds w/o human labels. Project: cvpr2023.thecvf.com/virtual/2023/p… Joint work w/ awesome folks @gorban Jingwei Ji, @MahyarNajibi, Yin Zhou, Dragomir Anguelov, @Waymo
Alexander Gorban@gorban

Check out our #CVPR2023 paper on 3D Human Keypoints Estimation From Point Clouds in the Wild Without Human Labels youtu.be/vXGPW2nDHZ4 Huge shout out to @JenWeng4 who interned in our team last summer and did all the work!

Zhenzhen Weng reposted
Jackson Wang @kcjacksonwang ·
Have videos of your tennis practice and wish you could recover your own motion in 3D? 🎾 👟 🏋🏻 #CVPR2023 We present NeMo, a 3D motion recovery method that achieves higher accuracy by leveraging information shared across multiple instances/repetitions! 👇🏻Resources in 🧵
Zhenzhen Weng reposted
Nick Greenawalt @motionbynick ·
SO much potential for this: