Helen Jiang

37 posts

@helenqjiang

Skild AI | Nvidia | Robotics Ph.D. @ CMU | Computer Science B.S. @ Stanford

Joined June 2020
112 Following · 445 Followers
Helen Jiang retweeted
Jason Y. Zhang (@jasonyzhang2)
Check out Kyle’s paper, which uses a VLM to post-train a diffusion AE! Learned image compression trades off file size against perceptual quality. Normally, quality is measured with a network calibrated to human perception (e.g., LPIPS). Instead, we query Gemini. [1/N]
Kyle Sargent (@KyleSargentAI)

Vision-language models are getting better every day. Can we use them to improve image compression? Yes! For my internship, working w/ @GoogleDeepMind, @GoogleResearch, we designed VLIC, a diffusion autoencoder post-trained with VLM preferences. Our preprint is out today! A🧵:

Helen Jiang retweeted
Helen Jiang (@helenqjiang)
I remember when we got to witness Skild firsthand: one of their robots was our ring bearer last September 🤖💍🐻 Super excited to see them sharing their innovations with the world!
Skild AI (@SkildAI)

Modern AI is confined to the digital world. At Skild AI, we are building towards AGI for the real world, unconstrained by robot type or task — a single, omni-bodied brain. Today, we are sharing our journey, starting with early milestones, with more to come in the weeks ahead.

Our Mission: Artificial General Intelligence grounded in the physical world. We believe AGI that can truly understand and reason in the real world can only be built through grounding in the physical world.

Our Vision: Any robot, any task, one brain. We tackle robotics in its full generality – building a continually improving, omni-bodied brain that can control any hardware for any task.

Who are we? A passionate group of scientists and engineers driven by our shared vision. We have been researching AI and robotics for more than a decade. Our team includes pioneers of self-supervised learning, curiosity-driven exploration, end-to-end sim2real for visual locomotion, dexterous manipulation, learning from human videos, robot parkour, and many more. Many of these works have won awards at top-tier AI and robotics conferences. Our team has also built production-ready systems at Anduril, Tesla, Nvidia, Meta, Kitty Hawk, Google, Everyday Robots, and Amazon.

Join us in our mission to build the robot brains of tomorrow.

Helen Jiang retweeted
NVIDIA Data Center (@NVIDIADC)
📣 TSMC, a leader in semiconductor manufacturing, has started production with NVIDIA's cuLitho platform, accelerating chip manufacturing and pushing technological boundaries. #DataCenter Learn more now ➡️ nvda.ws/3YfL6dk
Helen Jiang retweeted
Raunaq Bhirangi (@Raunaqmb)
The sense of touch is fundamental to how we interact with the world. But the most exciting developments in robotics continue to focus primarily on vision. I spent the last four years trying to understand why. And we might have found a pretty good fix. Introducing AnySkin.
Helen Jiang retweeted
Kenneth Marino (@Kenneth_Marino)
Very excited to announce that in Fall 2025, I will be starting as an Assistant Professor at the Kahlert School of Computing at the University of Utah @UUtah.
Helen Jiang retweeted
Sudeep Dasari (@SudeepDasari)
Robots need strong visuo-motor representations to manipulate objects, but it’s hard to learn these using demo data alone. Our #RSS2024 project vastly improves robotic representations, using human affordances mined from Ego4D! w/ @mohansrirama @shikharbahl @gupta_abhinav_
Helen Jiang retweeted
Daniel Geng (@dangengdg)
I'm at CVPR presenting "Visual Anagrams" on:
- Tuesday: 10am, Poster #429
- Friday: Oral 6B @ 1pm, Poster #118 (PM)
Let me know if you want to chat! Also, we manufactured a bunch of these "jigsaws with two solutions." If you want one, just hunt me down in the conference hall :)
Helen Jiang retweeted
Jason Y. Zhang (@jasonyzhang2)
Today, I defended my PhD thesis! A huge thanks to my committee and everyone who made these 5 years amazing 🎉🥳
Helen Jiang retweeted
Jason Y. Zhang (@jasonyzhang2)
Had a great time chatting with Itzik today about our new ICLR paper on diffusing camera rays for pose estimation! Check out our discussion on ray-based camera representations, diffusion models, and learning to predict camera pose: youtu.be/KgHwv3Nf8rg
Talking Papers Podcast (@talking_papers)

1/ Exciting news, academia Twitter! 🎓🎧 A new episode of #TalkingPapersPodcast is live where I dive deep into a fresh approach to camera pose estimation. My guest? The remarkable @jasonyzhang2 , a PhD student at @CMU_Robotics. Tune in 👉 youtu.be/KgHwv3Nf8rg

Helen Jiang retweeted
Shubham Tulsiani (@shubhtuls)
[1/6] What representation comes to mind when you think of a ‘camera’? Perhaps an extrinsic + intrinsic matrix? In our ICLR (oral) paper, we instead infer a distributed representation where each pixel is associated with a ray, and show SoTA results for few-view pose estimation.
Helen Jiang retweeted
Jason Y. Zhang (@jasonyzhang2)
[1/6] The first step to 3D is getting camera poses. But typical pipelines struggle in sparse-view setups because of texture-less surfaces, symmetries, or insufficient overlap. Our #3DV2024 paper RelPose++ uses a probabilistic energy-based model to get accurate 6D poses from <10 views!
Helen Jiang retweeted
Jason Y. Zhang (@jasonyzhang2)
Our #ICLR2024 (Oral) paper parameterizes cameras as bundles of rays for sparse-view pose estimation. We train a diffusion model to predict this representation which can be seamlessly converted to classic camera representations using least-squares! [1/N] jasonyzhang.com/RayDiffusion
Shubham Tulsiani (@shubhtuls)

[1/6] What representation comes to mind when you think of a ‘camera’? Perhaps an extrinsic + intrinsic matrix? In our ICLR (oral) paper, we instead infer a distributed representation where each pixel is associated with a ray, and show SoTA results for few-view pose estimation.

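The "converted to classic camera representations using least-squares" step in the RayDiffusion thread above can be sketched for the camera-center part. This is a minimal illustration, not the paper's implementation: it assumes a Plücker-style ray parameterization (unit direction plus moment), and the function names are hypothetical.

```python
import numpy as np

def cross_matrix(d):
    """Skew-symmetric matrix [d]_x such that cross_matrix(d) @ v == np.cross(d, v)."""
    return np.array([[0.0, -d[2], d[1]],
                     [d[2], 0.0, -d[0]],
                     [-d[1], d[0], 0.0]])

def camera_center_from_rays(directions, moments):
    """Least-squares camera center from a bundle of rays.

    Every ray through the camera center c satisfies m = c x d, which is
    linear in c:  -[d]_x c = m.  Stacking all rays gives an overdetermined
    linear system, solved here with ordinary least squares.
    """
    A = np.vstack([-cross_matrix(d) for d in directions])
    b = np.concatenate(moments)
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c

# Synthetic check: build rays through a known center, then recover it.
rng = np.random.default_rng(0)
center = np.array([1.0, -2.0, 3.0])
dirs = [d / np.linalg.norm(d) for d in rng.normal(size=(20, 3))]
moms = [np.cross(center, d) for d in dirs]
print(camera_center_from_rays(dirs, moms))  # close to [1, -2, 3]
```

With noisy per-pixel ray predictions, as in the paper's setting, the same stacked system simply becomes a noisy least-squares fit; the rotation would be recovered separately by aligning predicted ray directions with a canonical camera's directions.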
Helen Jiang retweeted
Jason Y. Zhang (@jasonyzhang2)
If you're at #ECCV2022, come stop by our poster on Tuesday afternoon to talk about sparse-view pose estimation! From just a few images, our approach outputs coherent camera rotations using an energy-based relative pose predictor.
Shubham Tulsiani (@shubhtuls)

[1/4] Camera poses are essential for (neural) 3D reconstruction. But what about sparse-view settings where obtaining these via COLMAP isn’t feasible? Our ECCV paper tackles this using an energy-based formulation for predicting relative rotation (jasonyzhang.com/relpose)
