Tianyu (Steve) Wang

236 posts

@VisionSteve

Research Scientist @AdobeResearch | Ph.D. @ CUHK | Prev. Research Intern @AdobeResearch | Photographer | Opinions are my own

California, USA · Joined November 2019

484 Following · 706 Followers

Pinned Tweet

Tianyu (Steve) Wang@VisionSteve·17 Mar

Check out our project that answers how to train an any-step T2I model from scratch. We release the code for everyone to explore this area. Looking forward to seeing more on this!! #CVPR #CVPR2026

Xin Yu (Andy)@andy_yx27

😀😀We’re excited to release the training code for Self-E (accepted to CVPR 2026): github.com/XinYu-Andy/Sel… Self-E is a training-from-scratch, self-contained framework for any-step text-to-image generation, without teacher distillation.

485

Tianyu (Steve) Wang retweeted

Yukang Chen@yukangchen_·19 May

🚀 Excited to release LongLive 2.0! 🎬 An end-to-end infrastructure for long video generation, with FP4 and parallelism at the core of both training and inference. ⚡45.7 FPS generation speed on 5B model⚡ ✨ LongLive 2.0 supports real-video training, few-step distillation, multi-shot training/inference, sequence-parallel acceleration, NVFP4 KV cache, and async VAE decoding deployment. 🧩 To our knowledge, this is the first open-source 4-bit long video generation infra that covers both training and inference. 🙌 Welcome to check it out, try it, and share feedback! 🔗 Code: github.com/NVlabs/LongLive 📰 Paper: huggingface.co/papers/2605.18… 🎥 Demo: nvlabs.github.io/LongLive/LongL… #LongVideoGeneration #VideoGeneration #Realtime #AIInfra #EfficientAI #FP4 #Parallel #NVIDIA

235

55.8K

Tianyu (Steve) Wang@VisionSteve·20 May

@jin_linyi @holynski_ So cool!!!

199

Linyi Jin@jin_linyi·20 May

Excited to share what we are building -- Genie experience grounded in real-world street view. Try it out at labs.google/fx/projectgenie

Google@Google

Project Genie is a @GoogleLabs experiment that lets you simulate dynamic worlds you can navigate in real time with Genie, our general-purpose world model. Today, we’re connecting Project Genie to nearly 20 years of Street View data from Google Maps — so you can now build interactive spaces based on real-world locations. Street View imagery in Project Genie is available now for places in the U.S., and will expand to more locales over time. #GoogleIO

427

66.7K

Tianyu (Steve) Wang@VisionSteve·2 May

@xiaolonw @Lianhuiq @Meta Congrats!

161

Xiaolong Wang@xiaolonw·1 May

Excited to share that Assured Robot Intelligence (ARI) has joined @Meta to help build the future of humanoid intelligence! When we started ARI one year ago, our mission was clear: achieve physical AGI. Through deep customer engagements and real-world deployments, it became clear to us that serving the massive opportunity ahead requires training a truly general-purpose physical agent. We believe this agent will be humanoid — and that scaling will come from learning directly from human experience, not teleoperation alone. Meta’s ecosystem brings together the key components needed to make this vision possible. We will be joining Meta Superintelligence Labs (MSL) to help bring personal superintelligence into the physical world. We are incredibly grateful to the brilliant minds, robotics researchers, engineers, partners, and supporters who have worked with us on this journey. Thank you to our investors and angels, led by @aixventureshq , for believing in our mission. This is just the beginning.

Bloomberg@business

Meta Platforms Inc. has acquired Assured Robot Intelligence, a startup developing artificial intelligence models for robots, as part of a major initiative to build humanoid technology. bloomberg.com/news/articles/…

113

696

193.7K

Tianyu (Steve) Wang@VisionSteve·30 Apr

RT @BoyuanChen0: If there is one secret recipe about GPT Image 2, it must be codex. In the last 6 months, codex has been zero-shoting every…

Tianyu (Steve) Wang@VisionSteve·25 Apr

If you’re attending, please stop by our oral session for EditVerse: I won’t be there in person, but I’d be very happy if you could check out the work, chat with the team, and join the discussion. 📍 Room 201 A/B 🕚 Apr 25, 11:18–11:28 AM

Xuan Ju@juxuan_27

Excited to share our paper EditVerse is accepted as oral to ICLR 2026! Many thanks to our amazing coauthors!! Paper Link: arxiv.org/pdf/2509.20360 Project Page: …se.s3-website-us-east-1.amazonaws.com

355

Tianyu (Steve) Wang retweeted

Xuan Ju@juxuan_27·6 Feb

835

Tianyu (Steve) Wang retweeted

GZ Zhou@GengzeZhou·12 Apr

We present LightMover — controllable light movement from a single image. Move lights. Change color. Adjust intensity. All with physically consistent shadows & reflections. 🌐 Project: gengzezhou.github.io/LightMover/ 📄 Paper: arxiv.org/abs/2603.27209 #CVPR2026

104

6.1K

Tianyu (Steve) Wang retweeted

Xin Yu (Andy)@andy_yx27·17 Mar

Xin Yu (Andy)@andy_yx27

Excited to share our new work Self-E: A New Training Paradigm for Text-to-Image! One model, any compute: Unlock any-step text-to-image generation. Fully trained from scratch, no teacher distillation needed. xinyu-andy.github.io/SelfE-project The secret? Let the model evaluate itself. 👇

6.3K

Tianyu (Steve) Wang@VisionSteve·14 Mar

@natanielruizg @JaredSleeper It would be good to say “I will finish your job, please go to sleep” and then finish the job 🤣

Nataniel Ruiz@natanielruizg·14 Mar

@JaredSleeper tbh it’s a bit triggering for it to tell you to go to sleep when you are doing your job (even if yes claude it is a little late)

1.9K

Jared Sleeper@JaredSleeper·14 Mar

One of Claude’s best features is that it will often tell you to “go to sleep.” Can’t imagine chatGPT doing that- it acts like it is desperate for every last second of engagement. It’s so clear which one is healthier/better.

648

303.4K

Tianyu (Steve) Wang@VisionSteve·6 Mar

@natanielruizg @CSProfKGD Great work!!

211

Nataniel Ruiz@natanielruizg·6 Mar

Excited to show some surprising inventions on generative multiplayer games we made at Google with Stanford. We call the work MultiGen. I've always been inspired by early studios like id Software with Doom or Blizzard with Warcraft bringing networked video games to the next level. We are at the point in history where we can make strides like them, but for generative games. It's a strange feeling to be in the age of generative video games while still discovering how exactly to train the models and design the tools that make them useful. All of the tools that have been invented for classic game engines need to be redesigned for generative games. For example level and world design is not entirely possible with existing technology. We introduce editable memory to diffusion game engines that allow for design of new levels via a minimap. But we can easily imagine how this can be expanded with different creation tools. The end goal of this research direction is to allow game designers to be able to guide the generation process of their world, at the granularity that they prefer. Editable memory also allows us to add multiplayer to Generative Doom. We were amazed when we saw GameNGen some years ago, and now you can play it live with friends in real-time, on your couch or even online. Shared representations like our editable memory seem like the future for this type of experience. Models are, in some cases, expensive and approximate encoders but great interpolators and extrapolators. Leveraging their strengths lets you have completely new experiences that can be realized now and not in the distant future. This work was started at my previous team and continued in collaboration with Stanford. Congratulations to all for the discoveries.

577

104K

Tianyu (Steve) Wang retweeted

Xun Huang@xxunhuang·27 Feb

This is the first video world model to support multi agent interactions, a truly groundbreaking milestone. Awesome work by @sainingxie and the team! Excited to see Self Forcing powering multiplayer world models as well.

Oscar Michel@ojmichel4

Self Forcing gives a huge improvement in quality! Here we see the same PvP sequence before and after Checkpointed Self Forcing. [9/10]

11.6K

Tianyu (Steve) Wang retweeted

Xin Yu (Andy)@andy_yx27·7 Jan

148

16.2K

Tianyu (Steve) Wang retweeted

Yao-Chih Lee@YaoChihLee·2 Dec

Excited to share our new work: Generative Video Motion Editing with 3D Point Tracks. We propose a framework that uses 3D point tracks to precisely edit both camera and object motion in a video, unlocking a wide range of new editing applications.

135

902

107.8K

Tianyu (Steve) Wang@VisionSteve·14 Nov

The last version of FSD v13 was good: confident and smooth. But v14.1.4 is bad: neither confident nor smooth. I put my hands back on the wheel again. 😅 Canceling my subscription again and waiting for a stable version.

Andrej Karpathy@karpathy

I took delivery of a beautiful new shiny HW4 Tesla Model X today, so I immediately took it out for an FSD test drive, a bit like I used to do almost daily for 5 years. Basically... I'm amazed - it drives really, really well, smooth, confident, noticeably better than what I'm used to on HW3 (my previous car) and eons ahead of the version I remember driving up highway 280 on my first day at Tesla ~9 years ago, where I had to intervene every time the road mildly curved or sloped. (note this is v13, my car hasn't been offered the latest v14 yet) On the highway, I felt like a passenger in some super high tech Maglev train pod - the car is locked in the center of the lane while I'm looking out from Model X's higher vantage point and its panoramic front window, listening to the (incredible) sound system, or chatting with Grok. On city streets, the car casually handled a number of tricky scenarios that I remember losing sleep over just a few years ago. It negotiated incoming cars in tight lanes, it gracefully went around construction and temporarily in-lane stationary cars, it correctly timed tricky left turns with incoming traffic from both sides, it gracefully gave way to the car that went out of order in the 4-way stop sign, it found a way to squeeze into a bumper to bumper traffic to make its turn, it overtook the bus that was loading passengers but still stopped for the stop sign that was blocked by the bus, and at the end of the route it circled around a parking lot, found a spot and... parked. Basically a flawless drive. For context, I'm used to going out for a brief test drive around the neighborhood to return with 20 clips of things that could be improved. It's new for me to do just that and exactly like I used to, but come back with nothing. Perfect drive, no notes. 
I expect there's still more work for the team in the long march of 9s, but it's just so cool to see that we're beyond finding issues on any individual ~1 hour drive around the neighborhood, you actually have to go to the fleet and mine them. Back then, I processed the incredible promise of vehicle autonomy at scale (in the fully scaleable, vision only, end-to-end Tesla way) only intellectually, but now it is possible to feel it intuitively too if you just go out for a drive. Wait, of course surround video stream at 60Hz processed by a fully dedicated "driving brain" neural net will work, and it will be so much better and safer than a human driver. Did anyone else think otherwise? I also watched @aelluswamy 's new ICCV25 talk last week (x.com/aelluswamy/sta…) that hints at some of the recent under the hood technical components driving this progress. Sensor streams (videos, maps, kinematics, audio, ...) over long contexts (e.g. ~30 seconds) go into a big neural net, steering/acceleration comes out, optionally with visualization auxiliary data. This is the dream of the complete Software 1.0 -> Software 2.0 re-write that scales fully with data streaming from millions of cars in the fleet and the compute capacity of your chip, not some engineer's clever new DoubleParkedCarHandler C++ abstraction with undefined test-time characteristics of memory and runtime. There's a lot more hints in the video on where things are going with the emerging "robotics+AI at scale stack". World reconstructors, world simulators "dreaming" dynamics, RL, all of these components general, foundational, neural net based, how the car is really just one kind of robot... are people getting this yet? Huge congrats to the team - you're building magic objects of the future, you rock! And I love my car <3.

168

Tianyu (Steve) Wang retweeted

Umesh@umesh_ai·5 Nov

🚨 UNLIMITED EVERYTHING for 30 days! From Oct 28–Dec 1, every Adobe Creative Cloud & Firefly subscriber unlocks: ✨ Unlimited access to Firefly & partner AI models (ChatGPT, Ideogram 3.0, nano banana, Flux, Firefly 5 and more). ✨ Even video generation is unlimited in Relax mode. No credits. No limits. Let your imagination run wild → I’ve linked some of my favorite prompts and tips to get started. Sponsored by @Adobe as an Adobe Firefly Ambassador. #AdobeFireflyAmbassadors #Ad #FireflyPromo

145

18.2K

Tianyu (Steve) Wang retweeted

Xun Huang@xxunhuang·4 Nov

We present MotionStream — real-time, long-duration video generation that you can interactively control just by dragging your mouse. All videos here are raw, real-time screen captures without any post-processing. Model runs on a single H100 at 29 FPS and 0.4s latency.

150

1.1K

98K

Tianyu (Steve) Wang@VisionSteve·30 Oct

@redlilyrose19 thanks!!

𝓛𝓲𝓵𝔂🌹𝓡𝓸𝓼𝓮@redlilyrose19·30 Oct

@VisionSteve every year i'm blown away by adobe max. it always seems like magic

Tianyu (Steve) Wang@VisionSteve·30 Oct

Very proud that two of the projects I worked on were featured in this year’s Adobe MAX Sneaks! I co-led Project Frame Forward, which builds upon our previous GenProp — with major improvements in stability and image editing & video alignment. #AdobeMax #ProjectFrameForward

Jerrod Lew@jerrod_lew

AI video editing on show here at Adobe Max! #ProjectFrameForward lets you alter the start frame but keep the same motion of the original video. Here’s a quick demo:

706

Tianyu (Steve) Wang retweeted

Tingting Liao@tingtin36139994·8 Oct

🎬 Introducing: Character Mixing for Video Generation Imagine Mr. Bean stepping into Tom & Jerry's world 🐭✨ Now it's possible! ✨ Our framework first enables natural cross-character interactions in text-to-video generation while preserving identity and style fidelity.

554

53K
