Xianghao Kong

64 posts

@xk_theo7

Video Gen AI Researcher📍Bay Area | PhDone @UCR_CSE | interpretability, alignment, compositionality of diffusion models | EX - @AdobeFirefly, @SonyAI_global

Riverside, CA · Joined August 2021

391 Following · 139 Followers

Pinned Tweet

Xianghao Kong@xk_theo7·23 Feb

1/8 🚀 AI Breakthrough: "Interpretable Diffusion via Information Decomposition" 🧠 - Quantitative understanding of conditional diffusion models. - Align text-image data using mutual information. - Goes beyond "attention". 🎉 Accepted at #ICLR2024!

6.2K

Xianghao Kong Retweeted

Yunong Liu@yunongliu1·6 Mar

Really excited to see Uni-1 out in the world 🔥Our first unified model. The range of things this model can do is wild: image-to-~100 styles, manga generation, multi-ref with strong identity preservation, temporal storytelling, sketch-to-image, spatial reasoning, multilingual infographics, layering… the capability range is honestly unreal. this is just the start 🫡 check out the blog to learn more lumalabs.ai/uni-1 Proud of the team and what we’re building at @LumaLabsAI 🚀

Luma@LumaLabsAI

Introducing Uni-1, Luma’s first unified understanding and generation model, our next step on the path towards unified general intelligence. lumalabs.ai/uni-1

6.2K

Xianghao Kong@xk_theo7·22 Jan

@hudsonyeoce Cool cool! We’ve mastered alignment for nouns in image/video models, but verbs (or more abstract terms) are the real challenge in video. Seeing this kind of motion control proves Runway’s cracking the code on abstract concepts🔥

Hudson@hudsonyeoce·22 Jan

@xk_theo7 yes!! the prompt adherence in this model is something we really optimised for 😁

Hudson@hudsonyeoce·22 Jan

had so much fun building this with our lovely team <3 cant wait for you all to play with it, tell me what you think!

Runway@runwayml

Introducing Image to Video for Gen-4.5, the world's best video model. Built for longer stories. Precise camera control. Coherent narratives. And characters that stay consistent. Gen-4.5 Image to Video is available now for all paid plans.

609

Xianghao Kong@xk_theo7·7 Dec

🤯💥

karim_yourself@karim_yourself

Jujutsu Kaisen Live Action.

311

Xianghao Kong@xk_theo7·2 Dec

I’m currently in transit to San Diego for NeurIPS. If you’re also killing time, feel free to check out a 2-minute-30-second horror sci-fi short film Michael and I recently created. We’d love any comments or likes: devpost.com/software/dream… Looking forward to catching up at the venue! 🎥

209

Xianghao Kong@xk_theo7·28 Nov

Why must robots be human-shaped? Bringing impossible creatures into the real world can create just as beautiful an emotional bond ❤️

Disneyland Paris EN@DisneyParis_EN

It's official! From 29 March 2026 you'll be able to discover World of Frozen and lots of other experiences at Disney Adventure World! 🤩

246

Xianghao Kong@xk_theo7·23 Aug

I feel the debate shouldn’t only be about whether DiT is effective, but also about how information preservation is the key to accelerating diffusion training. Our MicroDiT (arxiv.org/abs/2407.15811) paper showed this: by letting masked token info mix into unmasked ones, we can cut down a lot of tokens with only minor performance loss. Interestingly, two months ago, when I caught up with @StefanABaumann at #CVPR, we discussed how TREAD and MicroDiT are conceptually similar from an information perspective. Maybe it’s time to look at diffusion through an information-theoretic lens: from post-training (for better alignment) to latent space curation, I believe this could lead to some really exciting discoveries!

サメQCU@sameQCU

bros, DiT is wrong. it's mathematically wrong. it's formally wrong. there is something wrong with it

1.8K

Xianghao Kong@xk_theo7·14 Aug

Shout-out to Doji!

Doji@doji_com

Introducing Look Studio. Style looks from scratch with 1M+ products from designer brands - including shoes, multiple layers and more. Reply for an invite.

297

Xianghao Kong@xk_theo7·7 Aug

@jfischoff Yura Borisov?😂

388

Jonathan Fischoff@jfischoff·6 Aug

In some other life, I'm a Russian mob boss

966

Xianghao Kong@xk_theo7·31 Jul

@sleenyre Loving the post-training insights 👏

NYRE@sleenyre·31 Jul

You can read about how we made the open weights checkpoint at our blog. Some writings that helped me shape the blog:

KREA AI@krea_ai

if you want to learn about how we trained KREA Flux, we prepared a detailed blog in the link below: krea.ai/blog/flux-krea…

1.1K

Xianghao Kong Retweeted

Reka@RekaAILabs·8 Jul

Excited to introduce Reka Vision, an agentic visual understanding and search platform. Transform your unstructured multimodal data into insights and actions.

118

485.8K

Xianghao Kong Retweeted

Midjourney@midjourney·18 Jun

Introducing our V1 Video Model. It's fun, easy, and beautiful. Available at 10$/month, it's the first video model for *everyone* and it's available now.

359

594

3.9K

1.9M

Xianghao Kong Retweeted

Özgür Kara@ozgurkara99·12 Jun

+ @cveu_workshop starting at 1:00 PM, 207 A-D.

James Matthew Rehg@RehgJim

Very happy to be in Music City for #CVPR2025 My lab is presenting 7 papers, 4 selected as highlights. My amazing students @IrohXu @zixuan_huang @Wenqi_Jia @bryanislucky Xiang Li @fionakryan and postdoc Sangmin Lee are here! @siebelschool @uofigrainger

562

Xianghao Kong@xk_theo7·11 Jun

Heading to Nashville 🎸 for @CVPR (06/11 - 06/16)! Always excited to catch up with old friends and make new connections. Let’s grab a coffee ☕️ or chat about diffusion models, post-training, or just life! #CVPR2025 #Diffusion #GenerativeAI #Nashville

289

Xianghao Kong Retweeted

David@DavidSHolz·7 Jun

you're now closer to the year 2050 than the year 2000

1.2K

79.1K

Xianghao Kong@xk_theo7·18 May

@tunahansalih @amazon Congrats on staying in SF for a cool summer! Don’t forget to grab a slice at Tony’s Pizza 🍕 in downtown

141

Tuna Meral@tunahansalih·18 May

Starting Monday, I’ll be joining the @amazon AGI team as an Applied Scientist Intern. I’ll be working on something exciting that builds on my research in vision generative AI. Grateful for the opportunity and excited for what’s ahead. I’ll be in San Francisco all summer, let me know if you want to grab coffee!

820

Xianghao Kong@xk_theo7·4 Apr

@sameen2080 Try @reveimage 🤔? x.com/reveimage/stat…

Reve@reve

✅ Horse riding an astronaut ✅ Clock reading 5:30 ✅ Full to the brim glass of wine

Xianghao Kong Retweeted

Jinghan Yao@JinghanYao·7 Mar

📢 Our paper "Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer" has been accepted to #MLSys2025, taking place May 12-15! Excited to share our research at the intersection of machine learning and systems in San Jose, CA. 🎉 Check out the full program here: lnkd.in/ecyzGbwJ #MLSys #MachineLearning #Systems #Conference

1.3K

Xianghao Kong@xk_theo7·2 Mar

@alec_helbling That's dope🔥! We found mutual info can discover the same thing. And it's beyond attention and model-agnostic!

108

Alec Helbling@alec_helbling·28 Feb

Diffusion Transformers aren't just generative models, but also powerful multi-modal encoders. ConceptAttention creates rich heatmaps of text concepts in images from DiT representations. This even works on real images, and can be applied to tasks like segmentation! Demo 👇

356

24.4K

Xianghao Kong Retweeted

Vikash Sehwag@VSehwag_·27 Feb

Delighted to see the MicroDiffusion paper accepted at CVPR. Check out the code and models if you are looking for an extremely low-cost setup for latent diffusion models.

Vikash Sehwag@VSehwag_

Following fully open-source philosophy, we’ve released the official training code, data code, and model ckpts for our micro-budget training of diffusion models from scratch (MicroDiTs). Now anyone can train a Stable Diffusion v1/v2-quality model from scratch in just 2.5 days using 8 H100 GPUs (<$2000 cost). Github: github.com/SonyResearch/m… Checkpoints: huggingface.co/VSehwag24/Micr… @SonyAI_global 1/3

3.6K
