Liangbing Zhao

52 posts

@ben_nebulous

Joined January 2022
167 Following · 15 Followers
Pinned Tweet
Liangbing Zhao retweeted
Sayak Paul@RisingSayak·
Editing images is a series of state transitions between the source image and the edited image that we want. Yet the existing paradigm doesn't explicitly include any transition priors in the editing process. This becomes particularly apparent for edits involving causal dynamics (e.g., refraction, deformation). To model this kind of physics-informed information, we leverage the rich priors present in videos and introduce PhysicEdit 🔥 TL;DR: We fine-tune QwenImage Edit on a curated dataset of videos with reasoning traces and fixed-length transition queries to do solid physics-aware image editing! In the process, we introduce a cool dataset, "PhysicTran38K", consisting of 38K transition trajectories across five physical domains, and devise a method to use it to supervise QwenImage Edit. Hop in to learn more ⬇️
Liangbing Zhao retweeted
KREA AI@krea_ai·
if you want to learn about how we trained KREA Flux, we prepared a detailed blog in the link below: krea.ai/blog/flux-krea…
Liangbing Zhao retweeted
Le Zhuo@zhuole1025·
One surprising 2D/3D RoPE mismatch between diffusion folks vs. LLM folks: Diffusion models (e.g., WAN, Flux) typically apply 1D RoPE independently per axis. LLMs (e.g., Qwen-VL, VideoRoPE) split the full frequency budget across axes (true 2D/3D).
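The two conventions can be sketched side by side. Below is a minimal NumPy illustration with a toy head dimension and grid; the function names and the exact channel split are hypothetical, not taken from any of the cited models.

```python
import numpy as np

def axial_rope_angles(head_dim, h, w, base=10000.0):
    """Diffusion-style: each spatial axis gets its own full 1D RoPE
    over half the head dimension, so the same frequency ladder is
    reused independently for y and x."""
    half = head_dim // 2
    inv = base ** (-np.arange(0, half, 2) / half)   # full ladder, reused per axis
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ang_y = ys[..., None] * inv                     # (h, w, half/2)
    ang_x = xs[..., None] * inv
    return np.concatenate([ang_y, ang_x], axis=-1)  # (h, w, head_dim/2) angles

def split_budget_rope_angles(head_dim, h, w, base=10000.0):
    """LLM-style: one frequency ladder spans the whole head dimension
    and its slots are divided between the axes (a "true" 2D RoPE)."""
    n = head_dim // 2
    inv = base ** (-np.arange(n) / n)               # single full-resolution ladder
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ang = np.empty((h, w, n))
    ang[..., 0::2] = ys[..., None] * inv[0::2]      # even slots encode y
    ang[..., 1::2] = xs[..., None] * inv[1::2]      # odd slots encode x
    return ang
```

In the axial scheme both axes see the same coarse ladder; in the split scheme each axis gets interleaved samples of one ladder that covers the full frequency budget, which is exactly the mismatch the tweet points out.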
Liangbing Zhao retweeted
Dinghuai Zhang 张鼎怀@zdhnarsil·
After discussion with @thjashin, the results from "Diffusion Beats Autoregressive in Data-Constrained Settings" look like an exploit of the AR model's overfitting. Without overfitting, there seems to be no hope for discrete diffusion to outperform AR; see the 10B-token plot, for example.
Jinjie Ni@NiJinjie

Token crisis: solved. ✅ We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to 8B params, 480B tokens, 480 epochs. Findings:
> DLMs beat AR when tokens are limited, with >3× data potential.
> A 1B DLM trained on just 1B tokens hits 56% HellaSwag & 33% MMLU — no tricks, no cherry-picks.
> No saturation: more repeats = more gains.
We also dissected the serious methodological flaws in our parallel work "Diffusion Beats Autoregressive in Data-Constrained Settings" — let's raise the bar for open review!
🔗 Blog & details: jinjieni.notion.site/Diffusion-Lang…
18 🧵s ahead:

Liangbing Zhao retweeted
Jia-Bin Huang@jbhuang0604·
Explaining Flow Matching in 4 minutes
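The core objective of (rectified) flow matching fits in a few lines: interpolate between noise and data along a straight line and regress the constant velocity. A minimal NumPy sketch, with illustrative function names:

```python
import numpy as np

def flow_matching_batch(data, rng):
    """Build one training batch for rectified flow matching.

    Interpolate x_t = (1 - t) * x0 + t * x1 between noise x0 and data x1;
    the regression target is the constant velocity x1 - x0.
    """
    x1 = data                                  # samples from the data distribution
    x0 = rng.standard_normal(x1.shape)         # samples from the noise prior
    t = rng.uniform(size=(x1.shape[0], 1))     # one time per sample, broadcast over dims
    xt = (1 - t) * x0 + t * x1                 # point on the straight-line path
    v_target = x1 - x0                         # velocity the network should predict
    return xt, t, v_target

def fm_loss(v_pred, v_target):
    """Mean-squared flow matching loss over the batch."""
    return np.mean((v_pred - v_target) ** 2)
```

A network v(x_t, t) trained with this loss can then generate samples by integrating dx/dt = v(x, t) from t = 0 to t = 1 starting from noise.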
Liangbing Zhao retweeted
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·
Transition Matching: Scalable and Flexible Generative Modeling "This paper introduces Transition Matching (TM), a novel discrete-time, continuous-state generative paradigm that unifies and advances both diffusion/flow models and continuous AR generation. TM decomposes complex generation tasks into simpler Markov transitions, allowing for expressive non-deterministic probability transition kernels and arbitrary non-continuous supervision processes, thereby unlocking new flexible design avenues."
Liangbing Zhao retweeted
Xun Huang@xxunhuang·
Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
Liangbing Zhao retweeted
Nate Gillman@GillmanLab·
Ever wish you could turn your video generator into a controllable physics simulator? We're thrilled to introduce Force Prompting! Animate any image with physical forces and get fine-grained control, without needing any physics simulator or 3D assets at inference. 🧵(1/n)
Liangbing Zhao retweeted
Wenhu Chen@WenhuChen·
🚀 General-Reasoner: Generalizing LLM Reasoning Across All Domains (Beyond Math)
Most recent RL/R1 works focus on math reasoning, but math-only tuning doesn't generalize to general reasoning (e.g., drops on MMLU-Pro and SuperGPQA). Why are we limited to math reasoning?
1. Existing rule-based verifiers work only for numeric/math answers; they can't verify LaTeX expressions, matrices, arrays, or short statements.
2. There is no high-quality verifiable data outside math.
📢 We're excited to introduce General-Reasoner, a novel framework that expands LLM reasoning to math, physics, chemistry, finance, business, and more!
✨ Key ideas:
- A new dataset, **WebInstruct-verified**, of verifiable reasoning data across many disciplines.
- A model-based generative verifier that can verify short answers like LaTeX expressions, matrices, arrays, and short statements very accurately.
📈 Big gains across science and math benchmarks:
+11–13% on MMLU-Pro (30+ domains)
+8–9% on SuperGPQA (285+ domains)
+9–11% on GPQA
Slight gains even on MATH, AMC, and AIME vs. math-RL models like SimpleRL-Zoo.
Now we are releasing the preview version!
- GitHub: github.com/TIGER-AI-Lab/G…, with all the pointers to models and the verifier.
- Data: huggingface.co/datasets/TIGER…
- Tech Report: github.com/TIGER-AI-Lab/G…
Liangbing Zhao retweeted
Lilian Weng@lilianweng·
Giving your models more time to think before prediction, like via smart decoding, chain-of-thought reasoning, latent thoughts, etc., turns out to be quite effective for unblocking the next level of intelligence. New post is here :) "Why we think": lilianweng.github.io/posts/2025-05-…
Liangbing Zhao retweeted
Mohamed Elhoseiny@moElhoseiny·
@ICLR25: @KAUSTVisionCAIR's Wenxuan Zhang and @ben_nebulous are presenting BFPO and Toddler Diffusion this morning; posters 277 and 280, Hall 2. Stop by to learn more about carefully modeling the dichotomy of safety and helpfulness, and about more interpretable and efficient diffusion.
Mohamed Elhoseiny@moElhoseiny

#ICLR 2025 🚀 Excited to share that three papers have been accepted at ICLR 2025! 🎉 Huge thanks to my incredibly talented students and collaborators for their dedication and hard work—this wouldn't have been possible without you!

Liangbing Zhao@ben_nebulous·
We also introduce the GenRef-1M dataset, which plays a crucial role in training our Reflection Generator and FLUX Corrector. Under our final ReflectionFlow framework, we achieved a remarkable GenEval score of 0.91 with just 32 samples, and the performance is far from saturated.🚀
Liangbing Zhao retweeted
Jun Garvin Chen@garvinchen2·
🚀 We introduce WikiAutoGen, which can automatically generate multi-modal Wikipedia-style articles for the first time! Different from traditional Wikipedia, you can input not only text but also an image, or image + text, as the query. Check out our paper 👇 🔗 wikiautogen.github.io
Liangbing Zhao retweeted
Wenhu Chen@WenhuChen·
I spent the weekend reading some recent great math+reasoning papers:
1. AceMath (arxiv.org/abs/2412.15084)
2. rStar-Math (arxiv.org/pdf/2501.04519)
3. PRIME (arxiv.org/abs/2412.01981)
Here are some of my naive thoughts! They could be wrong. All of these papers show possible ways to reach o1. The secret sauce is pretty much the same thing: **high-quality/difficult prompts with verifiable answers**.
1. AceMath takes a simple approach (rejection fine-tuning, RFT) to scale up the SFT dataset to massive size based on verifiable-answer matching. No RM is necessary, but you can still use an outcome RM to help boost performance.
2. rStar-Math uses a self-evolving SFT approach to gradually boost data quality and process preference model (PPM) performance. rStar-Math is still RFT, where the samples come from MCTS guided by the PPM. It still requires strong supervision from the verifiable reward in the end. rStar-Math also scales up inference compute by utilizing the PPM at each step.
3. PRIME takes a very different angle! PRIME actually uses PPO to train the model, but the major contribution is how to assign the outcome reward to each intermediate step. It also relies heavily on using the verifiable answer to obtain the "correct" on-policy model outputs.
The results are quite interesting. It seems that all these approaches reach similar results. Eurus-2 might seem weaker due to its smaller training set size. These results are all somewhat on par with o1-mini already. Given some leakage that o1-mini is ~20B, it basically says there is no gap with o1 now, at least on math problems. However, o1-mini might win significantly in other broader reasoning tasks, like physics, puzzles, etc. These results might reveal that reaching o1 is more of a data or infra problem than an algorithm problem.
As we find great ways to scale up the (good and difficult prompt, verifiable answer) pairs from different domains, the actual algorithm might not matter too much. Some algorithms are more data-efficient than others, but many of them will take us to o1 or even o3.
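The shared recipe the thread describes (sample candidate solutions, keep the ones whose final answer matches the verifiable ground truth, fine-tune on the survivors) can be sketched as follows. This is a hypothetical illustration: `sample_fn` and `extract_answer` stand in for a model sampler and an answer parser, not any cited paper's actual code.

```python
def rejection_filter(prompts, sample_fn, extract_answer, k=8):
    """Rejection fine-tuning data collection.

    For each (prompt, gold_answer) pair, draw k candidate solutions and
    keep only those whose extracted final answer matches the verifiable
    ground truth. The survivors become SFT training targets.
    """
    kept = []
    for prompt, gold in prompts:
        for _ in range(k):
            solution = sample_fn(prompt)           # one sampled reasoning trace
            if extract_answer(solution) == gold:   # verifiable-answer check
                kept.append((prompt, solution))
    return kept
```

The algorithmic differences between the papers mostly live in how `sample_fn` is guided (plain sampling, MCTS with a PPM, or PPO with step-level rewards); the verifiable-answer filter is the common ingredient.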
Liangbing Zhao retweeted
Zhou Xian@zhou_xian_·
Everything you love about generative models — now powered by real physics! Announcing the Genesis project — after a 24-month large-scale research collaboration involving over 20 research labs — a generative physics engine able to generate 4D dynamical worlds, powered by a physics simulation platform designed for general-purpose robotics and physical AI applications.
Genesis's physics engine is developed in pure Python, while being 10–80× faster than existing GPU-accelerated stacks like Isaac Gym and MJX. It delivers a simulation speed ~430,000× faster than real time, and it takes only 26 seconds to train a robotic locomotion policy transferable to the real world on a single RTX 4090 (see tutorial: genesis-world.readthedocs.io/en/latest/user…).
The Genesis physics engine and simulation platform is fully open source at github.com/Genesis-Embodi…. We'll gradually roll out access to our generative framework in the near future.
Genesis implements a unified simulation framework from scratch, integrating a wide spectrum of state-of-the-art physics solvers, allowing simulation of the whole physical world in a virtual realm with the highest realism. We aim to build a universal data engine that leverages an upper-level generative framework to autonomously create physical worlds, together with various modes of data, including environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, open-world articulated assets, and more, aiming towards fully automated data generation for robotics, physical AI, and other applications.
Open Source Code: github.com/Genesis-Embodi…
Project webpage: genesis-embodied-ai.github.io
Documentation: genesis-world.readthedocs.io
1/n