Liangbing Zhao

52 posts

@ben_nebulous

Joined January 2022
167 Following · 15 Followers
Pinned Tweet
Liangbing Zhao retweeted
Sayak Paul@RisingSayak·
Editing images is a series of state transitions between the source image and the edited image that we want. Yet the existing paradigm doesn't explicitly include any transition priors in the editing process. This becomes particularly apparent for edits involving causal dynamics (e.g., refraction, deformation). To model this kind of physics-informed information, we leverage the rich priors present in videos and introduce PhysicEdit 🔥 TL;DR: We fine-tune QwenImage Edit on a curated dataset of videos with reasoning traces and fixed-length transition queries to do solid physics-aware image editing! In the process, we introduce a cool dataset, "PhysicTran38K", consisting of 38K transition trajectories across five physical domains, and devise a method to use it to supervise QwenImage Edit. Hop in to learn more ⬇️
Liangbing Zhao retweeted
KREA AI@krea_ai·
if you want to learn about how we trained KREA Flux, we prepared a detailed blog in the link below: krea.ai/blog/flux-krea…
Liangbing Zhao retweeted
Le Zhuo@zhuole1025·
One surprising 2D/3D RoPE mismatch between diffusion folks vs. LLM folks: Diffusion models (e.g., WAN, Flux) typically apply 1D RoPE independently per axis. LLMs (e.g., Qwen-VL, VideoRoPE) split the full frequency budget across axes (true 2D/3D).
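The two conventions can be sketched side by side. Below is a minimal NumPy illustration with a toy head dimension and grid; the function names and the exact channel split are hypothetical, not taken from any of the cited models.

```python
import numpy as np

def axial_rope_angles(head_dim, h, w, base=10000.0):
    """Diffusion-style: each spatial axis gets its own full 1D RoPE
    over half the head dimension, so the same frequency ladder is
    reused independently for y and x."""
    half = head_dim // 2
    inv = base ** (-np.arange(0, half, 2) / half)   # full ladder, reused per axis
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ang_y = ys[..., None] * inv                     # (h, w, half/2)
    ang_x = xs[..., None] * inv
    return np.concatenate([ang_y, ang_x], axis=-1)  # (h, w, head_dim/2) angles

def split_budget_rope_angles(head_dim, h, w, base=10000.0):
    """LLM-style: one frequency ladder spans the whole head dimension
    and its slots are divided between the axes (a "true" 2D RoPE)."""
    n = head_dim // 2
    inv = base ** (-np.arange(n) / n)               # single full-resolution ladder
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ang = np.empty((h, w, n))
    ang[..., 0::2] = ys[..., None] * inv[0::2]      # even slots encode y
    ang[..., 1::2] = xs[..., None] * inv[1::2]      # odd slots encode x
    return ang
```

In the axial scheme both axes see the same coarse ladder; in the split scheme each axis gets interleaved samples of one ladder that covers the full frequency budget, which is exactly the mismatch the tweet points out.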
Liangbing Zhao retweeted
Dinghuai Zhang 张鼎怀@zdhnarsil·
After discussion with @thjashin, the results from "Diffusion Beats Autoregressive in Data-Constrained Settings" look like an exploit of the AR model's overfitting. Without overfitting, there seems to be no hope for discrete diffusion to outperform AR; see the 10B-token plot, for example.
Jinjie Ni@NiJinjie

Token crisis: solved. ✅ We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to 8B params, 480B tokens, 480 epochs. Findings:
> DLMs beat AR when tokens are limited, with >3× data potential.
> A 1B DLM trained on just 1B tokens hits 56% HellaSwag & 33% MMLU — no tricks, no cherry-picks.
> No saturation: more repeats = more gains.
We also dissected the serious methodological flaws in our parallel work "Diffusion Beats Autoregressive in Data-Constrained Settings" — let's raise the bar for open review!
🔗 Blog & details: jinjieni.notion.site/Diffusion-Lang…
18 🧵s ahead:

Liangbing Zhao retweeted
Jia-Bin Huang@jbhuang0604·
Explaining Flow Matching in 4 minutes
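The core objective of (rectified) flow matching fits in a few lines: interpolate between noise and data along a straight line and regress the constant velocity. A minimal NumPy sketch, with illustrative function names:

```python
import numpy as np

def flow_matching_batch(data, rng):
    """Build one training batch for rectified flow matching.

    Interpolate x_t = (1 - t) * x0 + t * x1 between noise x0 and data x1;
    the regression target is the constant velocity x1 - x0.
    """
    x1 = data                                  # samples from the data distribution
    x0 = rng.standard_normal(x1.shape)         # samples from the noise prior
    t = rng.uniform(size=(x1.shape[0], 1))     # one time per sample, broadcast over dims
    xt = (1 - t) * x0 + t * x1                 # point on the straight-line path
    v_target = x1 - x0                         # velocity the network should predict
    return xt, t, v_target

def fm_loss(v_pred, v_target):
    """Mean-squared flow matching loss over the batch."""
    return np.mean((v_pred - v_target) ** 2)
```

A network v(x_t, t) trained with this loss can then generate samples by integrating dx/dt = v(x, t) from t = 0 to t = 1 starting from noise.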
Liangbing Zhao retweeted
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·
Transition Matching: Scalable and Flexible Generative Modeling "This paper introduces Transition Matching (TM), a novel discrete-time, continuous-state generative paradigm that unifies and advances both diffusion/flow models and continuous AR generation. TM decomposes complex generation tasks into simpler Markov transitions, allowing for expressive non-deterministic probability transition kernels and arbitrary non-continuous supervision processes, thereby unlocking new flexible design avenues."
Liangbing Zhao retweeted
Xun Huang@xxunhuang·
Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
Liangbing Zhao retweeted
Nate Gillman@GillmanLab·
Ever wish you could turn your video generator into a controllable physics simulator? We're thrilled to introduce Force Prompting! Animate any image with physical forces and get fine-grained control, without needing any physics simulator or 3D assets at inference. 🧵(1/n)
Liangbing Zhao retweeted
Wenhu Chen@WenhuChen·
🚀 General-Reasoner: Generalizing LLM Reasoning Across All Domains (Beyond Math)
Most recent RL/R1 works focus on math reasoning, but math-only tuning doesn't generalize to general reasoning (e.g., drops on MMLU-Pro and SuperGPQA). Why are we limited to math reasoning?
1. Existing rule-based verifiers work only for numeric/math answers; they can't verify LaTeX expressions, matrices, arrays, or short statements.
2. There is no high-quality verifiable data outside math.
📢 We're excited to introduce General-Reasoner, a novel framework that expands LLM reasoning to math, physics, chemistry, finance, business, and more!
✨ Key ideas:
- A new dataset, **WebInstruct-verified**, of verifiable reasoning data across many disciplines.
- A model-based generative verifier that can verify short answers like LaTeX expressions, matrices, arrays, and short statements very accurately.
📈 Big gains across science and math benchmarks:
+11–13% on MMLU-Pro (30+ domains)
+8–9% on SuperGPQA (285+ domains)
+9–11% on GPQA
Slight gains even on MATH, AMC, and AIME vs. math-RL models like SimpleRL-Zoo.
Now we are releasing the preview version!
- GitHub: github.com/TIGER-AI-Lab/G…, with all the pointers to models and the verifier.
- Data: huggingface.co/datasets/TIGER…
- Tech Report: github.com/TIGER-AI-Lab/G…
Liangbing Zhao retweeted
Lilian Weng@lilianweng·
Giving your models more time to think before prediction, like via smart decoding, chain-of-thought reasoning, latent thoughts, etc., turns out to be quite effective for unblocking the next level of intelligence. New post is here :) "Why we think": lilianweng.github.io/posts/2025-05-…
Liangbing Zhao retweeted
Mohamed Elhoseiny@moElhoseiny·
@ICLR25: @KAUSTVisionCAIR's Wenxuan Zhang and @ben_nebulous are presenting BFPO and Toddler Diffusion this morning; posters 277 and 280, Hall 2. Stop by to learn more about carefully modeling the dichotomy of safety and helpfulness, and about more interpretable and efficient diffusion.
Mohamed Elhoseiny@moElhoseiny

#ICLR 2025 🚀 Excited to share that three papers have been accepted at ICLR 2025! 🎉 Huge thanks to my incredibly talented students and collaborators for their dedication and hard work—this wouldn't have been possible without you!

Liangbing Zhao@ben_nebulous·
We also introduce the GenRef-1M dataset, which plays a crucial role in training our Reflection Generator and FLUX Corrector. Under our final ReflectionFlow framework, we achieved a remarkable GenEval score of 0.91 with just 32 samples, and the performance is far from saturated.🚀
Liangbing Zhao retweeted
Jun Garvin Chen@garvinchen2·
🚀 We introduce WikiAutoGen, which can automatically generate multi-modal Wikipedia-style articles for the first time! Different from traditional Wikipedia, you can input not only text but also an image, or image + text, as the query. Check out our paper 👇 🔗 wikiautogen.github.io
Liangbing Zhao retweeted
Wenhu Chen@WenhuChen·
I spent the weekend reading some recent great math+reasoning papers:
1. AceMath (arxiv.org/abs/2412.15084)
2. rStar-Math (arxiv.org/pdf/2501.04519)
3. PRIME (arxiv.org/abs/2412.01981)
Here are some of my naive thoughts! They could be wrong. All of these papers show possible ways to reach o1. The secret sauce is pretty much the same thing: **high-quality/difficult prompts with verifiable answers**.
1. AceMath takes a simple approach (rejection fine-tuning, RFT) to scale up the SFT dataset to massive size based on verifiable-answer matching. No RM is necessary, but you can still use an outcome RM to help boost performance.
2. rStar-Math uses a self-evolving SFT approach to gradually boost data quality and process preference model (PPM) performance. rStar-Math is still RFT, where the samples come from MCTS guided by the PPM. It still requires strong supervision from the verifiable reward in the end. rStar-Math also scales up inference compute by utilizing the PPM at each step.
3. PRIME takes a very different angle! PRIME actually uses PPO to train the model, but the major contribution is how to assign the outcome reward to each intermediate step. It also relies heavily on using the verifiable answer to obtain the "correct" on-policy model outputs.
The results are quite interesting. It seems that all these approaches reach similar results. Eurus-2 might seem weaker due to its smaller training set size. These results are all somewhat on par with o1-mini already. Given some leakage that o1-mini is ~20B, it basically says there is no gap with o1 now, at least on math problems. However, o1-mini might win significantly in other broader reasoning tasks, like physics, puzzles, etc. These results might reveal that reaching o1 is more of a data or infra problem than an algorithm problem.
As we find great ways to scale up the (good and difficult prompt, verifiable answer) pairs from different domains, the actual algorithm might not matter too much. Some algorithms are more data-efficient than others, but many of them will take us to o1 or even o3.
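The shared recipe the thread describes (sample candidate solutions, keep the ones whose final answer matches the verifiable ground truth, fine-tune on the survivors) can be sketched as follows. This is a hypothetical illustration: `sample_fn` and `extract_answer` stand in for a model sampler and an answer parser, not any cited paper's actual code.

```python
def rejection_filter(prompts, sample_fn, extract_answer, k=8):
    """Rejection fine-tuning data collection.

    For each (prompt, gold_answer) pair, draw k candidate solutions and
    keep only those whose extracted final answer matches the verifiable
    ground truth. The survivors become SFT training targets.
    """
    kept = []
    for prompt, gold in prompts:
        for _ in range(k):
            solution = sample_fn(prompt)           # one sampled reasoning trace
            if extract_answer(solution) == gold:   # verifiable-answer check
                kept.append((prompt, solution))
    return kept
```

The algorithmic differences between the papers mostly live in how `sample_fn` is guided (plain sampling, MCTS with a PPM, or PPO with step-level rewards); the verifiable-answer filter is the common ingredient.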
Liangbing Zhao retweeted
Zhou Xian@zhou_xian_·
Everything you love about generative models — now powered by real physics! Announcing the Genesis project — after a 24-month large-scale research collaboration involving over 20 research labs — a generative physics engine able to generate 4D dynamical worlds, powered by a physics simulation platform designed for general-purpose robotics and physical AI applications.
Genesis's physics engine is developed in pure Python, while being 10–80× faster than existing GPU-accelerated stacks like Isaac Gym and MJX. It delivers a simulation speed ~430,000× faster than real time, and it takes only 26 seconds to train a robotic locomotion policy transferable to the real world on a single RTX 4090 (see tutorial: genesis-world.readthedocs.io/en/latest/user…).
The Genesis physics engine and simulation platform is fully open source at github.com/Genesis-Embodi…. We'll gradually roll out access to our generative framework in the near future.
Genesis implements a unified simulation framework from scratch, integrating a wide spectrum of state-of-the-art physics solvers, allowing simulation of the whole physical world in a virtual realm with the highest realism. We aim to build a universal data engine that leverages an upper-level generative framework to autonomously create physical worlds, together with various modes of data, including environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, open-world articulated assets, and more, aiming towards fully automated data generation for robotics, physical AI, and other applications.
Open Source Code: github.com/Genesis-Embodi…
Project webpage: genesis-embodied-ai.github.io
Documentation: genesis-world.readthedocs.io
1/n