Yuhang Chen

103 posts


@chen940382

A student of HUST, China

Wuhan, China · Joined July 2023
306 Following · 15 Followers
Yuhang Chen retweeted
Rohan Paul @rohanpaul_ai
Beautiful paper. MetaClaw shows how a deployed LLM agent can keep learning on the job without stopping service.

The current problem is that most agents in production stay frozen, so they keep making the same mistakes as user needs shift and new kinds of tasks appear. The paper's fix is to split improvement into 2 loops: a fast loop that turns failures into reusable written skills right away, and a slow loop that later updates the model itself. That split matters because the new skills help immediately with no downtime, while the slower training runs only when the user is away, using idle time like sleep, inactivity, or meetings.

The authors tested this on a 934-question benchmark built as 44 simulated workdays and on a separate automated research pipeline that has 23 stages. They found that skills alone raised accuracy by up to 32% relative, and the full setup lifted Kimi-K2.5 from 21.4% to 40.6% while also improving robustness on the research pipeline by 18.3%.

What makes the paper matter is the pattern behind the gains: production agents can adapt fast now and learn deeper later, so they improve from real use instead of staying frozen.
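The fast/slow split described above can be sketched as a toy loop. All names here (`Skill`, `Agent`, the idle check) are hypothetical illustrations of the pattern, not the paper's actual API.

```python
# Toy sketch of the fast/slow improvement split: the fast loop turns a
# failure into a written skill immediately; the slow loop defers weight
# updates to idle time. Illustrative only, not the paper's implementation.
from dataclasses import dataclass, field

@dataclass
class Skill:
    trigger: str   # failure pattern this skill addresses
    advice: str    # written guidance injected into future prompts

@dataclass
class Agent:
    skills: list = field(default_factory=list)
    finetune_queue: list = field(default_factory=list)

    def fast_loop(self, task: str, failed: bool, lesson: str):
        """Fast loop: turn a failure into a reusable written skill right away."""
        if failed:
            self.skills.append(Skill(trigger=task, advice=lesson))
            self.finetune_queue.append((task, lesson))  # saved for the slow loop

    def slow_loop(self, user_idle: bool) -> str:
        """Slow loop: only when the user is away, consume queued examples
        to update the model weights (stubbed out here)."""
        if user_idle and self.finetune_queue:
            batch, self.finetune_queue = self.finetune_queue, []
            return f"fine-tuned on {len(batch)} examples"
        return "deferred"

agent = Agent()
agent.fast_loop("parse invoice", failed=True, lesson="check currency field first")
print(len(agent.skills))                 # 1 — the skill helps immediately
print(agent.slow_loop(user_idle=False))  # deferred
print(agent.slow_loop(user_idle=True))   # fine-tuned on 1 examples
```

The point of the sketch is the asymmetry: the skill is usable on the very next task, while the weight update waits for idle time.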
Yuhang Chen retweeted
alphaXiv @askalphaxiv
"Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation" This paper brings the logic of human vision to diffusion models: full detail is generated only where the viewer is looking, while the periphery stays low detail. With this setup, you can get up to 2x faster image generation and 4x faster video generation with little perceptual drop!
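The foveation idea above amounts to a spatially varying compute budget. A minimal sketch, assuming a tiled image and a per-tile step count that falls off with distance from the gaze point; the schedule, radii, and function names are made up for illustration, not the paper's method.

```python
# Toy sketch of foveated, spatially adaptive compute: spend full denoising
# effort near the gaze point and less in the periphery.
import math

def steps_for_tile(tile_xy, gaze_xy, full_steps=50, min_steps=10, fovea_radius=2.0):
    """Assign fewer diffusion steps the farther a tile is from the gaze point."""
    dist = math.dist(tile_xy, gaze_xy)
    if dist <= fovea_radius:
        return full_steps                 # foveal region: full detail
    falloff = fovea_radius / dist         # smooth decay with eccentricity
    return max(min_steps, int(full_steps * falloff))

# An 8x8 grid of tiles with the gaze at the center:
gaze = (4, 4)
budget = [[steps_for_tile((x, y), gaze) for x in range(8)] for y in range(8)]
print(budget[4][4])                       # 50 at the gaze point
print(budget[0][0])                       # far corner gets far fewer steps
print(sum(map(sum, budget)), "vs uniform", 50 * 64)  # overall compute saving
```

The total budget comes out well under the uniform 50-steps-everywhere baseline, which is where the claimed speedups would come from.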
Yuhang Chen retweeted
DailyPapers @HuggingPapers
VideoDetective: See Less but Know More. A plug-and-play framework for long video understanding that hunts for clues by integrating extrinsic query relevance with intrinsic video structure via spatio-temporal affinity graphs.
Yuhang Chen retweeted
DailyPapers @HuggingPapers
PEARL: A plug-and-play, training-free framework for Personalized Streaming Video Understanding. Enables real-time recognition of user-defined concepts in continuous video streams with precise timestamp localization.
Yuhang Chen retweeted
DailyPapers @HuggingPapers
PixelSmile: A diffusion framework for fine-grained facial expression editing with continuous intensity control and robust identity preservation across human and anime portraits. Supports zero-shot expression blending and introduces FFE-Bench for comprehensive evaluation.
Yuhang Chen retweeted
alphaXiv @askalphaxiv
Yann LeCun and his team can't stop cooking. "LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels" One of the biggest bottlenecks of JEPAs is that they are hard to train, and this new research changes that. They propose LeWorldModel, which shows that a small model can learn a usable world model directly from raw pixels, end-to-end. At just 15M parameters, it trains without heuristics or anti-collapse hacks while staying competitive and planning up to 48x faster, making JEPA-based modeling much more accessible, cheaper, and more stable.
Yuhang Chen retweeted
Robert Youssef @rryssf_
Everyone's building multi-agent systems right now: multiple LLMs collaborating, checking each other's work, splitting tasks. Researchers tested whether this actually helps across 180 controlled configurations, with matched token budgets, multiple model families, and four different task domains. Centralized coordination improved performance by 80.8% on parallelizable tasks; on sequential reasoning tasks, every multi-agent variant made things worse by 39-70%. More agents isn't better. It depends entirely on the shape of the task.
Yuhang Chen retweeted
Xihui Liu @XihuiLiu
We are excited to present **MACRO**, a large-scale dataset (400,000 samples) and comprehensive benchmark designed for improving multi-reference image generation using structured, long-context data.
🖼️ MacroData: Features 400K high-quality samples that support up to 10 input reference images, with an average of 5.44 images per sample.
🗂️ Diverse Coverage: Comprehensively covers Customization, Illustration, Spatial, and Temporal tasks, categorized by the number of input references (1-3, 4-5, 6-7, and 8-10).
📏 MacroBench: A rigorous evaluation protocol tailored to accurately assess multi-image generation models across various tasks using targeted metrics.
📈 Performance Boost: Fine-tuning the Bagel model on MACRO elevates its MacroBench score from 3.03 to 5.71. This substantial leap significantly narrows the performance gap between open-weight architectures and leading closed-source models.
🌐 Project Page: macro400k.github.io
📄 arXiv: arxiv.org/abs/2603.25319
🤗 Hugging Face: huggingface.co/papers/2603.25…
💻 GitHub Repo: github.com/HKU-MMLab/Macro
Yuhang Chen retweeted
Panda @Jiaxi_Cui
Many people assume, based on intuition, that multimodal training should use as many annotations as possible. But that's actually not the case. Back when we built Languagebind (arxiv.org/pdf/2310.01852) at Peking University, I strongly resisted annotating images before training or retrieval, because annotations produced by humans or models merely map vector-space features into the human natural-language space according to human preconceptions, which already causes a loss of feature information. With AutoResearch, I verified this idea in our experiments at cerul.ai: the more annotations the images have, the more embedding-retrieval performance actually degrades.
Yuhang Chen retweeted
Cheng Luo @ChengLuo_lc
We open-source Attention Residuals — replacing standard additive residuals with learned cross-layer attention in transformers. Block AttnRes reduces WikiText-2 perplexity by 7.7% with only 0.03% extra parameters. Includes visualization of how layers route information across depth. Code: github.com/wdlctc/open-at… Blog: wdlctc.github.io/open-attention…
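The core idea, replacing an additive residual with a learned attention over earlier layers' outputs, can be sketched in a few lines. The fixed logits and function names below are illustrative assumptions; in the actual work the weights are learned and fused into a kernel.

```python
# Toy sketch: replace the standard additive residual y = x + f(x) with a
# learned attention mix over ALL previous layers' outputs. Illustrative only.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def additive_residual(history, f_out):
    """Standard transformer block: previous output plus the block output."""
    return [h + f for h, f in zip(history[-1], f_out)]

def attention_residual(history, f_out, logits):
    """Attention residual: mix every earlier layer output (plus the new
    block output) with per-layer weights instead of plain addition."""
    candidates = history + [f_out]
    w = softmax(logits)                   # one learned logit per candidate
    dim = len(f_out)
    return [sum(wi * c[i] for wi, c in zip(w, candidates)) for i in range(dim)]

layer0 = [1.0, 0.0]                       # early feature
layer1 = [0.0, 1.0]
f_out  = [0.5, 0.5]
# Give the earliest layer a large logit: its signal survives to the output
# instead of being diluted by repeated additions.
out = attention_residual([layer0, layer1], f_out, logits=[2.0, 0.0, 0.0])
print(out)  # dominated by layer0's feature direction
```

The contrast with `additive_residual` is the point: addition weights every update roughly equally, while the attention form lets the network route information across depth.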
Yuhang Chen retweeted
Lorenzo Xiao @lrzneedresearch
I’m so lost… Jobless… School-less My research direction has no impact and I have no idea what I should do next.
Yuhang Chen retweeted
Bo Wang @BoWang87
Sharing another very cool paper from my friend @XinggangWang. It goes after one of the most fundamental assumptions in Transformers: residual connections.

The core issue is simple: as Transformers get deeper, early-layer signals get washed out. Every residual update is added with roughly equal weight, so features formed in shallow layers gradually get diluted. By the time you are 100 layers deep, a lot of that useful early information is barely preserved.

MoDA's idea is elegant: let attention operate not just across the sequence, but across depth too. So instead of each head only attending over tokens, it also attends to KV pairs from previous layers at the same position. In other words, the model can look back not only across context, but also across its own intermediate representations, all in one unified attention operation.

What makes this even better is that the engineering is serious too:
- fused Triton kernel reaches 97.3% of FlashAttention-2 efficiency at 64K context with only 3.7% FLOPs overhead
- works even better with post-norm than pre-norm, and also reduces attention sink behavior as a nice side effect

And the results are strong: at 1.5B scale, MoDA gets +2.11% average improvement across 10 downstream tasks, and -0.2 perplexity across 10 benchmarks vs OLMo2.

For a long time, depth has been the relatively underused scaling axis. People talk about data scale, model width, and context length. Much less about how to make depth actually compound. MoDA makes a very compelling case that depth still has a lot to give, if the architecture can truly preserve and reuse what earlier layers learned.

Triton code is open: github.com/hustvl/MoDA
Paper: arxiv.org/abs/2603.15619
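The depth-attention mechanism described above, a query attending to KV pairs cached from earlier layers at the same position, can be sketched minimally. Shapes, the plain softmax, and all values below are illustrative, not the MoDA kernel.

```python
# Minimal sketch of attention across depth: the candidate keys/values
# include KV pairs cached from EARLIER layers at this position, so early
# features can be read directly instead of being diluted by residuals.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def depth_attention(query, kv_cache):
    """kv_cache: list of (key, value) pairs, one per previous layer at this
    position plus the current layer. Output mixes values across depth."""
    d = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / d for key, _ in kv_cache]
    weights = softmax(scores)
    dim = len(kv_cache[0][1])
    return [sum(w * v[i] for w, (_, v) in zip(weights, kv_cache))
            for i in range(dim)]

# Layer 3's query can still read layer 0's features directly:
kv_cache = [
    ([1.0, 0.0], [10.0, 0.0]),  # layer 0 KV: early feature, still reachable
    ([0.0, 1.0], [0.0, 5.0]),   # layer 1 KV
    ([1.0, 1.0], [3.0, 3.0]),   # layer 2 KV
]
out = depth_attention([1.0, 0.0], kv_cache)
print(out)  # layer 1's key mismatches the query, so its value gets least weight
```

In the real architecture this happens inside one unified attention operation over both tokens and depth; the sketch isolates only the depth axis.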
Yuhang Chen retweeted
Jiayi Geng @JiayiiGeng
As long-horizon software engineering tasks grow in complexity, a single agent can no longer finish the tasks alone — effective multi-agent collaboration becomes necessary. This leads to a natural question: how can multiple agents be coordinated to asynchronously collaborate over a shared artifact in an effective way? We answer this question in our new preprint: Effective Strategies for Asynchronous Software Engineering Agents! We suggest that to coordinate multiple software engineering agents, branch-and-merge is the key coordination mechanism, and that human SWE primitives like git worktree, git commit, and git merge are all you need to support it. (1/n)
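The branch-and-merge coordination pattern described above can be simulated in a few lines. This is a toy model, not the authors' system: branches are plain dicts of file contents so the merge logic is visible, whereas the paper backs the same pattern with real git primitives (worktree, commit, merge).

```python
# Toy simulation of branch-and-merge coordination over a shared artifact.
# Each agent works on its own copy and merges back; a conflict arises only
# when two sides change the same file differently. Names are illustrative.

def branch(main: dict) -> dict:
    """Each agent gets its own working copy (cf. `git worktree add`)."""
    return dict(main)

def merge(main: dict, base: dict, branch_state: dict) -> list:
    """Three-way merge back into main (cf. `git merge`).
    Applies clean changes and returns the list of conflicting files."""
    conflicts = []
    for path, content in branch_state.items():
        if base.get(path) == content:
            continue                     # branch didn't change this file
        if main.get(path) != base.get(path) and main.get(path) != content:
            conflicts.append(path)       # both sides changed it differently
        else:
            main[path] = content         # clean change: apply it
    return conflicts

main = {"app.py": "v1", "docs.md": "v1"}
base = dict(main)
agent_a, agent_b = branch(main), branch(main)
agent_a["app.py"] = "a's fix"            # agents work asynchronously
agent_b["docs.md"] = "b's docs"
print(merge(main, base, agent_a))        # [] — clean merge
print(merge(main, base, agent_b))        # [] — clean: disjoint files
print(main["app.py"], "|", main["docs.md"])
```

Because the two agents touched disjoint files, both merges land cleanly; the coordination question the preprint studies is how to keep agents' work disjoint enough for this to stay true at scale.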
Yuhang Chen retweeted
𝗿𝗮𝗺𝗮𝗸𝗿𝘂𝘀𝗵𝗻𝗮— 𝗲/𝗮𝗰𝗰
Reinforcement Learning from Human Feedback by Nathan Lambert
Book: rlhfbook.com/c/06-policy-gr…
Video: youtube.com/watch?v=jQPiH-…
This is one of the best resources to understand how ChatGPT-like systems are actually trained: The RLHF Book.
What you'll learn:
→ What RLHF actually is (beyond the buzzword)
→ How models learn from human preferences
→ Reward models, policy training, and alignment
→ Why models become helpful, safe, and "human-like"
What's inside:
→ Full RLHF pipeline (instruction tuning → reward model → RL)
→ Practical intuition + real training workflows
→ Algorithms like PPO, DPO, and modern alignment methods
→ Advanced topics like evaluation, synthetic data, and open problems
Yuhang Chen retweeted
Jason Weston @jaseweston
🌐Unified Post-Training via On-Policy-Trained LM-as-RM🔧
RLLM = RL + LM-as-RM:
- post-training framework that unifies RL across easy-, hard-to-verify, and non-verifiable tasks.
- trains the LM-as-RM reward model on-policy from the policy's own outputs, then uses those generative rewards to optimize the policy. 🔗📈
- uses the LLM's reasoning + instruction-following for higher-quality rewards, boosting performance on all task types. 🚀🤖🏆
Read more in the blog post: facebookresearch.github.io/RAM/blogs/rllm/
Yuhang Chen retweeted
The AI Timeline @TheAITimeline
🚨This week's top AI/ML research papers:
- Attention Residuals
- V-JEPA 2.1
- Mamba-3
- AI Can Learn Scientific Taste
- Mixture-of-Depths Attention
- Temporal Straightening for Latent Planning
Overview for each + authors' explanations. Read this in thread mode for the best experience.