Yuhang Chen retweeted

Beautiful paper: MetaClaw.
Shows how a deployed LLM agent can keep learning on the job without stopping service.
The problem it targets: most agents in production stay frozen after deployment, so they keep repeating the same mistakes as user needs shift and new kinds of tasks appear.
The paper's fix is to split improvement into two loops: a fast loop that turns failures into reusable written skills right away, and a slow loop that later updates the model weights themselves.
That split matters because the new skills help immediately with no downtime, while the slower training runs only when the user is away, using idle time like sleep, inactivity, or meetings.
The authors tested this on a 934-question benchmark built as 44 simulated workdays and on a separate automated research pipeline that has 23 stages.
They found that skills alone gave relative accuracy gains of up to 32%, and the full setup lifted Kimi-K2.5 from 21.4% to 40.6% while also improving robustness on the research pipeline by 18.3%.
What makes the paper matter is the pattern behind the gains: production agents can adapt fast now and learn deeper later, so they improve from real use instead of staying frozen.
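A toy sketch of what that two-loop pattern could look like in code. Everything here (`SkillMemory`, `slow_loop`, the idea of skills as prompt-injectable notes) is my illustration of the general idea, not the paper's actual implementation:

```python
# Toy sketch of the fast/slow two-loop pattern. Names are illustrative,
# not taken from the paper.

class SkillMemory:
    """Fast loop: turn failures into reusable written skills immediately."""

    def __init__(self):
        self.skills = []

    def record_failure(self, task, lesson):
        # A "skill" is just a written note the agent can prepend to its
        # prompt on the next similar task -- available with no downtime.
        self.skills.append(f"When handling '{task}': {lesson}")

    def as_prompt(self):
        return "\n".join(self.skills)


def slow_loop(memory, user_is_idle):
    """Slow loop: only run weight updates when the user is away."""
    if user_is_idle:
        # Placeholder for a real fine-tuning job over accumulated
        # skills and interaction traces.
        return f"fine-tuning on {len(memory.skills)} accumulated skills"
    return "deferred: user is active"


memory = SkillMemory()
memory.record_failure("expense report", "always convert currencies first")
print(memory.as_prompt())
print(slow_loop(memory, user_is_idle=True))
```

The point of the split is visible even in this toy: the fast path is a cheap append that helps on the very next task, while the expensive path is gated on idle time so it never interrupts service.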
