Hou Pong (Ken) Chan

302 posts

Hou Pong (Ken) Chan

Hou Pong (Ken) Chan

@kenchanhp

Researcher at the Alibaba DAMO Academy, Singapore R&D Center | Former Visiting Postdoc Researcher at UIUC @uiuc_nlp | NLP PhD from CUHK @CUHKofficial

Singapore Beigetreten Mayıs 2017
562 Folgt350 Follower
Hou Pong (Ken) Chan retweetet
Sumit
Sumit@_reachsumit·
Understanding the Behaviors of Environment-aware Information Retrieval Analyzes how LLMs learn to adapt query formulation to different retrievers via RL, showing that optimal query styles are retriever-specific. 📝 arxiv.org/abs/2606.16817 👨🏽‍💻 github.com/LCO-Embedding/…
English
0
3
11
392
Hou Pong (Ken) Chan retweetet
Yichuan Wang
Yichuan Wang@YichuanM·
The web was never meant to be flattened into text. Yet most web RAG systems start by parsing HTML --- a complex and lossy process. 🔥 Introducing PixelRAG: the first RAG system that retrieves and reads 30M+ web pages as pixels. Instead of extracting text, PixelRAG retrieves screenshots and lets a VLM read them directly. PixelRAG not only preserves visual information, but also outperforms text-based RAG on text-only QA benchmarks by +18.1%. Why? (1) HTML-to-text conversion often discards layout, structure, tables, and other useful signals. (2) We continued pretraining a VLM on web page screenshots and turned it into a surprisingly strong visual retriever. (3) Recent VLMs are remarkably good at understanding web pages, often with better accuracy and token efficiency than text-only pipelines. Takeaway: HTML parsing may be one of the biggest self-inflicted bottlenecks in web RAG. Demo below 👇 Code: github.com/StarTrail-org/… Paper: github.com/StarTrail-org/… Playground: pixelrag.ai
English
25
119
707
80.5K
Shudong Liu
Shudong Liu@shudong_liu·
一个月前看着群里kimi code玩家们开始激情讨论要不要大刀改版,着实感觉到什么是passion!
Kai@real_kai42

过去一个月是疯狂的一个月 大概一个月前,我下定决心重构 kimi-code,开始设计新的架构。 我大概抱着电脑和便携屏在汤泉卷了两整天,花了几千刀的 token 去做架构分析、设计和验证,最终得到了一份我认为最优的架构方案。 我觉得在 vibe 时代,架构变得更加重要了,一份好的架构能够在可控的范围内,让 Agent 肆意 coding,而不会打破东西 - 架构确定后,就开始冲刺实现。(过程中吵和推翻了无数次) - 迅速组建了一个强大的 team,感恩兄弟们无条件的信任🙇‍♂️ - 迅速 onboarding 整个 team,🙇‍♂️ 再次感恩兄弟们 - 封闭开发了一段时间(🤣年轻的时候,觉得是糟粕,真到时候,发现是人类工程效率奇迹。你无法想象随时可以拉着全部人在白板前吵架的架构迭代速度) - 虽然代码都是 vibe 的,但依旧逃不过 “代码质量正比于人类的注意力密度”。所以 agent 并不会替代所有程序员,只会让顶级的程序员生产力翻 20 倍,并淘汰其他程序员,且,集体主义 >>> 个人英雄主义。 - 一步一个坑的解决过程中遇到的问题。每一天都是最绝望的一天😭 - 开源后就病倒了,皮质醇分泌过度,影响免疫力 - 这一个月学的东西够我消化半年的 - 一周干了一整箱红牛,还得是生物燃料 - 🫥 也在 x 上消失了一个月 本来想写一些文章去总结过程中一些 insights 和 idea,但我本来就不擅长写长文,外加人脑自我保护让我迅速忘记了整个过程中的痛苦,并模糊了时间观念(冷知识,kimi-code 重构版开源其实才过了一周多,但在我的感性认知中,像是已经过了一个月) 等 kimi-code 陆续迭代到稳定,再去总结过程中的 lessons learned

中文
2
0
5
394
Hou Pong (Ken) Chan retweetet
Lu Wang
Lu Wang@LuWang__·
Our work introduces Countdown-Code, a clean testbed for studying reward hacking when true reward is costly to measure. The striking result is: just 1% contaminated SFT data can produce high reward-hacking rates after RL.
Muhammad Khalifa@MKhalifaaaa

📍New paper: Countdown-Code: a minimal testbed for studying reward hacking in RLVR. TL;DR: We propose a simple environment to study reward hacking and find that just ~1% cheating contamination in SFT data is enough to seed reward hacking that RL then amplifies to near 100%. And it generalizes to unseen domains. Reward hacking is when models maximize proxy rewards without actually solving the task. A common proxy is final-answer correctness, which we use as a stand-in for full reasoning correctness. If a model produces the right answer with wrong reasoning, it has hacked the reward. Another example: a coding agent rewriting test cases instead of writing correct code. The core problem? In complex environments, it's hard to even measure when hacking happens -- you need access to the true reward, which is often expensive or impossible to compute. The problem we try to solve? In complex environments, it's hard to even measure when this happens simply because we need access to the true reward. True task reward is often expensive or impossible to compute. We built Countdown-Code to fix this. It's a simple math game (combine numbers to hit a target) wrapped in a coding environment with two files: solution.py and test.py. The model can either solve the math correctly ✅or hack the test harness ❌. We can programmatically detect exactly which. To train our models to do the task, we followed the common SFT-then-RL pipeline. We distilled synthetic training data from o4-mini. It occasionally cheated when it couldn't solve a problem: ~1.2% of the filtered dataset had reward-hacking traces. Standard outcome-based filtering would keep these (they passed the tests!). That's the trap. After SFT on this data → RL training: • Models that were completely safe before SFT learned to exploit the proxy reward within ~100 RL steps • Some models hit 80-90% hacking rates • The hacking behavior was seeded by SFT, then amplified by RL Even more concerning: reward hacking learned on our simple Countdown task generalized to HumanEval -- a completely different coding benchmark the models never trained on. RL actively encouraged hacking to transfer to unseen environments, confirming our testbed captures real misalignment dynamics. RL doesn't just amplify good reasoning -- it amplifies bad behavior too, and pushes it to generalize. We also explore mitigation strategies including inoculation prompting -- see the paper for details. Environment + code are fully open source. We specifically built it to be lightweight and controllable, and integrated it with @PrimeIntellect's CLI so you can play with it directly. Paper: arxiv.org/abs/2603.07084 Code/env: github.com/zohaib-khan504… w/ @karela38925748 @omertafveez @haopeng_uiuc @LuWang__

English
0
1
10
1.1K
Chenyang Lyu
Chenyang Lyu@Chenyang_Lyu·
wow, just got an email from ACL saying my paper has been considered for an award (perhaps best paper?) by the best paper committee
English
9
2
97
12.8K
Hou Pong (Ken) Chan retweetet
Ailing Zeng
Ailing Zeng@AilingZeng81332·
1/ Over the past year, we kept coming back to one question: What would it mean to model performance itself, not just video? For interactive characters, realism isn’t just about how they look. It’s whether they can speak, listen, react, stay consistent over time, and feel present. A few thoughts from our work on Large Performance Models (LPM).
English
1
2
11
1K
Hou Pong (Ken) Chan retweetet
Yang Deng
Yang Deng@ydeng_dandy·
I have an opening for fully-funded 6-month visiting PhD student at SMU. Time: August 2026 - March 2027 Eligibility: Master/PhD students from universities in Europe, North/South America, South-East Asia. Topic: NLP/LLM Email me for more details if you are interested~
English
6
37
220
22.4K
Hou Pong (Ken) Chan retweetet
Yu Rong
Yu Rong@yurong2333·
We introduce Lingshu-Cell, a cellular world model from Alibaba DAMO Academy. Moving beyond static representations, it generatively models cellular states and perturbation responses—toward virtual cells. lnkd.in/g8fmvsmr 😆
Yu Rong tweet media
English
0
1
3
163
Hou Pong (Ken) Chan retweetet
Yuji Zhang
Yuji Zhang@Yuji_Zhang_NLP·
📢 The 4th KnowFM Workshop @ ACL 2026 is calling for submissions! 📷 Submission deadline: April 1, 2026 📷knowledgeable-lm.github.io 📷Submit: tinyurl.com/a4skucyz
Canyu Chen@CanyuChen3

📢 The 4th KnowFM Workshop @ ACL 2026 is calling for submissions! 📅 Submission deadline: April 1, 2026 🌐 knowledgeable-lm.github.io 👉Submit: tinyurl.com/a4skucyz 🤔Where does knowledge in foundation models come from? How much do they actually know? Is their knowledge reliable and up-to-date? Can we control what they remember or forget? 🌟As models are deployed in multimodal, agentic, and retrieval-augmented settings, understanding and managing the knowledge lifecycle becomes increasingly critical. Topics include: - Knowledge analysis, augmentation & editing - RAG systems & knowledge conflicts - Hallucination mitigation & faithfulness evaluation - Multimodal knowledge & cross-modal grounding - Knowledge-intensive agents & agentic RAG 🏆 We have Best Paper & Outstanding Paper Awards 🙌The Organizing Committee: @CanyuChen3 @Yuji_Zhang_NLP @ZoeyLi20 @wzenus @qineng_wang @SuJinyan6 @priyanka_karg @saraveramarjano @jpansw @ManlingLi_ Thanks for the advisors! @hengjinlp @mohitban47 @IAugenstein Prof. Jiawei Han

English
0
3
13
3K
Hou Pong (Ken) Chan
Hou Pong (Ken) Chan@kenchanhp·
Honored to receive the Outstanding Senior Area Chair Award at AACL 2025. Sincere thanks to the selection committee and our wonderful NLP community 🙏
Hou Pong (Ken) Chan tweet media
English
1
0
16
422
Hou Pong (Ken) Chan retweetet
Runzhe Zhan
Runzhe Zhan@rzzhan_ovo·
thrilled to have ExGRPO accepted to #ICLR2026! kudos to yafu for the 6-paper sweep! See you in Brazil, looking forward to discussing everything!
Yafu Li@yafuly

Excited to have 6 papers accepted to #ICLR2026, all around reasoning, RL, and multimodal understanding: 📌ExGRPO: Learning to Reason from Prior Successes 📌Diversity-Incentivized Exploration for Versatile Reasoning 📌Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models 📌Spotlight on Token Perception for Multimodal RL 📌Revisual-R1: Advancing Multimodal Reasoning from Optimized Cold Start to Staged RL 📌FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting 💻All works are open-sourced — welcome discussions, feedback, and collaborations! Huge thanks to all collaborators. Looking forward to great discussions at ICLR! @iclr_conf #iclr

English
0
2
24
2.2K