
HKUST NLP
@hkustNLP
HKUST Natural Language Processing Research #NLProc

🚀 We release MMLongBench, a benchmark for evaluating long-context vision-language models (VLMs). 📊 13,331 examples across 5 tasks:
– Visual RAG
– Many-shot ICL
– Needle-in-a-haystack
– VL summarization
– Long-document VQA
📏 Context lengths: 8K / 16K / 32K / 64K / 128K tokens
🔍 Benchmarking both thoroughly & effectively!
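A minimal sketch of how scores on a benchmark like this can be aggregated per task and per context-length bucket; the dataset fields and the model interface below are illustrative assumptions, not MMLongBench's actual schema:

```python
# Sketch of bucketed evaluation across context lengths.
# Field names ("task", "context", "num_tokens", ...) and the model
# callable are illustrative assumptions, not MMLongBench's schema.
from collections import defaultdict

LENGTH_BUCKETS = [8_000, 16_000, 32_000, 64_000, 128_000]

def bucket_for(num_tokens: int) -> int:
    """Assign an example to the smallest length bucket that fits it."""
    for limit in LENGTH_BUCKETS:
        if num_tokens <= limit:
            return limit
    return LENGTH_BUCKETS[-1]

def evaluate(model, examples):
    """Return mean accuracy per (task, length-bucket) cell."""
    scores = defaultdict(list)
    for ex in examples:
        pred = model(ex["context"], ex["question"])  # hypothetical interface
        correct = float(pred.strip() == ex["answer"].strip())
        scores[(ex["task"], bucket_for(ex["num_tokens"]))].append(correct)
    return {cell: sum(v) / len(v) for cell, v in scores.items()}
```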





Updates to ChatGPT:

You can now choose between “Auto”, “Fast”, and “Thinking” for GPT-5. Most users will want Auto, but the additional control will be useful for some people.

Rate limits are now 3,000 messages/week with GPT-5 Thinking, and then extra capacity on GPT-5 Thinking mini after that limit. Context limit for GPT-5 Thinking is 196k tokens. We may have to update rate limits over time depending on usage.

4o is back in the model picker for all paid users by default. If we ever do deprecate it, we will give plenty of notice.

Paid users also now have a “Show additional models” toggle in ChatGPT web settings which will add models like o3, 4.1, and GPT-5 Thinking mini. 4.5 is only available to Pro users; it costs a lot of GPUs.

We are working on an update to GPT-5’s personality which should feel warmer than the current personality but not as annoying (to most users) as GPT-4o. However, one learning for us from the past few days is we really just need to get to a world with more per-user customization of model personality.
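A small sketch of budgeting a prompt against the 196k-token context limit mentioned above; the o200k_base encoding is used as a stand-in assumption for GPT-5's tokenizer, which may differ:

```python
# Rough sketch: check a prompt against the 196k-token context limit
# from the post. o200k_base is a stand-in assumption for GPT-5's
# actual tokenizer.
import tiktoken

CONTEXT_LIMIT = 196_000  # GPT-5 Thinking context limit from the post

def fits_in_context(prompt: str, reply_budget: int = 4_000) -> bool:
    """True if the prompt leaves reply_budget tokens of headroom."""
    enc = tiktoken.get_encoding("o200k_base")
    return len(enc.encode(prompt)) + reply_budget <= CONTEXT_LIMIT

print(fits_in_context("Summarize this document: ..."))  # True
```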




[1/n] "𝘔𝘢𝘵𝘤𝘩𝘪𝘯𝘨 𝘤𝘶𝘦𝘴 𝘧𝘰𝘳 𝘪𝘥𝘦𝘯𝘵𝘪𝘤𝘢𝘭 𝘰𝘣𝘫𝘦𝘤𝘵𝘴, 𝘥𝘪𝘴𝘵𝘪𝘯𝘤𝘵 𝘢𝘵𝘵𝘳𝘪𝘣𝘶𝘵𝘦𝘴 𝘧𝘰𝘳 𝘶𝘯𝘪𝘲𝘶𝘦 𝘰𝘯𝘦𝘴."

Such 𝙘𝙧𝙤𝙨𝙨-𝙘𝙤𝙣𝙩𝙚𝙭𝙩 𝙫𝙞𝙨𝙪𝙖𝙡 𝙧𝙚𝙖𝙨𝙤𝙣𝙞𝙣𝙜 is simple and intuitive for human cognition, but 𝗾𝘂𝗶𝘁𝗲 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗶𝗻𝗴 𝗳𝗼𝗿 𝗰𝘂𝗿𝗿𝗲𝗻𝘁 𝗹𝗮𝗿𝗴𝗲 𝘃𝗶𝘀𝗶𝗼𝗻 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗺𝗼𝗱𝗲𝗹𝘀 (𝗟𝗩𝗟𝗠𝘀), especially across multiple images and videos ‼️ 𝙒𝙝𝙮, and 𝙝𝙤𝙬 𝙘𝙖𝙣 𝙬𝙚 𝙞𝙢𝙥𝙧𝙤𝙫𝙚 upon this major shortcoming in future research? 🔍

🔥🚀 Check out our latest release, “𝗩𝗟𝗠𝟮-𝗕𝗲𝗻𝗰𝗵: 𝗔 𝗖𝗹𝗼𝘀𝗲𝗿 𝗟𝗼𝗼𝗸 𝗮𝘁 𝗛𝗼𝘄 𝗪𝗲𝗹𝗹 𝗩𝗟𝗠𝘀 𝗜𝗺𝗽𝗹𝗶𝗰𝗶𝘁𝗹𝘆 𝗟𝗶𝗻𝗸 𝗘𝘅𝗽𝗹𝗶𝗰𝗶𝘁 𝗠𝗮𝘁𝗰𝗵𝗶𝗻𝗴 𝗩𝗶𝘀𝘂𝗮𝗹 𝗖𝘂𝗲𝘀”! 🚀🔥

🔗 Project page: vlm2-bench.github.io
📑 Paper: arxiv.org/abs/2502.12084
💻 GitHub: github.com/vlm2-bench/VLM…
🤗 Hugging Face: huggingface.co/datasets/Sterz…

---

Work led by our amazing team of students Jianshu, Dongyu, and Renjie at The Hong Kong University of Science and Technology. Find us at the poster booth ✨ Wed 7/30, 11am, Hall 4/5 ✨





🚀 Data to pre-train LLMs on are reaching a critical bottleneck. 𝘿𝙤𝙚𝙨 𝙢𝙤𝙙𝙚𝙡-𝙜𝙚𝙣𝙚𝙧𝙖𝙩𝙚𝙙 𝙨𝙮𝙣𝙩𝙝𝙚𝙩𝙞𝙘 𝙙𝙖𝙩𝙖 𝙬𝙤𝙧𝙠 𝙨𝙞𝙢𝙞𝙡𝙖𝙧𝙡𝙮 𝙬𝙚𝙡𝙡 𝙛𝙤𝙧 𝙨𝙘𝙖𝙡𝙞𝙣𝙜 𝙥𝙧𝙚-𝙩𝙧𝙖𝙞𝙣𝙞𝙣𝙜 𝙛𝙪𝙧𝙩𝙝𝙚𝙧? Let's dive into "𝗦𝗰𝗮𝗹𝗶𝗻𝗴 𝗟𝗮𝘄𝘀 𝗼𝗳 𝗦𝘆𝗻𝘁𝗵𝗲𝘁𝗶𝗰 𝗗𝗮𝘁𝗮 𝗳𝗼𝗿 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗠𝗼𝗱𝗲𝗹𝘀" together!

Several key findings:
⭐️ 𝗚𝗿𝗮𝗽𝗵-𝗯𝗮𝘀𝗲𝗱 𝗲𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻 & 𝗿𝗲𝗰𝗼𝗺𝗯𝗶𝗻𝗮𝘁𝗶𝗼𝗻 of concepts across docs transforms pre-training corpora into 𝘥𝘪𝘷𝘦𝘳𝘴𝘦, 𝘩𝘪𝘨𝘩-𝘲𝘶𝘢𝘭𝘪𝘵𝘺 𝘴𝘺𝘯𝘵𝘩𝘦𝘵𝘪𝘤 𝘥𝘢𝘵𝘢𝘴𝘦𝘵𝘴;
⭐️ 𝗣𝗿𝗲𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝘀𝘆𝗻𝘁𝗵𝗲𝘀𝗶𝘇𝗲𝗱 𝗱𝗮𝘁𝗮 reliably follows a 𝘳𝘦𝘤𝘵𝘪𝘧𝘪𝘦𝘥 𝘴𝘤𝘢𝘭𝘪𝘯𝘨 𝘭𝘢𝘸 across various model sizes (see the sketch after this list);
⭐️ 𝗟𝗮𝗿𝗴𝗲𝗿 𝗺𝗼𝗱𝗲𝗹𝘀 approach optimal performance with 𝙛𝙚𝙬𝙚𝙧 𝙩𝙧𝙖𝙞𝙣𝙞𝙣𝙜 𝙩𝙤𝙠𝙚𝙣𝙨;
⭐️ 𝘿𝙞𝙢𝙞𝙣𝙞𝙨𝙝𝙞𝙣𝙜 𝙧𝙚𝙩𝙪𝙧𝙣𝙨 start to kick in near 300B tokens.

👇 Check the preprint for further details~
🔗 arxiv.org/html/2503.1955…
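As a companion to the rectified-scaling-law finding, a minimal sketch of fitting such a law to (training tokens, loss) measurements. The form L(D) = B/(D_l + D)^β + E is one published parameterization of a rectified scaling law; whether it matches this paper's exact form is an assumption, and the data points are invented for illustration:

```python
# Sketch: fit a rectified scaling law to (tokens, loss) measurements.
# L(D) = B / (D_l + D)**beta + E is one published parameterization;
# matching this paper's exact form is an assumption, and the data
# below is made up, NOT taken from the paper.
import numpy as np
from scipy.optimize import curve_fit

def rectified_law(D, B, D_l, beta, E):
    """Loss vs. synthetic-token count D, with irreducible loss E."""
    return B / (D_l + D) ** beta + E

tokens = np.array([10.0, 30.0, 60.0, 100.0, 200.0, 300.0])  # billions
loss = np.array([2.95, 2.70, 2.55, 2.46, 2.38, 2.35])       # hypothetical

params, _ = curve_fit(
    rectified_law, tokens, loss,
    p0=[5.0, 10.0, 0.5, 2.0],          # rough initial guesses
    bounds=([0.0] * 4, [np.inf] * 4),  # keep parameters positive
)
B, D_l, beta, E = params
print(f"B={B:.2f}, D_l={D_l:.1f}B, beta={beta:.2f}, E={E:.2f}")
# Diminishing returns appear as B/(D_l + D)**beta shrinks toward the
# irreducible loss E once D grows past a few hundred billion tokens.
```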

🔍 Are Verifiers Trustworthy in RLVR?

Our paper, "Pitfalls of Rule- and Model-based Verifiers", exposes critical flaws in the verifiers used for reinforcement learning with verifiable rewards (RLVR) on mathematical reasoning.

🔑 Key findings:
1️⃣ Rule-based verifiers miss correct answers, especially when they are presented in different formats (see the toy sketch after this list).
2️⃣ Model-based verifiers are susceptible to reward hacking, undermining RL outcomes.
3️⃣ A probe study reveals that most model-based verifiers, particularly generative ones (like those using chain-of-thought reasoning), are highly vulnerable to adversarial attacks.

📄 Paper: arxiv.org/abs/2505.22203
💻 Code & Model & Dataset: github.com/hkust-nlp/RL-V…

Special thanks to my amazing collaborators @AndrewZeng17, Xingshan Zeng, and Qi Zhu, and to our advisor @junxian_he! 🥰
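To make finding 1️⃣ concrete, a toy sketch (not the paper's actual verifier code) of how an exact-string rule rejects correct answers in unfamiliar formats while a value-normalizing check accepts them:

```python
# Toy illustration of a rule-based verifier's format brittleness
# (not the paper's actual verifier). Exact string matching rejects
# "0.5" against gold "1/2" even though the values agree; comparing
# normalized rational values accepts it.
from fractions import Fraction

def exact_match_verifier(pred: str, gold: str) -> bool:
    """Naive rule: reward only byte-identical answers."""
    return pred.strip() == gold.strip()

def value_match_verifier(pred: str, gold: str) -> bool:
    """Compare answers as exact rational values when both parse."""
    try:
        return Fraction(pred.strip()) == Fraction(gold.strip())
    except ValueError:
        return False

gold = "1/2"
for pred in ["1/2", "0.5", " 2/4 ", "one half"]:
    print(f"{pred!r}: exact={exact_match_verifier(pred, gold)}, "
          f"value={value_match_verifier(pred, gold)}")
# Exact match gives false negatives on "0.5" and " 2/4 ";
# neither rule handles the verbal form "one half".
```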


Time flies. My visit at UIUC is coming to an end. I'm sincerely grateful to Prof. Heng Ji @hengjinlp for the invaluable opportunity to join such a warm, collaborative, and world-leading research group. I've learned so much from Prof. Heng and the many amazing students here: @qiancheng1231 @xiusi_chen @emrecanacikgoz @BowenJin13 @zhenhailongW @Yuji_Zhang_NLP @Glaciohound. This experience will always be a defining chapter of my Ph.D. life, filled with friendships, collaborations, and unforgettable moments. Looking back, we truly built something for the world together, from SMART to ToolRL, OTC, RM-R1, ModelingAgent, and DecisionFlow, with even more impactful work on the way. I'll always be cheering for BlenderLab and looking forward to what's next!