XintongNLPer

144 posts

XintongNLPer
@XintongNLPer

PhD Candidate at Universität Hamburg @unihh, Germany. I work on #NLProc.

Germany · Joined July 2017
858 Following · 208 Followers
XintongNLPer retweeted
Akshay 🚀 @akshay_pachaar
Top 50 LLM Interview Questions. A great resource to learn LLM basics:
[image attached]
9 replies · 290 retweets · 2K likes · 232K views
XintongNLPer retweeted
Longyue Wang @wangly0229
🌺GPT-4o’s image generation is stunning — but how well does it handle complex scenarios? 🤔 We introduce 🚀CIGEVAL🚀, a novel method to evaluate models' capabilities in Conditional Image Generation 🖼️➕🖼️🟰🖼️. Find out how top models perform when conditions get truly challenging! 🔥 #ImageGeneration #AutoEvaluation #Multimodal #GPT4O
[image attached]
2 replies · 21 retweets · 49 likes · 3.7K views
XintongNLPer retweeted
elvis @omarsar0
A Deep Dive into Reasoning LLMs
This is a really nice summary of the progress made in post-training and reasoning LLMs. Highly recommend this one!
[image attached]
17 replies · 560 retweets · 3K likes · 244.9K views
XintongNLPer retweeted
Bowen Jin @BowenJin13
🚀 Introducing 𝗦𝗲𝗮𝗿𝗰𝗵-𝗥𝟭 – the first 𝗿𝗲𝗽𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 𝗼𝗳 𝗗𝗲𝗲𝗽𝘀𝗲𝗲𝗸-𝗥𝟭 (𝘇𝗲𝗿𝗼) for training reasoning and search-augmented LLM agents with reinforcement learning!
This is a step towards training an 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲 𝗢𝗽𝗲𝗻𝗔𝗜 “𝗗𝗲𝗲𝗽 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵” via RL.
Our 𝟯𝗕 𝗯𝗮𝘀𝗲 𝗟𝗟𝗠𝘀—including not just 𝗤𝘄𝗲𝗻 𝟮.𝟱 but also 𝗟𝗹𝗮𝗺𝗮 𝟯.𝟮—learn to 𝗿𝗲𝗮𝘀𝗼𝗻 and 𝗰𝗮𝗹𝗹 𝘀𝗲𝗮𝗿𝗰𝗵 𝗲𝗻𝗴𝗶𝗻𝗲𝘀 all on their own!
Everything will be 𝗳𝘂𝗹𝗹𝘆 𝗼𝗽𝗲𝗻 𝘀𝗼𝘂𝗿𝗰𝗲. Stay tuned!
Code: github.com/PeterGriffinJi…
Experimental logs: wandb.ai/peterjin/Searc…
#R1 #deepresearch #deepseek
42 replies · 318 retweets · 2.5K likes · 364K views
XintongNLPer retweeted
Zhijiang Guo @ZhijiangG
🚀Exciting to see how recent advancements like OpenAI’s O1/O3 & DeepSeek’s R1 are pushing the boundaries! Check out our latest survey on Complex Reasoning with LLMs. Analyzed over 300 papers to explore the progress.
Paper: arxiv.org/pdf/2502.17419
Github: github.com/zzli2022/Aweso…
[image attached]
2 replies · 62 retweets · 158 likes · 12.4K views
XintongNLPer retweeted
Nathan Lambert @natolambert
#reinforce rlhfbook.com/c/11-policy-gr…
1 reply · 5 retweets · 40 likes · 4.8K views
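The linked rlhfbook chapter covers policy-gradient methods such as REINFORCE. As a quick refresher, here is an illustrative toy sketch (not code from the book): the score-function update for a softmax policy on a two-armed Gaussian bandit, where grad log pi(a) over the logits is one_hot(a) minus the action probabilities.

```python
import math
import random

def reinforce_bandit(true_means, lr=0.1, steps=2000, seed=0):
    """Minimal REINFORCE on a multi-armed Gaussian bandit with a
    softmax policy over per-arm logits."""
    rng = random.Random(seed)
    logits = [0.0] * len(true_means)
    probs = [1.0 / len(true_means)] * len(true_means)
    for _ in range(steps):
        # Softmax policy (max-shifted for numerical stability).
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        probs = [e / z for e in exps]
        # Sample an action from the current policy.
        r, acc, a = rng.random(), 0.0, len(probs) - 1
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                a = i
                break
        reward = rng.gauss(true_means[a], 0.1)
        # Score-function (REINFORCE) update: grad log pi(a) on the
        # logits is one_hot(a) - probs, scaled by the sampled reward.
        for i in range(len(logits)):
            indicator = 1.0 if i == a else 0.0
            logits[i] += lr * (indicator - probs[i]) * reward
    return probs

final = reinforce_bandit([0.0, 1.0])  # arm 1 pays more on average
```

With these settings the policy concentrates on the higher-reward arm; in practice a baseline is subtracted from the reward to reduce the variance of this estimator, which is one of the topics the chapter discusses.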
XintongNLPer retweeted
Yuhui Zhang @Zhang_Yu_hui
🔍 Vision language models are getting better - but how do we evaluate them reliably? Introducing AutoConverter: transforming open-ended VQA into challenging multiple-choice questions!
Key findings:
1️⃣ Current open-ended VQA eval methods are flawed: rule-based metrics correlate poorly with true performance (0.09 on VQAv2), while model-based eval has reproducibility issues (updates in GPT-4o versions constantly increase scores by 6% on MMVet).
2️⃣ To address this challenge, we propose AutoConverter, an agentic framework that automatically converts open-ended VQA to multiple-choice questions. It generates distractors matching/exceeding human difficulty, with only 3% of generated questions incorrect.
3️⃣ Using AutoConverter, we built VMCBench: 9,018 multiple-choice questions from 20 datasets testing 33 VLMs in a unified format!
🎯 Our goal: Make VLM evaluation more reliable, efficient & scalable
yuhui-zh15.github.io/AutoConverter-…
Joint work with a really fantastic team: @hhhhh2033528 (co-lead) @leoliuym @XiaohanWang96 @jmhb0 @elaine__sui @ChenyuW64562111 @AkliluJosiah2 @Ale9806_ @anjiangw advised by @lschmidt3 @yeung_levy!
[image attached]
3 replies · 67 retweets · 153 likes · 33.1K views
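The reformulation behind AutoConverter can be sketched in a few lines. This is a hypothetical illustration of why multiple-choice grading is reproducible (a deterministic string match, with no LLM judge), not the authors' implementation, which uses LLM agents to generate and refine hard distractors:

```python
import random

def to_multiple_choice(question, answer, distractors, seed=0):
    """Turn an open-ended QA pair into a multiple-choice question.
    Toy sketch: the real system generates distractors with LLM agents."""
    rng = random.Random(seed)
    options = distractors + [answer]
    rng.shuffle(options)  # fixed seed keeps the benchmark reproducible
    letters = "ABCDEFGH"[: len(options)]
    correct = letters[options.index(answer)]
    body = "\n".join(f"{letters[i]}. {opt}" for i, opt in enumerate(options))
    return f"{question}\n{body}", correct

def grade(predicted_letter, correct_letter):
    # Deterministic scoring: no fuzzy string matching, no model-based judge,
    # so scores cannot drift when a judge model is updated.
    return predicted_letter.strip().upper() == correct_letter

prompt, correct = to_multiple_choice(
    "What color is the bus?", "yellow", ["red", "blue", "green"])
```

The hard part the paper addresses is not this formatting step but generating distractors difficult enough that the multiple-choice score still discriminates between models.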
XintongNLPer retweeted
Yu Su @ysu_nlp
Sharing the slides of my talk at Princeton yesterday--"A holistic and critical look at language agents": ysu1989.github.io/resources/lang…
LLM-based language agents are exciting, but it's also undeniably a quite chaotic space: are agents the next big thing, or are they just thin wrappers around LLMs? I have been giving this talk 10+ times this year (at CMU/Stanford/Apple/Amazon/etc.), hoping to bring some scientific rigor to this emerging topic. I also learned and sharpened my thinking in this process. Finally, I feel comfortable sharing a close-to-final version with everyone. Comments are welcome!
In this 76-page deck, I talk about
1. the definition of language agents (and why that's the best name)
2. the evolution of AI agents
3. the power of language in agents, demonstrated through our latest work on memory (HippoRAG), world models and model-based planning (WebDreamer), grounding (UGround), and tool use (STE)
4. exciting future directions (planning, synthetic data, multimodal perception, continual learning, and safety)
🧵
[image attached]
16 replies · 118 retweets · 507 likes · 89.6K views
XintongNLPer retweeted
Zheng Zhao @zhengzhao97
[1/5] Super excited to share our paper "Layer by Layer: Uncovering Where Multi-Task Learning Happens in Instruction-Tuned Large Language Models" which has been accepted to EMNLP2024! #NLProc #EMNLP2024 📄arxiv.org/pdf/2410.20008
[image attached]
4 replies · 13 retweets · 60 likes · 7.2K views
XintongNLPer retweeted
Zhengzhong Tu @_vztu
🚨Know Where You’re Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
🚀𝐀𝐛𝐬: arxiv.org/abs/2411.01639
A new framework for handling uncertainty in multimodal foundation models, enhancing robot planning reliability! 🤖🚗
💡The Challenge: Current models struggle with unpredictable environments, as they can’t accurately separate perception and decision uncertainties. This limits their effectiveness in real-world robotics and autonomous driving.
🔍The Approach:
• Uncertainty Disentanglement: Isolates perception uncertainty (visual recognition) and decision uncertainty (planning reliability).
• Targeted Quantification: Uses conformal prediction for perception and Formal-Methods-Driven Prediction (FMDP) for decision-making.
• Active Sensing: Dynamically re-observes high-uncertainty scenes to improve visual input.
• Automated Refinement: Fine-tunes the model with high-certainty data, boosting consistency.
🔧Results: Reduces output variability by up to 40% and enhances task success rates by 5%, showcasing how uncertainty disentanglement can significantly improve model robustness.
#AI #Robotics #MachineLearning #Uncertainty #AutonomousSystems #UTAustin #TAMU #ML #AutonomousDriving
[image attached]
1 reply · 29 retweets · 115 likes · 7.6K views
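The "Targeted Quantification" step above mentions conformal prediction for perception uncertainty. Here is a minimal, generic split-conformal sketch (the standard textbook recipe, not the paper's implementation; all names and numbers are illustrative): calibrate a threshold on held-out nonconformity scores, then emit prediction sets that cover the true label with roughly the requested probability.

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal prediction: given nonconformity scores on a
    held-out calibration set, return the threshold qhat such that
    prediction sets built with it cover the truth w.p. ~(1 - alpha)."""
    n = len(cal_scores)
    # finite-sample-corrected quantile index, clamped to the sample size
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(class_probs, qhat):
    # score(c) = 1 - p(c); keep every class whose score clears calibration.
    # Larger sets signal higher perception uncertainty.
    return {c for c, p in class_probs.items() if 1 - p <= qhat}

# nonconformity scores (1 - prob assigned to the true class) on a
# calibration split
cal = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
qhat = conformal_threshold(cal, alpha=0.2)
confident = prediction_set({"car": 0.7, "bus": 0.2, "bike": 0.1}, qhat)
```

The size of the resulting set is a natural trigger for the "Active Sensing" step: a large set (high perception uncertainty) would prompt re-observation of the scene.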
XintongNLPer retweeted
Yftah Ziser @YftahZ
Excited to share our new EMNLP paper! 📄 We uncover that different LLM layers naturally play distinct roles in multitask learning pipelines. 🧠 Layers contribute as both shared and task-specific components—without explicit parameter assignment! - arxiv.org/pdf/2410.20008
[image attached]
1 reply · 5 retweets · 43 likes · 5.4K views
XintongNLPer retweeted
Wenhao Yu @wyu_nd
🥳 We open-sourced Leopard-Instruct, a dataset containing 𝟏𝐌 𝐡𝐢𝐠𝐡-𝐪𝐮𝐚𝐥𝐢𝐭𝐲, 𝐭𝐞𝐱𝐭-𝐫𝐢𝐜𝐡, 𝐦𝐮𝐥𝐭𝐢-𝐢𝐦𝐚𝐠𝐞 𝐢𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧-𝐭𝐮𝐧𝐢𝐧𝐠 examples. It significantly improves performance on multi-image understanding!
Github: github.com/tencent-ailab/…
Paper: arxiv.org/abs/2410.01744
Huggingface dataset: huggingface.co/datasets/wyu1/…
[image attached]
1 reply · 47 retweets · 191 likes · 15.6K views