XintongNLPer

144 posts

XintongNLPer
@XintongNLPer

PhD Candidate at Universität Hamburg @unihh, Germany. I work on #NLProc.

Germany · Joined July 2017
858 Following · 208 Followers
XintongNLPer retweeted
Akshay 🚀 @akshay_pachaar
Top 50 LLM Interview Questions. A great resource to learn LLM basics:
[image attached]
9 replies · 290 retweets · 2K likes · 232K views
XintongNLPer retweeted
Longyue Wang @wangly0229
🌺GPT-4o’s image generation is stunning — but how well does it handle complex scenarios? 🤔 We introduce 🚀CIGEVAL🚀, a novel method to evaluate models' capabilities in Conditional Image Generation 🖼️➕🖼️🟰🖼️. Find out how top models perform when conditions get truly challenging! 🔥 #ImageGeneration #AutoEvaluation #Multimodal #GPT4O
[image attached]
2 replies · 21 retweets · 49 likes · 3.7K views
XintongNLPer retweeted
elvis @omarsar0
A Deep Dive into Reasoning LLMs
This is a really nice summary of the progress made in post-training and reasoning LLMs. Highly recommend this one!
[image attached]
17 replies · 560 retweets · 3K likes · 244.9K views
XintongNLPer retweeted
Bowen Jin @BowenJin13
🚀 Introducing 𝗦𝗲𝗮𝗿𝗰𝗵-𝗥𝟭 – the first 𝗿𝗲𝗽𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 𝗼𝗳 𝗗𝗲𝗲𝗽𝘀𝗲𝗲𝗸-𝗥𝟭 (𝘇𝗲𝗿𝗼) for training reasoning and search-augmented LLM agents with reinforcement learning!
This is a step towards training an 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲 𝗢𝗽𝗲𝗻𝗔𝗜 “𝗗𝗲𝗲𝗽 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵” via RL.
Our 𝟯𝗕 𝗯𝗮𝘀𝗲 𝗟𝗟𝗠𝘀—including not just 𝗤𝘄𝗲𝗻 𝟮.𝟱 but also 𝗟𝗹𝗮𝗺𝗮 𝟯.𝟮—learn to 𝗿𝗲𝗮𝘀𝗼𝗻 and 𝗰𝗮𝗹𝗹 𝘀𝗲𝗮𝗿𝗰𝗵 𝗲𝗻𝗴𝗶𝗻𝗲𝘀 all on their own!
Everything will be 𝗳𝘂𝗹𝗹𝘆 𝗼𝗽𝗲𝗻 𝘀𝗼𝘂𝗿𝗰𝗲. Stay tuned!
Code: github.com/PeterGriffinJi…
Experimental logs: wandb.ai/peterjin/Searc…
#R1 #deepresearch #deepseek
42 replies · 318 retweets · 2.5K likes · 364K views
XintongNLPer retweeted
Zhijiang Guo @ZhijiangG
🚀Exciting to see how recent advancements like OpenAI’s O1/O3 & DeepSeek’s R1 are pushing the boundaries! Check out our latest survey on Complex Reasoning with LLMs. Analyzed over 300 papers to explore the progress.
Paper: arxiv.org/pdf/2502.17419
Github: github.com/zzli2022/Aweso…
[image attached]
2 replies · 62 retweets · 158 likes · 12.4K views
XintongNLPer retweeted
Nathan Lambert @natolambert
#reinforce rlhfbook.com/c/11-policy-gr…
1 reply · 5 retweets · 40 likes · 4.8K views
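The linked rlhfbook chapter covers policy-gradient methods such as REINFORCE. As a quick refresher, here is an illustrative toy sketch (not code from the book): the score-function update for a softmax policy on a two-armed Gaussian bandit, where grad log pi(a) over the logits is one_hot(a) minus the action probabilities.

```python
import math
import random

def reinforce_bandit(true_means, lr=0.1, steps=2000, seed=0):
    """Minimal REINFORCE on a multi-armed Gaussian bandit with a
    softmax policy over per-arm logits."""
    rng = random.Random(seed)
    logits = [0.0] * len(true_means)
    probs = [1.0 / len(true_means)] * len(true_means)
    for _ in range(steps):
        # Softmax policy (max-shifted for numerical stability).
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        probs = [e / z for e in exps]
        # Sample an action from the current policy.
        r, acc, a = rng.random(), 0.0, len(probs) - 1
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                a = i
                break
        reward = rng.gauss(true_means[a], 0.1)
        # Score-function (REINFORCE) update: grad log pi(a) on the
        # logits is one_hot(a) - probs, scaled by the sampled reward.
        for i in range(len(logits)):
            indicator = 1.0 if i == a else 0.0
            logits[i] += lr * (indicator - probs[i]) * reward
    return probs

final = reinforce_bandit([0.0, 1.0])  # arm 1 pays more on average
```

With these settings the policy concentrates on the higher-reward arm; in practice a baseline is subtracted from the reward to reduce the variance of this estimator, which is one of the topics the chapter discusses.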
XintongNLPer retweeted
Yuhui Zhang @Zhang_Yu_hui
🔍 Vision language models are getting better - but how do we evaluate them reliably? Introducing AutoConverter: transforming open-ended VQA into challenging multiple-choice questions!
Key findings:
1️⃣ Current open-ended VQA eval methods are flawed: rule-based metrics correlate poorly with true performance (0.09 on VQAv2), while model-based eval has reproducibility issues (updates in GPT-4o versions constantly increase scores by 6% on MMVet).
2️⃣ To address this challenge, we propose AutoConverter, an agentic framework that automatically converts open-ended VQA to multiple-choice questions. It generates distractors matching/exceeding human difficulty, with only 3% of generated questions incorrect.
3️⃣ Using AutoConverter, we built VMCBench: 9,018 multiple-choice questions from 20 datasets testing 33 VLMs in a unified format!
🎯 Our goal: Make VLM evaluation more reliable, efficient & scalable
yuhui-zh15.github.io/AutoConverter-…
Joint work with a really fantastic team: @hhhhh2033528 (co-lead) @leoliuym @XiaohanWang96 @jmhb0 @elaine__sui @ChenyuW64562111 @AkliluJosiah2 @Ale9806_ @anjiangw advised by @lschmidt3 @yeung_levy!
[image attached]
3 replies · 67 retweets · 153 likes · 33.1K views
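The reformulation behind AutoConverter can be sketched in a few lines. This is a hypothetical illustration of why multiple-choice grading is reproducible (a deterministic string match, with no LLM judge), not the authors' implementation, which uses LLM agents to generate and refine hard distractors:

```python
import random

def to_multiple_choice(question, answer, distractors, seed=0):
    """Turn an open-ended QA pair into a multiple-choice question.
    Toy sketch: the real system generates distractors with LLM agents."""
    rng = random.Random(seed)
    options = distractors + [answer]
    rng.shuffle(options)  # fixed seed keeps the benchmark reproducible
    letters = "ABCDEFGH"[: len(options)]
    correct = letters[options.index(answer)]
    body = "\n".join(f"{letters[i]}. {opt}" for i, opt in enumerate(options))
    return f"{question}\n{body}", correct

def grade(predicted_letter, correct_letter):
    # Deterministic scoring: no fuzzy string matching, no model-based judge,
    # so scores cannot drift when a judge model is updated.
    return predicted_letter.strip().upper() == correct_letter

prompt, correct = to_multiple_choice(
    "What color is the bus?", "yellow", ["red", "blue", "green"])
```

The hard part the paper addresses is not this formatting step but generating distractors difficult enough that the multiple-choice score still discriminates between models.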
XintongNLPer retweeted
Yu Su @ysu_nlp
Sharing the slides of my talk at Princeton yesterday--"A holistic and critical look at language agents": ysu1989.github.io/resources/lang…
LLM-based language agents are exciting, but it's also undeniably a quite chaotic space: are agents the next big thing, or are they just thin wrappers around LLMs? I have been giving this talk 10+ times this year (at CMU/Stanford/Apple/Amazon/etc.), hoping to bring some scientific rigor to this emerging topic. I also learned and sharpened my thinking in this process. Finally, I feel comfortable sharing a close-to-final version with everyone. Comments are welcome!
In this 76-page deck, I talk about
1. the definition of language agents (and why that's the best name)
2. the evolution of AI agents
3. the power of language in agents, demonstrated through our latest work on memory (HippoRAG), world models and model-based planning (WebDreamer), grounding (UGround), and tool use (STE)
4. exciting future directions (planning, synthetic data, multimodal perception, continual learning, and safety)
🧵
[image attached]
16 replies · 118 retweets · 507 likes · 89.6K views
XintongNLPer retweeted
Zheng Zhao @zhengzhao97
[1/5] Super excited to share our paper "Layer by Layer: Uncovering Where Multi-Task Learning Happens in Instruction-Tuned Large Language Models" which has been accepted to EMNLP2024! #NLProc #EMNLP2024 📄arxiv.org/pdf/2410.20008
[image attached]
4 replies · 13 retweets · 60 likes · 7.2K views
XintongNLPer retweeted
Zhengzhong Tu @_vztu
🚨Know Where You’re Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
🚀𝐀𝐛𝐬: arxiv.org/abs/2411.01639
A new framework for handling uncertainty in multimodal foundation models, enhancing robot planning reliability! 🤖🚗
💡The Challenge: Current models struggle with unpredictable environments, as they can’t accurately separate perception and decision uncertainties. This limits their effectiveness in real-world robotics and autonomous driving.
🔍The Approach:
• Uncertainty Disentanglement: Isolates perception uncertainty (visual recognition) and decision uncertainty (planning reliability).
• Targeted Quantification: Uses conformal prediction for perception and Formal-Methods-Driven Prediction (FMDP) for decision-making.
• Active Sensing: Dynamically re-observes high-uncertainty scenes to improve visual input.
• Automated Refinement: Fine-tunes the model with high-certainty data, boosting consistency.
🔧Results: Reduces output variability by up to 40% and enhances task success rates by 5%, showcasing how uncertainty disentanglement can significantly improve model robustness.
#AI #Robotics #MachineLearning #Uncertainty #AutonomousSystems #UTAustin #TAMU #ML #AutonomousDriving
[image attached]
1 reply · 29 retweets · 115 likes · 7.6K views
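The "Targeted Quantification" step above mentions conformal prediction for perception uncertainty. Here is a minimal, generic split-conformal sketch (the standard textbook recipe, not the paper's implementation; all names and numbers are illustrative): calibrate a threshold on held-out nonconformity scores, then emit prediction sets that cover the true label with roughly the requested probability.

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal prediction: given nonconformity scores on a
    held-out calibration set, return the threshold qhat such that
    prediction sets built with it cover the truth w.p. ~(1 - alpha)."""
    n = len(cal_scores)
    # finite-sample-corrected quantile index, clamped to the sample size
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(class_probs, qhat):
    # score(c) = 1 - p(c); keep every class whose score clears calibration.
    # Larger sets signal higher perception uncertainty.
    return {c for c, p in class_probs.items() if 1 - p <= qhat}

# nonconformity scores (1 - prob assigned to the true class) on a
# calibration split
cal = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
qhat = conformal_threshold(cal, alpha=0.2)
confident = prediction_set({"car": 0.7, "bus": 0.2, "bike": 0.1}, qhat)
```

The size of the resulting set is a natural trigger for the "Active Sensing" step: a large set (high perception uncertainty) would prompt re-observation of the scene.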
XintongNLPer retweeted
Yftah Ziser @YftahZ
Excited to share our new EMNLP paper! 📄 We uncover that different LLM layers naturally play distinct roles in multitask learning pipelines. 🧠 Layers contribute as both shared and task-specific components—without explicit parameter assignment! - arxiv.org/pdf/2410.20008
[image attached]
1 reply · 5 retweets · 43 likes · 5.4K views
XintongNLPer retweeted
Wenhao Yu @wyu_nd
🥳 We open-sourced Leopard-Instruct, a dataset containing 𝟏𝐌 𝐡𝐢𝐠𝐡-𝐪𝐮𝐚𝐥𝐢𝐭𝐲, 𝐭𝐞𝐱𝐭-𝐫𝐢𝐜𝐡, 𝐦𝐮𝐥𝐭𝐢-𝐢𝐦𝐚𝐠𝐞 𝐢𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧-𝐭𝐮𝐧𝐢𝐧𝐠 examples. It significantly improves performance on multi-image understanding!
Github: github.com/tencent-ailab/…
Paper: arxiv.org/abs/2410.01744
Huggingface dataset: huggingface.co/datasets/wyu1/…
[image attached]
1 reply · 47 retweets · 191 likes · 15.6K views