Jielin Qiu

32 posts

@_Jason_Q

Research Scientist @Salesforce AI Research, Ph.D. from @SCSatCMU

Carnegie Mellon University · Joined January 2021

177 Following · 72 Followers

Jielin Qiu retweeted

Weiran Yao @iscreamnearby · Nov 26

Today I finally get to share something our team has been quietly grinding on for months – we've created an 𝗼𝗽𝗲𝗻 𝘀𝗼𝘂𝗿𝗰𝗲𝗱 𝘃𝗲𝗿𝘀𝗶𝗼𝗻 𝗼𝗳 Cursor 𝗕𝗲𝗻𝗰𝗵 @cursor_ai. If you’ve been following Cursor’s Composer launch and their internal "Cursor Bench" for testing vibe coding models, you can think of our 𝗟𝗖𝗕𝗔 𝗯𝗲𝗻𝗰𝗵 as the open-source, model-agnostic counterpart. Here is what we provide at @SFResearch.

With 𝗟𝗖𝗕𝗔 𝗯𝗲𝗻𝗰𝗵 we:
• Ship a 𝗖𝘂𝗿𝘀𝗼𝗿-𝘀𝘁𝘆𝗹𝗲 𝗮𝗴𝗲𝗻𝘁 𝘀𝘁𝗮𝗰𝗸: ReAct loop, semantic @ codebase search, grep, file read/write, refactor tools, and a three-tier memory system inspired by production coding assistants like Cursor.
• 𝗧𝗮𝗸𝗲 𝟴,𝟬𝟬𝟬 𝗿𝗲𝗮𝗹-𝘄𝗼𝗿𝗹𝗱 𝘃𝗶𝗯𝗲 𝗰𝗼𝗱𝗶𝗻𝗴 𝘀𝗰𝗲𝗻𝗮𝗿𝗶𝗼𝘀 and turn them into interactive agent gyms across 10 languages and 10K–1M token codebases.
• Let you plug in any model (GPT-5, Claude Sonnet 4.5, Gemini 2.5 Pro, etc.) and see how it actually behaves on long, messy, multi-turn coding tasks.

A few fun findings: Cursor-style agents with context management are surprisingly robust at 1M-token contexts, but there’s a hard trade-off between deep exploration and efficiency: no single frontier model sits in the “perfect” top-right corner yet. Anthropic Claude 4.5 and Google Gemini 2.5 Pro are on the Pareto frontier.

Everything is open source (agent, code, scenarios, traces, metrics) on @huggingface:
📄 Tech Report: arxiv.org/pdf/2509.09614
🤖 GitHub: github.com/SalesforceAIRe…
🤗 Dataset: huggingface.co/datasets/jason…

If you’re building coding agents, benchmarking your model against GPT/Claude/Gemini, or want to train your coding agents with RL in real coding environments, we’d love for you to try LCBA bench and tell us your findings!

509

Jielin Qiu retweeted

Salesforce AI Research @SFResearch · Nov 19

🚨 Introducing LoCoBench-Agent: a comprehensive benchmark for evaluating LLM agents in long-context software engineering

📄 Paper: bit.ly/49mPrBv
🔗 GitHub: bit.ly/3KbpkTN

✨ Key Features:
🤖 8,000 interactive agent scenarios with multi-turn conversations (up to 50 turns)
🔍 Context lengths: 10K-1M tokens across 10 programming languages
⚡ 9 bias-free evaluation metrics (5 comprehension + 4 efficiency)
🛠️ 8 specialized development tools: file operations, semantic search, grep, code analysis
🎯 8 task categories: architectural understanding, cross-file refactoring, multi-session development, bug investigation, feature implementation, code comprehension, integration testing, and security analysis

🔬 Key Findings:
- Fundamental comprehension-efficiency trade-off
- Tool usage patterns matter more than raw capabilities
- Strategic exploration > exhaustive exploration

LoCoBench-Agent assesses agent behavior across extended development sessions, measuring context retention, adaptive strategy refinement, and tool usage efficiency.

Authors: Jielin Qiu @Jason_Q, Zuxin Liu @LiuZuxin, Zhiwei Liu @JYJimLiu, Rithesh Murthy @rithesh__rn, Jianguo Zhang @JianguoZhang3, Haolin Chen @HaolinChen11, Shiyu Wang @shiyu04490786, Ming Zhu @ming_zhu0527, Liangwei Yang @Liangwei_Yang, Juntao Tan @chrisjtan, Roshan Ram @shoonyaka1, Akshara Prabhakar @aksh_555, Tulika Awalgaonkar @tulika614, Zixiang Chen @_zxchen_, Zhepeng Cen @ZhepengCen, Cheng Qian @qiancheng1231, Shelby Heinecke @shelbyh_ai, Weiran Yao @iscreamnearby, Silvio Savarese @silviocinguetta, Caiming Xiong @CaimingXiong, Huan Wang @huan__wang

#LLM #AIAgents #SoftwareEngineering #MachineLearning #Benchmark #FutureOfAI #EnterpriseAI

2.4K

Jielin Qiu retweeted

Salesforce AI Research @SFResearch · Sep 15

🚨 Introducing LoCoBench: a comprehensive benchmark for evaluating long-context LLMs in complex software development

📄 Paper: bit.ly/4ponX3P
🔗 GitHub: bit.ly/4pvIfbZ

✨ Key Features:
📊 8,000 evaluation scenarios across 10 programming languages
🔍 Context lengths: 10K-1M tokens (100× variation!)
⚡ 17 evaluation metrics across 4 dimensions (6 newly proposed)
🎯 8 essential task categories: architectural understanding, cross-file refactoring, multi-session development, bug investigation, feature implementation, code comprehension, integration testing, and security analysis

Current SOTA models show dramatic performance drops as context increases - highlighting critical gaps in long-context understanding for real-world software engineering.

Authors: Jielin Qiu @_Jason_Q, Zuxin Liu @LiuZuxin, Zhiwei Liu @JYJimLiu, Rithesh Murthy @rithesh__rn, Jianguo Zhang @JianguoZhang3, Haolin Chen @HaolinChen11, Shiyu Wang @shiyu04490786, Ming Zhu @ming_zhu0527, Liangwei Yang @Liangwei_Yang, Juntao Tan @chrisjtan, Zhepeng Cen @ZhepengCen, Cheng Qian @qiancheng1231, Shelby Heinecke @shelbyh_ai, Weiran Yao @iscreamnearby, Silvio Savarese @silviocinguetta, Caiming Xiong @CaimingXiong, Huan Wang @huan__wang

#LLM #SoftwareEngineering #MachineLearning #Benchmark #FutureOfAI #EnterpriseAI

2.3K

Jielin Qiu retweeted

Ce Zhang @ce_zhang · Jan 19

Excited to see the first paper accepted at @DMLRJournal. Over the last few months, we have been fascinated by the quality of reviews and the engaging interactions between authors and reviewers! Thanks everyone! Please continue to send your best work about Data x ML😀

Journal of Data-centric Machine Learning Research @DMLRJournal

'Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift' by Jielin Qiu, Yi Zhu, Xingjian Shi, Florian Wenzel, Zhiqiang Tang, Ding Zhao, Bo Li, Mu Li Action Editor: Hongyang Zhang openreview.net/forum?id=Vc1fX… #Multimodal #Robustness #DistributionShift

2.3K

Jielin Qiu @_Jason_Q · Jan 21

@JiachengZhu_ML @DMLRJournal @yizhu59 @sxjscience @flwenz @mli65 Thanks, Jiacheng!

Jiacheng Zhu @JiachengZhu_ML · Jan 20

@_Jason_Q @DMLRJournal @yizhu59 @sxjscience @flwenz @mli65 Congrats @_Jason_Q ! DMLR is fortunate to host your work!

Jielin Qiu @_Jason_Q · Jan 19

🎊Extremely honored to share that our paper on multimodal model robustness has been accepted as the 1st paper for the Journal of Data-centric Machine Learning Research @DMLRJournal With @yizhu59 @sxjscience @flwenz @mli65 #Multimodal #Robustness #DistributionShift

Journal of Data-centric Machine Learning Research @DMLRJournal

2.2K

Jielin Qiu @_Jason_Q · Jan 20

@nmervegurel @DMLRJournal @yizhu59 @sxjscience @flwenz @mli65 Thank you very much, Merve!!

Nezihe Merve Gürel (nmervegurel.bsky.social) @nmervegurel · Jan 20

@_Jason_Q @DMLRJournal @yizhu59 @sxjscience @flwenz @mli65 Many congratulations!! 🙂

Jielin Qiu retweeted

Danqing Wang @dqwang122 · Oct 15

📚🌟 Evaluate any story to your heart's content with our new personalized story evaluation model, PerSE! No more worries about diverse preferences - get your own story evaluation report now! 📝🎯 arxiv.org/abs/2310.03304 1/5

19.1K

Jielin Qiu retweeted

Wenda Xu @WendaXu2 · May 24

What is missing in text generation evaluation with BERTScore, BLEURT, COMET, SEScore & SEScore2? Explanation! Can we build a metric that not only produces a well-correlated quality score but also tells you the rationales, error type, and error location? Check out InstructScore!

15K

Jielin Qiu retweeted

Danqing Wang @dqwang122 · Oct 10

🚀 Excited to share our latest work in EMNLP main conference: "Learning from Mistakes via Interactive Study Assistant for Large Language Models". We introduce a study assistant (SALAM) to conduct thoughtful analysis on LLMs' mistakes and provide guidelines to avoid past mistakes

Jielin Qiu retweeted

Kexun Zhang @kexun_zhang · Oct 12

😭Tired of in-context demos & docs for LLM tool use? 💰Too GPU-poor to tune LLMs for unseen tools? 🤬Frustrated with frequent syntax errors in tool calls? Check out our new preprint 𝐓𝐨𝐨𝐥𝐃𝐞𝐜 that addresses all these issues from the decoding side! arxiv.org/abs/2310.07075 1/5

36.2K

Jielin Qiu retweeted

Seungwhan Shane Moon @shane_moon · Sep 29

Excited to share our recent work, AnyMAL -- a unified Multimodal LLM built on LLaMA-2 that can reason over various inputs, e.g. images, audio, motion sensors. Check out our paper for more information on the model training, evaluation, safety and more! ➡️ arxiv.org/abs/2309.16058

122

22.5K

Jielin Qiu retweeted

Yi Zhu @yizhu59 · Dec 18

Check out our new evaluation benchmarks and metrics for robustness of image-text multimodal models! @AmazonScience #multimodal #stablediffusion

DeepAI @DeepAI

Are Multimodal Models Robust to Image and Text Perturbations? deepai.org/publication/ar… by Jielin Qiu et al. including @yizhu59 #OpenSource #ComputerVision

7.4K

Jielin Qiu retweeted

Santiago @svpino · Mar 24

A topic that comes up in every interview: Bias, variance, and their relationship with machine learning algorithms. Here is a simple summary that you will easily remember. ↓

209

964

Jielin Qiu retweeted

Xin Eric Wang (hiring postdoc) @xwang_lk · Mar 25

Our #ACL2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions" is out (arxiv.org/abs/2203.12667)!!! It serves as a thorough reference for the VLN research community (for both starters and experts). github.com/eric-ai-lab/aw…

130

Jielin Qiu retweeted

Jia-Bin Huang @jbhuang0604 · Mar 22

How to present a line plot? Line plots are effective for describing the relationship between two variables of interest. Unfortunately, most junior students simply copy and paste the figure from the paper into their talk, which causes much confusion. 😕 Let's break it down ... 🧵

106

547

Jielin Qiu retweeted

Jiahui Yu @jhyuxm · Feb 23

Our team at Google Brain is looking for outstanding PhD students (expected graduation after 2023) who are interested in student researcher internships this year 2022. careers.google.com/jobs/results/9…

Jielin Qiu retweeted

Ai2 @allen_ai · Feb 11

The Embodied AI Lecture Series at AI2 is back! Subscribe to the mailing list for info about how to join these free lectures live, or stay tuned and we'll post the recorded sessions after the fact. Subscribe: allenai.us1.list-manage.com/subscribe?u=61… More info: prior.allenai.org/lectures

Jielin Qiu retweeted

Andrew White 🐦‍⬛ @andrewwhite01 · Feb 10

I've been writing research articles for over 10 years now and one of the hardest parts is writing consistently and efficiently without procrastinating. I'm going to share some of my tips here 🧵 1/10

1.4K

11.5K

Jielin Qiu retweeted

Ai2 @allen_ai · Feb 9

AI2's computer vision team PRIOR announced an exciting new release of their #EmbodiedAI platform AI2-THOR – in partnership with @unity, you can now train headlessly on multiple GPUs. 📈 Learn more: medium.com/ai2-blog/ai2-t…
