Zhiwei-Jim

20 posts

@JYJimLiu

Zhiwei Liu, senior research scientist @salesforce @SFResearch

Joined October 2022
57 Following · 36 Followers
Zhiwei-Jim @JYJimLiu
@dair_ai It seems you tagged a different paper. Here is the arXiv link for our MCPEval paper: arxiv.org/abs/2507.12806. We also posted a demo showing how to use it: x.com/SFResearch/sta…
Salesforce AI Research @SFResearch

⚡ Introducing MCPEval: the first automated evaluation framework for AI agents built on Model Context Protocol:
🔗 Paper: bit.ly/3TKXpLR
🔗 Code: bit.ly/44ZnUSN
✅ End-to-end task generation & verification
✅ Deep evaluation across 5 real-world domains
✅ Standardized metrics for reproducible research
✅ Open-source & eliminates manual bottlenecks
Our evaluation of 10+ models (GPT-4o, O3, Qwen3, etc.) reveals surprising insights: smaller tool-enhanced models can match larger ones in specific domains! Perfect for researchers & developers building reliable AI agents. #AIAgents #FutureOfAI #EnterpriseAI

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 4
DAIR.AI @dair_ai
9. MCPEval
MCPEval is an open-source framework that automates end-to-end evaluation of LLM agents using a standardized Model Context Protocol, eliminating manual benchmarking. arxiv.org/abs/2507.15015
Replies: 4 · Reposts: 1 · Likes: 22 · Views: 3.6K
DAIR.AI @dair_ai
Top AI Papers of The Week (July 21 - 27):
- MCPEval
- Subliminal Learning
- Learning without Training
- Alignment Auditing Agents
- Structural Planning for LLM Agents
- Inverse Scaling in Test-Time Compute
- Deep Researcher with Test-Time Diffusion
Read on for more:
Replies: 12 · Reposts: 99 · Likes: 649 · Views: 100.6K
Zhiwei-Jim @JYJimLiu
Check out our latest work on an automatic MCP-based evaluation pipeline.
Salesforce AI Research @SFResearch

⚡ Introducing MCPEval: the first automated evaluation framework for AI agents built on Model Context Protocol:
🔗 Paper: bit.ly/3TKXpLR
🔗 Code: bit.ly/44ZnUSN
✅ End-to-end task generation & verification
✅ Deep evaluation across 5 real-world domains
✅ Standardized metrics for reproducible research
✅ Open-source & eliminates manual bottlenecks
Our evaluation of 10+ models (GPT-4o, O3, Qwen3, etc.) reveals surprising insights: smaller tool-enhanced models can match larger ones in specific domains! Perfect for researchers & developers building reliable AI agents. #AIAgents #FutureOfAI #EnterpriseAI

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 37
Zhiwei-Jim retweeted
Sheng Zhang @sheng_zh
📢Our team at Microsoft Research (@MSFTResearch) is hiring summer interns. If you have expertise in building image encoders, reward models, or vision-language models, I'd love to hear from you. Please send me your CV or website via email (zhang.sheng@microsoft.com) or DM!
Replies: 6 · Reposts: 30 · Likes: 225 · Views: 28.4K
Zhiwei-Jim @JYJimLiu
@SFResearch Check out our research work toward a visual agent model!
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 60
Zhiwei-Jim retweeted
Salesforce AI Research @SFResearch
🌮 Introducing 🌮 TACO - our new family of multimodal action models that combine reasoning with real-world actions to solve complex visual tasks!
📊 Results:
20% gains on MMVet
3.9% average improvement across 8 benchmarks
1M+ synthetic CoTA traces in training
🔓🔓🔓 Fully open-sourced! 🔓🔓🔓
Get started with:
📄 Paper: bit.ly/3PufThl
💻 Code: bit.ly/3Pw8azw
📱 Demo: bit.ly/3PwrEE2
🤖 Models: bit.ly/4j2ZG0h
📚 Datasets: bit.ly/3Pxtzbv
🧵 ...and our Technical deep-dive starts here ⤵️
(1/4) How does TACO work? 🤔
⛓️ TACO answers complex questions by generating Chains-of-Thought-and-Action (CoTA), executing intermediate actions with external tools such as OCR, calculator, and depth estimation, then integrating both the thoughts and action outputs to produce final responses. We generate the synthetic CoTA data with two approaches: model-based generation (top) and programmatic generation (bottom).
Replies: 6 · Reposts: 57 · Likes: 178 · Views: 70.5K
Zhiwei-Jim retweeted
Salesforce AI Research @SFResearch
Excited to open source TACO and see how the AI research community builds on these multimodal innovations! Together we'll push the boundaries of visual reasoning and agent capabilities. 🌮🚀
📄 Paper: bit.ly/3PufThl
💻 Code: bit.ly/3Pw8azw
📱 Demo: bit.ly/3PwrEE2
🤖 Models: bit.ly/4j2ZG0h
📚 Datasets: bit.ly/3Pxtzbv
Huge thanks to our 🌮 research team! @zixianma02 @JianguoZhang3 @JYJimLiu @JieyuZhang20 @chrisjtan @ManliShu @jcniebles @shelbyh_ai @huan__wang @CaimingXiong @RanjayKrishna @silviocinguetta
Replies: 2 · Reposts: 1 · Likes: 11 · Views: 1.2K
Zhiwei-Jim retweeted
Silvio Savarese @silviocinguetta
Happy to see our team's hard work come to fruition. The xLAM family of models represents a huge leap in AI capabilities for function calling, planning and reasoning—fit-for-purpose for varied needs of modern business. Eager to see where its application takes us! #AIInnovation
Salesforce AI Research @SFResearch

Introducing the full xLAM family, our groundbreaking suite of Large Action Models! 🚀 From the 'Tiny Giant' to industrial powerhouses, xLAM is revolutionizing AI efficiency! #AIResearch #AIEfficiency
🤗 Hugging Face Collection: bit.ly/4faoYaQ
🤩 Research Blog: bit.ly/3MxliCZ
🗞️ Press Release: sforce.co/3XzaOt9
Meet the family:
• xLAM-1B / TINY: Our 1B parameter marvel, ideal for on-device AI. Outperforms larger models despite its compact size.
• xLAM-7B / SMALL: Perfect for swift academic exploration with limited GPU resources.
• xLAM-8x7B / MEDIUM: Mixture-of-experts model balancing latency, resources, and performance for industrial applications.
• xLAM-8x22B / LARGE: Our large-scale model for optimal performance in high-resource environments.
🎉 Huge congrats to the team of AI scientists who brought the xLAM series to life! Zuxin Liu @LiuZuxin, Shirley Kokane @KokaneShirley, Ming Zhu @ming_zhu0527, Tian Lan @TLan001, Jianguo Zhang @JianguoZhang3, Thai Hoang @TeeH912, Caiming Xiong @CaimingXiong, Silvio Savarese @silviocinguetta

Replies: 0 · Reposts: 12 · Likes: 18 · Views: 4.1K
Zhiwei-Jim retweeted
Juan Carlos Niebles @jcniebles
The slides for my #CVPR2024 Tutorial on Agents are now available! I’ve also posted an accompanying blog and links to all the @SFResearch Open-Source repos to make it easy for people to get started. Check them out here: niebles.net/blog/2024/agen…
Juan Carlos Niebles @jcniebles

I’m back at #CVPR2024! I’m speaking tomorrow 8:40am at the Generalist Agent AI Tutorial about Language-based AI Agents and Large Action Models (LAMs). #aiagent I’m also a panelist tomorrow 11:30am at the Workshop on What is next in Multimodal Foundation Models? #multimodalai

Replies: 2 · Reposts: 18 · Likes: 40 · Views: 8.2K
Zhiwei-Jim @JYJimLiu
Check out our repos and play with our xLAM model and AgentLite library!
Caiming Xiong @CaimingXiong

🎉🎉 We are excited to release a full package for AI Agent R&D:
1) For data & training, 🎙️AgentOhana🎙️: Design Unified Data and Training Pipeline for Effective Agent Learning.
2) For the model, 🔥xLAM-v0.1-R🔥: a strong large action model for AI agents that maintains its abilities on general tasks.
3) For the agent inference framework, 🤖AgentLite🤖: a lightweight agent/multi-agent library.
AgentOhana aggregates, standardizes and unifies agent trajectories from distinct environments. xLAM-v0.1-r, fine-tuned on #Mixtral, outperforms #GPT-3.5-Turbo on the benchmarks (WebShop, HotpotQA, ToolBench, and MINT-Bench) and #GPT-4 on several of them. AgentLite is implemented in <1K lines of code and supports quickly building LLM agents and designing new agent reasoning, new agent architectures, and multi-agent orchestration.
AgentOhana Paper: arxiv.org/abs/2402.15506…
xLAM GitHub and Model: github.com/SalesforceAIRe… and huggingface.co/Salesforce/xLA…
AgentLite GitHub: github.com/SalesforceAIRe…
AgentLite Paper: arxiv.org/abs/2402.15538

Replies: 0 · Reposts: 0 · Likes: 4 · Views: 69
Zhiwei-Jim @JYJimLiu
@cognition_labs Does Cognition train its own LLM for Devin, or does it just use GPT-4 as the foundation?
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 4
Cognition @cognition
Today we're excited to introduce Devin, the first AI software engineer.
Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.
Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser.
When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted.
Check out what Devin can do in the thread below.
Replies: 4.3K · Reposts: 9.7K · Likes: 42.7K · Views: 31.4M
Zhiwei-Jim retweeted
Shelby Heinecke @shelbyh_ai
Data Cloud powers AI - but there is also AI powering Data Cloud! Unifying data from different sources is challenging, and we build the AI to get it done intelligently. Check out our most recent work, now available in Data Cloud ⬇️
Salesforce AI Research @SFResearch

Good AI starts with good data. 💡 That's why #DataCloud is the foundation of AI at Salesforce. 💻 Excited to announce our latest AI-powered capabilities for #IdentityResolution in Data Cloud! Check out our blog to learn more. 🔍 #SalesforceAI blog.salesforceairesearch.com/identity-resol…

Replies: 0 · Reposts: 1 · Likes: 2 · Views: 334
Zhiwei-Jim retweeted
Caiming Xiong @CaimingXiong
🎉🎉We are excited to release 👉BOLAA👈: Benchmarking and Orchestrating LLM-augmented Autonomous Agents. In this release, we compare 6 different agent arches (including BOLAA) and 15 popular LLMs under web & QA agents tasks. We will keep expanding the benchmark!
Replies: 3 · Reposts: 30 · Likes: 121 · Views: 18.7K
Zhiwei-Jim retweeted
Caiming Xiong @CaimingXiong
We introduce 🔥XGen-7B 🔥, a new 7B LLM trained on up to 8K sequence length for 1.5T tokens. Achieves better or comparable results with MPT, Falcon, LLaMA, Redpajama, and OpenLLaMA in the text and code tasks. 🔗Blog: blog.salesforceairesearch.com/xgen/ 🔗Code: github.com/salesforce/xgen
Replies: 7 · Reposts: 108 · Likes: 445 · Views: 61.9K
Zhiwei-Jim retweeted
Huan Wang @huan__wang
Salesforce has been actively engaged in the integration of cutting-edge artificial intelligence technologies into our product offerings. We are excited to share some significant progress in our efforts to enhance the entity resolution capabilities. #salesforce #AI
Salesforce AI Research @SFResearch

Unify Profiles with Salesforce Data Cloud Identity Resolution Soft-Matching 🪪 @shelbyh_ai @huan__wang @JYJimLiu
Read our blog: blog.salesforceairesearch.com/data-cloud-ide…
Visit our website: salesforceairesearch.com/projects/data-…

Replies: 0 · Reposts: 5 · Likes: 7 · Views: 2.7K