Amitoj Singh

2 posts

@amitojs04

UC Berkeley EECS and Business

Joined October 2024
15 Following · 10 Followers
Amitoj Singh retweeted
Shishir Patil @shishirpatil_
🔥 At ICML 2025, we’re delighted to introduce BFCL V4 Agentic. As function-calling (also called tool-calling) forms the bedrock of agentic systems, the BFCL V4 Agentic benchmark focuses on tool-calling in real-world agentic settings, including:

🔍 Web search with multi-hop reasoning and error recovery
🧠 Evaluating Tool-Calling for Memory
⚠️ Evaluating Format Sensitivity

As always, BFCL prioritizes real-world realism. For example, in the web-search track, we evaluate not just multi-hop reasoning ability but also how models handle real-world failures. In BFCL V4, we introduce randomized injection of six common programmatic access errors: 503 Server Error, 429 Too Many Requests, 403 Forbidden, etc.

Which models recover gracefully? Which ones fail silently? All this and more!

Check out the BFCL V4 Agentic blogs:
Web-search: gorilla.cs.berkeley.edu/blogs/15_bfcl_…
Memory: gorilla.cs.berkeley.edu/blogs/16_bfcl_…
Format Sensitivity: gorilla.cs.berkeley.edu/blogs/17_bfcl_…

As always, everything is open-sourced at BFCL V4 PR: github.com/ShishirPatil/g…

🏃‍♂️ Who's the overall #1? We're currently sprinting to integrate all models into the new benchmark. Once generations are complete, the leaderboard will migrate from v3 to v4. Hang tight: big updates incoming!
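
To make the "randomized injection" idea in the post concrete, here is a minimal sketch of what injecting access errors into a search tool could look like. It is not BFCL's implementation: `flaky_search`, `search_with_recovery`, `InjectedHTTPError`, and the `error_rate` parameter are all hypothetical names made up for illustration, and only the three status codes the post names are included, since the other three are not listed there.

```python
import random

# Only the three errors named in the post; the remaining three are not listed there.
NAMED_ERRORS = {
    503: "Server Error",
    429: "Too Many Requests",
    403: "Forbidden",
}


class InjectedHTTPError(Exception):
    """Simulated access failure raised in place of a real search result."""

    def __init__(self, status):
        super().__init__(f"{status} {NAMED_ERRORS[status]}")
        self.status = status


def flaky_search(query, rng, error_rate=0.5):
    """Hypothetical search tool that fails at random with one of the named errors."""
    if rng.random() < error_rate:
        raise InjectedHTTPError(rng.choice(sorted(NAMED_ERRORS)))
    return f"results for {query!r}"


def search_with_recovery(query, rng, max_attempts=5, error_rate=0.5):
    """Retry on injected errors and report what failed instead of hiding it."""
    seen = []
    for _ in range(max_attempts):
        try:
            return flaky_search(query, rng, error_rate), seen
        except InjectedHTTPError as err:
            seen.append(err.status)  # keep a visible record of each failure
    return None, seen  # explicit failure rather than a silent one
```

Calling `search_with_recovery("some query", random.Random(0))` returns the result (or `None`) together with the list of injected status codes that were hit, which is roughly the distinction the post draws between recovering gracefully and failing silently. A more realistic caller would likely treat the codes differently, for example backing off on 429 and not retrying on 403.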