
Introducing WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
abs: arxiv.org/pdf/2405.00823
LLM-based "agents" can interact with external systems by correctly invoking their interfaces, a major improvement over vanilla LLMs.
1/n 🧵
