Steffi Chern

@steffichern
CS PhD @Penn | @NSF Graduate Fellow | B.S. @CarnegieMellon 🤠

🚀 We are excited to introduce the Tool Decathlon (Toolathlon), a benchmark for language agents on diverse, complex, and realistic tool use.
⭐️ 32 applications and 600+ tools based on real-world software environments
⭐️ Execution-based, reliable evaluation
⭐️ Realistic, covering daily and professional scenarios
Toolathlon reveals significant shortcomings of SOTA LLMs in realistic tool-use tasks: Claude Sonnet 4.5 achieves a 38.6% success rate. It also shows a clear gap between open-source and leading proprietary models.
Check our blog: toolathlon.xyz/docs/blog/tool…
GitHub: github.com/hkust-nlp/Tool…
Paper: huggingface.co/papers/2510.25…
🧵⬇️

I will be attending #CVPR2025 and presenting our latest research from Apple MLR! Specifically, I will present our highlight poster on world-consistent video diffusion (cvpr.thecvf.com/virtual/2025/p…) and give three invited workshop talks, which include our recent preprint ★STARFlow★! (0/n)

In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: ethanc111.github.io/factool_websit… (1/n)