
Hugging Face

@huggingface
The AI community building the future. https://t.co/TpiXQMQ9rZ

We keep saying we want open-source frontier agents. Fine. Then let’s build the dataset.

@badlogicgames, creator of Pi, just shared on @huggingface some of the agent traces he used to build Pi. I’m now sharing some of mine too, exporting them from @hermes, @opencode, and Claude via @tracesdotcom, and I’ll keep going.

Why this matters: one of the biggest bottlenecks for open-source agent models is data, and all of us generate that data every day through our conversations with agents. If enough builders share even a slice of their traces publicly, we can create the largest crowdsourced open dataset for agents.

Time to put your tokens where your mouth is and give open source a chance to win!


I’m pleased to share that our search team has open-sourced an embedding model called Harrier, which currently ranks #1 on the multilingual MTEB-v2 benchmark leaderboard. Harrier delivers SOTA performance in retrieval quality, semantic matching, and contextual analysis across workloads, supports more than 100 languages, and handles long inputs of up to 32K. It is built for next-generation semantic search in Bing and for our web grounding (RAG) service for AI agents, which already powers nearly every major AI chatbot today.

As the leaderboard shows, Harrier is currently ahead of other excellent models based on Gemini, Gemma, Llama, Qwen, and more. I’m grateful for our team’s hard work in reaching this top ranking, and I’m excited to see all the healthy competition in the space, which should ultimately lead to more innovation that benefits everyone.

Learn more: msft.it/6019QNB0b
