
We’re open-sourcing ParseBench, the first document OCR benchmark for the agentic era.
Document parsing is the foundation of every AI agent that works with real-world files. ParseBench is a benchmark that measures parsing quality specifically for agent knowledge work:
✅ It optimizes for semantic correctness (instead of exact similarity)
✅ It has the most comprehensive distribution of real-world enterprise documents
It contains ~2,000 human-verified enterprise document pages with 167,000+ test rules across five dimensions that matter most: tables, charts, content faithfulness, semantic formatting, and visual grounding.
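To make "semantic correctness instead of exact similarity" concrete, here is a toy sketch (not ParseBench's actual scoring, and the normalization rules are illustrative assumptions): an exact string match penalizes a parse that differs only in markdown emphasis, digit grouping, or whitespace, while a semantics-tolerant comparison does not.

```python
import re

def exact_match(pred: str, gold: str) -> bool:
    """Strict string equality: fails on harmless formatting differences."""
    return pred == gold

def semantic_match(pred: str, gold: str) -> bool:
    """Toy semantic comparison: ignore markdown emphasis, digit grouping,
    case, and whitespace differences that don't change meaning."""
    def normalize(s: str) -> str:
        s = re.sub(r"[*_`]", "", s)              # strip markdown emphasis
        s = re.sub(r"(?<=\d),(?=\d{3})", "", s)  # "1,000" -> "1000"
        return " ".join(s.split()).lower()       # collapse whitespace, casefold
    return normalize(pred) == normalize(gold)

gold = "Total revenue: $1,000"
pred = "**Total  revenue:** $1000"
print(exact_match(pred, gold))     # False
print(semantic_match(pred, gold))  # True
```

The same parse is judged wrong by exact similarity but correct once trivial formatting is normalized away, which is the gap a semantics-oriented benchmark tries to close.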
We benchmarked 14 well-known document parsers on ParseBench, from frontier and open-source VLMs to specialized parsers to LlamaParse. Here are some of our findings:
💡 Increasing the compute budget yields diminishing returns: Gemini/gpt-5-mini/haiku gain only 3-5 points going from minimal to high thinking, at 4x the cost.
💡 Charts are the most polarizing dimension: most specialized parsers score below 6%, while some VLM-based parsers do a bit better.
💡 VLMs are great at visual understanding but terrible at layout extraction: GPT-5-mini/haiku score below 10% on our visual grounding task, while every specialized parser does much better.
💡 No method dominates all five dimensions at once, but LlamaParse achieves the highest overall score at 84.9% and leads in 4 of the 5 dimensions.
This is by far the deepest technical work we’ve published as a company. I’d encourage you to start with our blog, then explore the links to Hugging Face and GitHub. All the details are in our full 35-page (!!) arXiv whitepaper.
🌐 Blog: llamaindex.ai/blog/parsebenc…
📄 Paper: arxiv.org/abs/2604.08538…
💻 Code: github.com/run-llama/Pars…
📊 Dataset: huggingface.co/datasets/llama…
🎥 YouTube: youtube.com/watch?v=g5p7G-…
