AI & TECH

34K posts

@AiTechHubs

NEWS • TIPS • RESOURCES

Joined October 2012
22 Following · 155.6K Followers
AI & TECH reposted
Jerry Liu @jerryjliu0
We’re open sourcing the first document OCR benchmark for the agentic era, ParseBench.

Document parsing is the foundation of every AI agent that works with real-world files. ParseBench is a benchmark that measures parsing quality specifically for agent knowledge work:
✅ It optimizes for semantic correctness (instead of exact similarity)
✅ It has the most comprehensive distribution of real-world enterprise documents

It contains ~2,000 human-verified enterprise document pages with 167,000+ test rules across five dimensions that matter most: tables, charts, content faithfulness, semantic formatting, and visual grounding.

We benchmarked 14 known document parsers on ParseBench, from frontier/OSS VLMs to specialized parsers to LlamaParse. Here are some of our findings:
💡 Increasing compute budget yields diminishing returns: Gemini/gpt-5-mini/haiku gain 3-5 points from minimal to high thinking, at 4x the cost.
💡 Charts are the most polarizing dimension for evaluation. Most specialized parsers score below 6%, while some VLM-based parsers do a bit better.
💡 VLMs are great at visual understanding but terrible at layout extraction. GPT-5-mini/haiku score below 10% on our visual grounding task, while all specialized parsers do much better.
💡 No method crushes all 5 dimensions at once, but LlamaParse achieves the highest overall score at 84.9% and is the leader in 4 out of the 5 dimensions.

This is by far the deepest technical work that we’ve published as a company. I would encourage you to start with our blog and explore our links, from Hugging Face to GitHub. All the details are in our full 35-page (!!) ArXiv whitepaper.

🌐 Blog: llamaindex.ai/blog/parsebenc…
📄 Paper: arxiv.org/abs/2604.08538…
💻 Code: github.com/run-llama/Pars…
📊 Dataset: huggingface.co/datasets/llama…
🎥 YouTube: youtube.com/watch?v=g5p7G-…
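The post contrasts scoring by semantic correctness with scoring by exact similarity. A minimal Python sketch of that difference follows. It is an illustration only and not ParseBench's actual rule format or code: the helper names (parse_markdown_table, rule_cell_equals, rule_pass_rate), the rule tuple layout, and the sample table are all hypothetical.

```python
from difflib import SequenceMatcher


def parse_markdown_table(text):
    """Turn a simple pipe-delimited table into a list of row dicts."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header separator row, e.g. |---|---| or |:---|---:|
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        rows.append(cells)
    header, body = rows[0], rows[1:]
    return [dict(zip(header, r)) for r in body]


def rule_cell_equals(table, key_col, key, col, expected):
    """Hypothetical rule: the row where key_col == key has col == expected."""
    return any(r.get(key_col) == key and r.get(col) == expected for r in table)


def rule_pass_rate(table, rules):
    """Fraction of rules that the parsed table satisfies."""
    results = [rule_cell_equals(table, *r) for r in rules]
    return sum(results) / len(results)


# Made-up reference output and a candidate with the same content
# but different spacing and alignment markers.
reference = "| Region | Revenue |\n|---|---|\n| EMEA | 1200 |\n| APAC | 950 |"
candidate = "|Region|Revenue|\n|:-----|------:|\n|EMEA|1200|\n|APAC|950|"
# A candidate that looks close as a string but has one wrong value.
wrong = "| Region | Revenue |\n|---|---|\n| EMEA | 1200 |\n| APAC | 590 |"

rules = [
    ("Region", "EMEA", "Revenue", "1200"),
    ("Region", "APAC", "Revenue", "950"),
]

# Character-level similarity penalizes harmless formatting differences...
exact_similarity = SequenceMatcher(None, reference, candidate).ratio()
# ...while rule checks only ask whether the content is right.
semantic_score = rule_pass_rate(parse_markdown_table(candidate), rules)
wrong_score = rule_pass_rate(parse_markdown_table(wrong), rules)

print(f"exact similarity (same content): {exact_similarity:.2f}")
print(f"rule pass rate (same content):   {semantic_score:.2f}")
print(f"rule pass rate (wrong value):    {wrong_score:.2f}")
```

In this sketch the reformatted candidate passes every rule even though its string similarity to the reference is below 1, while the candidate with a transposed digit fails one of the two rules despite being nearly identical as text.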
31
81
526
106.6K
AI & TECH reposted
Karan Vaidya @KaranVaidya6
Okay, @gdb is team CLI all the way. @garrytan thinks MCPs suck. So we hit the streets of SF to see if the city agreed.

We posed a simple question: MCP or CLI?
- Basically everyone under the age of 35 said CLI
- One person said MCP was as bloated as Java
- And, unsurprisingly, numerous people told us to touch grass

Final score: MCP 3, CLI 17

SF has spoken, and @composio listened. Our universal CLI is now live!

Drop your best CLI vs MCP hot take in the comments and we'll send the best ones some very sick gear 👀

Link to try our CLI in the next thread ⬇️
134
316
1.1K
2M
AI & TECH @AiTechHubs
This man is living in 3017; we are all just passengers
0
2
7
0
AI & TECH @AiTechHubs
My shooter Pablo
0
9
10
0
AI & TECH @AiTechHubs
THIS COACH IS AN ABSOLUTE SAVAGE
0
10
10
0
AI & TECH @AiTechHubs
Not gonna lie, this looks fun as hell
0
2
3
0