
Google quietly released the most powerful text extraction tool ever.
It's called LangExtract and it's a Python library that extracts structured information from unstructured text using LLMs with precise source grounding.
No regex nightmares. No manual parsing. No fine-tuning needed.
Here's how it works:
You define what you want to extract with a few examples
→ LangExtract chunks your document intelligently
→ Processes chunks in parallel across multiple passes
→ Maps every extraction to its exact location in the source text
→ Generates an interactive HTML visualization to review everything
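To make the chunking and source-grounding steps concrete, here is a minimal stdlib-only sketch. It is not LangExtract's code or API: the names `chunk`, `ground`, and `Grounded` are hypothetical, and a plain list of phrases stands in for what the LLM would return. It assumes extractions appear verbatim in the text and would miss a phrase that straddles a chunk boundary.

```python
from dataclasses import dataclass


@dataclass
class Grounded:
    text: str
    start: int  # inclusive character offset in the full document
    end: int    # exclusive character offset in the full document


def chunk(text: str, size: int) -> list[tuple[int, str]]:
    """Split text into (offset, chunk) pairs, breaking at a space when possible."""
    chunks, pos = [], 0
    while pos < len(text):
        end = min(pos + size, len(text))
        if end < len(text):
            cut = text.rfind(" ", pos, end)
            if cut > pos:
                end = cut + 1  # keep the space with the left-hand chunk
        chunks.append((pos, text[pos:end]))
        pos = end
    return chunks


def ground(chunks: list[tuple[int, str]], phrases: list[str]) -> list[Grounded]:
    """Locate each phrase in each chunk and report document-level offsets."""
    found = []
    for offset, piece in chunks:
        for phrase in phrases:
            i = piece.find(phrase)
            while i != -1:
                found.append(Grounded(phrase, offset + i, offset + i + len(phrase)))
                i = piece.find(phrase, i + 1)
    return found


doc = "Romeo loves Juliet. Juliet loves Romeo."
hits = ground(chunk(doc, 16), ["Romeo", "Juliet"])
for hit in hits:
    # Slicing the original document with the offsets recovers the extraction.
    print(hit.text, hit.start, hit.end, doc[hit.start:hit.end] == hit.text)
```

Carrying each chunk's starting offset forward is what lets an extraction found inside a chunk be mapped back to its exact position in the full document.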
Here's the wildest part:
It handled the full text of Romeo and Juliet (147,843 characters), extracting hundreds of entities with high accuracy using just a few-shot prompt and Gemini 2.5 Flash.
No model training. No labeled datasets. Just examples + instructions.
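As a rough illustration of "examples + instructions", a few-shot prompt can be assembled from a task description and a handful of worked examples. This is a generic sketch, not how LangExtract builds its prompts: `build_prompt` and the `Text:` / `Extractions:` layout are assumptions made for the example.

```python
import json


def build_prompt(instructions: str, examples: list[tuple[str, list[dict]]], document: str) -> str:
    """Combine instructions, worked examples, and the target text into one prompt."""
    parts = [instructions.strip(), ""]
    for source, extractions in examples:
        parts.append(f"Text: {source}")
        parts.append(f"Extractions: {json.dumps(extractions)}")
        parts.append("")
    # The model is expected to continue after the final "Extractions:" line.
    parts.append(f"Text: {document}")
    parts.append("Extractions:")
    return "\n".join(parts)


prompt = build_prompt(
    "Extract every character name exactly as written.",
    [("Romeo met Juliet.", [{"class": "character", "text": "Romeo"},
                            {"class": "character", "text": "Juliet"}])],
    "Enter Mercutio.",
)
print(prompt)
```

The examples do the work a labeled training set would otherwise do: they show the model the output schema and the level of detail expected.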
It already has 17.1K GitHub stars and supports Gemini, GPT-4o, and local models via Ollama.
100% open source. Apache 2.0 license.
