Jerry Chen

7.8K posts

@jerrychen

Restless. Irreverent. Greylock GP. @GreylockVC

San Francisco, CA · Joined March 2008
3.4K Following · 10.4K Followers
Jerry Chen reposted
Jerry Liu @jerryjliu0
We've massively improved our document layout capabilities in LlamaParse 📄📐 This means that our document OCR engine lets you get insanely detailed bounding boxes over really complex multimodal documents, like the research poster shown below. A core requirement for any agentic document workflow is enabling users to trace back to the source. Now your AI agents can reason over complex line charts and tables deeply embedded within specific pages, free of hallucinations, but also surface the source segment to the user. Come check out LlamaParse: cloud.llamaindex.ai/?utm_source=xj… If you are building document OCR in production, come talk to us: llamaindex.ai/contact?utm_so…
Quoting LlamaIndex 🦙 @llama_index

LlamaParse Agentic Plus mode now delivers precise visual grounding with bounding boxes for the most challenging document elements. Our latest update brings major improvements to how we handle complex visual content:
📐 Complex LaTeX formulas - accurately parse mathematical expressions with precise positioning
✍️ Handwriting recognition - extract handwritten text with location coordinates
📊 Complex layouts - navigate multi-column documents and intricate formatting
📈 Infographics and charts - identify and extract data visualizations with spatial context
This means you can now build applications that not only extract text from documents but also understand exactly where that content appears on the page - perfect for creating more intelligent document analysis workflows.
Try LlamaParse Agentic Plus mode and see how visual grounding transforms your document parsing capabilities: cloud.llamaindex.ai/?utm_source=so…

Jerry Chen reposted
Vinoth Chandar @byte_array
Bengaluru data engineers: Join our no-fluff meetup at the @onehousehq office on March 25th (next Wed eve) ⚙️
If you're into Spark, Hudi, Iceberg, lakehouses, or AI infra, this is for you.
I'll cover:
- Scaling Spark on K8s: What works, what breaks 🔧
- Next-gen lakehouse with Quanton & LakeBase 🏗️
Real talk on arch, benchmarks, tradeoffs - not a marketing event. Small group for deep chats.
📍 Onehouse Bengaluru
⏰ 4-6:30 PM IST
🚀 Register: docs.google.com/forms/d/1tQECs…
Jerry Chen reposted
Vinoth Chandar @byte_array
Everyone assumes usage-based pricing in cloud data is fair and efficient. ⚖️ But it has a real problem: it can stop vendors from building faster engines.
Traditional models priced on value: Oracle earned more for standout features. Now, with EMR or Databricks, bills hinge on compute usage. Customers win from compute efficiency (lower costs), but vendors lose revenue, which pushes them to own the compute layer for pricing control.
Sure, usage models offer flexibility, but they misalign incentives long-term. What's better? We need outcome-based pricing that rewards real value, like queries executed or data processed. 🚀📊
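A toy calculation makes the incentive argument concrete. This sketch assumes made-up numbers (10,000 queries a month, $4 per compute-hour, $2 per query) and uses "queries executed" as the outcome metric, one of the two examples the post names; none of the figures come from EMR, Databricks, or any real price list.

```python
# Toy model of vendor revenue under two pricing schemes.
# All numbers are hypothetical and chosen only to keep the arithmetic easy.
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    queries: int                     # queries the customer runs per month
    compute_hours_per_query: float   # depends on how fast the engine is


def usage_revenue(w: Workload, price_per_compute_hour: float) -> float:
    """Usage-based pricing: the vendor bills for compute consumed."""
    return w.queries * w.compute_hours_per_query * price_per_compute_hour


def outcome_revenue(w: Workload, price_per_query: float) -> float:
    """Outcome-based pricing: the vendor bills per query executed."""
    return w.queries * price_per_query


baseline = Workload(queries=10_000, compute_hours_per_query=0.5)
faster = Workload(queries=10_000, compute_hours_per_query=0.25)  # a 2x faster engine

# Usage pricing: the faster engine halves the vendor's revenue (20000.0 -> 10000.0).
print(usage_revenue(baseline, 4.0), usage_revenue(faster, 4.0))
# Outcome pricing: revenue is unchanged (20000.0 -> 20000.0) while the vendor's
# own compute cost falls, so a faster engine now pays off for the vendor.
print(outcome_revenue(baseline, 2.0), outcome_revenue(faster, 2.0))
```

The result is only directional: under per-compute billing a speedup shows up as lost vendor revenue, while under per-outcome billing it shows up as lower vendor cost.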
Jerry Chen reposted
Vinoth Chandar @byte_array
Spark is still a $15B+ annual spend category 💰 Yet most enterprises treat Spark like a black box. 🧠
TLDR: pip install spark-analyzer
Apache Spark still powers the backbone of lakehouse workloads 🏗️ Yet inside most companies, no one can clearly answer:
❓ Where does the spend actually go?
❓ Why don't optimizations translate into real savings?
❓ Why is Spark cost so unpredictable?
A huge share of this spend runs on:
⚠️ slow runtimes that waste compute cycles (e.g., default EMR setups)
💸 premium platforms charging 2–3× markups for engines like Photon
If you want to do something about it: pypi.org/project/spark-…
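The markup point can be checked with simple arithmetic: if a job's cost is runtime × hourly rate × platform markup, then a marked-up engine running on the same underlying rate only saves money when its speedup exceeds its markup. The rate, runtimes, and the 2.5× markup below are hypothetical (picked from inside the post's 2–3× range); this is not how spark-analyzer computes anything.

```python
# Hypothetical cost model for a single Spark job; all inputs are illustrative.


def job_cost(runtime_hours: float, rate_per_hour: float, markup: float = 1.0) -> float:
    """Cost = compute hours x infrastructure rate x platform markup."""
    return runtime_hours * rate_per_hour * markup


def premium_saves_money(speedup: float, markup: float) -> bool:
    """With the same base rate, a marked-up engine is cheaper only if speedup > markup."""
    return speedup > markup


rate = 3.0  # $/hour for the underlying cluster (made-up figure)
baseline = job_cost(runtime_hours=10.0, rate_per_hour=rate)                   # 30.0
premium = job_cost(runtime_hours=10.0 / 2.0, rate_per_hour=rate, markup=2.5)  # 37.5

print(baseline, premium)              # 30.0 37.5: 2x faster, yet more expensive
print(premium_saves_money(2.0, 2.5))  # False
print(premium_saves_money(3.0, 2.5))  # True
```

This is one way an "optimization" (a faster engine) can fail to translate into real savings: the runtime drops, but the markup more than cancels it.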
Jerry Chen reposted
Jerry Liu @jerryjliu0
Existing "OCR" technology for digitizing PDFs has been around for ~30 years. Reading printed characters on a page and converting them into meaningful representations is a hard problem!
Existing approaches depended either on pattern matching against specific document templates or on specialized ML models for specific data distributions. They constantly needed template/model refitting and broke on the long tail of varied docs.
Today, vision models are capable of much higher general accuracy without constant retraining, but they still need careful orchestration to make sure that they attend to specific elements (tables, charts) and produce semantically correct outputs.
Our OCR platform LlamaParse is built on this "agentic OCR" foundation. A network of specialized agents will parse apart even the most complicated documents and reconstruct the outputs in a semantically meaningful way.
We're excited to reach a world where parsing is not just 80% accurate over "easy" docs, but 100% accurate over literally any document that exists.
Check it out: llamaindex.ai/blog/agentic-o…
LlamaParse: cloud.llamaindex.ai/?utm_source=xj…
Quoting LlamaIndex 🦙 @llama_index

Ever wondered what we mean by 'agentic' OCR? It's parsing that reasons about documents instead of just reading them. Agentic OCR adapts to layout changes by treating document processing as a goal-oriented task rather than simple text extraction.
🧠 Uses multimodal language models to understand document structure and context, not just convert pixels to text
📍 Provides visual grounding with bounding boxes so every extracted field traces back to its source location
🔄 Runs self-correction loops to catch inconsistencies before they reach your downstream systems
⚡ Achieves 90-95%+ straight-through processing rates on new document formats without template setup
This matters for legal teams processing M&A due diligence, healthcare admins handling medical forms, and finance teams reconciling reports across subsidiaries. The agent doesn't just extract data - it completes document workflows with built-in validation and business logic.
LlamaParse is our implementation of agentic OCR. Get 10,000 free credits to test it against your actual documents:
Read the full breakdown: llamaindex.ai/blog/agentic-o…
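Two of the ideas in this post, visual grounding and self-correction loops, can be sketched with plain data structures. This is a hypothetical illustration, not the LlamaParse API: the `Field`/`BBox` types, the invoice rule "line items must sum to the total", and `fake_extract` (a stand-in for a model call) are all assumptions made for the example.

```python
# Hypothetical sketch of agentic-OCR ideas; this is NOT the LlamaParse API.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BBox:
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class Field:
    name: str
    value: float
    page: int
    bbox: BBox  # visual grounding: where on the page this value was read


def validate(fields: list[Field], tol: float = 0.01) -> list[str]:
    """Example business rule: line items must sum to the stated total."""
    totals = [f for f in fields if f.name == "total"]
    if len(totals) != 1:
        return ["expected exactly one 'total' field"]
    total = totals[0]
    items_sum = sum(f.value for f in fields if f.name.startswith("line_item"))
    if abs(items_sum - total.value) > tol:
        # The message points back to the source location so it can be re-read.
        return [f"total {total.value:.2f} on page {total.page} at {total.bbox} "
                f"does not match line-item sum {items_sum:.2f}"]
    return []


Extractor = Callable[[list[str]], list[Field]]


def parse_with_self_correction(extract: Extractor, max_attempts: int = 3):
    """Re-run extraction with validation feedback until the checks pass."""
    feedback: list[str] = []
    fields: list[Field] = []
    for _ in range(max_attempts):
        fields = extract(feedback)
        feedback = validate(fields)
        if not feedback:
            break
    return fields, feedback


def fake_extract(feedback: list[str]) -> list[Field]:
    """Stand-in for a model call: misreads the total on the first pass only."""
    total = 180.0 if not feedback else 150.0
    return [
        Field("line_item_1", 100.0, 1, BBox(50, 400, 200, 415)),
        Field("line_item_2", 50.0, 1, BBox(50, 420, 200, 435)),
        Field("total", total, 1, BBox(50, 460, 200, 475)),
    ]


fields, problems = parse_with_self_correction(fake_extract)
print(problems)                                        # []
print([f.value for f in fields if f.name == "total"])  # [150.0]
```

A real pipeline would use document-specific checks rather than one hard-coded rule; the point is that the page and box attached to each field give both the correction step and the end user a concrete place to look.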

Jerry Chen reposted
Vinoth Chandar @byte_array
1/ ✨ Azure just made the list. Not the list you’re thinking of. The list of clouds that Onehouse runs on. With our launch on Microsoft Azure, the only truly modular data lakehouse platform now runs across AWS, GCP, and Azure.
Jerry Chen reposted
Onehouse @Onehousehq
Onehouse is now officially available on Microsoft Azure.
The #1 demand from data teams ever since we announced our faster Spark engine "Quanton" has been Azure support. We are thrilled to answer that call and bring the Onehouse open data lakehouse platform to Azure users – with faster, cheaper Apache Spark infrastructure – without sacrificing openness or flexibility.
We're proud to announce that Onehouse is now the first cloud data platform to run on all three major clouds and support any major lakehouse or warehouse engine, including our own.
The Onehouse Platform
With Onehouse on Azure, you get access to our full platform to scale your workloads efficiently directly on k8s (AKS):
🚀 Spark & SQL Powered by Quanton™: Our purpose-built execution engine delivers 3-4x better price/performance than open-source Spark. Zero code changes required: just point your existing pipelines to Quanton and start speeding up.
🌊 OneFlow Data Ingestion: Battle-tested, managed ingestion from databases, event streams, and cloud storage into your lakehouse, handling schema evolution and data quality automatically.
⚙️ Open Engines: Spin up open-source engines like Trino, Ray, and Apache Flink™ with a single click, pre-connected and optimized for your tables in Onehouse.
⚡ Automated Table Optimization: Accelerate query performance up to 30x across any connected engine with automated maintenance for Apache Hudi, Iceberg, and Delta Lake.
Built for the Azure Ecosystem
This launch builds on our deep partnership with Microsoft, including our co-creation of Apache XTable™, and is designed to integrate natively into your existing architecture:
🔒 Deployed in your VNet: Your data never leaves your environment. It stays secure in your Azure Data Lake Storage (ADLS) accounts, fully governed by your own network controls.
🤝 Microsoft OneLake Catalog: Keep your table metadata perfectly synced with Microsoft OneLake using OneSync™, for seamless querying from Microsoft Fabric, Power BI, or Synapse.
🔌 Seamless Azure Integrations: Stream real-time data from Azure Event Hubs or replicate data via CDC from Azure Database for PostgreSQL directly into your lakehouse.
Whether you are scaling an existing Microsoft Fabric architecture or building a new lakehouse on Azure from scratch, Onehouse offers a fully managed, open, and highly optimized path forward. Running complex ETL pipelines or real-time analytics no longer breaks the bank.
Ready to see how much you could save on your Spark workloads? Check out the blog. 👉 onehouse.ai/blog/bringing-…
#Azure #MicrosoftFabric #OneLake #DataPlatform #Lakehouse #DataEngineering #ApacheSpark
Jerry Chen reposted
Jerry Liu @jerryjliu0
We built a neat tool that lets you convert a directory of PowerPoint files into clean, structured markdown that Claude Code / agent SDK / any generalized agent wrapper can easily understand. The pptx skill in Claude Code is quite basic and doesn't have high-fidelity understanding of graphics/charts/tables. Our project Surreal Slides uses LlamaParse to convert presentations into clean structured data that you can put into a db (@SurrealDB) for simple retrieval, without having to take screenshots of the data on the fly. Thanks to @itsclelia for this project! Check it out: github.com/run-llama/surr…
Quoting LlamaIndex 🦙 @llama_index

If you’re working with lots of slide decks and need a better way to search through them, Surreal Slides makes it simple 🌀 Built around LlamaParse, it parses presentation files into clean, structured data, turning raw slides into something AI can truly understand. Each slide is extracted, summarized, and organized before being stored in @SurrealDB for flexible retrieval. From there, you can query your entire presentation library in natural language through an agentic interface: no need to manually scan files or remember where a specific slide lives. Take a look at the demo below👇 GitHub Repository: github.com/run-llama/surr…

Jerry Chen reposted
Sara Du @saradu
linguistics x ai dinner, ft. sf sunset
Kevin Kwok @kevinakwok
This article is very not true. please stop doing second order advanced analysis of it on the feed
Jerry Chen reposted
Jerry Liu @jerryjliu0
I love the Big Arch Burger 🍔 I also love Big Harnesses™ and Big Complex PDFs™ with hundreds of pages of tables, images and forms.
Jerry Chen reposted
Jerry Liu @jerryjliu0
Shoutout @latentspacepod for calling me a "Big Harness guy" in this article: latent.space/p/ainews-is-ha… The biggest barrier to adopting AI is your ability to provide context and workflows to these models. We see ourselves as unlocking the highest-quality context from all documents (PDFs, Word, Excel) so that agents can reason through them at scale.
Jerry Chen reposted
Shreya Shekhar @_shreya_s
Excited to kick off this year’s Systems Reading Group series with @harborframework and @terminalbench! Top frontier labs, data vendors, and AI cos are moving to Harbor for their RL infra and evals. Come by to learn why, and dive into key components of their architecture with creators @alexgshaw & @ryanmart3n! Sign up below for the event on 3/10 👉 luma.com/wkdfbw17
jedgar @jedgar
I think this talk by my friend @jerrychen may very well be one of the most underappreciated startup talks out there. One of the most intuitive explainers on unit economics/unit of value. Must watch for all fresh founders:
jedgar @jedgar
So young here Jerry lol
christine kim @ChristneKim
After 5 incredible years at Greylock, it's time to build again! I'm thrilled to share that I'm joining @getserval as Head of Strategic Projects.
Serval is building AI agents that automate enterprise IT workflows, replacing systems like ServiceNow with something fundamentally better. ITSM is one of the largest enterprise markets, and the opportunity to reimagine it with AI is once in a lifetime.
My initial focus is building out the forward-deployed engineering and deployment motion from the ground up. FDEs are the critical bridge between the product and the customer, and the FDE function is one of the most important in enterprise AI right now. At Serval, FDEs don't just configure in customer environments; they build critical core product. And we already work with some of the largest enterprise customers in the world, including Abridge, Bilt, Notion, Vercel, Verkada, plus more that I'd love to tell you about in person.
Which brings me to this: I'm hiring FDEs. This role is perfect for former or future founders, as you'll spend time selling, building, and deploying 0→1 world-class enterprise products. If you're an engineer who loves working directly with customers, thrives in ambiguity, and wants to be early at a company with serious momentum, I'd love to hear from you (DM or reach me at christine@serval.com).
I'm so grateful to the @GreylockVC team and to all my founders for an incredible chapter. The privilege of partnering with founders at their earliest stages and working with some of the sharpest minds in the industry has shaped how I think, operate, and lead. Thank you.
Jerry Chen reposted
Renu Raman @renuraman
I have a different view. Taking @jerrychen's original thesis (he might need to update it with all the changes), today:
1. Systems of Engagement: Voice (via LLMs)
2. Systems of Agents (you can say it's part of intelligence, but it's worth calling out as a layer of its own)
3. Systems of Intelligence (per the original framework), which execute the business logic
4. Systems of Data (a new layer on top of traditional systems of record, since LLMs need some enterprise truth: knowledge graphs, vector (unstructured) data, and relational/tabular data from traditional systems of record)
5. Systems of Record
Now you can collapse 2 & 3 into one layer and 4 & 5 into another, but I am just recognizing the emergent new layers.
The interesting thing is that a foundation model spans or touches all the layers.
Jerry Chen reposted
Neo @neo
Today we unveiled Neo Residency, a new program for startups and high-agency student teams. 🎉 We’re replacing our best-known program, Neo Accelerator, with something even better and more selective. 🧵