Pulse

99 posts

@Pulse__AI

Production-grade unstructured document extraction

San Francisco, CA · Joined June 2024

3 Following · 776 Followers

sid@sid_mnk·12 Mar

Footnote extraction is now available on the @Pulse__AI platform. Every footnote is returned as a structured object with its reference marker, full text, and positional metadata linking it back to exactly where it was cited in the document. Available now here: platform.runpulse.com

6.5K

Pulse@Pulse__AI·12 Mar

@sid_mnk 🙌

Pulse@Pulse__AI·18 Feb

@sid_mnk Finally!

Pulse retweeted

sid@sid_mnk·18 Feb

The dirty secret of document AI: most accuracy problems aren't model problems. They're scoping problems. If your schema is running against the full document, you're asking your model to find signal across cover pages, appendices, boilerplate, and blank pages. It doesn't know what matters. You do. A billion pages taught us the fix is upstream: define which pages are relevant before extraction runs. Clean inputs, clean outputs. @Pulse__AI is launching Split for public access today.

7.4K

Pulse retweeted

Ritvik Pandey@ritvikpandey21·14 Jan

Agentic OCR is everywhere in document AI conversations right now. We published a breakdown of what it actually means, the tradeoffs that matter for production, and why hybrid architectures usually win. More below!

876

Pulse@Pulse__AI·14 Jan

@sid_mnk @NotionHQ 👀

102

Pulse retweeted

sid@sid_mnk·14 Jan

shoutout @NotionHQ

2.9K

Pulse@Pulse__AI·13 Jan

@sid_mnk wow

sid@sid_mnk·13 Jan

Introducing Suggest by @Pulse__AI: upload a document, get a suggested schema in seconds. No manual field definition to get started. The output is production-ready and API-compliant. More details below!

195

Pulse@Pulse__AI·12 Jan

@ritvikpandey21 🙌

Pulse retweeted

Ritvik Pandey@ritvikpandey21·12 Jan

We assumed structured outputs had solved document extraction. Then we tried complex schemas at scale. New post on the computational complexity of schema-guided extraction: why JSON schemas require pushdown automata, how state explosion happens, and why tighter constraints can actually hurt accuracy.

282

Pulse retweeted

Ritvik Pandey@ritvikpandey21·8 Jan

Most spreadsheet errors are not obvious failures. They are structural ones. Flatten the grid and you lose meaning before the model ever reasons. After millions of XLSX pages in production, the takeaway was clear: representation was the bottleneck, not scale. Advanced spreadsheet parsing is now generally available in @Pulse__AI. Due to high demand, message our team to get started. Better structure. Fewer silent errors. Real business impact at production scale.

7.8K

Pulse retweeted

Ritvik Pandey@ritvikpandey21·29 Dec

Accuracy is only part of the document AI problem. In production systems, meaning depends on layout. Tables, hierarchy, and proximity determine how values should be interpreted and verified. Layout segmentation and bounding boxes preserve that structure. They enable citations, traceability, and review long after extraction. We wrote a technical post on why layout needs to be treated as a first-class primitive in document AI systems, with examples from finance, healthcare, and legal workflows. Link below.

339

Pulse@Pulse__AI·25 Dec

@ritvikpandey21 🥳

Pulse retweeted

Ritvik Pandey@ritvikpandey21·24 Dec

Webhooks are now live in @Pulse__AI. Real-time job notifications, auto-retries, signed payloads. If you're running async extraction at scale, no more polling. Small addition, but it came up a lot.
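For readers unfamiliar with signed payloads, here is a minimal sketch of how a receiver might check one. The post does not say how Pulse signs its webhooks, so the scheme below (HMAC-SHA256 over the raw request body, hex-encoded) and the name `verify_signature` are assumptions for illustration only, not the Pulse API.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Check a webhook body against a hex-encoded HMAC-SHA256 signature.

    Assumption: the sender signs the raw body with a shared secret.
    The real signing scheme is not described in the source post.
    """
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking
    # how many leading characters of the signature matched.
    return hmac.compare_digest(expected, signature_hex)
```

The check should run on the raw bytes of the request body, before any JSON parsing, because re-serializing the payload can change its bytes and invalidate the signature.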

302

Pulse retweeted

Ritvik Pandey@ritvikpandey21·25 Dec

Merry Christmas! A common request we get is how to share extraction results with someone who doesn't have a @Pulse__AI account. Now you can. Shared links with configurable expiration and org-level visibility controls. Simple feature, but it removes friction in a lot of workflows.

338

Pulse retweeted

sid@sid_mnk·23 Dec

One of the hardest problems in document extraction isn't generating output. It's knowing whether that output is actually correct as systems evolve. Most teams rely on spot checks, which don't scale well. Accuracy Scorer from @Pulse__AI lets you upload ground truth, measure precision and recall at the field level, and catch regressions before they hit production. Available now: platform.runpulse.com
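As a rough illustration of what field-level precision and recall mean, here is a minimal sketch. It is not Accuracy Scorer itself: the function name `field_level_scores`, the exact-match rule, and the dictionary input format are all assumptions made for this example.

```python
from collections import defaultdict


def field_level_scores(predictions, ground_truth):
    """Return {field: (precision, recall)} over paired documents.

    Hypothetical scoring rule (an assumption, not Pulse's definition):
    an exact match is a true positive; a wrong or spurious value is a
    false positive; a missing or wrong value for a labeled field is a
    false negative.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for pred, truth in zip(predictions, ground_truth):
        for field in set(pred) | set(truth):
            p, t = pred.get(field), truth.get(field)
            c = counts[field]
            if p is not None and p == t:
                c["tp"] += 1
            else:
                if p is not None:
                    c["fp"] += 1  # extracted something, but not the labeled value
                if t is not None:
                    c["fn"] += 1  # labeled value was not recovered
    scores = {}
    for field, c in counts.items():
        predicted = c["tp"] + c["fp"]
        labeled = c["tp"] + c["fn"]
        precision = c["tp"] / predicted if predicted else 0.0
        recall = c["tp"] / labeled if labeled else 0.0
        scores[field] = (precision, recall)
    return scores
```

Running a function like this on the same ground-truth set before and after a schema or prompt change is one way to see which specific fields regressed, rather than relying on a single aggregate number.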

281

Pulse retweeted

sid@sid_mnk·22 Dec

Today we're launching Extraction Library in @Pulse__AI. One challenge we kept hearing: teams lose track of which schemas and prompts are actually running in production. Someone updates a config, outputs start drifting, and it's hard to trace what changed. Extraction Library keeps everything versioned in one place. Full history, inline editing, no more guessing.

236

Pulse retweeted

sid@sid_mnk·21 Dec

Finally released our public Python and TypeScript SDKs for @Pulse__AI. Typed interfaces, sync and async support, and full parity with the REST API. More details below.

313

Pulse retweeted

Ritvik Pandey@ritvikpandey21·20 Dec

Today we're launching a rebuilt structured output system in @Pulse__AI. A lot of structured extraction is just asking a model to output JSON and hoping it's right. We rebuilt ours from scratch. Schema-first pipeline, two-step process, and every field cites back to exactly where it came from in the source document. Shipping today in the platform and API.

186

Pulse@Pulse__AI·19 Dec

Enjoyed this!

Ritvik Pandey@ritvikpandey21

We spent the past couple of weeks delivering holiday gifts to 100 startups around SF to close out the year. Got to do a few office visits along the way; some of our favorite offices were @usepylon, @MomenticAI, and @juicebox_work. Huge thanks to the teams for the office tours. Big announcements continuing next week!

195
