Pulse

99 posts

@Pulse__AI

Production-grade unstructured document extraction

San Francisco, CA · Joined June 2024
3 Following · 776 Followers
sid (@sid_mnk)
Footnote extraction is now available on the @Pulse__AI platform. Every footnote is returned as a structured object with its reference marker, full text, and positional metadata linking it back to exactly where it was cited in the document. Available now here: platform.runpulse.com
Pulse retweeted
sid (@sid_mnk)
The dirty secret of document AI: most accuracy problems aren't model problems. They're scoping problems. If your schema is running against the full document, you're asking your model to find signal across cover pages, appendices, boilerplate, and blank pages. It doesn't know what matters. You do. A billion pages taught us the fix is upstream: define which pages are relevant before extraction runs. Clean inputs, clean outputs. @Pulse__AI is launching Split for public access today.
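The idea of scoping pages before extraction can be sketched in a few lines. This is a minimal illustration only: it assumes pages are already available as plain-text strings and uses a hypothetical keyword rule to decide relevance. It is not how Split works internally, and `select_relevant_pages`, `keywords`, and `min_chars` are names made up for this example.

```python
def select_relevant_pages(pages, keywords, min_chars=40):
    """Return (1-based page number, text) pairs worth sending to extraction.

    Hypothetical rule: skip near-blank pages, keep pages that mention
    at least one keyword. Real relevance rules would be domain-specific.
    """
    selected = []
    for number, text in enumerate(pages, start=1):
        stripped = text.strip()
        if len(stripped) < min_chars:
            continue  # blank or near-blank page (e.g. a short cover page)
        lowered = stripped.lower()
        if any(keyword in lowered for keyword in keywords):
            selected.append((number, stripped))
    return selected


# Made-up sample document: cover page, blank page, invoice page, appendix.
pages = [
    "ACME Corp Annual Report",
    "",
    "Invoice total due: 1,250.00 USD. Payment terms: net 30 days from receipt.",
    "Appendix A: general terms and conditions that apply to all customers.",
]

relevant = select_relevant_pages(pages, keywords=["invoice", "total due"])
print([number for number, _ in relevant])
```

Only the selected pages would then be passed to the schema-driven extraction step, so the model never sees the cover page, blank page, or appendix.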
Pulse retweeted
Ritvik Pandey (@ritvikpandey21)
Agentic OCR is everywhere in document AI conversations right now. We published a breakdown of what it actually means, the tradeoffs that matter for production, and why hybrid architectures usually win. More below!
Pulse retweeted
sid (@sid_mnk)
shoutout @NotionHQ
sid (@sid_mnk)
Introducing Suggest by @Pulse__AI: upload a document, get a suggested schema in seconds. No manual field definition to get started. The output is production-ready and API-compliant. More details below!
Pulse retweeted
Ritvik Pandey (@ritvikpandey21)
We assumed structured outputs had solved document extraction. Then we tried complex schemas at scale. New post on the computational complexity of schema-guided extraction: why JSON schemas require pushdown automata, how state explosion happens, and why tighter constraints can actually hurt accuracy.
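A small illustration of why nested JSON calls for a pushdown automaton rather than a finite-state one: checking that brackets close in the right order at arbitrary depth needs a stack. The sketch below is not taken from the linked post; it only checks bracket structure (it is not a JSON validator), and `brackets_balanced` is a name chosen for this example.

```python
def brackets_balanced(text: str) -> bool:
    """Check that {} and [] nest correctly, ignoring brackets inside strings."""
    pairs = {"}": "{", "]": "["}
    stack = []          # the "pushdown" memory: one entry per open bracket
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False      # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack and not in_string


print(brackets_balanced('{"a": [1, {"b": "]"}]}'))
print(brackets_balanced('{"a": [1, 2}'))
```

The stack can grow without bound as nesting deepens, which is exactly the memory a finite-state machine lacks.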
Pulse retweeted
Ritvik Pandey (@ritvikpandey21)
Most spreadsheet errors are not obvious failures. They are structural ones. Flatten the grid and you lose meaning before the model ever reasons. After millions of XLSX pages in production, the takeaway was clear. Representation was the bottleneck, not scale. Advanced spreadsheet parsing is now generally available in @Pulse__AI. Due to high demand, message our team to get started. Better structure. Fewer silent errors. Real business impact at production scale.
Pulse retweeted
Ritvik Pandey (@ritvikpandey21)
Accuracy is only part of the document AI problem. In production systems, meaning depends on layout. Tables, hierarchy, and proximity determine how values should be interpreted and verified. Layout segmentation and bounding boxes preserve that structure. They enable citations, traceability, and review long after extraction. We wrote a technical post on why layout needs to be treated as a first-class primitive in document AI systems, with examples from finance, healthcare, and legal workflows. Link below.
Pulse retweeted
Ritvik Pandey (@ritvikpandey21)
Webhooks are now live in @Pulse__AI. Real-time job notifications, auto-retries, signed payloads. If you're running async extraction at scale, no more polling. Small addition, but it came up a lot.
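Signed webhook payloads are commonly verified with an HMAC over the raw request body. The sketch below assumes HMAC-SHA256 with a shared secret and a hex-encoded signature. The post does not describe Pulse's actual signing scheme, so the secret, payload fields, and function names here are all hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare signatures in constant time to avoid timing side channels."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, received_signature)


# Hypothetical example values, not a real Pulse payload.
secret = b"example-shared-secret"
body = b'{"job_id": "job_123", "status": "completed"}'
signature = sign_payload(secret, body)

print(verify_signature(secret, body, signature))         # untouched body
print(verify_signature(secret, body + b" ", signature))  # tampered body
```

Verification should run against the raw bytes received, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the signature.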
Pulse retweeted
Ritvik Pandey (@ritvikpandey21)
Merry Christmas! A common request we get is how to share extraction results with someone who doesn't have a @Pulse__AI account. Now you can. Shared links with configurable expiration and org-level visibility controls. Simple feature, but it removes friction in a lot of workflows.
Pulse retweeted
sid (@sid_mnk)
One of the hardest problems in document extraction isn't generating output. It's knowing whether that output is actually correct as systems evolve. Most teams rely on spot checks, which don't scale well. Accuracy Scorer from @Pulse__AI lets you upload ground truth, measure precision and recall at the field level, and catch regressions before they hit production. Available now: platform.runpulse.com
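Field-level precision and recall can be computed by comparing extracted values with ground truth one field at a time. The sketch below is a generic illustration under stated assumptions: exact-match comparison, one dict per document, and `None` or a missing key meaning "no value". It is not the Accuracy Scorer's implementation, and `field_level_scores` is a hypothetical name.

```python
from collections import defaultdict


def field_level_scores(predictions, ground_truth):
    """Compute per-field precision and recall over paired documents.

    predictions / ground_truth: lists of dicts mapping field name -> value.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for pred, truth in zip(predictions, ground_truth):
        for field in set(pred) | set(truth):
            p, t = pred.get(field), truth.get(field)
            if p is not None and p == t:
                counts[field]["tp"] += 1
            else:
                if p is not None:
                    counts[field]["fp"] += 1  # wrong or spurious extraction
                if t is not None:
                    counts[field]["fn"] += 1  # expected value not recovered
    scores = {}
    for field, c in counts.items():
        extracted = c["tp"] + c["fp"]
        expected = c["tp"] + c["fn"]
        scores[field] = {
            "precision": c["tp"] / extracted if extracted else 0.0,
            "recall": c["tp"] / expected if expected else 0.0,
        }
    return scores


# Made-up data: the second document has a wrong invoice number and a missing total.
predictions = [
    {"invoice_number": "A-1", "total": "100.00"},
    {"invoice_number": "A-2", "total": None},
]
ground_truth = [
    {"invoice_number": "A-1", "total": "100.00"},
    {"invoice_number": "A-3", "total": "50.00"},
]

scores = field_level_scores(predictions, ground_truth)
print(scores["invoice_number"])
print(scores["total"])
```

Running the same comparison on every release against a fixed ground-truth set turns a drop in any field's score into a visible regression signal.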
Pulse retweeted
sid (@sid_mnk)
Today we're launching Extraction Library in @Pulse__AI. One challenge we kept hearing: teams lose track of which schemas and prompts are actually running in production. Someone updates a config, outputs start drifting, and it's hard to trace what changed. Extraction Library keeps everything versioned in one place. Full history, inline editing, no more guessing.
Pulse retweeted
sid (@sid_mnk)
Finally released our public Python and TypeScript SDKs for @Pulse__AI. Typed interfaces, sync and async support, and full parity with the REST API. More details below.
Pulse retweeted
Ritvik Pandey (@ritvikpandey21)
Today we're launching a rebuilt structured output system in @Pulse__AI. A lot of structured extraction is just asking a model to output JSON and hoping it's right. We rebuilt ours from scratch. Schema-first pipeline, two-step process, and every field cites back to exactly where it came from in the source document. Shipping today in the platform and API.