VLM Run

149 posts

@vlmrun

Visual Intelligence for Enterprise.

Santa Clara, CA · Joined January 2022
46 Following · 334 Followers
Pinned Tweet
VLM Run @vlmrun
Chat with Orion – the first visual agent that sees, reasons, and acts across images, videos, and documents.
VLM Run @vlmrun
Manually parsing handwritten intake forms is slow and error-prone; VLM Run's HIPAA-ready API lets you extract the same details in seconds. In this tutorial by @jeremyparkphd, learn how to use VLM Run to extract structured JSON from handwritten healthcare documents at scale.

Through this walkthrough, you will learn how to:
- Upload documents in the Requests tab and run them against your saved skills
- Enable confidence scores and grounding to see exactly where each field came from in the original document
- Edit incorrect extractions and provide feedback to improve extraction over time
- Run the same workflow programmatically via the VLM Run API, as shown in Google Colab
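For the last step above, a minimal sketch of what the programmatic call could look like follows. The endpoint path, domain id, environment-variable name, and request fields are illustrative assumptions, not the documented VLM Run API surface; the Colab notebook and official docs remain the authoritative reference.

import os
import requests

# Sketch of the programmatic workflow described above.
# NOTE: endpoint path, domain id, and request fields are assumptions for
# illustration only -- check the VLM Run docs / Colab for the real interface.
API_KEY = os.environ["VLMRUN_API_KEY"]      # assumed environment variable name
BASE_URL = "https://api.vlm.run/v1"         # assumed base URL

def extract_intake_form(pdf_path: str) -> dict:
    """Upload a handwritten intake form and return structured JSON."""
    with open(pdf_path, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/document/generate",            # hypothetical endpoint
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={
                "domain": "healthcare.patient-intake",  # hypothetical skill/domain id
                "grounding": "true",                    # per-field provenance, as in the tutorial
                "confidence": "true",                   # per-field confidence scores
            },
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(extract_intake_form("intake_form.pdf"))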
VLM Run @vlmrun
Announcing Orion Skills! 🚀

Rather than rewriting prompts every time you want to define a specific task, you can now package all of that knowledge into a reusable skill.

Why skills?
- Reusable: create a skill once, reference it from any endpoint (image, document, video, audio, agent)
- Versionable: pin a specific skill version for reproducible results, or use "latest" to always get the newest revision
- Composable: pass multiple skills in a single request, or combine them with custom schemas

Unlike purely text-based skills, we have reimagined what skills mean for visual agents and how to codify visual workflows into skills. Try skills in chat today! And check out this skills creation tutorial by @jeremyparkphd 👇
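To make the "versionable" and "composable" points concrete, here is a rough sketch of a request that pins one skill to a specific version and tracks another at "latest". The endpoint, payload field names, and skill ids are assumptions for illustration, not the actual Orion Skills schema.

import os
import requests

# Rough sketch only: the endpoint, field names ("skills", "name", "version"),
# and skill ids below are illustrative assumptions, not the documented schema.
payload = {
    "url": "https://example.com/scans/intake_bundle.pdf",        # document to process
    "skills": [
        {"name": "healthcare.intake-form", "version": "1.2.0"},  # pinned for reproducible results
        {"name": "tables.line-items", "version": "latest"},      # always the newest revision
    ],
}

resp = requests.post(
    "https://api.vlm.run/v1/document/generate",                  # hypothetical endpoint
    headers={"Authorization": f"Bearer {os.environ['VLMRUN_API_KEY']}"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json())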
VLM Run @vlmrun
The AI agent skills conversation focuses mainly on text. But for many tasks, words fall short. We need visual skills: providing images and videos as context, not just text. It's one thing to describe to a robot how to fold a t-shirt. It's another to show it a video. At VLM Run, that's what we're building: visual agents that understand visual data and act on it.
VLM Run @vlmrun
What if visual AI agents could help give feedback on exercise? @jeremyparkphd recently reviewed the Orion visual AI agent for providing deadlift feedback. He raises the question: what if visual intelligence could be made accessible for applications in exercise, all through a chat interface? Read the Substack blog here: jeremyparkphd.substack.com/p/visual-agent…
VLM Run @vlmrun
Healthcare documents come fragmented across PDFs, images, emails, and faxed scans. OCR fails because real-world documents require visual reasoning over layout and context – not just plain-text extraction.

Scan.com processes high volumes of documents and images where both speed and accuracy matter. They needed automation that could handle the diversity and complexity of healthcare documents. We built it together with Orion.

In a single call, Orion:
• Classifies multi-page document bundles
• Extracts data from emails and attachments
• Understands checkboxes, handwriting, and layout
• Visually verifies extractions for high confidence

The result: faster processing, reduced manual QA, reliable structured data. Document automation isn't a text problem. It's a visual reasoning problem.

Read more: vlm.run/blog/how-scan.…
VLM Run @vlmrun
We're hiring our first infra engineer (senior/staff) at @vlmrun!

We're processing tens of millions of VLM requests per month and scaling fast; we're looking for a founding Infrastructure Engineer to serve and operationalize our GPU workloads (custom runtimes on @vllm_project / transformers, orchestrated with @raydistributed / @modal).

The work is technically challenging, the learning curve is steep (in the best way), and you'll be joining a stellar ML team building the go-to visual intelligence platform for enterprises. In-person ONLY.

Tag someone who'd crush this 👇
Ethan Mollick @emollick
The ability of AI to understand video/images seems to be largely underexplored and underexploited. There are a lot of economically valuable applications to having an AI watch the world in real time, even with errors & limitations, and I have seen few products or papers on it.
Google AI @GoogleAI
Introducing Agentic Vision – a new frontier AI capability in Gemini 3 Flash that converts image understanding from a static act into an agentic process. By combining visual reasoning with code execution, one of the first tools supported by Agentic Vision, the model grounds answers in visual evidence and delivers a consistent 5-10% quality boost across most vision benchmarks.

Here's how the agentic "Think, Act, Observe" loop works:
- Think: the model analyzes an image query, then architects a multi-step plan
- Act: the model then generates and executes Python code to actively manipulate or analyze images
- Observe: the transformed image is appended to the model's context window, allowing it to inspect the new data before generating a final response to the initial image query

Learn more about Agentic Vision and how to access it in our blog ⬇️ blog.google/innovation-and…
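For readers who want a feel for the control flow, here is a toy, self-contained sketch of a Think / Act / Observe loop with a stubbed "model" and one fixed image transform. It is not the Gemini API and does not call any Google service; it only illustrates the loop described in the post: plan, run image-manipulation code, append the result to context, then answer.

from dataclasses import dataclass, field
from PIL import Image

# Toy illustration of a Think / Act / Observe loop. The "model" steps are
# hard-coded stand-ins; a real agent would generate the plan and the code.

@dataclass
class Context:
    query: str
    images: list = field(default_factory=list)  # grows as the loop observes new evidence

def think(ctx: Context) -> str:
    # Think: analyze the query and pick a plan (stubbed to one fixed step).
    return "crop_center"

def act(plan: str, img: Image.Image) -> Image.Image:
    # Act: execute the image-manipulation step implied by the plan.
    if plan == "crop_center":
        w, h = img.size
        return img.crop((w // 4, h // 4, 3 * w // 4, 3 * h // 4)).resize((w, h))
    return img

def observe(ctx: Context, img: Image.Image) -> None:
    # Observe: the transformed image is appended to the working context.
    ctx.images.append(img)

def answer(ctx: Context) -> str:
    # Final response, now grounded in every image accumulated in context.
    return f"answered {ctx.query!r} using {len(ctx.images)} image(s) of evidence"

if __name__ == "__main__":
    ctx = Context(query="what does the street sign say?",
                  images=[Image.new("RGB", (640, 480), "gray")])
    plan = think(ctx)                    # Think
    zoomed = act(plan, ctx.images[-1])   # Act
    observe(ctx, zoomed)                 # Observe
    print(answer(ctx))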