Mixpeek

205 posts

@mixpeek

The multimodal data warehouse. One API to decompose, store, and search video, image, audio, and documents. Built on Ray. 🏗️ Try live demos ↓

USA · Joined September 2021

9 Following · 140 Followers

Mixpeek @mixpeek · 15h

Available now at storage.mixpeek.com

ethan steininger 🔎 @ethansteininger

We compared 21 object storage providers across pricing, S3 compat, and the gotchas nobody tells you until after you migrate 50 TB:
• Wasabi's 90-day minimum retention
• R2 has no versioning or object lock
• DO Spaces caps objects at 5 GB (not 5 TB)
• GCS has the highest egress of the big 3
github.com/mixpeek/awesom…

Mixpeek @mixpeek · 16h

Your RAG pipeline is a Rube Goldberg machine. And it's why your AI hallucinates. LangChain → chunker → embedder → vector DB → re-ranker → LLM. 6 tools. 6 failure points. 6 things to debug at 3am. The fix: one retriever pipeline. Filter → Sort → Reduce → Enrich. One API call. mixpeek.com
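
To make the Filter → Sort → Reduce → Enrich idea concrete, here is a minimal sketch that treats each stage as a function from a candidate list to a new candidate list, chained in one call. It is not Mixpeek's retriever API. The `Doc` type, the stage helpers, and the sample products are hypothetical, and the `score` field is assumed to come from an earlier search step.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List

@dataclass
class Doc:
    id: str
    score: float   # relevance score assumed to come from a prior search step
    price: float
    meta: dict = field(default_factory=dict)

# A stage is any function that maps a list of candidates to a new list.
Stage = Callable[[List[Doc]], List[Doc]]

def filter_stage(pred: Callable[[Doc], bool]) -> Stage:
    # Filter: drop candidates that fail a predicate (e.g. a price cap).
    return lambda docs: [d for d in docs if pred(d)]

def sort_stage(key: Callable[[Doc], float]) -> Stage:
    # Sort: order candidates by a key, highest first.
    return lambda docs: sorted(docs, key=key, reverse=True)

def reduce_stage(k: int) -> Stage:
    # Reduce: keep only the top-k candidates.
    return lambda docs: docs[:k]

def enrich_stage(fn: Callable[[Doc], dict]) -> Stage:
    # Enrich: attach extra fields, copying each Doc instead of mutating it.
    return lambda docs: [replace(d, meta={**d.meta, **fn(d)}) for d in docs]

def run_pipeline(docs: List[Doc], stages: List[Stage]) -> List[Doc]:
    # Feed the output of each stage into the next one.
    for stage in stages:
        docs = stage(docs)
    return docs

# Hypothetical candidates for "find similar products under $50".
candidates = [
    Doc("a", 0.91, 79.0),
    Doc("b", 0.88, 45.0),
    Doc("c", 0.95, 30.0),
    Doc("d", 0.70, 12.0),
]
results = run_pipeline(candidates, [
    filter_stage(lambda d: d.price < 50),
    sort_stage(lambda d: d.score),
    reduce_stage(2),
    enrich_stage(lambda d: {"label": f"{d.id} (${d.price:.2f})"}),
])
```

Running it keeps `c` and `b`, the two highest-scoring products under $50, each with a `label` attached. Because every stage has the same shape, stages can be reordered or added without changing the runner.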

Mixpeek @mixpeek · 1d

Vector databases are a scam. $80K/yr for 1B vectors. You're paying for RAM. We built the same thing on S3 for $3,500. Rust shards, Ray coordinator, sub-10ms latency. 95% cheaper. Benchmarks open-sourced → mixpeek.com

Mixpeek @mixpeek · 2d

RAG is the #1 AI pattern right now. Here's how it works in 60 seconds:
→ Retrieve: semantic search across your data
→ Augment: inject the retrieved context into the LLM prompt
→ Generate: answers grounded in that context, far fewer hallucinations
Every serious AI app runs on this. We built a multimodal retrieval API to make it easy → mixpeek.com
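
The Retrieve and Augment steps above fit in a few lines. This is a minimal sketch under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the three-passage corpus is sample text, and the Generate step (sending the prompt to an LLM) is left out because it depends on which model you call.

```python
import math
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    # Retrieve: rank every passage by similarity to the query, keep the top k.
    q = embed(query)
    return sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def augment(query: str, passages: List[str]) -> str:
    # Augment: inject the retrieved passages into the prompt as numbered sources.
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Answer using only these sources and cite them.\n{sources}\n\nQuestion: {query}"

corpus = [
    "Wasabi has a 90-day minimum retention period",
    "R2 has no versioning or object lock",
    "DO Spaces caps objects at 5 GB",
]
query = "does R2 support object lock"
context = retrieve(query, corpus, k=1)
prompt = augment(query, context)
# Generate: send `prompt` to the LLM of your choice (not shown here).
```

The grounding comes from the prompt: the model is asked to answer only from numbered sources it can cite, so retrieval quality directly bounds answer quality.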

Mixpeek @mixpeek · 26 Mar

Full benchmark results + implementation details in the blog post ↓ mixpeek.com/blog/colqwen2-…

Mixpeek @mixpeek · 26 Mar

This makes multi-vector search practical at scale for the first time. Prior approaches sacrificed quality (single-vector), speed (brute-force), or guarantees (PLAID). The verticals where this hits hardest are financial doc search, medical imaging, and legal discovery: anywhere OCR is the bottleneck.

Mixpeek @mixpeek · 26 Mar

We benchmarked every viable approach to multimodal document retrieval on financial tables and found a combination that hasn't been published before: ColQwen2 + MUVERA. 99.4% of brute-force quality. Sub-millisecond first-pass retrieval. OCR-based search isn't even close.
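
Why MUVERA speeds up ColQwen2-style retrieval: late-interaction models score a query against a document with a Chamfer (MaxSim) sum over many token or patch vectors, which is expensive to search. MUVERA compresses each multi-vector set into one fixed dimensional encoding (FDE) whose inner product approximates that score, so an ordinary single-vector index can do the first pass. The sketch below is a simplified FDE (one repetition, no inner projection, no filling of empty buckets) with hand-picked toy vectors and hyperplanes. It is not Mixpeek's benchmarked implementation.

```python
import numpy as np

def chamfer(Q: np.ndarray, D: np.ndarray) -> float:
    # Late-interaction (MaxSim / Chamfer) score: each query vector takes its
    # best-matching document vector, and the maxima are summed.
    return float((Q @ D.T).max(axis=1).sum())

def fde(X: np.ndarray, planes: np.ndarray, is_query: bool) -> np.ndarray:
    # Simplified MUVERA fixed dimensional encoding.
    k = planes.shape[0]
    bits = (X @ planes.T > 0).astype(np.int64)   # SimHash sign bits, shape (n, k)
    bucket = bits @ (1 << np.arange(k))          # bucket index per vector
    out = np.zeros((2 ** k, X.shape[1]))
    for b in range(2 ** k):
        members = X[bucket == b]
        if len(members):
            # Queries sum their vectors per bucket; documents average them.
            out[b] = members.sum(axis=0) if is_query else members.mean(axis=0)
    return out.ravel()                            # one vector of size 2**k * dim

# Toy 2-D "token" embeddings and one hand-picked hyperplane.
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
D = np.array([[1.0, 0.0], [0.6, 0.8]])
planes = np.array([[1.0, -1.0]])

exact = chamfer(Q, D)
approx = float(fde(Q, planes, True) @ fde(D, planes, False))
```

In this toy case both scores come out to 1.8 because each query vector lands in the same bucket as its best match. With real embeddings the FDE score is only an approximation, which is why MUVERA reranks the first-pass candidates with exact Chamfer scoring.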

Mixpeek @mixpeek · 18 Mar

Gemini Embedding 2 and multi-file embeddings are both live in the Mixpeek Studio.

Google @Google

Gemini Embedding 2, our first fully multimodal embedding model, is now available in Public Preview via the Gemini API and Vertex AI. Developers can now map text, images, video, and audio in one centralized space, with one model, which simplifies complex tasks like semantic search. Here's what this means and why it matters 🧵↓

Mixpeek @mixpeek · 17 Mar

Your agent doesn't need 6 microservices to go from "find similar products under $50" to ranked, enriched results. It needs one retriever with 6 stages.

Mixpeek reposted

Anthony Katsur @anthonykatsur · 6 Mar

The @IABTechLab has launched the Agent Registry as part of our Agentic Ad Management Protocols #AAMP! The Agent Registry is a new place for the industry to publish, discover, and connect with advertising agents across the ecosystem. Early participants already include @Equativ, @mixpeek and @PubMatic. If you are building agentic advertising services, register your agents & help shape the future of agentic workflows. To learn more: iabtechlab.com/introducing-th… #AAMP #AgenticAdvertising #Agentic #AI #IABTechLab #Standards

Mixpeek reposted

ethan steininger 🔎 @ethansteininger · 9 Mar

Search infra assumes your query is a text string. What happens when it's a 500MB video? We just shipped query preprocessing for @mixpeek: decompose large files into chunks, batch embed them in parallel, run concurrent searches, and fuse the results. One API call.

"query_preprocessing": { "max_chunks": 20, "aggregation": "rrf" }

Ingestion applied to the query. The same pipeline that indexed your data now runs on your search input. mixpeek.com/docs/retrieval…
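
The fuse step is where `"aggregation": "rrf"` comes in, which we read as reciprocal rank fusion (an assumption based on the name). The sketch below caps the chunk count like `max_chunks`, runs one search per chunk concurrently, and fuses the ranked lists with RRF using k = 60, a commonly used default. Chunking and embedding the video are abstracted away: `search_fn` and `fake_index` are hypothetical stand-ins, not Mixpeek's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def rrf(rankings: List[List[str]], k: int = 60) -> List[str]:
    # Reciprocal rank fusion: each ranked list adds 1 / (k + rank) per document.
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

def search_with_preprocessing(
    chunks: List[str],
    search_fn: Callable[[str], List[str]],
    max_chunks: int = 20,
) -> List[str]:
    # Cap the number of query chunks, search them concurrently, then fuse.
    selected = chunks[:max_chunks]
    with ThreadPoolExecutor() as pool:
        rankings = list(pool.map(search_fn, selected))
    return rrf(rankings)

# Hypothetical per-chunk results standing in for real vector searches.
fake_index = {
    "chunk-1": ["a", "b", "c"],
    "chunk-2": ["b", "a", "d"],
    "chunk-3": ["b", "c", "a"],
}
fused = search_with_preprocessing(list(fake_index), fake_index.__getitem__)
```

RRF only uses ranks, not raw scores, so it can fuse results from chunks whose similarity scores are not directly comparable. Here `b` wins because it ranks first for two of the three chunks.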

Mixpeek @mixpeek · 3 Mar

Mixpeek Plugins are live.

We built this because every team's extraction logic is different, and forcing everyone through the same pre-built models doesn't work at scale. Now you can build custom feature extractors with:
→ Your own models
→ Your own weights
→ Your own API keys

Wire them into multi-collection decomposition pipelines that break complex assets (video, documents, images) into structured, searchable features across purpose-built collections. Then deploy real-time inference endpoints so those same models serve retrieval at query time, enabling multi-vector serving patterns like ColBERT, ColPali, and hybrid dense+sparse, all behind a single retriever.

What this unlocks: you're no longer choosing between flexibility and infrastructure. Fine-tune a domain-specific embedding model, plug it in, and get batch processing + real-time serving without stitching together five different systems.

Docs: mixpeek.com/docs/processin…

Mixpeek @mixpeek · 22 Jan

Retrieval is the interface for the next generation of AI agents. Not "find documents" but "find the exact 8 seconds, the specific table, the cited source." Grounded multimodal data for your agents. → mixpeek.com

Mixpeek @mixpeek · 22 Jan

Production scale from day one. Financial services. Adtech. Healthcare. Media. Regulated industries where "it mostly works" isn't acceptable. Structured outputs your agent can cite. Timestamps it can reference. Sources it can link.

Mixpeek @mixpeek · 22 Jan

Grounded multimodal data for your agents. We build the infrastructure that turns video, images, audio, and documents into structured context AI can actually use. Here's how we built it:
