LanceDB

918 posts

LanceDB

@lancedb

Developer-friendly, open source AI-Native Multimodal Lakehouse https://t.co/wXn4tw5ySn

San Francisco, CA เข้าร่วม Nisan 2023

62 กำลังติดตาม4.1K ผู้ติดตาม

LanceDB@lancedb·2d

@huggingface @openclaw 7/7 More in the full newsletter! lancedb.com/blog/newslette…

English

LanceDB@lancedb·2d

@huggingface @openclaw 6/7 🗓️ Upcoming Events - Data Engineering Open Forum (SF) — Apr 16 - TokioConf (Portland) — Apr 20–22

English

LanceDB@lancedb·2d

1/7 LanceDB March Newsletter is out — focused on performance + scale 🔥 Here’s what shipped ↓

English

549

LanceDB@lancedb·3d

Bottom line: @lancedb's JSONB gives AI builders the same storage efficiency as Variant, with more flexibility, and no vendor lock-in. 3/3 Read the full blog post for details 👇🏽 lancedb.com/blog/lance-jso…

English

195

LanceDB@lancedb·3d

Agent data (tool calls, chain-of-thought traces, RAG context) is 𝘁𝗲𝘅𝘁-𝗵𝗲𝗮𝘃𝘆, 𝗱𝗲𝗲𝗽𝗹𝘆 𝗻𝗲𝘀𝘁𝗲𝗱, and distinct between rows. It's not structured metadata. That's exactly where @lancedb's JSONB matches Variant in storage size, while offering more flexibility. 2/3

English

260

LanceDB@lancedb·3d

𝘿𝙤 𝙮𝙤𝙪 𝙧𝙚𝙖𝙡𝙡𝙮 𝙣𝙚𝙚𝙙 𝙑𝙖𝙧𝙞𝙖𝙣𝙩 𝙛𝙤𝙧 𝙮𝙤𝙪𝙧 𝘼𝙄 𝙙𝙖𝙩𝙖? We benchmarked Lance JSONB vs Parquet Variant on real-world JSON workloads. On text-heavy data with mixed schemas Variant and @lancedb's JSONB are within 𝟬-𝟴% of one another — essentially equal. Variant's 𝟮-𝟰x storage advantage only appears when documents share the same structure with short, repetitive fields 1/3

English

1.2K

LanceDB@lancedb·4d

@LlamaIndex @itsclelia 4/4 Check out the repo: github.com/run-llama/llam… Shoutout to @itsclelia for collaborating on this piece! Great work pushing on parsing + retrieval together 🤝

English

113

LanceDB@lancedb·4d

@LlamaIndex @itsclelia 3/4 This avoids splitting modalities across systems and losing context between stages. The agent can retrieve what it needs, in the form it needs. On our eval dataset, this setup reaches near-perfect accuracy on complex QA. Full breakdown: lancedb.com/blog/smart-par…

English

144

LanceDB@lancedb·4d

1/4 If your agent fails on PDFs with tables or charts, it’s usually not the model. It’s the data pipeline. 🧵 We worked with @LlamaIndex and @itsclelia to build a structure-aware QA pipeline using LiteParse. lancedb.com/blog/smart-par…

English

399

LanceDB@lancedb·5d

3/3 Compression is applied without breaking the access path. Less data read → faster batch fetch → higher GPU utilization. Benchmarks: lancedb.com/blog/lance-for…

English

161

LanceDB@lancedb·5d

2/3 With Lance format v2.2: - ~50% smaller than Parquet on text-heavy data - ~75x faster random blob reads (image/video fetch) - Sampling + filtering performance stays flat - No changes to application code

English

187

LanceDB@lancedb·5d

1/3 If your dataset doubles, your storage cost shouldn’t. And your GPUs shouldn’t slow down reading it. Most formats force a tradeoff between compression and access speed. Lance format v2.2 doesn’t.

English

455

LanceDB@lancedb·6d

3/3 Because Lance integrates both the file format and table format, fragment metadata tracks which blob objects are referenced by each dataset version. That enables version-aware garbage collection and compaction without rewriting large blobs. Full design: lancedb.com/blog/lance-blo…

English

191

LanceDB@lancedb·6d

2/3 Every blob is encoded as the same on-disk Arrow struct: (kind, position, size, blob_id, blob_uri) The kind field identifies how the blob is stored, but the logical column type stays the same. This lets the system route reads without changing schemas or APIs.

English

191

LanceDB@lancedb·6d

1/3 Rewriting multi-GB blobs during compaction or schema evolution is one of the worst failure modes in multimodal datasets. Lance Blob V2 avoids this by defining blob storage directly in the **format design**. 🧵👇

English

495

ค้นพบ

@huggingface @openclaw @LlamaIndex @itsclelia @elonmusk @BarackObama @taylorswift13 @cristiano