LanceDB

918 posts

@lancedb

Developer-friendly, open source AI-Native Multimodal Lakehouse https://t.co/wXn4tw5ySn

San Francisco, CA · Joined April 2023
62 Following · 4.1K Followers
LanceDB@lancedb·
@huggingface @openclaw 6/7 🗓️ Upcoming Events
- Data Engineering Open Forum (SF), Apr 16
- TokioConf (Portland), Apr 20–22
LanceDB@lancedb·
1/7 LanceDB March Newsletter is out — focused on performance + scale 🔥 Here’s what shipped ↓
LanceDB@lancedb·
Bottom line: @lancedb's JSONB gives AI builders the same storage efficiency as Variant, with more flexibility, and no vendor lock-in. 3/3 Read the full blog post for details 👇🏽 lancedb.com/blog/lance-jso…
LanceDB@lancedb·
Agent data (tool calls, chain-of-thought traces, RAG context) is 𝘁𝗲𝘅𝘁-𝗵𝗲𝗮𝘃𝘆, 𝗱𝗲𝗲𝗽𝗹𝘆 𝗻𝗲𝘀𝘁𝗲𝗱, and distinct between rows. It's not structured metadata. That's exactly where @lancedb's JSONB matches Variant in storage size, while offering more flexibility. 2/3
LanceDB@lancedb·
𝘿𝙤 𝙮𝙤𝙪 𝙧𝙚𝙖𝙡𝙡𝙮 𝙣𝙚𝙚𝙙 𝙑𝙖𝙧𝙞𝙖𝙣𝙩 𝙛𝙤𝙧 𝙮𝙤𝙪𝙧 𝘼𝙄 𝙙𝙖𝙩𝙖? We benchmarked Lance JSONB vs Parquet Variant on real-world JSON workloads. On text-heavy data with mixed schemas, Variant and @lancedb's JSONB are within 𝟬-𝟴% of one another in storage size, essentially equal. Variant's 𝟮-𝟰x storage advantage only appears when documents share the same structure with short, repetitive fields. 1/3
LanceDB@lancedb·
@LlamaIndex @itsclelia 3/4 This avoids splitting modalities across systems and losing context between stages. The agent can retrieve what it needs, in the form it needs. On our eval dataset, this setup reaches near-perfect accuracy on complex QA. Full breakdown: lancedb.com/blog/smart-par…
LanceDB@lancedb·
3/3 Compression is applied without breaking the access path. Less data read → faster batch fetch → higher GPU utilization. Benchmarks: lancedb.com/blog/lance-for…
LanceDB@lancedb·
2/3 With Lance format v2.2:
- ~50% smaller than Parquet on text-heavy data
- ~75x faster random blob reads (image/video fetch)
- Sampling + filtering performance stays flat
- No changes to application code
LanceDB@lancedb·
1/3 If your dataset doubles, your storage cost shouldn’t. And your GPUs shouldn’t slow down reading it. Most formats force a tradeoff between compression and access speed. Lance format v2.2 doesn’t.
LanceDB@lancedb·
3/3 Because Lance integrates both the file format and table format, fragment metadata tracks which blob objects are referenced by each dataset version. That enables version-aware garbage collection and compaction without rewriting large blobs. Full design: lancedb.com/blog/lance-blo…
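To make the version-aware garbage collection idea concrete, here is a minimal sketch in plain Python. It is not Lance's implementation: the function name `collectable_blobs`, the string blob ids, and the dictionary layout are all hypothetical. It only shows the set arithmetic implied above: a blob can be dropped once no retained dataset version references it.

```python
from typing import Dict, Set


def collectable_blobs(
    versions: Dict[int, Set[str]],  # hypothetical: version number -> blob ids it references
    retained: Set[int],             # versions that must remain readable
    stored: Set[str],               # blob ids currently present in storage
) -> Set[str]:
    """Return the blob ids that no retained version references."""
    live: Set[str] = set()
    for version in retained:
        # Union the references of every version we still need to serve.
        live |= versions.get(version, set())
    # Anything stored but not live is safe to delete.
    return stored - live


if __name__ == "__main__":
    refs = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
    print(collectable_blobs(refs, retained={2, 3}, stored={"a", "b", "c"}))
```

Because only reference sets are compared, deciding what to collect never requires reading or rewriting the blobs themselves.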
LanceDB@lancedb·
2/3 Every blob is encoded as the same on-disk Arrow struct: (kind, position, size, blob_id, blob_uri). The kind field identifies how the blob is stored, but the logical column type stays the same. This lets the system route reads without changing schemas or APIs.
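A rough sketch of what routing on the kind field could look like, in plain Python rather than the Lance API. The five field names come from the post; the kind labels `"inline"` and `"external"`, the field types, and the `plan_read` helper are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical kind labels; the post does not list the actual kind values.
INLINE = "inline"
EXTERNAL = "external"


@dataclass(frozen=True)
class BlobDescriptor:
    """One descriptor shape for every blob, regardless of where the bytes live."""
    kind: str
    position: int
    size: int
    blob_id: Optional[int]
    blob_uri: Optional[str]


def plan_read(desc: BlobDescriptor, data_file: str) -> Tuple[str, int, int]:
    """Return (location, offset, length) for a blob read, chosen by kind."""
    if desc.kind == INLINE:
        # Bytes sit in the dataset's own data file.
        return (data_file, desc.position, desc.size)
    if desc.kind == EXTERNAL:
        # Bytes sit in a separate object named by blob_uri.
        if desc.blob_uri is None:
            raise ValueError("external blob needs a blob_uri")
        return (desc.blob_uri, desc.position, desc.size)
    raise ValueError(f"unknown blob kind: {desc.kind!r}")
```

The caller always works with the same descriptor type, so adding a new storage kind changes only the routing function, not the schema the application sees.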
LanceDB@lancedb·
1/3 Rewriting multi-GB blobs during compaction or schema evolution is one of the worst failure modes in multimodal datasets. Lance Blob V2 avoids this by defining blob storage directly in the **format design**. 🧵👇