LanceDB

1.1K posts

@lancedb

The multimodal lakehouse for AI, accelerating large-scale data curation and feature engineering so teams can build better models faster.

San Francisco, CA · Joined April 2023

65 Following · 4.4K Followers

LanceDB @lancedb · 13h

@Ceph has never had native nearest neighbor search, so RAG meant bolting on a vector database and running two systems. 🖥️ July 16, @seagate's Christine Bassani and @ibm's Kyle Bader are joined by Prashanth Rao @tech_optimist from LanceDB to show embedding search built directly into Ceph, through the S3 Vectors API. Billion-scale, multi-tenant, same storage stack you already run. 🖇️ Register: na2.hubs.ly/H06F5HW0

LanceDB @lancedb · 1d

Embodied AI runs on complete real-robot trajectories covering long-horizon deformable manipulation, dual-arm coordination, and recovery from failure. The bottleneck is rarely the model. It's the data. Lion Rock rebuilt their foundation on Lance. One copy in S3, read and written in place. Long videos cut into GOP blobs so you keep video-level compression and still get frame-level random access. Actions, states, video, and annotations land in a single table row. The result is 1.7-6.0x faster random reads than their previous system and roughly 42% storage savings, with no high-frequency action data lost. Read how they did it: na2.hubs.ly/H06Cvb90

LanceDB @lancedb · 5d

@cloudera @puppyquery @googlecloud 🔗 More in June newsletter: lancedb.com/blog/newslette… 6/6

LanceDB @lancedb · 5d

@cloudera @puppyquery @googlecloud 📅 Ceph: Object Storage Meets Vector Search • July 16 • Virtual: brighttalk.com/webcast/663/66… 5/6

LanceDB @lancedb · 5d

🌟 June's highlight: Lance averaged 140ms commit latency across a 10,000-commit S3 benchmark, versus 534ms for Delta Lake and 457ms for Iceberg. lancedb.com/blog/a-metadat… 1/6

LanceDB @lancedb · Jul 7

And that's a wrap on World's Fair in SF 🎉 Thanks to everyone who stopped by the booth to say hi and dive into LanceDB internals. Some of the best conversations happen at a conference booth and this year was no exception. It was great meeting everyone who came out to SPIN SF with @theoryvc & @ollama and to our USA World Cup watch party with @ExaAILabs, @ExtendHQ, & @fastinoAI last week. Both were packed, both were a good time, and we'd absolutely do them again. See you at the next one 👋

LanceDB @lancedb · Jul 6

How do you trace one number in a 200-page ESG report back to the exact page it came from? We dug into that with the @llama_index team behind LiteParse. We tested five ways to retrieve evidence across six sustainability reports and 50 labeled questions. Fusing pages, chunks, and figures across separate LanceDB tables, joined by one page_id, got 82% any-page-hit@5 at 4.7ms. An agent answering from that got 74% right. Full pipeline and benchmarks: lancedb.com/blog/from-mess…
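
The post does not say how the three result lists were fused. One common option is reciprocal rank fusion keyed on the shared page_id. What follows is a minimal sketch under that assumption: the ranked lists, the page_id values, and both function names are hypothetical stand-ins for the per-table search results, and the hit check is only an approximation of what "any-page-hit@5" might measure.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a page_id's score."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        seen = set()
        for rank, page_id in enumerate(ranked, start=1):
            # Several chunks can point at the same page; count only the best rank.
            if page_id in seen:
                continue
            seen.add(page_id)
            scores[page_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by page_id so the order is deterministic.
    return sorted(scores, key=lambda p: (-scores[p], p))


def any_page_hit_at_k(fused, gold_pages, k=5):
    """True if at least one labeled evidence page appears in the top k."""
    return any(page_id in gold_pages for page_id in fused[:k])


# Hypothetical per-table results, already mapped to their page_id.
pages = ["r1_p12", "r1_p40", "r1_p07"]
chunks = ["r1_p40", "r1_p12", "r1_p41", "r1_p40"]
figures = ["r1_p40", "r1_p88"]

fused = rrf_fuse([pages, chunks, figures])
hit = any_page_hit_at_k(fused, gold_pages={"r1_p88"}, k=5)
```

Fusing on ranks rather than raw scores sidesteps the problem that distances from different tables and indices are not directly comparable.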

LanceDB @lancedb · Jul 1

LibriSpeech in Lance format on @huggingface. One table holds FLAC audio, transcripts, speaker metadata, and transcript embeddings, with search and FTS indices pre-built. Semantic search over transcripts never touches the audio column, even at full scale. huggingface.co/datasets/lance…

LanceDB @lancedb · Jun 30

Qwen2.5-VL-3B QLoRA fine-tuning on TextVQA: the vision tower is frozen during training but most pipelines still run it on every epoch. Compute the embeddings once, add them as a new column on the same Lance table as the raw images, no table rewrite, then drop the vision tower from the training loop entirely. lancedb.com/blog/faster-vl…
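
The caching idea itself is library-independent. Below is a rough sketch that assumes a hypothetical frozen_vision_tower function and uses a plain list of dict rows as a stand-in for the Lance table. It does not show the actual Lance column-add call, and all names in it are illustrative only.

```python
# Counts encoder invocations so the saving is visible.
calls = {"vision_tower": 0}


def frozen_vision_tower(image_bytes):
    """Hypothetical stand-in for a frozen encoder: deterministic, never updated."""
    calls["vision_tower"] += 1
    return [len(image_bytes), sum(image_bytes) % 256]


def add_embedding_column(rows):
    """Run the frozen encoder once per row and store the result beside the image."""
    for row in rows:
        if "image_embedding" not in row:
            row["image_embedding"] = frozen_vision_tower(row["image"])


def train(rows, epochs):
    """Toy training loop: reads the cached column and never calls the encoder."""
    steps = 0
    for _ in range(epochs):
        for row in rows:
            _ = row["image_embedding"]  # would feed the trainable layers here
            steps += 1
    return steps


rows = [
    {"image": b"\x01\x02\x03", "question": "what does the sign say?"},
    {"image": b"\x04\x05", "question": "what brand is shown?"},
]

add_embedding_column(rows)      # one encoder pass over the data
steps = train(rows, epochs=3)   # three epochs, zero further encoder calls
```

Because the encoder is frozen, its output for a given image never changes, so the number of encoder calls equals the number of rows instead of rows times epochs.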

LanceDB @lancedb · Jun 29

@aiDotEngineer @Theoryvc @ollama 3/3 Wednesday 7/1: Team USA watch party with @ExaAILabs, @FastinoAI, and @ExtendHQ. Doors at 4:30, game at 5. Register: luma.com/hm58nlsn

LanceDB @lancedb · Jun 29

@aiDotEngineer 2/3 Two side events if you want off the floor 🕺 Tuesday 6/30: LocalServe at SPIN SF with @TheoryVC and @Ollama. Ping pong tournament, drinks, and infra conversations. Register: luma.com/localserve

LanceDB @lancedb · Jun 29

1/3 At @aiDotEngineer World's Fair this week. Booth B1, come say hi, grab swag, or get your hardest data stack questions answered by the people who built it. Building training pipelines, curation workflows, or feature engineering on multimodal data at scale? Come find us!

LanceDB @lancedb · Jun 25

@prestodb 3/ Shoutout to Jianjian Xie, Lance maintainer and Senior SWE at Uber, for building this and getting it merged into the @prestodb project. Connector architecture and setup guide: prestodb.io/blog/2026/06/2…

LanceDB @lancedb · Jun 25

2/ How it works:
→ Each Lance fragment maps 1:1 to a Presto split, so reads fan out across workers automatically
→ SQL filters compile down to the native Rust scanner, pruning rows on disk before they are loaded into JVM memory
→ Arrow RecordBatches stream across JNI with no serialization overhead

LanceDB @lancedb · Jun 25

1/ Lance is now a native @prestodb connector. Run distributed SQL across Lance tables and join them against Hive, Iceberg, or Postgres in a single query.

LanceDB @lancedb · Jun 24

Take a break from @aiDotEngineer World's Fair next week to cheer on Team USA at the World Cup with us, @ExaAILabs, @ExtendHQ, and Pioneer by @fastinoAI ⚽ 🎉 Come for the food, drinks, and soccer. Stay for the conversations. 📅 Wed July 1 @ 4:30PM 🔗 Space is limited! luma.com/hm58nlsn

LanceDB @lancedb · Jun 24

Updating a label on a row that contains a 50MB video shouldn't require Spark to read that video. Lance Blob V2 in Lance Spark keeps blob columns as descriptors through the plan. Bytes materialize only on write. A blob column stays a column. The SQL stays the same. What changes is what Spark actually carries through the plan. lancedb.com/blog/lance-blo…
