jasmine wang

80 posts

@jasminechenwang

Yoga, Cognitive Science, Open Source technology, Startups, Sunset at the Beach

Joined June 2022
43 Following · 48 Followers
jasmine wang reposted
LanceDB @lancedb
Only 2 weeks away from Data Engineering Open Forum 2026 in SF on April 16! Join us for "Powering Netflix's Multimodal Feature Engineering at Scale" and dive into how @netflix curates multimodal features across large video & image corpora, with LanceDB serving as the core storage and query layer for multimodal data.
[image]
1 reply · 2 reposts · 5 likes · 372 views
jasmine wang reposted
Xuanwo @OnlyXuanwo
Working at @lancedb is kinda interesting, because we are forced to re-think everything at 100B scale. How big is 100B? 100B rows of 768-dim embeddings ≈ 300TB of raw vectors alone, before text, images, or indexes. At 1M writes/sec, it still takes 28 hours non-stop to fill 100B rows. The Milky Way has ~100-400B stars. We're basically building a database at galaxy scale. If you've read to the end and think this is cool, DM me your resume. We're looking for sharp minds to join us.
9 replies · 10 reposts · 169 likes · 15.1K views
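The back-of-envelope figures above are easy to verify. A quick check in Python, assuming float32 vectors (4 bytes per dimension), which the ~300TB claim implies:

```python
# Verify the scale claims: 100B rows of 768-dim embeddings, 1M writes/sec.
ROWS = 100_000_000_000      # 100B rows
DIM = 768                   # embedding dimensionality
BYTES_PER_FLOAT = 4         # float32 (assumption)

raw_bytes = ROWS * DIM * BYTES_PER_FLOAT
raw_tb = raw_bytes / 1e12   # decimal terabytes
print(f"raw vectors: {raw_tb:.1f} TB")        # ≈ 307.2 TB, i.e. "~300TB"

WRITES_PER_SEC = 1_000_000
hours_to_fill = ROWS / WRITES_PER_SEC / 3600
print(f"time to fill: {hours_to_fill:.1f} h")  # ≈ 27.8 h, i.e. "~28 hours"
```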
jasmine wang reposted
LanceDB @lancedb
1/3 Geospatial support just landed in Lance. And no new storage format work was required. Because Lance is Arrow-native, GeoArrow extension types work out of the box. Geometry columns are preserved end-to-end with zero special casing.
[image]
1 reply · 5 reposts · 18 likes · 738 views
jasmine wang reposted
LanceDB @lancedb
1/4 Branching for ML data shouldn’t slow down production. Iceberg branching → shared metadata bottlenecks. Delta shallow clone → isolation, but loses Git-like UX. We want both. Here’s how Lance unifies branching, tagging, and shallow clone for AI workloads 🧵
[image]
1 reply · 2 reposts · 7 likes · 536 views
jasmine wang reposted
Julien Chaumond @julien_c
in case you missed it @lancedb and HF are partnering up to unlock the next generation of large dataset storage on the Hub 🔥 And it's fire!
- Supports storing embeddings (and their indexes) directly alongside the data
- Vector search / similarity search is built-in
- Large multimodal datasets (text, images, video) just use the hf:// prefix: db = lancedb.connect("hf://datasets/julien-c/hub-stats-lance") 🔥🔥
[image]
0 replies · 16 reposts · 79 likes · 8.2K views
jasmine wang reposted
LanceDB @lancedb
1/6 Here’s a quick example of how to read @huggingface datasets via LanceDB. Start with opening a LanceDB connection to a dataset on the Hub using the hf:// prefix path.
[image]
1 reply · 2 reposts · 5 likes · 269 views
jasmine wang reposted
LanceDB @lancedb
1/5 Large multimodal blobs don’t have to break dataset workflows. Images and videos are often treated as external files, separate from metadata and indexes. Once datasets get large, that split makes exploration, curation, and training painful. Lance changes that on the 🤗 @huggingface Hub. 🧵👇
[GIF]
2 replies · 9 reposts · 21 likes · 2.5K views
jasmine wang reposted
LanceDB @lancedb
1/5 @lancedb 🫶🏻 @duckdb We’re happy to announce a new Lance extension for DuckDB! You can simply install this extension in DuckDB and point at your Lance datasets from within a DuckDB CLI or a Python script, while getting 𝗳𝘂𝗹𝗹 𝗦𝗤𝗟 𝗰𝗮𝗽𝗮𝗯𝗶𝗹𝗶𝘁𝗶𝗲𝘀 𝗼𝗻 𝘁𝗼𝗽 𝗼𝗳 𝗟𝗮𝗻𝗰𝗲 without copying your data!
[image]
1 reply · 8 reposts · 17 likes · 2.1K views
Brian Zhan @brianzhan1
After three years at CRV, I am stepping onto Striker Venture Partners' founding team, leading the firm's AI investments. Thanks to @BusinessInsider for covering the move.
[image]
79 replies · 27 reposts · 371 likes · 468.3K views
jasmine wang reposted
LanceDB @lancedb
1/7 🎨 In a world of infinite scroll, discovering art still feels like searching for a needle in a haystack. With SemanticDotArt, we flipped the question: What if you searched by mood, not just metadata? See how we did this in @lancedb 👇🏽
3 replies · 10 reposts · 18 likes · 3K views
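Searching "by mood, not just metadata" boils down to nearest-neighbor lookup over embeddings. A toy sketch with made-up 3-d "mood vectors" (the artworks, vectors, and mood labels here are illustrative only; a real system would embed artworks and queries with a model and store them in LanceDB):

```python
import math

# Made-up "mood embeddings" for artworks (3-d toy vectors, not real model output).
artworks = {
    "Starry Night": [0.9, 0.1, 0.3],   # dreamy, swirling
    "The Scream":   [0.1, 0.95, 0.2],  # anxious
    "Water Lilies": [0.8, 0.05, 0.6],  # serene
}

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search_by_mood(query_vec, k=2):
    # Rank artworks by similarity to the query's mood vector, return top-k titles.
    ranked = sorted(artworks.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [title for title, _ in ranked[:k]]

calm_query = [0.85, 0.0, 0.5]  # pretend this is an embedded "calm, peaceful" query
print(search_by_mood(calm_query))  # → ['Water Lilies', 'Starry Night']
```

The same ranking is what a vector database performs at scale, with an index instead of a full sort.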
jasmine wang reposted
changhiskhan @changhiskhan
This is a big milestone for Lance format. The F3 paper (dl.acm.org/doi/10.1145/37…) verified that Lance has THE fastest random access, essential for search, shuffle, and many other AI workloads. But it incorrectly assumed it was because of lack of compression. With 2.1, we show that, yes indeed, you can have your cake and eat it too. Not only does this release come with major improvements on compression without sacrificing performance, it also includes goodies like JSON and better nested data support. It's also a proof point of how extensible the encodings are in Lance. You can read our blog post for all the fun details.
LanceDB @lancedb:

💾 Lance File 2.1 Is Now Stable 🥳 Big news from the LanceDB team — Lance File Format 2.1 is officially stable❗️ This release solves one of the biggest challenges from 2.0: 👉 adding compression without sacrificing random access performance.

4 replies · 11 reposts · 45 likes · 6K views
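One standard way to get compression and random access at the same time is to compress fixed-size chunks independently and keep an offset index per chunk, so a single value can be fetched by decoding only its chunk. The sketch below illustrates that general technique only; the chunk size and function names are made up, and this is not Lance 2.1's actual encoding:

```python
import zlib

CHUNK = 1024  # fixed chunk size in bytes (illustrative choice)

def compress_chunked(data: bytes):
    # Compress each fixed-size chunk independently and record where each
    # compressed chunk starts, so any one chunk can be decoded alone.
    blobs, offsets, pos = [], [], 0
    for i in range(0, len(data), CHUNK):
        blob = zlib.compress(data[i:i + CHUNK])
        offsets.append(pos)
        blobs.append(blob)
        pos += len(blob)
    return b"".join(blobs), offsets

def read_byte(compressed: bytes, offsets, index: int) -> int:
    # Random access: locate the chunk containing `index`, decompress only it.
    chunk_no = index // CHUNK
    start = offsets[chunk_no]
    end = offsets[chunk_no + 1] if chunk_no + 1 < len(offsets) else len(compressed)
    chunk = zlib.decompress(compressed[start:end])
    return chunk[index % CHUNK]

data = bytes(i % 251 for i in range(10_000))
comp, offs = compress_chunked(data)
assert read_byte(comp, offs, 4321) == data[4321]
```

The tradeoff knob is the chunk size: smaller chunks mean cheaper random reads but worse compression ratios.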
jasmine wang reposted
Apache Spark @ApacheSpark
Join us for our webinar on Apache Spark™ and Lance Spark Connector with Jack Ye (@lancedb) on September 25! 👏
Learn how the Lance Spark Connector enables Apache Spark™ to work with Lance's AI-native multimodal storage.
✅ We'll look at how Spark can handle embeddings, images, videos, and documents with random access, indexing, and vector/blob support. We'll also cover integration with Hive Metastore, @unitycatalog_io, and examples of workflows for ingestion, analytics, feature engineering, and retrieval-augmented generation, using one dataset, without format conversions.
🔗 REGISTER: luma.com/76o36xuk
📅 September 25, 2025
⏰ 9:30 – 10:30 AM PST
📍 Online
#apachespark #spark #oss #opensource #lancedb #lance #sparkconnector @2twitme
[image]
1 reply · 1 repost · 4 likes · 1.1K views
jasmine wang reposted
LanceDB @lancedb
When building a columnar file reader, it becomes clear that 𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 𝗶𝘀 𝗻𝗼𝘁 𝗷𝘂𝘀𝘁 𝗮𝗻 𝗮𝗯𝘀𝘁𝗿𝗮𝗰𝘁 𝗰𝗼𝗻𝗰𝗲𝗽𝘁. (t.ly/3AyJh) It is the set of rules that determines how every byte of data is stored and accessed on disk. A few months ago, Weston Pace set out to 𝘀𝗼𝗹𝘃𝗲 𝗮 𝘁𝗿𝗶𝗰𝗸𝘆 𝗽𝗿𝗼𝗯𝗹𝗲𝗺 𝗶𝗻 𝗟𝗮𝗻𝗰𝗲. Small values like integers and booleans benefit from maximum compression, even if that means a bit of read amplification. Large values like vector embeddings, images, and documents need lightning fast random access without excessive RAM overhead.
1 reply · 2 reposts · 18 likes · 2.1K views
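The two regimes described above can be sketched as a size-based encoding dispatch: small values get packed and compressed hard (accepting some read amplification), large values get per-value addressing for fast random access. Everything in this sketch, including the threshold, the strategy names, and the function, is hypothetical and only illustrates the tradeoff; it is not Lance's actual encoder:

```python
# Hypothetical per-column encoding dispatcher (illustrative only).
SMALL_VALUE_THRESHOLD = 128  # bytes; made-up cutoff between the two regimes

def choose_encoding(avg_value_size: int) -> str:
    if avg_value_size <= SMALL_VALUE_THRESHOLD:
        # Small values (ints, bools): pack many per block and compress hard.
        # Reading one value may decode a whole block (read amplification),
        # but the storage savings dominate.
        return "block-compressed"
    # Large values (embeddings, images, documents): store with per-value
    # offsets so one value can be fetched without decoding its neighbors
    # or buffering excessive data in RAM.
    return "offset-indexed"

print(choose_encoding(4))     # e.g. an int32 column  → "block-compressed"
print(choose_encoding(3072))  # e.g. a 768-dim float32 embedding → "offset-indexed"
```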
jasmine wang reposted
LanceDB @lancedb
The data prep bottleneck for fine-tuning LLMs is a common challenge. 𝗢𝘂𝗿 𝗻𝗲𝘄 𝗶𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗶𝗼𝗻 𝘄𝗶𝘁𝗵 𝗠𝗲𝘁𝗮'𝘀 𝗦𝘆𝗻𝘁𝗵𝗲𝘁𝗶𝗰 𝗗𝗮𝘁𝗮 𝗞𝗶𝘁 is 𝗵𝗲𝗿𝗲 𝘁𝗼 𝗳𝗶𝘅 𝘁𝗵𝗮𝘁! It simplifies the entire workflow with a 𝘀𝘁𝗿𝗮𝗶𝗴𝗵𝘁𝗳𝗼𝗿𝘄𝗮𝗿𝗱 𝗖𝗟𝗜 for generating high-quality, synthetic datasets. The package 𝘂𝘀𝗲𝘀 𝘁𝗵𝗲 𝗟𝗮𝗻𝗰𝗲 𝗳𝗼𝗿𝗺𝗮𝘁, so you can store and retrieve massive multimodal datasets. 𝗗𝗼𝗰𝘀: lancedb.com/docs/integrati…
0 replies · 1 repost · 3 likes · 577 views
jasmine wang reposted
Andriy Mulyar @andriy_mulyar
- built by solid db people and hackable (we have a contributor at nomic to it)
- used by top ai companies / labs / products for its nice properties when used in training loops (e.g. midjourney has been using it since 2023) so probably not going anywhere
- feels like the right way to work with lots of embeddings from the devex perspective in both their low level API (lance) and db wrapper
1 reply · 2 reposts · 11 likes · 716 views
jasmine wang reposted
LanceDB @lancedb
We just published a 𝗻𝗲𝘄 𝗯𝗹𝗼𝗴 (lancedb.com/blog/multimoda…) on what the 𝗠𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝗟𝗮𝗸𝗲𝗵𝗼𝘂𝘀𝗲 actually does. The Lakehouse is 𝗳𝗼𝗿 𝘁𝗵𝗼𝘀𝗲 working with a mix of text, images, audio, and structured data - 𝘄𝗵𝗼 𝘄𝗶𝘀𝗵 𝘁𝗼 𝗮𝘃𝗼𝗶𝗱 𝘁𝗵𝗲 𝗽𝗮𝗶𝗻 of manual configuration. You can use it to build real AI systems without dealing with orchestration, DAGs, or custom infrastructure.
[image]
1 reply · 3 reposts · 11 likes · 1.3K views