Jared Sulzdorf

1.8K posts

@j_sulz

I like pretty things, functional things, funny things, food things, and computer things. Not necessarily in that order.

Seattle, WA · Joined September 2008
287 Following · 379 Followers
Jared Sulzdorf retweeted
Vaibhav (VB) Srivastav @reach_vb
Any git power users in my mutuals/timeline? We have a new and faster git experience coming up on Hugging Face and we'd love to get feedback from you! Comment or DM and I'll hook you up!
Jared Sulzdorf retweeted
Vaibhav (VB) Srivastav @reach_vb
The Hugging Face Hub team is on a tear recently:
> You can create custom apps with domains on Spaces
> Edit GGUF metadata on the fly
> 100% of the Hub is powered by Xet - faster, efficient
> Responses API support for ALL Inference Providers
> MCP-UI support for HF MCP Server
> Search papers based on the Org
> Showcase repository size on the UI
and a lot more - excited for the coming weeks/months as we continue to improve the overall UX! 🤗
Jared Sulzdorf retweeted
Georgi Gerganov @ggerganov
Hugging Face just shipped in-browser GGUF editing. It allows you to edit GGUF metadata in the comfort of your browser, without even having to download the full model. This feature is enabled by the Xet technology, which makes partial file updates possible.
Jared Sulzdorf @j_sulz
Today, we've finalized this first phase of migrating the Hub to a new, modern storage system. One that's built to scale with AI builders of today and tomorrow. huggingface.co/blog/from-file… There's still a lot of work to do, but we're excited for what's next. 💪
Jared Sulzdorf retweeted
Quentin Lhoest 🤗 @lhoestq
Let me explain why Hugging Face Datasets storage is faster than S3 + why today's release changes everything 🧵
Jared Sulzdorf @j_sulz
@SIGKITTEN @pcuenq @julien_c Thanks @pcuenq appreciate you helping out here! @SIGKITTEN safe to say you're downloading on a Mac? We've run afoul of the small default file descriptor limit there 😅 Like Pedro notes, upping it with `ulimit -n [BIG NUMBER]` will do the trick.
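For readers who would rather raise the limit from inside a Python process than in the shell, here is a minimal sketch. It is an illustration and not something from the thread: the helper name `raise_fd_limit` and the target of 4096 are assumptions, and it relies on the Unix-only standard-library `resource` module.

```python
import resource


def raise_fd_limit(target: int = 4096) -> tuple[int, int]:
    """Raise the soft open-file limit toward `target` (hypothetical helper)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    # The soft limit may never exceed the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass  # some systems cap the value below the reported hard limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_fd_limit())  # (soft, hard) after the attempt
```

The change only applies to the current process and its children, much like `ulimit -n` only applies to the current shell session.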
Julien Chaumond @julien_c
Please don’t download the weights all at once 🙏 or our servers will melt
Jared Sulzdorf retweeted
Quentin Lhoest 🤗 @lhoestq
New blog post 🚨 Every data engineer should read it. @kszucs_ (@ApacheArrow PMC) announces how to drastically speed up Parquet file uploads and downloads. Yes, it can easily outspeed S3. Best part: the feature enabling this is open source. Link in 🧵
Jared Sulzdorf retweeted
Quentin Lhoest 🤗 @lhoestq
A new Pandas feature landed 3 days ago and no one noticed. Upload ONLY THE NEW DATA to dedupe-based storage like @huggingface (Xet). Data that already exist in other files don't need to be uploaded. Possible thanks to the recent addition of Content Defined Chunking for Parquet.
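To make the idea concrete, here is a toy sketch of content-defined chunking with deduplication. It is not the Xet or Parquet implementation: the gear-style rolling hash, the `MASK` and `MIN_CHUNK` values, and the function names are all assumptions chosen for illustration. Chunk boundaries depend on the bytes themselves rather than on fixed offsets, so inserting data in the middle of a file leaves the chunks before the edit untouched and typically lets the chunker resynchronize shortly after it.

```python
import hashlib
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # random value per byte
MASK = (1 << 6) - 1  # cut a chunk when the low 6 hash bits are zero
MIN_CHUNK = 16       # avoid very small chunks


def chunk(data: bytes) -> list[bytes]:
    """Split data at boundaries chosen by a rolling hash of its content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if i + 1 - start >= MIN_CHUNK and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def digest(c: bytes) -> str:
    return hashlib.sha256(c).hexdigest()


original = bytes(_rng.getrandbits(8) for _ in range(4096))
edited = original[:2000] + b"NEW ROWS" + original[2000:]

stored = {digest(c) for c in chunk(original)}  # what the server already has
edited_chunks = chunk(edited)
to_upload = [c for c in edited_chunks if digest(c) not in stored]

print(f"{len(to_upload)} of {len(edited_chunks)} chunks need uploading")
```

With fixed-size chunks, the same insertion would shift every later boundary and force nearly everything after the edit to be re-uploaded.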
Jared Sulzdorf @j_sulz
A sneaky part of making this all work is our backward compatibility with Git LFS. This allows us to roll out a significant protocol change without forcing workflow changes. We call this the Git LFS Bridge internally and, like our migration process, its power is in its simplicity.
Jared Sulzdorf @j_sulz
You can see over the past few months some of the biggest migrations show up in our cluster throughput. Each spike corresponds to a significant migration (where we download from LFS and upload to Xet) with the baseline steadily increasing to just shy of 100 Gb/s
Jared Sulzdorf @j_sulz
We've moved the first 20PB from Git LFS to Xet on @huggingface without any interruptions; now we're migrating the rest of the Hub. We got this far by focusing on the community first. Here's a deep dive on the infra making this possible and what's next: huggingface.co/blog/migrating…
Jared Sulzdorf @j_sulz
These are hard numbers to put into context, but let's try. The latest run of Common Crawl from @CommonCrawl was 471 TB. We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours. 🤯🤯🤯
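The arithmetic can be checked with a short sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits) and takes "peak upload speed" to be the 577 Gb/s figure quoted elsewhere in this thread; both are assumptions about how the numbers were meant.

```python
TB = 10**12    # bytes, decimal units (assumption)
PB = 10**15    # bytes
GBIT = 10**9   # bits

crawl_bytes = 471 * TB
peak_bits_per_second = 577 * GBIT  # peak upload speed quoted in the thread

# bytes -> bits, divide by throughput, then seconds -> hours
hours = crawl_bytes * 8 / peak_bits_per_second / 3600
total_pb = 32 * crawl_bytes / PB

print(f"one crawl at peak speed: {hours:.2f} hours")
print(f"32 crawls: {total_pb:.1f} PB")
```

Under these assumptions one crawl takes roughly 1.8 hours, in line with "about two hours", and 32 crawls come to roughly 15 PB, consistent with the 15PB total reported in the thread.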
Jared Sulzdorf @j_sulz
Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June, we hit upload speeds of 577Gb/s (crossing 500Gb/s for the first time).
Jared Sulzdorf @j_sulz
It's been a bit since I took a step back and looked at our progress migrating @huggingface from Git LFS to Xet, but every time I do, it's mind-boggling. A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today?
🤗 700,000 users/orgs
📈 350,000 repos
🚀 15PB
Jared Sulzdorf retweeted
Quentin Lhoest 🤗 @lhoestq
Xet is now the default storage for new builders on @huggingface!
What it means for 🤗 Datasets:
- Deduplicated downloads and uploads for speed ⚡
- Works with the new Parquet CDC writer, robust to insert/delete/edits 💪
@ApacheParquet has a bright future on HF :)
Jared Sulzdorf @j_sulz
@yukiarimo Faster uploads and downloads, and storage that will let the Hub continue to scale! Xet uses a chunk-based versioning approach instead of a file-based one. That, along with supporting infra and a Rust client, makes for snappier transfers. More details here: huggingface.co/blog/from-file…
Jared Sulzdorf @j_sulz
New users and organizations can say goodbye to LFS on @huggingface; Xet is now the default storage for new builders on the Hub 🚀🚀🚀 Just sign up for an account, create a new repo, pip install huggingface_hub and you're off! huggingface.co/changelog/xet-…