Jared Sulzdorf
1.8K posts

Jared Sulzdorf
@j_sulz
I like pretty things, functional things, funny things, food things, and computer things. Not necessarily in that order.
Seattle, WA · Joined September 2008
287 Following · 379 Followers
Jared Sulzdorf reposted

The Hugging Face Hub team is on a tear recently:
> You can create custom apps with domains on Spaces
> Edit GGUF metadata on the fly
> 100% of the Hub is powered by Xet - faster, more efficient
> Responses API support for ALL Inference Providers
> MCP-UI support for the HF MCP Server
> Search papers by org
> Showcase repository size in the UI
and a lot more - excited for the coming weeks/months as we continue to improve the overall UX! 🤗

Jared Sulzdorf reposted

Today, we've finalized this first phase of migrating the Hub to a new, modern storage system. One that's built to scale with AI builders of today and tomorrow. huggingface.co/blog/from-file…
There's still a lot of work to do, but we're excited for what's next. 💪

The Hub is 100% on Xet. 🚀
A little over a year ago, @huggingface acquired @xetdata to unlock the next phase of growth in models and datasets. huggingface.co/blog/xethub-jo…
In April, there were 1,000 Hugging Face repos on Xet. Now every repo (over 6M) on the Hub is on Xet.

Jared Sulzdorf reposted

@SIGKITTEN @pcuenq @julien_c Thanks @pcuenq appreciate you helping out here! @SIGKITTEN safe to say you're downloading on a Mac? We've run afoul of the small default file descriptor limit there 😅 Like Pedro notes, upping it with `ulimit -n [BIG NUMBER]` will do the trick.
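For context on the fix above: macOS ships with a low default soft limit on open file descriptors (often 256 per shell), which parallel downloaders can exhaust. A quick sketch of inspecting and raising it for the current shell session — the value 4096 is just an example, not a recommendation from the thread:

```shell
# Show the current soft and hard open-file limits for this shell
ulimit -Sn
ulimit -Hn

# Raise the soft limit for this session only; the new value must not
# exceed the hard limit printed above (4096 is an arbitrary example)
ulimit -n 4096
ulimit -Sn
```

The change lasts only for the current shell; add it to your shell profile to make it stick across sessions.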
Jared Sulzdorf reposted

New blog post 🚨 Every data engineer should read it
@kszucs_ (@ApacheArrow PMC) explains how to drastically speed up Parquet file uploads and downloads.
Yes, it can easily outpace S3.
Best part: the feature enabling this is open source
Link in 🧵

Jared Sulzdorf reposted

A new Pandas feature landed 3 days ago and no one noticed.
Upload ONLY THE NEW DATA to dedupe-based storage like @huggingface (Xet). Data that already exists in other files doesn't need to be uploaded.
Possible thanks to the recent addition of Content Defined Chunking for Parquet.
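The idea behind Content Defined Chunking: boundaries are chosen from the bytes themselves rather than at fixed offsets, so inserting data early in a file shifts everything yet the downstream chunks realign and keep their hashes — only the few chunks touching the edit need uploading. A minimal toy sketch (tiny window and chunk sizes for illustration; this is not the actual Parquet or Xet implementation):

```python
import hashlib

W = 8       # rolling window size in bytes (toy-sized; real systems use more)
MASK = 0xF  # cut when the low 4 bits of the window hash are zero (~16-byte chunks)

def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries. A cut depends only on
    the last W bytes, so boundaries resynchronize after an edit."""
    chunks, start = [], 0
    for i in range(W - 1, len(data)):
        window = data[i - W + 1 : i + 1]
        h = int.from_bytes(hashlib.sha256(window).digest()[:4], "big")
        if h & MASK == 0:
            chunks.append(data[start : i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def chunk_ids(data: bytes) -> set[str]:
    return {hashlib.sha256(c).hexdigest() for c in cdc_chunks(data)}

v1 = bytes(range(256)) * 8   # the file version already stored remotely
v2 = b"NEW ROWS" + v1        # an edit that prepends new data
shared = chunk_ids(v1) & chunk_ids(v2)
# Nearly every chunk of v2 already exists in v1's chunk set, so a
# dedupe-aware client would upload only the few chunks near the edit.
```

With fixed-size chunking, the same prepend would shift every boundary and force a full re-upload; content-defined cuts are what make the "only the new data" claim possible.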


We've moved the first 20PB from Git LFS to Xet on @huggingface without any interruptions; now we're migrating the rest of the Hub. We got this far by focusing on the community first.
Here's a deep dive on the infra making this possible and what's next: huggingface.co/blog/migrating…

These are hard numbers to put into context, but let's try.
The latest run of Common Crawl from @CommonCrawl was 471 TB.
We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours.
🤯🤯🤯
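A quick back-of-the-envelope check on that claim (assuming decimal terabytes; the implied transfer rate is my arithmetic, not a figure from the thread):

```python
crawl_bytes = 471e12        # latest Common Crawl run, ~471 TB
seconds = 2 * 3600          # "about two hours"

# Implied sustained throughput, in gigabytes per second
rate_gb_s = crawl_bytes / seconds / 1e9
# works out to roughly 65 GB/s at peak
```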

It's been a while since I took a step back and looked at our progress migrating @huggingface from Git LFS to Xet, but every time I do, it's mind-boggling.
A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today?
🤗 700,000 users/orgs
📈 350,000 repos
🚀 15PB

Jared Sulzdorf reposted

Xet is now the default storage for new builders on @huggingface!
What it means for 🤗Datasets:
- Deduplicated downloads and uploads for speed⚡
- Works with the new Parquet CDC writer, robust to insert/delete/edits 💪
@ApacheParquet has a bright future on HF :)


@yukiarimo Faster uploads and downloads and storage that will let the Hub continue to scale!
Xet uses a chunk-based versioning approach instead of a file-based one. That, along with supporting infra and a Rust client, makes for snappier transfers.
More details here: huggingface.co/blog/from-file…

New users and organizations can say goodbye to LFS on @huggingface; Xet is now the default storage for new builders on the Hub 🚀🚀🚀
Just sign up for an account, create a new repo, pip install huggingface_hub and you're off!
huggingface.co/changelog/xet-…

To migrate your existing repos to Xet, sign up here huggingface.co/join/xet
And we'll take care of the rest 🤗
