March 2026 Crawl Archive Now Available
We are pleased to announce the release of the March 2026 crawl, containing 1.97 billion web pages, or 344.64 TiB of uncompressed content. We also observed a dramatic increase in fetches over IPv6, which we attribute to enabling Happy Eyeballs in the OkHttp library.
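Why would enabling Happy Eyeballs shift fetches toward IPv6? The following is a minimal Python sketch of the address-ordering step from RFC 8305 (Happy Eyeballs Version 2). Resolved addresses are interleaved by family, starting with IPv6, and connection attempts are then started in that order, each staggered by a short delay (250 ms is the RFC's recommended default) until one succeeds. This illustrates the RFC's algorithm and is not OkHttp's code. The function name `interleave_addresses` and the `(family, ip)` tuple format are assumptions made for the example.

```python
import socket
from typing import List, Tuple

# Hypothetical address format for this sketch: (socket.AF_INET6 or socket.AF_INET, ip_string)
Address = Tuple[int, str]


def interleave_addresses(addrs: List[Address], first_family_count: int = 1) -> List[Address]:
    """Order addresses as RFC 8305, section 4, suggests: begin with the
    preferred family (IPv6), then alternate families, keeping the original
    order within each family."""
    v6 = [a for a in addrs if a[0] == socket.AF_INET6]
    v4 = [a for a in addrs if a[0] == socket.AF_INET]
    ordered: List[Address] = []
    # Put `first_family_count` IPv6 addresses at the front of the list.
    ordered.extend(v6[:first_family_count])
    v6 = v6[first_family_count:]
    # Alternate IPv4 / IPv6 until both families are exhausted.
    while v4 or v6:
        if v4:
            ordered.append(v4.pop(0))
        if v6:
            ordered.append(v6.pop(0))
    return ordered
```

With two IPv4 and two IPv6 addresses (for example, from the documentation ranges 192.0.2.0/24 and 2001:db8::/32), the order becomes IPv6, IPv4, IPv6, IPv4. An IPv6 connection is therefore attempted first, and an IPv4 attempt starts only if the IPv6 one has not succeeded within the attempt delay.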
IPv6 Adoption Across the Top 100K Web Hosts
We probed the 100,000 most-linked web hosts for IPv6 support using the Common Crawl Web Graph. Only 36.9% are fully reachable over IPv6, with adoption ranging from 71% among the top 100 to 32% in the long tail.
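The post does not describe the probing method, so the following is only a minimal Python sketch under stated assumptions. `has_ipv6_address` checks whether DNS returns an IPv6 (AAAA) address for a host. This is a weaker test than "fully reachable", which would also require a successful connection over IPv6. `adoption_by_tier` turns `(rank, ipv6_ok)` pairs into per-tier percentages like the 71% and 32% figures above. Both function names and any tier boundaries are hypothetical.

```python
import socket
from typing import Dict, List, Tuple


def has_ipv6_address(host: str, port: int = 443) -> bool:
    """Return True if DNS yields at least one IPv6 address for `host`.
    Assumption: resolution only; this does not prove the host answers over IPv6."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
        return len(infos) > 0
    except socket.gaierror:
        return False


def adoption_by_tier(results: List[Tuple[int, bool]],
                     tiers: List[Tuple[str, int, int]]) -> Dict[str, float]:
    """results: (rank, ipv6_ok) pairs, where rank 1 is the most-linked host.
    tiers: (label, first_rank, last_rank), inclusive, with hypothetical boundaries.
    Returns the percentage of IPv6-capable hosts in each tier (0.0 for empty tiers)."""
    out: Dict[str, float] = {}
    for label, first, last in tiers:
        in_tier = [ok for rank, ok in results if first <= rank <= last]
        out[label] = 100.0 * sum(in_tier) / len(in_tier) if in_tier else 0.0
    return out
```

For example, `adoption_by_tier(results, [("top 100", 1, 100), ("long tail", 10_001, 100_000)])` reports both tiers. Here the long-tail cutoff is an illustrative choice, not the one used in the study.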
Common Crawl Foundation would like to share with you an updated overview of our organization as of March 2026.
Please let us know if we can be helpful.
drive.google.com/file/d/1ww2R0x…
I've been dabbling with Claude Code. Using English as a programming language, I wrote a C compiler. medium.com/@kazabyte/engl…
github.com/commoncrawl/cc…
Summary of changes
This PR contains a substantial redesign and refactoring of the following:
- Interactive charts instead of static images
- New domain lookup tool for plotting harmonic centrality (HC) and PageRank (PR) over time, including a side-by-side comparison of two domains
- Combine the avgindegree and avgoutdegree plots into avgdegree (closes "Merge plots avgoutdegree and avgindegree into avgdegree" #4)
- Add links to the research papers on harmonic centrality (closes "Add links to research papers in the section explaining harmonic centrality" #5)
- Make masthead image disappear quicker via parallax scrolling so that content is reached faster
- Substantial mobile/responsive UX improvements
- Merge the rank tables into one unified table for a better UX
- Use proper links and sanitize anchor tags with rel= and target= attributes
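The harmonic centrality (HC) plotted by the lookup tool can be illustrated with a small Python sketch. It assumes the standard definition from Boldi and Vigna: H(x) = Σ_{y ≠ x} 1 / d(y, x), where d(y, x) is the shortest-path distance from y to x and unreachable nodes contribute 0. The sketch runs one breadth-first search per node on the reversed graph. That approach works for a toy example, but it is not how a web-scale graph would be processed. The `average_degree` helper shows that in a directed graph every edge adds exactly one in-link and one out-link, so average in-degree and average out-degree coincide. This is consistent with merging the two plots into one. All names here are hypothetical and do not come from the PR.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]  # adjacency list: node -> nodes it links to


def transpose(graph: Graph) -> Graph:
    """Reverse every edge, so BFS from x finds distances *to* x."""
    rev: Graph = {node: [] for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev


def harmonic_centrality(graph: Graph) -> Dict[str, float]:
    """H(x) = sum over y != x of 1 / d(y, x); unreachable y contribute 0."""
    rev = transpose(graph)
    scores: Dict[str, float] = {}
    for x in rev:
        dist = {x: 0}
        queue = deque([x])
        while queue:
            u = queue.popleft()
            for v in rev[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        scores[x] = sum((1.0 / d for node, d in dist.items() if node != x), 0.0)
    return scores


def average_degree(graph: Graph) -> float:
    """Edges per node; equals both average in-degree and average out-degree."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    edges = sum(len(targets) for targets in graph.values())
    return edges / len(nodes) if nodes else 0.0
```

For the chain a → b → c, node c is one hop from b and two hops from a, so H(c) = 1 + 1/2 = 1.5. Node a has no incoming links, so H(a) = 0.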
February 2026 Crawl Archive Now Available
We are pleased to announce the release of the February 2026 crawl, containing 2.1 billion web pages (or 363 TiB of uncompressed content). Captures are from 45.5 million hosts across 37.1 million registered domains.
Introducing the New Examples & Resources Browser
We've replaced our old Examples and Use Cases pages with a single searchable, filterable browser. 119 resources from 115 contributors, all in one place. Search, filter by type or language, sort, and share links. We welcome community submissions.
AI Plumbers at FOSDEM’26
Common Crawl was invited to the AI Plumbers unconference held at FOSDEM this year. The contrast between the 100 people at the unconference and the 10,000 people at the main event couldn't be bigger.
Announcing our latest paper: CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
In collaboration with @CommonCrawl, @MLCommons, and @JohnsHopkins, we worked with 80+ native-speaker annotators to build a language identification (LID) benchmark on actual Common Crawl text covering 109 languages. Existing evaluations overestimate how well LangID works on web data.
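Scoring a LID model against a benchmark like this comes down to comparing predicted and gold language labels. The following minimal Python sketch assumes macro-averaged F1 as the metric. The post does not state which metric the paper uses, and `per_language_f1` and `macro_f1` are hypothetical names. Each language is scored one-vs-rest with F1 = 2·TP / (2·TP + FP + FN), and the scores are then averaged with equal weight per language.

```python
from collections import Counter
from typing import Dict, List


def per_language_f1(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """One-vs-rest F1 for every language label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but the text was not p
            fn[g] += 1  # missed the true language g
    scores: Dict[str, float] = {}
    for lang in set(gold) | set(pred):
        denom = 2 * tp[lang] + fp[lang] + fn[lang]
        scores[lang] = 2 * tp[lang] / denom if denom else 0.0
    return scores


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-language F1 scores."""
    scores = per_language_f1(gold, pred)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

Macro averaging gives each language equal weight, so errors on rarer languages are not masked by high accuracy on frequent ones.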