Common Crawl Foundation

1.4K posts

@CommonCrawl

Common Crawl is a non-profit foundation dedicated to the Open Web.

San Francisco, CA · Joined February 2010

1.6K Following · 7.8K Followers

Common Crawl Foundation reposted

Financial Times @FT · 11h

Mistral CEO: AI companies should pay a content levy in Europe ft.trib.al/hKU8k0g | opinion

Common Crawl Foundation @CommonCrawl · 1d

commoncrawl.org/blog/march-202…

Common Crawl Foundation @CommonCrawl · 1d

March 2026 Crawl Archive Now Available

We are pleased to announce the release of the March 2026 crawl, containing 1.97 billion web pages, or 344.64 TiB of uncompressed content. We also observed a dramatic increase in fetches over IPv6, which is explained by Happy Eyeballs having been enabled in the OkHttp library.
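
Happy Eyeballs (RFC 8305) accounts for a jump like this because a client that gets both AAAA and A records interleaves the two address families, starting with the preferred one (usually IPv6), instead of trying addresses in plain resolver order. In Java, `InetAddress` lists IPv4 addresses first by default (the `java.net.preferIPv6Addresses` system property defaults to `false`). Below is a minimal Python sketch of just the interleaving step, assuming IPv6 is the preferred family and a "First Address Family Count" of 1. It is not OkHttp's code, and `interleave_addresses` is a hypothetical helper.

```python
from ipaddress import ip_address
from itertools import zip_longest


def interleave_addresses(addresses: list[str]) -> list[str]:
    """Reorder resolved addresses in the spirit of RFC 8305, section 4.

    Simplifying assumptions (not taken from OkHttp): IPv6 is the preferred
    family, the "First Address Family Count" is 1, and the input is already
    sorted within each family (e.g. by RFC 6724 destination selection).
    """
    v6 = [a for a in addresses if ip_address(a).version == 6]
    v4 = [a for a in addresses if ip_address(a).version == 4]
    ordered: list[str] = []
    # Alternate families: IPv6, IPv4, IPv6, IPv4, ... until both run out.
    for six, four in zip_longest(v6, v4):
        if six is not None:
            ordered.append(six)
        if four is not None:
            ordered.append(four)
    return ordered


# A resolver that lists IPv4 first would otherwise send the first
# connection attempt over IPv4.
resolved = ["192.0.2.1", "192.0.2.2", "2001:db8::1", "2001:db8::2"]
print(interleave_addresses(resolved))
# ['2001:db8::1', '192.0.2.1', '2001:db8::2', '192.0.2.2']
```

A full client would start connection attempts down this list, waiting a "Connection Attempt Delay" (RFC 8305 recommends 250 ms by default) before each new attempt, and keep whichever connection succeeds first. Because an IPv6 address now goes first whenever a host has one, more fetches end up over IPv6.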

Common Crawl Foundation @CommonCrawl · 3d

Blocking the Internet Archive Won’t Stop AI, But It Will Erase the Web’s Historical Record eff.org/deeplinks/2026… @eff

Common Crawl Foundation @CommonCrawl · 4d

commoncrawl.org/blog/ipv6-adop…

Common Crawl Foundation @CommonCrawl · 4d

IPv6 Adoption Across the Top 100K Web Hosts

We probed the 100,000 most-linked web hosts for IPv6 support using the Common Crawl Web Graph. Only 36.9% are fully reachable over IPv6, with adoption ranging from 71% among the top 100 to 32% in the long tail.
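
The post does not say what "fully reachable" means, and the real criterion may be stricter. Below is a minimal Python sketch of one plausible probe: a host counts as IPv6-reachable if it resolves to at least one IPv6 address and a TCP connection to one of those addresses succeeds. `ipv6_reachable`, `adoption_rate`, port 443 and the 3-second timeout are all assumptions, not Common Crawl's methodology.

```python
import socket


def ipv6_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Hypothetical check: True if the host has an IPv6 address that
    accepts a TCP connection on the given port."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # no AAAA record, or name resolution failed
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # timeout or refusal: try the next IPv6 address
    return False


def adoption_rate(results: dict[str, bool]) -> float:
    """Share of probed hosts that passed the IPv6 check (0.0 if none)."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


# Hypothetical usage (requires network access):
# hosts = ["example.com", "example.org"]
# print(adoption_rate({h: ipv6_reachable(h) for h in hosts}))
```

Splitting the network probe from the aggregation lets the same `adoption_rate` be computed per rank bucket (for example, the top 100 hosts versus the long tail).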

Common Crawl Foundation @CommonCrawl · 4d

We've never had an entire city ask to be deleted before...

Common Crawl Foundation @CommonCrawl · 5d

Common Crawl Foundation would like to share with you an updated overview of our organization as of March 2026. Please let us know if we can be helpful. drive.google.com/file/d/1ww2R0x…

Common Crawl Foundation reposted

Wayne Yamamoto @kazabyte · 6 Mar

I've been dabbling with Claude Code. Using English as a programming language, I wrote a C compiler. medium.com/@kazabyte/english-as-a-programming-language-how-i-wrote-a-c-compiler-with-claude-f2557fbbf20f

Common Crawl Foundation @CommonCrawl · 7 Mar

commoncrawl.org/blog/web-graph…

Common Crawl Foundation @CommonCrawl · 6 Mar

github.com/commoncrawl/cc…

Summary of changes

This PR contains a substantial redesign and refactoring of the following:
- Interactive charts instead of static images
- New domain lookup tool for plotting harmonic centrality (HC) and PageRank (PR) over time, including a comparison of two domains
- Combine the avgindegree and avgoutdegree plots into avgdegree (closes #4, "Merge plots avgoutdegree and avgindegree into avgdegree")
- Add reference links for harmonic centrality (closes #5, "Add links to research papers in the section explaining harmonic centrality")
- Make the masthead image disappear sooner via parallax scrolling, so readers reach the content faster
- Substantial mobile/responsive UX improvements
- Merge the rank tables into one unified table
- Proper links, and sanitization of anchor tags with rel= and target= attributes

Common Crawl Foundation @CommonCrawl · 24 Feb

commoncrawl.org/blog/host--and…

Common Crawl Foundation @CommonCrawl · 24 Feb

February 2026 Crawl Archive Now Available

We are pleased to announce the release of the February 2026 crawl, consisting of 2.1 billion web pages (or 363 TiB of uncompressed content). Captures come from 45.5 million hosts, or 37.1 million registered domains.

Common Crawl Foundation @CommonCrawl · 23 Feb

commoncrawl.org/blog/introduci…

Common Crawl Foundation @CommonCrawl · 23 Feb

Introducing the New Examples & Resources Browser

We've replaced our old Examples and Use Cases pages with a single searchable, filterable browser. 119 resources from 115 contributors, all in one place. Search, filter by type or language, sort, and share links. We welcome community submissions.

Common Crawl Foundation @CommonCrawl · 18 Feb

techdirt.com/2026/02/17/pre…

Common Crawl Foundation @CommonCrawl · 16 Feb

commoncrawl.org/blog/ai-plumbe…

Common Crawl Foundation @CommonCrawl · 16 Feb

AI Plumbers at FOSDEM’26

Common Crawl was invited to the AI Plumbers unconference held at FOSDEM this year. The contrast between the 100 people at the unconference and the 10,000 people at the main event couldn't have been bigger.

Common Crawl Foundation reposted

EleutherAI @AiEleuther · 13 Feb

Announcing our latest paper: CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data

In collaboration with @CommonCrawl, @MLCommons, and @JohnsHopkins, we worked with 80+ native-speaker annotators to build a language identification (LID) benchmark on real Common Crawl text covering 109 languages. Existing evaluations overestimate how well LangID works on web data.
