
We want to clarify something: one of the motivations for this feature is that building a crawler that respects crawling best practices is hard. Too many crawlers use that difficulty as an excuse not to abide by site owners' preferences or respect directives like robots.txt. /crawl does this out of the box. Additionally, the /crawl endpoint cannot bypass Cloudflare's bot detection or CAPTCHAs, and it self-identifies as a bot. It respects all the same protections available to operators today: AI Crawl Control, robots.txt, Content Signals, and Pay-Per-Crawl. So, site owners can and should still choose how their content is consumed, and our /crawl endpoint will respect that choice.
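To make the point concrete, here is a minimal sketch of the kind of robots.txt check a well-behaved crawler has to run before every fetch. This is purely illustrative (Python's standard-library `robotparser`, with a hypothetical user agent string), not the /crawl implementation — it just shows one small piece of what a crawler has to get right, and which /crawl handles for you.

```python
# Illustrative only: a robots.txt check a polite crawler performs before fetching a URL.
from urllib import robotparser
from urllib.parse import urljoin, urlparse

USER_AGENT = "ExampleCrawlerBot/1.0"  # hypothetical user agent, for illustration

def can_crawl(url: str) -> bool:
    """Return True only if the site's robots.txt permits this user agent to fetch the URL."""
    origin = "{0.scheme}://{0.netloc}".format(urlparse(url))
    parser = robotparser.RobotFileParser()
    parser.set_url(urljoin(origin, "/robots.txt"))
    parser.read()  # fetch and parse the site's robots.txt
    return parser.can_fetch(USER_AGENT, url)

if __name__ == "__main__":
    url = "https://example.com/some/page"
    if can_crawl(url):
        print(f"robots.txt allows fetching {url}")
    else:
        print(f"robots.txt disallows {url} -- a polite crawler skips it")
```

And robots.txt is only the start: a compliant crawler also has to identify itself honestly, back off when asked, and honor the other signals listed above, which is exactly the work /crawl takes off your plate.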


























