Following its announcement earlier this year of a framework for an open AI ecosystem, the nonprofit Creative Commons has expressed its support for “pay-to-crawl” technology, a system designed to automate payment for website content when it is accessed by machines such as AI web crawlers.
Creative Commons (CC) is widely recognized for leading the licensing initiative that enables creators to share their works while maintaining copyright. In July, the organization unveiled a strategy to create a legal and technical structure for sharing datasets between companies that own the data and AI developers seeking to utilize it for training.
Now, the nonprofit is tentatively backing pay-to-crawl models, describing itself as “cautiously supportive.”
“If implemented effectively, pay-to-crawl could offer a mechanism for websites to support the production and dissemination of their content, while managing substitute uses, ensuring that content remains accessible to the public where it might not otherwise be shared or might succumb to more stringent paywalls,” a blog post from CC stated.
Led by companies such as Cloudflare, the concept of pay-to-crawl aims to impose charges on AI bots each time they extract information from a site to compile content for model training and updates.
Previously, websites freely allowed web crawlers to index their content for search engines like Google. The sites benefited from this arrangement by appearing in search results, which brought them visitors and clicks. With AI technology, however, the dynamic has shifted. After a consumer gets an answer from an AI chatbot, they often do not click through to the original source.
This change has already proven detrimental to publishers by drastically reducing search traffic, and there are no signs of this trend reversing.
A pay-to-crawl approach, on the other hand, could help publishers recover from the financial impact caused by AI. It could also benefit smaller web publishers that lack the leverage to negotiate one-off content deals with AI providers. Major deals have already been struck between OpenAI and Condé Nast, Axel Springer, and others; Perplexity and Gannett; Amazon and The New York Times; and Meta and various media publishers, among others.
CC attached several caveats to its support for pay-to-crawl, noting that such systems could concentrate power on the web. They could also block access to content for “researchers, nonprofits, cultural heritage institutions, educators, and other entities operating in the public interest.”
CC proposed a set of guidelines for responsible pay-to-crawl, including not making it the default setting for all websites and avoiding blanket rules for the entire web. It also said pay-to-crawl systems should allow throttling, not just blocking, and should preserve public-interest access. The systems should also be transparent, interoperable, and built with standardized components.
Cloudflare is not the only company investing in the pay-to-crawl space.
Microsoft is also building an AI marketplace for publishers, while startups like ProRata.ai and TollBit have begun exploring similar ideas. A group called the RSL Collective has announced its own specification for a new standard called Really Simple Licensing (RSL), which would dictate which parts of a website crawlers could access without actually blocking the crawlers. Cloudflare, Akamai, and Fastly have since adopted RSL, which is backed by Yahoo, Ziff Davis, O’Reilly Media, and others.
CC was also among the organizations that announced their endorsement of RSL, alongside CC signals, its wider initiative to develop technology and tools for the AI era.