Introduction to Cloudflare and Perplexity
Cloudflare, an internet security and performance company, recently announced that it had delisted Perplexity’s crawler as a verified bot. The decision followed multiple user complaints and an investigation that found Perplexity using stealth crawling tactics to reach websites that had blocked its crawlers. These tactics violated Cloudflare’s requirements for verified bots, which include obeying the robots.txt protocol and not crawling from undeclared IP addresses.
What is Cloudflare’s Verified Bots Program?
Cloudflare runs a program called Verified Bots that allowlists known bots, letting them crawl the websites that Cloudflare protects. To keep this privileged status, a verified bot must conform to specific policies, such as obeying the robots.txt protocol. The robots.txt protocol is a standard that websites use to communicate with web crawlers and other web robots. It lets a site specify which parts of it crawlers may and may not access. The protocol is advisory: it works only when crawlers choose to honor it.
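As a rough illustration of how a well-behaved crawler consults robots.txt before fetching a page, here is a minimal sketch using Python’s standard urllib.robotparser module. The crawler name ExampleBot, the example.com URLs, and the rules themselves are hypothetical and are not taken from Cloudflare or Perplexity.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named crawler entirely and keeps
# every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleBot", "OtherBot"):
        for path in ("/articles/1", "/private/notes"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(agent, url))
```

The check runs inside the crawler itself, so a crawler that skips it, or that announces a different name, is not constrained by these rules.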
Perplexity’s Violations
Perplexity was found to be violating Cloudflare’s requirements in several ways. Its stealth tactics included rotating IP addresses, changing ASNs, and impersonating browsers like Chrome. Together, these tactics let Perplexity ignore robots.txt directives and crawl websites that had explicitly blocked its crawlers, which is a serious violation of Cloudflare’s policies.
Stealth Crawling Behavior: Rotating IP Addresses
Perplexity’s crawlers were found to be rotating through IP addresses and changing ASNs. An ASN, or Autonomous System Number, is a unique identifier for an autonomous system: a network, or set of IP address ranges, run by a single operator under one routing policy. Site owners often block a crawler by its declared IP addresses or by the network it comes from. By sending requests from undeclared addresses and switching ASNs, the crawlers evaded those blocks and reached websites that had explicitly blocked them.
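To make the idea of declared versus undeclared addresses concrete, here is a small sketch using Python’s standard ipaddress module. The declared range is a placeholder drawn from the address blocks reserved for documentation; it is an assumption for illustration and not an actual range published by any crawler operator.

```python
import ipaddress

# Hypothetical range a crawler operator might publish for its bot.
# 192.0.2.0/24 is reserved for documentation and is not a real crawler range.
DECLARED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def from_declared_range(ip: str) -> bool:
    """Return True if ip falls inside one of the declared crawler ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in DECLARED_RANGES)


if __name__ == "__main__":
    print(from_declared_range("192.0.2.10"))   # inside the declared range
    print(from_declared_range("203.0.113.7"))  # outside the declared range
```

A block list built only from the declared ranges would stop the first request and miss the second, which is the gap that rotating addresses exploits.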
Stealth Crawling Behavior: Spoofed User Agent
Perplexity’s crawlers were also found to be spoofing their user agent, posing as a human user browsing with Chrome on macOS. The user agent is a string of text, sent with each request, that identifies the browser or crawler making it. Many filters block known crawlers by matching this string, so a spoofed user agent let the crawlers pass those filters and reach websites that had explicitly blocked them.
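The sketch below shows why a filter that only matches the user-agent string is easy to evade. The crawler token examplebot and both user-agent strings are illustrative assumptions; they are not the strings observed in Cloudflare’s investigation.

```python
# Hypothetical crawler tokens that a site wants to block.
BLOCKED_TOKENS = ("examplebot",)

# Illustrative user-agent strings: one that declares the crawler honestly,
# and one that imitates a desktop Chrome browser on macOS.
DECLARED_UA = "Mozilla/5.0 (compatible; ExampleBot/1.0; +https://example.com/bot)"
SPOOFED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def blocked_by_user_agent(user_agent: str) -> bool:
    """Return True if the user-agent string contains a blocked crawler token."""
    lowered = user_agent.lower()
    return any(token in lowered for token in BLOCKED_TOKENS)


if __name__ == "__main__":
    print(blocked_by_user_agent(DECLARED_UA))  # the declared crawler is caught
    print(blocked_by_user_agent(SPOOFED_UA))   # the browser-like string is not
```

Because the string is supplied by the client, detecting a spoofed crawler requires other signals beyond the user agent.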
Cloudflare’s Response
In response to Perplexity’s violations, Cloudflare delisted Perplexity’s crawler as a verified bot and implemented new blocking rules against its stealth crawling. The move is a strong response to aggressive bot behavior. The new rules are intended to stop the crawlers from evading blocks and reaching websites that have explicitly blocked them.
Takeaways
There are several key takeaways from this incident:
- Perplexity violated Cloudflare’s Verified Bots policy, which grants crawling access to trusted bots that follow common-sense rules like honoring the robots.txt protocol.
- Perplexity used stealth crawling tactics, including rotating IP addresses and spoofing its user agent, to crawl content after being blocked from accessing it.
- Cloudflare responded by delisting Perplexity as a verified bot and implementing new blocking rules to prevent its stealth crawling.
- The incident highlights the importance of respecting website directives such as robots.txt, and the need for companies like Cloudflare to act against crawlers that evade them.
Conclusion
Cloudflare’s decision to delist Perplexity as a verified bot and block its stealth crawling is a strong response to aggressive bot behavior. By taking this action, Cloudflare is helping to protect the integrity of the internet and to ensure that websites can control who crawls their content.