Months before I joined CloudFlare as a programmer I signed up for the company's service to protect my blog from hackers, spammers and scrapers. I saw an instant reduction in the amount of spam, an enormous decrease in hacking attempts, and a halt to the bots that scrape site content.
All that was long before CloudFlare launched ScrapeShield, its dedicated anti-scraping app, which builds on CloudFlare's existing services to provide a complete package of anti-scraping and tracking tools. ScrapeShield exists, and is powerful, because CloudFlare's experience of watching and profiling the behavior of bad web visitors goes back much further than the short history of the company.
Part of the original inspiration for CloudFlare was Project Honeypot, an anti-spamming project launched in 2004 by some of CloudFlare's founders. That project created an enormous, secret dark web that trapped and profiled bots of all kinds. Although it was most commonly used to stop spammers, the same information can be turned on scrapers to stop site content being stolen.
With the launch of ScrapeShield, CloudFlare has put together a package of new technologies that track scraping if it happens and alert the web site owner. The package builds on the Project Honeypot foundation of 8 years of deep web intelligence. It's new, but it has a long heritage.
It's an enormous advantage to have been profiling bots and scrapers for years because it means that CloudFlare's new ScrapeShield service comes straight to the web without a long beta or learning period.
It's ready and fit for purpose from day one.
And ScrapeShield layers active anti-scraping beacons on top of its web intelligence. These beacons are able to detect scraping, if it happens, and alert the site owner to the scraped content. Every day now I take a look at my ScrapeShield report to see if my site's content has been stolen: the good news is that none has. That's not a surprise given how robust CloudFlare's anti-bot technology is.
But the ScrapeShield report also tells me things about my site's content that weren't visible before. I get to see how often it's being read away from my site: when it's taken from the RSS feed and viewed in iPad apps, when it's translated into another language and when it's reformatted for some random feed reader. I've learnt that I have a large following in Russia, with people using Yandex Translate to read my blog.
Although CloudFlare itself has only been around for a short time, it has grown enormously and now serves almost 35 billion page views a month. If it were a single site it would be one of the largest in the world. That means that CloudFlare's bot and scraper intelligence is growing and improving constantly. As bots enter the CloudFlare network of sites they are detected, blocked and dissected.
The combination of 8 years of deep web intelligence, smart technology for detecting scraping, and enormous scale means that CloudFlare's anti-scraping solution, ScrapeShield, is powerful, continuously improving and ready for prime time.
No other anti-scraping service has the long history or enormous scale that CloudFlare brings to ScrapeShield.
Of course, ScrapeShield is a free app from CloudFlare and works with other CloudFlare services such as SSL protection, the ability to hide your domain's real IP address completely, our acceleration technologies such as compression, minification, caching and global distribution, and our core security that keeps hackers at bay.