
ScrapeShield: The scaled up, deep intelligence anti-scraping service

2012-04-11


Months before I joined CloudFlare as a programmer I signed up for the company's service to protect my blog from hackers, spammers and scrapers. I saw an instant reduction in the amount of spam, an enormous decrease in hacking attempts, and a halt to the bots that scrape site content.


All that was long before CloudFlare launched ScrapeShield, its dedicated anti-scraping app, which builds on CloudFlare's existing services to provide a complete package of anti-scraping and tracking tools. ScrapeShield exists, and is powerful, because CloudFlare's experience of watching and profiling the behavior of bad web visitors goes back far beyond the short history of the company.

Part of the original inspiration for CloudFlare was Project Honey Pot, an anti-spam project launched in 2004 by some of CloudFlare's founders. That project created an enormous, secret, dark web that trapped and profiled bots of all kinds. Although it was most commonly used to stop spammers, the same information can be turned on scrapers to stop site content being stolen.

With the launch of ScrapeShield, CloudFlare has put together a package of new technologies that track scraping if it happens and alert the website owner. The package builds on the Project Honey Pot foundation of 8 years of deep web intelligence. It's new, but it has a long heritage.

It's an enormous advantage to have been profiling bots and scrapers for years because it means that CloudFlare's new ScrapeShield service comes straight to the web without a long beta or learning period.

It's ready and fit for purpose from day one.

ScrapeShield also layers active anti-scraping features, in the form of beacons, on top of its web intelligence. The beacons can detect scraping, if it happens, and alert the site owner to the scraped content. Every day now I look at my ScrapeShield report to see if my site's content has been stolen: the good news is that none has. That's not a surprise given how robust CloudFlare's anti-bot technology is.

But the ScrapeShield report also tells me things about my site's content that weren't visible before. I get to see how often my content is read when it's taken from the RSS feed and viewed off the web in iPad apps, when it's translated into another language, and when it's reformatted for some random feed reader. I've learnt that I have a large following in Russia, with people using Yandex Translate to read my blog.

Although CloudFlare itself has only been around for a short time, it has grown enormously and now serves almost 35 billion page views a month. If it were a single site it would be one of the largest in the world. That scale means CloudFlare's bot and scraper intelligence is growing and improving constantly: as bots enter the CloudFlare network of sites they are detected, blocked and dissected.

The combination of 8 years of deep web intelligence, smart technology for detecting scraping and enormous scale means that CloudFlare's anti-scraping solution ScrapeShield is powerful, continuously improving and ready for prime time.

No other anti-scraping service has the long history or enormous scale that CloudFlare brings to ScrapeShield.

Of course, ScrapeShield is a free app from CloudFlare and works with other CloudFlare services such as SSL protection, the ability to hide your domain's real IP address completely, our acceleration technologies such as compression, minification, caching and global distribution, and our core security that keeps hackers at bay.
