Content scraping is the process of extracting data or content from a website. While the content on a public website is intended to be accessible to all, that content isn't available for any and all uses. Sites with large or attractive audiences and unique content often find their information targeted for gathering and republishing or repurposing.
Commonly targeted content can include:
- Written articles, posts, and information, taken to build up another site for content or marketing purposes
- Listings from data-driven sites (jobs, rentals, directories, etc.), taken to republish
- Many other pieces of data about users, products, prices, etc.
Not all bots visiting a site are bad. Those that crawl and index information for Google or Bing, for example, provide a great marketing benefit by directing traffic toward your site. Because of these good bots, implementing technology that blindly stops all non-human traffic is a poor solution. The challenge is to make content easy to find and consume for both humans and good bots while preventing its theft and reuse by bad bots.
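One common way to tell a genuine search-engine crawler from a bad bot impersonating one is a reverse-then-forward DNS check, since User-Agent strings are trivially spoofed. The sketch below illustrates the idea; the suffix list and function names are illustrative assumptions, not part of any particular product, and each engine's documentation should be consulted for its published crawler hostnames.

```python
import socket

# Hostname suffixes used here for illustration; real deployments should use
# the suffixes each search engine publishes for its crawlers.
GOOD_BOT_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def hostname_is_good_bot(hostname: str) -> bool:
    """Return True if the hostname matches a known crawler suffix."""
    return hostname.lower().rstrip(".").endswith(GOOD_BOT_SUFFIXES)


def verify_good_bot(ip: str) -> bool:
    """Reverse-then-forward DNS check for a claimed search-engine crawler.

    1. Reverse-resolve the client IP to a hostname.
    2. Check the hostname against the known crawler suffixes.
    3. Forward-resolve that hostname and confirm it maps back to the IP,
       so an attacker controlling their own reverse DNS can't fake it.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except socket.herror:
        return False
    if not hostname_is_good_bot(hostname):
        return False
    try:
        # Forward-confirm: the claimed hostname must resolve back to the IP.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
```

A check like this only identifies the well-behaved crawlers; the harder problem, as described below, is detecting automated traffic that makes no such claim.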
PerimeterX Bot Defender protects your web content from being harvested by web bots. It blocks disruptive or malicious attacks by identifying whether a session comes from a human, then stopping any automated process from continuing to crawl or access your site. It also gives you control over how useful bots, such as those operated by search engines, may access your site. Bot Defender can tell when attack traffic is spread across many network IDs or origins, and it can narrow the attack source to the session level, so only individual attacks are blocked rather than entire networks.