Unlike most bot attacks, which are launched by cybercriminals, scraping attacks are mostly the work of your competitors, who use them to steal your intellectual property, pricing or content. The more successful your business, the more likely your website will be scraped by competitors, which in turn fuels more targeted attacks.
The PerimeterX research team continuously investigates and analyzes scraping attacks and tracks how scraping bots evolve as they become more advanced. Most basic scraping bots are relatively easy to detect because they share many of the same characteristics.
Characteristics of basic scraping bots:
- Request many different paths: Scraping bots request a significantly higher number of different paths than a human user can possibly navigate.
- Visit the same product page repeatedly: Scraping bots visit the same pages six times more often than human users, updating the dynamic content on the page to get different results each time.
- Stay on the page for a very short time: Scraping bots stay on the page for a much shorter time than human users. They go directly to valued content and do not waste time between page loads.
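The three characteristics above can be turned into simple per-session signals. The following is a minimal sketch, not the PerimeterX detection logic: the `Request` record, the function names and every threshold value are hypothetical and would need tuning against real human traffic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical log record; the field names are assumptions."""
    session_id: str
    path: str
    timestamp: float  # seconds since epoch


def session_features(requests):
    """Compute the three basic-scraper signals for one session."""
    ordered = sorted(requests, key=lambda r: r.timestamp)
    path_counts = Counter(r.path for r in ordered)
    # Time between consecutive page loads approximates time spent on a page.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    return {
        "distinct_paths": len(path_counts),
        "max_repeat_visits": max(path_counts.values(), default=0),
        "mean_gap_seconds": sum(gaps) / len(gaps) if gaps else None,
    }


def looks_like_basic_scraper(features, max_paths=100, max_repeats=20, min_gap=1.0):
    """Flag a session when at least two signals fire.

    All thresholds are illustrative placeholders, not measured values.
    """
    gap = features["mean_gap_seconds"]
    signals = [
        features["distinct_paths"] > max_paths,       # requests many different paths
        features["max_repeat_visits"] > max_repeats,  # revisits the same page repeatedly
        gap is not None and gap < min_gap,            # stays on each page very briefly
    ]
    return sum(signals) >= 2
```

Requiring two of the three signals is a design choice made for this sketch: a single signal, such as a fast reader or a user refreshing one product page, is common among humans, while the combination is much less so.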
In contrast, scraping attacks from advanced bots take various forms that are more difficult to detect. For example, a scraping attack can look like a spike, as seen in the graph below. This recent scraping attack was detected on the website of an e-commerce business specializing in fashion and beauty merchandise.
Another variant of scraping attack from advanced bots follows a daily cadence that mirrors human working hours. For example, the graph below shows scraping attacks that try to blend in with the human traffic on a home improvement retailer’s website.
These two attacks have a high volume of requests, ranging from 2 to 7 million in total. What makes them sophisticated, however, is how widely the requests are distributed across source IPs, which makes detection harder.
The attack campaign on the fashion e-commerce business mentioned earlier used 9,000 different user agent combinations of browsers and devices, 28,000 distinct IPs and 1,300 autonomous system numbers (ASNs). The attack on the home improvement retailer had close to 32,000 user agents and over 31,000 distinct IPs coming from 780 ASNs.
Hyper-distribution, achieved by using many different user agents, IPs and ASNs, helps these new forms of scraping bots fly under the radar. Basic detection tools, which usually rely on signatures and volumetric sensors, are unable to detect these scraping attack campaigns.
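To see why volumetric sensors struggle here, it helps to measure how thinly a campaign is spread. The sketch below is illustrative only: the `LogEntry` record, both function names and the threshold are assumptions, and real logs would need an IP-to-ASN lookup that is not shown.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical request record; the field names are assumptions."""
    ip: str
    user_agent: str
    asn: int


def distribution_summary(entries):
    """Count the distinct IPs, user agents and ASNs behind a set of requests."""
    entries = list(entries)
    ips = {e.ip for e in entries}
    return {
        "requests": len(entries),
        "distinct_ips": len(ips),
        "distinct_user_agents": len({e.user_agent for e in entries}),
        "distinct_asns": len({e.asn for e in entries}),
        "requests_per_ip": len(entries) / len(ips) if ips else 0.0,
    }


def ips_over_threshold(entries, threshold):
    """Return the IPs a naive per-IP volumetric rule would flag."""
    per_ip = Counter(e.ip for e in entries)
    return sorted(ip for ip, count in per_ip.items() if count > threshold)
```

As a rough illustration, assume 2 million requests were spread evenly across 28,000 IPs (the pairing of these two figures is an assumption, not something the data above states). Each IP would then average about 71 requests over the whole campaign, so a per-IP rate limit set high enough to spare real shoppers would likely flag few or none of them.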
For digital businesses, where content and pricing are the competitive differentiators, detecting and mitigating scraping attacks is paramount. Scraping attack patterns can look very different from one another, and attackers use increasingly advanced methods to avoid detection. This new normal of hyper-distributed attacks reinforces the need for solutions that combine advanced detection methods with continuous security research, in order to stay on top of the techniques that scrapers use and continue to develop. Stay tuned for more attack-related research blogs from the PerimeterX research team.