How do you stop scraping?
There are a number of ways to go about this, including buying a subscription to a list of known scraping bots and then applying an Access Control List (ACL) to prevent them from accessing your web properties.
DOSarrest develops its own software to thwart malicious traffic. This particular anti-scraping feature can be applied to your entire website or to one or more specific URIs to keep scraping bots from continually scraping your site.
Named “Code Injection-Bot Blocker” in DSS2 (DOSarrest’s customer portal), this feature dynamically injects code that is attractive to botnets but remains unseen by human website visitors. The injected code directs the bots to a honeypot where, as soon as they begin to make requests, the traffic is analyzed and blocked.
How does it operate?
As web pages are returned to us from the origin, we modify the HTML before sending the page along to the requesting client. The feature includes an ‘injection point’ option where the user can define a “div” or any other location within the page at which to inject the code; the default injection point is the closing body tag. A bot's first request loads the modified page, and it usually makes several other calls before it follows the honeypot link. Once it follows the injected code, it is marked as illegitimate traffic on its first request to the honeypot.
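The injection step can be sketched as a plain string operation on the HTML. This is a minimal illustration under stated assumptions, not DOSarrest's implementation: the honeypot path `/__hp/trap`, the hidden-link markup, and the `inject` function are all hypothetical. The link is hidden from human visitors with CSS and from assistive technology with `aria-hidden`, but it remains in the raw HTML that a scraper parses.

```python
import re

# Hypothetical honeypot URL; the product's real trap URLs are not documented here.
HONEYPOT_PATH = "/__hp/trap"

# Invisible to human visitors (display:none, hidden from screen readers,
# not reachable by Tab), yet present in the raw HTML a scraper parses.
HIDDEN_LINK = (
    f'<a href="{HONEYPOT_PATH}" style="display:none" '
    'aria-hidden="true" tabindex="-1">.</a>'
)


def inject(html: str, injection_point: str = "</body>") -> str:
    """Insert HIDDEN_LINK just before the first match of injection_point.

    The default mirrors the article's default, the closing body tag; a
    customer-defined point such as '<div id="footer">' works the same way.
    The match is case-insensitive. If the point is absent, the page is
    returned unchanged.
    """
    match = re.search(re.escape(injection_point), html, flags=re.IGNORECASE)
    if match is None:
        return html
    pos = match.start()
    return html[:pos] + HIDDEN_LINK + html[pos:]
```

For example, `inject("<html><body><p>Hello</p></body></html>")` places the hidden link directly before `</body>`, and passing `'<div id="footer">'` as the second argument places it before that element instead. A production system would also have to deal with compressed or streamed responses, which this sketch ignores.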
Once detected, botnet traffic is added to both a customer-specific list and a global block list. Each customer can choose to block traffic based on either list. This lets every customer decide whether to take advantage of previous botnet discoveries from other websites or to generate their own list of scraping bots.
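The two lists and the per-customer choice described above can be modeled as follows. This is a hypothetical sketch: the `BlockLists` and `CustomerPolicy` names, the in-memory sets, and the default of blocking on the customer's own list only are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BlockLists:
    # Shared across all customers (in-memory stand-in for a real datastore).
    global_ips: set[str] = field(default_factory=set)
    # One list per customer, keyed by a hypothetical customer ID.
    customer_ips: dict[str, set[str]] = field(default_factory=dict)

    def record(self, customer_id: str, ip: str) -> None:
        """A detected bot goes onto both its customer's list and the global list."""
        self.customer_ips.setdefault(customer_id, set()).add(ip)
        self.global_ips.add(ip)


@dataclass
class CustomerPolicy:
    # Which lists this customer has chosen to block on (defaults are assumptions).
    use_customer_list: bool = True
    use_global_list: bool = False


def is_blocked(lists: BlockLists, customer_id: str,
               policy: CustomerPolicy, ip: str) -> bool:
    # Block if the IP is on the customer's own list and that list is enabled...
    if policy.use_customer_list and ip in lists.customer_ips.get(customer_id, set()):
        return True
    # ...or if the customer opted into the global list and the IP is on it.
    return policy.use_global_list and ip in lists.global_ips
```

For example, after `lists.record("shop-a", "203.0.113.7")`, a second customer `"shop-b"` using the default policy does not block that address, while the same customer with `CustomerPolicy(use_global_list=True)` does, because it benefits from the discovery made on `"shop-a"`.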
Any traffic specifically whitelisted for a resource will not be impeded by the bot blocker feature for that resource. There is already a substantial whitelist of the major search engines' IP addresses that customers can use as their own; after all, not all bots are bad.
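The precedence rule in this paragraph (whitelisted traffic bypasses the bot blocker for that resource) can be sketched with Python's standard `ipaddress` module. The per-resource whitelist structure, the `is_whitelisted` and `allow_request` functions, and the address ranges are illustrative assumptions; the ranges shown are reserved documentation blocks, not real search-engine IPs.

```python
import ipaddress

# Hypothetical whitelist: these are reserved documentation ranges
# (RFC 5737 and RFC 3849) standing in for real search-engine crawler ranges.
WHITELIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_whitelisted(ip: str, whitelist=WHITELIST) -> bool:
    """True if ip falls inside any whitelisted network.

    Testing an IPv4 address against an IPv6 network (or vice versa)
    simply yields False, so mixed lists are safe.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in whitelist)


def allow_request(resource: str, ip: str, blocked_ips: set[str],
                  whitelists: dict[str, list]) -> bool:
    """Whitelisting for a resource overrides the bot blocker for that resource."""
    if is_whitelisted(ip, whitelists.get(resource, [])):
        return True
    return ip not in blocked_ips
```

Checking the whitelist first means that even an address already on a block list is let through for any resource where it is whitelisted, while the same address is still blocked on resources without that whitelist.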
- A request is made to a website on our service.
- If the requested content is not cached, it is retrieved from the origin server as normal.
- The origin server responds to the DOSarrest cloud security platform.
- If the anti-scraping feature is enabled, our system injects code into the HTML content. This code is invisible to human visitors but irresistible to bots.
- The modified HTML page, containing the injected code, is sent back to the requesting bot or legitimate visitor.
- The bot follows the injected code into the honeypot, where its IP address is recorded in a local (customer-specific) list as well as the global list. At this point the bot is stopped and denied any further visits to the protected website in question.
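The steps above can be tied together in a toy model of one protected website. It is a sketch under stated assumptions, not DOSarrest's platform: `EdgeProxy`, the honeypot path, the hidden-link markup, the in-memory cache and lists, the simple case-sensitive match on `</body>`, and the choice to answer the honeypot request itself with a 403 are all hypothetical.

```python
from typing import Callable

# Hypothetical honeypot URL and hidden-link markup, for illustration only.
HONEYPOT_PATH = "/__hp/trap"
HIDDEN_LINK = (
    f'<a href="{HONEYPOT_PATH}" style="display:none" '
    'aria-hidden="true" tabindex="-1">.</a>'
)


class EdgeProxy:
    """Toy model of the request flow for one protected website."""

    def __init__(self, origin: Callable[[str], str], global_list: set[str],
                 anti_scraping: bool = True) -> None:
        self.origin = origin                # stands in for the customer's origin server
        self.global_list = global_list      # shared by every protected website
        self.local_list: set[str] = set()   # this customer's own list
        self.anti_scraping = anti_scraping
        self.cache: dict[str, str] = {}     # stores the origin's unmodified HTML

    def handle(self, ip: str, path: str) -> tuple[int, str]:
        # A bot already recorded for this website is denied any further visits.
        if ip in self.local_list:
            return 403, "Forbidden"
        # The first request to the honeypot marks the client as a bot and
        # records it on both the local and the global list.
        if path == HONEYPOT_PATH:
            self.local_list.add(ip)
            self.global_list.add(ip)
            return 403, "Forbidden"
        # Serve from cache; otherwise fetch from the origin as normal.
        html = self.cache.get(path)
        if html is None:
            html = self.origin(path)
            self.cache[path] = html
        # If the feature is enabled, inject the hidden link before the
        # closing body tag of the response (the cached copy is untouched).
        if self.anti_scraping:
            html = html.replace("</body>", HIDDEN_LINK + "</body>", 1)
        return 200, html
```

Because the cache holds the origin's unmodified HTML and injection happens per response, the feature can be toggled without invalidating cached pages in this model; the article does not say whether the real service works this way.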
Senior Application Security Architect, DOSarrest Internet Security