• Security (1)

    George Tarasov

    Qrator Labs

    What measure is a robot? Understanding and managing web bot activity

    4 July, 18:10, «Hall 1»

    Bots are much more active and varied than humans when it comes to using the Web. The more complex ones imitate human-controlled web browsers to scrape data, commit fraud, or just ruin your pet project’s day. What’s their gain? Why are they so popular? How can we stop them? Let’s find out.

    While useful bots make themselves instantly recognizable in your website’s traffic, the unwanted ones either don’t bother or do everything they can to pass as real human users. The simplest bots use scripted access to web pages to extract or stuff data. Some go directly for the site’s backend API whenever possible. Such bots can make up to 33M requests in one go. We’ll inspect them briefly, because more exciting species are further up the food chain.

    Browser bots controlled by an orchestrator, such as headless Chrome driven by Playwright, have long been a favorite of web scrapers who need to bypass protection. If there is aggregated data of any value – credentials, stock prices, scientific measurements – there’s a high chance browser bots are collecting it for fun and profit. Why them? Most valuable data has a short lifetime, so lots of artificial users are needed to grab it fast without arousing suspicion. Being headless, these browsers can be optimized for minimal CPU and memory consumption, so you can run thousands of them in a virtual bot farm relatively cheaply. Thus, while making up only about 2% of all the bot traffic we inspected last quarter, these robots punch well above their weight, affecting performance and business metrics alike.

    Chrome recently received an update adding a --headless=new mode, which makes scraper bots nearly undetectable. The countermeasures required have become more complex as well, yet they still don’t guarantee 100% detection. With that in mind, it’s time to look at the available damage control strategies, namely:

    ‘Home defense’ against more primitive but persistent scripted web & API bots

    Jury-rigging basic browser bot detection for your website

    Finding out whether you really need something bigger: specific cases when advanced browser bots can cause lots of pain
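
    To give a feel for the first strategy, here is a minimal sketch of one common ‘home defense’ building block: a per-client sliding-window rate limiter. It is an illustration only, not the approach presented in the talk. The class name, the thresholds, and the choice of client key (an IP address, an API token, or similar) are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical helper: rejects a client once it exceeds
    max_requests within the last window_seconds."""

    def __init__(self, max_requests=100, window_seconds=10.0):
        # Both thresholds are illustrative defaults, not recommendations.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client key -> timestamps of recently accepted requests
        self._hits = defaultdict(deque)

    def allow(self, client_key, now=None):
        """Return True if the request may proceed, False if it is over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

    A request handler would call allow() with whatever key identifies the client and answer with an error (for instance HTTP 429) when it returns False. A limiter like this only slows down the simplest scripted bots; it does nothing against a farm of browser bots spread over many addresses.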

    I really hope this talk gives you useful insights into how to treat bad bots in your traffic. Whether to observe them, manage them, or try to get rid of them is up to you!

    The talk was accepted to the conference program