George Tarasov
Qrator Labs
4 July, 18:10, «Hall 1»
Bots are much more active and varied than humans when it comes to using the Web. Some of them are sophisticated enough to imitate human-controlled web browsers in order to scrape data, commit fraud, or just ruin your pet project’s day. What do they gain? Why are they so popular? How can we stop them? Let’s find out.
While useful bots make themselves instantly recognizable in your website’s traffic, the unwanted ones either don’t care or do everything they can to pass as a real human user. The simplest bots are scripts that access web pages to extract or stuff data. Some go directly for the site’s backend API whenever possible. They can make up to 33M requests in one go. We’ll inspect them only briefly, because more exciting species live further up the food chain.
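The distinction above between self-identifying bots and plain scripts can be sketched as a first-pass triage on the User-Agent header. This is a minimal illustration, not a method from the talk: the function name `classify_user_agent` and both token lists are assumptions made up for the example, and a User-Agent can be spoofed, so the result is only a hint.

```python
from typing import Optional

# Illustrative, incomplete token lists (assumptions, not an authoritative registry).
DECLARED_BOT_TOKENS = ("googlebot", "bingbot", "duckduckbot")
SCRIPT_CLIENT_TOKENS = ("python-requests", "curl/", "wget/", "go-http-client")


def classify_user_agent(user_agent: Optional[str]) -> str:
    """Return a coarse label: 'declared-bot', 'script', or 'unknown'."""
    ua = (user_agent or "").strip().lower()
    if not ua:
        # Mainstream browsers send a User-Agent by default, so an empty
        # value is treated here as a sign of a hand-rolled script.
        return "script"
    if any(token in ua for token in DECLARED_BOT_TOKENS):
        # The header alone proves nothing: anyone can claim to be a crawler.
        return "declared-bot"
    if any(token in ua for token in SCRIPT_CLIENT_TOKENS):
        return "script"
    # Real browsers and bots that imitate them both end up here.
    return "unknown"
```

Everything labelled `unknown` still needs further inspection, which is where the browser bots discussed next come in.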
Browser bots controlled by an orchestrator, like Chrome Headless with Playwright, have long been a favorite of web scrapers who need to bypass protection. Wherever aggregated data has any value (credentials, stock prices, scientific measurements), there’s a high chance browser bots are collecting it for fun and profit. Why them? Most valuable data has a short lifetime, so lots of artificial users are needed to grab it fast without raising suspicion. Being headless, these browsers can be optimized for minimum CPU and memory consumption, so you can run thousands of them in a virtual bot farm relatively cheaply. Thus, while they made up just about 2% of all bot traffic we inspected last quarter, these robots punch well above their weight, affecting performance and business metrics alike.
Chrome recently received an update that adds a --headless=new mode, making scraper bots nearly undetectable. The required countermeasures have become a notch more complex, yet they still don’t guarantee 100% detection. With that in mind, it’s time to look at the available damage control strategies, namely:
‘Home defense’ against more primitive but persistent scripted web & API bots
Jury-rigging basic browser bot detection for your website
Finding out whether you really need something bigger: specific cases when advanced browser bots can cause lots of pain
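To give a flavor of the first two strategies, here is a rough sketch of a per-client sliding-window rate limiter (for scripted web and API bots) and a tiny header check for browser bots. It is not taken from the talk. The names `SlidingWindowLimiter` and `headless_hints` are hypothetical, and treating a missing Accept-Language header as suspicious is an assumption. The `HeadlessChrome` token appears in the default User-Agent of the old headless mode only, and any bot can override it.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Mapping


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def headless_hints(headers: Mapping[str, str]) -> List[str]:
    """Return weak signals that a request may come from a headless browser."""
    normalized = {key.lower(): value for key, value in headers.items()}
    hints: List[str] = []
    if "headlesschrome" in normalized.get("user-agent", "").lower():
        hints.append("headless-user-agent")
    # Assumption: regular browsers normally send Accept-Language.
    if not normalized.get("accept-language"):
        hints.append("missing-accept-language")
    return hints
```

A caller might key the limiter on an IP address or API token, for example `SlidingWindowLimiter(100, 60.0).allow(client_ip)`, and log the hints instead of blocking on them, since each signal on its own is easy to fake or to trigger by accident.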
I really hope this talk helps you get useful insights on how to treat bad bots in your traffic. Whether to observe, to manage, or to try and get rid of them – it’s up to you!
The talk was accepted to the conference program
The largest professional conference for developers of high-load systems