Bots are everywhere, crawling all over the internet. Some are good, cataloguing websites and enabling you to search for pictures of cats with ease. Others are all about information gathering, theft and fraud, and are bad news. More and more time is being spent accessing the Internet from mobile devices, and apps are becoming increasingly important as the software performing this access. Apps are a new and challenging arena for existing anti-bot techniques, and attackers are starting to shift their focus from the mobile web channel to mobile apps to try to circumvent current protection mechanisms.

The Good Bots

There are many valid uses for bots, search engine web crawling probably being the most common. Other tasks that the good bots may be performing on websites include checking for changes for archiving purposes, website health checks, and usability and security tests commissioned by the website owners. These activities introduce a small and fairly uniform overhead to most sites, but do not constitute a problem. They are in fact beneficial, making websites available for search and ensuring they are working correctly. Without Google crawling the entire internet to build its search index, it would be much harder for any e-business to connect with its customers.

The Bad Bots

The malicious bots, on the other hand, tend to be much more targeted, focusing on individual sites and either degrading performance, spamming them, searching for weaknesses in their security or stealing information. This can be costly in a number of ways. DDoS attacks are the most straightforward, directly increasing your server costs and worsening the experience for your customers. They can also lead to extortion, with companies forced to pay to make an attack stop. However, perhaps most insidious of all are scraping bots, which do serious damage to the business and reputation of online retailers, travel sites, aggregators and price comparison sites.

Web scrapers target websites looking for valuable information which they will attempt to extract via automated tools. They do this for various reasons. On a site built on the quality of its own content, a web scraper can potentially steal that data in order to resell it to a competitor. For example, a flight aggregation site may have built its reputation on the quality of the search results it returns, and invest time and money in making those results as good as possible to attract and retain customers. If someone can scrape the pricing information and use it to populate their own competing site, they can attempt to undercut the original site, stealing traffic and revenue. The problem of web scraping is certainly something that budget airlines like Ryanair are very concerned about. In fact, the airline took the operators of the Wegolo website to court to prevent them from scraping airfares from Ryanair's website. Ryanair has also used the terms and conditions on its website, as well as litigation, to try to prevent companies from accessing its data, but this can be a slow and expensive approach.

The Anti-Bot Arms Race

Bots used to be very dumb. The very first web scraping was entirely manual, with people copying website information by hand. Programmers then started to automate the process using simple scripts that would call commands like grep to search for particular pieces of information and retrieve them. These scripts only understood simple HTML and were easy to spot. As countermeasures started to be deployed against these simple bots, an arms race began. Bots have become progressively more complicated to circumvent the detection algorithms used to uncover them. Advanced bots now work inside a complete browser stack which can process JavaScript, and some even utilise humans to solve CAPTCHAs for them during their scraping activities.
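To make the contrast concrete, here is a minimal sketch in the spirit of those early grep-style scripts: pure pattern matching over raw HTML, with no DOM parsing and no JavaScript execution (the price regex is purely illustrative).

```python
import re

# Crude price pattern, e.g. "$19.99" or "€45.00" -- illustrative only.
PRICE_RE = re.compile(r"[$€£]\s?\d+(?:\.\d{2})?")

def extract_prices(html: str) -> list[str]:
    """grep-style extraction: no DOM, no JavaScript, just regex matching
    on the raw HTML string. Anything rendered client-side by JavaScript
    is invisible to a scraper like this, which is one reason such early
    bots were easy to detect and defeat."""
    return PRICE_RE.findall(html)
```

A modern bot, by contrast, runs a full browser engine, so the page it scrapes is the same one a human would see.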

Current solutions for detecting bots fall into a number of categories, the most significant being behavioral analysis and signature-based detection. Behavioral analysis depends upon web scrapers visiting a site in a way that readily distinguishes them from a human. If you can uniquely identify a browser or bot and track its behavior, you can compare it against the patterns you would expect to see from a real human. A human is unlikely to visit every page on a website in order, lingering for less than a second before moving on. A bot is also less likely to do things like load images. Signature-based detection interrogates the browser that is connecting to a website and identifies attributes that reveal it to be a bot. Companies employ various methods to detect popular automation frameworks, which helps them block bot traffic, and there is a continual arms race between bots and anti-bot solutions. Of course, bots try to spoof the identity and characteristics of real browsers. State-of-the-art bots now include a complete JavaScript engine which can respond to challenges issued by servers, and the levels of sophistication will only increase.
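The behavioral side of this can be sketched with a toy heuristic. The thresholds below are illustrative only, not taken from any real product, but they capture the signals described above: inhuman page rates, skipped images, and mechanically ordered crawling.

```python
from dataclasses import dataclass

@dataclass
class Session:
    pages_visited: int
    duration_seconds: float
    images_loaded: bool       # did the client fetch page images?
    visited_in_order: bool    # pages walked in strict site order?

def looks_like_bot(s: Session) -> bool:
    """Toy behavioral classifier. Real systems combine many more signals
    (mouse movement, timing jitter, header consistency) and score them
    statistically rather than with hard thresholds."""
    pages_per_second = s.pages_visited / max(s.duration_seconds, 0.001)
    if pages_per_second > 1.0:
        return True  # paging faster than any human could read
    if s.pages_visited > 20 and not s.images_loaded:
        return True  # long session that never loaded an image
    if s.pages_visited > 50 and s.visited_in_order:
        return True  # mechanically crawling the site in order
    return False
```

The weakness is obvious: a bot that loads images, randomises its crawl order and slows down to human speeds slips under each threshold, which is exactly the arms race described above.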

The Move To Mobile

With the rise of mobile apps, the vendors that provide the current tools to identify and counter bots face a new challenge. The problem is that many of the techniques used to detect bots on the web do not translate to mobile apps. If a mobile app is accessing a RESTful API directly to retrieve the information it needs, it may well have much of the content downloaded already, using the connection to the server only to retrieve the up-to-date information it wants to display to the user. For example, a travel website must communicate with the server to retrieve everything it displays on the page, whereas the mobile app version only needs to talk to the server to retrieve the specific flight search results. If the app is just querying the API for results, anti-bot solutions have far fewer features available from which to build up a device signature. The JavaScript code that servers inject into web pages to extract a browser's identity simply cannot be delivered through a mobile API. Compounding this further is the trend of advanced bots to attack from a large list of IP addresses and use more realistic timings to hide from the anti-bot vendors.
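To illustrate why the server has so little to work with, here is a sketch of a bot constructing the same direct API request a mobile app would make. The endpoint, payload and headers are hypothetical; the point is that there is no page load, no JavaScript execution and no image fetching for behavioral or signature checks to inspect.

```python
import json
import urllib.request

def build_search_request(origin: str, dest: str) -> urllib.request.Request:
    """Mimic a mobile app's direct API call (endpoint and headers are
    made up for illustration). From the server's perspective this is a
    bare JSON POST: no browser environment exists to fingerprint, and
    the User-Agent can simply be copied from the real app's traffic."""
    return urllib.request.Request(
        "https://api.example-airline.com/v1/search",
        data=json.dumps({"from": origin, "to": dest}).encode(),
        headers={
            "Content-Type": "application/json",
            # Copied verbatim from captured traffic of the genuine app:
            "User-Agent": "TravelApp/4.2 (iPhone; iOS 17.0)",
        },
    )
```

Everything in this request can be replayed perfectly by a bot, which is why identifying *what software* made the request, rather than *how it behaves*, becomes the key question.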

The New Way

This new set of challenges requires a new solution. Instead of examining the behavior of a device and trying to infer whether it is a bot, Approov from CriticalBlue uses a positive authentication model. Our custom SDK integrates seamlessly with the genuine app, allowing it to present an authorized app identity to the server. Real customers can then confidently be given full access to the backend server assets, while suspicious activity can be blocked or rate limited. Our technology incorporates sophisticated anti-tamper mechanisms and helps secure mobile APIs against the new bot threats developing in the mobile app channel. One of our penetration testing activities revealed a clear example of this problem. The API in question did not require a user login to perform searches, and because the search is done inside the app, there is little to identify the agent requesting the information. This is very common in travel apps. If the app were enhanced with Approov technology, it could present a token proving that the software used to perform the request was the genuine mobile app. The server could then respond only to requests it knows come from a valid client.
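A minimal sketch of a positive, token-based model in this spirit might look like the following. This is not Approov's actual protocol; it is a generic short-lived HMAC token scheme, with illustrative names, showing how a server can restrict an API to requests carrying a valid proof of app identity.

```python
import hashlib
import hmac
import time

# Shared secret between the attestation service and the API server.
# The name and scheme here are illustrative, not any vendor's protocol.
SECRET = b"replace-with-server-side-secret"

def issue_token(app_id: str, ttl: int = 300) -> str:
    """Mint a short-lived token asserting 'this request came from the
    genuine app'. Format: app_id.expiry.hex_signature"""
    expiry = str(int(time.time()) + ttl)
    msg = f"{app_id}.{expiry}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{app_id}.{expiry}.{sig}"

def verify_token(token: str) -> bool:
    """API-server side: accept the request only if the signature checks
    out and the token has not expired. A scraper without the secret (or
    without passing attestation) cannot mint a valid token."""
    try:
        app_id, expiry, sig = token.rsplit(".", 2)
    except ValueError:
        return False
    msg = f"{app_id}.{expiry}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and int(expiry) > time.time()
```

The hard part, of course, is issuing the token only to the genuine, untampered app in the first place, which is where the attestation and anti-tamper technology does its work.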

The world of anti-bot technology is evolving rapidly. The days of very simple, easy-to-detect bots are long gone. As bots become harder to detect and switch their attack vectors to the mobile app channel, more sophisticated approaches are required to detect them effectively. Approov from CriticalBlue is an anti-bot solution for mobile APIs that adopts a positive model, allowing you to authenticate the software being used to communicate with your servers before granting access, thereby removing an important vector for scraping backend data.