Menu

Author: Ben Kneen

What is ads.txt?

Ads.txt is an IAB Tech Lab project that was created to fight inventory fraud in the digital advertising industry. The idea is simple; publishers put a file on their server that says exactly which companies they sell their inventory through. The file lists partners by name, but also includes the publisher’s account ID. This is the same ID buyers see in a bid request, which they can use as a key for campaign targeting.

Buyers use a web crawler to download all the ads.txt files and the information contained within on a regular basis and use it to target their campaigns. This means buyers know that if they bid on request that comes from an authorized ID, it’s coming from a source the publisher trusts or has control over. Buyers seem to be taking the idea seriously, too. Just a week ago Digitas published an open letter on Digiday saying they won’t buy from any publisher without an ads.txt file.

Ads.txt isn’t a silver bullet for all inventory quality woes, but it is a dead simple solution. You’d be stupid not to lock the door to your house, even if it’s not a guarantee of safety, right? The important bit is that for the first time publishers have a tool against inventory fraud instead of relying on the programmatic tech alone.

Are you a developer or patient person? Then try the ads.txt crawler yourself

As part of the program’s release, Neal Richter, a long time ad technology veteran and one of the authors of the ads.txt spec wrote a simple web crawler in Python. The script takes a list of domains and parses the expected ads.txt format into a database, along with some other error handling bits.

Developers will probably find it a piece of cake to use and non-developers will struggle a bit, like I did. That said, I got it running after pushing through some initial frustration and researching how to get a small database running on my computer. I wrote a detailed tutorial / overview of how to get it working for anyone interested in a separate post.

12.8% of publishers have an ads.txt file

At least, among the Alexa 10K global domains that sell advertising. To get this stat, I took the Alexa top 10,000 domains, removed everything owned by Google, Amazon, and Facebook – which don’t sell their inventory through 3rd parties and therefore don’t need an ads.txt file – and removed the obvious pornography sites. After filtering, I had 9,572 domains to crawl. I sent all those through Neal’s crawler and found 1,930 domains selling ads, and 248 with an ads.txt file. 248 / 1,930 = 12.8%, voila!

Update: Nov 1, 2017

In the less than 6 weeks or so since I published my first analysis, ads.txt adoption has continued to mushroom and now stands at 44%. I’m astonished to see the adoption rate triple over such a short time frame and I have to imagine this sets a record for publisher embrace of any IAB standard. So what’s driving it? My own opinion is the primary driver is Google’s Doubleclick Bid Manager’s announcement that they’d stop buying unauthorized supply paths at the end of October, which had led to a big grassroots push among the major exchanges with their publisher clients.

This post is a step by step walkthrough of how to start using Neal Richter’s ads.txt web crawler Python script posted under the official IAB Tech Lab’s git repository. You might know him as the former CTO of Rubicon Project or current CTO of Rakuten, but he also served as a key contributor to the ads.txt working group.

Getting this script working ended up being a great way to learn more about Python and web scraping, but was primarily so I could compile the data necessary to analyze publisher adoption of ads.txt since the project was released in June of this year. For more on ads.txt or to read my analysis and download my data on the state of publisher adoption head on over to my State of Ads.txt post.

What the ads.txt web crawler does

The script takes two inputs – first, a txt file of domains and second, a database to write the parsed output. Once you specify a list of domains, the script then appends ‘/ads.txt’ to each and writes them to a temporary CSV file. The script then loops through each record in the CSV file, formatting it into a request which then leverages Python’s request library to execute the call.

Next, the script does some basic error handling. It will timeout a request if the host doesn’t respond in a few seconds, will log an error if the page doesn’t look like an ads.txt file (such as a 404 page), if the browser starts getting redirected like crazy, or other unexpected things happen.

If the page looks like an ads.txt file, the script then parses the file for the expected values – domain, exchange, account ID, type, tag ID, and comment – and logs those to the specified database.

What the ads.txt web crawler doesn’t do

Neal is pretty clear that this script is intended more as an example than a full fledged crawler. The script runs pretty slow for one because it can only process one domain at a time, rather than a bunch in parallel. It also uses a laptop to do the work vs. a production server which would add more bandwidth and speed. It leaves something to be desired on error handling, both on crawling domains and writing the output to the database.

I was closer to chucking my laptop out the window than I’d care to admit trying to get around UnicodeErrors, newline characters, null bytes, or other annoying and technically detailed nuances that made the script puke on my domain files. And finally, the database is also just sitting on your laptop so it won’t scale forever, even if CSV files are typically small, even with tens of thousands of records.

All that said, I’m not a developer by trade and I was able to figure it out, even if it was a bit painful at times. Hopefully this post will help others do the same.

How to get your ads.txt web crawler running

First things first – before you try to run this script need to have a few things already in place on your machine. If you don’t have these pieces yet it’ll be a bit of a chore, but it’s a great excuse to get it done if you want to do more technical projects in the future. (more…)

What is Server Side Header Bidding?

Server side header bidding promises to improve latency, scalability, and auction logic issues seen in traditional header bidding by moving communication with exchanges away from the browser and into servers. The process is essentially the same way SSPs and DSPs integrated for years, except this time the SSPs are integrating with each other.

We’ve all seen them; you’re casually browsing your favorite app or playing a game on your phone and suddenly you’re being redirected to the app store to download Candy Crush. What gives? It’s another obnoxious mobile app store redirect ad that’s automatically sending you to the app store without a click.

These are the pop up ads of the mobile age and virtually everyone hates them, including the chain of mobile publishers, exchanges, and other ad tech that unwittingly served them in the first place. But they can be quite difficult to find; because no one wants to serve them they are purposefully targeted and served in a way that makes them difficult to replicate. TechCrunch wrote a good overview of the complexity of this problem in 2015, and also linked to Sergei Frankoff’s detailed technical description on the Sentrant blog of how frame-busting JS combined with 302 redirects was able to automatically open the app store.

Below is a step-by-step guide designed for ad operations teams to demonstrate how to use Charles Proxy to identify and eliminate a mobile app store redirect ad, though it unfortunately won’t act as a detection system or a blocker of any kind. Thankfully most exchanges have found ways to ban frame busting code like the one mentioned in the TechCrunch article at this point.

Charles Proxy is a program that will sit between your browser and the internet and record all the different interactions when loading a web page, even those you can’t see in the source code. The benefit of using Charles is it doesn’t matter how fast the page redirects or how many parties are involved in the process. Charles will record everything and let you meticulously search through all the interactions at your own convenience. It has a free trial version, but if you work in ad operations you should just buy a license. The tool is essential for all sort of debugging needs, and a license is just $50.

Step 2: Catch a redirect to the app store by navigating through your app

This is the hard part. Mobile app store auto redirect ads are typically frequency capped, targeted in obscure ways to avoid detection, and are difficult to replicate. If you’re having trouble, see the Advanced section at the bottom of this article for ways to speed up the process. Make sure you leave Charles Proxy recording your traffic, since you’ll search through the results to find the root source of the mobile app store ad. I’m not trying to pick on Candy Crush here in particular either, they’re just a popular example these ads point toward. (more…)

Note: Special thanks is due to Scott Eichengrun for this article, specifically showing me how to get the two phone rig going.

This article is a tutorial on how to configure Charles Proxy to inspect traffic over cellular networks, and while it’s designed with ad operations use cases in mind, it’s applicable to any front end web developer with similar needs.

There are many articles out there on how to use Charles Proxy through your phone – I’ve writtentwo myself, in fact. All those articles assume you’re leveraging a Wifi network when connecting to the internet, however and while that’s fine for basic testing on mobile web and mobile apps, there are often reasons why you want to inspect traffic over a true cellular network instead.

Perhaps you need to debug an ad that relies on carrier targeting, or you want to measure data usage or network latency through a true cellular connection. These are more advanced use cases to be sure, but when you can’t get by with using your phone with Charles over Wifi, or a mobile emulator in Developer Tools. Plus, Charles Proxy offers a host of powerful features like breakpoints, which you can use to test an experience in stages, or rewrite rules to reference a local file before you update your server, or blacklisting, which can be helpful to isolate the root of various technical problems.

The Hardware Setup

Before you start, you’ll need:

A laptop running Charles Proxy. I’m using a Mac in this case, but you could do this with a PC as well.

Two phones, at least one which can be enabled as a mobile hotspot. I’m using two iPhone 7 in this case, but you could do this with Android devices as well.

Yes, unfortunately you’ll need two phones to setup this rig – the first is used as a mobile hotspot, the second to actually browse the web / app you need to test. The reason you can’t simply run the connection over a single phone is because you have to manually set the IP address and port of your network connection for Charles to inspect your traffic, and you can’t do that on your phone’s cellular connection. Creating a mobile hotspot however gives you the ability the adjust those settings on the device connecting through it. So you’re using one phone for its mobile network and the other phone as the client that proxies requests through Charles.