Tag: time complexity

This project requires a bit of background, for those who aren’t familiar with Google Ads. (For those who are, please skip down to the horizontal bar.)

When you search for something, some ads may show up at the top of the results page. Those ads were paid for by companies who had decided those ads were bringing in enough profit to be worth it.

Think about it in terms of four steps. 1, the company decides which keywords they want to bid on, and how much to bid on them. 2, the ads show up to users, and some percentage click through. 3, of those who click, some percentage actually convert (conversion defined however you decide – they may download a white paper, buy a product, or sign up for your email newsletter). 4, the company measures all aspects of this funnel and adjusts their keyword bids and/or ads accordingly.

This particular project involves automation of a specific portion of step 2. To try to maximize their ads’ conversion rates, marketers only want to serve their ads to people who have the highest chance of clicking, and ultimately, converting. A very important tool in the marketer’s arsenal for this goal is the negative.

Negatives, short for negative keywords, are keywords that tell the search engine that they should not display your ad if that keyword is part of the search query. For a quick example of why you might want to do this, consider that you’re running a bakery and you have two ads, one for cake recipes and one for carrot cake recipes. Now, obviously, you want only the cake recipe to show up when someone searches “cake recipe” or something similar, not when they search “carrot cake recipe” – you have a separate ad for that for a reason! So what you can do is make the word “carrot” a negative for your cake recipe ad.

This concept of negatives is useful for any circumstance where you don’t want your ad to show up – including because the keyword has a low conversion rate. And this is what this project is about.

This project involves taking a list of search terms and finding individual words which are always associated with a lack of conversions over an extended period of time and bringing them to the attention of the marketers running the ad campaigns so they can negative them. Therefore, it involves taking a table with a column of strings and a column of ints and isolating individual words (separated by spaces) which are associated with, and only with, int values of 0.

I considered a wide variety of possible algorithms in the course of working on this problem but ultimately wanted to find something which had a minimal time complexity and could run moderately quickly over tens of thousands of data points (six months or more worth of keywords and conversions).

I wanted to make sure I was looping through the dataset as few times as possible, so I would make sure to fetch the data, only store what I would absolutely need, and store it in a separate array so I didn’t have to deal with the hassle of deleting elements and having the machine shift everything else back, since I was liable to do a lot of deletions.

I will admit that this problem took me a long while to crack. I had too hard a time visualizing a massive dataset with so many different terms to think about doing any logically complicated data manipulations to it. So my first step was to simplify it to a little table, like this:

Strings | Ints"a x" | 0"a y" | 1"b x" | 0"b y" | 1

Abstracting away everything but the bare bones of the problem helped me finally realize what I succinctly described at the beginning of this essay, that I wanted to find all the elements with, and only with, an int (# of conversions) value of 0.

Once I removed that block, the ideas started pouring out. I went through about twenty different possibilities before settling on this basic concept:

Fetch the data from an Excel spreadsheet to a Pandas DataFrame.

Loop through the DataFrame, splicing each individual search term and splitting it into component words.

Add each word associated with the number of conversions of the search term it’s a part of into another array, in alphabetical order.

For each word in the resulting array, while the word is equal to the one in the next row after it, check if all conversion values are equivalent to each other and to 0.

If they are, add the word to the final array; as soon as they’re not, skip to the next word and run that through step 4.

Now what’s left is to actually implement that. *exaggerated sigh*

What did you think of this post and this algorithm? Is it a good idea? Did I screw something up colossally? Please let me know in the comments below!