Behavioral targeting: What you need to know

Behavioral targeting is advertising's attempt at supplying you with specialized ads developed from your Internet history. The intent of this article by Michael Kassner is to shed light on how behavioral targeting works and how it could affect you.

Behavioral targeting, to say the least, is an interesting concept. It may change how all of us view and use the Internet. Knowing that, I'd hope everyone would want to understand what it's all about. Briefly, behavioral targeting first determines what you like, based on where you go on the Internet. Then, behavioral targeting selects advertisements that are most likely to influence you, displaying them on the new web pages you ask for.

Before we get too much deeper into behavioral targeting, I need to point out some related technology that makes behavioral targeting possible. Behavioral targeting only became feasible when Deep Packet Inspection (DPI) matured into an established technology. Therefore, it's important to comprehend what DPI is. To help in that regard, please refer to a recent article of mine called "Deep Packet Inspection: What You Need to Know." I know it seems like a shameless plug, but it helps if we're all on the same page.

The infamous cookie

The next piece of the puzzle we need to understand is the much-maligned cookie. Cookies are benign text files that are sent to the web browser from the web server that's hosting the web pages being queried. Cookies have multiple purposes: they allow automatic authentication, keep track of the browser's session state, and identify the user/web browser combination to the web server. The fact that cookies can identify the user/web browser combination is paramount to our discussion about behavioral targeting, because the behavioral targeting process installs additional cookies specifically to track what web pages have been viewed. That's why cookies are so important and why I'd like to describe how a cookie is installed:

The process starts when I type the URL of a web site into the web browser.

The browser will check the computer's hard drive for a cookie associated with the web site I just entered. If it finds the appropriate cookie, the browser will send the cookie information along with the URL to the web server controlling the web site being queried. If the browser doesn't find a cookie, no data is sent.

The web server receives the request for a page. It then checks to see if a cookie was sent as well. If so, the web server can use that information to tailor the web page specifically for me.

If the web server didn't receive a cookie, it knows that I haven't visited the site before. The web server then creates a new ID for me in the web server's database and sends a cookie in the header for the web page to the computer I'm using. My computer then stores the cookie on the hard drive. From that point on I'm uniquely identified when I ask for web pages from that particular web server.
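To make those four steps concrete, here is a minimal sketch in Python. It is a toy model, not how any particular browser or web server is written: the `WebServer` and `Browser` classes, the `visitor_id` cookie name, and the example domain are all assumptions made up for illustration.

```python
import uuid


class WebServer:
    """Toy server that identifies visitors with an ID cookie (hypothetical)."""

    def __init__(self):
        self.visitors = {}  # visitor ID -> number of pages requested

    def handle_request(self, cookies):
        """Return (response_headers, body) for one page request."""
        visitor_id = cookies.get("visitor_id")
        headers = {}
        if visitor_id not in self.visitors:
            # No known cookie arrived: create a new ID and send it back
            # in the response header so the browser can store it.
            visitor_id = uuid.uuid4().hex
            self.visitors[visitor_id] = 0
            headers["Set-Cookie"] = f"visitor_id={visitor_id}"
        self.visitors[visitor_id] += 1
        body = f"Page view {self.visitors[visitor_id]} for visitor {visitor_id}"
        return headers, body


class Browser:
    """Toy browser that keeps one cookie jar per domain."""

    def __init__(self):
        self.cookie_jar = {}  # domain -> {cookie name: value}

    def get(self, domain, server):
        # Send only the cookies stored for this domain (possibly none).
        cookies = self.cookie_jar.get(domain, {})
        headers, body = server.handle_request(cookies)
        if "Set-Cookie" in headers:
            name, value = headers["Set-Cookie"].split("=", 1)
            self.cookie_jar.setdefault(domain, {})[name] = value
        return body


server = WebServer()
browser = Browser()
first = browser.get("www.example.com", server)   # no cookie yet, server issues one
second = browser.get("www.example.com", server)  # cookie sent, same visitor recognized
```

The second request carries the cookie issued by the first, so the server counts both page views against the same visitor ID.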

We now have all the relevant pieces, so let's get to the important stuff. Behavioral targeting is an application that uses information it has gleaned from our web-browsing habits to display ads it thinks we'd like to see. The process starts with companies like Phorm or NebuAd (behavioral targeting development companies) talking to our ISPs, offering the ISPs money if they will allow Phorm or NebuAd to install equipment in the main traffic stream of the ISP. This equipment serves two purposes:

It inserts a Phorm or NebuAd cookie that uniquely identifies each ISP subscriber and is associated with every cookie domain that has been issued to the subscriber.

Using DPI equipment, it reads every web page that the subscriber has asked for and creates a profile of the subscriber's interests based on a predetermined checklist.
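Neither company has published how its checklist works, so the following Python sketch is only a guess at the general idea: match the words on each page against fixed keyword lists and count the categories that hit. The `CHECKLIST` categories, their keywords, and the `update_profile` function are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical checklist: interest category -> keywords that signal it.
CHECKLIST = {
    "travel": {"flight", "hotel", "vacation"},
    "autos": {"car", "sedan", "dealership"},
}


def update_profile(profile, page_text):
    """Add one point to every category whose keywords appear on the page."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    for category, keywords in CHECKLIST.items():
        if words & keywords:
            profile[category] += 1
    return profile


profile = Counter()
update_profile(profile, "Cheap flight and hotel deals for your next vacation")
update_profile(profile, "Server room cooling tips")
```

After these two pages the profile holds a single point for "travel" and nothing for "autos"; the second page matches no category at all.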

Phorm's approach

The subscriber profile is of obvious interest to advertising firms -- why serve ads about baby diapers to someone interested in joining AARP? The advertising firms negotiate with Phorm or NebuAd to provide content through the behavioral targeting application. Thereafter, no matter where the subscriber goes on the Internet, those specific ads will show up on the web page being served to the subscriber. To help clarify this, let's take a look at the process (diagram courtesy of Wikipedia) Phorm uses to set up behavioral targeting:


I want to go to www.techrepublic.com, so I type the appropriate URL in my browser and the browser sends the query out to the Internet.

The Phorm application then checks to see if there's a Webwise.net cookie (a domain name owned by Phorm) associated with the Techrepublic.com domain.

There's not one initially since the Phorm equipment has just been installed. Therefore, Phorm blocks access to www.techrepublic.com.

Now a Phorm server at the ISP steps in and pretends to be a web server at www.techrepublic.com, returning a "307-Temporary Redirect" to my web browser. The 307 redirect tells my browser that the URL I asked for has been relocated temporarily to a different location.

My browser now thinks that www.techrepublic.com has moved to www.webwise.net, so it makes a redirection query to www.webwise.net. If my web browser locates a Webwise domain cookie it will also be attached to the redirection query. Phorm then knows who I am by the unique ID associated with the Webwise.net domain cookie.

If there isn't a Webwise domain cookie, the Phorm server will assign one to me and send it back to my browser in another 307 temporary redirect response to a fake www.techrepublic.com page that is on the Phorm server. Therefore, if I didn't have a Webwise.net cookie, I do now and it's a first-party cookie.

Next, my web browser sends a redirection query to the fake www.techrepublic.com page. The Phorm server once again steps in and pretends to be the web server at www.techrepublic.com. Remember the browser thinks it's sending a query to www.techrepublic.com and there's still the unique Webwise user ID in the query. The Phorm server now sends the final 307 temporary redirect response to my web browser telling it to go to the actual www.techrepublic.com web page. This part is important: The Phorm server also sends back a Webwise cookie, but it's placed in the Techrepublic.com domain and becomes another first-party cookie.

Finally, my web browser sends a query to www.techrepublic.com and it appears that it will make it this time. The query has a unique payload of cookies as well, one for the Techrepublic.com domain and one for the Webwise.net domain.

The Phorm equipment at my ISP intercepts this query and the Webwise.net domain cookie is stripped off before the query actually proceeds to the Techrepublic.com web server. This appears to take place to avoid public visibility of the Webwise.net cookie, as it shouldn't be on a www.techrepublic.com query.

The contents from my query to www.techrepublic.com come back and are intercepted by Phorm equipment. A copy of the information, along with my Techrepublic.com domain and Webwise.net domain cookies, is sent to a secondary piece of Phorm equipment.

The secondary Phorm equipment scans my Techrepublic.com web page for key information that will be added to their browsing profile about me.
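The redirect sequence above is easier to follow as a small simulation. The Python sketch below is my own toy model of the steps as described, not Phorm's code, which is proprietary. The `Interceptor` class, the `/fake?uid=` and `/?orig=` paths, and the `uid` and `webwise_uid` cookie names are all assumptions; only the 307 redirects and the two domains come from the description.

```python
import uuid

TRACKER = "www.webwise.net"
ORIGIN_SAW = []  # cookie sets that actually reached the real web server


def origin_server(host, cookies):
    """Stand-in for the real site; records which cookies it received."""
    ORIGIN_SAW.append(dict(cookies))
    return f"real page from {host}"


class Interceptor:
    """Hypothetical stand-in for the equipment sitting at the ISP."""

    def __init__(self):
        self.profile_log = []  # (uid, host) pairs copied for profiling

    def handle(self, host, path, cookies):
        """Return (status, headers, body) for one browser request."""
        if host == TRACKER:
            # Reuse the tracker-domain ID if the browser sent one, else assign one.
            uid = cookies.get("uid") or uuid.uuid4().hex
            orig = path.split("orig=", 1)[1]
            return 307, {"Location": (orig, f"/fake?uid={uid}"),
                         "Set-Cookie": ("uid", uid)}, ""
        if path.startswith("/fake?uid="):
            # Pretend to be the requested site and plant the ID in its domain.
            uid = path.split("uid=", 1)[1]
            return 307, {"Location": (host, "/"),
                         "Set-Cookie": ("webwise_uid", uid)}, ""
        if "webwise_uid" not in cookies:
            # No tracking cookie for this site yet: bounce to the tracker domain.
            return 307, {"Location": (TRACKER, f"/?orig={host}")}, ""
        # Tracking cookie present: strip it, forward the rest, log the visit.
        forwarded = {k: v for k, v in cookies.items() if k != "webwise_uid"}
        body = origin_server(host, forwarded)
        self.profile_log.append((cookies["webwise_uid"], host))
        return 200, {}, body


def browse(host, jar, isp, max_hops=5):
    """Follow redirects like a browser, storing cookies under the host asked for."""
    path, hops = "/", []
    for _ in range(max_hops):
        cookies = jar.setdefault(host, {})
        status, headers, body = isp.handle(host, path, dict(cookies))
        hops.append((status, host, path))
        if "Set-Cookie" in headers:
            name, value = headers["Set-Cookie"]
            cookies[name] = value
        if status != 307:
            return body, hops
        host, path = headers["Location"]
    raise RuntimeError("too many redirects")


isp, jar = Interceptor(), {}
body, hops = browse("www.techrepublic.com", jar, isp)
body2, hops2 = browse("www.example.org", jar, isp)
```

In this model the first visit to a site takes three redirects before the real page arrives, the real server never sees the tracking cookie, and the second site ends up tagged with the same ID as the first.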

Sorry for the long, drawn-out description, but that's exactly what takes place every time a web browser sends out a query. In this way, Phorm knows exactly where I go on the Internet and what I'm looking at. With these profiles, Phorm, for a fee, will tell advertising firms what ads to place on the web pages being served to me. The company states this whole process is anonymous, but that requires trust in what Phorm says, as the Phorm application is proprietary and not available for peer review. I don't have an opinion one way or the other as to the claims of anonymity by Phorm. As mentioned earlier, I'm just concerned that most users are not aware of this technology, and I want to correct that.

NebuAd's approach

NebuAd is another major player in behavioral targeting. Their process is slightly different, and I'd like to explain the differences, even though the results are the same. The NebuAd equipment is also placed in the ISP's main data stream, but NebuAd doesn't use the cookie shuffle like Phorm. NebuAd, to their credit, uses a very innovative approach I'll explain by using the following example:

I want to go to www.techrepublic.com, so I type the appropriate URL in my browser and the browser sends the query out to the Internet.

My ISP receives the request and passes the query on to www.techrepublic.com (different from Phorm).

The web server at www.techrepublic.com replies to my web browser's query with the appropriate web page.

The NebuAd equipment at my ISP is monitoring this exchange, and as the last packet reaches the ISP, the NebuAd application appends one extra packet to the end of the traffic from the web server at www.techrepublic.com.

This final packet contains JavaScript. The script causes my web browser to go and retrieve scripting code at a NebuAd web site.

My web browser then runs the script and a NebuAd cookie is planted on my computer.
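The injection step can be pictured with a few lines of Python. This is an illustrative sketch only: it treats a web page as a list of byte chunks standing in for packets, and the `packets_for` and `inject_trailing_packet` functions, the 1460-byte chunk size, and the `tracker.example` script URL are all assumptions rather than anything NebuAd has documented.

```python
def packets_for(page, size=1460):
    """Split a page into fixed-size chunks, standing in for packets."""
    return [page[i:i + size] for i in range(0, len(page), size)]


def inject_trailing_packet(packets, script_url):
    """Return the original packets plus one extra packet holding a script tag."""
    extra = f'<script src="{script_url}"></script>'.encode("ascii")
    return packets + [extra]


page = b"<html><body>" + b"x" * 3000 + b"</body></html>"
original = packets_for(page)
tampered = inject_trailing_packet(original, "http://tracker.example/loader.js")
received = b"".join(tampered)  # what the browser ends up reading
```

The page content itself is untouched. The browser simply receives one more chunk than the web server sent, and that chunk tells it to fetch a script from a third location.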

The NebuAd cookie is similar to the Phorm cookie in that it uniquely identifies me and allows the NebuAd applications located at my ISP to track my Internet activity, scan the returned web pages, and create a profile that's of interest to advertisers. At this point, Phorm and NebuAd are almost identical.

What's it all mean

That's the ultimate question, and I'll leave that to the pundits who are much more knowledgeable than I am. As I mentioned, my goal was to make you aware of what's coming. I'm not sure I want a business entity tracking my every move on the Internet. My government? Sure, that's a different story. They aren't doing it for monetary gain; they are protecting me. That being said, I do understand the need for advertising. Part of my professional existence depends on advertisements. I'm just not sure being this invasive is the answer.

Also, I suspect that this business model will place the advertising world in some sort of turmoil. For instance, who gets to decide what ads are displayed when I go to Techrepublic.com? TechRepublic or someone paying my ISP? For more insight into how this topic is playing out, I suggest that you listen to Steve Gibson of GRC.com and Leo Laporte from TWiT.tv. They cohost a series called Security Now and have put together several podcasts that explain relevant pieces of behavioral targeting. I'd especially recommend listening to the podcast "Episode 153: DePhormed Politics," where Steve and Leo have an enlightening discussion with Alexander Hanff, a technologist and anti-Phorm activist from the UK.

Preventative measures

There are options that you can use to avoid behavioral targeting cookies and DPI scrutiny. Encrypted tunnels through your ISP prevent the installation of behavioral targeting cookies. Also, using VPNs, whether they are IPsec, L2TP, or SSL, will negate any effort by DPI to decipher the encrypted traffic. E-mail is another subject, and once again the only sure way to ensure its privacy is to encrypt the message. There are not a whole lot of options, but that's because behavioral targeting applications are being placed only one hop away from your network perimeter.

Final thoughts

Whew, this is a tough subject. I know that opinions about behavioral targeting will run from A to Z, and that's good. I'm not even sure what my final thoughts are. What bothered me was the lack of information about behavioral targeting. Hopefully, I was able to change that with this article.

Michael Kassner has been involved with wireless communications for 40 plus years, starting with amateur radio (K0PBX) and now as a network field engineer and independent wireless consultant. Current certifications include Cisco ESTQ Field Engineer, CWNA, and CWSP.



I've just found out that Phorm and BT are going ahead with yet another testing regimen. The difference as I know it this time is that Phorm is opt-in. My problem is that I would need proof of that. It's just a matter of trust.
http://www.itwire.com/content/view/20889/53/

Like you, I tend to trust my government ... at least a lot more than I trust a bunch of greedy SOBs trying to sell me viagra. Excellent explanation, now could you please tell us how to trash their cookies? Optimally, I'd like to feed garbage data into their system, but failing that, deletion of the cookies from my system will do.

I just completed a new article about Behavioral Targeting. This new article is focused on what's actually happening in the real world. There's even information about a governmental questionnaire that was sent to 33 ISPs. Check and see if your ISP is one of those that has already responded.
http://blogs.techrepublic.com.com/networking/?p=624

First off, re: your article "Deep Packet Inspection: What You Need to Know," behavioral targeting was among my first thoughts, right there with "Watch the Web start to look way too much like TV."
Psychology has contributed enormously to targeting this or that demographic. I don't know what most people think of advertising on radio and TV - TV in particular - but it appears to be aimed at a 'lowest common denominator', which, while successful perhaps, is insulting as hell.
Polling and surveying have offered quite a lot of general information re: buying habits. Demographically specific in most cases, results analysis applied in conjunction with targeted application of psychology is an effective - and virtually inescapable - methodology for sucking the money right out of the pockets of the unaware. With insult.
DPI has the potential to be exponentially more insulting. And money sucking.
Yes, behavioral targeting is an invasion of my privacy. Via any venue.
Attempt by one to manipulate another to selfish end is reprehensible.
tidypoet

As far as I'm concerned, they can target me as long as they want. I never even look at advertisements on web pages, they only manage to annoy me if they are really popping up in my face (hmmm... reminds me of the Techrepublic welcome advertisement which of course I never look at and immediately skip - I put my cursor right away in the right top corner to "continue to main page - :-))

This article gives the impression that behavioural targeting is reliant on DPI, it is not. In fact behavioural targeting can be very effective without DPI. DPI is only an extension to behavioural targeting which relies on ISP involvement.

That's great news, but I caution against celebrating early. Companies like Phorm (as well as the bad ideas that spawn them) have more lives than a cat. As has been said before, "Eternal vigilance is the price of freedom."
May Phorm and its supporters catch a disgusting disease and die in abject poverty.

Did you know the cable company can "profile" your TV viewing as well? They can know what signal is on when and where, (approximately) and for how long. This technology has been in use for over a decade.
This ability is not a mere result of the infrastructure though, it has to be deployed. I forget the details but I believe devices like TiVo and newer TVs have the circuitry to allow the cable co to know who's watching what, though pinning it to an individual household in real time will require full deployment of IPv6 and every "appliance" (literally) being on the 'net.
I've been saying since the standard for IPv6 was announced that the day that goes live is the end of privacy and liberty.
Here's the question:
Do you know what date IPv6 was "officially" launched? (no googling now... ya gotta know it)

My son who is a business/HR major talks about psychology all the time. He even has taken what I would consider an inordinate amount of classes in it. We discuss ads all the time and how effective they are.

Most ads come from one of a few huge ad server farms. If I really want to get rid of the ads I put an entry in my /etc/hosts for the ad server (by name works) and point it to my loopback. Being as you run a browser as a normal user, the ads will all say "page cannot be displayed" because the loopback has simply dropped the non-root request.
=)
Not sure if there's a windows equivalent.
I use firefox, mainly for the extensions. One of them is a tool that pausing the mouse over a link displays the location, IP and name (all if available) of the server. Comes in handy for the above.

Some privacy experts are concerned about mission creep and oversight. One of their arguments is that it would not take much more to use DPI on unencrypted email.
It wouldn't be hard to imagine that BT and DPI are actually going to increase security on the Internet by increasing the use of SSL and encrypted email.

That's the only one that I know of and it's a test. None of the big US ISPs are involved at this time as far as I know and that they are admitting. The Congress is really on top of this for once.
I think one issue slowing Phorm use is the infrastructure required to make it work. As I understand it for the BT test of 18,000 users it will require the addition of 300 servers.

Thanks for your comments, Thomas
I think I may have mentioned that Phorm already changed their name once. So, you are absolutely correct. Money talks and capitalism rules. The whole debate centers on whether capitalism is above board or not.

In my former life as a corporate pilot I frequently flew the owner of a huge Madison Avenue ad agency to work and back every so often.
The stuff he told me... he told me a bunch of German pharmaceutical, chemical and psychiatric eggheads were deposited in his care beginning in 1945 and on. He said they were really nice folks, and the smartest folks to be had on the planet.
And rather than be tried for war crimes our "government" loosed them on our heads via the advertising business. What, oh what on God's green earth does that tell you?
BTW our public school systems are nothing more than 12 years of advertising. The "product" is belief in a "government" that isn't really a government, but a for profit corporation.
Check out what the "federal reserve" really is, for instance. It's a private, for profit FOREIGN OWNED corporation.
Another trivia question.
What do Abe Lincoln, Benjamin Harrison, William McKinley and John Kennedy have in common? Here's a big hint:
http://usrarecurrency.com/WebPgFl/A51298086A/1963$5UnitedStatesNoteSnA51298086A.jpg

I am working and do not have the time for anything but a quick response.
At home, I use Qwest DSL, at work I use Comcast. What they use? I don't know and don't care.
NoScript prevents the execution of Java and other scripting, and can prevent Flash from operating.
AdBlock prevents the display of ads.
So, if you send a script, it will not function unless I allow it to. If you send an ad, it does not display. I can do nothing when the ISP provides the info, but I do not have to see the ads. This is the pocketbook talking. Ads do not reach me, no revenue. No revenue, eventual cessation. With government, they have other ways.
Both extensions and Firefox are free.
When using Linux, I spoof the user agent to indicate Windows and IE 6 or 7. With Windows, I spoof the UA for FireFox on Linux. This is just to toss a monkey wrench into the works.

The Windows hosts file can be found at C:\WINDOWS\system32\drivers\etc and modified in Notepad to do the same thing. Also, there are some spyware tools that block many popup ads. (I use Spybot.)
My question is: if you found out the DPI server's IP and blocked or redirected it, would this still allow normal browsing while only blocking DPI from monitoring you, or would you just be creating a DoS on yourself?
Unfortunately, I believe the latter would hold true. Any comments from anyone who can try or has tried this would be appreciated.

That works the same for MS products. It must get cumbersome though as there are a significant number of ad server farms.
I also wonder if that's how it's going to work or if the Phorm servers are going to serve the ads.

When an ISP and DPI is not involved the only solution, that I'm aware of, is a cookie dropped by the adserver. The cookie will hold an identifier, limited to a session or some relatively short period of time, and can consider site history (within the domain of the adserver) and can consider recency & frequency of a visit. Now apply some context in the form of a tag (special keyword) and you have an effective behavioural targeting system. Apart from the technical difference of involving an ISP, the main non-technical difference is the reach of this system, limited to the Adserver's domain. Cookies can only be dropped when a property in the adserver's reach has been visited, but for large networks this is not a problem. The ISP-DPI solution may, or may not, provide greater reach but this reach is complicated through issues arising from partnerships with other publishers and an ISP. The benefit to involve the ISP and DPI for behavioural targeting really needs to be analysed on an individual basis.

Like everything, it comes down to intent.
A NAT box is intended to capture an external port and forward it to an internal IP/port. Similarly, a clear device on the wire for security intercepts the stream but leaves it alone unless the triggers get flipped.
An advertising company is not providing security or required network bridging. The intent is purely to present more refined ways to convince people to shell out for things they don't actually need.
I'm preaching to the choir here though; hurray mission creep. If we can't get it approved in the business case, let the project slip a little when no one's looking.

Neon Samurai,
I agree with you completely, but ISPs have precedent. NAT routers and proxy servers already intercept and alter packets. I realize there are valid reasons for doing that, but once again this could be defined as just infamous mission creep.

If I request a website and another server pretends to be the website I've requested; that's not acceptable.
If I set up a server to intercept requests, and my server then places itself between forwarding the request and returning the web server's responses, I'm providing a fraudulent identity to both the client browser (DNS spoof) and the web server (client identity spoof).
The part I don't understand is why this is acceptable for advertising research companies when anyone else doing it is suddenly the evil criminal attacking people.

Once again definitions come into play. I interpret forged as imitating something. With that in mind the cookies are not forged, they are actual webwise.net cookies that are added to the domain of the web site being accessed.
I see your interpretation as well and it's valid as the cookies are not what you asked for or expected.

If I understood the explanation, Websense puts a 1st party cookie on my machine in the name of the site I was attempting to access. That *IS* forgery, or I've somehow missed the definition of forgery in school.
I understand the dance, I think. I understand that they will put the cookies back on my machine each and every time. So what I am interested in is this.
If I cannot FORCE them to keep their garbage off my machine, if I cannot FORCE them to cease forging the name of the sites I visit, then I want to corrupt their database.
If the details about the format of their forged cookies can be circulated, someone will want to assist in this. Imagine, if you will, a program to randomly reset their 128-bit ID in their cookies on my machine. If enough people get involved, soon their database will be too corrupted to be valid.
Phorm may think they have the upper hand, by giving money to MY ISP to spy on me. But if they can lie, so can I. So, what can you tell us about how to identify their forged cookies?
Thanks!

The cookies aren't forged. The first step is to place a cookie for Webwise on your computer. The second step is to make that Webwise cookie part of each domain cookie that you have. So if you go to Google, the Webwise cookie that's part of the Google domain goes along and identifies you to the Behavioral Targeting application.
It's more complicated than just removing the cookies. As Neon Samurai mentioned, each time you ask for a web page, the BT application will go through the whole process if you don't have the Webwise cookie or a Webwise cookie for the domain you are trying to obtain web pages from.
Your specific User ID comes into play here as well and that will still identify you to the BT application.

For Windows, there are a few cookie managers. Cookiecutter was popular a while back as one that would receive cookies and keep them in memory, then dump them without ever being saved to the drive. You'd be tagged each visit but it would only last as long as that visit.

Pgit grasped the salient point of my question. Based on your article, I'd like to know how to damage and/or remove cookies that are used by Behavioral Targeting. However, as I understand your article, the forged identity of the website I was going to was used to produce the cookie in the first place. So, is there anything which I can use to determine that a forged cookie was placed on my machine and either corrupt or delete it?
Thanks!

Talk about a change of professions. Also I am totally not good at that sort of thing.
My son calls me the world's best straight man, as I don't comprehend too many things other than networking or IT stuff.

Well, that's how people should be responding anyhow. And, for reasons existing before the DPI discussion.
Cheers for keeping up with it though. I've the "What Gov wants to know" article marked for a detailed read later.

Your question is what is bothering the privacy sector. Without any oversight or ability to examine the application code, there is no confidence as to what is happening. It's just their word.
The privacy sector is also worried about mission creep. For example, the advertiser's next step is to say we can target your ads better if we read your email messages.

Then again, what personally identifiable information is given about me? My IP. So, you take the IP and do a reverse DNS and use it to mail me paper ads - I routinely throw them away without opening.
We live in a data-centric society; we really have no privacy. The only thing we can do is render the data as useless as possible.

As you describe it AdBlock would prevent the display of ads regardless of where they came from as it's a function of the browser.
It still doesn't stop the compilation of a profile on your web browsing habits though.

Thanks for the path. I haven't used a hosts file in windows, laziness I suppose. But I've got a few windows to Linux automated directory syncing projects going. Using hosts will make things clearer and simpler.
Good question there, can't answer it myself. But the redirects are conducted by your ISP, seems it would appear to you that you're communicating with them rather than the DPI. Do you ever even see the DPI in any of the traffic? If not, how could you ever block it?
I'm beginning to agree SSL is going to get a popularity boost out of this. Oh well, back to the books...

Just to be clear, the DPI server isn't what redirects you, it's the Phorm or NebuAd server. As I understand it you are correct in assuming that your access would be denied as everything flows through the Phorm or NebuAd servers.

Hi,
Atlas was only recently acquired by MS. I'm not sure how much you'll find on the Atlas site, but it does give an overview of what they offer targeting wise via their Admanager product (well it did the last time I looked)
Regards,
Sergio.

Hello Sergio,
I agree and have been thinking about that. I suspect that the DPI equipment at each ISP would then send the profiling traffic to a centralized database, just because of the sheer volume of information.
Is Atlas owned by MS? Forgive me, but I didn't see very much technical information on their website.

It comes down to the configuration, if you want to drill down to the page then you can configure this. However, the behavioural data needs to be stored in a database and accumulates quickly. If pages of sites are considered then, for a large network, pre analysis needs to be completed to determine which site pages would yield the greatest value. Generally I would think only the site would be initially configured. Take a look at Atlas Admanager for more understanding.

Is the technology you refer to similar to DoubleClick? If so, that technology is totally dependent on the number of sites where they (DoubleClick) can plant third party cookies.
DPI and Phorm or NebuAd don't even consider or care about that. They inject a cookie on every domain. Also, they read the content of each page. For example, if you went to Amazon, DoubleClick would only know you were at Amazon. Phorm and DPI would know what individual pages at Amazon you were interested in.