Wednesday, March 24, 2010

2009 Year in Review: Web Optimization

In this second segment of my series, "2009: Year in Review," I discuss issues related to managing my web presence. Some of these methods directly result in income, such as advertising dollars, whereas others indirectly affect income, such as my ranking in search engines or directing traffic towards monetizable content. Nothing discussed here addresses my actual sales and licensing methods, which were addressed in Part 1 of this series.

Web Traffic and Advertising

Traffic to my site has increased modestly, by 16% from the same time last year (2008). More specifically, I averaged about 15,000 visitors a day in 2009, but the number would have been much higher had it not been for a technical misjudgment I made during the summer months that dramatically dropped my rankings. It had to do with "keyword stuffing," discussed later. Normalizing for that, my traffic has been pretty steady at around 16-18K unique visitors a day, compared to 14-15K/day in 2008. (Stats can be seen here.)

While that may sound impressive, it's not that simple. There are a number of devils in the details, and sifting through the data is only half the battle. For example, the bounce rate (the rate at which people leave my site after viewing the first page) rose to 8.5%, and the average time on site dropped by 11%. In other words, people are leaving my site sooner than before.

One would think that this is a bad thing, but there's other data that suggests otherwise. For example, advertising revenue more than doubled; in some cases (some pages and topics) tripled and quadrupled. All those people "bouncing" away without spending time on my site are clicking on ads. For 2009, advertising revenue jumped to represent 17% of total income.

One might say that I'm losing potential buyers to advertisers, but that's not what's going on. Most of the ads on my site are not for photography prints or licensing, which is the lion's share of my online transactions. That is, people are clicking on ads because they decidedly do not want anything I have to offer. I don't care that they leave; it just so happens that they're paying me an effective "exit tax." Or rather, the people who are getting my traffic are paying that tax.

Indeed, this turns out to be mutually beneficial: advertisers whose own sites don't rank well for some search terms actually get a lot more relevant traffic from my site than they would if they paid to get onto Google's search page directly. That is, they'll pay ten cents to a dollar per click to put an ad on my page (through Google's AdWords program), compared to two or three times that much to put the same ad on Google's search results page. They may not get quite the same total volume of traffic, but they'll get much more relevant traffic that converts to revenue if they place those ads on my site (or any of the other top-ranked sites). This kind of advertising indirection costs them less, and they get better bang for the buck. Best of all, I get a cut of it. :-)

I should point out that this isn't always so straightforward for advertisers, because targeting a specific site can be costly (in the form of lost opportunity, not necessarily money) if that site isn't consistently well-ranked. That is, if they target a site that ranks well only sporadically (because its content changes), they could get a boost of traffic for a short time, and then go dark. Since my site has been around for a long time and is generally stable, this risk is not a concern.

In fact, many advertisers come directly to me and pay me to put their ads on my pages, rather than going through Google. There are advertising aggregators whose clients pay them to do this analysis, and my site is coming up more often on their radar. My advertising rates are not based on clicks or impressions; they're flat-fee rates, which advertisers like a lot for a high-traffic site like mine.

This then raises the question: what was the end-user actually looking for when they landed on my site, even though I didn't have it? Why am I ranked so highly for them? Isn't that a problem with the search results?

First of all, the bounce rates are still quite low. Google does accurately put users on pages that match their searches. Of the low number of people who bounce, it's usually because they used the wrong search terms in the first place, and Google couldn't possibly know that ahead of time.

Take the Olympics in Vancouver, for example. If you search for "photos of vancouver", I'm currently ranked #8 on Google. (Before the Olympics, I was ranked among the top three.) So, I get a lot of people looking for Olympics photos, even though they didn't use the term "olympics" in their search query. When they don't see such images on my Vancouver page, users click on an ad that gets them where they wanted to go.

Vancouver is only one of a long list of examples. At the moment, I score very highly for phrases like:

"black and white pictures" (Google Rank: #4)

"what kind of camera should I buy" (#6),

"learning photography" (#2)

"photography business" (#1)

"model release" (#1)

"star trails" (#1)

"fill flash" (#1)

"photographing people" (#1)

"selling prints" (#1)

"photography marketing" (#3)

"sahara desert" (#5)

"stairs" (#6)

"photos of doors" (#1)

"photos of new york city" (#3)

"photos of san francisco" (#1)

"photos of kids" (#1)

"photos of united states" (#1)

"photos of patagonia" (#3)

"photos of cuba" (#1)

These are but a few among hundreds of phrases for which Google ranks my site and/or pages among the top five. But the key is that these terms are generic and they themselves do not bring traffic that can be attributed to a single dime of sales revenue.

While they are good for generating advertising revenue, there's an even better benefit to ranking high for generic search patterns: non-buyer traffic outstrips buyer traffic by orders of magnitude, and any traffic--buyers or not--contributes to the overall ranking of my site. When people search using more specific terms (for content that they do want to purchase), my site will rise in those search results, yielding sales.

So the objective is to have as many pages rank as highly as possible. One key strategy here is that I don't particularly care to rank highly for any single or small set of search terms--that doesn't necessarily benefit me. It's just having my site itself be indexed well for whatever content the search engines deem appropriate. And therein lies the question: how do they determine what search terms should send users to my site? Since they cannot determine what's inside of a photo the way a human eye does, search engines look for other clues to determine the content of a page that otherwise has very little text: metadata.

Keywording

I've blogged before about keywording; it's a huge topic. I'm not going to reiterate points I already made, but to appreciate how and why I employ my keywording methods, you need to at least understand this very basic set of truisms:

Most image buyers use search engines first, stock agencies second. Search engines act like "metasearch" for all the stock sites, as well as many other image sources, including mine, yours, everyone else's. It's best to use keywording techniques advised by search engines, not stock photo agencies.

Search engines are intelligent about search queries. Unlike days long ago, they know all the synonyms that are related to a common root. So, you do not need to include the singular and plurals, all the variants of "dog" (canine, puppy, pooch, etc.), and so on. What's more, intelligent search is becoming more common, even among stock agencies. The need to stuff your images with synonyms and other related keywords to make your list "more thorough or complete" is gone. In fact, attempting to do so can backfire on you. (More about that later.)

Controlled Vocabularies are a complete waste of time. There was once a time when such lists were useful, because they made the job of image search much easier for unsophisticated (brute force) search algorithms. Controlled vocabularies helped you use a small, consistent set of words, which kept you from using dozens of similar words that might come up with different search results when the user entered search queries.

While that premise was useful, it only addresses half the equation: the weakest link in search is not you, it's the end-users. Or rather, the search queries they submit. These people are not going to conform to controlled vocabularies. So, in order to map their queries to your images, their input text has to be converted to root words anyway. If the search algorithm is going to do this to end-user queries, it can (and should) also do it with your keyword list. Forcing you to conform to a list becomes a waste of time.

Keywording should take only a few minutes and minimal thought. It's very easy to over-think how people might find your images, or to worry that your images might not be found if someone uses a series of queries that you didn't think of. But this kind of over-thinking can negatively affect whether and how your images are found. End-users learn very quickly to be very conservative in their search queries, or they will get a lot of irrelevant results, rapidly wasting their time. They may experiment with creative, conceptual, or "refined" queries to see what they get, but it doesn't take long to learn to "keep it simple." So should you. Keywords should include only the most basic, obvious, and prominent items in the photo. Search engines also rank the quality of photos (and the sites that host them) on the brevity of their keyword lists. More than ten keywords will diminish a photo's rank because it usually means that someone has stuffed the keyword list with unrelated words in an attempt to game the system. This is a common technique among photographers who submit their images to dozens of microstock agencies that do not enforce such restrictions and that use brute-force (letter-for-letter) search algorithms. Keyword stuffing--also known as "keyword pollution"--has proven to be effective on such photo sites because it allows those images to be found ahead of other, potentially more relevant results for any given search.
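To make the "root word" idea from the list above a little more concrete, here is a minimal Python sketch. It is only an illustration: the synonym table and the plural rule are hypothetical stand-ins I made up for this example, not how any real search engine does it.

```python
# Toy normalizer: maps words to a shared root so that a query and a keyword
# list can be compared. ROOTS is a hypothetical, hand-written table standing
# in for what a real search engine learns at scale.
ROOTS = {
    "canine": "dog",
    "puppy": "dog",
    "puppies": "dog",
    "pooch": "dog",
}

def to_root(word: str) -> str:
    """Lowercase a word, map known synonyms to a root, strip a plain plural 's'."""
    w = word.lower().strip()
    if w in ROOTS:
        return ROOTS[w]
    # Naive plural handling: "dogs" -> "dog", but "glass" is left alone.
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    return ROOTS.get(w, w)

def normalize(text: str) -> set:
    """Reduce free-form text to a set of root words."""
    return {to_root(word) for word in text.split()}

# A messy end-user query and a one-word keyword list land on the same root.
print(normalize("Puppies dogs pooch") == normalize("dog"))
```

If the intermediary does this to the query anyway, it can do the same to your keyword list, which is why a single "dog" covers the variants.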

In fact, I fell victim to "keyword stuffing" myself midway through 2009. In my automated keyword algorithm, which normally strips redundant or "similar" keywords, I had thought I was being clever by adding location information (city, state, country) to the keyword list. What I found, however, was that the IPTC data, which search engines tap into, already had these keywords, so my keyword list grew (unnecessarily) by three more words. This dropped my rankings by several notches, which kept me out of the "top fold" of search engine results. Dropping from #3 to #6 or #7 for a given search term is a huge deal, and you can see the results of this in my site traffic data over the summer of 2009.

Needless to say, this cost me quite a bit in traffic, which affected every other aspect of my business, from sales to advertising rates.
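For the programmers in the audience, the kind of cleanup step I'm describing could look roughly like the Python sketch below. This is not my actual code. The function name, the dictionary keys for the location fields, and the ten-word ceiling are assumptions for illustration (the ceiling is the rule of thumb from this article, not a documented search-engine limit).

```python
MAX_KEYWORDS = 10  # rule of thumb from this article, not an official limit

def clean_keywords(keywords, iptc_location):
    """Drop duplicates and any word already present in the location fields.

    iptc_location is a plain dict with hypothetical keys (city, state,
    country); real IPTC field names differ.
    """
    already_known = {value.lower() for value in iptc_location.values() if value}
    seen = set()
    cleaned = []
    for kw in keywords:
        key = kw.lower().strip()
        if not key or key in seen or key in already_known:
            continue  # empty, repeated, or redundant with the location data
        seen.add(key)
        cleaned.append(key)
    return cleaned

photo_keywords = ["boy", "dog", "Dog", "lake", "Vancouver", "Canada"]
location = {"city": "Vancouver", "state": "British Columbia", "country": "Canada"}

result = clean_keywords(photo_keywords, location)
print(result)                       # ['boy', 'dog', 'lake']
print(len(result) <= MAX_KEYWORDS)  # True
```

The point of the check against the location fields is exactly the mistake I made: the location was already in the metadata, so repeating it in the keyword list only made the list longer.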

You can imagine, therefore, that "effective keywording" (so that images and website are deemed "credible" and ranked highly) is a hotly debated issue in the photo community. It's also one where entrepreneurs try to come up with solutions--some good, some not so much.

One example is a product called "imense annotator" (annotator.imense.com), which has some interesting ideas, such as an image-recognition algorithm that tries to guess keywords that might describe the people in an image. It will do a reasonable job of ascertaining the ages, sex and ethnicity of people in a photo, and then attach those keywords to your images. Clever, and possibly more useful to a stock agency than to an individual. This is because agencies have millions of images to process, none of which have been (or will be) seen by company staff. On the other hand, the original photographers who shot the images could do this task quite easily on their own. One can only shoot so many images in a day, and since one has to eventually go through a manual (albeit minimal) keywording phase anyway, one can assign the keywords associated with the "people" photos as part of that process. This shouldn't be all that time-consuming for reasonably well-disciplined photographers. And human analysis on such things is always going to outperform a computer. (Yes, I say this as an active programmer.)

(Note: The annotator only does people/facial recognition.)

All other aspects of annotator look and sound cool, but are considerably less effective in practice. Again, these include "commercial vocabularies", "crowdsourcing" and "controlled vocabularies." As noted earlier, these ultimately contribute to the perils of keyword stuffing that search engines don't like--and which only serve to confuse stock agencies' less sophisticated search algorithms.

Another thing to keep in mind is that keywording is often done once, and then you never touch those particular images again. Therefore, whatever you use as keywords today is likely to stick with your images long into the future. But technology doesn't sit still--especially image-recognition and search algorithms. For these, time has a tendency to speed by rather quickly. Before you know it, most search engines will be incorporating the same sort of algorithms as the annotator above. In fact, Google's own image recognition features are rather well developed, and can be seen in action if you use their Picasa image management software.

In any event, the point is that keywording is a classic case where "less is more." Images should have minimal base tokens in the keyword list; the search "intermediary" interprets the uncontrolled end-user queries and maps them to the minimal keyword list in your images. This is and will always be the most effective way for images to be found.

While I don't necessarily fault software companies for coming up with creative ways to "enhance" keywording, I draw the line when companies actually recommend methods and behaviors that are wholly counter-productive. An example is Cradoc Software's latest product, fotoKeyword Harvester, which semi-automates the keywording of your images. While I am a fan of the company in many ways because it also tries to be the photographer's "coach" on many vital business matters, it has never been at the forefront of the photo business--rather, it seems to be stuck in the 1990s on many of those matters. Alas, most of its advice, while applicable 10-15 years ago, is well behind the times today.

In the case of the Keyword Harvester, the company sent out an article titled, "best ways to keyword images using concepts and attributes." A quote is: "You'll need to start paying attention to how images convey messages in advertising." They say:

One of the most valuable types of keywords for an image are things called Concepts. A concept is a term that describes non-concrete aspects of your image, an abstract idea. Concepts are used by advertisers to sell their product with the use of your image. They want the consumer to think of something specific when their product is thought of. (...) For example: Wells Fargo Bank uses images of cowboys, wagon trains, horses, and the wild west to promote their business. The concepts for these images are: excitement, freedom, trust, historic, strong, powerful.

There are several problems with all this. First is one I highlighted above in my bullet list: photo searchers (commercial or not) do not use conceptual search terms very often--at least, not with as much success as they once did when the stock industry was far smaller, before digital images, and before the internet--a time when almost all stock sales were dominated by Getty Images. Back then, yes, conceptual keywords worked. And this was because Getty internally controlled all keywords for all images. Also, they had their own intelligent search, and they controlled the images in their databank.

Today, images are found in many places and are keyworded by arbitrary staff--or worse, photographers--so consistency is impossible to centralize and manage. The direct result is that photo buyers don't search the way they once did. (This is an example of how Cradoc seems to be stuck in the 1990s.)

It's easy to put this to the test: go to images.google.com and search for the "conceptual keywords" that Cradoc said represented the kind of themes Wells Fargo uses in their imagery. I tried every word on their list, as individual search terms, in pairs, in triplets, and as the entire group. Not one single set of results from these queries contained images that would ever be used by Wells Fargo. They are totally unrelated to all their business models. This is not unique; it's rarely ever the case that conceptual keyword searches yield desirable results. That's why most searchers don't use them anymore.

By contrast, if you search for images based on the actual elements used by Wells Fargo imagery -- cowboys, wagon trains, horses -- image search results show many images similar to those the bank actually uses.

Again, the lesson: keep it simple. Don't get clever. Do not try to anticipate what the searcher might use as search terms. Photo researchers are more afraid of you than you are of them. They are going to keep it simple, too.

I can verify this with my own statistics: My site gets about 19,000 search queries a day on my own search pages. Of the search terms I get, 99% are for very specific items. Furthermore, when someone actually licenses an image from me, and I track their search patterns that lead up to the sale, it is never the case that people use conceptual terms.

In preparation for this article, I interviewed one particular client about how he tends to search for images. He said, "I found that sites are so inconsistent about search terms, that I've learned not to use big words. Just be as specific as possible to the actual things I want to see in a photo."

When I asked him how he chose the particular photo he licensed from me, and what search terms he used leading up to it, he said he wanted a "futuristic landscape." When he tried that phrase (and derivatives, such as "future" and "cityscape") on Google, Getty and Corbis, he got nothing like what he wanted. So, he just got specific: "glowing buildings", which led him to the image he licensed from my site, which can be seen here.

The image's filename should include the most relevant elements of the image. For example, if it's a photo of a boy and a dog, use "boy-dog.jpg". If you have many such images, use sequences: boy-dog-1.jpg, boy-dog-2.jpg, etc.

Use keywords sparingly. The more keywords you try to associate with an image, the more you dilute it, bringing down its "rank" and relevancy (and credibility) with search engines, or with given search queries. This is because search engines use two key metrics to determine how well a given image matches a search query: the ratio of matches between an image's keyword list and the terms of the search query, and the filename of the image. For example, if the user entered the query "boy and dog", the search engine sees two words: "boy" and "dog." (It throws out filler words like "and.") Here, the image named boy-dog.jpg has a 100% hit ratio of query terms with keyword terms, and the keywords were in the filename. Note that the actual photo itself may very well be that of a fish and a boat. (Google doesn't actually look at that, because, well, it doesn't know how.)

Avoid using synonyms and other "related" terms in keyword lists. That is, do not attempt to be thorough in describing images with keywords. That's not your job. Search engines already know how to do that. They've got thousands of programmers with PhDs doing that for you (and for the end-user). The more you try to "help," the more you're actually interfering with the process, which reduces your relevancy and ranking.
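The "hit ratio" and dilution idea in the recommendations above can be shown with a toy model in Python. To be clear, this is my own simplified illustration, not Google's algorithm: the stop-word list, the function names, and the exact formula (matched terms divided by all of the image's terms) are assumptions.

```python
import os
import re

STOP_WORDS = {"and", "or", "the", "a", "an", "of", "with"}  # illustrative only

def tokens(text):
    """Split text into lowercase words, dropping filler words and digits."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def hit_ratio(query, filename, keywords=()):
    """Share of an image's terms (filename plus keywords) matched by the query."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    image_terms = tokens(stem) | tokens(" ".join(keywords))
    if not image_terms:
        return 0.0
    return len(tokens(query) & image_terms) / len(image_terms)

# Filename alone: both query words match both image terms.
print(hit_ratio("boy and dog", "boy-dog.jpg"))  # 1.0

# Same file, stuffed with six "helpful" synonyms: 2 matches out of 8 terms.
stuffed = ["canine", "puppy", "pooch", "child", "kid", "pet"]
print(hit_ratio("boy and dog", "boy-dog.jpg", stuffed))  # 0.25
```

Under this toy scoring, every extra word that the query doesn't use drags the ratio down, which is the dilution effect I'm describing.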

The good news about keywording is that proper and effective use of keywords is extremely simple and shouldn't require much (if any) thought or time. Using myself as an example, my workflow involves two phases: the edit phase (where I rename all my photos so that their filenames reflect their content), and the keywording phase, where I apply individual words to images--usually in very large batches.

For example, let's say I'm on a photo shoot of a boy and a dog. After editing out the stuff that gets tossed, I'm left with several hundred images, which I then name just as recommended by Google: boy-dog-lake.jpg, boy-dog-bridge.jpg, boy-dog-1.jpg, etc. In order to assure the highest ratio of search queries to keyword terms, I try to limit filenames to two to six words, though most are either three or four. This is a difficult decision because if I use too many words, I may "match" more queries, but the ratio will be diluted. If I use too few words, I will rank highly for very narrow searches, but may miss more opportunities. This trade-off is a zero-sum game, so rather than try to game the system, I'm just honest: determine what's in the photo, and use that as the filename.
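The naming scheme above is easy to automate. Here is a rough Python sketch of one way to do it. It is not the script I actually use; the function name and the way each photo is described as a list of words are made up for the example.

```python
from collections import Counter

def name_batch(descriptions, ext=".jpg", max_words=6):
    """Turn per-photo word lists into hyphenated filenames, numbering repeats."""
    stems = ["-".join(w.lower() for w in words[:max_words])
             for words in descriptions]
    totals = Counter(stems)   # how many photos share each stem
    used = Counter()          # running sequence number per shared stem
    names = []
    for stem in stems:
        if totals[stem] > 1:
            used[stem] += 1
            names.append(f"{stem}-{used[stem]}{ext}")
        else:
            names.append(f"{stem}{ext}")
    return names

shoot = [
    ["boy", "dog", "lake"],
    ["boy", "dog", "bridge"],
    ["boy", "dog"],
    ["boy", "dog"],
]
print(name_batch(shoot))
# ['boy-dog-lake.jpg', 'boy-dog-bridge.jpg', 'boy-dog-1.jpg', 'boy-dog-2.jpg']
```

The max_words cap reflects the two-to-six-word range mentioned above; the sequence numbers only appear when several photos would otherwise get the same name.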

Any words that may be "in" the photo, but seem to be less relevant, are then added to the keywords list in the image's metadata. And even then, I rarely add more than two or three words, usually modifiers such as "young" or "funny."

Naming files is often very quick because most are batches of similar images. One only needs to browse a given gallery on my site to see the number of similar images that are shot together. The keywording process is similarly fast, also involving mass-assignment of specific, unambiguous words to large batches of images. My rule of thumb is that keywording thousands of images should take no more than 30 minutes.

Most any image-management software can add keywords; I happen to use Adobe Bridge, which is bundled for free with Photoshop or any of the Creative Suite products.

Note that if you inspect the images on my site, you may notice that they appear to have lots of keywords. Most of these keywords aren't actually in the images that I process--these are added later by an automated post-production algorithm that generates all my static html pages. I do all this to present hints to the end-user for suggested related search terms to stimulate new search ideas.

Maps

The newest addition to my website is the use of Google Maps. Essentially, each of my web pages incorporates a Google map showing where every photo was taken. While it may seem frivolous, there's been great advantage to the maps. (It also wasn't entirely easy; Google set up the whole mechanism for the sole purpose of presenting maps based on specific street and/or mailing addresses. I have no interest in that level of detail; I just wanted to generate maps for generic locations, like city/state/country. Well, that isn't quite so easy because there are many streets named after cities, states and countries, and there's no way to tell Google Maps that I'm not interested in street addresses, just general city maps.)

Though I added maps to my site late in December, the effect it's had on my traffic and ranking has been a surprise. Search engines seem to give an extra boost to web pages that are geo-tagged--that is, pages that indicate location. When people search for images where the search parameters include a location, my pages get an additional bump. I've seen about a 10% boost in traffic two months after having introduced geo-tagging onto my web pages, and I look forward to seeing more data to quantify the extent to which geo-tagging has long-term benefits.