Web 2.0: Consistency, Relevancy and Reliability

I’ve seen this question from various folks over the past year or so as the next obvious evolution in the “tagging” craze: “Why don’t we just tag everything and then everything will be searchable?!”

Great question, because it has implications for how one develops websites in almost any field, but it makes for especially interesting fodder in health and mental health.

Should we just tag everything and call it a Web 2.0 day? Do Web 2.0 concepts, such as combining social networking and tagging together, bring something new and valuable to the technology table?

What is Tagging?

Just to bring you up to speed… “Tagging” is the action of adding a bunch of keywords to a piece of content. Content can be a photo (which rarely has internal keyword information embedded in it), a link (again, rarely has any way of expressing taxonomy or categorization internally), or anything you can think of. Then, when you go to search on that keyword, ostensibly you’ll find the stuff you want in some “better” manner. Better is, of course, a purely subjective term, as you’ll see below.

With the services that popularized this notion, namely Flickr and Del.icio.us, tagging makes a huge amount of sense. People have struggled for years with how to identify images and photos because, outside of the filename, indexing systems have had little to go on. (Yes, there is metadata, but the Average Joe doesn’t know how or can’t be bothered to use it consistently.) Flickr simply allows you to associate pieces of text with photos. One could make a similar argument for Web links (URLs), since outside of the title of the link, it’s hard to add more categorization information to the link without external help (e.g., folders).

One of the keys of tagging services such as these is that owners can add keywords to their images, but so can anybody else. The more people who tag the same keywords on the piece of content, the more “relevant” that piece of content is to a search on the same keyword.
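The counting idea described above can be sketched in a few lines of Python. This is only an illustration of “more taggers means more relevant,” not how Flickr or Del.icio.us actually rank anything; the data layout and the function name `rank_by_tag` are assumptions made up for the example.

```python
from collections import defaultdict

def rank_by_tag(taggings, keyword):
    """Rank items by how many distinct users applied `keyword` to them.

    `taggings` is an iterable of (user, item, tag) triples. This layout is
    hypothetical and chosen only for illustration.
    """
    users_per_item = defaultdict(set)
    for user, item, tag in taggings:
        if tag.lower() == keyword.lower():
            users_per_item[item].add(user)  # a set counts each user once
    # More distinct taggers means a higher position; ties break by item name.
    return sorted(users_per_item, key=lambda item: (-len(users_per_item[item]), item))

taggings = [
    ("ann", "photo1.jpg", "castle"),
    ("bob", "photo1.jpg", "castle"),
    ("ann", "photo2.jpg", "castle"),
    ("bob", "photo2.jpg", "holiday"),
    ("ann", "photo1.jpg", "castle"),  # repeat tagging by the same user
]
print(rank_by_tag(taggings, "castle"))  # photo1.jpg first: two distinct taggers
```

Note that nothing in this scheme asks whether the taggers are right, only how many of them there are.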

Why Tagging Works for Photos, Videos and Bookmarks

There’s no mystery as to why the most popular services online today that actually use tagging in a useful, integrated manner are for photos, videos and Web links. These pieces of content are notoriously difficult to keep track of and organize on our computers, much less on the Web. Web browsers have paid lip service to helping people organize their bookmarks or favorites, but nobody had ever developed a creative solution that “just works” until Del.icio.us came along. The same is true for Flickr or vimeo.

When the network of people who use these systems is made up of ordinary folk and is relatively small in number, I believe these systems can work fairly well. But, like other Web 2.0 projects, when it comes time to scale, they appear to lose something in the transition. I believe that unless such projects are designed from the ground up to address the concerns I’m about to outline below, they are bound to run into scalability issues from a usability standpoint.

Quality and Relevancy Matter in Health

Unlike tagging a photo of a castle with the word “castle,” people want reliable and factual information about health concerns, diseases and mental health issues. They want relevant results, but they also want results that haven’t been tainted by others with a hidden agenda. Health (medical and mental health) concerns are more complex than, “X causes Y and Z is the usual treatment for it.” Anybody who has worked in this field for more than a few years understands that even as we learn more and more every day from reams of research, the relationships and knowledge become more and more complex in medicine and mental health care. So concepts such as relevancy, consistency, trust and reliability are key components in this area of the online world.

One way that Web 2.0 services try to address issues of bias and usefulness of a registered user’s tags or votes is by employing a rating system of registered members. Unfortunately, most such rating systems are fairly simple things which will be fairly unreliable if the community is still relatively small. Even as the community grows, ratings can be connected more to seniority and frequency of behaviors within the community than as an actual indication of that person’s quality of tagging or responses. Often, there’s little differentiation in these rating systems between 10 bad people rating 1 bad person especially good and 10 good people rating 1 bad person especially bad. The former should carry little weight, while the latter should carry extraordinary weight. But few user social networking systems understand this kind of value relationship.
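To make that value relationship concrete, here is a minimal sketch of a rating that weights each vote by the rater’s own standing. It is one possible approach under assumed inputs, not a description of any existing service; the function `weighted_rating`, the reputation scale of 0 to 1 and the vote scale of -1 to +1 are all hypothetical.

```python
def weighted_rating(votes, reputation, default_reputation=0.0):
    """Average the votes, weighting each one by the rater's reputation.

    `votes` maps rater -> score in [-1, 1]; `reputation` maps rater -> weight
    in [0, 1]. Both structures are hypothetical, for illustration only.
    """
    total_weight = sum(reputation.get(rater, default_reputation) for rater in votes)
    if total_weight == 0:
        return 0.0  # nobody with any standing has voted, so there is no signal
    weighted_sum = sum(
        score * reputation.get(rater, default_reputation)
        for rater, score in votes.items()
    )
    return weighted_sum / total_weight

# Ten low-reputation raters praise a member; ten trusted raters pan the same member.
votes = {f"bad{i}": 1.0 for i in range(10)}
votes.update({f"good{i}": -1.0 for i in range(10)})
reputation = {f"bad{i}": 0.05 for i in range(10)}
reputation.update({f"good{i}": 0.9 for i in range(10)})

plain_average = sum(votes.values()) / len(votes)
print(plain_average)                       # the two camps cancel out
print(weighted_rating(votes, reputation))  # strongly negative
```

A plain average treats the two camps as equal and lands on zero, while the weighted version lets the trusted raters dominate. The hard part, which this sketch skips entirely, is deciding where the reputation numbers come from in the first place.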

Examining Results from Flickr and Del.icio.us

I was curious as to how two of the more popular Web 2.0 applications, Flickr and Del.icio.us, answer people’s health or mental health questions. Flickr is, of course, a photo sharing application, so its relevancy to health concerns is going to be limited. But by using it as an example, we can learn about the strengths and weaknesses of a system like Flickr, since it epitomizes the very definition of Web 2.0 services.

So let’s see how these two popular Web 2.0 applications stand up to a random test of relevancy and accuracy. Let’s start first with Flickr, a service I use regularly. Flickr is interesting for more than one reason, because it doesn’t even pretend to make an attempt at the idea of “relevancy,” since every keyword or tag is user-created (using judgment that spans the quality spectrum). Instead, it allows you to sort through photos by “most recent” or “most interesting.” Most interesting is Flickr’s way of getting at a concept similar to relevancy:

There are lots of things that make a photo ‘interesting’ (or not) in the Flickr. Where the clickthroughs are coming from; who comments on it and when; who marks it as a favorite; its tags and many more things which are constantly changing. Interestingness changes over time, as more and more fantastic photos and stories are added to Flickr.

So let’s see how Flickr stands up…

Browsing for Photos on Flickr

Keep in mind that for the sample searches I ran on Flickr below, there’s very little health or mental health information one could search for as an image. So I went with random topics that came to me from hobbies, interests, or things that seem to have been on the public’s mind. I created my list from scratch on a piece of paper, and ran the same search on Flickr as a non-logged-in user (visitor), on Flickr as a logged-in user, and on Google Image search. Also, I understand that Flickr is going to have far fewer photos than Google does, since Flickr is reliant on people actually uploading such photos to the Flickr servers. I’m more concerned about the quality of the end result.

As a frustrating aside, it’s virtually impossible to find a simple search box for Flickr once you’ve logged in. You have to click on a “Photo Search” link in the bottom list of a dozen or more links in order to get to a search box. Search results also appear different if you’re logged in than if you come to the site as a visitor.

Search term: George Washington

Public search: 543 photos found

Logged-in search: 462 photos found, relevancy seems to have taken a hit

Actual photos of George Washington? They’re in there (pictures of him on a dollar bill, a plate, a statue, etc.), you just have to dig a bit. On Google Image search, the entire first page is covered with photos of George.

Search term: The Space Needle

Public/logged-in search: 0

I accidentally wrote “the space needle” instead of just “space needle.” For whatever reason, Flickr is very literal. It can’t tell when you’ve put in unnecessary words. Google Image search had no problem returning results for the Space Needle in Seattle, as did Flickr after I removed the offending “the.”
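The difference between a literal match and a more forgiving one can be sketched as follows. This is a guess at the general behavior, not Flickr’s or Google’s actual matching logic; the tiny stop-word list and both function names are assumptions for the example.

```python
STOP_WORDS = {"the", "a", "an", "of"}  # tiny illustrative list, not a real engine's

def literal_match(query, tags):
    """Match only if every word the searcher typed appears among the tags."""
    return all(word in tags for word in query.lower().split())

def forgiving_match(query, tags):
    """Same test, but drop common filler words from the query first."""
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    return bool(words) and all(w in tags for w in words)

photo_tags = {"space", "needle", "seattle"}  # hypothetical tags on one photo

print(literal_match("the space needle", photo_tags))    # False: nobody tagged "the"
print(forgiving_match("the space needle", photo_tags))  # True
print(literal_match("space needle", photo_tags))        # True
```

Because taggers rarely add words like “the,” a strictly literal engine punishes searchers for typing the way they speak.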

Search term: Iraqi soldier

Public search: 0

Logged-in search: 21

Now things are getting interesting. Why does Flickr show some photos only when logged in? The only settings available to me when I upload a photo are to show it to Everyone, or to show it to only specified friends or family. I don’t see an option to show it only to logged-in users. Perhaps there’s some political or other reason for this, but I did find it odd. Of the 21 photos I looked at, most of the soldiers appeared to be American, not Iraqi. In Google Image search, I found over 6,500, with plenty of actual Iraqi soldiers shown.

Search term: Grand Canyon night

Public search: 0

Logged-in search: 6

So now I know it’s not for political reasons that logged-in users are seeing something different than public users; it must be for some technical reason. But nothing in the FAQ or other help sections on the website explained this discrepancy. The six photos displayed are beautiful images, and capture the kind of memory I was hoping for. You’ll also find 1750 photos in Flickr for “new york night” when logged-in, but only 1 on the public search.

Search term: Gibson guitar

Public search: 8

Logged-in search: 396

Big discrepancy again. I’m beginning to suspect that Flickr is trying to entice people to become registered users through this mechanism, but oddly doesn’t say so when viewing the 8 results here, or the 1 result for “new york night.” It seems also that the more specific you are about objects, like this example, the higher the quality of the results.

Search term: MRI scan

Public search: 0

Logged-in search: 10

One of the few health-related searches I could actually look at (unless you want to see lots of people in hospitals or people showing off their surgical scars via “surgery”). Some scans shown, as well as photos of the MRI scanning machine.

At the end of this short exercise, I think Flickr remains a great service if you’re looking to upload your photos and not have to pay for storage or such. But the results illustrate, to me at least, that there’s a fair amount of unexpected variability in the quality of the search. Almost like it is idiosyncratic. Almost like it was put together by people who didn’t necessarily look at the world the same way, or weren’t on the same page.

I think that can be useful and beneficial for things like photos, since as previously mentioned, it’s been so darned difficult historically to categorize them effectively and efficiently. Tagging seems to largely work in this context. For the record, though, throwing your images on a Web server and allowing Google to take care of the job seems to be pretty effective too. And Google seems to be a bit smarter when it comes to figuring out what a person really is searching for or means. Flickr could easily improve in this latter respect.

So Flickr is a useful and valuable service to people, at least for those needing to upload and organize their photos. Perhaps a little less so for those searching for a photo. Let’s examine another popular Web 2.0 service, Del.icio.us.

Del.icio.us: Design for Everyone?

The challenge in social networks where tagging is a vital component and at the core of the service’s functionality is to design it from the ground-up for everyone. That translates into designing not only for the good-minded individuals who want to use the service for its intended purpose, but to also walk through typical user scenarios informed by the trials and tribulations that other services have experienced. History teaches us much, if only we would listen.

It is ridiculously simple to game Del.icio.us today. As long as you’re interested in a set of keywords or a concept that isn’t extremely popular (as apparently many health phrases aren’t currently), anyone can put any result near the top of the search results listing within 24 hours (apparently only because it takes 24 hours for Del.icio.us to re-index its database). In 5 minutes I had set up 10 new dummy Del.icio.us accounts and in another 5 minutes, had them all pointing to a dummy page for certain keywords (“depression drugs”). The next day I went to the site, typed in the health phrase I was targeting, and voila! There it was at number three.

Del.icio.us was, apparently, not designed for people looking to get their own pages near the top of a search result. And yet, we’ve known for over a decade now that if you place any search interface on any database, people will try and use that system to get their own results near the top. It’s unclear to me why it was so incredibly easy to do what I just did. The only thing stopping more people from doing it is that Del.icio.us isn’t yet quite on most people’s radars (but it is growing in extraordinary leaps and bounds).
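A toy model shows why a handful of dummy accounts is enough when a phrase isn’t popular. The sketch below assumes results are ranked purely by how many accounts bookmarked a URL with the searched tags; that ranking rule, the `top_results` function, the example.org URLs and every count are invented for illustration and are not taken from Del.icio.us.

```python
from collections import Counter

def top_results(bookmarks, tags, limit=3):
    """Rank URLs by the number of accounts that saved them with all of `tags`.

    `bookmarks` is a list of (account, url, set_of_tags) triples, a
    hypothetical layout used only for this example.
    """
    wanted = set(tags)
    counts = Counter()
    seen = set()
    for account, url, url_tags in bookmarks:
        # Count each account once per URL, and only if every wanted tag is present.
        if wanted <= set(url_tags) and (account, url) not in seen:
            seen.add((account, url))
            counts[url] += 1
    return [url for url, _ in counts.most_common(limit)]

tags = {"depression", "drugs"}
genuine = (
    [(f"user{i}", "https://example.org/study", tags) for i in range(12)]
    + [(f"user{i}", "https://example.org/news", tags) for i in range(11)]
    + [(f"user{i}", "https://example.org/blog", tags | {"teens"}) for i in range(3)]
)
# Ten throwaway accounts all bookmark the same placeholder page.
planted = [(f"dummy{i}", "https://example.org/placeholder", tags) for i in range(10)]

print(top_results(genuine, tags))
print(top_results(genuine + planted, tags))  # the placeholder takes third place
```

With so few genuine bookmarks per page, ten accounts created in a few minutes are enough to displace a real result. A count-based ranking has no way to tell those accounts apart from real users unless it also looks at account age, history or some other signal.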

But let’s assume Del.icio.us is run by some very smart people who will figure out simple controls to stop the abuse that I just illustrated. What about the regular results, ostensibly tagged by ordinary folks looking to help others out? How relevant are they and what kind of quality can we find here?

With Del.icio.us, it’s far easier to run health and mental health searches. I chose a term that my own users often type into my website’s search engine, so I know it’s a commonly searched-on term. For the search I gamed, “depression drugs,” let’s look at the other results that bubbled up through tagging:

Although the title is listed as “Health education: Stress, Depression, Anxiety, Drug Use” it’s actually a website that is an online book entitled, “How to Survive Unbearable Stress” which was published in 2004. It’s certainly an interesting site and appears to have useful information on a cursory review, but its relevance to “depression drugs” escapes me (I couldn’t find any content that discussed drugs commonly prescribed for depression).

The second result is also from 2004, and is a news article about the publication’s belief (supported by some questionable assertions) that at one time, President Bush was taking antidepressants (and that is somehow “news”). Interesting to someone looking for political conspiracy theories, but completely irrelevant to someone searching for depression drug information.

My gamed result pointing to a Geocities placeholder page.

The fourth result, a journal article from PLoS Medicine, is the most interesting of the first 10 presented, by far. Well, to me as a researcher and writer in this field, it’s interesting. To someone looking for depression drug information, it’s probably a little too academic and lengthy for most to read through or care about. It’s also offering a different kind of political view, one debated within the field itself. I’d argue this result is both relevant and of high quality, but not exactly what I was expecting.

The Erowid Splash Page is some type of personal page “documenting the complex relationship between humans and psychoactives.” It’s the sort of thing I would’ve expected to find from a search done in 1996 or so on any of the available search engines at the time, because it’s sort of a fringe website that mixes personal experience with actual information. Probably somewhat relevant, but I’d certainly question the accuracy of some of the information found on the site.

BBC News article from Feb. 2005 that describes a study that found St. John’s wort, a common herbal remedy for depression, as effective as a specific type of antidepressant. The article pays some lip service to the overall mixed picture of research on this herbal remedy.

A Pravda news article from March 2005 that describes “whipping therapy” for the treatment of depression. I won’t even comment on how ridiculous this is as a search result.

This is a blog entry from Oct. 2005 about a study on depression, illicit drug use, and teens. I’m beginning to see how a simple word like “drug” used as a tag can lead to confusing and mixed results.

A press release from Eurekalert from Dec. 2005 about how a new antidepressant drug is shown to increase the brain’s “own cannabis.”

The last result on the first page is a blog entry about the same study, from BoingBoing.

So in this example, we have 7 of the 10 results related to a recent news article or study about some finding related either to depression, drugs, or depression and drugs, and sometimes antidepressants. We have one planted result (mine), and we have one online book that has little to no relevance to my intended search (apparently because of the many meanings of the word “drug” I chose to use). And we have one “personal page” that seems to be an interesting mix of all of the above. Clicking on further results pages doesn’t help me find any information about the drugs available to treat depression.

Okay, so that was a mixed bag. Let’s try a search phrase that is bound to be more specific and helpful, “anxiety symptoms.” Seven of the top 10 results are spam, 1 is for a specific anti-anxiety herbal remedy that is available for the treatment of practically any mental health issue you might have, and two are good quality, relevant results. Additional pages were full of more spam. Definitely not a good return on quality for this set.

“ADHD treatment” gives us two (repeated) blog entries, two (repeated) articles on ADHD treatments, two (repeated) tables of drugs prescribed for a variety of disorders (including ADHD), one news article, one blog entry, one journal article, and one very questionable treatment method. Additional pages seemed also to be of similar high quality.

Once someone is diagnosed with cancer, they often go online to find an online support group to talk about the emotions they’re experiencing. “Cancer support groups” seemed to be a reasonable way to capture this query. Five of the top ten results went to cancer support groups. Three went to large, popular websites, including one to Del.icio.us itself. One was a blog entry, one went to a news article. Only 12 results were listed.

The results from Del.icio.us show a mixed bag. Some searches provided results as relevant and useful as you’d expect from Google or the like. Other searches were full of spam. Still others were half and half. Quality seems variable and, again, idiosyncratic. Are we starting to see a pattern here?

The Masses, Value and Variability

The random examples I’ve covered here — Flickr and Del.icio.us — seem to vary widely in consistency. These sites, instead, are characterized by being idiosyncratic. Remember, these sites were designed first and foremost as services to individuals — to allow a person to upload photos or bookmarks to better track them. I suspect their public nature (allowing you to search for other tagged items similar to your own) was almost an afterthought. Yet it is this afterthought that has nearly as much hype as the actual, originally-envisioned service. It’s great to have a service personalized for one’s own needs. But why make it publicly searchable unless the creators of these services believed such services brought a particular value to the rest of the world?

That value, whatever it is, seems to vary a fair amount within results as well. Perhaps that is a result of these services’ relative sizes, yet both are highly ranked by Alexa and Flickr was even acquired by Yahoo! So it’s not like these are small, inconsequential services. They are growing and have more traffic than 95% of websites online today. So I’m not sure their inconsistent and idiosyncratic results can be explained by lack of traffic or users. Perhaps they can be explained, in part, by the very nature of their design: they are designed to emphasize unique behavior.

In some ways, services that rely on tagging for categorization are amazingly Orwellian, turned upside down. Through sheer numbers alone, any group of people can change the usefulness of an understood term or word. If most people tag a photo of an apple as “banana,” Flickr will recognize it as a banana because most people said so. It’s fun to think of the absurd things one could do with such knowledge, but it turns darker when you contemplate the number of groups who have an agenda they’d very much like the rest of the world to read more of. Groups, for instance, that want to suggest that any mental health treatment that involves medication is wrong and should be avoided. A small, organized group of such individuals could easily cause a great deal of havoc in a system such as Del.icio.us. (And, until recently, such groups did exactly that over on Wikipedia, in different subject areas.)

The Value of Consistency for Health Information

Consistency is a fundamental, taken-for-granted component of the online world. And it’s an important component of any service that is trying to offer something of value to others. Imagine walking into a restaurant and getting great food one time, and lousy food the next. Would you keep visiting that restaurant time and time again if you couldn’t rely on some basic consistency in the food’s quality?

Now take that example and apply it to my world, health and mental health. Imagine if we relied on a world of health or medical information that varied in consistency — some of it was great, some of it was so-so, and some of it was just plain wrong. How would you know what health information was valid, trustworthy and helpful? Bad or poor information often looks fairly valid to others who don’t know any better (as Wikipedia has shown us). How could you differentiate the “plain wrong” information from the “so-so” information unless you knew the field yourself? How can others do the same thing if they don’t have that specialized knowledge?

I’m certain Web 2.0 services will improve, and with those improvements will come improvements in the quality of the information and services they provide. I’m less certain, however, about whether the consistency of the information they offer will automatically improve as well. The idea of consistent, reliable and relevant information that isn’t easily biased or altered (even subtly, as clever people are wont to do) hasn’t yet seemed to permeate the Web 2.0 culture. Much of it seems to have been focused on, “Look at the cool things we can do by pulling technology and people together in this manner!” Not enough thought has been given to how people might use that service to forward their own agendas, as human beings like to do.

About John M. Grohol, Psy.D.

Dr. John Grohol is the founder & CEO of Psych Central. He is also an author, researcher, and expert in mental health online, and has been writing about online behavior, mental health and psychology issues -- as well as the intersection of technology and human behavior -- since 1992. Dr. Grohol sits on the editorial board of the journal Computers in Human Behavior and is a founding board member and treasurer of the Society for Participatory Medicine.