Posted by kdawson on Sunday July 11, 2010 @02:37PM
from the do-what-i-mean dept.

adaviel sends a link to work out of Yahoo Research indicating that demographics can help Web searches; e.g. a woman searching for "wagner" probably wants the 18th-century German composer, while for men in the US "wagner" is a paint sprayer. The Yahoo researchers claim that by taking user demographics into account, "they managed to get the chosen link to appear as the top-ranked result 7 per cent more often than in the standard Yahoo search." New Scientist mentions this research and two other innovative adjuncts to current search practice: following the mouse cursor as a proxy for eye tracking, and taking back bearings on online criminals by studying the searches they make. (The latter raises disturbing privacy questions: would you want Google trolling through your search data? How about governments?)
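Neither the summary nor New Scientist says how Yahoo's model actually works. As a rough illustration of the general idea only, here is a minimal Python sketch that blends a result's ordinary relevance score with how often people in the searcher's demographic group clicked that result in the past. The group labels, page names, click counts, and the `weight` parameter are all hypothetical.

```python
def rerank(results, demographic, clicks, weight=0.5):
    """Re-order search results using a demographic click prior.

    results:     list of (url, base_score) pairs; base_score in [0, 1]
    demographic: hypothetical group label, e.g. "female" or "male-US"
    clicks:      hypothetical log data, a dict mapping (demographic, url)
                 to how often that group clicked that result
    weight:      share of the final score taken by the demographic prior
    """
    # Total clicks this group gave to any of the candidate results.
    total = sum(clicks.get((demographic, url), 0) for url, _ in results)

    def blended(item):
        url, base = item
        # Fraction of this group's clicks that went to this URL (0 if no data).
        prior = clicks.get((demographic, url), 0) / total if total else 0.0
        return (1 - weight) * base + weight * prior

    return sorted(results, key=blended, reverse=True)


results = [("composer-page", 0.6), ("sprayer-page", 0.5)]
clicks = {
    ("female", "composer-page"): 80, ("female", "sprayer-page"): 20,
    ("male-US", "composer-page"): 30, ("male-US", "sprayer-page"): 70,
}
print(rerank(results, "female", clicks)[0][0])   # composer-page
print(rerank(results, "male-US", clicks)[0][0])  # sprayer-page
```

With no click data for a group, the prior is zero and the base ranking is kept. The same mechanism cuts both ways: anyone who doesn't match their group's majority gets the majority's favorite first.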

If you're logged in to your Gmail account, Google gives you results closer to the ones you've previously looked for. For example, searching for an acronym with several meanings seems to return different results depending on who is searching. The same happens based on location. Google yourself from different accounts and you may see variations among the results.

Are you sure? I just searched and the first result is this Slashdot article which clearly says that he was an 18th century composer, right in the summary.

Good heavens, why was this modded Insightful? I think the poster was going for Funny. Anyhow, a quick Wikipedia search reveals that Richard Wagner lived from 1813-1883, making him a 19th century composer.

That would probably be Georg Gottfried Wagner (1698-1756), who also played violin for Bach (1685-1750), another 18th-century composer, and not to be confused with Leonhard Emil Bach (1849-1902), a 19th-century composer.

Either that or KDawson thinks that "18 century" means "1800s."

(I am a musicologist, but I am not your musicologist, and this post is not intended as musicological advice).

I presume that the government and Google are already trolling through my search data, so nothing new for me here. It would be too invasive if they started tracking the mouse cursor on my screen, but since I don't surf much I'm not too worried.

What would be useful is if I could choose to search from a different person's or demographic's point of view, whether on eBay, Amazon, or Google.

For example say I am looking for a gift for someone else. Or I am helping someone else search for stuff. Or I'm the sort of person who has rather different interests but with search keywords that overlap.

Same goes for reviews of restaurants/movies/etc. What I like, someone else may detest.

Lastly, it could also be interesting (and even beneficial) to be able to more easily see things from other people's point of view.

And my first thought was: I'm a geek who happens to be female, please don't lump me with a demo by gender!
But really, I suppose that's true of most geeks. I suspect, for example, that the interest in sports of the average Slashdot reader is somewhat lower than "normal". Etc.

I'd prefer the search engine to not assume based on any information about me. A search engine should report the exact same results regardless of who performs the search. I should be able to tell someone "do a Google search for 'Wagner paint sucks'" and they should get the exact same results as I get, assuming they do the search at about the same time (as the results will change over time as links expire or websites change).

If I want to search for information about shitty Wagner paint sprayers, I'll search

Stereotyping search queries causes problems: One, a lot of people lie about their age and other stats. Two, just because it's true for the group doesn't mean it's true for the individual. For example, gays and lesbians have far different profiles than their heterosexual demographically-matched counterparts. Profession can mean a lot to a search too, or even race. And I'm sure this isn't motivated at all by making more targeted advertisements, too! Last, what if you want to know what other people not from yo

Remember this is for the top-ranked results; if you don't match the demographics, it just means you'll possibly have to click through a few more pages.

Besides, search engines (at least Google) give you a way to disable personalization [google.com]. I wouldn't bet that they actually delete and stop collecting data, but at least it probably doesn't apply it to re-order the results.

Just because it's true for the group doesn't mean it's true for the individual.

Improving search results is about aggregates -- returning the best results for the most queries. Individuals don't matter. Google has used this fact to their advantage to show many links to many people while keeping their interface clean: each user only sees three links at the bottom of the main page, for example, but each of n>>3 links displayed in that spot is viewed many times.

If Yahoo can move relevant links higher in the result list for 15 percent of queries, the only concern is about the quan
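The aggregate argument can be checked with a quick simulation: if every page view fills k slots with links drawn from n candidates, each link is shown in about k/n of all views, so no single visitor sees many links but every link still gets plenty of exposure. A minimal sketch, assuming uniform random selection; the comment does not say how Google actually chooses which links go in those slots.

```python
import random
from collections import Counter


def pick_slots(links, k, rng):
    """Choose k distinct links to show in a single page view."""
    return rng.sample(links, k)


def simulate(links, views, k=3, seed=0):
    """Count how often each link is shown over many page views."""
    rng = random.Random(seed)
    shown = Counter()
    for _ in range(views):
        shown.update(pick_slots(links, k, rng))
    return shown


links = [f"link{i}" for i in range(20)]
shown = simulate(links, views=10_000)
# Each view shows 3 of 20 links, so each link should appear in about
# 3/20 of views, i.e. roughly 1,500 times.
print(min(shown.values()), max(shown.values()))
```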

Sadly, I think you're right about how SEO is practiced, though I would think that REAL SEO (true Scotsman?) would mean ensuring that your site shows up more for those folks for whom it IS relevant, and less for folks for whom it is not.

In other words if I have a niche site selling foo, then my site is very relevant to folks searching for foo. If there is some real correlation between folks who like foo and folks who also like bar, then my site may also be relevant to them. However, if baz is totally u

The problem with it is that it works.
The more people you get to see your listing, the more click on it, and the more end up giving you their money (through whatever method).
Thus, modern SEO works to give their customers the best value for their money, which, currently, is higher matches for their chosen keywords.
Do keep in mind, however, that these keywords typically have to be fairly relevant for the rest of it to work. The best SEO companies don't go about rating 'home improvement' type sites for, s

Valid point. Yet on the other hand, implementing those stereotypes -- oops, demographics -- as rules has increased their accuracy as measured by click-throughs. There's a reason for most stereotypes; and when you can build those stereotypes based on objective and measurable past data, there's more value to them.

Stereotype: All black people like to eat fried chicken.
Demographic: Out of any given group of black people, there is a probability of X% that any given person will like fried chicken. (Or, that X% of the group will like fried chicken.)

So, is it stereotypical to say "Most black people like to eat fried chicken", if > 50% of the group does?

This is the whole problem with the "racism!" and "stereotyping!" cries. Certain groups of people DO have certain trends in their behavior. Obviously, not all of them ac

When Google started to change from just linking the "Did you mean?" results to actually inserting them in place of the results for what I actually searched for, I realized on some level that this might be appropriate for people who don't know what they're doing and aren't paying attention, and that those people might be in the majority... But I didn't bother mentioning that in my angry feedback. =)

Maybe Google doesn't care about customer feedback because they're not in a position where they have to worry ab

What next, a search result that depends on your religion? If you type "Origin of the Universe", you get articles about the Bible if the engine thinks you're Christian, and scientific material otherwise?

They need to understand there is little value in subjective data. Their results are already biased enough, they should take steps to fix that, not make it worse.

Just imagine trying to share tips on finding things with someone. "Well, it helps if you're a male Atheist, otherwise I'm not sure how to find things related to this." Again, smart search engines are worse than dumb ones, because you can't predict how a smart one will respond to your query. Either it gets it right, or it gets it wrong and there's little insight you can have into why. Give me a dumb tool that does what I tell it and whose behavior I can predict and thus adjust to.

We already don't know how Google works. If you want to tell someone about something, you can give them a link, or, if you are doing this on Google and personalization gets in the way, you can log out of your Google account. This technique gives people the link they are looking for more often than they would get it otherwise, and that's exactly what a search engine is for. I'm sure you can opt out, and most people using search engines aren't as knowledgeable about what they are doing as you might be.

I don't disagree with the general principle, but I have to wonder if 7 percent is worth the time, effort, and privacy issues involved. Also, note that the 7% is of a specific 30% subset; the actual value for all queries is 1.5%. I then have to ask how many of those 'upgraded' top-ranked results were already near the top (i.e. in the top 10/first page of results). I feel that the whole idea is getting less fruitful by the second...
- S
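The dilution point is simple arithmetic: if a gain applies only to a subset of queries and the rest are unaffected, the gain across all queries is the subset's share times the gain within it. Taking the comment's 7% and 30% at face value gives about 2.1% rather than 1.5%; an overall 1.5% would correspond to a subset of roughly 21% of queries, so one of the quoted figures may rest on a different definition than the one assumed here. A sketch, assuming no effect at all outside the subset:

```python
def overall_gain(subset_share, subset_gain):
    """Gain across all queries when only a subset benefits.

    Assumes queries outside the subset are neither helped nor hurt.
    """
    return subset_share * subset_gain


def implied_subset_share(overall, subset_gain):
    """Subset size that would produce a given overall gain."""
    return overall / subset_gain


print(round(overall_gain(0.30, 0.07), 4))            # 0.021
print(round(implied_subset_share(0.015, 0.07), 3))   # 0.214
```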

They must have my demographic setting wrong. Half my searches for naked women come back with women's undergarment stores.

Joking aside, when you've got multiple people of different genders (such as in your average multifamily dwelling) using the same computer, such demographic results won't work too well. I wonder if this might explain, in part, why my search results really are less pertinent when I'm not signed into my Gmail account.

If you're searching for something where this would help (like Home Depot products) and you fit your demographic, then great: add a button that keeps you in your area and helps you avoid German composers.

To me, though, this would be very restrictive if I'm truly trying to look up something I (and therefore maybe my demographic) know just a little about. Steering me back to results that I already know about would get very annoying when what I am looking for isn't usually searched by my demogr

When I was living in France for a while (job related), I was quite annoyed by all those websites that assumed that because my computer's IP was in France I wanted to see the site in French, even if the site was a .com and I explicitly tried to click the "English" link. (My French is good enough to buy some baguettes with rillettes, but not for reading technical articles.)

This goes in the same direction: it works in many cases, but when it doesn't, it will piss off the user.
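The underlying design problem is letting an IP-based location guess override what the user has actually asked for. A minimal sketch of a more forgiving order of precedence, assuming a site that offers a fixed set of languages: an explicit choice (such as clicking the "English" link) wins, then the browser's standard `Accept-Language` header with its `q` weights, and only then the geolocation guess. The function names and this precedence are illustrative, not how any particular site behaves, and the header parsing is deliberately simplified (no validation of tags).

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first.

    Tags are lower-cased; entries with q=0 (or an unparsable q) are dropped.
    """
    prefs = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if q > 0:
            prefs.append((q, tag.strip().lower()))
    # Highest q first; sort() is stable, so ties keep header order.
    prefs.sort(key=lambda p: p[0], reverse=True)
    return [tag for _, tag in prefs]


def choose_language(supported, explicit_choice=None, accept_language="",
                    geoip_guess=None, default="en"):
    """Pick a page language, trusting the user before the IP address.

    supported: set of lower-case tags the site offers, e.g. {"en", "fr"}
    """
    # 1. Something the user clicked or saved always wins.
    if explicit_choice in supported:
        return explicit_choice
    # 2. Then the browser's stated preferences; "en-us" falls back to "en".
    for tag in parse_accept_language(accept_language):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    # 3. Only then guess from where the IP address appears to be.
    if geoip_guess in supported:
        return geoip_guess
    return default


site = {"en", "fr", "de"}
print(choose_language(site, accept_language="en-US,en;q=0.9,fr;q=0.5",
                      geoip_guess="fr"))  # en
```

A browser configured for English would then get English pages in France, and the IP guess would only matter for visitors whose browser says nothing useful.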

THIS! I too have a major hatred of forced localization: every time I set up a new browser and load up Google, it goes to google.de (I'm in Germany and I speak the language well enough, but I want the content that I want, you stupid f'ing websites!). Even worse is Comedy Central and their South Park clips: an English-language blog embeds a South Park clip from Comedy Central, I click play, and guess what happens? The clip is dubbed in German! Aaarrrrggghhh!!!

... this idea smacks of a tool that's trying to be *too* helpful, and ends up getting in the way. Kinda like the old Microsoft paperclip. I went and turned off this function in my Google account when I realized that my search results were being shaped based on my history, since that partially defeats my expectations of how a search engine behaves, and degrades the utility, insofar as the utility (to me the user) is based on receiving an unbiased sampling of the matches.
I'm also troubled by this trend in the way that google delivers their news offerings, it seems that the logical progression of this is that we will mostly only be exposed to material that fit our highly individualized pre-existing reality bubbles.

> I'm also troubled by this trend in the way that google delivers their news
> offerings, it seems that the logical progression of this is that we will
> mostly only be exposed to material that fit our highly individualized
> pre-existing reality bubbles.

You don't have to be logged in to a Google account to use Google News or Google Search, you know (in fact, you needn't even accept cookies). As for the "highly individualized pre-existing reality bubbles", that's why people read Huffington Post/Fo

The first thing I thought of when I read Wagner was the popular brand of jeans.

There were (and still are) gender predictors out there that look through your search history and try to predict your gender. They were mildly successful (though dead wrong in my case). I think I prefer Google's more invasive yet more accurate method of paying attention to which results I click on and giving me more of the same without regard to gender or age. I DO like getting local results though.

Search history presents a great potential for loss of IP. I do technology development in an area of considerable interest/value. From looking at my search entries, it would be pretty easy to determine the directions of my development work and anticipate it. It's clear that search history mining is gonna happen. I'm interested in anonymizing my search activities as a result.

A search engine is supposed to find things which fit the regexp that you request.
Often someone will tell me in a forum to "search for x in google", what happens when the results are not exactly the same worldwide because of this technique?
Also, there are loads of people who use proxies and so on to search the web (like people in China). Their demographics would appear skewed, because it would seem that someone in the proxy's country is requesting the search for webpage x.
I don't agree with

The search results are not just a regex matching. A modern search engine, like Google's, returns a ranked list of search results to you, and this ranking already has bias: the Pagerank algorithm sorts the results based on how popular the page is, as measured by the number of incoming links to that page. Of course, that is the general gist of Pagerank as of the Google founders' research paper back in the late 1990s, and undoubtedly Google and other search engines have fine-tuned their algorithms since then to return "better" results to the user. But the point is still that there is already bias in the results.

Make no mistake: Google has surely already thought of search result ranking algorithms similar to the one proposed in this Yahoo Research paper. The difference is that Google does not have a research arm like Yahoo's, so they do not publish ideas like this. In hindsight, the Google founders were foolish to publish their Pagerank algorithm in the first place, but they were still at Stanford then.
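Strictly speaking, PageRank is not a raw count of incoming links: each link passes along a slice of the linking page's own rank, divided among that page's outgoing links, plus a small "random jump" share controlled by a damping factor (0.85 in the original paper). A minimal Python sketch of that textbook formulation, computed by power iteration on a made-up four-page web; it says nothing about what Google runs today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank computed by power iteration.

    links: dict mapping each page to the list of pages it links to;
           every linked-to page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            if outs:
                # A link passes on a slice of the linking page's own rank,
                # divided among all of that page's outgoing links.
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b', 'd']
```

Page `c` comes out on top because three pages link to it, while `a` ranks above `b` even though each has only one incoming link, because `a`'s single link comes from the highly ranked `c`.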

Or maybe they were not so foolish, since advertisers and others can actually trust the ranking. Secrecy isn't always good for business: even if you're absolutely dependent on some business secrets, perhaps the core of what you're doing is best kept public, with only the finer details of how to do it well kept secret.

The search results are not just a regex matching. A modern search engine, like Google's, returns a ranked list of search results to you, and this ranking already has bias: the Pagerank algorithm sorts the results based on how popular the page is, as measured by the number of incoming links to that page.

That's fine, because if I'm doing a two-word search, I probably want results that are more popular, rather than some random obscure websites that happen to contain those words. That's a perfectly valid way o

This would not be an issue if Google simply did not save that information. Sure, I know: they say they want all that information for "targeted advertising". BUT... surveys have shown that people do not want "targeted advertising" in the first place! Despite claims of the "benefits" to consumers, turns out they're not interested if it means losing privacy.

I actually like targeted advertising, as it helps me find out about things I may be interested in. However, I don't see why anyone needs to save any information long-term to do this. For instance, on a Google search, they shouldn't need anything more than my most recent search (or perhaps the searches I've done in the last few minutes, if I'm doing several searches with progressively-refined terms) to find things I'm interested in. I don't want ads based on searches I did two weeks ago.
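The idea of targeting ads on only the last few minutes of searches maps naturally onto a sliding time window. A minimal sketch, assuming an arbitrary ten-minute window; the class name and interface are hypothetical, and nothing here speaks to what a real search engine logs elsewhere.

```python
import time
from collections import deque


class RecentQueries:
    """Hold only the searches made within the last `window_seconds`.

    Older entries are discarded whenever the object is read or written,
    so ad selection built on current() never sees them.
    """

    def __init__(self, window_seconds=600, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._items = deque()  # (timestamp, query) pairs, oldest first

    def add(self, query):
        self._items.append((self.clock(), query))
        self._evict()

    def current(self):
        self._evict()
        return [query for _, query in self._items]

    def _evict(self):
        cutoff = self.clock() - self.window
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()


# A fake clock makes the behavior easy to see.
now = [0.0]
session = RecentQueries(window_seconds=600, clock=lambda: now[0])
session.add("wagner")
now[0] = 300
session.add("wagner paint sprayer")
now[0] = 700  # more than ten minutes after the first search
print(session.current())  # ['wagner paint sprayer']
```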

Trust me, you DON'T want the paint brand. Wagner sprayers are total pieces of shit that only splatter paint all over, making a big mess. Read the reviews; they simply don't work as advertised. I think their whole business model relies on people listening to their ads, buying their crappy sprayers, trying them out, finding they don't work, and then throwing them in the trash because it's easier than taking an hour to clean the sprayer and return it to the store.

Sorry, just trying to do what I can to steer people away from those horrible Wagners. I'm a very handy guy--between fixing houses, rebuilding car engines, and woodworking, I've done it all--and the Wagner sprayer is the biggest POS tool I've ever tried.

How are the search engines capable of doing this on their own? It needs to be remembered that almost 80% of internet users (in India at least) use dynamic IPs. Most ISPs here charge extra for a static IP and most users just don't bother; what use would the average layman have for a static IP? I'm assuming that's how it is in most other places too. Correlating searches and search patterns with demographic details needs active cooperation from all ISPs, doesn't it?

I absolutely don't mind a search engine giving me an option to interpret my search, but it would be terrible if I can't switch that option off.

How many times do we search for one keyword (or even a string), spelled exactly so? Just like in a library catalogue. The last thing we want is some algorithm applying an undocumented filter to our search results.

It's bad enough that Google insists on fuzzifying that string (even when you put it between quotes), but when it starts interpreting my search intent bas
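For contrast with the fuzzy matching the comment complains about, here is what strictly literal matching of a quoted phrase looks like: the phrase must appear verbatim, with matching case and word boundaries on both sides. A minimal sketch over an in-memory list of strings using Python's `re` module; real engines work from an index, and treating case as significant is a deliberate choice here to mirror "spelled exactly so".

```python
import re


def exact_phrase_match(phrase, documents):
    """Return the documents that contain `phrase` exactly as typed.

    No stemming, synonyms, or spelling correction; matching is
    case-sensitive and requires word boundaries on both ends (which
    assumes the phrase starts and ends with a word character such as
    a letter or digit).
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
    return [doc for doc in documents if pattern.search(doc)]


docs = [
    "Wagner paint sprayer review",
    "Wagner paintings at auction",
    "wagner paint sucks",
    "Wagner's paint sprayer",
]
print(exact_phrase_match("Wagner paint", docs))  # ['Wagner paint sprayer review']
```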

e.g. a woman searching for "wagner" probably wants the 18th-century German composer

As would anyone taking or interested in Philosophy [wikipedia.org]. Of course this is an example of a terrible search query, like searching for "strange" without specifying "quark", but I'll take the generalized results all the same, thanks. Please don't tell me what I want.