Where librarians and the internet meet: internet searching, Social Media tools, search engines and their development. These are my personal views.

November 25, 2009

If you've not read my first blog post on the Michelle Obama image fiasco you might want to start there. I should also point out that I'm going to be referring to some words that people might find offensive in this post - I've used asterisks and don't really think anyone will be upset, but it's your personal call.

Google's concept of images and SafeSearch is fundamentally flawed. Let's take a quick look at how this is supposed to work: "Use Google's SafeSearch filter if you don't want to see sites that contain pornography, explicit sexual content, profanity, and other types of hate content in your Google search results." Note please that we're talking about pornography AND hate content. One could assume from this that if you turn ON the SafeSearch filter you won't see hate content, yet clearly - as demonstrated in my previous post - that's exactly what you get. Yet ironically, if you turn OFF the SafeSearch filter you don't get to see the image. So things work exactly the opposite way around to that which any sane person would expect.

Let's look at how this works with other terms - I ran a couple of searches for sexual terms and another two for racial slurs, because I wanted to see how Google Images' SafeSearch filter worked, and the results were very interesting. A search for f*ck gave 20 images with SafeSearch turned off, and 0 with it turned on. The same result was found for c*nt as well. This is exactly what I'd expect to happen. However, when I ran searches for n*gger and w*g, the results were very different: 14 of 20 and 12 of 20 images respectively appeared whether SafeSearch was on or off. Clearly these racist terms are not regarded in the same way as the sexual terms; they appear to be less 'important' if you will. Rather than blanket coverage of 0 results, Google is quite happy to display images for racial slurs, but not for sexual content. Remember the 'hate content' element of SafeSearch? Please forgive my hollow laugh.

Google's search algorithm is also broken. If I want to do a search for Michelle Obama, I want to see images of her, yet the first or second image coming up is not of her at all; it's an image that may once have been OF her, but isn't any longer. If I do an image search for 'bus' I'm not going to be impressed if I then get given a result for a rocket. I would expect Google to tweak the algorithm to fix that; I wouldn't expect to put up with it. Google should actually come clean and say that the algorithm hasn't worked, and pull the image for no other reason than search accuracy.

However, Google isn't doing that. In their own advert that they've taken out for this search they say "Google views the integrity of our search results as an extremely important priority." Clearly however they DON'T do this, because the integrity of their results here has been blown out of the water, with this wholly inappropriate and inaccurate result being returned.

They then go on to say "Accordingly, we do not remove a page from our search results simply because its content is unpopular or because we receive complaints concerning it." In other words, they are hiding behind their algorithms and saying that it's nothing to do with them. Well I'm very much afraid to say that it is, because they wrote those algorithms in the first place! They chose how they were going to work, and they sat and tweaked them artificially until they got the overall result that they wanted. So the results are artificially affected by Google employees anyway! They can also make the point, which they do, that lots of people contact them asking for images to be removed, and they only do this if a law has been broken. It would be easy to deal with this - once 'x' number of people complain about an image, simply move it automatically into another category. The 'x' could be quite high to ensure that a small pressure group couldn't easily manage it, and they could always look manually at the image if necessary. You simply cannot tell me that Google hasn't got the resources available to do this.
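To make the 'x complaints' idea concrete, here is a minimal sketch of how such a rule might look. Everything in it is hypothetical: the class name, the threshold value and the idea of counting each complainant once are my own illustrative assumptions, and it says nothing about how Google actually handles complaints.

```python
from collections import defaultdict

# Hypothetical value for the 'x' in the text. It is set high so that a
# small pressure group could not easily reach it.
COMPLAINT_THRESHOLD = 1000


class ImageModerationQueue:
    """Illustrative only: moves an image into a restricted category
    once enough distinct people have complained about it."""

    def __init__(self, threshold=COMPLAINT_THRESHOLD):
        self.threshold = threshold
        # image id -> set of user ids, so repeat complaints from the
        # same person are only counted once (an assumption).
        self.complainants = defaultdict(set)
        # Images that would only be shown with SafeSearch turned off.
        self.restricted = set()

    def complain(self, image_id, user_id):
        """Record a complaint; return True if the image is now restricted."""
        self.complainants[image_id].add(user_id)
        if len(self.complainants[image_id]) >= self.threshold:
            self.restricted.add(image_id)
        return image_id in self.restricted

    def visible(self, image_id, safesearch_on):
        """A restricted image is hidden only when SafeSearch is on."""
        return not (safesearch_on and image_id in self.restricted)
```

A manual review step, as suggested above, could sit alongside this by having a person look at anything that lands in `restricted` and move it back if the complaints turn out to be unfounded.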

We're left with one stark and rather nasty conclusion. Google is happy to block access to unpleasant sexual imagery, but they're not prepared to do so for unpleasant racial imagery. That should make everyone a little uncomfortable.

November 24, 2009

OK, there's a big fuss going on at the moment with Google Images and a nasty little badly photoshopped racist image of Michelle Obama. If you run a search in Google Images for her name (without quotes) you get this image popping up in first place. Google apparently removed it, then decided to put it back and take out an advert talking about it instead. Naturally enough, I was interested in seeing what all of the fuss was about. I did my search, and this is what I got:

No, I can't see the image either. 15 pages scrolled through later and I still can't see anything. Maybe it's because I'm in the UK? I tried both .com and .co.uk with the same result. I tried logging out, in case that was causing the problem. Same result. I tried IE instead of Firefox. Same result.

I'm starting to wonder at this point - I asked a friend to run the search, and she could see it. I then ran the search again in Chrome, and at last I saw the image. My first thought was 'WTF?' and then I looked a little more closely - what was different between Chrome and IE and FF? Then I realised... in both of those browsers I had SafeSearch:Off. In Chrome it was on. So back I go to Google and try again, this time with SafeSearch on moderate. This time I get:

And once again, with SafeSearch:Strict I get this:

(I blurred the images myself by the way - if you're desperate enough to see it for yourself, you know what to do.)

Now there are a few things that interest me here. Firstly, I get to see the racist image if I HAVE moderation on, and I don't get to see it if I DON'T have moderation on. Isn't that... y'know, the wrong way around? Secondly, it's the same image in both cases, but from two entirely different sites - Prisonplant in one case and Ohot-girls in the other.

Google's coming under a lot of flak on this one - should they allow the image or not? Whatever they do, someone is going to be very upset, but for me, that's not really the question. My question is more along the lines of why are racist images acceptable in some moderated versions and not others - specifically, not in the 'adult/unmoderated' version, but in the strict moderated version? And wouldn't it make a bunch more sense to limit the image to the unmoderated version?

My view on the whole censoring thing - for what it's worth. Google has algorithms that they have put in place, so they're already arranging results in one way or another. They decide on what is offensive and what isn't - naked bodies and graphic sexual images are potentially offensive, which is why they're only available if you turn SafeSearch off. However, once you start searching for other types of images, Google's approach seems slightly different - a search for 'the N word' brings up substantially the same images however you have SafeSearch set. So images of naked people can be offensive, but racist images are not offensive, according to their algorithms, is that it? This is a very slippery slope. Google is very keen to make sure that they don't do things by hand, and they let the algorithms take control. But, and it's a big but, who WRITES the algorithms in the first place? Yup, Google employees. It is disingenuous in the extreme to blame your code, and use that as a get-out clause, when you wrote the code in the first instance. Google already fiddles around with their code - that's how they have got rid of certain well known Googlebombs - they've played around with code until it gets the result that they want. It's a very thin line from there to dealing with specific images.

Of course, they could use the defense that if an image isn't against the law, who are they to do anything with it? I'd certainly agree on that point - if an image is legal, then should they? Well, they're doing that anyway with hard core porn images, and using the SafeSearch option to protect minors. Of course, we then get into the 'which laws are we referring to' question, which is a whole other can of worms.

In the final analysis, in my opinion, Google has got it badly wrong this time. That image should have been caught and slipped into the 'only view without SafeSearch' option. To do otherwise is just plain wrong - not just because of what the image is, but because they're not consistent. And if they're not consistent, how can they be trusted with the results they give us? I'm probably not the best person to ask, because my opinion has always been 'Don't trust Google', but there you go.

I've been told about TipTop, which is an interesting Twitter search engine. Pop in your search and you get some interesting material back. Rather than a straight feed of data, TipTop arranges it into several panes of information - 'pro' the subject, 'anti' the subject, everything else, related content, and web based information. There are also several channels that focus on information related to subjects such as Food, News, People, Music and so on.

The results appear below - they're not in chronological order, which I personally find irritating, particularly as I don't understand the ranking criteria. I'm also not sure why the 'anti' tweets are called piT tweets, but I guess I don't need to know in order to use them.

I'd also like to have a much clearer indication as to the author of each tweet - I have to do a mouseover in order to get that data, which again is irritating - the To/From just gets in the way and seems to serve no purpose at all.

The related content is also confusing - does it actually mean content related to the search that I've just done, or related information about the search I've just done? It seems to be the latter, and I get an interesting wheel of data which lets me see pro/anti information quickly.

The pro/anti stuff is a great idea, except it doesn't work very well, and it's easy to throw the software off the scent, so a tweet like 'Shame I can't go to the CILIP event :( :( :(' may well be taken as negative though in actual fact it's quite the opposite. However, early days, so I'm sure they'll improve.

November 22, 2009

You've probably seen mention of Google Chrome as a browser - it's now becoming an OS or operating system. The video below explains it in more detail, but if you want to get a quick handle on it - Google ChromeOS turns your computer into a dumb terminal. That's not necessarily a bad thing, but that's essentially what's happening.

November 21, 2009

Let's face it - the 'Recent' option on Google results is great, but if you're looking for really recent material, you don't have much control over it. At least, until now. A nice post over at SearchEngineLand explains how you can limit your results to data that Google has added as recently as the last second or so. This is how you do it:

Run your search, and then add &tbs=qdr:X##&tbo=1 right at the end of the URL, with no spaces - let it run straight on. BUT before you try the search, you need to change the X## element: X is a letter for the unit of time and ## is the number of units. So if you want a search for material added in the last 5 seconds, change that to s5; use n5 for the last 5 minutes or h5 for the last 5 hours.
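If you'd rather not edit the URL by hand each time, the same trick can be sketched in a few lines of Python. This is only an illustration of the string described above: the function name and the example URL are my own assumptions, and it simply appends the parameters without checking whether Google honours them.

```python
# Unit letters as described in the post: s = seconds, n = minutes, h = hours.
UNIT_CODES = {"seconds": "s", "minutes": "n", "hours": "h"}


def add_recency_filter(search_url, amount, unit):
    """Append the &tbs=qdr:X##&tbo=1 string to an existing results URL.

    search_url is the address of a search you have already run
    (hypothetical helper; not part of any Google library).
    """
    if unit not in UNIT_CODES:
        raise ValueError(f"unit must be one of {sorted(UNIT_CODES)}")
    if not isinstance(amount, int) or amount < 1:
        raise ValueError("amount must be a positive whole number")
    # No spaces: the parameters run straight on from the end of the URL.
    return f"{search_url}&tbs=qdr:{UNIT_CODES[unit]}{amount}&tbo=1"


# Example with an assumed search URL, limited to the last 5 seconds.
print(add_recency_filter("http://www.google.com/search?q=librarians", 5, "seconds"))
```

The helper assumes the URL already contains a query string (a '?' followed by your search terms), which is the case for any results page you copy from the address bar.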

November 18, 2009

Why It's a Bad Idea to Send Huge Files by Email.
Gmail increased the maximum attachment size to 25 MB in June, but some people want to send larger files. Daniel (at Google) wrote a thoughtful comment that explains why it's a bad idea to send huge files by email:

People who demand large message size limits rarely understand the limitations of the email transmission.
Because of the MIME encoding used when sending binary attachments, your files expand 33% when sent via email. In other words, a 15MB attachment requires 20MB plus the message text, plus message headers.
When you carbon copy 20 of your friends & coworkers, a separate message is sent to each. 20MB x 20 = 400MB. That's half a freaking CD.
If 5 of those friends are on the same small company email server, downloading those messages saturates the entire bandwidth of their T1 data line for nearly 9 minutes. Because each message has separate headers, it isn't easily cached and gets completely downloaded by each recipient.
Compare this to uploading the same attachment to a web server, FTP server, file transmission service like YouSendIt, or video streaming site like YouTube. One copy is uploaded. The download is typically 8-bit so minimal expansion factor. The small business' network can cache the content, so it's only downloaded once then fetched locally from the web caching server.
Bottom line, sending a large attachment via email is like relocating using the U.S. Postal Service as your moving company. It is painful, limited, and expensive.
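The figures in that comment are easy to check. The sketch below reworks the arithmetic under a few assumptions of my own that the comment doesn't state: base64 encoding turns every 3 bytes into 4 (the 33% expansion), a T1 line carries 1.544 megabits per second, a megabyte is a million bytes, and message headers and protocol overhead are ignored.

```python
ATTACHMENT_MB = 15        # size of the original file, from the comment
RECIPIENTS = 20           # people copied on the message, from the comment
SAME_SERVER = 5           # recipients sharing one small office line
T1_MBIT_PER_S = 1.544     # assumed T1 line rate in megabits per second


def encoded_size_mb(raw_mb):
    """Size after base64 encoding: every 3 bytes become 4."""
    return raw_mb * 4 / 3


def t1_download_minutes(total_mb, line_mbit_per_s=T1_MBIT_PER_S):
    """Minutes to pull total_mb down a line, ignoring all overhead."""
    megabits = total_mb * 8
    return megabits / line_mbit_per_s / 60


encoded = encoded_size_mb(ATTACHMENT_MB)           # size of one message
all_copies = encoded * RECIPIENTS                  # total sent to everyone
minutes = t1_download_minutes(encoded * SAME_SERVER)

print(f"One message: {encoded:.0f} MB")
print(f"All {RECIPIENTS} copies: {all_copies:.0f} MB")
print(f"{SAME_SERVER} downloads over a T1: {minutes:.1f} minutes")
```

On those assumptions the numbers line up with the comment: 20 MB per message, 400 MB in total, and a little over eight and a half minutes for five people on the same T1 line to download their copies.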

Google is continuing to explore the possibilities of image search. They've recently done work on images and Creative Commons, Google Similar Images, and Google image size search. They're now playing with something they call the Image Swirl, and you can play along by going to the Image Swirl lab page. Now, don't get too excited, because you can't just play around with any old term - but there are about 200,000 terms that you can use. I tried my usual 'Hubble Telescope' search, and this is what I was presented with:

Nice, but not exciting at this stage. However, you can see that behind each image is a stack of others. What Google is trying to do is to sort types of images, so we have collections of images of the telescope, but also various different types of images that it's taken. Click on a stack, and Google Swirl brings it centre stage thus:

This should actually look a little bit familiar as it's akin to the Google Wonder Wheel that they've had available for some time now. I can continue to click on images and Google will continue to present them to me, while keeping my history available, and this is where we end up with the 'swirl' concept:

I can of course simply click on an image at any point to go directly to it - a mouseover gives me basic information about the image - size and web location.

Swirl is a great tool if you're interested in browsing through a lot of images quickly, and it's also very nice if you want to focus on a particular type of image - searching for buses led me immediately to the option of looking at different coloured buses as well as line drawings. Bing is going to have their work cut out to try and catch up.

November 01, 2009

The first thing to say is that I said 'lists of Twitter users' not 'Twitter lists'. Now that Twitter has rolled out the lists option to everyone there's a lot of discussion about the basic concept of putting people into lists, yet this isn't new - it's been around for a while.

I suppose the first version is the 'Follow Friday' concept where people can suggest people to follow. All nice and cosy, and a lovely idea, but it just doesn't work properly - it's either just as many people as you can put into a tweet, or a smaller number with a brief description of why they should be followed. It's also a very casual system without any archival aspect, so if you're thinking that I'm pushing the Follow Friday too far into the lists concept I'd probably agree with you.

The first Twitter list concept that I came up with was when I started to use Tweetdeck; creating groups of people that I could follow in different areas - UK librarians, US librarians and so on. This allows me to just see the tweets from those people, filter their tweets, search on them and so on. Of course, these groupings are entirely private and not available for anyone else to see or use.

Public listings have been available ever since people started to create directories of Twitter users, such as Wefollow, and it's easy to see a collection of librarians for example, either by number of followers or by influence. They're easy to follow, and people can choose whether or not to be on the lists.

Then we've got the Tweepml concept which works very well with Tweepsearch to quickly create lists of virtually any type. The members of the list(s) are defined and controlled by the creator, although people can suggest that they should be on specific lists. These lists are designed to be public and shared. The limit of 100 people on a single list was lifted a while back and increased to 250. I've certainly created a couple of these lists, but unless they're kept at the front of people's minds it's all too easy to forget them. Obviously it's not a list of tweets, just of people.

Finally we come to Twitter lists themselves. Anyone can create a list, call it what they want, add who they feel like and let anyone follow the list (or keep it private of course). Once you follow a list you can simply click on the link in the right hand Twitter menu bar and read the tweets of the people in the list. This is something that is causing lots of people lots of problems. I don't think Twitter has been very intelligent about the way in which the lists have been rolled out. For example, I can create a list called 'Stupid Librarians' and put anyone I like onto the list, irrespective of their wishes in the matter. Twitter doesn't inform people that they've been added to lists; the only way that you can find this out is to click on the link on your own Twitter page. I was somewhat surprised to find that I was on over 30 lists that people had created (and I have no idea how many private ones). I'd need to go through the entire collection in order to make sure that I'm not on a 'stupid librarian' list, which isn't an ideal way to work. Of course, if I was on a list and didn't want to be, the only way (other than asking to be removed) that I could get off a list is to block the creator. This really isn't an acceptable option and requires me to do a fair bit of work. Better would be for Twitter to give me the option of being blocked from public lists, or give me the option of choosing if I want to be on or off a specific list.

Some people think that the list concept makes Twitter more dangerous to use. I think this is something of an exaggeration; I don't agree with one of the basic concepts that it's going to make life easier for spammers. They can send @reply spam at the moment, and I see little to suggest that spammers are targeting specific users or types of users - that's not how spam generally works.

What I do find interesting, and haven't seen anyone talking about yet is the way in which lists are totally changing the way in which Twitter works. The whole following/follower concept might now begin to break down. I can follow a list, and read what people are saying without the necessity of following them. In turn people can follow my tweets easily, without having to become a follower. This doesn't particularly bother me, but then I've never bought into the 'more followers the better' concept. This might make it more difficult for people to get noticed on the system though, and I suspect the importance of particular people will skyrocket. New users will be able to identify key players and subscribe to their lists and lurk, without contributing themselves. To an extent they can do that already - I'm followed by lots of people (and follow back), but they seldom if ever contribute to the twitterstream, and I think the Twitter lists concept is going to make this even easier.

Another key element with the list idea is that, as with Tweetdeck, I can now create my own list of 'favourites', keep it private and just refer to that, rather than my entire twitterstream. So I can just merrily add anyone and everyone now on a one to one basis, but still only follow a much smaller number. Again, this isn't a new concept, but Twitter is now making it much easier, and I think it's devalued the entire concept of followers/following, though it could be argued that it was a flawed concept right from the first person who thought they were better than someone else because they had more followers.

I'm not sure that it's going to make me use the native interface any more than I am currently doing - I prefer Tweetdeck for the iPhone and Brizzly for the web interface, and if I'm keen on lists or groups they're easy to create in those resources. Twitter is going to have to work much harder to get me to go back (and see their adverts as a result) than this.

I don't think lists are going to make Twitter more 'dangerous' to use; I don't think they're going to increase spam in any great amount, but I do think that the feature is putting a very large nail into the basic premise of the service. Whether this is detrimental to Twitter in the long run I'm not sure; we'll need to wait a while and see how people end up using the list concept.