Wednesday, June 23, 2010

Ask a Librarian: Why Boolean?

If you've used a library catalog or database, you've most likely been exposed to Boolean operators, even if you've never heard of them before in your life. What's a Boolean operator? Words like AND, OR or NOT that you put between search terms to broaden or narrow your search. But if you're a non-librarian, you've probably wondered why in the world libraries insist on using this seemingly archaic way to search things.

Wonder no more.

Remember last week, when I talked about how searches need to be exhaustive and exclusive? (Meaning they need to return all the results you want and only the results you want.) In order to create a search like this, you need more advanced tools than your typical search engine can give you. When you type something into Google, the search engine does a full-text search through all of its indexed pages for the words you typed. By default, Google uses AND to connect the words you're searching for.

Library catalogs and databases, on the other hand, are different machines. They're designed to allow the user to create very precise searches. It's possible to use Boolean logic with Google, if you go to their advanced search page, but Google doesn't really do as well with Boolean--it's not designed to, just as databases don't do as well with phrase searching--they're not built to handle it. (This makes sense--Google searches a wide variety of material, most of it not created specifically for Google, so it has to be as inclusive and forgiving as possible. Databases, on the other hand, search very specific records, each of them created just for that database, so the search device is designed to be more "high performance," if you will.)

So what do the different operators do?

AND--Searches for every record that has both terms in the record. In the diagram above, that would mean it returns everything where A and B intersect. Useful for narrowing your search--making it more focused. You don't just want things about cats or things about dogs--you want things that talk about cats and dogs at the same time.

OR--Searches for every record that has either of the terms in it, meaning it would return everything in the diagram above. This is a great way to broaden your search. You want things about cats or dogs--it doesn't matter which. Typically, you'd put OR between synonyms (searching for fat OR obese OR overweight OR chunky OR chubby OR . . . you get the picture).

NOT--Excludes every record that contains the search term that follows it. In the diagram above, searching for A NOT B would return everything shaded pink. This is useful when you're trying to eliminate something that typically shows up in a search for another term. So if you're searching for delta because you want information on river deltas like the Nile delta, you might throw in a "NOT airplane" to try to get rid of results that have to do with the airline company.

XOR--Ah, the famous XOR. It's not used much, but it searches for records that have one term or the other, but not both. In the diagram, this would be everything that's pink or blue, but not purple (the middle section). Think of it as the Boolean equivalent of either/or. I'll be honest with you--I've never used this in a search. Does that mean I fail as a librarian? Shh . . . don't tell anyone!
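
If you happen to think in code, the four operators line up neatly with set operations. Here's a minimal Python sketch of that idea. The four-record "catalog" and the matching() helper are made up for illustration, and a real catalog indexes its records very differently.

```python
# Toy "catalog": record id -> words appearing in that record (made-up data).
records = {
    1: {"cats", "dogs"},
    2: {"cats"},
    3: {"dogs"},
    4: {"birds"},
}


def matching(term):
    """Return the ids of every record that contains the term (hypothetical helper)."""
    return {rid for rid, words in records.items() if term in words}


a = matching("cats")  # circle A in the diagram
b = matching("dogs")  # circle B in the diagram

print("cats AND dogs:", a & b)  # only the overlap
print("cats OR dogs: ", a | b)  # both circles, overlap included
print("cats NOT dogs:", a - b)  # A with the overlap removed
print("cats XOR dogs:", a ^ b)  # both circles, overlap removed
```

Record 4 (birds) never shows up, because it sits outside both circles.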

How do you string Boolean terms together?

You do it by using the wonders of the parentheses. Brush off your old algebra memories and get busy. (A OR B) AND C would return everything that has A and C in it, and everything with B and C. A OR (B AND C) would return everything that has A, plus everything that has both B and C. Clear as mud? Try this one:

(A OR B) AND ((C OR D) NOT E), or, written with actual search terms, (cats OR felines) AND ((cartoon OR animated) NOT Garfield)
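
To see how the parentheses change what comes back, here's a rough Python sketch that treats each search term as a set of record ids. The five records and the has() helper are invented for illustration; this isn't how any particular database works under the hood.

```python
# Toy records: id -> subject words (made-up data for illustration).
records = {
    1: {"cats", "cartoon", "Garfield"},
    2: {"felines", "animated"},
    3: {"cats", "cartoon"},
    4: {"cats", "photography"},
    5: {"dogs", "cartoon"},
}


def has(term):
    """Return the ids of records containing the term (hypothetical helper)."""
    return {rid for rid, words in records.items() if term in words}


# Same three terms, different grouping, different results.
grouped_left = (has("cats") | has("felines")) & has("cartoon")   # (A OR B) AND C
grouped_right = has("cats") | (has("felines") & has("cartoon"))  # A OR (B AND C)

# (cats OR felines) AND ((cartoon OR animated) NOT Garfield)
result = (has("cats") | has("felines")) & (
    (has("cartoon") | has("animated")) - has("Garfield")
)

print(grouped_left)
print(grouped_right)
print(result)
```

With this toy data, the first grouping returns records 1 and 3, the second returns 1, 3 and 4, and the long search returns records 2 and 3: the Garfield record is dropped, and so are the cat photography record and the dog cartoon.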

Yes, it's possible to quickly get lost with this sort of approach, and yes, databases are working on becoming more user friendly. Ideally, a search tool would be as forgiving as Google but as precise as a database--and I see databases heading in that direction. Give them a few more years. In the meantime, know your Boolean operators, and don't be afraid to use them.