Work the Shell - Spreading Out Numbers

The hardest part of any game is coming up with likely, but incorrect, answers. In this month's column, Dave looks at ways to calculate probable wrong answers for the evolving movie trivia game.

For the past few months, we've been writing a movie trivia game
intended to run as a Twitter client, sporadically posting
questions to its Twitter feed of the form “The film Sunset Blvd.
was released in 1943, 1946, or 1950?”

What initially seemed like the most difficult task, finding the list of
films and then extracting release dates, turned out to be manageable:
we pulled the data from the terrific Internet Movie Database site
(imdb.com) and pushed it through some filters and transformations.

The end result is that with a simple invocation of a script, we can
generate a data file called top-250-films-with-release-dates.db
that looks like this: “Sunset Blvd. | 1950” (and now you know the
answer to the question in paragraph one).
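
As a quick illustration of working with that format, here is a minimal sketch that splits one record into its two fields. It assumes the separator is exactly " | ", as in the sample line above; the parse_record function and the title variable are my own names for illustration, and bash is assumed.

```shell
#!/bin/bash
# Hypothetical helper: split one "Title | Year" record into two variables.
# The " | " separator is assumed from the sample line shown in the text.
parse_record() {
  local line=$1
  title=${line%' | '*}         # everything before the last " | "
  releasedate=${line##*' | '}  # everything after the last " | "
}

parse_record "Sunset Blvd. | 1950"
echo "title=$title releasedate=$releasedate"
```

Splitting on the last separator rather than the first keeps the year intact even if a title ever contains a pipe character.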

Generating Interesting Adjacent Numbers

Last column left off with the puzzle of generating good
“adjacent” release years. That is, if we're talking about a movie
like Prince Caspian, released in 2008, we want the adjacent
values to be quite close—maybe 2005 and 2007. If we're talking
about Rear Window, released back in 1954, we want the
adjacent values to be spread out more, because offering up 1951, 1954 and
1955 is going to be more annoying and nit-picking than 1940, 1950 and
1954 or similar. See what I mean?

What we could do is simply subtract the release year from the current
year (2008 as I write this), then scale the result to tweak the delta.
Prince Caspian would then have an “adjacency” of zero, and Rear Window
would have one of 54. Let's consider dividing the value by five and using the
ceiling value to see what the calculation produces for five movies (Table 1).

Table 1. Calculating Adjacency for the Movie Trivia Game

Title              Release Date   Adjacency   Factor
Der Untergang      2004            4           1
Metropolis         1927           81          17
Sin City           2005            3           1
Chinatown          1974           34           7
Some Like It Hot   1959           49          10

That's not bad. Sin City could have incorrect year values within
one year of the actual release, while Metropolis could be off by as
much as 17 without most people realizing. I mean, if I asked you right
now, “Did Fritz Lang's masterwork Metropolis come out in 1927,
1931 or 1947?”, would you know the answer?

This leads to an important realization: we don't want the wrong answers
always spaced the same distance from the right one, so the Factor above is
the upper bound of a random 1..Factor choice. So, the amusing Some Like It
Hot can have incorrect guesses that are anywhere from one year to ten years off.

Okay, enough discussion. How do we implement this in code?

Well, we have the release date of the movie in releasedate, and we
have the current year in thisyear, so here's a simple test script:
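
A minimal sketch of such a test script follows. It reuses the releasedate and thisyear names from the text; the adjacency_factor function name, the default year of 2008 (chosen so the output matches Table 1) and the add-four-then-divide rounding trick are my own assumptions, and bash is assumed.

```shell
#!/bin/bash
# Sketch: adjacency = years since release; factor = ceiling(adjacency / 5).
# thisyear defaults to 2008 so the results line up with Table 1.
adjacency_factor() {
  local releasedate=$1 thisyear=${2:-2008}
  local adjacency=$(( thisyear - releasedate ))
  # Shell arithmetic truncates, so add 4 before dividing by 5 to round up.
  echo $(( (adjacency + 4) / 5 ))
}

# Release years taken from Table 1.
for releasedate in 2004 1927 2005 1974 1959; do
  echo "$releasedate: factor $(adjacency_factor "$releasedate")"
done
```

Note that a film released in the current year gets a factor of zero here, which any later step has to allow for.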

This demonstrates an important facet of shell scripting: sometimes
thinking through the solution is more time consuming than actually coding
your resultant algorithm. I could share an anecdote about my boss telling
me to “stop thinking and start coding” in one of my earlier jobs,
but I'll skip it. Just keep in mind that thinking through solution
paths is a critical step in any job.

Now that we have a way to calculate our adjacency factor for a given
movie release year, let's take the next step and actually calculate
possible values:
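
One way to do that, sketched under a few assumptions: the offset is a random pick in the range 1..factor, the direction is a coin flip, and the factor passed in is at least one. The close_year function and its argument order are illustrative names of mine, and $RANDOM requires bash.

```shell
#!/bin/bash
# Sketch: produce one wrong year that is 1..factor years from the real one.
# Assumes factor >= 1; a factor of zero would make the modulus fail.
close_year() {
  local releasedate=$1 factor=$2
  # RANDOM % factor yields 0..factor-1, so add 1 to get 1..factor.
  local delta=$(( RANDOM % factor + 1 ))
  if (( RANDOM % 2 )); then
    echo $(( releasedate + delta ))   # guess later than the real year
  else
    echo $(( releasedate - delta ))   # guess earlier than the real year
  fi
}

# Two wrong guesses for Some Like It Hot (1959, factor 10).
echo "1959 $(close_year 1959 10) $(close_year 1959 10)"
```

Because each guess is drawn independently, two calls can return the same year, so the results need de-duping before they go into a question.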

There are two problems I see with this algorithm as is, however. First,
we can end up with release years in the future (for example, Iron Man
could end up with a release year of 2009, which is wrong). Second,
for movies released in the last five years, we also could end up with the
actual release year always sandwiched in the middle once we de-dupe the
results. (I hope you can see why that's the case.)

To fix the first problem, we need to add a test to ensure that the
closeyear is never greater than thisyear, which is
straightforward. For the second problem, I think that having a minimum
delta of two, rather than one, gives us a bit more wiggle space, though
any movie released in the current year is basically a gimme anyway for
people who are paying even minimal attention.
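
Both fixes can be sketched in a few lines. I'm reading "minimum delta of two" as a floor of two on the factor, so recent films get offsets of one or two years; that reading, the safe_close_year function name and the choice to flip an overshooting guess into the past are all my assumptions. The closeyear and thisyear names come from the text, and bash is assumed.

```shell
#!/bin/bash
# Sketch of both fixes: a floor of two on the factor, and no future years.
safe_close_year() {
  local releasedate=$1 thisyear=$2
  local factor=$(( (thisyear - releasedate + 4) / 5 ))
  (( factor < 2 )) && factor=2            # minimum range of two years
  local delta=$(( RANDOM % factor + 1 ))  # 1..factor
  local closeyear=$(( releasedate + delta ))
  # Coin flip for direction, but always go backward if we overshoot thisyear.
  if (( RANDOM % 2 == 0 || closeyear > thisyear )); then
    closeyear=$(( releasedate - delta ))
  fi
  echo "$closeyear"
}

safe_close_year 2008 2008   # a 2008 release can only yield 2006 or 2007
```

A current-year film therefore always gets two earlier wrong answers, which matches the observation that such questions are a gimme anyway.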

Now we have all the building blocks, and next month, we'll put them all
together and create the movie trivia game. With luck, we'll have
space to start pushing it out on Twitter too. In the meantime, if you
want to sign up on Twitter for the game and watch as I develop it,
follow FilmBuzz.

Dave Taylor is a 26-year veteran of UNIX, creator of The Elm Mail System,
and most recently author of both the best-selling Wicked Cool Shell Scripts
and Teach Yourself Unix in 24 Hours, among his 16 technical
books. His main Web site is at www.intuitive.com,
and he also offers up tech support at AskDaveTaylor.com. Follow him on Twitter if you'd like:
twitter.com/DaveTaylor.
