
Work the Shell - Generating Turn-by-Turn Driving Directions

I'm happy to report that this month, I'm answering a
reader's question about how to script something. Dunno what's up
with the rest of you readers, but apparently writing to me with your weird
and challenging shell-scripting puzzles isn't making the short list
right now. Reader Paul M. asks:

Is there a way to screen-scrape Google Maps direction results? I'm after
the text (turn left at Ho-ho-kus Blvd), not the maps. When I look at a
saved results page, all I can see is CSS and JavaScript code. If I do a
manual copy and paste of the directions, however, the turn-by-turn
directions appear. Got any suggestions on how to grab
turn-by-turn driving directions automatically, Dave?

Ah, those tricky programmers over at Google Maps make this
pretty darn difficult! Poke around in the source of the pages that
maps.google.com generates for directions, and it's clear that
they're using method=post or some other advanced technique to keep the starting
and ending points out of the URL itself, along with some very fancy coding to
make the Web pages highly interactive.
So to heck with it!

After much digging around and looking at how the different mapping sites
work, I settled on Expedia.com as the best place to get driving directions,
because it lets us specify the start and stop points in the URL and
produces output we can actually parse. To get started, check out Expedia's interactive driving
directions in your Web browser at www.expedia.com/Directions.

You can see that Expedia wants each address split into separate fields:
street address, city, state and zip code (though if it can figure out the
location, it appears you can skip the zip code).
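Each of those fields goes into the URL as its own value, so any spaces, ampersands or slashes in it have to be URL-encoded first. The following is a minimal bash sketch of that step. The parameter names it generates (start_street, start_city and so on) are hypothetical placeholders, not Expedia's real field names. Substitute whatever names show up in the URL your browser produces.

```bash
#!/usr/bin/env bash
# urlencode STRING: percent-encode everything except the RFC 3986
# unreserved characters (A-Z a-z 0-9 - . _ ~), byte by byte.
urlencode() {
  local LC_ALL=C                # make ${s:i:1} and printf work on bytes
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9.~_-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"   # "'x" yields the byte value of x
         out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

# build_query PREFIX STREET CITY STATE [ZIP]
# HYPOTHETICAL parameter names (PREFIX_street etc.): replace them with
# the field names the real directions URL uses.
build_query() {
  local p=$1
  printf '%s_street=%s&%s_city=%s&%s_state=%s&%s_zip=%s\n' \
    "$p" "$(urlencode "$2")" "$p" "$(urlencode "$3")" \
    "$p" "$(urlencode "$4")" "$p" "$(urlencode "${5-}")"
}

build_query start '1 Main St' 'Boulder' 'CO' ''
# prints: start_street=1%20Main%20St&start_city=Boulder&start_state=CO&start_zip=
```

You'd build one of these for the start address and one for the end address, join them with & and append the result to the directions URL.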

Once you've fetched the results page, let's use sed to extract just the table
of directions, without the other superfluous information. Manual analysis of
the page source shows that the directions are all in a table that
starts with this HTML line:

<TABLE BORDER=1 BORDERCOLOR=#E4E4E4 CELLSPACING=0 CELLPADDING=4>

Not surprisingly, the line we seek that denotes the end of the table is
</TABLE>. Here's the code that lets you slice things as desired:

sed -n '/BORDERCOLOR=#E4E4E4/,/<\/TABLE>/p'
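To see exactly what that range address keeps, here's a small self-contained demo. It runs the same sed command on a made-up page. The sample HTML is invented for illustration and is not real Expedia output. Note that sed prints both boundary lines, so the opening <TABLE ...> and closing </TABLE> tags survive. The HTML-to-text step needs those tags.

```bash
#!/usr/bin/env bash
# extract_directions_table: copy stdin to stdout, keeping only the lines
# from a line containing BORDERCOLOR=#E4E4E4 through the next line
# containing </TABLE>, both boundary lines included.
extract_directions_table() {
  sed -n '/BORDERCOLOR=#E4E4E4/,/<\/TABLE>/p'
}

# A tiny invented page standing in for the real results page.
sample_page() {
  cat <<'EOF'
<HTML><BODY>
<P>Ads and navigation we don't want</P>
<TABLE BORDER=1 BORDERCOLOR=#E4E4E4 CELLSPACING=0 CELLPADDING=4>
<TR><TD>1.</TD><TD>Turn left at Ho-ho-kus Blvd</TD></TR>
</TABLE>
<P>Footer we don't want either</P>
</BODY></HTML>
EOF
}

sample_page | extract_directions_table
```

Running it prints just three lines: the <TABLE ...> line, the single row and the </TABLE> line. One caveat is that sed's regex matching is case-sensitive. If the page ever used a lowercase </table>, the range would never close, and sed would print everything to the end of the page.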

Pipe the output of curl (which fetches the results page) through that sed
command and save the result to a temp file. After that, the next challenge is
to turn that HTML table into something you can actually read.

To do that, we're going to turn to a great open-source utility called
Lynx. You might already have Lynx on your system, but if you
don't, grab a copy of the Lynx text-based Web browser from
lynx.isc.org. We'll use that to interpret and convert the HTML
markup to raw text.

Fortunately, Lynx excels at this kind of challenge: its -dump option writes
the rendered page to standard output as plain text instead of opening the
interactive browser.
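Here's a minimal bash sketch of that rendering step. It assumes the extracted table was saved to a temp file and uses three Lynx options: -dump, -nolist (drop the numbered link list Lynx normally appends) and -force_html (treat the file as HTML regardless of its name). If Lynx isn't installed, the sketch falls back to a crude sed tag-stripper. That fallback is a swapped-in stand-in: it just deletes markup and doesn't attempt Lynx's table layout.

```bash
#!/usr/bin/env bash
# html_to_text FILE: render an HTML fragment as plain text.
html_to_text() {
  local file=$1
  if command -v lynx >/dev/null 2>&1; then
    lynx -dump -nolist -force_html "$file"
  else
    # Fallback (not Lynx): turn </TD> into a space, strip all other
    # tags, then drop lines that end up blank.
    sed -e 's/<\/[Tt][Dd]>/ /g' \
        -e 's/<[^>]*>//g' \
        -e '/^[[:space:]]*$/d' "$file"
  fi
}

# Demo: a made-up table like the one the sed step saves.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
<TABLE BORDER=1 BORDERCOLOR=#E4E4E4 CELLSPACING=0 CELLPADDING=4>
<TR><TD>1.</TD><TD>Turn left at Ho-ho-kus Blvd</TD></TR>
<TR><TD>2.</TD><TD>Arrive at destination</TD></TR>
</TABLE>
EOF
html_to_text "$tmp"
```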

I'll leave it as an exercise to you, dear reader, to create a wrapper
that prompts people for starting and ending addresses and then uses the
curl invocation to Expedia and subsequent invocation of Lynx to display
turn-by-turn driving directions.
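If you want a head start on that exercise, here's a minimal bash skeleton for the prompting half. It uses read -p, and bash shows those prompts only when input comes from a terminal, so you can also pipe the eight answers in. The fetch-and-render steps appear only as comments. In those comments, DIRECTIONS_URL is a hypothetical placeholder for whatever URL you assemble from the two addresses.

```bash
#!/usr/bin/env bash
# read_address LABEL: prompt for street, city, state and an optional
# zip, then print them on one line as "street, city, state zip".
read_address() {
  local label=$1 street city state zip
  read -r -p "$label street address: " street || return 1
  read -r -p "$label city: " city || return 1
  read -r -p "$label state: " state || return 1
  read -r -p "$label zip (optional): " zip || true   # tolerate a missing last line
  printf '%s, %s, %s%s\n' "$street" "$city" "$state" "${zip:+ $zip}"
}

main() {
  local from to
  from=$(read_address "Starting") || { echo "need a starting address" >&2; return 1; }
  to=$(read_address "Ending")     || { echo "need an ending address" >&2; return 1; }
  printf 'From: %s\nTo:   %s\n' "$from" "$to"
  # Remaining steps, outline only (DIRECTIONS_URL is hypothetical):
  #   tmp=$(mktemp)
  #   curl -s "$DIRECTIONS_URL" | sed -n '/BORDERCOLOR=#E4E4E4/,/<\/TABLE>/p' > "$tmp"
  #   lynx -dump -nolist -force_html "$tmp"; rm -f "$tmp"
}
```

To use it interactively, add a final line that calls main and run the script from a terminal.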

Dave Taylor has been hacking shell scripts for a really long time: 30
years. He's the author of the popular Wicked Cool Shell Scripts
and can be found on Twitter as @DaveTaylor and more generally at
www.DaveTaylorOnline.com.