Turn Text Into HERE Maps with Python NLTK

What are your top 5 travel destinations? I asked somebody this recently, and the response surprised me. The answer was very specific and unfamiliar.

Venice

Mdina

Aswan

Soro

Gryfino

OK, I know Venice but I’m a software engineer in Northern California and haven’t studied geography in quite some time. I have to admit I have no idea where some of those places are.

Maybe you have found yourself reading a travel article that beautifully describes a glamorous locale, places to stay and visit, and where to eat. These articles can be swell, but a simple map can go a long way toward adding the context of place that expertly crafted prose alone cannot provide. If the author or publisher didn’t include a map for you, Python can help.

Solution

To solve this problem we will stand up a Python Flask server that exposes a few APIs to:

Download a given URL and parse the HTML with BeautifulSoup.

Extract locations from the text based on some clues with the Natural Language Toolkit (NLTK).

Geocode the location to determine a latitude and longitude with the HERE Geocoder API.

If you aren’t using virtual environments for Python, you should be. The Hitchhiker’s Guide to Python is a good place to get off on the right footing. You’ll want to initialize your environment with the libraries in requirements.txt, which can be done with pip install -r requirements.txt if the requirements.txt contains the following dependencies.

Flask
Flask-Script
gunicorn
nltk
requests

App

We will use manage.py as the main entrypoint to our application. It looks like the following listing:

At this point we should be able to run python manage.py runserver and have proof of life. Pointing a browser at http://localhost:8000/healthcheck should return a response that confirms our server is up and has our app_id and app_code properly configured.

You may not want to display this once you hit production, but it is fine while we’re at a “hello world” stage.
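As a rough idea of what a healthcheck like this could look like, here is a minimal sketch. It is not the article's manage.py: the environment variable names HERE_APP_ID and HERE_APP_CODE, the helper healthcheck_payload, and the create_app factory are all assumptions made for illustration. It also reports only whether each credential is set instead of echoing the value, which is a deliberate departure that sidesteps the production concern above.

```python
import os


def healthcheck_payload(env=None):
    """Report whether the HERE credentials are present without revealing them."""
    env = os.environ if env is None else env
    # HERE_APP_ID / HERE_APP_CODE are assumed names, not taken from the article.
    return {
        "status": "ok",
        "app_id_configured": bool(env.get("HERE_APP_ID")),
        "app_code_configured": bool(env.get("HERE_APP_CODE")),
    }


def create_app():
    """Build the Flask app. Flask is imported lazily so the helper above stays dependency-free."""
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/healthcheck")
    def healthcheck():
        return jsonify(healthcheck_payload())

    return app
```

Calling create_app().run(port=8000) would serve it on the port used in the URL above. Wiring it to a runserver command through Flask-Script is left out of this sketch.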

Text

For the purposes of getting started I will use a simple text file with just our locations from before.

Extract

We need to extract text from HTML and tokenize any words found that might be a location. We will define a method to handle requests for the resource /tokens so that we can look at each step independently.
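To make the first half of that step concrete, here is a small sketch of pulling visible text out of HTML and splitting it into words. Since the requirements.txt above does not list BeautifulSoup, this version swaps in the standard library's html.parser, and simple_tokens is a crude stand-in for a real tokenizer such as NLTK's. The names html_to_text and simple_tokens are made up for this example.

```python
from html.parser import HTMLParser

PUNCTUATION = ".,;:!?()\"'"


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def simple_tokens(text):
    """Split on whitespace and strip surrounding punctuation from each word."""
    words = (word.strip(PUNCTUATION) for word in text.split())
    return [word for word in words if word]
```

A /tokens handler could then return simple_tokens(html_to_text(page)) for a downloaded page, which keeps this step easy to inspect on its own.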

We’ve trimmed the word count down dramatically from the original article, but there is still much more work that could be done to fine-tune this recognition process. This is good enough for a first pass, though, without adding more complexity. There is obviously still some noise and not all of these are locations, but that is what the HERE Geocoder can help with.
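One plausible "clue" for trimming the word list is part of speech: place names tend to be tagged as proper nouns. The sketch below assumes that approach, which may differ from the filter actually used here. tag_words wraps NLTK's pos_tag and needs the tagger data fetched with nltk.download beforehand; proper_noun_phrases is a hypothetical helper that joins runs of proper nouns so that multi-word names stay together.

```python
def tag_words(words):
    """Part-of-speech tag a list of words with NLTK."""
    # Imported lazily: requires nltk plus its tagger data (see nltk.download).
    import nltk

    return nltk.pos_tag(words)


def proper_noun_phrases(tagged):
    """Join runs of consecutive proper nouns (NNP/NNPS) into candidate place names."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag in ("NNP", "NNPS"):
            current.append(word)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(phrases))
```

Given hand-tagged input such as [("Northern", "NNP"), ("California", "NNP"), ("is", "VBZ")], the helper yields "Northern California" as a single candidate. People and company names are proper nouns too, which is one source of the noise mentioned above.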

Geocode

The HERE Geocoder API takes a human-readable location and turns it into geocoordinates. If you put in an address, you get back a latitude and longitude.
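As a sketch of what that exchange might look like, the helpers below build a request URL and pull the first coordinate pair out of a response. They assume the legacy Geocoder API 6.2 endpoint that authenticates with app_id and app_code, and a response shaped as Response, View, Result, Location, DisplayPosition; check both against the API version you are using. The function names are invented for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy HERE Geocoder API 6.2 (app_id / app_code auth).
GEOCODER_URL = "https://geocoder.api.here.com/6.2/geocode.json"


def geocode_url(searchtext, app_id, app_code):
    """Build the request URL for a free-text location query."""
    query = urlencode(
        {"app_id": app_id, "app_code": app_code, "searchtext": searchtext}
    )
    return f"{GEOCODER_URL}?{query}"


def first_position(payload):
    """Return (latitude, longitude) of the first match, or None if nothing matched."""
    views = payload.get("Response", {}).get("View", [])
    if not views or not views[0].get("Result"):
        return None
    position = views[0]["Result"][0]["Location"]["DisplayPosition"]
    return position["Latitude"], position["Longitude"]
```

With requests installed, first_position(requests.get(geocode_url("Venice", app_id, app_code)).json()) would tie the two together. Returning None for an empty result is one simple way to discard tokens that turn out not to be places.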

For simplicity and brevity I haven’t included any of the error / response handling you should do here. I’ve also cheated a bit by just storing the image to the local filesystem for illustration.

Calling this endpoint with a comma-separated list of latitude, longitude pairs now returns a map with a marker on each location.
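Here is a sketch of how such an endpoint might turn its input into a map request. It assumes the legacy HERE Map Image API 1.6 mapview resource and its poi parameter, which takes a flat latitude,longitude list; treat the URL and parameter name as assumptions to verify. parse_pairs and mapview_url are hypothetical helpers, and the coordinates in the usage note are rough illustrative values.

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy HERE Map Image API 1.6 (app_id / app_code auth).
MAPVIEW_URL = "https://image.maps.api.here.com/mia/1.6/mapview"


def parse_pairs(raw):
    """Turn "lat,lon,lat,lon,..." into a list of (lat, lon) float tuples."""
    values = [float(value) for value in raw.split(",")]
    if len(values) % 2:
        raise ValueError("expected an even number of coordinates")
    return list(zip(values[0::2], values[1::2]))


def mapview_url(positions, app_id, app_code):
    """Build a map image URL with one marker per (lat, lon) position."""
    poi = ",".join(f"{lat},{lon}" for lat, lon in positions)
    # safe="," keeps the commas in the poi list readable instead of %2C.
    query = urlencode(
        {"app_id": app_id, "app_code": app_code, "poi": poi}, safe=","
    )
    return f"{MAPVIEW_URL}?{query}"
```

For example, mapview_url(parse_pairs("45.44,12.32,35.89,14.4"), app_id, app_code) produces a URL whose response body is the image itself, so it can be written to disk or streamed back to the caller.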

Place names without additional context can be ambiguous, so in some cases there was more than one match. This map shows only the first match, despite how much fun Venice Beach may be.

Summary

The reason for making /tokens, /geocode, and /mapview separate endpoints is to illustrate how you might set up microservices with Python + Flask, one for each operation you want to perform. This would allow a deployment to scale them independently.