Wednesday, February 20, 2019

In this 3-part series, I'm going to show how you can take CIA World Factbook data and use it for your own purposes on Amazon Web Services. Today in Part 1 we'll get the data loaded into DynamoDB with the help of a Lambda Function, and we'll also create Lambda Functions for accessing the data. Later in Part 2 we'll set up a web site for browsing and searching the data; and in Part 3 we'll create an Alexa skill for querying by voice.

Architecture

For those who follow my blog and are feeling deja vu, I recently completed a similar series for Microsoft Azure. Having done this once already will accelerate the effort. This second time around, I'll be going less into the fine details of what we're doing and will leverage some of the prior work.

About the CIA World Factbook Data

The US Central Intelligence Agency publishes an almanac-style reference on the countries of the world known as the CIA World Factbook. There's a wealth of data, and you can learn a lot on the site. I urge you to explore it and drill into the detail. Happily, this data is in the public domain which means we can use it for our own purposes. Note however that you are not permitted to replicate the agency seal; and naturally, you should give proper attribution if you use the data.

The data on the site is not particularly approachable for software purposes, but fortunately a gentleman named Ian Coleman has seen fit to create a JSON edition of the data, which is what we'll be using as our data source. It comes as one big 14MB JSON file, but we'll divide that into a JSON record per country (260 of them).

What We're Building

Today in Part 1 we have two goals:

Get the country data into a DynamoDB table.

Create Lambda functions for accessing the data.

We'll concentrate first on getting our country data loaded into DynamoDB, with the help of a Lambda Function.

Country Data in DynamoDB

Then, we'll create Lambda Functions for accessing the data at various levels:

Data Retrieval via API & Lambda Function

With the above accomplished, it will be smooth sailing to create user interfaces to the data.

Loading the Data

We want our data both in S3 and in DynamoDB. Let's take a look at our source data, a JSON country record:

We can see the data is very detailed. We've also added 3 fields of our own: key, timestamp, and source. Key is a derivative of the country name suitable for using as a filename or general key; it's the name converted to lower case, with some characters removed (commas, parentheses), and some characters replaced with underscore (spaces, hyphens). Thus the key for "United States" is "united_states". Timestamp is when the data was last collected. Source is just "Factbook"; we add it because DynamoDB expects a field of the document to map to a partition key.
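The key derivation described above can be sketched in a few lines of Node.js. This is a minimal illustration, not the code actually used for the project: the function name toCountryKey is hypothetical, and the character sets are assumed to be exactly the ones listed in the text (commas and parentheses removed; spaces and hyphens replaced).

```javascript
// Hypothetical helper: derive a country key from a country name,
// following the rules described in the text.
function toCountryKey(name) {
  return name
    .toLowerCase()
    .replace(/[,()]/g, '')   // remove commas and parentheses
    .replace(/[ -]/g, '_');  // replace spaces and hyphens with underscores
}

console.log(toCountryKey('United States'));  // united_states
console.log(toCountryKey('United Kingdom')); // united_kingdom
```

If the real data set strips other characters as well (apostrophes, for example), the first regular expression would need to be extended accordingly.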

Loading Files into S3

S3 will hold, for each country, the country JSON record as well as image files for flag and map. We don't really need the country JSON in S3 for this project (since we're going to query DynamoDB for country data), but we're going to be importing the JSON from S3 as a staging location when we insert the data into DynamoDB. I've already retrieved the JSON data and split it into 260 separate country JSON records previously, as well as the flag and map image files. All were originally stored in Azure blob storage. You can get a blow-by-blow account of that here. To copy over the country JSON and image files, I first downloaded the Azure blobs using my Azure Storage Explorer tool; and then uploaded them to S3 by dragging them into the AWS S3 console. Here's what our end-result in S3 looks like:

Country files in S3

We now have a JSON document for each country, as well as a flag and map image for each country:

armenia.gif

armenia-map.gif

Loading DynamoDB

Next, we want to get our country data into DynamoDB, one country document per country. To do that, we create a DynamoDB table in the AWS Console named factbook. DynamoDB requires us to think about partition key and sort key, which collectively form our unique key to a record. Although our country document records are very deep, the actual number of records is small: 260. Accordingly, we will use the same partition key ("Factbook") for all of our records. The source field we added to the JSON contains this value, so our partition key field is source. For sort key, we'll use country name, captured in the name field.

Creating DynamoDB Table

In my original project, I wrote a durable function which ran on a timer once a week, processing 260 country records in parallel. We may do the same for AWS at some point, but today we'll be more modest: we'll develop a Lambda Function to create a country record in DynamoDB. The function will be called via HTTP with a key parameter, which will be a country key such as "afghanistan" or "united_kingdom". The function will read the country's .json file that is in S3 and insert it into DynamoDB. We'll have to invoke the function for each country.

Lambda Function to Load DynamoDB Country Record

Our load-country function, written in Node.js, first retrieves the JSON file from our S3 bucket (lines 28-44); the function has a role assigned whose policy grants access to our factbook-data S3 bucket as well as our Factbook DynamoDB table. We next replace empty strings with nulls because DynamoDB does not allow empty strings. Then we parse the text into a JavaScript object so we can work with it (line 65). The code adds three housekeeping properties to the original JSON: key (country key), timestamp, and source ("Factbook") at lines 78-80.

Now we can insert our DynamoDB record. We created the necessary DocumentClient in lines 24-27. In lines 90-106, we create a params object containing the table name and document data, and store it with a docClient.put. If no errors occurred, our record is added and DynamoDB now has the country document. When we test our function, it says all is well.
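As a rough sketch of the preparation steps just described (not the original listing), the following shows one way to build the put parameters from the raw JSON text. The helper name buildPutParams, the timestamp format, and the use of a JSON.parse reviver to swap empty strings for null are all assumptions made for illustration.

```javascript
// Hypothetical helper: turn the raw JSON text read from S3 into
// parameters suitable for DocumentClient.put().
function buildPutParams(jsonText, key, tableName) {
  // Swap empty strings for null while parsing (see the empty-string caveat above).
  const country = JSON.parse(jsonText, (prop, value) => (value === '' ? null : value));

  // Housekeeping properties described in the text.
  country.key = key;
  country.timestamp = new Date().toISOString(); // assumed format
  country.source = 'Factbook';                  // partition key value

  return { TableName: tableName, Item: country };
}

// Tiny stand-in document; a real country record is far larger.
const params = buildPutParams(
  '{"name":"Armenia","introduction":{"background":""}}',
  'armenia',
  'factbook'
);
console.log(JSON.stringify(params, null, 2));

// Inside the Lambda, with the AWS SDK for JavaScript (v2) loaded, the store step
// would then be along the lines of:
//   const docClient = new AWS.DynamoDB.DocumentClient();
//   docClient.put(params, (err, data) => { /* report success or failure */ });
```

Keeping the preparation in a pure function like this makes it easy to test without touching S3 or DynamoDB.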

Invoking load-country

...and we can verify that by viewing the new record added to DynamoDB in the AWS console:

Viewing added country document in DynamoDB

Lambda Functions to Access Country Data

Now that we have the World Factbook data in a DynamoDB table, we can write Lambda functions to query it.

country

The first function we want to write is named country, and its purpose is simply to return an entire country document given a country name. We're writing in Node.js and developing right in the AWS console. Our function is triggered via API Gateway, so that it can be invoked with an HTTP request. We bump the memory to 512MB (the default size is too small for working with DynamoDB).

country function in AWS console

Let's review the code below to understand how it works. We declare a DocumentClient (lines 3-5), which is how we'll access DynamoDB. In line 14, we extract the expected country name in a URL query parameter called name; if for example you want the country record for Japan, you'll add ?name=Japan to the end of the URL. To retrieve the country record, we know that our partition is always "Factbook" and our sort key is the country name. To query the data, we issue a docClient.query (lines 33-46). If successful, the data is returned in the response.

The parameters that are set up for the query (lines 20-31) deserve some explanation. The KeyConditionExpression is our query. We're merely interested in a source (partition key) of "Factbook" and a name (sort key) equal to our country name parameter. We would normally specify a KeyConditionExpression value like this:

name = :name and source = :source

except that name and source are both DynamoDB reserved words. To get around that, we use #name and #source, and define those in the ExpressionAttributeNames parameter (lines 22-25). Our query then ends up being this:

KeyConditionExpression: '#name = :name and #source = :source'

If you haven't worked with DynamoDB before, the :name and :source may be unfamiliar. These are parameters that get replaced by values in the ExpressionAttributeValues parameter (lines 26-29).

If the query is successful, we return the entire result. Here's what it's like to invoke country from a browser (note: I have the JSONView Chrome Extension installed which nicely formats the JSON):
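Putting those pieces together, the query parameters might look like the following. This is a sketch rather than the original listing: the helper name buildCountryQuery is hypothetical, and the table name factbook is taken from the table created earlier.

```javascript
// Hypothetical helper: build DocumentClient.query() parameters for one country.
function buildCountryQuery(name) {
  return {
    TableName: 'factbook',
    // #name and #source stand in for the reserved words name and source.
    KeyConditionExpression: '#name = :name and #source = :source',
    ExpressionAttributeNames: {
      '#name': 'name',
      '#source': 'source'
    },
    // :name and :source are placeholders replaced by these values.
    ExpressionAttributeValues: {
      ':name': name,
      ':source': 'Factbook'
    }
  };
}

console.log(JSON.stringify(buildCountryQuery('Japan'), null, 2));

// In the Lambda, the params would be passed to the DocumentClient:
//   docClient.query(buildCountryQuery(name), (err, data) => { /* data.Items holds the matches */ });
```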

Invoking country function from a browser

people

The country function is great, but it's a big blast of data. Perhaps we're interested in a smaller part of the whole. The country JSON has subsections named introduction, geography, people, government, economy, and so on. Let's create a people function to return just the people section.

The only area of people that's different from country is the query parameters: we've added a ProjectionExpression that limits the results to the people section of the document.

Here's the result of running people in a browser. Now we're dealing with a much smaller section of the country JSON.

Invoking people function from a browser

We can similarly create sister functions named introduction, geography, economy, communications, etc. In each case, the only change needed would be the ProjectionExpression.
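Since the sister functions differ only in their projection, the parameters could be produced by one small helper. The sketch below is an assumption about how that might be factored, not the code from the post; buildSectionQuery is a hypothetical name, and the section is always aliased through ExpressionAttributeNames so that a section name colliding with a reserved word would still work.

```javascript
// Hypothetical helper: query parameters that return only one top-level
// section (people, geography, economy, ...) of a country document.
function buildSectionQuery(name, section) {
  return {
    TableName: 'factbook',
    KeyConditionExpression: '#name = :name and #source = :source',
    // Limit the result to a single section of the document.
    ProjectionExpression: '#section',
    ExpressionAttributeNames: {
      '#name': 'name',
      '#source': 'source',
      '#section': section
    },
    ExpressionAttributeValues: {
      ':name': name,
      ':source': 'Factbook'
    }
  };
}

console.log(JSON.stringify(buildSectionQuery('Japan', 'people'), null, 2));
```

A people function would call buildSectionQuery(name, 'people'), a geography function buildSectionQuery(name, 'geography'), and so on.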

population

Let's consider one other example. What if we only need to retrieve a single field from the JSON document, such as population? Population lives under people.population.total in the country JSON. Here we can again modify the ProjectionExpression, but this time we'll use dotted notation to indicate a path through the document. Once again though we have to deal with the fact that total is a DynamoDB reserved word. We can resolve that with another #attributename shortcut. Here's what our parameter code ends up looking like:
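A sketch of such parameters, under the same assumptions as before (the helper name buildPopulationQuery is hypothetical and the table is assumed to be named factbook):

```javascript
// Hypothetical helper: query parameters that project a single nested field.
function buildPopulationQuery(name) {
  return {
    TableName: 'factbook',
    KeyConditionExpression: '#name = :name and #source = :source',
    // Dotted document path; #total stands in for the reserved word total.
    ProjectionExpression: 'people.population.#total',
    ExpressionAttributeNames: {
      '#name': 'name',
      '#source': 'source',
      '#total': 'total'
    },
    ExpressionAttributeValues: {
      ':name': name,
      ':source': 'Factbook'
    }
  };
}

console.log(buildPopulationQuery('United States').ProjectionExpression);
// people.population.#total
```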

The above will return just the population value, but it will be wrapped as follows:

{ "body": "{\"people\":{\"population\":{\"total\":329256465}}}" }

To shorten the result to just be the value, we can change our callback as follows to bypass the containing people and population objects.

callback(null, { body: JSON.stringify(data.Items[0].people.population) });

Now the result is:

{ "total": 329256465 }

Any time we want to return just a scalar value, we can use this technique of a dotted document path in a ProjectionExpression.

In Conclusion

Today in Part 1 we brought public-domain CIA World Factbook data into AWS, storing country records in DynamoDB and image/JSON files in S3 storage. We used a Lambda function to read JSON files from S3 and inject them as documents into our DynamoDB table. Working with DynamoDB from JavaScript was fast and easy. We did have to learn how to work around a few caveats, including empty strings not being permitted in the document data and how to deal with reserved words in queries.

We then created Lambda functions to get at the data. We saw that we could return an entire large country JSON, or a subsection of it, or just a discrete individual property. Once we had functions at each of these levels of data, creating derivatives for other sections or properties was trivial. Developing Lambda functions, editing and testing right in the AWS console, was also a quick and painless experience. We did have to be careful to adhere to proper JavaScript coding patterns for asynchronous methods, such as the use of promises.

We now have our data in place and a means to access it. Now that we've laid this groundwork, we'll go on in Parts 2 and 3 to create web and voice interfaces so users can work with the data. Stay Tuned!

Monday, February 18, 2019

In this 3-part post, I'm going to show how you can take CIA World Factbook data and use it for your own purposes on Microsoft Azure. In Part 1 we retrieved the public-domain data and loaded it into a Cosmos DB using Azure Durable Functions. In Part 2 we created an API using Azure Functions and Cosmos DB, and a web site that uses the API. Today in Part 3, we'll add analytics to the web site in the form of charts. To implement charts we'll be using Cosmos DB, Azure Functions, Azure API Management, and Google Charts.

When we started this project, one of the justifications for storing the data in Cosmos DB instead of blob storage was search, which was implemented in Part 2. The other was analytics, so that we could query the data in various ways in order to support charts and reporting. That's our focus today.

Although not all records have all elements, the structure is generally consistent across our 260 country records in the database.

Coming up with the data queries involved nothing more than visualizing the data we wanted to capture, then refining queries in the Azure Portal's Data Explorer for Cosmos DB.

Area - Largest and Area - Smallest

The first query I created was for Area - Largest, where the goal was to list the top 10 countries with the largest area. Total land area can be found in the JSON under the geography section, area subsection, where there is an object named total. geography.area.total.value will give us the land area in square kilometers.

Geography Section of Country JSON

We can take advantage of the global_rank field and use it in an order by clause in our query. Since the total area is also broken out by land and water components, we'll also return that data (when present). Here's our query:

Imports/Exports - Largest and Smallest

The next set of reports are for imports (largest and smallest) and exports (largest and smallest). Imports and exports are structured similarly in the economy section of the country JSON.

Economy - Exports Section of Country JSON

The query for Imports - Highest (top 10 countries with the highest imports) is below. Once again there is a global_rank value which makes the order by clause easy. We have decided to return the first (most recent) import value, but also the prior year value.

SELECT top 10 c.name, c.key, c.economy.imports.total_value.annual_values[0], c.economy.imports.total_value.annual_values[1], c.economy.imports.total_value.global_rank FROM c order by c.economy.imports.total_value.global_rank

Imports - Highest Query in Azure Portal Data Explorer

Once again, the sister query for Imports - Lowest is nothing more than adding desc to the end of the same query:

Population - Highest and Lowest

The last report chart query we'll do is for population (highest and lowest). We can find that data in the people section of the country JSON:

People - Population Section of Country JSON

Here are our queries:

SELECT top 10 c.name, c.key, c.people.population.total,c.people.population.global_rank FROM c order by c.people.population.global_rank

SELECT top 10 c.name, c.key, c.people.population.total,c.people.population.global_rank FROM c order by c.people.population.global_rank desc
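Because every report query follows the same shape, the pairs of queries could be generated rather than typed by hand. The helper below is purely illustrative (buildTop10Query is a hypothetical name and is not part of the original project, where the queries are configured directly in the bindings); it reproduces the two population queries above.

```javascript
// Hypothetical helper: build a "top 10 by global rank" Cosmos DB SQL query.
// fields:     extra value paths to return, e.g. ['c.people.population.total']
// rankPath:   path of the global_rank field to return and sort by
// descending: false lists rank 1 first (highest); true appends desc (lowest)
function buildTop10Query(fields, rankPath, descending) {
  const select = ['c.name', 'c.key', ...fields, rankPath].join(', ');
  const order = descending ? ' desc' : '';
  return `SELECT top 10 ${select} FROM c order by ${rankPath}${order}`;
}

const rank = 'c.people.population.global_rank';
console.log(buildTop10Query(['c.people.population.total'], rank, false));
console.log(buildTop10Query(['c.people.population.total'], rank, true));
```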

Azure Functions

With our Cosmos DB queries ready, we can now write the Azure Functions for each chart. We'll be using Cosmos DB input bindings for each function which is where we'll configure each function's SQL query. Aside from that, all these functions have to do is return data, so their code is merely this:

module.exports = function (context, req, countryList) {
    // countryList is supplied by the Cosmos DB input binding
    context.res = {
        body: countryList
    };
    context.done();
};

Let's configure the report-area-highest function's Cosmos DB binding:

Cosmos DB Binding for report-area-highest Function

Now it's just more of the same. We create the following functions, all identical with minimal code, where the only difference is the query in the Cosmos DB binding:

Report Functions

Creating our functions took almost no time at all. We test them, and they work:

Testing a Chart Function

API Operations

In Part 2, we front-ended our Azure Functions with API Management operations. We'll continue that pattern here. We add new operations to our Factbook API, with the same names as the functions we just created: to run the report-area-highest report, the API path will be /report-area-highest.

The only configuration needed for each function is its Backend Policies, where we set the URL rewriting and CORS policy.

Configuring API Operation Backend Policies

Once again, things are cookie-cutter: each API has an identical back-end policy, except for the URL rewrite value.

We test our API functions, and they all work.

Testing API Operation

Configuring our API operations was a fast and easy activity.

Adding Charts to Web Site

Now that we've done the advance work of database queries, new Azure Functions, and new API Operations we are ready to add charts to our web site.

Layout

Layout-wise, we already have a banner area with a country select list and an input text box / search button for performing searches. We're going to add another drop-down for charts. That will give a user three ways to interact with the site: 1) select a country, 2) search for countries, or 3) select a chart. On the desktop, all three control areas will be visible at the same time:

On phones, however, it would be crowded to show all three areas. So, we're going to allow users to switch between Search and Charts with a button:

Phone view - toggle between Search and Chart

When the user selects a country, initiates a search, or selects a chart, the main display changes accordingly.

The country area is the accordion of topical content (Introduction, Geography, People, etc.).

The search area lists search results. Selecting a country switches to country view.

The chart area shows a chart rendered by Google Charts.

Charts

We'll be using Google Charts to draw our charts, which will all be column charts. Google Charts generates top-notch charts for minimal effort, and is free.

Before we can render our charts, we need to fetch the data for the selected chart. Here's the JavaScript code to call the appropriate API function after a chart has been selected:

The above code sets up the data structures the Google Charts API needs and renders a bar chart. For a single-series chart such as Population - Highest, this is the appearance:

Population - Highest Chart
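The data preparation for a single-series chart like this one can be sketched as follows. This is not the site's actual code: toChartRows is a hypothetical helper, and the result shape (objects with name, key, total, and global_rank properties) is an assumption based on the fields selected by the population query. The first sample value is the United States figure quoted in the DynamoDB post; the second record is made up.

```javascript
// Hypothetical helper: convert the API result for a single-series report into
// the array-of-rows form that google.visualization.arrayToDataTable() accepts.
function toChartRows(countries, valueLabel) {
  const header = ['Country', valueLabel];
  const rows = countries.map(c => [c.name, c.total]);
  return [header, ...rows];
}

// Illustrative sample data only.
const sample = [
  { name: 'United States', key: 'united_states', total: 329256465, global_rank: 3 },
  { name: 'Example Country', key: 'example_country', total: 1000, global_rank: 200 }
];

const chartRows = toChartRows(sample, 'Population');
console.log(chartRows);

// In the browser, once the Google Charts loader has finished loading 'corechart':
//   const table = google.visualization.arrayToDataTable(chartRows);
//   new google.visualization.ColumnChart(element).draw(table, options);
// where element and options are whatever the page supplies.
```

Multi-series charts such as Area - Largest would follow the same pattern with additional columns per row.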

The user can perform some interaction with this chart. On the desktop, hovering over a bar will provide a label-value tooltip. On a phone, a long-press on a bar will do the same. A click or tap on the bar will bring up that country in country view, just as if it had been selected from the country select drop-down.

Some charts provide multiple series, in which case there are different color bars. Here's the Area - Largest chart, which has 3 bar series (total area, land area, and water area):

Area - Largest Chart

If you rotate your phone or resize your desktop browser windows, charts will redraw to fit the new screen dimensions.

More Data

The site we created in Part 2 only shows a small part of the available country data. During this Part 3 activity, work was done to show more fields. The following were added:

Geography: climate, terrain

People: population growth, birth rate, death rate

Government: chief of state, head of government, national symbol, national anthem

Economy: inflation rate, imports, exports

New Data in People and Economy Sections

I'd like to call attention in particular to the Government section where national anthem information has been added. The World Factbook data includes a national anthem audio link for most country records. I've added a Listen link so that users of the web site can listen to the national anthem.

New Data in Government Section & Listen to National Anthem

Other Site Refinements

Lastly, polishing and refinements were made to a number of areas to improve usability.

There is now a shortcut URL to the site: http://world-factbook.davidpallmann.com. The web site is static, hosted in Azure blob storage in just 3 files: index.html, site.css, and site.js. You can see the full source on github (link at end of post).

Most noticeable perhaps is the spinner that appears when a user has to wait. Since we rely on Azure Functions, a user may experience a few seconds' delay when the function they are accessing has a cold start (because it hasn't been accessed recently).

In Conclusion

In this post we built on the front-end work of Part 2. To support charts we first came up with queries for our Cosmos DB database; then created chart functions that run the queries via Cosmos DB bindings; and lastly extended our API from Part 2 with additional operations for the new functions. All of this went extremely rapidly as we used the same patterns for each report. All of our cloud services contributed to rapid development.

With the API operations added, charting was implemented in the web site using Google Charts. With today's work, the World Country Data web site has become more useful and more polished.