Native Geospatial Proximity Analysis Using Xcalar

Large scale infrastructure and software solutions are available to support geospatial queries, which includes geospatial data storage, metadata design, and scaling.

I will demonstrate in this blog, how you can use Xcalar Data Platform for native geospatial proximity analysis without specialized geospatial infrastructure and software.

This blog post describes how:

You can create relational queries that natively calculate a geographic location proximity to other locations. NOTE: For this blog, I will refer to these kinds of queries as a multidimensional geospatial query.

You can calculate basic geospatial queries that take into account the Earth’s radius of curvature.

You can quickly obtain multiple answers and insights from a single Xcalar Design Filter operation.

To demonstrate Xcalar’s geospatial capabilities, I performed analysis on statistical crime data taken from data.police.uk. I specifically wanted to understand what types of crimes occurred within specific police force jurisdictions. I chose this data source, as it is freely available and you can customize the date ranges and police force jurisdictions before you create and download the data. The CSV-formatted data contains street-level-crime, outcome, and stop-and-search information, broken down by UK police force jurisdictions, which suited my requirements for this exercise.

Latitude and Longitude

Because I wanted to use the resultant algorithm on areas within other countries other than the UK, I used both degrees and radians in my equations:

Latitude is expressed as -90 degrees (-/2 radians) to 90 degrees (/2 radians) with respect to the equator.

Dataset

With Xcalar Design, I imported the CSV file into a table and then used the Cast operation panel to change data types for both Latitude and Longitude values to a Float type. I was mainly interested in the Month, Falls within, Latitude, Longitude and Crime type data values.

Figure 1: Imported UK Police Data in Xcalar Design

Figure 1, displays the geo_police_data table that was created from the UK police data source. It contains over 11 million rows, which is a reasonable size for creating an algorithm. It is important to note here, that when you need to cover larger geographical areas, Xcalar Data Platform can easily handle hundreds of millions of rows.

The Latitude and Longitude columns display their values in degrees and the Falls within column contains the police force jurisdictions.

Figure 2: Sample UK Police Dataset Record

Breaking Down the Analysis

The following, lists the tasks to complete the algorithm. As I was creating the algorithm I was also able to answer a few questions:

The jurisdiction with highest crime.

Locate a specific crime from the police force jurisdiction result from step 1.

Define the neighborhood or radius of curvature.

Locate all the crimes within the radius result from step 3.

Filter the result by the crime type.

Filter the result by the month.

Query the results by the specific crime types and police force jurisdiction within a given radius.

Step 1: Finding the Police Force Jurisdiction with the Highest Crime

To show which police force jurisdiction had the highest crime rate, I ran a GroupBy count query on the Falls within data column. As shown in Figure 1, London (as defined by the Metropolitan Police Service) had the highest crime rate.

Figure 3: Top 10 Jurisdictions by Crime

Step 2: Choosing a specific crime - from the results of step 1

To learn more about a specific crime, such as the final outcome:

In Xcalar Design, from the resultant table of step 1, I mapped the Latitude and Longitude columns to radians() and labeled them latr and lonr respectively. I then filtered on the resultant column from step 1 (in this case Metropolitan Police Service). Finally, I double-clicked the Crime ID record for the crime that I was interested in, which opened the Data Browser, as shown in Figure 4:

Figure 4: Picking a Specific Crime Incident

Step 3: Defining the Neighborhood Radius of Curvature

This step looks at the density of the data points and shows Xcalar Data Platform’s flexibility. Xcalar Design lets you adapt your queries and their parameters according to your resultant data. You can also gradually increment an operation until you find the right balance for your analysis. Because I did not want to limit the results to a 1 mile radius, I gradually narrowed my results until I decided on a neighborhood that was in a radius of:

s = 0.125 miles

Step 4: Finding all Crimes within this Radius

The area of the bounded rectangle and radius of a circle on a curved surface is always more than the one drawn on a flat surface. For the next step, I computed the geometries on curved surfaces.

For a given a location of X[lat, lon] in radians, I calculated a bounded rectangle for a circle with the radius s and where the radius of earth r, is a function of latitude. I used a Map user-defined function (UDF) to compute the equation as follows:

Figure 5: Calculating a Bounded Rectangle

1. I computed the radius of Earth for X[lat, lon] in miles by using the following equations:

2. In Xcalar Design, I then created the following Map UDF to compute the earth radius for each record.

3. Once I had created my UDF, I then ran a quadruple filter using Xcalar Design’s Filter panel that searches for all the points in the bounded rectangle. Figure 4, shows the four conditions that I used to filter the longitude and latitude points.

Figure 6: Quadruple Filter Operations for a Bounded Rectangle

As shown in Figure 7, 607 crimes occurred in a 0.125 mile radius of location X.

Figure 7: Result of a Geospatial Query

Step 5: Filter by Crime Type

I ran the Filter operation to remove all empty Crime types and then I ran a Profiling operation from Xcalar Design’s Profile window, which revealed that Violence and sexual offences was the most frequent crime, as shown in Figure 8.

Figure 8: Profiling by Crime Type

Step 6: Profile by Month

I then ran a Profile operation on the Month column, which revealed that June 2016 had the highest number of crimes.

Figure 9: Profiling by Month

Step 7: Performing a Multidimensional Geospatial Query

As stated at the beginning of this blog, with Xcalar Data Platform you can create multidimensional geospatial queries without requiring additional geospatial infrastructure or software. In Xcalar Design, I can run one single filter operation that combines geospatial and other relational constraints.

For example, in Figure 6, I ran a filter on four relational constraints for a bounded rectangle to answer one question. In Figure 10, as shown below, I ran six relational constraints to find answers to multiple questions.

Figure 10: Performing Multidimensional Geospatial Query

With this one filter operation, I did not only check for crimes in the given 0.125 mile radius, I also checked for a specific crime type (Violence and sexual offences) and jurisdiction (Metropolitan Police Service).

Running this one filter operation, gave the resulting values, as shown in Figure 11.

Figure 11: Result of Multidimensional Geospatial Query

Xcalar Design enables the user to quickly gain insights. By providing multiple relational constraints within one operation, users can unearth multiple answers faster and gain knowledge that they may not know existed.

In addition, with Xcalar Design you can run nested queries, disjoint constraints using one or multiple filters, and Map, GroupBy, and Join operations to do increasingly complex analysis within geospatial cubes.

Future Considerations

Since Xcalar does most of the heavy lifting for you in terms of visual modeling and scalability, you can focus on enriching the geometry and algorithms. For instance, instead of using bounded rectangles, you could use bounded polygons, as shown in Figure 12.

Figure 12: Computing Bounded Polygons

For this exercise, I will leave it for you the reader to compute the vertices of a bounded n-sided polygon.

Once the geometry is known, with Xcalar it is just a matter of adding more relational constraints to your geospatial queries.

NOTE: in the case of n-sided polygons, Xcalar recommends that you break down your queries into multiple drill-down Filter operations.

As previously stated, with Xcalar Design you can run incremental operations. This helps you manage your filter operations by spreading out your queries more uniformly in your modeling dataflow. You can then visually track each filter operation and the algorithm’s data lineage from its Dataflow graph.

Conclusion:

This blog demonstrated how you can create multidimensional geospatial queries using Xcalar Design Platform without requiring additional geospatial infrastructure or software. It also demonstrated Xcalar Design’s ability to run combinations of relational constraints in a single operation for quickly providing multiple answers and insights.