GIS Crawling Done Easy Using Python With PostGreSQL

Problem

Company “Help-i-copter” allows users to rent a ride in a helicopter for a lift in style. However, Help-i-copter is having trouble creating software that find the best flat surface for the helicopter pilots to land to pick up their customers. A perfect solution would be crawling data from sites and to store it on a PostGreSQL database. What steps should they take?

Crawling

Crawling, in software terms, is the act of copying data from a website using a computer program. This data can be saved to a file, printed to your screen, or put into a database. It entirely depends on the project and its scale. On a much greater scale, where data can be overwhelming, a database would be the best option, which is why it’s important to have one. You could have your grocery list stored online, but you don’t want to login every time to see it. You could instead crawl that information off the web with a command like:

soup.find_all('a', attrs={'class': ‘list'})

Write a crawling program that collects data from multiple sources, to get the most accurate data. This program would most likely be written in Python as it has access to modules like Requests and BeautifulSoup.

Create an app that allows pilots to enter in a variable they want to query and parameters for that variable. Then use Python and the Psycopg2 module to query the data accordingly. This app with allow non-programmers to access the database without having to learn PostGreSQL.

Databases

So why is it important to store large amounts of data on a database? Simply, it gives you an easy way to find and query data. For example, let’s say you have a list of employees you got from your company’s website, and added it to a PostGreSQL database. In such a database, finding data like “all employees with a first name that begins with an L” would be much simpler, as databases are well organized. Simple commands like:

where employee_name LIKE 'L%'

would return “Larry, Liz, and Lucy” quickly and efficiently.

Bounding Box

This data could be used in a lot of ways. Namely, we could use out latitude and longitude coordinates to create a bounding box and get information about the areas within that box. This could help Help-i-copter in a number of ways. Pilots could use a bounding box to find flat areas near them and sort those by air-traffic, location, etc… It will be essentially asking the pilot the maximum and minimum coordinates of the box and then looking through the database for any surface that fit the description. Here’s what that might look like in Python:

Conclusion

As you can see, crawling and databases can work very well together, especially when crawling large amounts of data. It would be otherwise inefficient and a lot slower to just store data on a normal document, or to do a crawl of the page every time you run the program. Help-i-copter able to efficiently crawl a site, upload data to a web page, and query that data back to a pilot quickly. Thanks to the power of PostGreSQL and Python.