At the Sentinel Project, we are big advocates of making the data we create openly available to everybody who wants access to it. Open data allows any member of the public to learn from it, create data visualizations, gain new insights, or build mashups with other openly available datasets.

In this blog post, you will learn how to easily access and manipulate the two main streams of data that we make available.

The software that makes getting started very easy is OpenRefine (formerly called Google Refine). This data manipulation tool was developed at Google before Google stopped supporting it and handed it over to the open-source community. It runs on Windows, Linux and Mac, so go ahead, install it and run it.

Before you run it, you need to decide which data you will use and build the URL needed to access it. The Sentinel Project's two main data streams are served in JSON format through a URL-based API.
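With a URL-based API, the query is encoded in the address itself, and fetching that address returns JSON. The sketch below shows both steps with Python's standard library only. The base URL and the parameter names (`country`, `format`) are hypothetical placeholders, not the actual Hatebase or Threatwiki endpoints; check each project's API documentation for the real ones.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_api_url(base_url: str, params: dict) -> str:
    """Append URL-encoded query parameters to a base API address."""
    # Sort the keys so the same parameters always produce the same URL.
    query = urlencode(sorted(params.items()))
    return f"{base_url}?{query}" if query else base_url


def fetch_json(url: str, timeout: float = 30.0):
    """Download the resource at `url` and decode it as JSON."""
    with urlopen(url, timeout=timeout) as response:
        # json.loads accepts UTF-8 encoded bytes directly (Python 3.6+).
        return json.loads(response.read())


# Hypothetical endpoint and parameters, for illustration only.
url = build_api_url(
    "https://api.example.org/v1/sightings",
    {"format": "json", "country": "KE"},
)
print(url)  # https://api.example.org/v1/sightings?country=KE&format=json
# data = fetch_json(url)  # run this against a real endpoint
```

The URL printed here is what you would paste into OpenRefine; `fetch_json` is only needed if you want to work with the data outside OpenRefine.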

Hatebase

Hatebase, launched in March, is the world’s largest online database of hate speech. Beyond cataloguing hate speech terms, it also tracks their usage: sightings are either submitted manually by users or collected automatically by a bot that scans geo-located tweets containing hate speech terms. All of this data is available for free.

Using OpenRefine

When you launch OpenRefine, it opens a page in your web browser (by default at http://127.0.0.1:3333).

Click Create Project -> Web Addresses. This is where you enter the URL of the data you want to retrieve and manipulate, whether from Hatebase or Threatwiki.

Choose JSON files as the parsing format.

In the preview, select the part of the JSON data that corresponds to a single record.

Enter a project name at the top right and click Create Project.

Your data is now displayed in a spreadsheet-like (Excel-style) table.
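Selecting a record in the preview is what tells OpenRefine which repeated JSON element becomes one row, with the fields inside it becoming columns. The Python sketch below imitates that idea on a made-up payload: it walks a hypothetical record path (`data` → `results`) to the list of records and flattens nested objects into dotted column names. The payload and field names are invented for illustration, and OpenRefine's own importer names its columns differently.

```python
from typing import Any


def get_records(payload: Any, path: list) -> list:
    """Walk down `path` (a list of keys) to reach the list of records."""
    node = payload
    for key in path:
        node = node[key]
    return node


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested dicts into a single level with dotted column names."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column-name prefix.
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


# Made-up payload shaped like a typical JSON API response.
payload = {
    "data": {
        "results": [
            {"term": "example", "location": {"country": "KE", "city": "Nairobi"}},
            {"term": "sample", "location": {"country": "UG", "city": "Kampala"}},
        ]
    }
}
rows = [flatten(r) for r in get_records(payload, ["data", "results"])]
print(rows[0])  # {'term': 'example', 'location.country': 'KE', 'location.city': 'Nairobi'}
```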

The Export button at the top right of the page lets you export the data to other file formats, such as Excel, so that you can analyze it in other software.
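If you prefer to script this step instead, here is a minimal sketch that writes rows (a list of dictionaries, one per record) to a CSV file, a format Excel and most analysis tools can open. The example rows and the output file name `sightings.csv` are hypothetical.

```python
import csv
from pathlib import Path


def export_csv(rows: list, path: str) -> None:
    """Write a list of dicts to CSV, using the union of keys as the header."""
    # Collect every column name in first-seen order, since records may differ.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        # Keys missing from a row are written as empty cells (restval="").
        writer.writerows(rows)


rows = [
    {"term": "example", "country": "KE"},
    {"term": "sample", "country": "UG", "city": "Kampala"},
]
export_csv(rows, "sightings.csv")  # hypothetical output file name
print(Path("sightings.csv").read_text(encoding="utf-8"))
```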

You can also use OpenRefine to manipulate the data directly, and there are plenty of online resources explaining how. You can filter and sort the data, rename columns, list all the values that appear in a column, transform the data with expressions, and more.
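For comparison, the operations listed above map onto simple list and dictionary manipulations in Python. The sketch below applies them to made-up rows: a filter, a sort, a column rename, a value count similar to what an OpenRefine text facet displays, and a cleanup roughly analogous to the GREL expression `value.trim().toLowercase()`. All column names and values are hypothetical.

```python
from collections import Counter

rows = [
    {"term": " Example ", "country": "KE", "count": 4},
    {"term": "sample", "country": "UG", "count": 9},
    {"term": "OTHER", "country": "KE", "count": 1},
]

# Transform: trim whitespace and lowercase a text column.
cleaned = [{**r, "term": r["term"].strip().lower()} for r in rows]

# Filter: keep only rows for one country.
kenya = [r for r in cleaned if r["country"] == "KE"]

# Sort: order rows by a numeric column, largest first.
by_count = sorted(cleaned, key=lambda r: r["count"], reverse=True)

# Rename a column: copy each row with "country" renamed to "country_code".
renamed = [
    {("country_code" if k == "country" else k): v for k, v in r.items()}
    for r in cleaned
]

# List the distinct values of a column with their frequencies.
facet = Counter(r["country"] for r in cleaned)

print([r["term"] for r in kenya])     # ['example', 'other']
print([r["term"] for r in by_count])  # ['sample', 'example', 'other']
print(renamed[0])                     # {'term': 'example', 'country_code': 'KE', 'count': 4}
print(facet.most_common())            # [('KE', 2), ('UG', 1)]
```

OpenRefine does all of this interactively without writing code; the script is only useful if you want to repeat the same steps automatically on fresh downloads.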

I’ve made this short video to quickly show you the kind of manipulation you can do with OpenRefine.

I hope this blog post helped you understand how to obtain data through our API. Don’t hesitate to write to us at techteam@thesentinelproject.org and let us know how you use the data!