API

The BrightPlanet Document API is part of BrightPlanet’s REST API. It allows queries against the curated data feeds provided by our Data-as-a-Service platform and behaves similarly to the search feature available in BrightPlanet’s Search Dashboard.

Before digging in, it’s important to know that the Document API is focused on content within the data feed. This means that results will only contain content that has already been harvested from your subscribed data feeds. This quick start guide will show you how to find an appropriate data feed and then begin requesting content from it. BrightPlanet’s Document API can also be explored and tested in your browser here: https://api.brightplanet.com/.

Quick Start Guide

The API is built around the docs/search call, which allows users to request data from BrightPlanet’s harvested documents using a highly flexible query engine.

Getting an API Key

Before making any requests you must have an API Key. If you do not have a key, please contact BrightPlanet’s support at apisupport@brightplanet.com to request one.

An API Key is a unique identifier associated with your account and license. All calls made to our API require an API Key to be passed along with the call. Each call is also metered and logged with your key for audit purposes. Never share your API Key. Any applications built around our API should allow the end user to enter their own API Key instead of embedding yours.

An API Key is a GUID and looks something like this: 12345678-90ab-cdef-1234-567890abcdef
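
Because the key is a GUID, an application can check its format before sending any request, and can read it from the environment rather than embedding it in code. The following is a minimal Python sketch of that idea. The environment variable name BRIGHTPLANET_API_KEY and the helper name are assumptions made for illustration, and the check only confirms the shape of the key, not that BrightPlanet has issued it.

```python
import os
import uuid


def validate_api_key(key):
    """Return the key in canonical GUID form, or raise ValueError."""
    try:
        # uuid.UUID raises ValueError for anything that is not GUID-shaped.
        return str(uuid.UUID(key.strip()))
    except ValueError:
        raise ValueError("API Key is not a valid GUID") from None


# Hypothetical variable name; falls back to the sample key shown above.
raw_key = os.environ.get(
    "BRIGHTPLANET_API_KEY", "12345678-90ab-cdef-1234-567890abcdef"
)
print(validate_api_key(raw_key))
```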

Get /dataFeeds

Once you have a valid API Key, you can view which data feeds (or databases) you have access to using the “/dataFeeds” endpoint. BrightPlanet provides both standard and custom data feeds for customers. Each customer only has access to the data feeds that they have licensed. To learn more about additional data feeds, contact your sales representative.

The “/dataFeeds” endpoint is only needed to find out which data feeds your API Key has access to, and that list does not change from one request to the next. It is fine to cache the data feed names between sessions.

HTTP GET
/dataFeeds

When using the “/dataFeeds” request, users will need to pass their api_key. Note that the dataFeeds call is case sensitive. The URL request below shows an example.
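
As a sketch of how such a request URL might be put together in Python: the base address is taken from the browser link given earlier in this guide, and it is an assumption that the endpoint sits directly under it. The helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the API root is the address given earlier in this guide; the
# exact path prefix may differ.
BASE_URL = "https://api.brightplanet.com"


def datafeeds_url(api_key):
    """Build the GET URL for /dataFeeds (the path is case sensitive)."""
    return f"{BASE_URL}/dataFeeds?{urlencode({'api_key': api_key})}"


# Sample key from this guide, not a working key.
print(datafeeds_url("12345678-90ab-cdef-1234-567890abcdef"))
```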

This will list the data feeds accessible for this api_key as well as the total number of used API requests per data feed. The number of requests available to you depends on your license agreement; once your usage reaches that maximum, additional API requests will produce a rate limit error.

The response below shows that this api_key has access to one data feed called “bits” and has already been used for 13 API requests.

{
  "datafeedName": "bits",
  "usedRequests": 13
}
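
A response like this can be read with any JSON parser. The Python sketch below maps each data feed name to its used request count. It assumes the body is either a single object, as in the sample, or a list of such objects when a key has several data feeds; the real response shape may differ.

```python
import json


def used_requests(body):
    """Map each data feed name to its used request count.

    Assumption: the body is a single object or a list of such objects.
    """
    data = json.loads(body)
    entries = data if isinstance(data, list) else [data]
    return {entry["datafeedName"]: entry["usedRequests"] for entry in entries}


sample = '{"datafeedName": "bits", "usedRequests": 13}'
print(used_requests(sample))  # {'bits': 13}
```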

Get /datafeed/{datafeed}

Each data feed that you have access to has custom entities that are extracted from the data as it is harvested. To view the custom entities that are available in a specific data feed, use the GET /datafeed/{dataFeed} request.

facetFields Parameter

The facetFields parameter controls which non-custom facets are included in the return format when requesting documents. Company, People, and Places tagged within the documents are always included. To request additional facets, pass a comma-separated list of the facets to include.

A user who also wants to receive Crimes, Weapons, and Drugs mentioned within the documents simply needs to pass the following:

facetFields=crime,weapon,drug

Note that the facets passed are case sensitive and should all be expressed in lower case, separated by commas with no spaces. The available facetFields differ from one data feed to the next. For a list of the facets available in a specific data feed, e-mail apisupport@brightplanet.com.
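
The formatting rules above (lower case, commas, no spaces) are easy to apply in code. A small Python sketch, with a hypothetical helper name:

```python
def facet_fields_param(facets):
    """Join facet names into a facetFields value: lower case, commas, no spaces."""
    cleaned = [name.strip().lower() for name in facets if name.strip()]
    return ",".join(cleaned)


print("facetFields=" + facet_fields_param(["Crime", " weapon", "Drug "]))
# facetFields=crime,weapon,drug
```

The helper only normalizes case and spacing. It does not check that a facet actually exists in your data feed.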

Query Parameter

The query parameter passed in the docs/search call is a highly flexible parameter that allows end users to control which documents are returned. It has a large number of features that modify the behavior of the query: Boolean capabilities, wildcard searches, proximity operators, and the ability to search and return documents based on tagged entities and metadata.

Query – Returns documents…

Big AND Data – Containing both “Big” and “Data”

Bigger OR Data – Containing either the word “Bigger” or the word “Data”

+Big -Data – Containing the word “Big” but not the word “Data”

Te?t – Containing any word that starts with “te”, has exactly one letter in between, and ends with “t”, such as text or test

Te*t – Containing any word that starts with “te”, has any number of letters in between, and ends with “t”, such as tempt

otherEntity_person:"Barack Obama" – Containing Barack Obama tagged as a person

otherEntity_place:"Paris, France" – Containing Paris, France tagged as a place

"boycott Google"~5 – With the keywords “boycott” and “Google” within 5 words of each other
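
When queries are built in code, small helpers keep the quoting consistent. The Python sketch below produces the entity and proximity forms listed above. The helper names are hypothetical, and the sketch does not escape quotation marks inside the values.

```python
def entity_term(entity_type, value):
    """Match documents where `value` is tagged as the given entity type."""
    return f'otherEntity_{entity_type}:"{value}"'


def proximity(phrase, distance):
    """Match documents where the words of `phrase` occur within `distance` words."""
    return f'"{phrase}"~{distance}'


print(entity_term("person", "Barack Obama"))  # otherEntity_person:"Barack Obama"
print(proximity("boycott Google", 5))         # "boycott Google"~5
```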

Best Practices

Ensure that all of the searches are properly encoded

Limit your searches to a maximum of 10 operators

Use your dashboard to help quickly filter results and develop queries
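
The first two practices can be checked in code before a request is sent. The following Python sketch percent-encodes a query and applies a rough operator count. This guide does not define exactly what counts as an operator, so counting only the AND, OR, and NOT keywords outside quoted phrases is an assumption, and the helper names are hypothetical.

```python
import re
from urllib.parse import quote

MAX_OPERATORS = 10  # limit suggested in the best practices above


def count_boolean_operators(query):
    """Heuristic: count AND / OR / NOT keywords outside quoted phrases."""
    unquoted = re.sub(r'"[^"]*"', " ", query)
    return len(re.findall(r"\b(?:AND|OR|NOT)\b", unquoted))


def encode_query(query):
    """Percent-encode a query so it is safe to place in a URL."""
    if count_boolean_operators(query) > MAX_OPERATORS:
        raise ValueError("query uses more than 10 Boolean operators")
    return quote(query, safe="")


print(encode_query('"boycott Google"~5'))  # %22boycott%20Google%22~5
```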

Exploring Document and Entity Counts

BrightPlanet’s Document Search API can return counts of documents or entities that have been tagged within the data feed and that match your specific query. We have three calls that allow you to receive data about the counts.

GET /docs/count – Returns total number of documents for a given facetName/facetValue

GET /docs/facet/date/count – Returns number of times a facet is tagged by date

GET /docs/facet/count – Returns number of times a facet is tagged by query

Count Parameters

The count parameters are fairly consistent across all the GET count requests. Each count parameter that can be passed is described below.

Parameter – Description

datafeed – The name of the data feed from which you receive data. It must match one of the data feed names returned by the /dataFeeds request.

facetName – The name of the facet or entity type that you want to include.

facetValue – The specific extracted entity that you would like included.

startDate – The earliest document you’d like to receive from the API, based on the document harvest date. Must be passed in YYYY-MM-DD format.

endDate – The most recent document you’d like to receive from the API, based on the document harvest date. Must be passed in YYYY-MM-DD format.

API Key – The static passkey given to the user by BrightPlanet.

query – The actual keyword string query that controls which documents are returned.

dateGap – The span of days into which results are grouped. For example, +1DAY, +3DAY, +7DAY, +10DAY, etc.
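
The date and dateGap formats can be validated locally before a count request is made. The Python sketch below checks them. It assumes the upper-case +NDAY form for dateGap and uses hypothetical helper names, so treat it as an illustration rather than the API's own validation rules.

```python
import re
from datetime import date

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
GAP_RE = re.compile(r"\+\d+DAY")


def parse_day(text):
    """Parse a YYYY-MM-DD string, raising ValueError for any other format."""
    if not DATE_RE.fullmatch(text):
        raise ValueError(f"{text!r} is not in YYYY-MM-DD format")
    # fromisoformat also rejects impossible dates such as 2015-02-30.
    return date.fromisoformat(text)


def check_count_params(params):
    """Validate startDate, endDate and dateGap in a dict of count parameters."""
    if parse_day(params["startDate"]) > parse_day(params["endDate"]):
        raise ValueError("startDate is later than endDate")
    gap = params.get("dateGap")
    if gap is not None and not GAP_RE.fullmatch(gap):
        raise ValueError("dateGap should look like +7DAY")
    return params


# Placeholder dates chosen for the example.
check_count_params(
    {"startDate": "2015-01-01", "endDate": "2015-02-11", "dateGap": "+7DAY"}
)
```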

Get /docs/count

Using /docs/count allows you to return counts of documents that contain a specified facet or entity. For example, suppose we want a count of all the documents that mention the disease cancer in any form. We simply need to set the facetName to ‘disease’, set the facetValue to ‘cancer’, and include the API Key and data feed.

Get /docs/facet/date/count

The /docs/facet/date/count call allows you to see counts of a specific entity as it occurs over time. For example, suppose we want to see how often Barack Obama is mentioned each week in the Global News Data Feed. To get this result back, we pass ‘person’ as the facetName and ‘Barack Obama’ as the facetValue, specify the start and end dates, and finally set the dateGap to +7DAY so that counts are grouped into 7-day periods. Your Request URL looks like this:
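
A URL of this shape can be assembled as in the following Python sketch. Several values are assumptions made for illustration: the base address (taken from the browser link earlier in this guide), the feed identifier global_news, the two dates, and the api_key parameter name, which is borrowed from the /dataFeeds section.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.brightplanet.com"  # assumption: API root from this guide

params = {
    "datafeed": "global_news",   # hypothetical feed identifier
    "facetName": "person",
    "facetValue": "Barack Obama",
    "startDate": "2015-01-01",   # placeholder dates
    "endDate": "2015-02-11",
    "dateGap": "+7DAY",
    "api_key": "12345678-90ab-cdef-1234-567890abcdef",  # sample key, not a real one
}

# urlencode escapes the leading "+" of dateGap as %2B; left unescaped, a "+"
# in a query string is commonly decoded as a space.
date_count_url = f"{BASE_URL}/docs/facet/date/count?{urlencode(params)}"
print(date_count_url)
```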

Get /docs/facet/count

This call returns the count of the number of mentions of each entity within a facet that is passed. For example, we want to see the counts of diseases mentioned in the Global News Data Feed from January 1 to February 11. We use the following HTTP Request:
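
A request of this kind might be assembled as in the Python sketch below. The base address, the feed identifier global_news, the year used in the dates, and the api_key parameter name are assumptions made for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.brightplanet.com"  # assumption: API root from this guide

params = {
    "datafeed": "global_news",   # hypothetical feed identifier
    "query": "*:*",              # match every document in the date range
    "facetName": "disease",
    "startDate": "2015-01-01",   # the year is a placeholder
    "endDate": "2015-02-11",
    "api_key": "12345678-90ab-cdef-1234-567890abcdef",  # sample key, not a real one
}

# urlencode percent-encodes "*" and ":" so the query survives transport intact.
facet_count_url = f"{BASE_URL}/docs/facet/count?{urlencode(params)}"
print(facet_count_url)
```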

We passed *:* as our query to match all data within the given date range. We also used disease as our facetName parameter so that the counts returned are for diseases. Our JSON output is shown below:

Get /docs/enrichments/{docMasterId}

This call, when passed a docMasterId, returns enrichments of the entities from that document. Enrichments return confidence and polarity scores for entities within the document. In addition, properties within the document, such as external domains and URLs, are also displayed. Confidence is a score from 0 to 100 that indicates how important a specific entity is to the given document. Polarity is a score from -3 to +3 that indicates how positive or negative that entity is within the given document.

To find the enrichments for the document with a docMasterId of 988654, the request URL would be:
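
As a sketch in Python, the request URL and a simple reading of the polarity score might look like the following. The base address and helper names are assumptions, and treating a score of 0 as neutral is an interpretation of the -3 to +3 range described above rather than something this guide states.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.brightplanet.com"  # assumption: API root from this guide


def enrichments_url(doc_master_id, api_key):
    """Build the GET URL for /docs/enrichments/{docMasterId}."""
    doc_id = quote(str(doc_master_id), safe="")
    return f"{BASE_URL}/docs/enrichments/{doc_id}?{urlencode({'api_key': api_key})}"


def describe_polarity(score):
    """Turn a polarity score (-3 to +3) into a rough label."""
    if not -3 <= score <= 3:
        raise ValueError("polarity is expected to lie between -3 and +3")
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: 0 means neither positive nor negative


print(enrichments_url(988654, "12345678-90ab-cdef-1234-567890abcdef"))
print(describe_polarity(-2))  # negative
```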

Know when two different entities are mentioned within a document

Suppose you want all documents that mention Barack Obama as a person and also mention the company Apple. A simple keyword search for “Barack Obama” and “Apple” also returns documents that mention apples as a fruit. To match only documents where Apple is tagged as a company, use the following query:

+otherEntity_person:"Barack Obama" AND +otherEntity_company:"Apple"

It’s important to note that the field names are case sensitive, so follow the exact syntax.
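
Because this query contains plus signs, colons, quotation marks, and spaces, it needs to be encoded before it is placed in a URL. The Python sketch below builds a docs/search URL and confirms that the query decodes back unchanged. The base address, the feed identifier global_news, and the use of the datafeed, query, and api_key parameter names on docs/search are assumptions based on the other calls in this guide.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.brightplanet.com"  # assumption: API root from this guide

query = '+otherEntity_person:"Barack Obama" AND +otherEntity_company:"Apple"'
params = {
    "datafeed": "global_news",  # hypothetical feed identifier
    "query": query,
    "api_key": "12345678-90ab-cdef-1234-567890abcdef",  # sample key, not a real one
}
search_url = f"{BASE_URL}/docs/search?{urlencode(params)}"
print(search_url)

# Decoding the URL again should give back the original query unchanged.
decoded = parse_qs(urlsplit(search_url).query)["query"][0]
print(decoded == query)  # True
```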

Your Request URL Is:

Find documents that contain a specific title

You are looking to search our Global News Data Feed for any documents that mention Google in the title. You need to use the title: field and pass *Google* to specify the keyword Google preceded by anything and followed by anything. You use the following query:
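
The description above implies the query title:*Google*. As a small Python sketch, with a hypothetical helper name, the query and its percent-encoded form can be produced like this:

```python
from urllib.parse import quote


def title_contains(keyword):
    """Build a title: query matching the keyword with anything before or after it."""
    return f"title:*{keyword}*"


query = title_contains("Google")
print(query)                  # title:*Google*
print(quote(query, safe=""))  # title%3A%2AGoogle%2A
```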