ChEMBL Resources

Thursday, 26 February 2015

Using the New ChEMBL Web Services

As promised in our earlier post, here are some more details on making the most of the new ChEMBL web services. The best place to get started is the documentation page: https://www.ebi.ac.uk/chembl/api/data/docs. There you will find the list of available resources (e.g. Molecule, Target and Assay) and their methods. More importantly, you can also execute each method with your own or default parameters and view the URL, the response content and the response status code. This is definitely the quickest way to start familiarizing yourself with the new ChEMBL web services.

Looking at the resources in more detail, you will find that each resource has three basic methods:

1. https://www.ebi.ac.uk/chembl/api/data/RESOURCE - will return all available objects of type RESOURCE from ChEMBL. An example could be https://www.ebi.ac.uk/chembl/api/data/molecule which returns all molecules (remember that data is paginated - more on this later).

2. https://www.ebi.ac.uk/chembl/api/data/RESOURCE/ID - will return a single object of type RESOURCE, identified by ID. For some resources, there can be more than one type of ID, for example the Molecule resource will accept:

The last thing worth noting about the image format is that when you add the X-Requested-With: XMLHttpRequest header to your request (i.e. you make an Ajax call), the resulting image will be base64-encoded. For example, to make an Ajax call using the jQuery library and render the result as an image, you can use this code:

Alternatively, if you don't want to explicitly include the header in your jQuery code, you can use the crossDomain: false parameter (yes, you are doing a cross-domain request, but we do support CORS, so this will work, as discussed here):

Pagination

When you make the following request https://www.ebi.ac.uk/chembl/api/data/molecule only the first 20 molecules will be returned. This corresponds to the first page of the molecule result set being requested. Pagination has been introduced to help reduce server load and also protect us from inadvertent DDoS attacks. It also allows clients to quickly obtain a portion of data without having to wait for the full data set. The most important page parameters are limit and offset, which are illustrated in the image below:

The red border marks a page with limit=10 and offset=10. Limit is the maximum number of objects on a single page. Offset is the distance between the first element in the result set and the first element on the page. Please note that objects are indexed starting from 0. The default limit is 20 and the default offset is 0, which is why accessing https://www.ebi.ac.uk/chembl/api/data/molecule returns the first 20 elements. You can increase the page size by providing a bigger limit parameter; however, the maximum allowed limit value is 1000.
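As a minimal sketch of how limit and offset map onto a request, the snippet below builds a page URL and lists the zero-based indices that page covers. The helper names `page_url` and `page_indices` are made up for illustration; only the base URL, the defaults and the 1000 cap come from the text above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"
MAX_LIMIT = 1000  # maximum page size stated in this post


def page_url(resource: str, limit: int = 20, offset: int = 0) -> str:
    """Build the URL for one page of a resource (hypothetical helper)."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must not be negative")
    query = urlencode({"limit": limit, "offset": offset})
    return f"{BASE_URL}/{resource}?{query}"


def page_indices(limit: int, offset: int) -> range:
    """Zero-based result-set indices that fall on the requested page."""
    return range(offset, offset + limit)


# The page marked by the red border: limit=10, offset=10.
print(page_url("molecule", limit=10, offset=10))
print(list(page_indices(10, 10)))  # elements 10 through 19
```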

All paginated results come in an 'envelope', which contains a resource object section and a page metadata section. An example molecule page in JSON format looks like:
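In client code, the envelope can be treated as a dictionary holding the array of objects and the 'page_meta' block. The sketch below walks a whole result set page by page. It assumes that 'page_meta' exposes its values under the keys `limit`, `offset` and `total_count`, and it uses a made-up in-memory fetcher (`fake_fetch`, with placeholder records) instead of a real HTTP call, so treat the names as illustrative rather than as the definitive response layout.

```python
from typing import Callable, Dict, Iterator


def iter_objects(
    fetch_page: Callable[[int, int], Dict],
    key: str,
    limit: int = 20,
) -> Iterator[Dict]:
    """Yield every object in a paginated result set.

    fetch_page(limit, offset) must return one envelope: a dict with the
    object array under `key` and a 'page_meta' block (assumed key names).
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page[key]
        meta = page["page_meta"]
        # Move to the next page; stop once we are past the last object.
        offset = meta["offset"] + meta["limit"]
        if offset >= meta["total_count"]:
            return


# Hypothetical stand-in for the web service: 45 placeholder records.
_RECORDS = [{"molecule_chembl_id": f"EXAMPLE{i}"} for i in range(45)]


def fake_fetch(limit: int, offset: int) -> Dict:
    return {
        "molecules": _RECORDS[offset:offset + limit],
        "page_meta": {
            "limit": limit,
            "offset": offset,
            "total_count": len(_RECORDS),
        },
    }


ids = [m["molecule_chembl_id"] for m in iter_objects(fake_fetch, "molecules")]
print(len(ids))  # 45: three pages of 20, 20 and 5 objects
```

In a real client, `fake_fetch` would be replaced by a function that requests the page URL and decodes the JSON response.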

As can be seen, a 'page' of 20 molecule objects is stored in an array called 'molecules'. A 'page_meta' block provides information on page limit and offset. The 'page_meta' block also provides links (when available), to the previous and next pages and the total object count. In this example you can see that there are 1,463,270 compounds available in ChEMBL. The same information viewed in XML format:

Filtering

Filtering can be complex, so let's start with an example. In the Molecule resource, there is a numeric 'max_phase' field. This is the maximum phase of development reached by a molecule. 4 is the highest phase and means the molecule has been approved by the relevant regulatory body, such as the FDA. So let's select all approved drugs by adding a filter on the max_phase field:
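A request of this kind can be assembled as below. This is only a sketch: `exact_filter_url` is a hypothetical helper, and it covers nothing beyond exact-match filters passed as query parameters.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def exact_filter_url(resource: str, **filters) -> str:
    """Build a URL whose query string holds exact-match filters."""
    return f"{BASE_URL}/{resource}?{urlencode(filters)}"


# Molecules that reached phase 4, i.e. approved drugs.
print(exact_filter_url("molecule", max_phase=4))
# https://www.ebi.ac.uk/chembl/api/data/molecule?max_phase=4
```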

OK, so now we know that a filter can be passed as a parameter, and that its simplest form is <field_name>=<value>, which means that we expect to get only items whose field_name exactly matches the specified value. Right, let's add another filter. Inside a Molecule resource, we can find a nested 'molecule_properties' object. One of its properties is the number of aromatic rings, so let's select compounds with at least two:
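One way to compose such a request in code is sketched here. The helpers `lookup` and `filter_url` are hypothetical, and the attribute name `aromatic_rings` is an assumption about how the property is spelled in the resource schema. The nested object name, the attribute and the relation are joined with a double underscore, and the individual filters are joined with '&'.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def lookup(*path: str, relation: str = "") -> str:
    """Join nested names (and an optional relation) with '__'."""
    parts = list(path)
    if relation:
        parts.append(relation)
    return "__".join(parts)


def filter_url(resource: str, filters) -> str:
    """Build a URL from a list of (parameter name, value) pairs."""
    return f"{BASE_URL}/{resource}?{urlencode(filters)}"


filters = [
    ("max_phase", 4),  # exact match
    # 'aromatic_rings' is an assumed attribute name; 'gte' means >=.
    (lookup("molecule_properties", "aromatic_rings", relation="gte"), 2),
]
print(filter_url("molecule", filters))
```

Keeping the filters as a list of pairs preserves their order and leaves room for repeating a parameter name if that is ever needed.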

Now we see that many filters can be joined together using the '&' sign. If a filter applies to a nested attribute, we have to provide the name of the nested object first, followed by the name of the attribute, using a double underscore '__' as a separator. Because we don't want an exact match, we have to explicitly specify the name of the relation, in our case 'greater than or equal' (gte). There are many other types of relations we can use in filters:

Please note that you can't use every relation for every type. For example, regex matching is not allowed on numeric fields and ordering is forbidden on text. If you want to check which filters can be applied to which resource, you can take a look at the resource schema; for example, the schema of the 'molecule' resource is available here:

And as we see, Helium is our first compound. In order to sort, we just have to add an 'order_by' parameter whose value is the name of the field, prefixed with all intermediate nested objects. By default we sort in ascending order. To reverse this order and get molecules sorted from the heaviest to the lightest, we have to add a minus sign '-' before the field name, like this:

This time the first element is a very heavy compound. Note that we had to add a filter to eliminate compounds without a specified weight; otherwise, they would stick to the top of the results (this is because NULLS FIRST is the default for descending order in Oracle DB, which we are using in production, as described here). We can have multiple 'order_by' params in the URL, like here:

In which case molecules will be first sorted by the number of aromatic rings in ascending order, followed by molecular weight in descending order.
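A two-level ordering like the one just described could be built as follows. This is a sketch under assumptions: `order_by` is a hypothetical helper, and the attribute names `aromatic_rings` and `full_mwt` are assumed spellings of the aromatic ring count and the molecular weight, so check them against the resource schema before relying on them.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def order_by(*path: str, descending: bool = False):
    """Return one ('order_by', value) pair; '-' marks descending order."""
    field = "__".join(path)  # nested names are joined with '__'
    return ("order_by", f"-{field}" if descending else field)


params = [
    # Primary key: number of aromatic rings, ascending (assumed name).
    order_by("molecule_properties", "aromatic_rings"),
    # Secondary key: molecular weight, descending (assumed name).
    order_by("molecule_properties", "full_mwt", descending=True),
]
url = f"{BASE_URL}/molecule?{urlencode(params)}"
print(url)
```

The parameters are kept in a list rather than a dictionary because the same 'order_by' key appears more than once and the order of the keys is significant.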

Filtering and ordering can be mixed together, but we leave that as an exercise for the reader.

Equivalent Web Service Requests

We will continue to support the 'old' web services until the end of the year. To help users with the upcoming migration process, the table below maps the example web service requests found in the old documentation to the equivalent calls in the new web services. It will be up to the end user to handle the different response format and the pagination of the returned data.