Tutorial to use our exoplanet.eu API

All of our database is accessible in python via our API. Here, we show how to use the API to query our database and collect/handle the data you want in an easy way.

Our API is very general as it has been developed by the Virtual Observatory (VO) for all of the databases they give access to (exoplanet.eu being one among many). As such, the syntax presented in this tutorial can be used for other databases (that can be found here: http://vespa.obspm.fr/) very easily.

QUICK START

1) The first thing to do to use our API in python is to install one package called pyvo to your python libraries (just type in: pip install pyvo, or use your favorite way of installing new packages).

2) When this is done, open a new python file (or do it interactively with ipython) and then import this new pyvo module:

import pyvo

3) Select the service you want to use (here the exoplanet database is part of the more general planeto service, which includes many databases, see here, e.g. you have access to asteroid and cometary databases, the minor planet center, ... but some other services that are hosting non-planet-related databases can also be accessed):

service = pyvo.dal.TAPService("http://voparis-tap-planeto.obspm.fr/tap")

4) Now you have to query this selected service and select the right exoplanet.eu table, which is exoplanet.epn_core as can be seen here. The language to query is based on SQL and is called ADQL (Astronomical Data Query Language), which is a language to retrieve information from tables stored in a database. We will detail more about the query possibilities later in this tutorial (point 16). For now, let's query all the fields (*) in the main database exoplanet.epn_core for planets that have semi-major axes lower than 5 au:

query = "SELECT * FROM exoplanet.epn_core WHERE semi_major_axis < 5"

5) To get the results of your query in a table called results, you should type in:

results = service.search(query)

6) When we have the results of our query stocked in the results array. We can first check the number of rows (i.e. planets) corresponding to our query by typing:

len(results)

7) And the number of columns (fields):

len(results[0])

8) In order to get a specific value for a given field (here the target_name field for instance), one can use:

results[0].get('target_name') or results['target_name',0]

9) You can now save your data permanently as a votable or in a numpy table (after importing numpy by adding: import numpy as np):

To store it as a VO table, first access to the VO table in your results array:

vot=results.votable

Then save your VO table to myresults.xml:

vot.to_xml("myresults.xml")

If you wanted to save it as a numpy array, you could do instead:

np.save('myresults',results.table)

10) And to access your data again from the saved file

If you want to access to the stored VO table again later, use parse (first import the right package: fromastropy.io.votableimport parse)

vot = parse("myresults.xml")

and then access to the data by selecting the first unique table

results = votable.get_first_table()

and if you saved it as a numpy array, then access it via

results=np.load('myresults.npy')

More advanced features after you have reached a plateau using the QUICK START section:

11) In order to know the syntax of all the fields you can query, you can have a look on the exoplanet.epn_core info page, or on vespa which also shows the values of the table or you can also do it directly in python by querying as such:

which will indeed return an array called resultsfields with all the accessible fields of the exoplanet.eudatabase.

12) To know the maximum number of results that can be fetched in one go:

print(service.maxrec)

which is 20000 for now (i.e. greater than the number of known exoplanets).

13) This maximum number of results that can be fetched can, however, be changed by changing service.maxrec directly but it will never go beyond service.hardlimit, which is:

print(service.hardlimit)

20000000

i.e., it will be fine for a few decades!

14) You can also access to the field names that your recorded in a results array (from your query) into the results array by doing:

print(results.fieldnames)

15) Asynchronous queries: It is also possible to start asynchronous queries. This can be useful for long queries or if you need to run thousands of queries as you can start your runs and then come back when they are finished.

Say you want to execute this query asynchronously:

query = "SELECT * FROM exoplanet.epn_core WHERE semi_major_axis < 5"

Then the command is:

service.submit_job(query)

But to really start running it you need to type:

job.run()

To know whether it's still running or completed you can use

print(job.phase)

which will return: RUN or COMPLETED

It can be useful to remember the job's url to access it when you come back to check the results:

job.url

It returns something like that: u'http://voparis-tap-planeto.obspm.fr:80/__system__/tap/run/tap/async/jHFATK'

Then, to be able to retrieve your results later, you can use

job = pyvo.dal.tap.AsyncTAPJob(job.url)

and then the results of the job can be fetched via:

results=job.fetch_result()

16)More about queries and the ADQL language:

A query is usually composed of three parts: the table columns you want to retrieve from the database (the SELECT part), the table or tables that store the data (the FROM part) and the conditions to restrict the data you obtain (the WHERE part). It can be summarised as follow:

SELECT <columns> FROM <tables> WHERE <conditions>

For instance, if you want to select the mass and radius from the exoplanet.eu database you may type:

query="SELECT mass,radius FROM exoplanet.epn_core"

You could also pass in some more complex queries such as only selecting a certain region of the sky, e.g. if you want to only select same part of the KEPLER field (note that there is an easier way of doing that):

We let the reader refer to the end of point 16 to find the most useful parameters that can be used to query the database.

We now show a few more tricks to query the database.

If you are only interested in the first 10 results then your query should be:

query="SELECT TOP 10 mass,radius FROM exoplanet.epn_core"

You can count the number of systems that have a disc by doing so:

query="SELECT COUNT(*) FROM exoplanet.epn_core WHERE detected_disc IS NOT NULL"

or the number of systems that have a measured planet radius:

query="SELECT COUNT(*) FROM exoplanet.epn_core WHERE radius IS NOT NULL"

Similar functions that return only one element are MIN, MAX, SUM or AVG (arithmetic mean), which you can also use.

You can sort your results directly by using ORDER BY (ascending order ASC is the default and descending order DESC should be specified when wanted) as such:

query="SELECT TOP 10 mass,radius FROM exoplanet.epn_core ORDER BY mass DESC"

You can also use AS to add a new column which will be in your output results and than you can use to sort your table with, e.g.:

query="SELECT TOP 10 LOG10(period) AS logperiod FROM exoplanet.epn_core ORDER BY logperiod"

In order to GROUP some elements together, you can use GROUP BY. For instance, let's say you want to sum the masses of all planets in a given system and list the results for each planetary system, you could GROUP all systems by their star name and sum the planet masses for these different GROUPS:

query="SELECT star_name, SUM(mass) AS smass FROM exoplanet.epn_core GROUP BY star_name ORDER BY smass DESC"

you can also use AND (or OR):

query="SELECT COUNT(*) FROM exoplanet.epn_core WHERE (radius IS NOT NULL AND mass IS NOT NULL)"