

Data to Dome: How to acquire and visualize an extragalactic dataset

There were some great discussions regarding science and data visualization at the Beijing conference. Energized by that, I’m going to try something a bit different and have this column take the form of a tutorial.

The tutorial was inspired by Tanya Hill's excellent talk at the conference. In it she presented a number of surveys (including the Galaxy and Mass Assembly Survey, GAMA) and described the process of importing them into SkySkan's Digital Sky software. The wealth and variety of available data was both impressive and eye-opening to a planetarium community that largely limits itself to showing three galaxy catalogs (the so-called "Tully" catalog, along with the 2dF and SDSS).

At the close of the question period Thomas Kraupe declared that he would like to see these datasets made available in all planetarium systems.

That call to action is what Data to Dome is all about—how do we enable the community to become content creators?

The process of using data

In this tutorial we'll step through the entire process: pulling the data from the survey database, performing astrophysical calculations and outputting the data in the formats necessary to visualize it in four different planetarium software packages.

I encourage everyone to at least read through the entire tutorial, even the parts that apply to software different from what is running at your institution. Doing so will give you a more complete picture of where the community is, and it will help motivate the final section, where we make recommendations on how the process can be improved.

The tutorial uses the Python programming language. Please read through it even if you aren’t a programmer or are unfamiliar with Python, because one of the main points of the tutorial is to show how little code is needed. All of the code, as well as the results, can be downloaded from links on the Science and Data Visualization Task Force page of the IPS website.

Python

Python—a programming language “that lets you work quickly and integrate systems more effectively”—is rapidly becoming the de facto astronomical programming language, supported by an active community of tool builders. Python's capabilities are extended through packages, and there are a number of good ones for dealing with astronomical data.

This tutorial uses astropy, which contains all of the core astronomical functionality, and astroquery, which lets you access astronomical databases from within the Python environment. We also give an example of using the pyWWT package to directly interact with WorldWide Telescope.

One of the challenges of working in Python has always been installing all the packages needed to do your work. That has gotten much easier recently through some excellent package managers. If you are just starting out with Python, I strongly recommend using the Anaconda distribution from Continuum Analytics. Anaconda installs most of the packages needed for scientific computing with Python (astropy is included with Anaconda; you have to install astroquery and pyWWT separately).

Let's get started

Step #1: Acquiring the GAMA data from the database
We want to get our galaxy catalog from the GAMA survey. To do that, we need to query its database. A survey database typically contains several different tables, each holding a number of different measured quantities. Looking at the structure of the GAMA database (its schema), I see that the quantities I need to locate the galaxies in 3D space can be found in a table called specObj: the sky position (RA and dec) and the redshift (z), from which we will calculate the distance to the galaxies.

There is another table called StellarMasses, which has several interesting parameters. We will pull three that might be useful to visualize later: the stellar mass of the galaxy (logmstar, stored as a base-10 logarithm), its intrinsic brightness (absMag_r) and its metallicity (metal).

The database can be accessed using the Structured Query Language (SQL). SQL queries have a SELECT/FROM/WHERE format: after SELECT, you list the parameters you want to pull from the database; after FROM, you specify which database tables they are located in; and after WHERE, you can place conditions on which objects you wish to get back.
So our query will be:

The purpose of the JOIN syntax in the FROM clause is to make sure we are matching up the same galaxies in each table. The conditions in the WHERE clause ensure that the galaxies’ spectra were of high quality and that their redshifts are reliable.
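For readers new to SQL, here is a sketch of a query with that shape, assembled as a Python string. The table names follow the text (specObj, StellarMasses), but the CATAID join key, the column spellings and the nQ >= 3 redshift-quality cut are assumptions about the GAMA schema; check them against the data release you are querying.

```python
def build_gama_query(min_nq=3):
    """Return SQL that pairs positions/redshifts with stellar-mass quantities.

    ASSUMED schema details: both tables share a CATAID galaxy ID, and specObj
    has an nQ redshift-quality flag where 3 or more means a reliable redshift.
    """
    columns = [
        "s.RA", "s.DEC", "s.Z",                 # sky position and redshift
        "m.logmstar", "m.absMag_r", "m.metal",  # quantities to visualize later
    ]
    select = ", ".join(columns)
    return (
        f"SELECT {select} "
        "FROM specObj AS s "
        "JOIN StellarMasses AS m ON s.CATAID = m.CATAID "  # same galaxy in both tables
        f"WHERE s.nQ >= {int(min_nq)}"                      # keep reliable redshifts only
    )


query = build_gama_query()
print(query)
```

Building the string in a function makes it easy to tighten or loosen the quality cut without retyping the whole query.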

Using the astroquery module, we can query the GAMA database from inside Python and return the results in a data table (which, in the grand astronomical tradition of clever naming, I call dataTable).
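Running a query from Python could look like the sketch below. It assumes astroquery is installed and still ships its GAMA module, whose GAMA.query_sql function returns an astropy Table. The optional query_func argument is an addition of mine (not part of astroquery) so the function can be exercised without network access.

```python
def fetch_gama_table(query, query_func=None):
    """Run an SQL query against the GAMA database and return the result table.

    query_func is a hypothetical hook: any callable taking the SQL string and
    returning a table. By default astroquery's GAMA.query_sql is used, which
    needs network access when called.
    """
    if query_func is None:
        from astroquery.gama import GAMA  # deferred so defining this needs no astroquery
        query_func = GAMA.query_sql
    table = query_func(query)
    if len(table) == 0:
        raise ValueError("query returned no rows; check table names and cuts")
    return table


# Usage (needs astroquery and network access):
#   dataTable = fetch_gama_table("SELECT ... FROM specObj AS s JOIN ... WHERE ...")
```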

Step #2: Calculating the distance to the galaxies from their redshift
To plot the galaxies in 3D space we need both the position on the sky and a distance. We will convert the redshift into a distance, but how? On cosmological scales in an expanding universe with a finite speed of light, there are several ways to measure distance. Which one should we use?

People have made different choices here. Some use the lookback time distance, which is the light travel time from the galaxy multiplied by the speed of light. Others have used the luminosity distance, which is the distance at which the galaxy would have its observed brightness given the inverse square law.

I believe that the best choice is the comoving distance, which best matches our everyday expectation of what distance is. If we were able to stop the expansion of the universe today and go out and measure the distance to the galaxies with a giant tape measure, the comoving distance is what we would get.

Yes, that does mean that sufficiently distant objects have distances greater than 13.8 billion light years, but that is OK. Really, it is. If you don't believe me, see the Wikipedia page about misconceptions regarding the size of the Universe.
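To make the three options concrete, here are the standard expressions for a spatially flat universe containing matter and a cosmological constant, with radiation neglected (a simplification; astropy's Planck13 model also accounts for radiation and neutrinos). Here $H_0$ is the Hubble constant, $\Omega_m$ the matter density parameter, $c$ the speed of light, $D_C$ the comoving distance, $D_L$ the luminosity distance and $d_{LT}$ the lookback-time (light-travel) distance.

```latex
\begin{aligned}
E(z) &= \sqrt{\Omega_m (1+z)^3 + \Omega_\Lambda}, \qquad \Omega_\Lambda = 1 - \Omega_m \\
D_C(z) &= \frac{c}{H_0} \int_0^z \frac{dz'}{E(z')} \\
D_L(z) &= (1+z)\, D_C(z) \\
d_{LT}(z) &= c\, t_L(z) = \frac{c}{H_0} \int_0^z \frac{dz'}{(1+z')\, E(z')}
\end{aligned}
```

Because $1/(1+z') < 1$ for every $z' > 0$, these satisfy $d_{LT} < D_C < D_L$ at any positive redshift: lookback-time distances always come out smallest and luminosity distances largest, and the gap between them grows with redshift.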

To calculate the comoving distance, we'll need to choose a set of cosmological parameters. One of the really nice features of astropy is the built-in set of standard cosmologies. Here I choose the Planck13 cosmology. It just takes three lines of code to choose the cosmological parameters, calculate the distances, and add a new column to the data table.

from astropy.cosmology import Planck13
from astropy.table import Column
#Calculate a new column of comoving distance (in Mpc) to the galaxies
distCol=Column(Planck13.comoving_distance(dataTable['z']),name='comoving_distance')
#Add that column to the data table (as the third column after ra and dec)
dataTable.add_column(distCol,index=2)
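Astropy hides the actual calculation, so here is a dependency-free sketch of the integral behind it: $D_C = (c/H_0)\int_0^z dz'/E(z')$ with $E(z) = \sqrt{\Omega_m (1+z)^3 + 1 - \Omega_m}$, evaluated with Simpson's rule. It assumes a flat universe and ignores radiation and neutrinos while borrowing astropy's Planck13 values for $H_0$ and $\Omega_m$, so expect small differences from Planck13.comoving_distance.

```python
import math

C_KM_S = 299792.458  # speed of light, km/s
H0 = 67.77           # Hubble constant, km/s/Mpc (astropy's Planck13 value)
OM0 = 0.30712        # matter density parameter (astropy's Planck13 value)


def E(z, om0=OM0):
    """Dimensionless Hubble parameter H(z)/H0 for flat matter + Lambda (no radiation)."""
    return math.sqrt(om0 * (1.0 + z) ** 3 + (1.0 - om0))


def comoving_distance_mpc(z, steps=200):
    """Comoving distance in Mpc: c/H0 times the integral of 1/E from 0 to z (Simpson's rule)."""
    if z == 0:
        return 0.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = z / steps
    total = 1.0 / E(0.0) + 1.0 / E(z)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight / E(i * h)
    return (C_KM_S / H0) * total * h / 3.0


def luminosity_distance_mpc(z):
    """Luminosity distance in a flat universe: (1 + z) times the comoving distance."""
    return (1.0 + z) * comoving_distance_mpc(z)


for z in (0.1, 0.3, 0.5):
    dc = comoving_distance_mpc(z)
    print(f"z={z}: comoving {dc:7.1f} Mpc, luminosity {luminosity_distance_mpc(z):7.1f} Mpc")
```

Printing both columns side by side shows how quickly the luminosity distance pulls away from the comoving distance over the GAMA redshift range.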

Now that we have calculated the distances, we are ready to visualize the dataset. The process will vary a bit depending on which software package we are using.

Visualizing the dataset in Microsoft Research's WorldWide Telescope

We will follow two different routes to importing data into WWT. The first uses an API, through which a computer program communicates with WWT directly; the second writes the data to a standard format (the VOTable) that WWT can import and display.

Using an API (Application Programming Interface)
We can send a data table directly to WWT as a new layer using WWT's layer control API. It can be sent to a WWT client running on the same machine or on a remote machine. For example, I can run the Python code on my laptop and send the data directly to Adler's planetarium dome.
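As a rough illustration of what "sending a table over the API" involves, the sketch below builds (but does not send) HTTP requests using only the standard library. The endpoint and parameter names are assumptions based on my reading of WWT's Layer Control API documentation (port 5050, layerApi.aspx, cmd=new to create a table layer from a header row, cmd=update to push rows); verify them against the documentation for your WWT version before relying on them.

```python
import urllib.parse
import urllib.request

# ASSUMPTIONS, to be checked against WWT's Layer Control API documentation:
#   * the WWT client listens on port 5050 at /layerApi.aspx
#   * cmd=new creates a table layer; its POST body is the tab-separated header row
#   * cmd=update&id=<layer id> adds the rows sent in the POST body
WWT_PORT = 5050


def layer_api_url(host, **params):
    """Build a Layer Control API URL, e.g. .../layerApi.aspx?cmd=new&datatype=table..."""
    return f"http://{host}:{WWT_PORT}/layerApi.aspx?" + urllib.parse.urlencode(params)


def tab_rows(rows):
    """Serialize row tuples as tab-separated lines."""
    return "\n".join("\t".join(str(v) for v in row) for row in rows)


def create_request(host, name, columns):
    """Request that creates a new sky-frame table layer (not sent here)."""
    url = layer_api_url(host, cmd="new", datatype="table", frame="Sky", name=name)
    return urllib.request.Request(url, data=tab_rows([columns]).encode("utf-8"), method="POST")


def update_request(host, layer_id, rows):
    """Request that pushes data rows into an existing layer (not sent here)."""
    url = layer_api_url(host, cmd="update", id=layer_id)
    return urllib.request.Request(url, data=tab_rows(rows).encode("utf-8"), method="POST")


# Usage against a machine actually running WWT (host name is hypothetical):
#   reply = urllib.request.urlopen(create_request("dome-pc.local", "GAMA",
#                                                 ["RA", "DEC", "comoving_distance"]))
#   read the new layer ID from the XML reply, then
#   urllib.request.urlopen(update_request("dome-pc.local", layer_id, rows))
```

Because the requests are plain HTTP, the host can just as easily be a remote dome machine as the local one.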

Switch to Sky Mode, change the marker type to circles and scale them up a bit, and you have Figure 2.

Writing a VOTable

The astronomical community, through the Virtual Observatory initiative, has defined a standard data format for catalog data, the VOTable. We can easily export our data table in this format for WWT to import.
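With astropy, exporting is typically a one-liner along the lines of dataTable.write('gama.vot', format='votable'). To show what such a file actually contains, the dependency-free sketch below assembles a minimal VOTable by hand with the standard library's ElementTree. The column names and values are made up for illustration.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def make_votable(fields, rows, table_name="gama"):
    """Return a minimal VOTable document as a string.

    fields: list of dicts of FIELD attributes (name, datatype, optional unit/ucd).
    rows:   sequence of tuples, one value per field.
    """
    votable = ET.Element("VOTABLE", {"version": "1.3", "xmlns": VOTABLE_NS})
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", {"name": table_name})
    for field in fields:  # column descriptions come before the data
        ET.SubElement(table, "FIELD", {k: str(v) for k, v in field.items()})
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row needs one value per FIELD")
        tr = ET.SubElement(tabledata, "TR")  # one TR per object
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)  # one TD per column
    return ET.tostring(votable, encoding="unicode")


# Hypothetical columns and values, for illustration only.
fields = [
    {"name": "RA", "datatype": "double", "unit": "deg", "ucd": "pos.eq.ra"},
    {"name": "DEC", "datatype": "double", "unit": "deg", "ucd": "pos.eq.dec"},
    {"name": "z", "datatype": "double", "ucd": "src.redshift"},
]
xml_text = make_votable(fields, [(180.0, 0.5, 0.1), (181.2, -1.0, 0.25)])
print(xml_text)
```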

Visualizing the dataset in Evans and Sutherland's Digistar

Digistar is also capable of reading the VOTable format, but we'll need to output the coordinates in a slightly different form: Digistar needs the positions projected into Cartesian coordinates. Fortunately, astropy is very good at dealing with coordinate transformations and projections. All we need to do is create a list of coordinates from our data table and then add the Cartesian projections as the first three columns.
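The projection itself is simple trigonometry. The sketch below spells it out with the standard library; astropy gives the same numbers via SkyCoord(ra=..., dec=..., distance=...).cartesian. It also shows field metadata for the three new columns tagged with UCDs. The row layout (RA, Dec, distance, then extras) and the Mpc unit are assumptions for illustration.

```python
import math


def equatorial_to_cartesian(ra_deg, dec_deg, distance):
    """Project (RA, Dec, distance) onto equatorial x, y, z.

    x points toward RA=0, Dec=0 and z toward the north celestial pole;
    the output is in whatever unit `distance` uses.
    """
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (distance * math.cos(dec) * math.cos(ra),
            distance * math.cos(dec) * math.sin(ra),
            distance * math.sin(dec))


# Field metadata for the three new leading columns, tagged with UCDs.
XYZ_FIELDS = [
    {"name": "x", "datatype": "double", "unit": "Mpc", "ucd": "pos.cartesian.x"},
    {"name": "y", "datatype": "double", "unit": "Mpc", "ucd": "pos.cartesian.y"},
    {"name": "z", "datatype": "double", "unit": "Mpc", "ucd": "pos.cartesian.z"},
]


def prepend_cartesian(rows):
    """(ra, dec, distance, *extras) rows -> (x, y, z, ra, dec, distance, *extras) rows."""
    return [equatorial_to_cartesian(*row[:3]) + tuple(row) for row in rows]


# Hypothetical row: RA, Dec (deg), comoving distance (Mpc), logmstar.
print(prepend_cartesian([(180.0, 0.5, 432.0, 10.6)]))
```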

In the code above, notice the part that reads 'ucd': 'pos.cartesian.x'. UCD stands for Unified Content Descriptor; UCDs are a controlled vocabulary of words defined to describe almost all kinds of astronomical data. Using them allows the computer to parse the dataset and make decisions about how to visualize it, because it knows which columns are positions, which are brightnesses, which are sizes, and so on.
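To see why that matters, here is a small sketch of how software could locate the position columns purely from their UCDs. The VOTable header is invented for illustration, with deliberately uninformative column names, so only the UCDs say what each column holds.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOTable header: column names carry no meaning, UCDs do.
HEADER = """<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <TABLE>
      <FIELD name="c0" datatype="double" ucd="phys.mass"/>
      <FIELD name="c1" datatype="double" unit="Mpc" ucd="pos.cartesian.x"/>
      <FIELD name="c2" datatype="double" unit="Mpc" ucd="pos.cartesian.y"/>
      <FIELD name="c3" datatype="double" unit="Mpc" ucd="pos.cartesian.z"/>
      <FIELD name="c4" datatype="double" ucd="phys.magAbs"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def columns_by_ucd(votable_xml):
    """Map each UCD to the (column index, column name) of the FIELD declaring it."""
    root = ET.fromstring(votable_xml)
    fields = [el for el in root.iter() if el.tag == "FIELD" or el.tag.endswith("}FIELD")]
    return {f.get("ucd"): (i, f.get("name"))
            for i, f in enumerate(fields) if f.get("ucd")}


ucds = columns_by_ucd(HEADER)
position_columns = [ucds["pos.cartesian." + axis][0] for axis in "xyz"]
print(position_columns)  # column indices a renderer would use for x, y, z
```

The same lookup could route phys.mass or phys.magAbs to point size or color without the user naming the columns.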

Visualizing the dataset in SkySkan's Digital Sky and SCISS' Uniview

For Digital Sky and Uniview, the process is quite similar. As we did for Digistar, we'll need to provide Cartesian coordinates. However, for Digital Sky and Uniview, we need to do the projection in Galactic rather than Equatorial (RA and dec) coordinates. Fortunately, again, astropy makes this very easy.
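In astropy this is a matter of asking a SkyCoord for its .galactic frame and then its .cartesian representation. Under the hood it is a fixed rotation, sketched below with the standard library using the ICRS-to-Galactic rotation matrix published with the Hipparcos catalogue; results should agree closely with astropy's Galactic frame, but treat this as an illustration rather than a replacement.

```python
import math

# Rotation from ICRS (J2000 equatorial) unit vectors to Galactic unit vectors,
# the standard matrix published with the Hipparcos catalogue.
ICRS_TO_GALACTIC = (
    (-0.0548755604162154, -0.8734370902348850, -0.4838350155487132),
    (0.4941094278755837, -0.4448296299600112, 0.7469822444972189),
    (-0.8676661490190047, -0.1980763734312015, 0.4559837761750669),
)


def equatorial_to_galactic_cartesian(ra_deg, dec_deg, distance):
    """Return Galactic x, y, z: x toward the Galactic centre, z toward the north Galactic pole."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    unit = (math.cos(dec) * math.cos(ra),  # equatorial unit vector
            math.cos(dec) * math.sin(ra),
            math.sin(dec))
    return tuple(distance * sum(m * u for m, u in zip(row, unit))
                 for row in ICRS_TO_GALACTIC)


# Sanity check: the Galactic centre (RA 266.40499, Dec -28.93617) should land on +x.
print(equatorial_to_galactic_cartesian(266.40499, -28.93617, 1.0))
```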

Now we need to write out the data in the ".speck" format defined by Partiview. This format is an ASCII table where the first three columns are the x, y, z Cartesian coordinates and the remaining columns are whatever additional data we want to carry along. The table begins with a header that names these additional columns.
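A minimal writer might look like the sketch below. It follows my understanding of the Partiview convention (lines starting with # are comments, one "datavar N name" line per extra column numbered from 0, then one whitespace-separated row per object); check the details against your Digital Sky or Uniview documentation. The column names and values are hypothetical.

```python
def to_speck(extra_names, rows, comment="GAMA galaxies"):
    """Return .speck text: datavar header lines, then 'x y z v0 v1 ...' rows."""
    lines = [f"# {comment}"]
    lines += [f"datavar {i} {name}" for i, name in enumerate(extra_names)]
    for row in rows:
        if len(row) != 3 + len(extra_names):
            raise ValueError("each row needs x, y, z plus one value per datavar")
        lines.append(" ".join(f"{value:.6g}" for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical row: Galactic x, y, z (Mpc), then logmstar and absMag_r.
speck_text = to_speck(["logmstar", "absMag_r"], [(1.0, 2.0, 3.0, 10.5, -21.3)])
print(speck_text)
```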

Lessons Learned

Yay—you read through till the end! Wasn't too bad, right? That is really the main lesson of all of this. Tools like Python and astropy make the whole process of getting, manipulating, and visualizing data MUCH easier than it was before.
OK, now that we've stepped through things with GAMA, go try it on another dataset.

I also have a few recommendations for vendors that would make things even smoother:

1. Include support for VOTables, including the parsing of UCDs. This is an astronomical standard, and planetarium software should support it.
2. Accept spherical as well as Cartesian coordinates. Astronomical observations are made in spherical coordinates, so converting to Cartesian is an extra step that could easily be done by the software on import.
3. Accept multiple coordinate systems. It would be great to be able to import data in either Equatorial, Galactic or Ecliptic coordinates.