Trillian is a data access and computing engine that will enable streamlined and simultaneous
comparison of user-supplied astrophysical models against multi-wavelength
data from a wide range of astronomical surveys. The project is currently in
its early development phases and will grow in power and scope over the next several years.
Both interested astronomers and developers are encouraged to join and contribute.

Let’s talk about data.

There’s an awful lot of it, which is fantastic. It’s truly amazing, really, the breadth and
depth of astronomical data that is available. And more than any other scientific field, our data
is open. But the volume has created real problems:

There’s far too much data to put on your own computer. The full catalogs of WISE, SDSS, 2MASS, Spitzer,
GALEX, etc. are far too large to fit on a personal machine, and most likely on your department’s
servers as well. Performing large-scale, multi-wavelength analyses falls
somewhere between difficult and impossible.

The data is too complex
to manage. Even if you had the data releases of all of the catalogs above, they come
in highly disparate forms: FITS files that are each structured differently, gigabytes of plain
text files, images, etc. Values are spread across different HDUs or buried in FITS headers.
Pulling this data together into a coherent whole that can be easily accessed is a tremendous
bookkeeping task that requires skills astronomers aren’t typically trained in (not to
mention the time!).

More data becomes
available all the time. Sometimes a data release supersedes a prior one; sometimes more
of the sky becomes available. Incorporating new data and keeping it current would be a full-time
job in itself.

Astronomical data is
closely tied to the details of the instrument. Which flags indicate bad data? Is this
instrument sensitive enough to detect the star or galaxy I’m interested in? What is the
resolution, and how would I compare it to another data set?

Web forms are so 1999.
Many surveys or instruments provide web forms where you can enter an RA/Dec position
(or a short list of them), and retrieve the matched files. This doesn’t scale when you need to
analyze thousands or millions of objects, or want to perform an analysis on an all-sky
basis.
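Doing this programmatically comes down to a positional cross-match: for each of your objects, find the catalog sources within some radius. As a minimal illustration (not Trillian code; the function name is ours), the angular-separation test at the heart of a cone search looks like this:

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation between two (RA, Dec) points, all in degrees.

    Uses the haversine formula, which stays numerically stable at the
    small separations typical of a cross-match.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(math.sqrt(a)))
```

Running this pairwise over millions of objects is exactly the kind of job that needs spatial indexing on the server side, not a web form.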

Most data releases
don’t deal in physical units. This is the gap between your model and an
observation. You are given magnitudes, but you work in physical quantities: luminosity, temperature,
distance, etc. Converting the observations of several different instruments back into the
terms of your model requires specialized knowledge that differs
from one telescope to another. There’s no reason every astronomer should duplicate this
effort.
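To make the gap concrete, the simplest piece of that translation is the distance modulus, which turns an apparent magnitude and a distance into an absolute magnitude and then a luminosity. A sketch (the helper names are ours; it deliberately ignores extinction and bolometric corrections, which are exactly the instrument-specific details an engine like Trillian would encode):

```python
import math

def absolute_magnitude(m_app, d_pc):
    # Distance modulus: M = m - 5 * log10(d / 10 pc)
    return m_app - 5.0 * math.log10(d_pc / 10.0)

def luminosity_solar(M_bol, M_bol_sun=4.74):
    # Magnitudes are logarithmic flux ratios: L/Lsun = 10^(-0.4 * (M - Msun))
    return 10.0 ** (-0.4 * (M_bol - M_bol_sun))
```

For example, a star at 10 pc has an absolute magnitude equal to its apparent one, and an object with the Sun’s bolometric magnitude has one solar luminosity.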

All of the above can be distilled into two problems: 1) data bookkeeping and access, and
2) the gap between a physical, astronomical model and a list of observations. The former is a
solved problem in computer science circles (just not applied to our data in a modern, scalable
way). Trillian aims to address this, but let’s set that aside for the
moment as an implementation detail. As an astronomer, you are interested in a physical model.

Let’s consider SDSS as a template. Starting with ugriz images, the
Photo software pipeline performs
source detection. For each source, three model profiles are fit: a point spread function for
stars, and two galaxy profiles (a pure exponential disk and a de Vaucouleurs profile), each
convolved with the PSF. Sources are identified as galaxies or stars, and this
binary assignment persists to the final released catalog.
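Both galaxy profiles are special cases of the Sérsic profile, I(r) = I_e exp(-b_n [(r/r_e)^(1/n) - 1]), with n = 4 for de Vaucouleurs and n = 1 for the exponential disk. A brief sketch (function names are illustrative; the b_n values are the standard approximations):

```python
import math

def sersic(r, I_e, r_e, n, b_n):
    # Sersic surface-brightness profile: I(r) = I_e * exp(-b_n * ((r/r_e)^(1/n) - 1))
    # By construction, I(r_e) = I_e for any n.
    return I_e * math.exp(-b_n * ((r / r_e) ** (1.0 / n) - 1.0))

def de_vaucouleurs(r, I_e, r_e):
    # n = 4, b_4 ~ 7.669: the classic r^(1/4) law for elliptical galaxies
    return sersic(r, I_e, r_e, 4.0, 7.669)

def exponential_disk(r, I_e, r_e):
    # n = 1, b_1 ~ 1.678: an exponential disk
    return sersic(r, I_e, r_e, 1.0, 1.678)
```

The pipeline convolves these with the PSF before fitting; the bare profiles above are just the underlying models.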

The models are fit using the five ugriz wavebands, but of course more
information is available from other surveys. Rather than labeling a source “galaxy with a
de Vaucouleurs profile”, more data can be used to match the model and produce a likelihood. Likewise,
there is far more to say about stars than “fits a point
spread function” (e.g. luminosity, stellar type, etc.). This is not to criticize the SDSS pipeline;
its function is not to perform All The Analyses with All The Data. The survey initially only had
16ms to calculate each fit! The problem is that astronomers still use these fits and
classifications years later without question when the pieces to create a more complete picture
are readily available. And this is true of many other data sets.

Trillian will work with models. Your model.

Trillian will enable you to apply your own models against hundreds of terabytes of publicly
available data, all without downloading a single file. As an example, let’s say I am interested in
finding thermally pulsating AGB stars (TP-AGB) everywhere in the sky. I can’t say “these are stars
that have a magnitude of x” – that alone doesn’t make sense. As an astronomer, I describe
their observable properties in terms of physical parameters: a luminosity range, a temperature
range, surrounded by dust, variable, etc. This is my model. To search for objects that match the
model, I would need to know how these stars look in SDSS, or 2MASS, or Spitzer, etc. This is a
translation from one domain to another that depends on specialized knowledge of each instrument.
However, it’s a known problem with a known solution. Very few astronomers have the expertise to
solve this problem for each instrument out there, but there is little need to; it can be encoded
once into a computation engine.

This is the central idea of Trillian: define your model in terms of physical parameters, hand
it to Trillian, and the result will be a likelihood value for each object available. What would
your model look like as observed by SDSS? By 2MASS? Your model may depend on other existing models
(a given Galaxy dust map, a model that identifies stars versus galaxies, etc.). Look for objects
that fit a spectrum generated from theory or use a custom stellar population synthesis code to
study galaxy histories. All of this is done using as much data as is available for each object, a
detail that you don’t have to worry about.
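As a sketch of what “a likelihood value for each object” could mean, here is a minimal Gaussian log-likelihood that scores a model’s predicted fluxes against whatever bands happen to be observed. The interface is hypothetical, not Trillian’s actual API:

```python
import math

def log_likelihood(observations, model_prediction):
    """Gaussian log-likelihood over whatever bands are observed.

    observations: {band: (flux, sigma)}; model_prediction: {band: flux}.
    Bands missing from either side are simply skipped, so each object is
    scored with as much data as happens to be available for it.
    """
    logL = 0.0
    for band, (flux, sigma) in observations.items():
        if band not in model_prediction:
            continue
        resid = (flux - model_prediction[band]) / sigma
        logL += -0.5 * resid ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return logL
```

The skip-missing-bands behavior is the point: an object covered by SDSS and 2MASS but not Spitzer still gets a likelihood, just a less constrained one.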

Trillian is open and evolving.

A condition of using Trillian is that when a scientific paper is published based on results
produced by the engine, the full source of the model will be made available. Scientific results
must be reproducible. As with telescope time, users will have a proprietary period before models
and their results are made public. Further, as the library of models is built up, the resulting
database becomes a catalog of the sky, identifying quasars, AGN, stellar clusters... everything
we have models for. One would be able to ask questions like “Where are all the F stars in the sky
(above a given likelihood)?” As new data is added, models can be immediately recalculated,
resulting in new likelihood values. One could see where the likelihood for the model fit changed
dramatically with new data, or remained the same.

When a large number of models become available, Trillian will be able to invert the question
and ask the most interesting one of all, the one that we can’t ask today:
What data in the sky doesn’t fit any of our models?

Please see arXiv:1402.5932 for a more detailed
description of the Trillian platform.