Next topic

This Page

Quick search

Two approaches for working with TimeSeries objects are supported with
the scikits.timeseries module.
The first approach consists in using standard relational databases.
The second is to use the PyTables package.
Both approaches have their pros and cons.

Generally speaking, if concurrent access to a database is required, or if
an interface with an online server is needed (for example, if time series
charts must be generated in real-time for a web site), relational databases
are probably more convenient.

Alternatively, if the only objective is to store data for a specific project,
without having to deal with the administrative complexities of a relational
database, then PyTables may be enough.

It is recommended to experiment with both approaches in order to find which
one is the most performant or the most adequate to any application.

Storing and retrieving time series from standard relational databases is very
simple once you know a few tricks.
For these examples, we use the ceODBC
database module which I have found to be more reliable and faster than the
pyodbc module.
However, I think these examples should work with the pyodbc module as well.

SQL Server 2005 Express edition is the database used in the examples. Other
standard relational databases should also work, but I have not personally
verified it.

Note that it is also possible to use this approach with sqlite databases using
the sqlite3 module included with the standard python distribution. The below
code will need to be modified a bit to work with sqlite, but the basic approach
remains the same.

A database called “test” is assumed to have been created already along with a
table called “test_table” described by the following query:

importceODBCasodbcimportscikits.timeseriesaststest_series=ts.time_series(range(50),start_date=ts.now('b'))# lets mask one value just to make things interestingtest_series[5]=ts.maskedconn=odbc.Connection("Driver={SQL Native Client};Server=localhost;Database=test;Uid=userid;Pwd=password;")crs=conn.cursor()# start with an empty table for these examplescrs.execute("DELETE FROM test_table")# convert series to list of (datetime, value) tuples which can be interpreted# by the database module. Note that masked values will get converted to None# with the tolist method. None gets translated to NULL when inserted into the# database._tslist=test_series.tolist()# insert time series datacrs.executemany(""" INSERT INTO test_table ([date], [value]) VALUES (?, ?)""",_tslist)# Read the data back out of the database.# Explicitly cast data of type decimal to float for reading purposes,# otherwise you will get decimal objects for your result.crs.execute(""" SELECT [date], CAST(ISNULL([value], 999) AS float) as vals, -- convert NULL's to 999 (CASE WHEN [value] is NULL THEN 1 ELSE 0 END) AS mask -- retrieve a mask column FROM test_table ORDER BY [date] ASC""")# zip(*arg) converts row based results to column based results. This is the# crucial trick needed for easily reading time series data from a relational# database with Python_dates,_values,_mask=zip(*crs.fetchall())_series=ts.time_series(_values,dates=_dates,mask=_mask,freq='B')# commit changes to the databaseconn.commit()conn.close()

If the initial MaskedArray has no named fields
(standard type), the resulting ndarray has two named fields: _data
and _mask.
The _data field has the same type as the original array, while
the _mask field is always boolean.
Note that a mask is always created, even if the initial
MaskedArray has no missing values.

If the initial MaskedArray object has named fields
(structured array), the resulting ndarray has as many
fields as the initial array, with the same name.
Each field of the result has two nested sub-fields, _data and
_mask.
The _data subfield has the same dtype as the original field, while
the _mask subfield will always be boolean.

Additional information about the MaskedArray (such as
baseclass,
fill_value...) is stored into a dictionary
named special_attrs.
This information can be accessed through the attr
attribute of the table.

If the initial TimeSeries has no named
fields (standard dtype), the resulting ndarray has three named fields:
_dates, _data and _mask.
The _data field has the same dtype as the original object.
The _mask field is always boolean.
The _dates field is always integer, and corresponds to the integer
representation of the underlying DateArray.

Note that a mask is always created, even if the initial
TimeSeries has no missing values.

If the initial TimeSeries has named fields
(flexible type), the resulting ndarray has the same named fields as the
initial array, with the addition of an extra field _dates.
The _dates field is always integer, and corresponds to the integer
representation of the underlying DateArray.
Each other field is composed of two nested sub-fields, _data and
_mask.
The _data subfield has the same dtype as the original field,
while the _mask subfield is always boolean.

Additional information about the TimeSeries
(such as baseclass,
fill_value, or
the dates frequency...) is stored
into a dictionary named special_attrs.
This information can be accessed through the attr
attribute of the table.

To create a TimeSeriesTable, just use the
File.createTimeSeriesTable method of a standard
tables.File object.

A sequence of integers, corresponding to the indices of the rows to
retrieve

field : {None, str}, optional

Name of the field to read.
If None, all the fields from each record are read.

Returns:

time_series :

Depending on the value of the field parameter, the method returns:
(i) a TimeSeries, if field is None
or a valid field;
(ii) a DateArray, if field=='_dates';
(iii) a ndarray, if field=='_data' or if field=='_mask';
(iv) a MaskedArray, if field=='_series'.