toaster (to Aster) is a set of tools for computing and analyzing data with the Teradata Aster Big Data database. It brings the power of Teradata Aster's distributed SQL, MapReduce (SQL-MR), and Graph Engine (SQL-GR) to R on the desktop and complements the analysis of results with a convenient set of plotting functions.

toaster achieves most tasks in two distinct steps:

Compute in Aster using Aster's rich, fully scalable set of analytical functions, which run transparently in a distributed, parallel environment.

Deliver and visualize results in R for further exploration and analysis.

toaster performs all big-data, processing-intensive computations in Aster and makes the results and visualizations available in R. Summary statistics, aggregates, histograms, heatmaps, and coefficients from linear regression models are among the results available in R after processing in Aster. Most results have toaster visualization functions to aid further analysis.

You can install:

the latest released version from CRAN with

install.packages("toaster")

the latest development version from GitHub with

devtools::install_github("teradata-aster-field/toaster")

an evaluation version of the Aster analytic platform, Aster Express, to run on your PC and get started with its Tutorial Series.

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

News

Both explicit and implicit support for kmeans functions in AAF 6.21.
The package recognizes the version either from the function's output or
from the new argument version. Because the new output includes more
kmeans statistics, computeKmeans runs faster with the newer
version of AAF (#56).

Kmeans clustering can now persist clustered data, for both better
performance and convenience, using the new argument persist=TRUE (#56).

Kmeans clustering now supports initial centers obtained with canopy
clustering. Use the new function computeCanopy to quickly seed
initial centroids, then run kmeans with the canopy object (#61).

locationName is a vector of the names of columns containing an address,
a name, etc. suitable for geocoding (finding latitude and longitude).
The columns are used in order of appearance: geocoding tries
the first column's values first, then, for the data points that were
not resolved, it tries the second column's values, and so on.
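The fallback order described above can be sketched in plain R. This is only an illustration of the resolution order: `lookupCoords`, `geocodeWithFallback`, and the sample data are hypothetical stand-ins, not toaster's geocoder.

```r
# Hypothetical stand-in for a geocoder: resolves only the strings it knows
# and returns NA coordinates for everything else.
lookupCoords <- function(x) {
  known <- data.frame(
    key = c("Dallas, TX", "Love Field"),
    lat = c(32.7767, 32.8471),
    lon = c(-96.7970, -96.8518),
    stringsAsFactors = FALSE
  )
  idx <- match(x, known$key)
  data.frame(lat = known$lat[idx], lon = known$lon[idx])
}

# Try each column named in locationName in order; later columns are
# consulted only for rows that are still unresolved.
geocodeWithFallback <- function(data, locationName, geocoder = lookupCoords) {
  result <- data.frame(lat = rep(NA_real_, nrow(data)),
                       lon = rep(NA_real_, nrow(data)))
  for (col in locationName) {
    unresolved <- is.na(result$lat)
    if (!any(unresolved)) break
    found <- geocoder(data[[col]][unresolved])
    result$lat[unresolved] <- found$lat
    result$lon[unresolved] <- found$lon
  }
  cbind(data, result)
}

places <- data.frame(
  address = c("Dallas, TX", "unknown street"),
  name    = c("City Hall", "Love Field"),
  stringsAsFactors = FALSE
)
geocoded <- geocodeWithFallback(places, locationName = c("address", "name"))
```

In this sketch the first row resolves through `address`, so its `name` is never looked up; the second row falls through to `name`.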

New text analysis functions computeTf and computeTfIdf
process corpora in Aster and produce results compatible with package tm,
in particular with its term-document matrix.

Both computeTf and computeTfIdf rank terms and return the top-ranked
ones. The number of terms to return and the ranking are set with
parameters top and rankFunction. By default (top = NULL), all terms
are returned.
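How a ranking function and a cutoff interact can be illustrated in base R. The sketch below is an analogy only: `topTerms` is a hypothetical helper, the frequencies are made up, and base R's `rank` with its `ties.method` argument stands in for ranking that toaster performs in Aster.

```r
# Made-up term frequencies for a single document.
tf <- c(aster = 12, data = 9, sql = 9, graph = 4, r = 1)

# Keep the terms whose rank is within `top`; top = NULL keeps every term.
topTerms <- function(tf, top = NULL, ties.method = "min") {
  if (is.null(top)) return(sort(tf, decreasing = TRUE))
  ranks <- rank(-tf, ties.method = ties.method)
  sort(tf[ranks <= top], decreasing = TRUE)
}

withTies    <- topTerms(tf, top = 2)                         # ties share a rank
withoutTies <- topTerms(tf, top = 2, ties.method = "first")  # ties are broken
allTerms    <- topTerms(tf)                                  # no cutoff
```

With tied frequencies the choice of ranking matters: a rank that lets ties share a position can return more than `top` terms, while a row-number style rank returns exactly `top`.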

S3 classes nGram and token provide pluggable parsers to extract text
tokens to use in the functions 'computeTf' and 'computeTfIdf'.

Text functions support stop words in both Aster (installed stopwords file)
and R (post-processing of results).

Linear regression is now compatible with R's standard lm functions: computeLm
returns an object of both classes c('toalm', 'lm'). This means methods summary,
coefficients, etc. work with the object returned by computeLm.
This change is not backward compatible: to obtain the result as returned in
0.2.5, use the element old.result of the returned list.

To compute results similar to lm, computeLm uses a sample (default 1000
rows) to calculate stats like residuals, R-squared, etc. in Aster. As before,
linear regression coefficients are calculated on the full data set with
the SQL/MR linreg function.
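Why a two-class object works with the standard lm methods can be shown with S3 dispatch alone. The sketch below does not call computeLm: it fits an ordinary lm on the built-in mtcars data and prepends 'toalm' to mimic the class vector described above.

```r
# Illustration only: a plain lm fit relabelled with the class vector
# c("toalm", "lm"). This is not an object produced by computeLm.
fit <- lm(mpg ~ wt + hp, data = mtcars)
class(fit) <- c("toalm", "lm")

# No toalm-specific methods exist in this session, so S3 dispatch falls
# through to the lm methods.
isLm      <- inherits(fit, "lm")
coefs     <- coefficients(fit)
fitSumm   <- summary(fit)
rSquared  <- fitSumm$r.squared
```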

getTableSummary is enabled for parallel execution. Simply create and
register a parallel cluster of your choice with the doParallel package and set
parameter parallel=TRUE. Performance gains may reach 50% or more,
depending on the size of the table, the number of parallel processes, and the
number of columns. Run demo("baseball-parallel") for examples.

computePercentiles is enabled for parallel execution. Simply create and
register a parallel cluster of your choice with the doParallel package and set
parameter parallel=TRUE. Performance gains may reach 50% or more,
depending on the size of the table, the number of parallel processes, and the
number of columns. Run demo("baseball-parallel") for examples.
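Registering a cluster before a parallel call might look like the sketch below. `summarizeInParallel`, the DSN name, and the table name are hypothetical, and passing the connection and table name as the first two arguments of getTableSummary is an assumption; only parallel=TRUE comes from the notes above. The function is defined but not run, since it needs a live Aster connection.

```r
# Sketch: summarize a table using a temporary cluster of `workers` processes.
# `conn` is assumed to be an open RODBC connection to Aster.
summarizeInParallel <- function(conn, tableName, workers = 2) {
  cl <- parallel::makeCluster(workers)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # always release the workers
  doParallel::registerDoParallel(cl)              # make the cluster the foreach backend
  toaster::getTableSummary(conn, tableName, parallel = TRUE)
}

# Usage (not run; requires a live connection):
# conn <- RODBC::odbcConnect("asterDsn")             # hypothetical DSN name
# info <- summarizeInParallel(conn, "pitching_enh")  # hypothetical table name
```

The same registration step applies to computePercentiles, which takes the same parallel=TRUE flag.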

Added support for temporal Aster data types in getTableSummary and
computePercentiles. Temporal types are date, time, timestamp, and interval.
In computePercentiles, set parameter temporal=TRUE to process
temporal columns, and run it separately from the numerical ones.

MINOR FEATURES

Added factory functions getDiscretePaletteFactory and getGradientPaletteFactory
to dynamically generate palettes with n number of colors.
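A palette factory is a function that returns another function of n. The base R sketch below shows that general pattern with `grDevices::colorRampPalette`; `makePaletteFactory` and the chosen colours are hypothetical and do not reproduce the two toaster functions named above.

```r
# Hypothetical factory: fixes the base colours now, picks the number of
# colours later.
makePaletteFactory <- function(baseColours) {
  ramp <- grDevices::colorRampPalette(baseColours)
  function(n) ramp(n)
}

blueToRed <- makePaletteFactory(c("steelblue", "white", "firebrick"))
sevenColours <- blueToRed(7)  # character vector of 7 hex colour codes
threeColours <- blueToRed(3)
```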

The legend in showData histogram format is removed completely when
legendPosition="none".

computePercentiles now returns no rows for a column that contains only NULLs.
Previously it threw an error without completing.

Fixed legend position in plotting functions.

Added an error when the histogram start value is greater than the end value (Issue #33)

DOCUMENTATION

Completely reworked demo scripts. They now contain fully functional examples
running on the baseball and openDallas data sets. The data sets are available
from Bitbucket: https://bitbucket.org/grigory/toaster/downloads

createMap: new visualization function for combining maps with data
artifacts from the Aster database. It can produce maps of
arbitrary scale (with the exception of the whole world) and type, with shapes
whose size and labels correspond to data computed in Aster. It uses
the ggmap and ggplot2 packages and the Google API for geocoding data as
necessary. It implements smart logic to choose map tiles that place
geocoded data appropriately, and it also automatically geocodes
data if necessary (Google API restrictions apply).

computeBarchart: for computing data for barchart visualizations. This
differs from computeHistogram because a barchart is defined on factors
(categorical data), which don't support defining bins as in histograms.

computePercentiles: for computing multiple percentiles across one or
many subsets of a table in one go. Results are suitable for function
createBoxplot (see next).

createBoxplot: visualizes boxplots for a single column across one or
multiple subsets.

computeLm: computes linear model coefficients similar to the lm function, but
with all computation performed inside Aster.

ENHANCEMENTS

Added parameter test to compute- functions (functions that access and
manipulate data in Aster) to produce SQL without executing it. Thus, when
test=TRUE, a function returns a string containing the SQL that would have run
in Aster.
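The dry-run pattern behind such a test flag can be sketched with a small helper. `countRows` is hypothetical and is not a toaster function; it only illustrates returning the generated SQL instead of executing it.

```r
# Hypothetical helper showing the dry-run pattern; not part of toaster.
countRows <- function(channel, tableName, where = NULL, test = FALSE) {
  sql <- paste0("SELECT COUNT(*) cnt FROM ", tableName,
                if (!is.null(where)) paste0(" WHERE ", where))
  if (test) return(sql)          # dry run: hand back the SQL, run nothing
  RODBC::sqlQuery(channel, sql)  # real run: execute in the database
}

# No connection is needed in test mode, so NULL stands in for the channel.
sqlAll      <- countRows(NULL, "pitching", test = TRUE)
sqlFiltered <- countRows(NULL, "pitching", where = "yearid >= 2000", test = TRUE)
```

Inspecting the returned string before a real run is a cheap way to check filters and column names against a large table.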

Package dependencies moved from the Depends to the Imports section of the
DESCRIPTION file, except for the RODBC package. RODBC stays in Depends because
toaster requires access to the RODBC connection object and to its function
odbcConnect. Other packages are not exposed by toaster functions, so accessing
them is needed only for advanced usage (if any).
If you use any function from packages other than RODBC, load those packages
with library or require, or use their namespace.

The facet parameter now supports both a one-value and a two-value vector (if
the vector is longer, the remaining values are ignored). A single value defines
the column name for wrapping facets in a lattice of one or more columns. Two
values define a pair of columns to place facets in a 2-dimensional grid, one
facet for each combination of values found.
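The one-value and two-value cases correspond to wrapped and gridded facets in ggplot2. The sketch below uses the built-in mtcars data; `addFacets` is a hypothetical helper, and mapping the first value to rows and the second to columns is an assumption, not something the notes above specify.

```r
library(ggplot2)

# Hypothetical helper: one column name wraps facets, two names build a grid,
# and anything beyond the second name is ignored.
addFacets <- function(p, facet, ncol = 1) {
  facet <- head(facet, 2)
  if (length(facet) == 1) {
    p + facet_wrap(as.formula(paste("~", facet)), ncol = ncol)
  } else {
    # Assumption: first name gives the rows, second name the columns.
    p + facet_grid(as.formula(paste(facet[1], "~", facet[2])))
  }
}

base    <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
wrapped <- addFacets(base, "cyl", ncol = 2)
gridded <- addFacets(base, c("cyl", "gear", "carb"))  # "carb" is ignored
```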

computeHeatmap now supports withMelt to melt the result using function melt
from package reshape2. This option simplifies visualizing with facets.
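What melting does to a wide result can be seen on a small made-up data frame. The column names below are hypothetical, and the exact shape that computeHeatmap returns is not reproduced here.

```r
library(reshape2)

# Made-up wide result: one row per heatmap cell, two computed measures.
wide <- data.frame(
  x      = c("a", "a", "b", "b"),
  y      = c(1, 2, 1, 2),
  avg_w  = c(10, 12, 9, 11),
  avg_l  = c(8, 7, 9, 6)
)

# Long form: one row per cell and measure, with the measure name in
# `variable` and its number in `value`.
long <- melt(wide, id.vars = c("x", "y"))
```

In the long form a single facet column (`variable`) can split the plot into one panel per measure.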

createBubblechart now supports scaling shapes by size (default) or by area.
Correspondingly, use shapeSizeRange when scaling by size; and shapeMaxSize
when scaling by area.

createBubblechart gained parameters to control label positioning and
formatting. All parameters that position and format label text now start
with the prefix "label". The old parameters textSize, textColour, and
textVJust are renamed to labelSize, labelColour, and labelVJust.