Better Decisions === Faster Stats

analysis

The Analytics (or stats) dashboard at WordPress.com continues to disappoint, and is a major reason for people to move out of WordPress.com hosting (since they need better analytics like that by Google Analytics which cant be enabled on the default mode)

Its not really beautiful unlike the rest of WordPress Universe!

It can be made better if people try harder! Analytics matters

Here are some points

1) Bar charts and Histograms are not really the best way to visualize trends across time

2) Location Analytics is limited to just country level analysis and the heatmap (?) is aweful in terms of distinguishing gradients

3) Referrers Tab needs to do a better job on distinguishing between mobile and non mobile traffic, social and non social traffic (and there are better ways to visualize than just a simple list)!

4) I cant even export my traffic stats (and forget an api !) so I am stuck with the bad data viz here

Columns are named, rows are numbered (but can be named) and can be easily selected and calculated upon. Internally, columns are stored as 1d numpy arrays. If you set row names, they’re converted into a dictionary for fast access. There is a rich subselection/slicing API, see help(DataFrame.get_item) (it also works for setting values). Please note that any slice get’s you another DataFrame, to access individual entries use get_row(), get_column(), get_value().

DataFrames also understand basic arithmetic and you can either add (multiply,…) a constant value, or another DataFrame of the same size / with the same column names, like this:

#multiply every value in ColumnA that is smaller than 5 by 6.
my_df[my_df[:,'ColumnA'] < 5, 'ColumnA'] *= 6
#you always need to specify both row and column selectors, use : to mean everything
my_df[:, 'ColumnB'] = my_df[:,'ColumnA'] + my_df[:, 'ColumnC']
#let's take every row that starts with Shu in ColumnA and replace it with a new list (comprehension)
select = my_df.where(lambda row: row['ColumnA'].startswith('Shu'))
my_df[select, 'ColumnA'] = [row['ColumnA'].replace('Shu', 'Sha') for row in my_df[select,:].iter_rows()]

Dataframes talk directly to R via rpy2 (rpy2 is not a prerequiste for the library!)

The library has been ruthlessly optimized for performance, with critical code paths compiled to C;

Python with pandas is in use in a wide variety of academic and commercial domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.

Why not R?

First of all, we love open source R! It is the most widely-used open source environment for statistical modeling and graphics, and it provided some early inspiration for pandas features. R users will be pleased to find this library adopts some of the best concepts of R, like the foundational DataFrame (one user familiar with R has described pandas as “R data.frame on steroids”). But pandas also seeks to solve some frustrations common to R users:

R has barebones data alignment and indexing functionality, leaving much work to the user. pandas makes it easy and intuitive to work with messy, irregularly indexed data, like time series data. pandas also provides rich tools, like hierarchical indexing, not found in R;

R is not well-suited to general purpose programming and system development. pandas enables you to do large-scale data processing seamlessly when developing your production applications;

Hybrid systems connecting R to a low-productivity systems language like Java, C++, or C# suffer from significantly reduced agility and maintainability, and you’re still stuck developing the system components in a low-productivity language;

The “copyleft” GPL license of R can create concerns for commercial software vendors who want to distribute R with their software under another license. Python and pandas use more permissive licenses.

This module allows access to comma- or other delimiter separated files as if they were tables, using a dictionary-like syntax. DataMatrix objects can be manipulated, rows and columns added and removed, or even transposed

Analytics 2012 Conference Details

Keynote Speakers

The following are confirmed keynote speakers for Analytics 2012. Since he co-founded SAS in 1976, Jim Goodnight has served as the company’s Chief Executive Officer.

Dr. William Hakes is the CEO and co-founder of Link Analytics, an analytical technology company focused on mobile, energy and government verticals.

Tim Rey has written over 100 internal papers, published 21 external papers, and delivered numerous keynote presentations and technical talks at various quantitative methods forums. Recently he has co-chaired both forecasting and data mining conferences. He is currently in the process of co-writing a book, Applied Data Mining for Forecasting.

Pre-Conference

Plan to come to Analytics 2012 a day early and participate in one of the pre-conference workshops or take a SAS Certification exam. Prices for all of the preconference workshops, except for SAS Sentiment Analysis Studio: Introduction to Building Models and the Business Analytics Consulting Workshops, are included in the conference package pricing. You will be prompted to select your pre-conference training options when you register.

Sunday Morning Workshop

SAS Sentiment Analysis Studio: Introduction to Building Models

This course provides an introduction to SAS Sentiment Analysis Studio. It is designed for system designers, developers, analytical consultants and managers who want to understand techniques and approaches for identifying sentiment in textual documents.View outline
Sunday, Oct. 7, 8:30a.m.-12p.m. – $250

Sunday Afternoon Workshops

Business Analytics Consulting Workshops

This workshop is designed for the analyst, statistician, or executive who wants to discuss best-practice approaches to solving specific business problems, in the context of analytics. The two-hour workshop will be customized to discuss your specific analytical needs and will be designed as a one-on-one session for you, including up to five individuals within your company sharing your analytical goal. This workshop is specifically geared for an expert tasked with solving a critical business problem who needs consultation for developing the analytical approach required. The workshop can be customized to meet your needs, from a deep-dive into modeling methods to a strategic plan for analytic initiatives. In addition to the two hours at the conference location, this workshop includes some advanced consulting time over the phone, making it a valuable investment at a bargain price.View outline
Sunday, Oct. 7; 1-3 p.m. or 3:30-5:30 p.m. – $200

This half-day lecture teaches students how to integrate demand-driven forecasting into the consensus forecasting process and how to make the current demand forecasting process more demand-driven.View outline
Sunday, Oct. 7; 1-5 p.m.

Forecast Value Added Analysis

Forecast Value Added (FVA) is the change in a forecasting performance metric (such as MAPE or bias) that can be attributed to a particular step or participant in the forecasting process. FVA analysis is used to identify those process activities that are failing to make the forecast any better (or might even be making it worse). This course provides step-by-step guidelines for conducting FVA analysis – to identify and eliminate the waste, inefficiency, and worst practices from your forecasting process. The result can be better forecasts, with fewer resources and less management time spent on forecasting.View outline
Sunday, Oct. 7; 1-5 p.m.

Introduction to Data Mining and SAS Enterprise Miner

This course serves as an introduction to data mining and SAS Enterprise Miner for Desktop software. It is designed for data analysts and qualitative experts as well as those with less of a technical background who want a general understanding of data mining.View outline
Sunday, Oct. 7, 1-5 p.m.

Modeling Trend, Cycles, and Seasonality in Time Series Data Using PROC UCM

This half-day lecture teaches students how to model, interpret, and predict time series data using UCMs. The UCM procedure analyzes and forecasts equally spaced univariate time series data using the unobserved components models (UCM). This course is designed for business analysts who want to analyze time series data to uncover patterns such as trend, seasonal effects, and cycles using the latest techniques.View outline
Sunday, Oct. 7, 1-5 p.m.

SAS Rapid Predictive Modeler

This seminar will provide a brief introduction to the use of SAS Enterprise Guide for graphical and data analysis. However, the focus will be on using SAS Enterprise Guide and SAS Enterprise Miner along with the Rapid Predictive Modeling component to build predictive models. Predictive modeling will be introduced using the SEMMA process developed with the introduction of SAS Enterprise Miner. Several examples will be used to illustrate the use of the Rapid Predictive Modeling component, and interpretations of the model results will be provided.View outline
Sunday, Oct. 7, 1-5 p.m