Statistical and Financial Considerations in Website Optimization

Interactions between the variables

It's important to understand that Google Website Optimizer (GWO) reports only on the main effects within your Web page (the significance of the individual variable values). However, the variables frequently interact significantly. Some interactions are unexpected; others are intentional. As a general rule, you want to create interactions that yield positive, not negative, synergies. The Cambridge University Press book "Nature's Magic," cited in the References, discusses this subject in considerable detail.

Once you know which independent variables, if any, interact, you can refine your experiment. To incorporate these interactions and other real-world effects into your analysis, start by exporting the raw conversion rate data from your GWO report.

I use the CSV option to export the Combination Report and open it in an Excel spreadsheet that has the StatTools statistics add-in installed. You can do everything that follows with the Regression data analysis tool that comes with Excel, but, as I'll explain below (and in my next article), StatTools can make the job a good deal easier.

Regression analysis is a statistical technique for developing a mathematical equation that shows how variables are related. In regression terminology, the variable being predicted is called the dependent or response variable. The variable or variables used to predict the value of the dependent variable are called the independent or predictor variables. An important part of any regression analysis is selecting the set of independent variables that provides the best forecasting model.

As presented in this article, regression analysis may seem simple, but it often is not. For example, if the data are cyclical, it becomes harder to decide how long you need to collect data before you can make statistically significant (and helpful) decisions.

In the regression equation shown in Figure 12, Conversion Rate is the dependent variable and the $X_i$'s are the independent variables.

Using the report's estimated conversion rates and prior knowledge of which variables are represented in each combination (including the original), the next step is to use StatTools (or, in very simple cases, Excel alone) to derive the regression coefficients of a multivariate statistical model.

For each combination in the equation of Figure 12, the $X_i$'s are the individual levels (choices) of the factors (sections), and product terms such as $X_1 X_2$ represent the interaction between the main factors $X_1$ and $X_2$.
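For illustration only, assuming just two factors $X_1$ and $X_2$ (the actual model in Figure 12 may contain more terms), an equation with both main effects and one interaction term would take this form, where $c_0$ is the intercept, $c_1$ and $c_2$ measure the main effects, and $c_{12}$ measures the interaction:

$$\text{Conversion Rate} = c_0 + c_1 X_1 + c_2 X_2 + c_{12} X_1 X_2$$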

StatTools (or Excel alone) then computes values for the coefficients $c_i$. Each coefficient measures the magnitude of the contribution of its effect, can be either positive or negative, and appears in the Coefficient column of the regression report shown in Figure 13. I'll explain this computation and report further in my next article, Statistical and Financial Considerations in Website Optimization: Part 2.
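The kind of computation StatTools and Excel perform here is an ordinary least-squares fit. The sketch below is not either product's implementation; it fits the same kind of model with NumPy, assuming two two-level factors coded as 0/1 dummy variables and synthetic data generated from made-up coefficients (`TRUE_COEFS` and the helper names are hypothetical, not real GWO results or APIs).

```python
import numpy as np

# Hypothetical coefficients used only to generate synthetic data:
# intercept c0, main effects c1 and c2, interaction c12.
TRUE_COEFS = np.array([0.040, 0.010, -0.005, 0.008])


def design_matrix(x1, x2):
    """Build the columns [1, X1, X2, X1*X2] for two 0/1 dummy variables."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])


def fit_coefficients(X, y):
    """Ordinary least squares: return c minimizing ||X @ c - y||^2."""
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


# All four combinations of two two-level factors (X1=0, X2=0 plays the
# role of the original page), each observed three times with small noise.
rng = np.random.default_rng(0)
x1 = np.repeat([0, 1, 0, 1], 3)
x2 = np.repeat([0, 0, 1, 1], 3)
X = design_matrix(x1, x2)
y = X @ TRUE_COEFS + rng.normal(0.0, 0.0002, size=len(x1))

coefs = fit_coefficients(X, y)
for name, value in zip(["c0", "c1", "c2", "c12"], coefs):
    print(f"{name}: {value:+.4f}")
```

Because the data were generated from known coefficients, the fitted values should land close to `TRUE_COEFS`; with a real export, `y` would instead hold each combination's estimated conversion rate.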

Note: This explanation, like the regression report shown in Figure 13, has been simplified for clarity. In practice, however, the choices you make for each section (e.g., image1, image2 or image3) are usually replaced by dummy variables, as described in Chapters 7 and 8 of the S. Christian Albright book listed in the References. A dummy variable is a variable whose only possible values are 0 and 1. For a given section, it equals 1 if that section shows a particular choice (e.g., image2) and 0 if it does not.
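A minimal sketch of that encoding, assuming the first listed level (here image1, e.g., the original) serves as the baseline and gets no column of its own; dropping one level per section is the usual way to avoid perfect collinearity with the intercept term. The function name `dummy_encode` is hypothetical.

```python
def dummy_encode(choices, levels):
    """Return one 0/1 column per non-baseline level for each observed choice.

    levels[0] is the baseline and is represented by all zeros.
    """
    rows = []
    for choice in choices:
        if choice not in levels:
            raise ValueError(f"unknown choice: {choice!r}")
        # 1 if this row shows the given level, 0 otherwise.
        rows.append([1 if choice == level else 0 for level in levels[1:]])
    return rows


levels = ["image1", "image2", "image3"]
print(dummy_encode(["image1", "image2", "image3", "image2"], levels))
# [[0, 0], [1, 0], [0, 1], [1, 0]]
```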

In multiple regression analysis, we initially assume that the effects of the independent variables on the dependent variable are additive; that is, we assume that the dependent variable can be predicted most accurately by a linear function of the independent variables. However, the effects of independent variables on a dependent variable are not always additive. We refer to the presence of non-additive effects as interaction (e.g., product terms such as $X_1 X_2$ in the equation of Figure 12). Interaction occurs whenever the effect of one independent variable on the dependent variable is not constant over all values of the other independent variables. Although interaction is somewhat difficult to envision in the abstract, situations that involve it are easy to conceive: for example, the combined effect of temperature and rainfall on the number of orchids harvested.
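The definition of interaction can be checked numerically. This sketch assumes a two-factor model with 0/1 dummies and one product term; the coefficient values and function names are hypothetical. It measures how much the prediction changes when $X_1$ goes from 0 to 1, at each level of $X_2$: without interaction the two changes would be equal, while here they differ by exactly the interaction coefficient.

```python
def predict(c0, c1, c2, c12, x1, x2):
    """Predicted conversion rate for a two-factor model with interaction."""
    return c0 + c1 * x1 + c2 * x2 + c12 * x1 * x2


def effect_of_x1(coefs, x2):
    """Change in the prediction when X1 goes from 0 to 1, holding X2 fixed."""
    return predict(*coefs, 1, x2) - predict(*coefs, 0, x2)


coefs = (0.040, 0.010, -0.005, 0.008)  # hypothetical c0, c1, c2, c12
print(round(effect_of_x1(coefs, 0), 4))  # 0.01
print(round(effect_of_x1(coefs, 1), 4))  # 0.018
```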

The regression report contains three sections: Summary, ANOVA Table and Regression Table. An initial use of this report might entail little more than three quick checks. First, in the Summary section, look at R-Square, one estimate of how well the model accounts for the scatter in the raw data (the closer its value is to 1, the better). Second, in the Regression Table section, look at the Coefficient values for the constants $c_i$ in the equation of Figure 12: values far from 0 indicate large effects, and low corresponding p-values indicate that those effects are statistically significant. Third, in the middle section, ANOVA Table, look at the p-value, which tests whether the independent variables, taken as a whole, explain a significant percentage of the variation in the dependent variable.
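The sketch below shows the standard formulas behind two of those numbers: R-Square and the overall F statistic on which the ANOVA p-value is based. It assumes an ordinary least-squares model with an intercept; the example values, including the fitted values `y_hat`, are made up just to exercise the formulas (in a real report they come from the regression itself). Turning F into a p-value needs the F distribution with k and n - k - 1 degrees of freedom (for example `scipy.stats.f.sf`), which is omitted here to keep the sketch dependency-light.

```python
import numpy as np


def r_squared(y, y_hat):
    """R-Square: share of the variation in y explained by the fitted values."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # unexplained (residual) variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot


def overall_f_statistic(r2, n, k):
    """ANOVA F statistic for H0: all k slope coefficients are zero.

    n is the number of observations and k the number of independent
    variables (not counting the intercept).
    """
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Small made-up example: observed values and a model's fitted values.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y, y_hat)
f = overall_f_statistic(r2, n=len(y), k=1)
print(round(r2, 2), round(f, 1))  # 0.98 98.0
```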

Note: Everything in this discussion of using multiple regression to understand the factors that drive conversion rates applies equally well to other dependent variables, such as Return On Investment (ROI). Moreover, the $X_i$'s can also represent conditions or events outside the Web page itself (indicated by the "External variable" term in the Regression Table section in Figure 13): for example, the language, location, or Web browser of the viewers (information available from products like Google Analytics and Urchin), marketing campaign information, and so on, in addition to information about the combinations of Web page elements used in your test.

I'll discuss the statistical aspects of Website optimization in greater detail in my next article, Statistical and Financial Considerations in Website Optimization: Part 2.

The Page Sections report

If you're running a multivariate test, you'll notice that you have two sub-tabs: reports by combination, as discussed above, and reports by page section.

In contrast to combination reports, page section reports focus on which variations of each page section performed best. Keep in mind that simply picking the best-performing variation for each page section may not be as effective as picking a winning combination, since there may be interactions among variations that the page section report doesn't capture. This concept is embodied in the following quotation from the legendary basketball coach John Wooden: "A player who makes a team great is much more valuable than a great player." His UCLA basketball teams had a record winning streak of 88 games and four perfect 30-0 seasons. They also won 38 straight games in NCAA tournaments.
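To see how the two approaches can disagree, consider a hypothetical 2x2 test with made-up conversion rates, assuming each combination received equal traffic. The names and numbers are invented for illustration, and this is not how GWO computes its reports. Averaging over the other section picks headline_A and image_2, yet that pair is the worst combination for headline_A, because the two elements interact.

```python
from statistics import mean

# Hypothetical conversion rates: (headline, image) -> rate.
rates = {
    ("headline_A", "image_1"): 0.050,
    ("headline_A", "image_2"): 0.030,
    ("headline_B", "image_1"): 0.020,
    ("headline_B", "image_2"): 0.045,
}


def best_per_section(rates, section):
    """Pick the variation with the highest average rate for one section.

    section is 0 (headline) or 1 (image); the average is taken over the
    other section's variations, assuming equal traffic per combination.
    """
    variations = {combo[section] for combo in rates}
    return max(
        variations,
        key=lambda v: mean(r for c, r in rates.items() if c[section] == v),
    )


def best_combination(rates):
    """Pick the single combination with the highest rate."""
    return max(rates, key=rates.get)


section_pick = (best_per_section(rates, 0), best_per_section(rates, 1))
print(section_pick, rates[section_pick])  # ('headline_A', 'image_2') 0.03
winner = best_combination(rates)
print(winner, rates[winner])  # ('headline_A', 'image_1') 0.05
```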

Page section reports have the same columns as combination reports, plus one more: relevance rating.

Relevance rating shows how much impact a particular page section has on your experiment. For example, if your headline page section showed a relevance rating of 0, you'd know that none of the headlines you used significantly distinguished itself from the others. Alternatively, a relevance rating of 5 for your image page section would show that one or more images significantly differentiated themselves from the others, and that the image page section is important for conversions.

Why might the relevance rating be 0 for all my page sections?

Seeing relevance ratings of zero for all your sections could suggest either that the differences between your variations are subtle and require more data to become apparent, or that the content tested in your experiment isn't having a significant effect on your users.