In our last post on How to Optimize Email Send Times, we talked about the set-up and planning stages of running a marketing experiment. From choosing a basic methodology, to selecting your experiment group, to deciding on success metrics, you now have all of the basic tools to design and set up a marketing experiment.

In this blog installment, we’ll walk through building out and running the experiment, using one we ran for a client recently as an example.

First Thing is First… Or is it?

We recently helped a client, a major women’s cataloger & retailer, determine if changing the order in which they sent daily marketing emails could raise their per-email revenue. They sent three emails a day, each falling into one of three themes:

“A” — a “clearance” email that included a special promotion

“B” — a “Themed” email

and “C” — a “regular product” email

Our goal was to see if changing the specific time of day and the sending order would cause revenue to increase.

Without any hard data pointing us one way or the other, we decided against formulating a firm hypothesis up front. Instead, we would change some of our designated variables and then run a correlation analysis on the results to see what emerged. This approach freed the testing team from committing to an uncertain hypothesis and let us test multiple changes at once.

What To Test (and How To Test It)

Since these emails were direct sales emails that were meant to bring the consumer to a targeted landing page and convert them into active shoppers, we settled on click-through rate as the primary dependent variable.

There is a good reason we chose this metric instead of something like revenue or sales: since we were only testing email, we wanted to remove as many interfering variables as possible. Any action that took place after the subscriber left the email was therefore beyond the scope of the experiment, at least as far as drawing initial conclusions was concerned (we ended up looking at overall sales and revenue numbers, but we’ll talk about that later). Had this been a branding or awareness campaign, we might have looked at other pre- and post-email metrics like repeat visitors, visitor frequency, and keywords searched instead.

Once we had selected CTR as our primary dependent variable, it was time to build a test sample. For our sample, we decided to use the “Openers” segment (that is, people with a history of opening their emails) and break out 15% as our test segment.
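Breaking out a test segment like this comes down to a random split. Here is a minimal sketch, assuming the Openers segment is available as a list of subscriber IDs; the function name, the IDs, and the fixed seed are illustrative, not the client's actual setup.

```python
import random

def split_test_segment(subscriber_ids, test_fraction=0.15, seed=42):
    """Randomly split subscribers into (test, control) lists."""
    ids = sorted(subscriber_ids)   # sort first so the split is reproducible
    rng = random.Random(seed)      # local RNG, leaves global random state alone
    rng.shuffle(ids)
    cutoff = round(len(ids) * test_fraction)
    return ids[:cutoff], ids[cutoff:]

# Hypothetical IDs standing in for subscribers with a history of opens
openers = [f"sub_{i:05d}" for i in range(1000)]
test_group, control_group = split_test_segment(openers)
print(len(test_group), len(control_group))
```

Shuffling before slicing matters: taking the first 15% of a list sorted by signup date or alphabet would bias the sample.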

It is critically important to choose the right segment for testing. If most of your email list is inactive, you need to either account for the inactives when choosing your experimental percentage, or else cut inactives from the test entirely. Otherwise, you run the serious risk of under-sampling active subscribers and drawing false conclusions through sampling error.

The Experiment Itself

When it came time to run the actual experiment, we opted for a two-week experiment window. With three emails per day and hundreds of thousands of list members, two weeks was ample time for us to collect enough data to make statistically valid analysis a possibility. If your mailing list is significantly smaller, or you send email less frequently, you may want to extend the experimental period. Keep in mind that the more data points you collect, the more reliable your results will be. However, don’t make your experiment so long that by the time it’s finished, the initial need has passed, or interest from your team or management has waned. That’s equally important in the big picture!
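How long is long enough depends on how many recipients you need to detect the lift you care about. Here is a rough sketch using the standard normal-approximation formula for comparing two proportions. The baseline CTR, target CTR, significance level, and power are illustrative assumptions, not figures from the client experiment.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(baseline_ctr, target_ctr, alpha=0.05, power=0.80):
    """Approximate recipients needed per group to detect the CTR difference."""
    if baseline_ctr == target_ctr:
        raise ValueError("baseline and target CTR must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = baseline_ctr * (1 - baseline_ctr) + target_ctr * (1 - target_ctr)
    effect = target_ctr - baseline_ctr
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical goal: detect a lift from a 2.0% CTR to a 2.5% CTR
n_per_group = required_sample_size(0.020, 0.025)
print(n_per_group)
```

Dividing the result by the number of recipients you reach per day gives a rough lower bound on how long the experiment should run.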

Over the next two weeks, we kept the control group receiving the regular schedule of emails, while the experimental group received varying email types at varying points in the day. All data was carefully monitored to make sure that nothing out of the ordinary or unexpected popped up.

It’s a lot easier to troubleshoot outliers and odd results as they come in than to wait until all the data is collected and then have to think back to what could have happened to throw off numbers last week or last month.

It’s also important to make sure that the two groups (control and experimental) are kept properly separated. This is the kind of situation where a temporary suppression list can be used to make sure there is no crossing over without having to manually rebuild two separate lists.
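Most email platforms handle suppression for you, but the logic is simple enough to sanity-check yourself. Here is a minimal sketch with made-up addresses; `apply_suppression` is a hypothetical helper, not a feature of any particular email tool.

```python
def apply_suppression(send_list, suppression_list):
    """Return send_list without any address on the suppression list."""
    suppressed = set(suppression_list)
    return [addr for addr in send_list if addr not in suppressed]

# Made-up addresses for illustration
master_list = ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]
test_group = ["b@example.com", "d@example.com"]

# The regular schedule goes to everyone except the test group
control_send = apply_suppression(master_list, test_group)

# Fail loudly if anyone would receive both schedules
overlap = set(control_send) & set(test_group)
assert not overlap, f"groups overlap: {overlap}"
print(control_send)
```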

Parsing and Analyzing

Once all the data is in, it’s time to make sense of it. If you have been following rigorous data collection standards, you should have a clear idea of what your dependent variable was (this is your success metric) and what your independent variables were (these are the variables you were testing; in this case, time of day and sending order). Start by checking through your data and sanitizing it of any clear and explainable outliers. Did a thunderstorm knock out power for a day in your target test market? Did all your email actually go out as you had intended? Then you can turn to the numbers themselves.

Correlation is a mathematical measure of the relationship between two variables, expressed on a scale of -1 to +1. The closer the value is to either -1 or +1, the stronger and more linear the relationship; a value near 0 indicates little or no linear relationship (note: there are alternate, non-linear correlation models, but unfortunately we don’t have the time to talk about them here).

There is a truism particularly popular in online marketing that correlation does not indicate causation. This is true, but it is not nearly the drawback that some would make it. While correlation doesn’t imply causation, it DOES suggest that a relationship exists. As a rule of thumb, a correlation between 0.3 and 0.5 (in absolute value) suggests a moderately strong linear relationship, and anything above 0.5 is a strong signal that two variables are linked.
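To make the calculation concrete, here is a minimal sketch of the standard (Pearson) correlation coefficient in plain Python. The CTR numbers are made up for illustration, and the `describe` helper simply encodes the rule-of-thumb thresholds mentioned above; neither comes from the client data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

def describe(r):
    """Label a correlation using the rule-of-thumb thresholds 0.3 and 0.5."""
    strength = abs(r)
    if strength > 0.5:
        return "strong"
    if strength >= 0.3:
        return "moderate"
    return "weak"

# Made-up example: 1 = send used the new schedule, 0 = regular schedule
new_schedule = [0, 0, 0, 0, 1, 1, 1, 1]
ctr_percent = [2.1, 1.9, 2.0, 2.2, 2.6, 2.4, 2.7, 2.5]

r = pearson_r(new_schedule, ctr_percent)
print(round(r, 2), describe(r))
```

In practice you would compute this per variable you changed (send hour, position in the daily order, and so on) and look at which ones stand out.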

Once you know that a relationship exists, you can then design an experiment to test the causality of that relationship, hence the hybrid testing methodology.

In the case of the client we did this email marketing optimization for, we saw a clear correlation between the new time of day/email type combinations and increased CTRs. What’s more, we saw a sharp increase in revenue generated per email for the test segment. Clearly, a relationship existed, but was it caused by the change in times or something else entirely?

This is where the hypothesis-testing methodology comes in.

We now had a solid, workable hypothesis: sending the three email types at new times of day and in a new order would increase CTRs and revenue per email.

We again segmented the list into an experimental and control group and ran an experiment, this time keeping everything in the test segment identical to the control group EXCEPT that the order of the emails and the send times were changed to follow the new hypothesis schedule.

Validation and Results

After running the experiment for two more weeks, we were pleased to find that our hypothesis was validated.
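One common way to check whether a CTR difference between control and test is larger than chance would explain is a two-proportion z-test. The sketch below is an assumption about how such a check could be done, not a record of the exact test we ran, and the click and send counts are made up.

```python
from math import sqrt, erfc

def two_proportion_z_test(clicks_a, sends_a, clicks_b, sends_b):
    """Return (z, two-sided p-value) for the difference in click-through rate."""
    ctr_a = clicks_a / sends_a
    ctr_b = clicks_b / sends_b
    # Pooled CTR under the null hypothesis that both groups share one rate
    pooled = (clicks_a + clicks_b) / (sends_a + sends_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    z = (ctr_b - ctr_a) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value

# Made-up counts: control (a) vs. new schedule (b)
z, p = two_proportion_z_test(clicks_a=400, sends_a=20000,
                             clicks_b=480, sends_b=20000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (0.05 is the conventional cutoff) says the difference is unlikely to be noise; it does not by itself say the difference is large enough to matter commercially.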

What started off as a hunch – that the order of the emails could be optimized – was thoroughly tested and found to be true.

Furthermore, by using multivariate testing we were also able to identify the optimal sending order for all three of the daily email types.

Using a rigorous statistical approach to marketing optimization, in four weeks we were able to significantly grow email list revenue without wasting months and risking alienating list subscribers by experimenting directly on the master list.

We were also able to attribute cause directly and clearly to our optimization, and this is the true benefit of a controlled experiment: it removes the guesswork and allows you to go to your client or management with hard data that says “Yes, I did this, and it drove hard increases in revenue on the same effort and email volume.” – not a bad career move, right?

Good luck, and happy emailing.

P.S. Look for a thorough and detailed white paper on Time of Day optimization coming out soon.