Sunday, March 06, 2011

[This is an article for people working in leisure travel technology / ecommerce online conversion who visit this blog, although many of the take-home points are transferable to other industry verticals.]

Data is big, and getting bigger. The more we track and log, the more storage is needed to warehouse it, and the more CPU horsepower is needed to mine it to answer questions posed by the business. As an aside, everyone is facing this issue and it's sink or swim, with the swimmers sure to get a competitive advantage over the sinkers. In this article, I'll examine the main data feeds that matter in leisure travel, and propose an architecture to collect, manage and mine them for business benefit. The end goal is to propose a vision, explaining why and how to collect data to better inform and drive business decisions that improve ecommerce performance.

But why now - hasn't this always been an issue? Yes, but now more than ever, leisure travel is on the cusp of another big game-changer. Companies like Google and Microsoft are clearly already focusing more on travel as a segment, and their data gathering and mining capabilities are considerable. But tour operators and online travel agencies (OTAs) have a significant competitive advantage over pure play technology companies, as we'll see a little later.

Important data sources in leisure travel ecommerce

First, let's examine the primary data sources that affect leisure travel ecommerce. There are some obvious entries in the table that follows, and some less so.

TripAdvisor is the poster child here, but user generated content (UGC) can be in-house too, provided the consumer perceives it as unbiased; otherwise it becomes a negative.

# | Data source | Internal / external | Controlled? | Notes
--- | --- | --- | --- | ---
9 | Meta data | Both | Yes | Every business tags its own data - timestamps, version numbers, # revisions, author, approver, when last yielded. The more meta data you have the merrier - it often helps to tie disparate data sources together and enriches the overall data pool
10 | Search, cost, book funnel | Internal | Yes | Traditionally the core of any ecommerce strategy - measures the complete search, cost and book journey. Needs to be fully instrumented to collect data so that A/B and multivariate testing can be used to fine-tune performance over time. Google Analytics does this very, very well.
11 | Offline (shop) interactions | Internal | Yes | Few businesses try to tie shop activity back to online activity, but for a bricks and mortar plus clicks business, this is an opportunity missed
12 | Online advertising (SEO) | Internal | Partially | SEO can be thought of as PPC you don't pay for! Critical to making cost of acquisition online as efficient as possible. Only partially controllable due to businesses being at the mercy of search engine scoring (which both Google and Microsoft (Bing) keep as a black box algorithm)
13 | Online advertising (PPC) | Internal | Yes | Where Google makes its money! PPC has pride of place in every well-constructed ecommerce campaign, but the cost and effectiveness should be continuously monitored, challenged and tuned. CSV exports out of AdWords provide a good way to do this
14 | Personalisation | Internal | Yes | Personalisation - both anonymous and known, is a great way to learn what kind of holiday / vacation people want to buy from you and how they want to find and buy it. Just don't try to build personalisation before you have (10) working well - personalisation needs a really solid foundation to work well.
15 | Social media | External | No | The rising star that no-one really knows how to handle. The Facebook API contains a lot of potential for travel ecommerce
16 | Offline / traditional advertising | External | Yes | The efficacy (or not) of ad spend must extend to traditional / offline as well as the more easily measurable online variant, otherwise you don't know where all of your marketing £s / $s / €s are going
17 | Post-booking interactions | Internal | Yes | ecommerce data source, but savvy businesses are now looking at post-booking amendments, cancellation rates etc. to identify patterns that can feed back into the search experience
18 | Customer Relationship Management (CRM) | Internal | Yes | Both pre and post travel - it's key to have a good view of what the customer experiences on holiday and feed that back into what holidays are sold going forward. Is that picture of the pool misleading - change it! If the service is great, promote it more!

Two important characteristics of data are whether you control it or not (and hence can change it if you need to) and whether it is sourced from an internal system or an external system (and thus how trustworthy / accurate the data is and whether it is unique to you or if other business entities can see it too). We have added these two characteristics to the table above for clarity.
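As a rough illustration of how this classification could be held in code, the sketch below transcribes rows 9 to 18 of the table into a small Python structure and filters on the two characteristics. The class and function names (`DataSource`, `not_fully_controlled`, `by_origin`) are made up for this example and are not part of any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    number: int
    name: str
    origin: str      # "Internal", "External" or "Both"
    controlled: str  # "Yes", "No" or "Partially"


# Rows 9-18 of the table, transcribed as written.
SOURCES = [
    DataSource(9, "Meta data", "Both", "Yes"),
    DataSource(10, "Search, cost, book funnel", "Internal", "Yes"),
    DataSource(11, "Offline (shop) interactions", "Internal", "Yes"),
    DataSource(12, "Online advertising (SEO)", "Internal", "Partially"),
    DataSource(13, "Online advertising (PPC)", "Internal", "Yes"),
    DataSource(14, "Personalisation", "Internal", "Yes"),
    DataSource(15, "Social media", "External", "No"),
    DataSource(16, "Offline / traditional advertising", "External", "Yes"),
    DataSource(17, "Post-booking interactions", "Internal", "Yes"),
    DataSource(18, "Customer Relationship Management (CRM)", "Internal", "Yes"),
]


def not_fully_controlled(sources):
    """Names of sources the business cannot fully change itself."""
    return [s.name for s in sources if s.controlled != "Yes"]


def by_origin(sources):
    """Group table row numbers by where the data comes from."""
    groups = {}
    for s in sources:
        groups.setdefault(s.origin, []).append(s.number)
    return groups


print(not_fully_controlled(SOURCES))
print(by_origin(SOURCES))
```

Keeping the taxonomy as data rather than as a document makes it easy to extend when new sources come on stream.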

What should be obvious to the reader is that a holistic picture of ecommerce performance requires multiple data sources, some of which traditionally would not be seen as impacting the effectiveness of a leisure travel ecommerce system. Gone are the days of simply looking at the web logs to see how effective (or leaky) the conversion funnel is! In fact, there are probably some sources that I've inadvertently omitted, and indeed as new systems come on stream, new sources will be added to this table / taxonomy.
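For reference, the traditional "how leaky is the funnel" calculation is simple enough to sketch in a few lines of Python. The stage names follow the search, cost, book journey from the table; the session counts are invented for illustration, and the function names are hypothetical.

```python
def funnel_report(stage_counts):
    """Return (stage, sessions, step_conversion, overall_conversion) per stage.

    stage_counts is an ordered list of (stage_name, sessions_reaching_stage).
    """
    report = []
    first = stage_counts[0][1]
    previous = first
    for name, count in stage_counts:
        step = count / previous if previous else 0.0
        overall = count / first if first else 0.0
        report.append((name, count, step, overall))
        previous = count
    return report


def leakiest_step(stage_counts):
    """Name the stage that keeps the smallest share of the sessions entering it."""
    report = funnel_report(stage_counts)
    # Skip the first stage: nothing precedes it, so it cannot leak.
    return min(report[1:], key=lambda row: row[2])[0]


# Made-up counts for illustration only.
counts = [("search", 10000), ("cost", 2500), ("book", 500)]

for name, sessions, step, overall in funnel_report(counts):
    print(f"{name:<8}{sessions:>8}{step:>8.1%}{overall:>8.1%}")
print("Leakiest step:", leakiest_step(counts))
```

A report like this only describes the web funnel; the point of the table is that the other sources are needed to explain why a step leaks.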

Finally, it's interesting from a barrier to entry perspective to note that only the well-placed tour operator or OTA actually has the wherewithal and access to collate data from all of the sources noted in the table. Other new entrants simply do not have access to many of the sources listed. The data itself is now a valuable commodity (and is increasing in value), and an asset that leisure travel businesses would do well to guard jealously.

What we need - Systems and Data working together

At present, I contend that the average tour operator / OTA is collecting some, but not all, of the data sources identified, and that no tour operator or OTA has yet constructed a system that provides a holistic, joined-up view of the data back to the business function to inform decision-making activities. Why not? Because it's not easy to do! The IT estate behind these data sources is fragmented (core res system, yielding system, multiple content management systems, external systems, separate booking repositories / agency management systems, Google Analytics, Google AdWords, Excel spreadsheets), is often owned by different companies, and wasn't designed to provide the kind of view that is now needed. Ominously, new entrants into the space do not carry the legacy baggage that incumbents do, so their velocity of implementation and ongoing change creates a hard-to-ignore imperative for all sellers of leisure travel: innovate quickly and learn from your data, or be left behind.

The technical challenge is four-fold:

1. Collection and storage - gather and store as much data as possible for each data source in the table, with that data being as clean and structured as possible (and in the real world, every data set will have some noise to it)

2. Build a holistic, joined-up data set - identify ways to link the data sources together - version number, unique keys, foreign keys, link backs, tagging etc. The more your data sources are joined up, the more holistic a view of the business you are building (and can provide back to the business). Conversely, disconnected data sets (data islands) are of much less value to the business and introduce the risk of an incomplete / inaccurate view of what's really happening now being used to influence what's going to happen next

3. Answering the questions - provide a mechanism to answer questions over this corpus of data in near real-time to allow the business to modify its behaviour and focus to maximise profits, yield and margin

4. Suggesting the questions - once the above three points have been implemented to a mature and repeatable level, the final logical step is for the data function to suggest areas of improvement and further exploration based on emergent patterns in the data, using techniques such as artificial neural networks and self-organising map (SOM) analysis
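Point 2 in particular can be made concrete with a small sketch. The Python below links records from several sources on a shared key and reports the "data islands", that is, key values missing from one or more sources. The source names, the field name `booking_ref` and the sample records are all assumptions made for illustration; a real estate would have its own keys and far messier data.

```python
from collections import defaultdict


def link_by_key(sources, key):
    """Group records from several named sources by a shared key.

    sources maps a source name to a list of dict records.
    Returns {key_value: {source_name: [records]}}.
    """
    linked = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            if key in record:  # records without the key cannot be linked
                linked[record[key]][source_name].append(record)
    return {k: dict(v) for k, v in linked.items()}


def data_islands(linked, expected_sources):
    """Key values that are missing from at least one expected source."""
    expected = set(expected_sources)
    return {
        k: sorted(expected - set(by_source))
        for k, by_source in linked.items()
        if expected - set(by_source)
    }


# Hypothetical records; "booking_ref" and the other fields are assumptions.
sources = {
    "funnel": [
        {"booking_ref": "B1", "channel": "ppc"},
        {"booking_ref": "B2", "channel": "seo"},
    ],
    "post_booking": [{"booking_ref": "B1", "event": "amendment"}],
    "crm": [
        {"booking_ref": "B1", "rating": 4},
        {"booking_ref": "B2", "rating": 2},
    ],
}

linked = link_by_key(sources, "booking_ref")
islands = data_islands(linked, sources.keys())
print(islands)
```

Once records are linked this way, a question such as "which acquisition channel produces the bookings that are later amended?" becomes a lookup over `linked` rather than a cross-system investigation.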

Putting it all together - a suggested framework

There are many ways to construct a view over the data sources identified in the previous section, and multiple views are encouraged depending on the goal of the business. Here, however, a hybrid of time and business function is used as a reasonable framework to hold the data. This framework is depicted in the following diagram.

Figure 1. High-level schematic of the big data system for leisure travel ecommerce.

A concrete implementation of the framework

The question naturally arises - how would this system be constructed, not just initially but also maintained and extended going forward?

Some natural candidates already exist, chief among them Cassandra and Hadoop. In the author's opinion, a hybrid architecture of Cassandra's data storage and innate simplicity and high availability, coupled with the MapReduce framework from Hadoop offers the best blend of performance, scalability, availability / resilience, querying and extensibility. A separate follow-on instalment to this article is warranted to provide a detailed technical treatise on the underpinnings of the system outlined here.
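To give a flavour of the MapReduce style of querying mentioned above, here is a minimal sketch in plain Python. It deliberately does not use the Hadoop or Cassandra APIs; it only mimics the map, shuffle and reduce phases in a single process. The event records, field names and destinations are invented for the example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reducer):
    """Collapse each key's list of values to a single result."""
    return {key: reducer(key, values) for key, values in grouped.items()}


def bookings_mapper(record):
    # Emit one count per completed booking, keyed by destination.
    if record["event"] == "book":
        yield record["destination"], 1


def count_reducer(key, values):
    return sum(values)


# Hypothetical funnel events.
events = [
    {"event": "search", "destination": "Majorca"},
    {"event": "book", "destination": "Majorca"},
    {"event": "book", "destination": "Crete"},
    {"event": "book", "destination": "Majorca"},
]

result = reduce_phase(shuffle(map_phase(events, bookings_mapper)), count_reducer)
print(result)
```

The appeal of the pattern is that the mapper and reducer are small, independent functions, so the same question can in principle be spread over many machines and a much larger corpus without being rewritten.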

Conclusion

The dominant data sources that impact the effectiveness of a leisure travel ecommerce strategy are identified, named and classified. Developing this classification further, a model is used to create a framework to house the data sources and a concrete implementation suggested.

About the author: Humphrey is the Chief Technology Officer for Comtec Group, a company that specializes in leisure travel technology.

(1) has been a long time coming and it's good to see the log jam moving. Simply shipping JDK 7 is good in its own right but it also means that the team will move onto working on JDK 8, which contains some key language features omitted from JDK 7 so that the team could JGIOTFD (Just Get It Out The (reader exercise to complete the acronym)).

(2) looks to be Oracle really making the JEE stack cloud-based / cloud-friendly by default rather than a technology stack that merely facilitates cloud computing. This dynamic should see Oracle formalising exactly what constitutes "JEE in the cloud" via a JSR and thus wresting that intellectual responsibility back from Google's App Engine platform, which is pretty much the de facto standard for "JEE in the cloud" at present.

Looking beyond JEE 7, JEE 8 looks to be embracing Big Data / NoSQL systems like Hadoop and Cassandra, although we can expect to have seen significant consolidation in this space by 2013, making the integration and platform support task easier to accomplish.

All in all, two nice moves, and good news for the Java ecosystem / economy. You might or might not like Oracle, but they are getting stuff out the door in a way that Sun kind of forgot how to do.