What would you do with 5 million award search data points?

I love good data. Taking huge chunks of information and trying to distill trends, patterns and links has always been interesting to me. And so I find myself wondering this afternoon just what to do with a massive batch of data related to airline award searches. See, for about the past year (probably longer, actually) I’ve had a tool available online to allow people to search for awards on Star Alliance carriers. And those searches each return some collection of data. Over time the data collected added up and I now realize that I have more than 5 million rows of search results available.

And now I cannot help but wonder what I should do with it. Also, I’m not entirely sure I know how to tease the data out into something useful.

Are there trends in when seats are released or booked? Are certain months or routes really more likely to have seats available? More likely to be searched on?

What else? What types of information do you want me to try to pull out of the data?

No promises, as I’m not entirely sure I know where to begin with the analysis, but I’m definitely willing to give it a shot if anyone has a suggestion of something that seems like a useful query to run.

I’ve read a few recent posts about specific carriers tightening availability, particularly in premium cabins, and about tightening industry-wide in general. Something along those lines would be interesting. Maybe the % of queries returning an available option, by month and operating carrier?
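For what it’s worth, that query is simple to sketch once the rows are in memory. This is only a rough illustration: the field names (`search_date`, `carrier`, `seats_available`) are guesses at what a search-result row might contain, not the tool’s actual schema, and the sample rows are made up.

```python
from collections import defaultdict

def availability_by_month_and_carrier(rows):
    """Percent of searches that found at least one seat, per (month, carrier)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        # Assumes search_date is an ISO "YYYY-MM-DD" string, so [:7] is "YYYY-MM".
        key = (row["search_date"][:7], row["carrier"])
        totals[key] += 1
        if row["seats_available"] > 0:
            hits[key] += 1
    return {key: 100.0 * hits[key] / totals[key] for key in totals}

# Made-up rows purely for illustration; not real search results.
sample_rows = [
    {"search_date": "2012-01-05", "carrier": "LH", "seats_available": 2},
    {"search_date": "2012-01-19", "carrier": "LH", "seats_available": 0},
    {"search_date": "2012-01-20", "carrier": "NH", "seats_available": 1},
    {"search_date": "2012-02-02", "carrier": "LH", "seats_available": 0},
]

rates = availability_by_month_and_carrier(sample_rows)
for (month, carrier), pct in sorted(rates.items()):
    print(month, carrier, f"{pct:.1f}%")
```

With 5 million rows the same grouping would more likely be a GROUP BY in whatever database holds the data, but the logic is the same.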

The problem with seeing “what is searched” is that your tool is a very non-representative sample. People have to know about your tool, then choose to use it over other options like airline websites or Award Nexus. So search counts alone wouldn’t provide much information.

Availability information, on the other hand, is much more useful. Saying “what percentage of queries from the USA to NRT showed availability” is helpful. Again, it’s not going to be spot-on representative because of who uses your site. That said, the Switchfly study that everyone quotes is so horrible that pretty much anything from you would be an improvement.

I think a chart showing award availability % starting 331 days out, up to the day before departure, would be useful. It is probably equally important to break out FC and business. And finally, can you group availability by region rather than by specific routes, which are sometimes not that helpful?
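A rough sketch of how that chart’s numbers could be computed, assuming each row records when the search was run, the departure date, the cabin, and whether anything was available. The field names, the `REGION` lookup, and the 30-day bucket width are all placeholders chosen for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical airport-to-region lookup; a real one would cover every airport searched.
REGION = {"JFK": "North America", "SFO": "North America", "FRA": "Europe", "NRT": "Asia"}

def availability_curve(rows, bucket_days=30):
    """Percent of searches with availability, keyed by
    (origin region, destination region, cabin, days-out bucket)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        days_out = (row["departure_date"] - row["search_date"]).days
        # Bucket 330 means the search was run 330 to 359 days before departure.
        bucket = (days_out // bucket_days) * bucket_days
        key = (REGION[row["origin"]], REGION[row["destination"]], row["cabin"], bucket)
        totals[key] += 1
        if row["available"]:
            hits[key] += 1
    return {key: 100.0 * hits[key] / totals[key] for key in totals}

# Made-up rows purely for illustration.
sample_rows = [
    {"origin": "JFK", "destination": "FRA", "cabin": "first",
     "search_date": date(2012, 1, 1), "departure_date": date(2012, 11, 27), "available": True},
    {"origin": "JFK", "destination": "FRA", "cabin": "first",
     "search_date": date(2012, 1, 2), "departure_date": date(2012, 11, 27), "available": False},
    {"origin": "SFO", "destination": "NRT", "cabin": "business",
     "search_date": date(2012, 11, 20), "departure_date": date(2012, 11, 27), "available": True},
]

curve = availability_curve(sample_rows)
for key, pct in sorted(curve.items()):
    print(key, f"{pct:.0f}%")
```

Plotting one line per cabin and region pair, with the bucket on the x-axis, would give the 331-days-out-to-departure picture.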

I’m curious about the most common destinations searched. I’m convinced most frequent flyers look for tickets to Europe and Hawaii (again, not sure if your users are representative of most frequent flyers).

I’m sure it’s worth more than any of those award travel reports published online.
You get real people, looking for real flights, with real results. That’s a great deal of information on where and when people want to travel, and on who makes it happen with seats available.

Please do NOT make it publicly available unless you want to see consistent patterns of inventory dry up, like LH/LX F, which used to be a gimme. Great job building the data set, but keep it to yourself. It’s in your own self-interest.

Does your privacy policy cover the collection of this data and how it can be used or shared? Is it compliant with each individual country’s laws? Just curious. Wouldn’t want to see this data collection die on the vine due to legal/privacy concerns.

The irregular frequency with which your data is sampled poses a unique problem. If it’s possible, I think it may be beneficial to assume a much lower sampling frequency and decimate the extra data points in between to get some periodicity. Then you could do some neat things (and more quickly). You can always look at the higher resolution samples later if need be.
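One simple way to do that decimation, sketched under the assumption that each observation can be reduced to a (series, timestamp, value) tuple, where a "series" is something like one route, departure date and cabin. The one-sample-per-24-hours period and the keep-the-latest rule are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def decimate(samples, period_hours=24):
    """Keep only the latest sample per series in each fixed-length period.

    samples: iterable of (series_key, timestamp, value) tuples, in any order.
    Returns a list of (series_key, period_start, value), sorted by series and time.
    """
    period_seconds = period_hours * 3600
    latest = {}  # (series_key, slot) -> (timestamp, value)
    for series_key, ts, value in samples:
        slot = int((ts - EPOCH).total_seconds()) // period_seconds
        current = latest.get((series_key, slot))
        if current is None or ts > current[0]:
            latest[(series_key, slot)] = (ts, value)
    result = []
    for (series_key, slot), (_, value) in sorted(latest.items()):
        period_start = EPOCH + timedelta(seconds=slot * period_seconds)
        result.append((series_key, period_start, value))
    return result

# Made-up observations for one hypothetical series; value = seats found.
series = "SFO-NRT|2012-11-27|first"
samples = [
    (series, datetime(2012, 1, 1, 8, 0), 2),
    (series, datetime(2012, 1, 1, 20, 0), 1),
    (series, datetime(2012, 1, 3, 9, 0), 0),
]

daily = decimate(samples)
for row in daily:
    print(row)
```

Note that this thins out dense stretches but does not invent data: a period with no searches at all (January 2 in the sample) simply stays missing, so any analysis that needs strict periodicity would still have to decide how to treat those gaps.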

I like the open source idea. At least a few of us have training in statistics and might have some good ideas.

I’d start with a high-level analysis along the lines of North America to Asia. Look at the average rate of successful searches. If N.A. to Asia has high variation and N.A. to Europe has low variation, then break N.A.-Asia down into individual routes and see which ones are more successful than others. Put N.A.-Europe on hold.
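A minimal sketch of that first pass, reading "variation" as the spread of success rates across the routes within a region pair (it could just as well mean variation over time). The `region_pair`, `route` and `success` fields and the sample rows are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def success_rates_by_route(rows):
    """Return {region_pair: {route: fraction of searches that succeeded}}."""
    counts = defaultdict(lambda: [0, 0])  # (region_pair, route) -> [successes, searches]
    for row in rows:
        entry = counts[(row["region_pair"], row["route"])]
        if row["success"]:
            entry[0] += 1
        entry[1] += 1
    rates = defaultdict(dict)
    for (pair, route), (successes, searches) in counts.items():
        rates[pair][route] = successes / searches
    return rates

def summarize(rates):
    """Return {region_pair: (mean route success rate, std dev across routes)}."""
    return {
        pair: (mean(list(by_route.values())), pstdev(list(by_route.values())))
        for pair, by_route in rates.items()
    }

# Made-up rows purely for illustration.
sample_rows = [
    {"region_pair": "NA-Asia", "route": "SFO-NRT", "success": True},
    {"region_pair": "NA-Asia", "route": "SFO-NRT", "success": True},
    {"region_pair": "NA-Asia", "route": "LAX-PEK", "success": False},
    {"region_pair": "NA-Asia", "route": "LAX-PEK", "success": False},
    {"region_pair": "NA-Europe", "route": "JFK-FRA", "success": True},
    {"region_pair": "NA-Europe", "route": "JFK-FRA", "success": False},
    {"region_pair": "NA-Europe", "route": "ORD-LHR", "success": True},
    {"region_pair": "NA-Europe", "route": "ORD-LHR", "success": False},
]

rates = success_rates_by_route(sample_rows)
summary = summarize(rates)
# Region pairs with the largest spread are the ones worth breaking down by route.
for pair, (avg, spread) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
    print(pair, f"mean={avg:.2f}", f"spread={spread:.2f}")
```

In the made-up sample both region pairs average 50%, but only N.A.-Asia shows any spread between routes, so that is the one to drill into first.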