Oracle Blog

Thoughts, Tips, Rationale

Monday Jan 04, 2010

When asked about the accuracy of RTD's data mining algorithms, I often find myself explaining why I believe that RTD, as a system, is in most cases much more accurate than any offline data mining system. One reason for the enhanced accuracy is the ability to measure reality directly rather than trying to reconstruct it from disconnected data sources.

For example, assume that you are studying the acceptance of offers in a call center. One input that may be interesting is the length of the queue at the time of the call. In an offline exercise you would have to obtain the logs from the telephony queue, hope that they are accurate enough, hope that the clocks of the systems involved are synchronized, and then run a time-based query over the log records. In RTD the same thing is accomplished by simply querying the telephony queue for its current length at the time of the call. There is no need to hope that the data was collected properly, at the right granularity and with synchronized clocks. Because we are dealing with reality as it happens, we do not care if the clocks are all wrong.
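To make the contrast concrete, here is a minimal Python sketch of the two approaches. It is an illustration only and does not use any RTD API: the names `QueueLogEntry`, `queue_length_from_log`, `TelephonyQueue` and `current_length`, as well as the 15-second clock skew and the sample numbers, are all assumptions made up for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class QueueLogEntry:
    """One hypothetical telephony log record."""
    timestamp: float   # seconds, as stamped by the telephony system's own clock
    queue_length: int


def queue_length_from_log(log, call_time):
    """Offline reconstruction: return the queue length from the latest log
    entry stamped at or before call_time. Assumes log is sorted by timestamp.
    Returns None if no entry is early enough."""
    timestamps = [entry.timestamp for entry in log]
    index = bisect_right(timestamps, call_time)
    if index == 0:
        return None
    return log[index - 1].queue_length


class TelephonyQueue:
    """Hypothetical stand-in for a live telephony queue."""

    def __init__(self, waiting=0):
        self._waiting = waiting

    def current_length(self):
        # Real-time measurement: no timestamps are involved at all.
        return self._waiting


# What actually happened: (true time, queue length) pairs.
true_history = [(90.0, 1), (100.0, 3), (110.0, 7), (120.0, 2)]

# Assumption for the example: the telephony clock runs 15 seconds fast,
# so every log record carries a shifted timestamp.
CLOCK_SKEW = 15.0
log = [QueueLogEntry(t + CLOCK_SKEW, n) for t, n in true_history]

# The call arrives at true time 112, when 7 callers are actually waiting.
call_time = 112.0
offline_value = queue_length_from_log(log, call_time)

# In the real-time case we simply ask the queue while the call is happening.
live_queue = TelephonyQueue(waiting=7)
realtime_value = live_queue.current_length()

print("offline reconstruction:", offline_value)
print("real-time measurement: ", realtime_value)
```

With these made-up numbers the offline lookup picks the record stamped 105 and reports a queue length of 1, while the direct query reports the 7 callers actually waiting. The skew never enters the real-time path because no timestamps are compared there.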

The end result of the difficulty in reconstructing reality is that typical offline data mining studies work with a much narrower set of inputs than those typically seen in RTD implementations. In many cases, the difference in data availability more than makes up for any accuracy improvement gained from a manually crafted data mining model.

Just to complete the picture, I have to point out that I said "many cases" or "most cases" but not "all cases". The reason is that there are many good reasons to perform offline data mining, and in those cases it is worth investing in getting the data and the complex queries right. Examples include retention, lifetime value and, in some cases, product affinity models. There are also many areas for which RTD algorithms are not applicable, like data exploration, visualization and clustering.

Nevertheless, for predictive data mining applied to process improvement, it is hard to beat the real-time data collection capabilities of real-time analytics systems.