Wednesday, December 3, 2008

Reading Peter's last couple of posts got me thinking about a great TED conference presentation made by former Broadway pianist and NY Times tech columnist David Pogue. A couple of years ago he talked at TED about the design of technology and the importance of simplicity in design.

A lot of what we do as BI developers is design ways for people to mess around with, and learn from, information. One of the key principles behind doing this well is to ensure simplicity. A lot of what Tufte and other data visualisation experts talk about can be seen to derive from this principle, and as Peter said, despite the experimental and anecdotal evidence to support it, it's something the vendors often don't do well. One of the reasons for this is that it's just plain hard, and probably something that most software engineers are not very good at doing. Simple, elegant and intuitive interfaces for BI apps are not just aesthetically pleasing, they lead to better understanding on the part of decision makers, and creative uses of the tools that can lead to unexpected insights - which sounds awfully like the vendors' own jargon. I wish they'd listen to people like David Pogue a bit more.

It's a good time of year to be an academic. Nearly all the marking and teaching related administration is done for the year and, though next year is approaching fast, we do have some time to fully devote our attention to research. We have a lot of projects that are finishing up, which means it's time for us to get out and start collecting data for the next round of case studies and investigations. We are also doing some tidying up of our infrastructure. We have had to move out of a room we had devoted to project related activities, but that has given us a chance to throw some stuff out and generally get our "house" in order. For example, we have been updating and sorting out our files hosted on various servers. None of that has any direct impact on this blog, except we have run out of excuses not to extend our blog related activities a little. Shortly, we'll have a podcast featuring presentations and interviews by and with staff from the Centre. Another thing we will start to do is talk a bit more here on the blog about our published research. So let's start that right now ...

(The file is hosted on Science Direct and they own the copyright, so sorry if you can't access it. If you are on the Monash network, you'll be able to view it, or if you have a Monash authcate, try using the VPN. If you are at another Uni. you'll probably have a subscription)

I know that sounds a bit technical, and that it's not of interest if you aren't into forecasting or that worried about exponential growth, but actually, it's interesting beyond those areas. The paper presents an experiment we conducted where we asked subjects to forecast growth in iPod sales - which have been exponential. We conducted a similar study years ago, but used made up data. We thought it would be better to use a real exponential data series, so we re-did the study, this time using iPod sales as the data series to forecast. Now, it turns out humans are poor at forecasting exponential growth - there is a cognitive bias at work related to the anchoring and adjustment heuristic - which means we just don't pick up on the exponential nature of a data series, forecast growth as a straight line, and as a result underestimate the growth of exponential data.

In our experiment, we gave the subjects some historical quarterly data, and asked them to forecast 2 quarters out (we knew the "actuals" for the periods we were asking them to forecast).

The idea we designed the experiment to test is a simple one. If you take a log of exponential data, you get a straight line. Humans are good at doing straight line forecasting so we reasoned that if you take a log of exponential data, forecast based on that, you'll get a better forecast than if you just have the data in its 'raw' state. The conversion to log data and back is something a computer system - a DSS - can do nicely, so that's the basic shape of the experiment. All the detail is in the paper - as you'd expect there is a control group using a paper based version of the data, but we built a nice little tool to perform the forecast. You can click on a chart to make a forecast - and it shows you the number, or type in a number and it shows where that number is on the chart. One version of the tool had just the raw data, the other showed both the raw data and the log data.
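For anyone who wants to see the log trick in numbers, here is a minimal sketch in Python. It is not the tool from the paper: the series is made up (growing 50% per quarter, not the iPod figures), the function names are hypothetical, and a least-squares line stands in for the human forecaster. It fits a straight line to the raw data and to the logged data, then forecasts 2 quarters out with each.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_raw(ys, steps):
    """Straight-line forecast fitted directly to the raw series."""
    xs = list(range(len(ys)))
    slope, intercept = fit_line(xs, ys)
    return [slope * (len(ys) + k) + intercept for k in range(steps)]


def forecast_log(ys, steps):
    """Straight-line forecast fitted to log(y), converted back with exp."""
    xs = list(range(len(ys)))
    slope, intercept = fit_line(xs, [math.log(y) for y in ys])
    return [math.exp(slope * (len(ys) + k) + intercept) for k in range(steps)]


# Made-up quarterly series growing 50% per quarter (an assumption for
# illustration only, NOT the iPod sales data used in the experiment).
history = [100 * 1.5 ** t for t in range(8)]
actuals = [100 * 1.5 ** t for t in (8, 9)]

raw = forecast_raw(history, 2)
logged = forecast_log(history, 2)

for quarter, (a, r, l) in enumerate(zip(actuals, raw, logged), start=1):
    print(f"Q+{quarter}: actual {a:.0f}, raw-line {r:.0f}, log-line {l:.0f}")
```

On this made-up series the line fitted to the raw data falls well short of the actuals, while the line fitted to the logs lands on them. That only shows the arithmetic behind the idea - it says nothing about how real people behave in front of either chart, which is what the experiment was about.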

So, to the results ... the computer supported forecasts were better. Phew, often in these types of studies the DSS is of no help. In our case it was. However, the simpler version of the system did better than the version that had the log data - the opposite of what we expected. Our explanation is that the simple version encouraged experimentation, letting the users think a bit more about their forecast - exactly what you want a DSS to do. However, rather than helping, the more complex system with the log data intimidated the users, stopping them from experimenting, and as a result they made poorer forecasts.

So forget about forecasting and exponential growth, the main lesson from this study is keep the interface simple.

Friday, November 21, 2008

We really like the folks at Intalign and are doing a couple of research related activities together with them on an ad hoc basis. Right now they are doing a survey of Australian organisations to understand what they are doing with BI. Martin Kratky, one of the Directors at Intalign, just sent out this update and reminder to people on his list about the survey:

"So far, more than 50 major Australian companies have participated in this important study, held in conjunction with Monash University. To gain the broadest possible cross section of prominent Australian organisations, we would be very appreciative of your confidential input.

As a thank you for participating in this survey (max. 10 mins), we would like to offer you:

A free copy of the survey results to benchmark your BI processes including reporting, software, budget allocations and more against best practices and other survey participant's organisations.

A $150 voucher to be used for a one day executive workshop

We are still inviting professionals that are in charge of Business Intelligence processes in their organisation (with an annual turnover of at least $20 million) to take part in the survey. To participate, please send an email to: survey@intalign.com."

If you fit the demographic - an Australian organisation with turnover > $20 mil - and have a spare 10-15 minutes, why not drop them an e-mail and get the link to the survey?

Friday, November 14, 2008

Now I need to qualify this post - which is in response to a vendor presentation I went to during the week - by saying that the vendor - Cognos - is, in my opinion, one of the good ones. Their technology is pretty good, maybe still a few performance issues with large numbers of users but generally in great shape. I really like the new report meta-data feature (lineage they call it) in version 8.4 of their core reporting tool. That will be quickly copied by the other vendors.

But, and I mean BUT, I saw a demo of a guided report writing tool - intended to help end-users build their own reports without IT support. All the vendors offer something like this, and none of them work very well.

I counted clicks during the demonstration. The presenter went through a standard demo, just building a simple financial report with revenue, target and variance shown. It took 31 clicks and 7 drag and drop actions to build the report. The presenter didn't make a single mistake, knew the software backwards and knew where the data was, and it still took 31 clicks and 7 drag and drops. NO. That is not an end-user report creation tool.

Arrrgggghhhhh.

POD

P.S. They also had 3D pie charts with the middle taken out to make them look like donuts. Why?

Tuesday, October 28, 2008

I have an iPhone and I love it. I hate phones, and I hate mobile phones in particular ... but my iPhone is different. Really. The reaction I get when people see it reminds me of the "old" days when Windows was young and DOS ruled and Macs were the only platform that really implemented a WIMP interface (only when Windows became better accepted - really after the release of Windows 95 - did mainstream people start calling the bit-mapped, mouse driven, desktop-metaphor based graphic interface a GUI).

DOS folks (and I was one once too!) would laugh at me and complain that I was letting the computer make decisions for me, that I was missing out on the power of commands like "Copy *.*" - look at all the clicks you have to do to copy all your files ... and on it would go.

Many early EIS systems were Mac-based. To write for DOS meant the vendors had to create their own GUI - a non-trivial exercise - so the Mac was a preferred platform. (In 1990 we evaluated Pilot, Comshare and Holos - all had Mac versions that were heaps better than the DOS versions.)

Along came Windows 3.11 and then Windows 95 and the game changed.

The Desktop metaphor was created at Xerox PARC in the 70s. It is still with us. In a less defined and less consistent way our interfaces are now also dominated by a document metaphor (partly due to the take up of the Web).

The reason why the iPhone is so cool - and so interesting to me - is not that it's a phone, or that it has a touch screen, or GPS, or a camera or whatever other feature it has - all of those are neat, but as my Windows Mobile/Blackberry/Palm/HTC loving friends point out, there are other phones with similar or better technical feature sets. The thing is, it's easy to use - and it's not because it has a great screen and a touch interface. The interface is based on a "new" metaphor. Information is displayed on a surface - the information becomes the interface. That's the revolution. Many of my friends don't get that.

Sadly, at this stage, it seems very few BI vendors get it either. This metaphor - and the technology that supports it - is perfect for BI systems. Watch this video from Edward Tufte - he's talking about the iPhone as a phone - forget that and just think about what he says about the interface. He's describing an almost perfect way of presenting and exploring data. As he explains it, the iPhone is a dramatic new paradigm. It will be copied and probably improved upon - but not by phone vendors who think squeezing a menu based system onto a touch screen is an appropriate interface for a mobile device.

I hope that one day the BI vendors will understand too. I wonder if, as before, we will have to wait for Microsoft (who are playing with this new metaphor with their Surface product) to build a new version of Windows before the BI world makes a real change for the better.

Wednesday, July 16, 2008

Seth Godin, professional agent provocateur, has caused something of an outcry amongst data visualisation aficionados in a blog post lambasting the use of bar charts in presentations. Although his heart is in the right place, Godin's somewhat simplistic prescriptions for improving communication effectiveness miss the mark. Perhaps his greatest crime is to advocate the use of pie charts, which, as we know, are as "professional as a pair of assless chaps." The boys at Juice Analytics and Stephen Few have the scoop.

Friday, May 30, 2008

A while ago in a previous post, I referred to an interview with Neil Raden entitled "Is Business Intelligence Stuck in the Past?" where he talked about how most business users today are more tech-savvy than the IT department. Peter O'Donnell just sent me through a link to a Gartner Voice interview with Peter Keen (one of the founding fathers of DSS/BI) where he makes essentially the same point:

IT is in danger of becoming the technology laggard.

Tune in here to listen to the interview. You can also subscribe to the Gartner Voice podcast in iTunes (the interview with Keen is the most recent episode).

Thursday, April 17, 2008

Peter O'Donnell and I are currently supervising an honours student who is looking at the issue of data warehouse security, with a view to doing a survey of DW security practices in Australian companies. It's still early days, but one of the things that Justin has found is that there is very little literature (academic or otherwise) talking about the issue (either highlighting problems, or outlining best practice). This is both good and bad news: it means that Justin will be making a real contribution, but he's going to have trouble writing the literature review section of his thesis!

To give you some idea of where our thinking is at, here's a generic architecture for the flow of information through a data warehouse:

Each component of the diagram above is a potential security problem. Just the ETL process, for example, poses problems of massive amounts of data moving around a network, taken out of what is presumably an initially secure environment. We've found very little that talks about securing the individual components of the architecture, or of taking an holistic view and securing the whole process, end-to-end. On the flip-side, security often poses a problem from a functionality or performance perspective - what can we do to make the whole thing as responsive and functional as possible while still protecting an important organisational asset?

Any thoughts, war stories, pointers to resources or comments would be appreciated!

Wednesday, March 19, 2008

Last year I co-authored a book chapter with two other colleagues, Peter O'Donnell and David Arnott, on the use of data warehouses for decision support, and it's just recently been published. The book is called Handbook on Decision Support Systems edited by Frada Burstein (another Monash colleague) and Clyde Holsapple. One section of the chapter that I wrote looked at current trends in DW practice, and I thought, as I wrote it in late 2006, that it would probably be better as a blog post, than part of a chapter in a (hopefully long-lived) book. Here's the excerpt. I'd be interested to hear what other people think are the big trends in DW and where it's headed.

Current Trends and the Future of Data Warehousing Practice

Forecasting future trends in any area of technology is always an exercise in inaccuracy, but there are a number of noticeable trends which will have a significant impact in the short-to-medium term. Many of these are a result of improvements and innovations in the underlying hardware and database management system (DBMS) software. The most obvious of these is the steady increase in the size and speed of data warehouses connected to the steady increase in processing power of CPUs available today, improvements in parallel processing technologies for databases, and decreasing prices for data storage. This trend can be seen in the results of Winter Corporation's "Top Ten Program," which surveys companies and reports on the top ten transaction-processing and data warehouse databases, according to several different measures. Figure 11 depicts the increase in reported data warehouse sizes from the 2003 and 2005 surveys (2007 data has not yet been released):

The data warehousing industry has seen a number of recent changes that will continue to have an impact on data warehouse deployments in the short-to-medium term. One of these is the introduction by several vendors, such as Teradata, Netezza and DATAllegro, of the concept of a data warehouse 'appliance' (Russom, 2005). The idea of an appliance is a scalable, plug-and-play combination of hardware and DBMS that an organization can purchase and deploy with minimal configuration. The concept is not uncontroversial (see Gaskell, 2005 for instance), but is marketed heavily by some vendors nevertheless.

Another controversial current trend is the concept of 'active' data warehousing. Traditionally, the refresh of data in a data warehouse occurs at regular, fixed points of time in a batch-mode. This means that data in the data warehouse is always out of date by a small amount of time (since the last execution of the ETL process). Active data warehousing is an attempt to approach real-time, constant refreshing of the data in the warehouse: as transactions are processed in source systems, new data flows through immediately to the warehouse. To date, however, there has been very limited success in achieving this, as it depends on not just the warehouse itself, but performance and load on source systems to be able to handle the increased data handling. Many ETL processes are scheduled to execute at times of minimal load (eg. overnight or on weekends), but active warehousing shifts this processing to peak times for transaction-processing systems. Added to this are the minimal benefits that can be derived from having up-to-the-second data in the data warehouse, with most uses of the data not so time-sensitive that decisions made would be any different. As a result, the rhetoric of active data warehousing has shifted to "right-time" data warehousing (see Linstedt, 2006 for instance), which relaxes the real-time requirement for a more achievable 'data when it's needed' standard. How this right-time approach differs significantly in practice from standard scheduling of ETL processing is unclear.

Other than issues of hardware and software, a number of governance issues are introducing change to the industry. One of these is the prevalence of outsourcing information systems - in particular the transaction-processing systems that provide the source data for warehouse projects. With many of these systems operated by third party vendors, governed by service level agreements that do not cover extraction of data for warehouses, data warehouse developers are facing greater difficulties in getting access to source systems. Arnott (2006) describes one such project where the client organization had no IT staff at all, and all 13 source systems were operated off-site. The outsourcing issue is compounded by data quality problems, which are a common occurrence. Resolution of data quality problems is difficult even when source systems are operated in-house: political confrontations over who should pay for rectifying data quality problems, and even recognition of data quality as a problem (in many cases, it's only a problem for data warehouse developers, as the transaction processing system that provides the source data is able to cope with the prevailing level of data quality) can be difficult to overcome. When the system is operated off-site and in accordance with a contractual service level agreement that may not have anticipated the development of a data warehouse, these problems become even more difficult to resolve.

In addition to the issues of outsourcing, alternative software development and licensing approaches are becoming more commonplace. In particular, a number of open source vendors have released data warehousing products, such as Greenplum's Bizgres DBMS (also sold as an appliance) based on the Postgres relational DBMS. Other open source tools such as MySQL have also been used as the platform for data warehousing projects (Ashenfelter, 2006). The benefits of the open source model are not predominantly to do with the licensing costs (the most obvious difference to proprietary licensing models), but rather have more to do with increased flexibility, freedom from a relentless upgrade cycle, and varied support resources that are not deprecated when a new version of the software is released (Wheatley, 2004). Hand-in-hand with alternative licensing models is the use of new approaches to software development, such as Agile methodologies (see http://www.agilealliance.org) (Ashenfelter, 2006). The adaptive, prototyping oriented approaches of the Agile methods are probably well suited to the adaptive and changing requirements that drive data warehouse development.

The increased use of enterprise resource planning (ERP) systems is also having an impact on the data warehousing industry at present. Although ERP systems have quite different design requirements to data warehouses, vendors such as SAP are producing add-on modules (SAP Business Warehouse) that aim to provide business intelligence-style reporting and analysis services without the need for a separate data warehouse. The reasoning behind such systems is obvious: since an ERP system is an integrated tool capturing transaction data in a single location, the database resembles a data warehouse, insofar as it's a centralized, integrated repository. However, the design aims of a data warehouse that dictate the radically different approach to data design described above in Sections 3.1 and 4 mean that adequate support for management decision-making requires something other than simply adding a reporting module to an ERP system. Regardless, the increased usage of ERP systems means that data warehouses will need to interface with these tools more and more. This will further drive the market for employees with the requisite skill set to work with the underlying data models and databases driving common ERP systems.

Finally, Microsoft's continued development of their Microsoft SQL Server database engine has produced a major impact on Business Intelligence vendors. Because of Microsoft's domination of end-users' desktops, it is able to integrate its BI tools with other productivity applications such as Microsoft Excel, Microsoft Word and Microsoft PowerPoint with more ease than its competitors. The dominance of Microsoft on the desktop, combined with the pricing of SQL Server, and the bundling of BI tools with the DBMS means that many business users already have significant BI infrastructure available to them, without purchasing expensive software from other BI vendors. Although SQL Server has been traditionally regarded as a mid-range DBMS, not suitable for large-scale data warehouses, Microsoft is actively battling this perception. They recently announced a project to develop very large data warehouse applications for an external and an internal client, to handle data volumes up to 270 terabytes (Computerworld, 2006). If Microsoft are able to dispel the perception that SQL Server is only suited for mid-scale applications, it will put them into direct competition with large-scale vendors such as Oracle, IBM and Teradata, with significantly lower license fees. Even if this is not achieved, the effect that Microsoft has had on business intelligence vendors will flow through to data warehousing vendors, with many changes being driven by perceptions of what Microsoft will be doing with forthcoming product releases.

Friday, February 29, 2008

Just when you thought you'd seen every stupid data visualisation trick out there, someone invents the "magic pie chart" and the rotating "statistical lazy susan." The US cable news networks are outdoing themselves during the US presidential primaries, and breaking just about every rule in the data visualisation book. BI vendors, eat your hearts out! Check out this gem from The Daily Show with Jon Stewart: