Latest opinions from one of Europe's foremost authorities on BI and Data Management


Welcome to my Blog on Building the Smart Business. This blog looks at all the areas that need to be addressed so that companies can transition themselves to being an agile event driven optimised business. To do this we need several building blocks.

Big Data Platforms and Complex Analytics

Data Governance and Enterprise Information Management

Master Data Management

Process oriented Operational BI via On-demand Analytics

Event Processing and Automated Decisioning Management

Collaborative, Social and Mobile BI

CPM Strategy Management

Cloud Computing

I will discuss all of these areas in my blogs and ask you to comment on how you are using these technologies in your organisation.

For more information on Intelligent Business Strategies and how we can help you, please click here

The arrival of Big Data has seen a more complex analytical landscape emerge, with multiple data stores beyond the traditional data warehouse. At the centre of this is Hadoop. However, other platforms like data warehouse appliances, NoSQL graph databases and stream processing platforms have also pushed their way into the analytical landscape. Now that these new platforms together form the analytical ecosystem, it is clear that managing data in a big data environment has become more challenging. It is not just because of the arrival of new kinds of data store. The characteristics of Big Data, like volume, variety and velocity, also make big data more challenging to manage. The arrival of data at very high rates and the sheer volume of big data mean that certain data management activities need to be automated instead of manual. This is especially the case in data lifecycle management.

In any data life cycle, data is created, shared, maintained, archived, retained and deleted. This also occurs in a big data environment. But when you add characteristics like volume, variety and velocity into the mix, the reality of managing big data throughout its lifecycle really challenges the capability of existing technology.

In the context of big data lifecycle management, a key question is: what is the role of Hadoop? Of course Hadoop could play several roles. Popular ones include:

Hadoop as a landing zone

Hadoop as a data refinery

Hadoop as a data hub

Hadoop as a Data Warehouse archive

Hadoop as a landing zone and Hadoop as a data refinery fit well into the data lifecycle between CREATE and SHARE. With big data, what happens between these two stages is a challenge. Before data is shared it needs to be trusted. That means that data needs to be captured, profiled, cleaned, integrated and refined, and sensitive data identified and masked to prevent it from being visible to unauthorised users.

In the context of Hadoop as a landing zone, big data may arrive rapidly. Therefore technologies like stream processing are needed to filter data in motion and capture only the data of interest. Once captured, volume and velocity could easily make it impossible to profile the data manually, as would be typical in a data warehouse staging area. Therefore automated data profiling and relationship discovery are needed so that attention is drawn quickly to data that needs to be cleaned. Data cleansing and integration also need to exploit the power of Hadoop MapReduce for performance and scalability in ETL processing in a big data environment.
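As a rough sketch of that filtering step, plain Python can stand in for a real stream processing engine; the event shapes and the patterns of interest below are invented for illustration:

```python
import re

def filter_stream(events, patterns):
    """Yield only events whose 'type' matches one of the patterns of interest.

    A toy stand-in for a stream processing filter: in a real landing zone this
    logic would run inside a streaming engine, not plain Python.
    """
    compiled = [re.compile(p) for p in patterns]
    for event in events:
        if any(rx.match(event.get("type", "")) for rx in compiled):
            yield event

# Capture only click and purchase events from a mixed stream.
stream = [
    {"type": "click", "user": "u1"},
    {"type": "heartbeat"},
    {"type": "purchase", "user": "u2"},
]
captured = list(filter_stream(stream, [r"click", r"purchase"]))
```

The same idea scales to filtering in motion: only records that pass the filter ever land in the zone.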

Once big data is clean we can enter the data refinery, which is of course where we see the use of Hadoop as an analytical sandbox. Several analytical sandboxes may be created and exploratory analysis performed to identify high value data. At this point, auditing and security of access to big data need to be managed to monitor exactly what activities are being performed on the data, so that unauthorised access to analytical sandboxes and the data within them is prevented. If master data is brought into Hadoop during the refining process to analyse big data in context, then it must be protected, and sensitive master data attributes like personal information need to be masked before they are brought into a data refinery to add context to exploratory analysis tasks.
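A minimal sketch of masking sensitive master data attributes before they enter a refinery. The field names and the hash-based surrogate approach are illustrative assumptions, not a description of any particular product:

```python
import hashlib

def mask_record(record, sensitive_fields):
    """Replace sensitive attribute values with a truncated one-way hash so
    records can still be joined on the masked value without exposing the
    original personal information."""
    masked = dict(record)
    for field in sensitive_fields:
        if field in masked:
            digest = hashlib.sha256(str(masked[field]).encode()).hexdigest()
            masked[field] = digest[:12]  # surrogate value, not reversible
    return masked

customer = {"id": 42, "name": "Jane Doe", "email": "jane@example.com"}
safe = mask_record(customer, ["name", "email"])
```

Because the hash is deterministic, two masked copies of the same customer still match, which preserves joinability during exploratory analysis.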

Once data refining has been done, new data can be published for authorised consumption. That is when Hadoop takes on the role of a data hub. Data can be moved from the data hub into Hadoop HDFS, Hadoop Hive, data warehouses, data warehouse analytical appliances and NoSQL databases (e.g. graph databases) to be combined with other data for further analysis and reporting. At this point we need built-in data governance whereby business users can enter the data hub and subscribe to receive data sets and data subsets in a format they require. In this way even self-service access to big data is managed and governed.
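The subscription idea, delivering the same published data set in whatever format a consumer requested, can be sketched as below. The formats and row shapes are invented for illustration:

```python
import csv
import io
import json

def export_dataset(rows, fmt):
    """Render a published data set in the format a subscriber requested,
    a toy version of format-on-delivery in a governed data hub."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")

rows = [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.4}]
as_json = export_dataset(rows, "json")
as_csv = export_dataset(rows, "csv")
```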

Going further into the lifecycle, when data is no longer needed or used in data warehouses, we can archive it. Rather than archiving to tape and taking the data offline, Hadoop gives us the option to keep it online by archiving it to Hadoop. However, if sensitive data in a data warehouse is archived, we must make sure masking is applied before it is archived to protect it from unauthorised use. Similarly, if we archive data to Hadoop, we may need to archive the data from several databases (e.g. data warehouses, data marts and graph databases) to preserve integrity and make a clean archive.

Finally, there will be a time when we delete data. Even with storage being cheap, the idea that all data will be kept is just not going to happen. We need policies to govern that, of course. Furthermore, similar to archiving, these policies need to be applied across the whole analytical ecosystem, with the added complexity that big data means these tasks need to be executed at scale. These are just some of the things to think about in governing and managing the big data life cycle. Join me on IBM’s Big Data Management Tweet chat on Wednesday April 16th at 12:00 noon EST to discuss this in more detail.
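A retention policy of this kind can be expressed as data, then evaluated mechanically across every store in the ecosystem. The data classes and retention periods below are invented examples, not recommendations:

```python
from datetime import date, timedelta

# Hypothetical retention policies, in days, per class of data.
RETENTION_DAYS = {"weblogs": 90, "transactions": 2555, "sensor": 30}

def expired(dataset, created, today):
    """True if a data set has outlived its retention policy."""
    limit = RETENTION_DAYS.get(dataset["class"])
    if limit is None:
        return False  # no policy defined: keep until one is
    return (today - created) > timedelta(days=limit)

today = date(2014, 4, 1)
datasets = [
    {"name": "clicks_jan", "class": "weblogs"},
    {"name": "orders_2010", "class": "transactions"},
]
to_delete = [d["name"] for d in datasets
             if expired(d, date(2013, 1, 1), today)]
```

The point of encoding the policy once is that the same rules can then be applied consistently to the warehouse, to Hadoop and to any other store, which is exactly what executing governance at scale requires.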

Two weeks ago I attended the popular Strata conference in Santa Clara, California where, frankly, the momentum behind Big Data was nothing short of unstoppable. 3,100 delegates poured into the Santa Clara Convention Centre to see a myriad of big data technologies. Things that stood out for me include the massive interest in the new Apache Spark in-memory framework that runs on top of Hadoop 2.0 YARN, and the SQL-on-Hadoop wars that broke out with every vendor claiming they were faster than everyone else. The momentum behind Spark was impressive, with vendors including Cloudera and Hortonworks now running Spark on their Hadoop distributions.

The tables below show the SQL-on-Hadoop initiatives.

Some of the sessions on SQL on Hadoop were a little disappointing, as they focused far too much on query benchmarks rather than the challenges of using SQL to access complex data such as JSON data, text data or log data. Log data is of course very much in demand at present to provide insight into online behaviour. In addition, what about multiple concurrent users accessing Hadoop data via SQL? It is clear that the in-memory Shark on Spark (Hive on Spark) initiative coming out of AMPLab at UC Berkeley is looking to address this.

Pure-play Hadoop vendors Cloudera, Hortonworks and MapR were all out in force. In addition, there were new self-service data management tools like Paxata and Trifacta, who are aiming their products at data scientists and business analysts. This is fuelling a trend where users of these tools and self-service BI tools are now getting the power to clean and prepare their own data rather than using enterprise data management platforms from vendors like IBM, Informatica, SAP, SAS, Global IDs and Oracle. I have already blogged about this in my last post. Also, new visualization vendors like Zoomdata dazzled everyone with their ‘Minority Report’ demo and virtual reality demos. Then of course there are the giants: IBM, Oracle, SAP and Microsoft. Microsoft has integrated Excel with its HDInsight Hadoop distribution and its HDInsight on Windows Azure. Meanwhile, IBM’s recent Watson announcement shows Big Blue’s commitment not just to run analytics and BI tools against big data but to move beyond that into cognitive technologies on top of its big data platform, with the emergence of Watson Foundations, Watson Explorer, Watson Analytics and Watson Discovery Server.

With all this technology (apologies to other vendors not mentioned), it is not surprising people are feeling somewhat overwhelmed when it comes to putting together a big data strategy. Of course it is not just about technology. Questions about business case, roles, skills, architecture, new technology components, data governance, best practices, integration with existing technology, pitfalls to avoid and much more all need to be answered. Therefore please join me on Twitter on Wednesday March 5th on IBM’s Big Data Tweetchat at 12:00 ET to discuss “Creating a Big Data Strategy”.

There is no doubt that today self-service BI tools have well and truly taken root in many business areas, with business analysts now in control of building their own reports and dashboards rather than waiting on IT to develop everything for them. Using data discovery and visualisation tools like Tableau, QlikView, TIBCO Spotfire, MicroStrategy and others, business analysts can produce insights and publish them for information consumers to access via any device and subsequently act on. As functionality deepens in these tools, most vendors have added the ability to access multiple data sources so that business analysts can ‘blend’ data from multiple sources to answer specific business questions. Data blending is effectively lightweight data integration. However, this is just the start of it in my opinion. More and more functionality is being added to self-service BI tools to strengthen this data blending capability, and I can understand why. Even though most business users I speak to would prefer not to integrate data (much in the same way they would prefer not to write SQL), the fact of the matter is that the increasing number of data stores in most organizations is driving the need for them to integrate data from multiple data sources. It could be that they need data from multiple data warehouses, from personal and corporate data stores, from big data stores and from a data mart, or some other combination. Given this is the case, what does it mean in terms of impact on enterprise data governance? Let’s take a look.

When you look at most enterprise data governance initiatives, they are run by centralized IT organizations with the help of part-time data stewards scattered around the business. Most start with a determined effort to standardize data. This is typically done by establishing common data definitions for core master data, reference data (e.g. code sets), transaction data and metrics. Master data and reference data in particular are regularly the starting point for enterprise data governance, because this data is so widely shared across many applications. Once a set of common data definitions has been defined (e.g. for customer data), the next step is to discover where the disparate data for each entity (e.g. customer) resides in the data landscape. This requires data and data relationship discovery to find all instances of the same data across the landscape. Once the disparate data has been found, it becomes possible to map it back to the common data definitions, profile it to determine its quality, and then define any rules needed to clean, transform and integrate that data to get it into a state that is fit for business use. Central IT organisations typically use a suite of data management tools to do this. Not only that, but all the business metadata associated with common data definitions, and the technical metadata that defines data cleansing, transformation and integration, is typically recorded in the data management platform metadata repository. People who are unsure of where data came from can then view that metadata lineage to see where a data item originated and how it was transformed.
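The profiling step described above can be pictured with a toy column profiler. The summary statistics chosen here (null rate, distinct count, inferred types) are a common but illustrative subset of what real profiling tools report:

```python
def profile_column(values):
    """Basic automated profiling of one column: null rate, distinct count and
    inferred types, the kind of summary that flags data needing cleansing."""
    non_null = [v for v in values if v not in (None, "")]
    types = {type(v).__name__ for v in non_null}
    return {
        "null_rate": round(1 - len(non_null) / len(values), 2) if values else 0.0,
        "distinct": len(set(non_null)),
        "types": sorted(types),
    }

# Profile a country-code column containing missing values.
stats = profile_column(["GB", "DE", None, "GB", ""])
```

A 40% null rate on a supposedly mandatory attribute is exactly the kind of finding that draws a steward's attention before any mapping to common definitions is attempted.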

Given this is the case, what then is the impact of self-service BI, given that business users are now in a position to define their own data names and integrate data on their own using a completely different type of tool from those provided in a data management platform? Well, it is pretty clear that even if central IT do a great job of enterprise data governance, the impact of self-service BI is that it is moving the goal posts. If self-service BI is ungoverned, it could easily lead to data chaos, with every user creating reports and dashboards with their own personal data names and every user doing their own personal data cleansing and integration. Inconsistency could reign and destroy everything an enterprise data governance initiative has worked for. So what can be done? Are we about to descend into data chaos? Well, first of all, self-service BI tools are now starting to record / log a user’s actions on data during data blending to capture exactly how he or she has manipulated it. That is of course a good thing. In other words, self-service BI tools are starting to record metadata lineage – but in their own repository and not that of a data management platform. Reports on lineage are also available in some self-service BI tools already. A good example here is QlikView, which does support metadata lineage and can report on what has happened to data. However, there appear to be no standards here for metadata import from existing data management platform repositories to re-use data definitions and transformations (other than XMI for basic interchange, and even then there is no guarantee that interchange occurs). Other self-service BI tool users may be able to re-use data transformations defined by a different user but, as far as I can see, this is only possible when the same tool is being used by all users.
The problem here is that there appears to be no way to plug self-service BI into enterprise data governance initiatives and certainly no way to resolve conflicts if the same data is transformed and integrated in different ways by central IT on the one hand using a data management toolset and by business users on the other using a self-service BI tool. If the self-service BI tool and the data management platform are from the same vendor you would like to think they would share the same repository but I would strongly recommend you check this as there is no guarantee.
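The action-logging idea behind this kind of lineage capture can be sketched as follows. The `BlendSession` class and its operations are hypothetical; real self-service BI tools keep this lineage in their own proprietary repositories:

```python
class BlendSession:
    """Record each transformation a user applies during data blending so the
    resulting lineage can be inspected later. A toy sketch, not any vendor's
    actual mechanism."""

    def __init__(self, source):
        self.data = source
        self.lineage = [f"load({len(source)} rows)"]

    def apply(self, name, fn):
        """Apply a named transformation and append it to the lineage log."""
        self.data = fn(self.data)
        self.lineage.append(name)
        return self

session = BlendSession([{"amt": 10}, {"amt": -3}, {"amt": 7}])
session.apply("drop_negative_amounts",
              lambda rows: [r for r in rows if r["amt"] >= 0])
session.apply("total", lambda rows: sum(r["amt"] for r in rows))
```

Anyone inspecting `session.lineage` afterwards can see exactly how the published number was derived, which is the value of lineage; the unsolved problem is that this log lives only inside the one tool that produced it.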

The other issue is how to create common data names. I see no way to drive consistency across both self-service BI tools (where reports and dashboards are produced) and centralised data governance initiatives that use a business glossary, especially if both technologies are not from the same vendor. Again, even if they are, I would strongly recommend you check that integration between a vendor’s self-service BI tool and the same vendor’s data management tool suite is in place.

The third point to note is that BI development is now happening ‘outside in’, i.e. in the business first and then escalated up into the enterprise for enterprise-wide deployment. I have no issue with this approach, but if this is the case then enterprise data governance initiatives starting at the centre and driving re-use out to the business units are diametrically opposed to what is happening in self-service BI. Ideally what we need is data governance from both ends, and the ability to share common data definitions to get re-use of data names and data definitions, as well as common data transformation and integration rules, to get re-use across both environments. In reality, however, this is not yet happening, because stand-alone self-service BI tool vendors have not implemented metadata integration across BI and heterogeneous data management tool suites. Today I regrettably have to say that, in my honest opinion, it is not there. This lack of integration spells only one thing: re-invention rather than re-use. Self-service BI tool vendors are determined to give all power to the business users and so, like it or not, self-service data integration is here to stay. And while these tool vendors are rightly recording what every user does to data to provide lineage, there are no metadata sharing standards between heterogeneous data management platforms (e.g. Actian (formerly Pervasive), Global IDs, IBM InfoSphere, Informatica, Oracle, SAP BusinessObjects, SAS (formerly DataFlux)) and heterogeneous self-service BI tools. If this is the case, it is pretty obvious that re-use of common data definitions and transformations is not going to happen across both environments. The only chance is if both the data management platform and the self-service BI tools are out of the same stable, but if you are looking for heterogeneous BI tool integration then it is not guaranteed as far as I can see.
All I can recommend right now is that if you are tackling enterprise data governance, go and see your business users and educate them on the importance of data governance to prevent chaos, until we get metadata sharing across heterogeneous self-service BI tools and data management platforms. If you want to learn more about this, please join me for my Enterprise Information Management masterclass in London on 26-28 February 2014.

The arrival of Big Data is having a dramatic impact on many organizations in terms of deepening insight. However, it also has an impact on the enterprise that drives a need for data governance. Big Data introduces:

New sources of information

Data in motion as well as additional data at rest

Multiple analytical data stores in a more complex analytical environment (with some of these data stores possibly being in the cloud)

The data landscape is therefore becoming more complex. There are more data stores and each big data analytical platform may have a different way to store data sometimes with no standards.

Despite this more complex environment there is still a need to protect data in a Big Data environment, but doing so is made more difficult by the new characteristics of data volume, variety and velocity. Rich sets of structured and multi-structured data brought into a big data store for analysis may easily attract cyber criminals if sensitive data is included. Data sources like customer master data, location sensor data from smart phones, customer interaction data, on-line transaction data, e-commerce logs and web logs may all be brought into Hadoop for batch analytical reasons. Security around this kind of big data may therefore be an issue. In such a vast sea of data we need technology to automatically discover and protect sensitive data. It is also not just the data that needs to be protected. Access to Big Data also needs to be managed, whether that be data in analytical sandboxes, in distributed file systems, in NoSQL DBMSs or in analytical databases. Knowing that information remains protected even in this environment is important. In MapReduce applications, low level programming APIs need to be monitored to control access to sensitive data. Also, access control is needed to govern people with new analytical tools so that only authorised users can access sensitive data in analytical data stores. Compliance may dictate that some data streams, files and file blocks holding sensitive data are also protected, e.g. by encrypting and masking sensitive data.
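Automated sensitive-data discovery is often pattern-driven at its simplest. The sketch below uses two invented regular-expression detectors to scan records and report which fields look sensitive; real discovery tools use far richer classifiers:

```python
import re

# Illustrative detectors only; production tools combine many such patterns
# with dictionaries, checksums and statistical classifiers.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def discover_sensitive(records):
    """Scan field values and report (field, label) pairs that look sensitive,
    a toy version of automated sensitive-data discovery."""
    findings = set()
    for record in records:
        for field, value in record.items():
            for label, rx in SENSITIVE_PATTERNS.items():
                if rx.search(str(value)):
                    findings.add((field, label))
    return findings

hits = discover_sensitive([
    {"note": "contact jane@example.com", "ref": "order-1"},
])
```

Flagged fields can then be routed automatically to masking or encryption before any analyst sees them.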

In addition a new type of user has emerged – the data scientist. Data scientists are highly skilled power users who need a secure environment where they can explore un-modelled multi-structured data and/or conduct complex analyses on large amounts of structured data.

However, sandbox creation and access need to be controlled, as does data going into and coming out of these sandboxes.

If we can use software technology to automatically discover and protect big data then confidence in Big Data will grow.

I shall be discussing Big Data governance in more detail at my upcoming Big Data Multi-Platform Analytics class in London on October 17-18. Please register here if you want to attend.

In my last blog I looked at big data governance and how it produces confidence in structured and multi-structured data that data scientists want to analyse. I would like to continue that theme in this blog by looking at what is happening in the area of Big Data governance in a little more detail. Over the last two years we have seen several data management software vendors extend their products to support Big Data platforms like Hadoop. Initially this started out as supporting it as both a target to provision data for exploratory analysis and as a source to move derived insights from Hadoop into data warehouses. However several vendors have evolved their data cleansing and integration tools to exploit Hadoop by implementing ELT processing on that platform much like they did on data warehouse systems. Scalability and cost are major reasons for this. This has prompted some organizations to consider loading all data into a Hadoop cluster for ELT processing via generated 3GL, Hive or Pig ELT jobs running natively on a low cost Hadoop cluster. (see below)

Several vendors have now added new tools to parse multi-structured data as well as MapReduce transforms to run data cleansing and data integration ELT processing on large volumes of multi-structured data on Hadoop.

Extending data governance and data management platforms to exploit scalable Big Data platforms not only allows customers to get more out of existing investment but also improves confidence in Big Data among data scientists and business analysts who want to undertake analysis of that data to produce valuable new insight. Of course developers could do this themselves using Hadoop HDFS APIs. However, given that many data cleansing and integration tools also provide support for metadata lineage, it means that data scientists and business analysts working in a Big Data environment have access to metadata to allow them to see how Big Data has been cleaned and transformed en route to making that data available for exploratory analysis. This kind of capability just breeds confidence in the use of data. In addition there is nothing to stop Data Scientists making use of these tools by exploiting pre-built components and templates. Having workflow based data capture, preparation and even analytical tools available in a Big Data environment also improves productivity when programming skills are lacking.

I shall be discussing Big Data governance in more detail at my upcoming Big Data Multi-Platform Analytics class in London on October 17-18. Please register here if you want to attend.

Exploratory analytics is at the heart of most Big Data projects. It involves loading data from multi-structured sources into ‘sandboxes’ for exploration and investigative analysis, often by skilled data scientists, with the intent of producing new insights.

Data being loaded into these sandboxes may include structured, modeled data from existing OLTP systems, data warehouses and master data management systems as well as un-modeled data from internal and external data sources. This could include customer interaction data, web log data, social network interactions, sensor data, documents, rich media content and more.

Because Big Data is often un-modeled, the schema of this data is not known – it is schema-less. The argument there is that data scientists need freedom to explore the data in any way they like to acquire it, prepare it, analyse it and visualize it.

Yet Big Data needs data governance to protect sensitive information made available in these exploratory environments. Given that governance imposes control and accountability, how then can the two seemingly opposing forces of freedom and control co-exist in a Big Data analytical environment? Is this not a tug of war? Is Big Data governance a straitjacket for data scientists? How can the freedom to conduct exploratory analysis proceed if the data being analysed is subject to governance policies and processes?

The answer is obvious. Confidence. Big Data Governance is not about restricting data scientists from doing exploratory analysis. It is about extending the reach of data management and data governance technologies from the traditional structured data world into a Big Data environment to:

Protect sensitive data brought into this environment

Control who has access to Big Data files, tables and sandboxes

Monitor data scientist and application activities in conformance with regulatory and legislative obligations

Provide business metadata in a Big Data environment to data scientists and business analysts

Provide the ability to assign new business data definitions and descriptions to newly discovered insights produced by data scientists before moving it into traditional data warehouses and data marts

Provide metadata lineage in a Big Data environment to data scientists and business analysts

Handle Big Data lifecycle management

All of this is about raising the bar in quality and confidence. Having high quality data before exploratory analysis takes place improves confidence in the data and also in the results. Therefore Big Data Governance is not an opposing force to free-form exploratory analytics. On the contrary, it fuels confidence in Big Data analytical environments.

I shall be discussing Big Data governance in more detail at my upcoming Big Data Multi-Platform Analytics class in London on October 17-18. Please register here if you want to attend.

Back in September last year, I presented at Big Data London (the largest Big Data interest group in Europe), looking at multi-platform Big Data analytics. In that session I looked at stand-alone platforms for analytical workloads and asked the question “Is it going to stay that way?”. Would Hadoop analytical workloads remain separate from graph analytics and from complex analysis of structured data, in addition to the traditional data warehouse? The answer in my eyes was of course a resounding no. What I observed was that integration was occurring across those platforms to create a single analytical ecosystem, with enterprise data management moving data into and between platforms, and in addition we had to hide the complexity from the users. One way of doing that is Data Virtualisation, through connectivity to different types of relational and NoSQL data sources. However, for me Data Virtualisation is not enough if it doesn’t also come with optimisation, and to be fair to vendors like Cirro, Composite, Denodo and others, they have been adding optimisation to their products for some time. The point is that if you want to connect to a mix of NoSQL DBMSs, Hadoop and analytical RDBMSs, as well as data warehouses, on-line transaction processing systems and other data, then you very quickly start to need the ability to know where the data is in the underlying systems. A global catalog is needed so that software knows it needs to invoke underlying MapReduce jobs to get at data in Hadoop HDFS, or that it can access the data directly, bypassing MapReduce, via Impala for example. The point here though is that the user is still shielded from multiple underlying data sources and just issues SQL – a relational interface.
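A toy illustration of such a global catalog follows. The table names, store types and access methods are invented; a real data virtualisation product would hold far richer metadata and make far subtler routing decisions:

```python
# Hypothetical global catalog mapping logical tables to physical platforms.
CATALOG = {
    "weblogs": {"store": "hdfs", "access": "mapreduce"},
    "sales": {"store": "warehouse", "access": "sql"},
    "social_graph": {"store": "graph_db", "access": "graph_query"},
}

def route(table):
    """Decide how a federated query engine should reach a table, consulting
    the global catalog rather than exposing the physical platform to the
    user, who just issues SQL against logical names."""
    entry = CATALOG.get(table)
    if entry is None:
        raise KeyError(f"unknown table: {table}")
    return entry["access"]

# A query joining weblogs with sales gets one access path per source.
plan = {t: route(t) for t in ("weblogs", "sales")}
```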

However, the next option I looked at was the Relational DBMS itself. Step by step over the years, Relational DBMSs have added functionality to fend off XML DBMSs; IBM DB2, for example, can store native XML in the database, with no need to shred it and stitch it back together. Oracle and all other RDBMSs added user defined functions to fend off object DBMSs, and so put paid to them also. Graph databases are an emerging NoSQL DBMS, and now IBM DB2 and SAP HANA have added graph stores built into the relational DBMS. I then asked the question “Is Relational going to consume Hadoop?”. It got one heck of a reaction, opening up discussion in the break. Well, let’s look further.

Teradata acquired Aster Data and now integrates with Hadoop via SQL-H to run analytics there, or to bring that data into the Teradata Aster Big Analytics Appliance and analyse it there using SQL-MapReduce functions. Hadoop vendors are adding SQL functionality. Hive was the first initiative. Since then we have had Hadapt, then Cloudera announced Impala, and just recently Hortonworks announced Stinger to dramatically speed up Hive.

Then came a new surge from the RDBMS vendors with the rise and rise of external table functions. Microsoft announced Polybase last November at its PASS conference. A really good article by Andrew Brust covers how Microsoft SQL Server 2012 Parallel Data Warehouse (PDW) will use Polybase to get directly at HDFS data in Microsoft’s HDInsight Hadoop distribution (its port of Hortonworks) bypassing MapReduce. So SQL queries come into PDW and it accesses Hadoop Data in HDInsight using Polybase (which will be released in stages).

And now today, EMC Greenplum announced Pivotal HD, which goes the whole hog and pushes the Greenplum relational DBMS engine right into Hadoop, directly on top of the HDFS file system, so its new Pivotal HD Hadoop distribution has a relational engine bolted right into it. MapReduce development can still occur, and external table functions in the Greenplum database can invoke MapReduce jobs in Hadoop, all inside the Pivotal HD cluster, ultimately with a Greenplum MPP node instance lining up on every Hadoop data node. In short, the Greenplum DBMS will use Hadoop HDFS as a data store. That of course presents a challenge in catering for all the file formats that can be stored in HDFS, but it is clear that it is not only EMC Greenplum that is taking on the challenge.

In this case, as in the case of Microsoft (and the other major RDBMS vendors), the trend is obvious. We are seeing a new generation of optimizer – a cross-platform optimizer that can figure out the best place to run an analytical query, or part of one, so that it exploits the RDBMS engine and/or the Hadoop cluster, or both, to the max. That optimizer is going inside the RDBMS as far as I can see, and the question will be what part of the execution plan runs in the relational engine accessing tabular data and what part pushes analytics right down into HDFS. It is already evident that MapReduce is getting bypassed. Impala does it, Polybase is going to do it, and clearly EMC Greenplum, IBM and Oracle will also, while leaving the option to still run MapReduce jobs. We are in transition as the Big Data world collides with the traditional one, and RDBMSs push right down onto every Hadoop node to get parallel data movement between every Hadoop node and every MPP RDBMS node as queries execute. RDBMSs are going to pull HDFS data in parallel off the data nodes in a Hadoop cluster and/or push down query operators into Hadoop to exploit the full power of the Hadoop cluster, and so push data back into the relational engine. It seems that relational is going as close to Hadoop data as possible. Meanwhile, everybody is beating a trail to self-service BI vendors’ doors, like Tableau, to simplify access to the whole platform. In this new set-up, data movement is going to have to be lightning fast! So where does ETL go? Well, into the cluster with everything else, right? Move it in parallel and exploit the power to the max.
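One way to picture such a cross-platform optimizer is as a cost model comparing pushing an operator into the Hadoop cluster against moving raw rows into the relational engine. All the per-row cost figures below are invented purely for illustration:

```python
def choose_engine(rows_scanned, selectivity, hadoop_cost_per_row=0.3,
                  rdbms_cost_per_row=0.2, transfer_cost_per_row=0.5):
    """Crude cost model for a cross-platform optimizer.

    'pushdown' runs the operator in Hadoop and transfers only qualifying
    rows; 'pull' transfers every raw row and evaluates in the RDBMS.
    Cost constants are illustrative assumptions, not measurements.
    """
    pushdown = (rows_scanned * hadoop_cost_per_row
                + rows_scanned * selectivity * transfer_cost_per_row)
    pull = rows_scanned * (transfer_cost_per_row + rdbms_cost_per_row)
    return "pushdown" if pushdown <= pull else "pull"

# A highly selective filter favours pushing work into Hadoop;
# an unselective one favours moving the data to the relational engine.
selective = choose_engine(1_000_000, selectivity=0.01)
unselective = choose_engine(1_000_000, selectivity=1.0)
```

Real optimizers of this kind work with statistics, parallelism and network topology rather than flat per-row costs, but the shape of the decision is the same.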

Whether it be data virtualization with a SQL or web service interface, or an MPP RDBMS, one thing is clear. Just running connectors between an RDBMS and Hadoop (or any other NoSQL DBMS for that matter) is not where this ends. These technologies are going nose to nose, with tight integration in a whole new massively parallel engine and a next generation cross-platform optimizer to go with it. It’s not all about relational. Big Data has brought a whole new generation of technology and spawned a major phase in the transition. It’s all going to be hidden from the user as relational opens its mouth and samples the best this new generation of technology can feed it. Who said relational was dead?

Lack of basic understanding of core master and transaction data and where it is used in their business. This, plus insufficient understanding of how core business processes work and how these processes cut across multiple departments and applications, means that people don’t understand the impact of bad or inconsistent data. IT in particular often has very limited understanding of business processes and therefore cannot see how a lack of information management impacts business performance. As a result they find it difficult to create a business case. For these reasons they do not see how data problems can impact:

Operational costs – data defects increase cost of operating

Speed of process execution

Data defects slow down process execution

This can impact on customers if customers are waiting on a product

Can also make it difficult to scale the business without imposing high operational costs

Decision making

Data defects impact on timeliness of decisions or the ability to make a decision at all

Data defects impact on accuracy of decisions

Data defects may mean event patterns that require action are not seen

Reporting

Data defects cause reconciliation problems

Inability to see across the value chain

Inability to report on financial performance

Risk management

Data defects can increase risk if risk cannot be identified due to lack of availability of information or lack of accuracy

Compliance

Data security breaches cause brand damage and can lose customers

Regulatory reporting errors that result in penalties

Damage to share price that impacts executive pay

For example, customer master data is needed in sales, marketing, service, finance, and distribution. It is not just a CRM problem. In addition, IT needs to learn more about the business to help build a business case. I say: “Follow your processes from end to end and see how they currently work.” This teaches you where data governance and MDM can make a difference and the business impact they can have.

4. What are the factors that cause failure or delays in a MDM initiative?

Lack of basic understanding of:

How core business processes work

Core master data entities used in their business

Where the master data is located (i.e. what operational and BI systems)

Who currently maintains it

How it flows across applications

How it is synchronised

Impact on business performance from poor master data

Inability to recognize that master data is not owned by an application and should not be associated with just one application

Last week in London I spoke at the IRM Data Warehousing and Business Intelligence conference on a variety of topics. One of these was Big Data which I looked at in the context of analytical processing. There is no question the hype around this topic is reaching fever pitch so I thought I would try to put some order on it.

First, I am sure like many other authors in this space, I need to define Big Data in the context of analytical processing to make it clear what we are talking about. Big Data is a marketing term, and not the best of terms at that. A new reader in this market may well assume that this is purely about data volumes. Actually this is about being able to solve business problems that we could not solve before. Big data can, and more often than not does, include a variety of ‘weird’ data types. In that sense big data can be structured or poly-structured (where poly in this context means many). The former would include high volume transaction data such as call data records in telcos, retail transaction data and pharmaceutical drug test data. Poly-structured data is more difficult to process and includes semi-structured data like XML and HTML and unstructured data like text, image, rich media etc. Graph data is also a candidate.

From the experiences I have had working in this area to date, I would say that web data, social network data and sensor data are emerging as very popular types of data in big data analytical projects. Web data includes web logs and e-commerce logs such as those generated by on-line gaming and on-line advertising. Social network data would include Twitter data, blogs etc. These are examples of interaction data, which is something that has grown significantly over recent years. Sensor data is machine generated data from the ‘Internet of Things’. It is something we have only seen the beginning of, in my opinion, as much of it remains un-captured. RFIDs are probably the most written-about sensors. However these days we have sensors to measure temperature, light, movement, vibration, location, airflow, liquid flow, pressure and much more. There is no doubt that sensor data is on the increase and in my opinion it is something that will dwarf pretty well everything in terms of volume. Telcos, utilities, manufacturing, insurance, airlines, oil and gas, pharmaceuticals, cities, logistics, facilities management and retail… they are all jumping on the opportunity to use sensor data to ‘switch on the lights’ in parts of the business where they have had no visibility before. Sensor data is massive but we don’t want it all – it is the variance we are interested in. Many Big Data analytical applications are/will emerge on the back of sensor data. These include analytical applications for use in:

Supply chain optimisation

Energy optimisation via sustainability analytics

Asset management

Location based advertising

Grid health monitoring

Fraud

Smart metering

Traffic optimisation

Etc., etc.
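As noted above, with sensor data it is the variance we are interested in, not every sample. A minimal sketch of that ‘report by exception’ idea – the temperature readings and tolerance are invented for illustration:

```python
# Report by exception: keep only readings that deviate from the last
# kept value by more than a tolerance, dropping the flat stretches.

def significant_changes(readings, tolerance=0.5):
    """Yield only readings that moved by more than `tolerance`."""
    last = None
    for value in readings:
        if last is None or abs(value - last) > tolerance:
            yield value
            last = value

temps = [20.0, 20.1, 20.2, 21.5, 21.4, 23.0, 23.1]
print(list(significant_changes(temps)))   # → [20.0, 21.5, 23.0]
```

Filtering this close to the sensor is what keeps the downstream data volumes manageable.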

Text, as I already mentioned, is also a prime candidate for big data analytical processing. Sentiment analysis, case management and competitor analysis are just a few examples of popular types of analysis on textual data. Data sources like Twitter are obvious candidates, but tweet stream data suffers from data quality problems that still have to be handled even in a big data environment. How many times do you see spelling mistakes in tweets, for example?

There is a lot going on that is of interest to business in big data but while all of it offers potential return on investment, it is also increasing complexity. New types of data are being captured from internal and external data sources, there is an increasing requirement for faster data capture, more complex types of analysis are now in demand and new algorithms and tools are appearing to help us do this.

There are several reasons why big data is attractive to business. Perhaps for the first time, entire data sets can now be analysed and not just subsets. This is now a feasible option whereas it was not before. So it is making enterprise think can we go down a level of detail? Is it worth it? Well to many it most certainly is. Even a 1% improvement brought about by analysing much more detailed data is significant for many large enterprises and well worth doing. Also schema variant data can now be analysed for the first time which could add a lot of valuable insight to that offered up by traditional BI systems. Think of an insurance company for example. Any insurer whose business primarily comes from a broker network will receive much of its data in non-standard document format. Only a small percentage of that data finds its way into underwriting transaction processing systems while much of the valuable insight is left in the documents. Being able to analyse all of the data in these documents could offer up far more business value that could improve risk management and loss ratios.

At the same time there are inhibitors to big data analysis. These include finding skilled people and a real lack of understanding around when to use Hadoop versus an analytical RDBMS versus a NoSQL DBMS. On the skills front there is no question that the developers involved in Big Data projects are absolutely NOT your traditional DW/BI developers. Big Data developers are primarily programmers – not a skill often seen in a BI team. Java programmers are often seen at big data meet-ups. In addition, the analysis is primarily batch oriented, with Map/Reduce programs being run and chained together using scripting languages like Pig Latin and JAQL (if you use the Hadoop stack, that is).

Challenges with Big Data

There is no question that big data offers up challenges. These include challenges in the areas of:

Big data capture

Big data transformation and integration

Big data storage – where do you put it and what are the options?

Loading big data

Analysing big data

Over this and my next few blogs we will look at these challenges. Looking at the first one, big data capture, the issues are latency and scalability. Low latency needs change data capture, micro-batches etc. However I think it is fair to say that if Hadoop is chosen as the analytical platform, it is not geared up for very low latency. Very low latency would lean towards stream processing as a big data technology, which I will address in another blog. Scaling data integration to handle Big Data can be tackled in a number of ways. You can use DI software that implements ELT processing, i.e. exploits the parallel processing power of an underlying MPP-based analytical database. You can make use of data integration software that has been rewritten to exploit multi-core parallelism (e.g. Pervasive DataRush). Alternatively you can use data integration accelerators like Syncsort DMExpress, or exploit Hadoop Map/Reduce from within data integration jobs (e.g. Pentaho Data Integrator). Or you could use specialist software like the Scribe log aggregation software (originally written by Facebook). Vendors like Informatica have also announced a new HParser to help with data in a Hadoop environment.
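To illustrate the micro-batch point, here is a small hypothetical sketch of a loader that flushes whenever a row limit or a time limit is reached. The limits and the sink are illustrative assumptions, not any vendor's API.

```python
# Micro-batch loading sketch: buffer incoming change records and flush a
# small batch whenever either a row limit or a time limit is hit, trading
# a little latency for far fewer load operations than row-at-a-time.

import time

class MicroBatcher:
    def __init__(self, max_rows=1000, max_seconds=5.0, sink=print):
        self.buffer, self.sink = [], sink
        self.max_rows, self.max_seconds = max_rows, max_seconds
        self.last_flush = time.monotonic()

    def add(self, record):
        self.buffer.append(record)
        age = time.monotonic() - self.last_flush
        if len(self.buffer) >= self.max_rows or age >= self.max_seconds:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(list(self.buffer))   # hand the batch to the loader
            self.buffer.clear()
        self.last_flush = time.monotonic()

batches = []
b = MicroBatcher(max_rows=2, sink=batches.append)
for change in ["insert c1", "update c2", "insert c3"]:
    b.add(change)
b.flush()                                  # drain whatever is left
print(batches)   # [['insert c1', 'update c2'], ['insert c3']]
```

Real change data capture tools add durability and ordering guarantees on top, but the batching trade-off is the same.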

With respect to storing data, there are a number of storage options for analysing Big Data.

Let’s dispel a myth right away. The idea that relational database technology cannot be used as a DBMS option for big data analytical processing is plain nonsense. Any analyst opinion claiming that should be ignored. Teradata, ExaSol, ParAccel, HP Vertica and IBM Netezza are all classic examples of analytical RDBMSs that can scale to handle big data applications, with some of these vendors having customers in the Petabyte club. Improvements such as solid state disk, columnar data, in-database analytics and in-memory processing have all helped analytical RDBMSs scale to new heights. So it is an option for a big data analytical project, perhaps more so for structured data.

Hadoop is an analytical big data storage option that has often been associated more with poly-structured data. Text is a common candidate. NoSQL databases like Neo4J or InfiniteGraph graph databases are candidates particularly in the area of Social Network influencer analysis. So it depends on what you are analysing.

Going back to Hadoop, the stack includes HDFS – a distributed file system that partitions large files across multiple machines for high-throughput access to application data. It allows us to exploit thousands of servers for massively parallel processing, which can be rented on a public cloud if needs be. To exploit the power of Hadoop, developers code programs using a programming framework known as Map/Reduce. These programs run in batch to perform analysis and exploit the power of thousands of servers in a shared nothing architecture. Execution is done in two stages: Map and Reduce. Mapping refers to the process of breaking a large file into manageable chunks that can be processed in parallel. Reduce then processes the data to produce results. Hadoop Map/Reduce is therefore NOT a good match where:

Low latency is critical for accessing data

Processing a small subset of the data within a large data set

Real-time processing of data that must be immediately processed
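The two stages described above can be sketched in miniature with the canonical word-count example. On a real cluster the map and reduce functions would run across many nodes; the in-memory dict below merely stands in for Hadoop's shuffle/sort.

```python
# Minimal in-process sketch of the Map and Reduce stages using word count.

from collections import defaultdict

def map_phase(chunks):
    """Map: turn each chunk of text into (word, 1) pairs."""
    for chunk in chunks:
        for word in chunk.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each word."""
    grouped = defaultdict(int)
    for word, count in pairs:       # stands in for Hadoop's shuffle/sort
        grouped[word] += count
    return dict(grouped)

chunks = ["big data big insight", "big cluster"]
print(reduce_phase(map_phase(chunks)))
# → {'big': 3, 'data': 1, 'insight': 1, 'cluster': 1}
```

Because each chunk is mapped independently and the reduce only sees grouped pairs, the same program scales from one machine to thousands – which is exactly why the batch-oriented limitations listed above are the price of that scalability.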

Also, Hadoop is not normally an RDBMS competitor. On the contrary, it expands the opportunity to work with a broader range of content, and so Big Data analytical processing conducted on Hadoop distributions is often upstream from traditional DW/BI systems. The insight derived from that processing then often finds its way into a DW/BI system. There are a number of Hadoop distributions out there including Cloudera, EMC GreenPlum HD (a resell of MapR), Hortonworks, IBM InfoSphere BigInsights, MapR and Oracle Big Data Appliance. Hadoop is still an immature space, with vendors like ZettaSet bolstering the management of this kind of environment. To appeal to the SQL developer community, Hive was created with a SQL-like query language. In addition, Mahout supports a lot of analytics that can be used in Map/Reduce programs. It is an exciting space but by no means a panacea. Vendors such as IBM, Informatica, Radoop, Pervasive (TurboRush for Hive and DataRush for Map/Reduce), Hadapt, Syncsort (DMExpress for Hadoop Acceleration), Oracle, and many others are all trying to gain competitive advantage by adding value to it. Some enhancements appeal more to Map/Reduce developers (e.g. Teradata, IBM Netezza, HP Vertica connectors to Cloudera) and some to SQL developers (e.g. Teradata Aster Data SQL Map/Reduce, Hive). One thing is sure – both need to be accommodated.

Next time around I’ll discuss analysing big data in more detail. Look out for that, and if you need help on a Big Data strategy feel free to contact me.

While there is a lot of hype around collaborative BI today, this concept is not new. First attempts at introducing collaborative functionality into BI environments happened as far back as eight years ago or more, when vendors of Corporate Performance Management (CPM) products in particular added collaborative functionality to their products to allow users to annotate scorecards and comment on performance measures. In addition, being able to email links to reports also appeared. While a lot was marketed about these kinds of features, they only achieved limited success. A key reason for this in my opinion was that collaborative functionality was ‘baked into’ BI and CPM tools. In other words, vendors brought collaboration to BI. However the MySpace and Facebook generation taught us a different approach. What these collaborative and social networking environments showed was that it is much more natural to publish content to collaborative workspaces to elicit feedback and to share that content with others who are interested in it.

In the context of BI, this turned the first generation collaborative BI tools on their head and said that rather than take collaboration to BI, it is far more effective to take BI to the collaborative platform, where the range of collaborative tools available offers a lot more power. Lyzasoft was a pioneer of this new generation of modern social and collaborative BI technologies. Also, new releases of more widely adopted BI platform products are now being integrated with mainstream collaborative platforms such as Microsoft SharePoint and IBM Lotus Connections. Even cloud based collaboration technologies from vendors like Google are getting in on the act. Mobile BI technology is taking this further by allowing people to collaborate on BI from mobile devices.

However, I (and others) would argue that we are still seeing only one side of the coin here with respect to BI and collaboration. That side is the classic approach: formal integration of data from multiple sources into a data warehouse, the production of intelligence and the publishing of BI artefacts (dashboards, reports, etc.) into social and collaborative environments where they can be shared with others, rated and collaborated upon for joint decision making. But what about innovation? What about when innovative business users want to experiment, get some data and ‘play’ with it in a sandbox environment to figure out what business insight might be useful, or what new metrics would be useful to the business? Do we not need collaboration here also? Another probing question is whether this innovation should be ‘upstream’ from a data warehouse. In other words, let them play with the data until there is consensus as to what is useful and then feed this into a more classic approach of data integration, storage, analysis and sharing. I am comforted by the fact that it is not only me asking this question. Others like my good friend Barry Devlin are also talking about the use of collaboration and sharing of business insight produced in an innovative environment. I know Barry will be speaking about this here. The point is that in my opinion (and it is only opinion, admittedly) there is a place for collaborative and social BI in an innovative sandbox environment where BI is not yet ‘hardened’. We need this capability in many industries. I have come across it in both retail banking and in manufacturing, for example. However, what must be controlled is the release of newly formed innovation into production. This is where governance comes in. Data governance would allow newly created metrics to be published in a business glossary to be used by multiple BI tools in a hardened production environment, for example.
Also at this point, new data sources may be declared to a more formal production DW/BI environment for data acquisition. Therefore we have two sides to collaborative BI: the innovation cycle, which needs to share ‘experimental’ information and elicit feedback from others, and the more formal production BI/DW environment, where well-polished business insight is shared across the enterprise for people to use and act on. One feeds the other, typically because innovators also need to collaborate with IT to take the innovation and move it into the mainstream environment.

Let me know what you are doing with social and collaborative BI. I would be grateful for your comments.

Today, my former employer (many moons ago – I left 17 years ago!) Teradata announced it is to acquire Aster Data, effectively bolstering its position in the BIG DATA marketplace (see the announcement here). Aster Data has made its mark in the big data market with its well crafted integration of Hadoop Map/Reduce and the SQL query language, allowing SQL developers to execute massively parallel Map/Reduce analytical functions on the Aster Data platform and leverage the power of Hadoop. Aster Data also has an IDE tool to make it easier for developers unfamiliar with Map/Reduce to generate Hadoop M/R applications (e.g. analytic functions) that can then be automatically deployed in an Aster Data nCluster database and invoked via SQL. Furthermore, Aster Data nCluster also supports both row AND column based storage. Of course Teradata already has a relationship with Hadoop vendor Cloudera to serve up data from Teradata to Map/Reduce applications running on Cloudera’s CDH platform. It is also working on interfacing Teradata with Cloudera’s Sqoop (part of the Cloudera Enterprise offering) to move data into HDFS via the Teradata Hadoop Connector.

Adding Aster Data to the mix means that Teradata can now potentially integrate with Hadoop deployments in both directions, rather than one-way as with the Cloudera partnership. For example, organisations could access Hadoop (Cloudera’s CDH and other offerings) from analytical queries running SQL M/R on Aster Data nCluster, or indeed, I would assume, in the future on the Teradata DBMS itself.

There is no question this is a good move for Teradata. It gives them columnar capability, and Aster Data has a rich library of pre-built Map/Reduce analytic functions to speed up M/R development; these functions can be invoked from SQL M/R on nCluster. I would have to assume that Teradata would also want to open the Aster Data IDE and the M/R functions up to Teradata developers to deploy these M/R functions inside of Teradata. That is a no-brainer in my opinion. You would also have to say that this takes Teradata in-database analytics to a new level of depth, opening up the door for more sophisticated analytic applications. The Teradata/SAS partnership is a successful one, but adding Aster Data will potentially give Teradata much more power in the in-database analytics area. This is an area that really matters in big data environments. It will also give them more to compete with against IBM, whose acquisitions of Netezza (particularly with its TwinFin iClass appliance) and SPSS have given IBM much more competitive muscle recently, especially against Teradata and Oracle (Exadata). Besides competing with IBM, Aster Data will also give Teradata much more to compete with against Oracle’s Exadata. We will have to wait to see what HP does with Vertica.

In addition, with the tsunami of sensor data coming over the horizon, this acquisition will help Teradata move into the world of Sensor Data Analytics which, by the way, is a battle still to be fought (see my blog on this from last year). Aster Data will help Teradata accommodate the onslaught of data being generated by organisations increasing the instrumentation of their business operations with sensor networks. However in this space, adding CEP vendor technology to the Teradata portfolio would be a good move, as sensor data event correlations need to be acted upon BEFORE that event data is stored in a data warehouse. CEP, Active DW and SQL/MR. Hmm… now that is a combination worth having. It will be interesting to see what is offered across the family of Teradata Appliances and whether Teradata decides to roll out nCluster on any of them. I would also think that Teradata will make sure they carefully protect the Aster Data customer base if they bring the DBMS technologies together gradually.
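To show what acting on sensor events BEFORE they reach the warehouse might look like, here is a hedged sketch of a simple CEP-style rule that fires when heat and vibration correlate within a sliding window. The thresholds, field names and alert text are invented for illustration, not any CEP vendor's syntax.

```python
# Sketch of a CEP-style correlation rule over an event stream in motion:
# keep a small sliding window of recent events and fire an alert when
# high temperature and high vibration both occur within the same window.

from collections import deque

class WindowRule:
    def __init__(self, window=5):
        self.events = deque(maxlen=window)   # sliding window of recent events

    def on_event(self, event):
        self.events.append(event)
        hot = any(e["temp"] > 80 for e in self.events)
        shaking = any(e["vibration"] > 7.0 for e in self.events)
        if hot and shaking:
            return f"ALERT: asset {event['asset']} hot and vibrating"
        return None

rule = WindowRule()
stream = [
    {"asset": "pump-1", "temp": 85, "vibration": 2.0},
    {"asset": "pump-1", "temp": 70, "vibration": 8.5},  # correlates with earlier heat
]
for ev in stream:
    alert = rule.on_event(ev)
    if alert:
        print(alert)
```

The point is that the alert is raised as the events flow past, with only a small window held in memory – nothing has touched the warehouse yet.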

My only question is who will acquire Cloudera, who have partnerships with BI platform vendors and other appliance vendors. That acquisition would pull the rug from under a lot of players.

My take on this is that configuration management and workload management are critical to large scale EDWs making it to the Cloud. I am thinking more about PRIVATE cloud than public cloud at this point, as there is already an abundance of DW/BI PaaS and SaaS offerings on public clouds outside the firewall. However, on-premise private cloud deployment is still very young. There is no doubt that small data marts are already moving but large scale EDWs are not. Why not? I believe the reason is simple – no one has any experience of how to configure virtual resources to make a production EDW (data integration, DBMS and BI platform) maximise its use of the underlying hardware.

As a member of the Boulder BI Brain Trust (BBBT), I sat in on a session given by Pervasive Software Chief Technology Officer (CTO) and Executive Vice President Mike Hoskins last week. The session started out covering Pervasive’s financial performance of $47.2 million revenue (Fiscal 2010) with 38 consecutive quarters of profitability before getting into the technology itself. Headquartered in Austin, Pervasive offer their PSQL embedded database, a data and application exchange (Pervasive Business Xchange), as well as their Pervasive Data Integrator and Pervasive Data Quality products, which can connect to a wide range of data sources using their Pervasive Universal Connect suite of connectors. They also offer a number of data solutions. Pervasive has had success in embedding its technology in ISV offerings and in SaaS solutions on the Cloud. However, what caught my eye in what was a very good session was their new scalable data integration engine, DataRush.

Having just got back from the MicroStrategy World Conference in beautiful Cannes, I thought I would cover what was announced this week at the event. CEO Michael Saylor launched MicroStrategy Mobile for iPhone, iPad and Blackberry describing it as “the most significant launch in MicroStrategy history”. In his opening keynote he talked about mobile as “the 5th major wave of computing” starting with mainframes, then mini-computers, then personal computers, desktop internet and now mobile internet. Their vision here is a good one – BI all the time, everywhere and for everyone. Mobile device access to BI has been around for a while in some offerings but I was impressed with the work MicroStrategy have put into the mobile user interface on touch sensitive ‘gesture’ devices like Apple iPhones and iPads. They have taken advantage of the full set of Apple gestures and also added BI specific gestures including Drill down and Page By. They have also released an Objective C software development kit (SDK) for MicroStrategy Mobile. This allows developers to build custom widgets and embed them in the MicroStrategy Mobile application or embed MicroStrategy Mobile in your own application.

As I research more and more into the world of Cloud-based BI, it is becoming pretty evident where we are headed. In my opinion we are moving down the road to an iTunes model for BI. Yesterday I spent some time with Actuate in London looking at their BIRT On-Demand platform as a service (PaaS) solution (which is very easy to use). It was only a matter of minutes before I was up and running with a Mashboard. A few weeks back in New Orleans I used Dundas Dashboard to quickly build a dashboard from pre-built components. Similarly, Microsoft SQL Server 2008 R2 has the ability in Report Builder 3.0 to quickly build up a library of components that can be dragged and dropped into a report. The more I use these products to understand their capabilities, the more I see a similarity to what is happening in the information management world. Looking at cloud-based data integration solutions like Boomi, Informatica and SnapLogic for example, you can see that what these vendors are trying to do is to create a development platform for Information as a Service. In other words, you build data integration jobs and then make the results available on subscription, such that companies can subscribe to information which is supplied to them by cloud based data integration workflows running on the net. So now apply this idea to the BI produced on cloud-based PaaS solutions. Once your reports and dashboards are built, then the next thing people are going to want to do is to publish these artifacts as on-demand BI services, assuming the intelligence is of business value to others.

Last week I was in Munich to present at the annual TDWI (The Data Warehouse Institute) conference on “Business Intelligence and Data Management in a Cloud Computing Environment”. It was a very well attended conference with some great speakers and sessions. My session focused on the following:

What is Cloud Computing and why use it as a deployment option?

Why Cloud BI? – What are the requirements for a public cloud or externally hosted BI system?

Just over a week ago I spent a day at SensorExpo in Chicago to present on Complex Event Processing (CEP), discussing how CEP engines, predictive analytics and business rules can be used to analyse sensor-emitted event data in motion to facilitate business optimisation. This was a very busy conference. I estimated at least 2000-3000 people on the exhibition floor, with maybe 400 at the conference. I found around 100 vendors with all kinds of sensor devices on show, exhibiting their products and services. To my surprise, however, I had only heard of two of the vendors: IBM and Texas Instruments. The floor was heaving with people looking to instrument their business operations to measure everything from movement, temperature, energy consumption, stress, heat, fluid volumes, pipeline flows and RFIDs. There were analog devices and digital devices. When talking to the vendors, the big common denominator was that they are all trying to collect the data from sensor networks and RFIDs to analyse it. Yet other than IBM there was not a single BI vendor in sight. Not even a single complex event processing (CEP) vendor. I was shocked because this market is clearly booming. What was even more surprising was that I could not find an IT professional anywhere. 99.9% of all delegates and speakers were engineers.

Just over a week ago I was invited to attend an analyst briefing at the Microsoft BI conference in New Orleans that was running alongside the Microsoft TechEd conference. The conference itself was very well attended, with several thousand delegates. Several things were on show at this event including SharePoint 2010, SQL Server 2008 R2, Office 2010, PowerPivot and PerformancePoint Services 2010. Also on show was SQL Server Data Warehousing Edition (also known as the Madison project) – the massively parallel edition of SQL Server that will be shipped later this year.

Having read David Linthicum’s blog on MDM and Cloud computing about the impact on data of applications moving off premise, I have to say that I couldn’t agree more with him. What David is pointing out is that the fracturing of data caused by the adoption of cloud computing raises the importance of MDM in keeping disparate data synchronised.

First of all let me apologise to all my readers for not having blogged for a while. This year has turned out to be manic – crazily busy. I also confess to having become addicted to twitter – a “tweetaholic” where I have been micro-blogging. If you want to see my tweets you can do so here. So I return to my blog the day after “Arthur Day” – 250 years ago yesterday a certain young Irishman named Arthur Guinness started a beer making company in Dublin.

My topic today is that exciting topic of Enterprise Data Governance. From research I did in a survey it was clear that many companies at the end of 2008 were not fully underway with Enterprise Data Governance in terms of getting their data under control and into a trusted, well managed state. Many had more to do in terms of organising themselves together with getting the necessary technology and processes in place to do this. But the question I get asked the most is how do you know how well or poorly your company is governing its data? There are a few questions you can ask that will give you a good inkling.

Most of you by now have probably found it difficult to avoid the hype around Software as a Service (SaaS). For many of us today this is already a reality in our business. You only have to look at the huge uptake of Salesforce.com by small and medium size businesses (SMBs) to realize that there is certainly a place for this in many companies. With respect to the BI market there is no doubt that there is also considerable growth in BI as a Service (BIaaS), and it would appear that many BI vendors are eagerly setting out their stalls on the net to jump into this market of hosted BI services. Given that many BI products are already service enabled, and that many BI vendors have BI portal products, there is no doubt that they are technically ready. They are however missing one thing – data, your data. Either they point their tools at your databases and access them over the net, or they will need a ready supply of data from any BIaaS subscriber. If you already use Salesforce.com you can bet that all BI vendors entering this market will do so with an ETL adapter for Salesforce to get at your data on your behalf.

Recently I have noticed a lot of companies raising the priority of Enterprise Information Management projects as it becomes clear to many that they do not have information under control. This includes both structured and unstructured data. It is clear that for unstructured data, enterprise content management, content authoring, tagging, search and taxonomy are all key. Also, master data management has a part to play in deciding facets that can be used in taxonomies. Structured data needs to consider data naming and definitions, data modelling, data discovery, data mapping, data profiling, data cleansing, data integration, provisioning and data quality monitoring. There is a lot of work out there to do! The big question is how do you tie the two together? The secret is in MDM! What is your strategy for content authoring, content storage, content tagging, taxonomy, search, business glossaries, data integration…? Get in touch and let me know.

Increasingly as I speak with my clients and CIOs I meet at various speaking engagements around UK and Europe, it is becoming clear that data federation can potentially offer rapid value to IT budget constrained companies that just can’t find the resources for another major database project. It may be that if you work in IT you are seeing increasing demand from business users for more reports requiring BI and non-BI information to help them manage their area of business responsibility in a more dynamic way.

In a recent paper by Jeremy Hope on Transforming Performance Management he states that “Most organizations want to adapt rapidly to changing events, but find that they are handicapped because of fixed budgets and poor forecasts. Adaptive organizations are able to respond more rapidly by switching resources dynamically to meet new threats and opportunities… “.

In order to do this there is no doubt that companies have to deliver information more rapidly, irrespective of whether or not the data is in a BI system. What many cannot afford is going through a formal, time-consuming process of data warehouse change to bring in all the information necessary. Data federation is capable of sourcing data from several places, one of which would typically be a data warehouse or data mart. But with increasing amounts of valuable information residing outside BI systems (especially on the internet), it seems that data federation has a role to play as a delivery platform that avoids having to change data models and ETL processes or create another cube or relational data mart. My expectation is that over the coming year we will see an increase in demand for data federation software.

Several BI tools have been shipping data federation software as part of the BI tool bundle for some time to allow quick delivery of integrated information. Note that we are not talking about virtual data warehouses – on the contrary, data federation software rapidly integrates historical DW data with other data sources (operational data, internet feeds etc.) to deliver higher value information more rapidly. Therefore I see data federation software as complementary to BI systems. If you would like more information on data federation please see this article on how it works. There is also a white paper on Maximizing Business Value from Data Virtualization that talks about patterns and best practices to get the most out of this software. If you have already purchased data federation software or are considering it, let me know.
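To make the idea concrete, here is a minimal sketch (in Python, with illustrative data and names – real federation products do this declaratively through virtual views) of what federation does conceptually: historical warehouse figures and a live operational feed are merged at query time, with no ETL change and no new mart built.

```python
import sqlite3

# Historical data in the "data warehouse" (one federated source)
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_history (region TEXT, revenue REAL)")
dw.executemany("INSERT INTO sales_history VALUES (?, ?)",
               [("EMEA", 120.0), ("APAC", 80.0)])

def fetch_live_orders():
    # Stand-in for a second source: an operational system or internet feed
    return [("EMEA", 5.0), ("APAC", 3.0), ("EMEA", 2.0)]

def federated_revenue():
    """Merge historical and live figures at query time -- no ETL, no new mart."""
    totals = {region: revenue
              for region, revenue in dw.execute(
                  "SELECT region, SUM(revenue) FROM sales_history GROUP BY region")}
    for region, amount in fetch_live_orders():
        totals[region] = totals.get(region, 0.0) + amount
    return totals

print(federated_revenue())
```

The point of the sketch is that the consuming report sees one integrated answer while each source keeps its own data model.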

Having just got back from a presentation tour in mainland Europe, it seems that in the countries where I have spoken, Data Governance came out with a thumbs-down vote among the CIOs present in my sessions. In Belgium in particular it would appear to be not on their radar. Having probed for feedback into what exactly is high priority among the CIOs attending my sessions, it is almost as if raw 'survival' is taking hold. In other words, any IT project linked to business survival in this tough economic climate will get attention, but not much else. Customer retention, self-service, cost reduction/containment and growth are high on the list. One CIO explained to me that his company's priority over the next 12 months was to allow customers to customise the products and services they offer much more in the future. Therefore, in addition to offering their own product lines on the web, they would be integrating their e-procurement with many back-end e-suppliers so they can buy 'on-demand' to match what a customer wants. This means they want to allow customers to create their own custom 'package' before buying online, and will stretch beyond their own products to stand out from the crowd. It seems to me that data governance, and data quality to some extent, are taking a back seat in favour of investment that will keep the revenue rolling in. I would be interested in your feedback. Is data governance a high priority in your organisation?

Blogging occasionally offers up the opportunity to open up a good debate. So here goes! Over the last several years I have observed data models in many different BI systems across different vertical industries where so-called 'generic' fact tables have been designed with only one 'generic' measure. The objective of this design approach is that the measure in the fact table is supposed to hold ANY metric. Often this 'generic' measure column is then accompanied by some kind of type field to indicate what the measure actually is (what it means) and some other attribute(s) to indicate the level(s) in various dimension hierarchies that the stored measure is associated with. This helps indicate the additive nature of the metric. Also, if it is a monetary measure it may have a currency field, and if it is a unit measure it may have a field to explain the kind of units used, e.g. centimetres, litres, cubic metres etc. The stated advantage of these kinds of approaches is flexibility. Adding new measures becomes easy to accommodate as no change to the design is necessary. It is a perfectly good argument and certainly appears widely practised by designers.

When it comes to navigating such designs to develop queries (or even generate them), it is often the case that IT professionals developing reports for the business can figure out how to retrieve the information required (although even IT developers can struggle). However, when it comes to business users developing their own ad hoc queries and reports, I frequently see these users really struggling to navigate the 'flexible' design, first trying to figure out what the measures mean, whether the measure(s) is/are additive and so on. More often than not I see this resulting in real frustration among business users, who end up getting aggregations in reports wrong and then start to lose faith in their new BI system. Of course IT steps in to rescue the situation by building more snapshot tables, more materialised views etc., burying the generic 'complexity' to make the job easier for the user. More often than not these users also resort to switching back to Excel to hold data outside any data mart so that they can look at data in a form they understand.

Have you seen this in your organisation? If so I want your feedback. Is it the case that so-called 'flexible' design techniques are rope for end users to hang themselves with? My question is this. What is the best way that you see to design fact tables so that business users become productive and can easily understand how to get at the data when building their own reports? I am not so sure that being so generic is of business value. Sure, it is flexible. But is it usable? What use is a flexible design if a business user cannot understand it and make use of all that valuable data? Is it not better to have multiple metric attributes in a fact table (if multiple metrics are needed), with each attribute name saying what the measure actually is? Let's have your input!
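To illustrate the usability problem, here is a small hypothetical sketch (using Python's built-in sqlite3, with invented table and column names) contrasting the two designs. In the generic design nothing stops a naive SUM from mixing revenue with unit counts; with explicitly named measure columns, an aggregation can only span one meaningful metric.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# "Generic" design: one measure column plus a type code saying what it means
con.execute("""CREATE TABLE fact_generic (
    product TEXT, measure_type TEXT, measure REAL)""")
con.executemany("INSERT INTO fact_generic VALUES (?, ?, ?)", [
    ("widget", "REVENUE", 100.0),
    ("widget", "UNITS",   10.0),   # not additive with revenue!
    ("widget", "REVENUE", 50.0),
])

# A naive business-user query silently adds unit counts to money
wrong = con.execute("SELECT SUM(measure) FROM fact_generic").fetchone()[0]

# Explicit design: each measure is a named column, so SUM() cannot mix them
con.execute("CREATE TABLE fact_sales (product TEXT, revenue REAL, units REAL)")
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [("widget", 100.0, 10.0), ("widget", 50.0, 5.0)])
right = con.execute("SELECT SUM(revenue) FROM fact_sales").fetchone()[0]

print(wrong, right)  # 160.0 150.0 -- the generic design quietly inflated revenue
```

The generic design only stays safe if every query remembers to filter on the type code, which is exactly the discipline business users struggle with.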

It's pretty clear that Business Intelligence is becoming even more strategic in this tough economic climate as companies seek greater insight to help them stay profitable and keep making money. Almost by default, it would seem that trusted intelligence has got to be there for BI to be reliable and to support confident decision making. Therefore, on the eve of the Data Governance conference which starts next week (Feb 2-5) in London, I thought I would put in a plug for Enterprise Data Governance. This is a fast growing topic and requires organisational change, processes and technology to manage it successfully. Structured data needs to be formally defined and named (a shared business vocabulary), and BI systems (data models, BI tool business views, reports etc.) need to be changed to reflect these commonly understood data definitions.

In addition, data in disparate systems needs to be identified and mapped to the common definitions so as to gather knowledge on how to turn disparate data into trusted data. At this point you can then make sure that data integration is done in a fashion that creates trusted data for use in BI systems. Overall it is critical that you build modular data integration jobs (e.g. one data integration job per dimension) so that you can re-use data integration 'services' if need be, to guarantee trusted data every time. These days, of course, data quality is built into many data integration jobs and it is important to strive for this. When you have achieved this, your data integration tool should provide a valuable set of metadata to support lineage should a user need to track data back to where it came from. Once your trusted data is available you can monitor it to keep quality high and take action if quality deteriorates over time.
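As a rough illustration of the modularity idea (all names, rules and data here are invented for the example), a 'one job per dimension' integration service might bundle conforming logic, a quality rule and lineage capture together, so the same job can be re-used for every source feeding that dimension:

```python
# Metadata a real data integration tool would keep to answer lineage queries
lineage_log = []

def integrate_customer_dimension(source_rows, source_name):
    """One modular job for dim_customer: conform, validate and record lineage."""
    clean = []
    for row in source_rows:
        name = row.get("name", "").strip().title()  # conform to shared definition
        if not name:                                # built-in data quality rule
            continue                                # reject records with no name
        clean.append({"customer_name": name})
        lineage_log.append({"target": "dim_customer",
                            "source": source_name, "raw": row})
    return clean

# The same job is re-used for every source feeding the dimension
dim = (integrate_customer_dimension([{"name": "  smith ltd "}], "crm")
       + integrate_customer_dimension([{"name": ""}, {"name": "JONES PLC"}], "erp"))
print(dim)
```

Because the conforming and quality logic lives in one place, every consumer of the dimension gets the same trusted data, and the lineage log shows where each record came from.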

I’d be interested in your thoughts on enterprise data governance. Let me know what you are doing in this space.

So here we are at the end of 2008. Another amazing year in the area of BI. In the 18 years I have been specialising in this IT sector, I suppose you might wonder what else is there to do here. After all BI is a mature market, indeed many of my clients today could be classified as very mature users of business intelligence (BI). Some are on their second, third or even fourth generation of BI system implementations, with data warehouses and data marts, web enabled ad hoc query, reporting, and OLAP tools already deployed and well established across their user base.

Yet there is plenty more that can be done. A key question is how can companies with mature BI set-ups strengthen and evolve their existing investment? I still see lots of opportunity as we head into 2009.

There are several areas emerging to enhance and build on existing BI investment that can offer more value to a business. These include:

Integration of BI with Information Management infrastructure for trusted data

Integration of BI with Performance Management software to roll up metrics into higher level KPIs

Capturing of additional insight from unstructured content (e.g. customer emails) and from external information on the internet (e.g. about market intelligence and about what people are saying about your products and services)

Event-driven and on-demand Operational BI – a hugely exciting area for 2009 – continuously monitoring operations and delivering right-time BI in the context of process activities for continuous business optimisation

Integration of Enterprise Search with BI to open up broader access to intelligence via a search interface

Exploitation of appliances for lower total cost of ownership on specific workloads

Integration of BI with social software and collaboration workspaces to facilitate sharing and exploitation of knowledge in a collaborative environment. This is particularly relevant for those of you wishing to exploit products such as IBM Lotus Quickr quickplaces and Microsoft SharePoint workspaces. Integrating BI here will become increasingly important in 2009

These are just some of the ideas I will be discussing in the coming year and in a tough economic climate BI has never been more important. I wish all of you the best for the holiday season.

Well now – has the inevitable happened? Larry Ellison, Oracle's CEO, has finally recognised the value of the hardware/software combination in the business analytics marketplace. This market, pioneered over 20 years ago by Teradata and now thriving with many other players including Netezza, ParAccel, Dataupia, ExaSol, Vertica and others, has become a target for Oracle, who have clearly had enough of competitors eating away at the Oracle database with lower-TCO DBMS offerings optimised for analysis and reporting. With the recent acquisition of DATAllegro by Microsoft, IBM with its Balanced Warehouse, and now Oracle entering the database machine market, it certainly seems that the DW appliance market is becoming a hot competitive battleground.

The newly announced Oracle Exadata DW appliance is jointly developed by Oracle and HP and will be sold directly by Oracle. The Exadata server runs Oracle's parallel database server on Oracle Enterprise Linux, on eight HP ProLiant DL360 G5 database servers.

My question on this announcement is: given that HP are jointly in on the Exadata product offering with Oracle, what does this mean for HP's own DW appliance offering – the HP Neoview appliance? This is also a parallel DBMS product that competes with Oracle. I assume that with HP playing in both markets (its own DBMS product on its own hardware, plus the hardware behind the Oracle Exadata offering) it is seeking to maximise the revenue it can take by covering all bases. Time will tell. In my opinion it is clear that with so many vendors now in the DW appliance market it is going to take a lot more than just TPC-H benchmarks to differentiate. It certainly means customers will have to look closely at performance claims. Everyone will claim they are the fastest, which could easily result in prospective customers demanding more to distinguish one vendor from another. For this reason I believe that analytic application appliances have to happen (an analytic application pre-installed on a DW appliance). Vendors who go deep on vertical analytic application appliances could carve out a very lucrative business when you combine this with the attraction of low-TCO DW appliance offerings.

It seems that everywhere I look the number of vendors gearing up for Operational BI is massing as if awaiting an onslaught on the market. Among them are HP, IBM, InforSense, Progress, Tibco, SL, ThinkAnalytics and many more. Key to this is event-driven data integration, in-memory data, predictive analytics and rules engines. Vendors like SL have even released a data cache for in-memory analytics with its RTView product. Progress is also pushing with its Apama product, and long-term pioneer ThinkAnalytics is also doing well. Clearly the giants are also moving to get all their pieces in play. IBM's acquisition of French rules vendor ILOG looks like it could also be used in the world of on-demand and automated decisioning. Also, SAP Business Objects Labs have released a prototype on event-driven BI and seem to be partnering with SPSS on predictive analytics.

It’s all heating up, yet in the UK it staggers me the number of companies that are not seeing the benefit of these emerging technologies. Some verticals would reap massive benefits by exploiting this technology. If you are interested in Operational BI drop me an email at info@intelligentbusiness.biz

Why is it that every time I take a week off on holiday something major happens in the BI market? I am of course talking about Microsoft's announcement that it is to acquire the data warehouse appliance vendor DATAllegro. The message in this is certainly obvious: the scaling of SQL Server. It's a very interesting announcement. With the exception of Netezza, who have done well here in the UK and in Europe, many of the DW appliance vendors have been struggling. Almost all of them have been chasing business in the same vertical industries that have high volumes of data (e.g. telco, retail, financial services). Now the one giant that people were wondering about in terms of parallel DBMS scalability has moved. However, we are clearly going to have to wait to see how well SQL Server scales in this kind of setup. Even prior to this announcement, the myth that SQL Server would not scale beyond 1 terabyte had long since been proved incorrect. I have certainly had clients running single-instance SQL Server BI system databases at around 13-15 terabytes for a number of years now. No doubt there are larger configurations than that out there. However, this announcement will certainly lift Microsoft customer confidence that Redmond is serious about offering a scalable SQL Server option on non-proprietary hardware that starts to compete with the parallel DBMS offerings of IBM, Oracle and Teradata as well as other DW appliance offerings. Time will tell what will happen and how competitive this will be. SQL Server Integration Services (the Microsoft ETL tool that ships with SQL Server) will also have to scale, however, to get larger data volumes on to a parallel SQL Server. So we have to see what Microsoft will do here. This announcement also offers up an interesting option for Kalido, who recently announced support for generating BI systems on Microsoft SQL Server.

Yesterday Kalido announced the availability of Kalido Universal Information Director for Microsoft SQL Server, which will speed up development of BI systems based on the Microsoft platform. By using Universal Information Director (part of the Kalido Information Engine suite of technologies) it is possible to capture the metadata and automate the generation and population of Microsoft SQL Server Analysis Services (SSAS) cubes with data from the Kalido Dynamic Information Warehouse. Automatic generation of the SSIS jobs and SSAS cubes needed speeds up Microsoft BI system development and enables rapidly constructed SSAS cubes to then be accessed from Microsoft Office Excel, PerformancePoint Server and SharePoint. Third-party BI tools can also access this data. Effectively, Kalido acts as a wrapper for the Microsoft environment.

Note that Kalido already supports automatic generation of SAP Business Objects universes and supports IBM's Cognos platform. If Kalido keeps adding heterogeneous BI platform support it could put itself in a powerful position with respect to leveraging common metadata to integrate BI systems built on multiple heterogeneous technology platforms across different lines of business in large enterprises. I have yet to determine if Kalido technologies can be deployed on the Eclipse platform. If so, this would be even more promising. This announcement of Microsoft support may also be welcomed by small and medium size businesses (SMBs) who struggle to afford, find and retain skilled resources to build BI systems. Price point will be all-important in the SMB market, however, especially as open source BI offerings have also penetrated this market.

Microsoft have been somewhat quiet in the enterprise data governance arena since their acquisition of MDM vendor Stratature in 2007. They have since taken that product off the market, which clearly indicates that Stratature was a piece of a bigger puzzle they are building. Of course they already have their SQL Server Integration Services data integration tool (part of the SQL Server offering). Yesterday they added yet another piece to their solution by announcing their intent to acquire Zoomix, a data quality vendor. So it seems that Redmond is clearly building an end-to-end enterprise data management offering which now includes data quality, data integration and master data management. We still need to see their solution for automated discovery and mapping (which could come from Zoomix), metadata management (including business vocabularies and lineage), taxonomy generation, and the ability to publish information services and use them from within SharePoint, Windows Workflow Foundation, BizTalk and Microsoft Dynamics.

As we head down the road towards real-time, event-driven computing and move gradually away from batch processing, it is clear that the number of events in some vertical industries is going to be enormous. It is not surprising, therefore, that if we are looking to analyse this data before it reaches a data warehouse, then in-memory data is going to be essential to scale complex event processing (CEP) and business activity monitoring (BAM). You could argue that as solid state disk starts to replace spinning disks, it will all be in memory in the not too distant future. In my opinion we are going to start to see parallel query processing and scoring happening on in-memory data. A good example of this is the strategic partnership emerging between Teradata and SAS, which offers deployment of scoring models developed in SAS in the Teradata DBMS. This kind of functionality is the beginnings of massively parallel scoring on in-memory data. Other database vendors, such as IBM and Oracle, are also investing in in-memory extensions to their database products.

If you are looking at scaling CEP or BAM on large volumes of events, then it would seem that support for handling in-memory data is going to be high on your shopping list.
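As a toy illustration of the principle (not any vendor's actual implementation), scoring in-memory events with a predictive model is naturally parallel, which is why it lends itself to massively parallel execution. A sketch in Python, with an invented risk rule standing in for a real model developed in a data mining tool:

```python
from concurrent.futures import ThreadPoolExecutor

def score(event):
    """Toy 'scoring model': flag a payment as risky if it is large and
    the account is new. A real model would come from a data mining tool."""
    risk = 0.9 if event["amount"] > 1000 and event["account_age_days"] < 30 else 0.1
    return {**event, "risk": risk}

events = [  # events held in memory, scored before any warehouse load
    {"id": 1, "amount": 1500, "account_age_days": 5},
    {"id": 2, "amount": 40,   "account_age_days": 900},
]

# Each event is scored independently, so the work parallelises trivially
with ThreadPoolExecutor(max_workers=4) as pool:
    scored = list(pool.map(score, events))

alerts = [e["id"] for e in scored if e["risk"] > 0.5]
print(alerts)  # [1]
```

Because no score depends on any other, the same pattern scales from a thread pool to a massively parallel DBMS pushing the model to every node.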

For those of you looking at Operational BI, a very common question is how this use of intelligence will impact the classic BI system architecture. To answer this question requires that we first define operational BI: the use of intelligence in every-day business operations. We are not just talking about running reports that access an operational system here. What we are talking about is much more comprehensive than that. Two major uses of BI in operations are:

Access to on-demand BI in the context of a process activity

Event-driven automatic analysis and decision making

The on-demand use of BI in business operations is about right-time intelligence. This is where a user chooses to perform a specific activity as part of an every-day operational business process and is presented with only the BI that is relevant to help the user perform the task more effectively. It is sometimes referred to as 'BI in context' and is the most precise use of BI. Making that happen typically requires the deployment of BI systems in a service oriented architecture (SOA). In that sense it requires companies to bring together IT professionals responsible for business process management and SOA with IT professionals responsible for BI. Most companies are not yet organised in this way and so will need to mobilise to make this happen. However, the impact on the BI system is relatively minimal, in that the latest releases of many modern BI platforms today are already 'service enabled', i.e. they have web service interfaces to allow on-demand invocation of reports and queries from composite applications, processes and portals. So in this case the BI system per se is really being extended to plug into a SOA, and the emphasis is more on re-organising IT to make on-demand BI happen.

However, event-driven automatic analysis is a different ball game. In this case we really are stepping outside the 'classic' BI system architecture, in the sense that this aspect of operational BI is about being able to detect operational events, automatically analyse data and take action well BEFORE the data reaches a BI system. This brings together new technologies such as data streaming, highly parallel in-memory data and rules engines with familiar technologies such as event-driven data integration and scoring models built by power users using data mining tools. So if you already have event-driven data integration tools (most ETL tools support this capability) and you already have power users developing predictive models using data mining tools, then there is nothing new in these areas. However, it is the data streaming, the in-memory databases (both IBM and Oracle, for example, have extended their DBMS products by integrating them with in-memory databases) and the rules engines that are new. Implementing operational BI in this way is about automating business optimisation in every-day operations. This kind of automation is operational performance management. So some BI system related technologies are used in this new form of operational performance management, while other required technologies are brand new to us. Event-driven business monitoring is therefore a different architecture to a classic BI system because of the need to analyse stream data in memory before it ever reaches a data warehouse. Once analysis has taken place, decisions have been made and action has been taken, it is only then that event data may well find its way into a BI system via classic data integration processing. So we have to think differently about this. In that sense it would not surprise me if this form of real-time business event monitoring were initiated as a new project with a different architecture outside of a BI team.
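A minimal sketch of the declarative, rules engine style of automated decisioning described above (all rule names, events and actions are invented for the example): rules are expressed as condition/action pairs rather than application code, and each is evaluated as an event arrives, before anything is loaded into a warehouse.

```python
actions = []  # decisions taken, captured here instead of calling real systems

# Rules are data, not application code: each pairs a condition with an action,
# so new rules can be added without changing the processing logic
rules = [
    {"name": "big_order_discount",
     "condition": lambda e: e["type"] == "order" and e["value"] >= 500,
     "action":    lambda e: actions.append(("offer_discount", e["id"]))},
    {"name": "stockout_alert",
     "condition": lambda e: e["type"] == "inventory" and e["level"] == 0,
     "action":    lambda e: actions.append(("reorder", e["id"]))},
]

def on_event(event):
    """Evaluate every rule against the event as it arrives."""
    for rule in rules:
        if rule["condition"](event):
            rule["action"](event)

for event in [{"type": "order", "id": 7, "value": 900},
              {"type": "inventory", "id": 3, "level": 0},
              {"type": "order", "id": 8, "value": 20}]:
    on_event(event)

print(actions)  # [('offer_discount', 7), ('reorder', 3)]
```

A commercial rules engine adds rule authoring, conflict resolution and performance at scale, but the shape of the problem is the same: decisions are made per event, long before any warehouse load.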

In many cases it may well be that this kind of sponsored project is part of a business process management or an RFID initiative rather than being part of a BI program. If this happens it should, in my opinion, be quickly flagged up and brought under the management of the IT team responsible for BI and Performance Management, otherwise the worlds of operational performance management and strategic performance management will never meet. This would be a very disappointing outcome. Frankly, they have to meet, because business users need performance management to encompass both classic strategic performance management (strategy management, scorecards, dashboards, budgeting, planning etc.) AND operational performance management. They need to see what is happening over time in their area of responsibility alongside what is happening right now. Therefore, once again we need to keep our eye on the ball regarding project sponsorship, ownership and architecture, so that on-demand and event-driven operational BI becomes a major contributor to business performance and allows people to leverage intelligence to help them optimise every-day business operations.

The explosion of web feeds on the internet, and now also within the enterprise, rightly warrants the question "Is it possible to analyse feed data in real time?". The answer has got to be a resounding yes, and it opens up new ways to use BI that go beyond traditional BI systems. Analysing feed data in real time may be of significant business value and pushes corporations into the new world of Complex Event Processing (CEP). New extensions to SQL are being developed by various vendors to allow querying of live event data. An example here would be StreamBase.

Increasingly we are also seeing Enterprise Feed Servers (e.g. NewsGator) starting to be deployed inside the enterprise to aggregate RSS and Atom feeds coming from internal and external sites, so as to deliver relevant aggregated feed content to different people in different roles across the enterprise. Imagine if you could analyse feeds from news providers like Bloomberg and Reuters, or if you could analyse messages as they flow over an enterprise service bus in a SOA. Sounds like BAM, doesn't it? But it is more than that. Imagine RFIDs and the event clouds they generate. Looking for patterns in event clouds is what CEP is about. All this analysis happens way before data gets to any kind of data warehouse or data mart. This is a really high-value return on investment opportunity to push BI systems into the world of 'always on' analysis. The world of events and feeds is upon us. Time to get busy, don't you think? If you are doing anything in this area in your own organisation please share it with us – it would be great to hear from you.
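As a simple illustration of CEP-style pattern detection over a feed (the window size, threshold and terms are invented for the example), the idea is to look for a pattern – here, repeated mentions of the same term – within a sliding window over the live stream, with no warehouse involved at all:

```python
from collections import deque

WINDOW = 5     # examine only the last 5 feed items
THRESHOLD = 3  # raise an alert when a term appears 3 times within the window

window = deque(maxlen=WINDOW)  # old items fall off automatically
alerts = []

def on_feed_item(term):
    """Called for each feed item as it arrives; alert on repeated mentions."""
    window.append(term)
    if list(window).count(term) >= THRESHOLD:
        alerts.append(term)

for term in ["acme", "rates", "acme", "oil", "acme", "acme"]:
    on_feed_item(term)

print(alerts)  # ['acme', 'acme'] -- fires on the 5th item and again on the 6th
```

Products with streaming SQL extensions express exactly this kind of windowed query declaratively; the sketch just shows that the analysis runs on the stream itself, before any data reaches a data warehouse or data mart.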

The onslaught of virtualisation seems to be gaining momentum, with vendors offering virtualisation of servers, PCs and applications, as well as the systems management tools to manage virtual server farms in a virtual data centre underpinned by grid computing. So the question is: how does this apply to BI? BI tools are, after all, effectively just applications running on application servers, and now, as we deploy BI services, it is clear that these can be deployed on virtual servers in a virtual data centre which could combine in-house BI services and BI as a Service (BIaaS) offerings.

So expect to see BI virtualisation begin to appear. Already we have seen BI vendors like SAS stepping into the market with a VMware offering and Vertica coining the term 'cloud based' analytics. I think this is the beginning of the flood gates opening on BI virtualisation. The challenge for most of us is how best to configure our BI systems to exploit virtualisation, and how to manage a virtual BI environment to optimise performance, availability, reliability and scalability. If you are doing anything in your shop in this area please let us know. There is much to learn in this rapidly advancing area.

Following my recent series of articles on Web 2.0 and BI on the B-Eye-Network, it is exciting to see BI products pushing their way into the BI market embracing Web 2.0. Several vendors such as Information Builders and SAP (Business Objects) have released BI mashup tools recently, but the one vendor that caught my eye is Antivia, a small Australian company based in Sydney that is really embracing communities and social networking with its product.

As web 2.0 edges towards becoming mainstream in the BI market (probably 2009 timeframe) expect to see more adoption of richer interaction in user interfaces and more collaboration capability to share BI for joint decision making. If you are already using Web 2.0 in BI applications and tools in your organisation please share your experiences on what works and what doesn’t.

In 2008, some 18 years since I moved into the BI sector of the industry, you would think that this space would be exhausted. Yet here we are seeing more announcements from relatively 'new generation' vendors aggressively going after this market. I refer of course to yesterday's announcement from LogiXML and Vertica on integration between LogiXML's web-based BI platform and the Vertica Analytic Database. It seems clear that columnar data warehouse appliance vendors are climbing the popularity charts, with Vertica, ParAccel and Sybase IQ all gaining ground. LogiXML is clearly also on a growth path with its web-based BI platform. With consolidation still happening in the BI market, it would be no surprise to me if these kinds of partnerships ultimately go further in the future, but for now companies should not assume that just because the software giants (IBM, Microsoft, Oracle and SAP) have made their moves into the BI market the game is over. This kind of announcement really offers an attractive and competitive alternative.

Over the last year or so I have noticed a real surge in companies using or evaluating products to rapidly develop dashboards. In my consulting activities in this area, I have been amazed at the reliance on one particular primary source of data that users have latched onto in dashboard development. This is of course Excel data. While there is nothing unusual about Excel data, it is the trait that users almost ‘prefer’ to access Excel data (because they are familiar with Excel) that I find concerning. Many users seem to either just have these spreadsheets or are downloading data into Excel from a range of data sources including operational systems, flat files (perhaps supplied from some other department or system), data marts and data warehouses. Once data is ‘in the wild’ like this, it takes on a life of its own with people manipulating it and sending it to others via email attachment. It’s like data management just got left behind.

While Excel can never be ignored in any organisation, the increasing demand to analyse Excel data raises questions as to whether or not that data can be trusted, especially if you have been sent this data in your email. It brings back the issue that has plagued many companies for years when it comes to Excel. Do you know where the data in the spreadsheet came from? How do you know you have the right version of the spreadsheet? Are spreadsheets managed? Are there other server-side data sources that can be accessed from the dashboard tool that would give you more confidence in trusting the data?

With Office Excel 2007 increasing the maximum number of rows in a spreadsheet from 65,536 to over 1 million, my concern is that the increasing demand for dashboards will raise the likelihood of million-row 'spreadmarts' being created all over the organisation by business users, rather than dashboards being pointed at server-side data in a BI system. Only time will tell. However, policy is clearly needed around spreadsheet management and dashboard development if we are to remain in control of the data and have confidence in it. I would be interested in hearing from many of you out there who are encountering this problem and what you are doing to manage it.

Following on from my blog at the beginning of the year entitled "Predictions for 2008", which predicted that Complex Event Processing (CEP) would be a hot topic this year, IBM has already moved to play in this market with its acquisition today of AptSoft. With the growth in SOA and business process management, as well as verticals such as manufacturing, logistics, retail and pharmaceuticals all investing in RFIDs, we are set for an explosion of events on a scale we have never seen before in commercial business. Because of the potentially significant business benefits – bottom-line savings and revenue from being more responsive to events – CEP should not be ignored. This is a major emerging marketplace that offers automated business optimisation and action in a much more timely manner. There is no way business will be able to change applications at the pace required to keep up with the demand to monitor business events over the next few years. There has to be a better way of doing this. That way is CEP – a declarative approach that involves no programming. CEP is the next generation beyond BAM. This announcement may well see IBM's competitors move in on this market, considering the growth potential. While the backlog of IT systems requiring SOA integration is growing, companies should educate themselves in this field, as they may well benefit from looking at CEP as a way to become more responsive to business events rather than building everything themselves. There is no doubt that the era of "Right Time" BI has begun.

Microsoft’s announcement last week that it intends to acquire FAST is certainly the first major shot fired in the enterprise search battle that could ensue among the software giants. However it is not so much enterprise search that interests me in this context as the investment that a vendor like FAST had made in pursuing the BI market. There is no doubt that Search and BI are going to be hot in 2008. Expect much more activity in this area as these two exciting technologies increasingly collide. I will be researching the state of this emerging Search and BI market and will report back to the B-Eye-Network in 2H08. Meanwhile it will be interesting to watch vendors like Autonomy, Endeca and even Google, and their relationships with the software giants, to see where this leads.

Happy New Year to all of you! It’s that time of year again when predictions are made so I thought I would throw my thoughts into the ring for debate. So here goes:

Consolidation in the BI market will continue now that we have seen the software giants make their moves in 2007. Oracle bought Hyperion, SAP bought Business Objects and IBM announced its acquisition of Cognos. To compete with this, other vendors will consolidate to try to offer an alternative, so expect to see more mergers in 2008.

The cost of the BI platform will continue to drop amid pressure from the software giants (Microsoft in particular), and open source alternatives (e.g. Pentaho, Jaspersoft, Talend et al). The money to be made is in Performance Management and Data Management.

Both Performance Management and Data Management technologies will separate from BI platforms (if they haven’t done so already) and become suites of tools in their own right.

The data management market is set to continue growing as companies try to standardise on a suite of tools for enterprise data management (an enterprise data management platform) that includes an end-user business vocabulary tool, a data modelling tool, a data discovery and mapping tool, data quality profiling, data cleansing and data integration (consolidation, federation and synchronisation). This data management platform will be used for data replication, data warehousing, data migration, master data management, data synchronisation and on-demand data management services published in a service registry and available on an enterprise service bus (ESB) in a service oriented architecture (SOA).

Complex Event Processing (CEP) will become mainstream in 2008 as companies try to analyse data, and the business impact of events, well before that data arrives in any kind of data warehouse or data mart. This is related to business activity monitoring (BAM), except that we are going to see monitoring of complex events (watching for several events to happen before triggering action) and not just single ones.
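The distinction between single-event monitoring and complex event processing can be sketched in a few lines of Python. This is a toy illustration only, not any vendor’s engine; the event types and the time window are invented for the example:

```python
from collections import deque

def detect_complex_events(events, pattern, window):
    """Fire an alert only when EVERY event type in `pattern` has been
    seen within `window` time units - not on any single event.
    events: iterable of (timestamp, event_type) in time order."""
    recent = deque()  # events still inside the sliding window
    alerts = []
    for ts, etype in events:
        recent.append((ts, etype))
        # drop events that have fallen out of the window
        while recent and ts - recent[0][0] > window:
            recent.popleft()
        seen = {e for _, e in recent}
        if pattern.issubset(seen):
            alerts.append(ts)
            recent.clear()  # reset after triggering the action
    return alerts

events = [(1, "rfid_read"), (2, "stock_low"), (9, "rfid_read"),
          (10, "stock_low"), (11, "order_delayed")]
print(detect_complex_events(events, {"stock_low", "order_delayed"}, window=5))
# -> [11]: only when stock_low AND order_delayed co-occur within 5 units
```

Note that the early `stock_low` at time 2 triggers nothing on its own; the action fires only when the full pattern completes, which is precisely the difference from simple BAM-style single-event alerting.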

2008 will be the year of massive growth in memory exploitation. We will see parallel query execution continue to run across multiple shared nothing nodes in MPP systems with multiple processors, and multiple disks (as is the case today in many parallel relational DBMSs). However the difference here is that we will see this happening against in-memory data on a massively parallel scale in 2008 and beyond. With the volumes of data about to climb higher, and demand for CEP on the increase, we need to access data in memory to respond more rapidly and keep performance optimal. Massively parallel memory is therefore inevitable and will arrive on the scene this year whether that memory be in a single cluster server or deployed over a grid in a virtual memory configuration
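The shared-nothing idea behind this can be shown in miniature. In the sketch below, threads stand in for the separate nodes a real MPP system would use; each worker scans only its own in-memory partition, and the partial results are merged at the end (all names are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_aggregate(partition):
    # each "node" aggregates only the rows it owns
    return sum(row["amount"] for row in partition)

def parallel_sum(rows, nodes=4):
    # hash-partition rows across nodes: shared-nothing, no common data
    partitions = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        partitions[i % nodes].append(row)
    # scan all partitions in parallel, then merge the partial results
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = pool.map(partial_aggregate, partitions)
    return sum(partials)

rows = [{"amount": n} for n in range(1, 101)]
print(parallel_sum(rows))  # -> 5050
```

The point of the pattern is that each partition fits in one node’s memory and is never shared, so adding nodes scales the scan almost linearly; only the small partial results cross node boundaries.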

Performance Management is set to grow with BAM, process management, scorecards, dashboards, budgeting and planning and Business Intelligence all being integrated into a component performance management suite (enterprise performance management platform). Performance Management platforms will sit on top of BI platforms but will also integrate with other enterprise infrastructure software such as business process management, portals, enterprise content management systems and live collaboration tools.

Web 2.0 collaboration will push its way into Performance Management. In particular, socially networked performance management will start to appear, so that end users can tag metrics, graphs and reports in order to organise BI and PM content. This user-defined categorising of content via tagging is known as folksonomy and is already heavily used on the public internet on sites like Facebook, MySpace, del.icio.us, Digg, Flickr, JotSpot etc. Now it is coming inside the enterprise and will be applied to BI and PM content as well as other unstructured content. This means that users can see other users’ profiles and the tags that they have used to annotate BI and PM content. From here, BI and PM ‘tag clouds’ will form, showing popular BI and PM terms that lead to popular BI and PM content and metrics. Also, by following BI and PM tags we will see the dynamic formation of BI social networks consisting of people within the enterprise who have similar interests in acting on BI to improve performance. People will also be able to share reports and collaborate with others (in real time – e.g. IM, threaded discussions etc.) in Web 2.0 collaborative workspaces. Wikis (group publishing) will also come together with BI so as to fuel rapidly forming BI and PM workspaces that will be of exceptional value to the business.
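A BI tag cloud is simple to derive mechanically. The hypothetical sketch below (report names and tags are invented) counts how often each tag has been applied to reports and metrics, then scales the counts into display weights, exactly the kind of computation a socially networked PM tool would run behind the scenes:

```python
from collections import Counter

def build_tag_cloud(taggings, min_size=1, max_size=5):
    """taggings: list of (content_item, tag) pairs from user tagging activity.
    Returns {tag: weight} where weight is a font-size bucket for display."""
    counts = Counter(tag for _, tag in taggings)
    top = counts.most_common()
    hi, lo = top[0][1], top[-1][1]
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {tag: min_size + round((n - lo) / span * (max_size - min_size))
            for tag, n in top}

taggings = [("revenue_report", "finance"), ("margin_kpi", "finance"),
            ("churn_dashboard", "customers"), ("budget_plan", "finance"),
            ("churn_dashboard", "retention")]
print(build_tag_cloud(taggings))
# -> {'finance': 5, 'customers': 1, 'retention': 1}
```

The heavily used “finance” tag gets the largest weight, which is what makes popular BI content discoverable at a glance in the cloud.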

Search and BI are set to explode into popular use in 2008 as search opens the doors to mass access to BI content from a user base that is not comfortable with BI tools.

BI reports will be capable of being published in document management and records management systems.

The master data management market will continue to grow as companies wrestle with the complexity of their data and try to get it under control. Information and data architects will continue to be in demand, with demand for such professionals potentially outstripping supply.

Companies will have to invest again in data modelling and data modelling skills. There is no doubt that standards here are dropping: many companies still have no data modelling tools at all, and too few people are skilled in good data modelling practices.

Data management professionals will start to come together into integration competency centres, so that people with skills in data cleansing, data integration, data modelling, master data management, enterprise content management, metadata management and ESB XSLT/XML data translation are co-located and can work together to solve the problem of enterprise data management.

Metadata management will become a mission critical issue if it is not already. Business users need access to business metadata to understand what data means and where it came from. Holding this metadata in spreadsheets is no longer acceptable. It must be made available to both end users and shared across multiple technologies. 2008 will see companies looking to act to solve this problem.

Well that’s all I have for now. Let me know your thoughts. I would be most grateful for your comments on any of this. Best wishes for a happy and prosperous New Year!

Last month Informatica and IBM, both long regarded as among the leaders in the data integration market, made further announcements about their products in attempts to keep their noses in front in this market. Informatica announced release 8.5 of PowerCenter and PowerExchange, while IBM announced further extensions to its Information Server suite of data management tools. The Informatica 8.5 announcement includes the following:

PowerExchange real-time change data capture

Integration of data quality services with SAP operational applications for on-demand data quality as you use the SAP applications (currently this is for name and address data only)

An overhaul of the PowerCenter Metadata Manager to provide search, filtering and personalisation capabilities. This also includes the ability for users to annotate metadata

Re-entrant data services and parallelised data quality for more scalability (this adds to the grid computing and the push down optimisation support added in release 8)

A Data Quality Assistant to allow Data Stewards to participate in data integration workflows so as to review and edit poor quality data records

Web based data quality reports and dashboards

Pre-built Data Migration tool on top of the Informatica platform to address this kind of data integration problem

The new Data Quality Assistant, which involves data stewards in the data quality process, and the pre-built Data Migration tool certainly stand out as differentiators. The latter is the beginning of a trend among data management vendors, in that it introduces the first of potentially multiple patterns on top of the Informatica data management platform.

Not to be outdone, IBM responded with the following enhancements to Information Server:

A new look WebSphere Business Glossary

A new product, WebSphere Business Glossary Anywhere, to access business metadata from your mobile device

ParAccel’s new massively parallel analytic database runs on commodity hardware (the first available release is for Sun, with releases for Dell, HP and IBM hardware to follow) and comes in two flavours: an all-in-memory system and a disk-based system (reflected in the licensing options below).

Interestingly, this product works on columns rather than rows during its parallel query processing, which raises some observations. In many of the DW reviews I have conducted over the years, I have seen the well-known practice of “mini-dimensions”: isolating popular columns in a large dimension (e.g. Customer) into their own separate mini-dimension table. This is a performance ‘trick’ that can often speed up joins between large dimensions and large fact tables. If popular columns in the large dimension are selected in a query to qualify metrics in the fact table, the join occurs between the mini-dimension and the fact table instead of between the much larger ‘real’ dimension and the fact table. As a result, join processing is faster.

The problem with mini-dimensions is threefold: they complicate the key structures of fact tables; all BI tool business views (e.g. SAS Information Maps, MS Report Models, Business Objects Universes etc.) need to know about them in order to generate the right join SQL; and tracking history across changes to columns in the main dimension and mini-dimensions can be complex. The column approach to parallel query processing taken by ParAccel may well negate the need for mini-dimensions as a performance tuning mechanism in many star schemas, thereby simplifying design. If you are in the process of selecting a DW appliance product, this point would certainly be worth investigating and worth a benchmark test.
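To make the mini-dimension pattern concrete, here is an illustrative sketch in Python (all table, column and key names are invented; a real implementation would of course be SQL against a star schema). Popular low-cardinality customer attributes are split into a small mini-dimension so that fact-table joins touch far less data:

```python
# Large "real" dimension: many wide rows, one per customer
customer_dim = {
    1: {"name": "Acme Ltd", "segment": "SME", "region": "UK", "founded": 1987},
    2: {"name": "Birch plc", "segment": "Corp", "region": "DE", "founded": 1954},
}

# Mini-dimension: just the popular columns, one row per value combination
mini_dim = {
    10: {"segment": "SME", "region": "UK"},
    11: {"segment": "Corp", "region": "DE"},
}

# Fact rows carry keys for BOTH the full dimension and the mini-dimension -
# this is the key-structure complication mentioned above
fact = [
    {"customer_key": 1, "mini_key": 10, "revenue": 120.0},
    {"customer_key": 2, "mini_key": 11, "revenue": 300.0},
    {"customer_key": 1, "mini_key": 10, "revenue": 80.0},
]

def revenue_by_segment(fact_rows, dim):
    """Qualify facts by a popular column via the (small) mini-dimension
    instead of joining to the (large) customer dimension."""
    totals = {}
    for row in fact_rows:
        segment = dim[row["mini_key"]]["segment"]
        totals[segment] = totals.get(segment, 0.0) + row["revenue"]
    return totals

print(revenue_by_segment(fact, mini_dim))  # -> {'SME': 200.0, 'Corp': 300.0}
```

The relevance to a column store is that it reads only the segment and revenue columns regardless of how wide the customer dimension is, which is why column-wise processing can make the extra mini-dimension table, and its extra fact-table key, unnecessary.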

In addition, SQL advanced analytic aggregate functions are all column based. Once again, it would be worth benchmarking ParAccel against other DW appliance products here, as there may well be a performance boost from parallel processing at the column level too. I should add that I have NOT personally benchmarked this, but simply looking at the way this product is architected, it would certainly warrant a look for specific analytic applications.

The ParAccel Analytic Database software is available now. ParAccel offers two licensing options:

$1,000 per gigabyte for all-in-memory systems beginning at 100GB,
or $40,000 per node plus $10,000 per terabyte for disk-based systems beginning at 5 nodes.
Subscription licensing is also available starting as low as $5,000 per month.

Today IBM finally moved to plug the hole in its software product portfolio by announcing the acquisition of Cognos for $5bn USD (http://www-03.ibm.com/press/us/en/pressrelease/22572.wss). This acquisition has been on the cards for a long time, with several analysts, including myself, wondering how long it would be before we saw Big Blue move on Cognos (see my previous blog http://www.b-eye-network.co.uk/blogs/mt/mt.cgi?__mode=view&_type=entry&id=6976&blog_id=1). This is almost a perfect fit for IBM, which needed a BI vendor to provide reporting and analysis tooling but one without much market share in the data integration space. The reason is that IBM is already the market leader in data integration with Information Server and does not want the ‘baggage’ of dealing with another product in that area. Cognos has never been a major player in data integration with its Data Manager product.

So you can expect IBM to immediately want to integrate Information Server with the Cognos product line. I also would expect that the WebSphere Metadata Server (bundled with Information Server) will be integrated with Cognos Framework Manager so that Cognos metadata ends up in the Metadata Server itself at some future release point. IBM also has had several joint development initiatives with Cognos and so there is a lot of integration already.

The only question mark will be in the area of performance management where Cognos has had a strong position for some time. IBM’s own Performance Management product IBM ActiveInsight will now have to come together with the Cognos Performance Management products. If this does not happen then the future of ActiveInsight as a product may well be in jeopardy. Time will tell.

So, the independent BI vendors are falling like ninepins as the software giants finally move in on the market. With Hyperion already fallen to Oracle and Business Objects to SAP, that leaves only one large independent BI vendor in the market – SAS. Of course SAS is privately owned and so will continue that way until such time as its CEO (Dr Jim Goodnight) decides otherwise. MicroStrategy also remains independent, along with other smaller BI vendors. Expect further mergers and acquisitions to follow as the rest of the market consolidates to compete with the giants.

Over the weekend SAP announced its intention to purchase Business Objects for a massive €4.8bn (http://www.sap.com/about/press/press.epx?pressid=8360). This signals the fall of the second major independent BI vendor this year, the other being Hyperion to Oracle. This is a good move for SAP, which has needed to strengthen its BI offering for some time. The acquisition gives SAP an ETL tool with strong SAP integration, a service oriented BI platform that fits with NetWeaver, and several performance management applications that are in need of deeper integration. Business Objects only completed its acquisition of Cartesis this year and ALG last year, while SAP acquired OutlookSoft in the PM space last year. The dust has obviously got to settle, but there is no doubt that BI is now becoming a giants’ game as the four software giants (SAP, Oracle, Microsoft and IBM) start to flex their muscles in a battle over this evergreen market. Such an acquisition has been on the cards all year, with speculation rife as to whether Business Objects or Cognos would be acquired first, especially after Oracle made its move.

Now that Oracle and SAP have declared their hands, the question is what happens next. I would be staggered if IBM continued to sit by and watch. Surely it must move, as it has a real hole in its product portfolio on BI tools. This announcement leaves only two large independent BI vendors in the market – namely Cognos and SAS. With the latter the largest of the BI independents and also privately owned, it seems inevitable that the spotlight will shine on Cognos as potentially the next candidate, with IBM or Microsoft as a suitor.

JasperSoft this week announced availability of its BI suite on the MySQL database. This is an interesting announcement in that it means an open source BI platform on an open source DBMS. There is no doubt it may well catch the eye of many a developer or BI-savvy CIO. It could appeal in the mid-market, where MySQL is widely used, especially if companies have relatively low data volumes.

Looking at some of the latest enhancements coming out of BI vendors, it seems they are certainly getting to grips with Web 2.0. In particular, the power of JavaScript and AJAX is opening up new ways to distribute intelligence and to keep tabs on key performance indicators and metrics. A few examples here are (in alphabetical order) Business Objects and Information Builders. Business Objects Labs have been pushing out tools like BI Desktop and BI Masher. BI Desktop starts the ball rolling on “BI widgets”, which can leverage Asynchronous JavaScript and XML (AJAX) to access BI services that in turn give access to metrics data. Those familiar with Windows Vista will recognise the Vista Gadgets on the desktop. The concept is the same: components on the desktop that users can configure to fit their needs. In addition, web reports are now exploiting JavaScript to asynchronously request additional information from non-BI services to enrich the information on the reports. This means that BI mashups combining BI and non-BI content on a report are on the way.

Not to be upstaged, Information Builders are spicing up reports with their Active Reports capability, where the use of JavaScript again allows a much richer report to be made available to the user, with much more report functionality and processing done client side in the browser. In addition, Information Builders have made it possible for data to be included in Active Reports, so that users can conveniently take reports on the road in disconnected mode and carry on analysing. This they refer to as Active Reports with Quick Data.

Both of these are examples of how vendors are exploiting the power of JavaScript and AJAX in BI technologies. I expect a lot more of this in the next 12 months. We may even see BI frameworks appearing whereby users can rapidly “compose” BI applications by leveraging pre-built visual components. The only hint of caution here is that, if you are looking at this, please don’t assume all browsers are the same. There are differences across IE, Firefox, Opera and Safari when it comes to processing scripting languages, so what works on one may not work on another. Also, don’t assume interoperability between visual components if the components are all built using different tools.

Just as hundreds of AJAX frameworks are available on the web today for rapid development, there is a strong likelihood that several BI frameworks will appear, inviting end users to leverage BI “composition” tools to rapidly assemble visual BI and non-BI components into custom-built, rich, interactive analytic applications (mashups). If multiple BI frameworks do appear, we may well get caught up in the buzz of new technology and trailblaze building BI visual components galore without concern for the bigger picture: how these components all fit together on a page, how they are uniquely identified, how you deal with security when it comes to accessing them, or how they are managed. If you are building a personal dashboard it may be OK, because you are dealing with one technology, but what about building sharable dashboards for the enterprise? In my opinion it depends on where the visual components are aggregated into the page and what does this. A portal does this on the server by aggregating portlets to construct a page before serving the page up to the browser. Portlet interoperability is either proprietary to the portal product or standard via JSR 286 (the successor to JSR 168, which does not support portlet interoperability).

However, if BI vendors select JavaScript as the mechanism to assemble (aggregate) BI visual components onto a page on the client (i.e. in the browser), then I see no standard for visual component interoperability on the web page or between desktop widgets – it would have to be proprietary.

If your company has multiple BI tools (which is often the norm, especially in large enterprises) you may well find business analysts composing BI dashboards without IT even knowing about it. The problem is how to bring it all under the wing of the enterprise when a business user wants to mix and match BI and non-BI visual components that are assembled client side and there are no standards. The portal is certainly not dead, as it can handle this. Nor is AJAX an alternative to portals – AJAX components can in fact be integrated into portals as AJAX portlets. For this reason, portal vendors are bundling industry standard (JSR 168/286) portlet containers for free with application servers, and not just as part of their portal server products, so that you can turn AJAX BI components into portlets without the need for portlet development and still allow rich user interface interactions with portlet interoperability. As far as stand-alone AJAX interoperability is concerned, my suggestion is to watch what happens at the OpenAjax consortium. It is interesting that, apart from the four software giants (IBM, Microsoft, Oracle and SAP), not a single independent BI vendor is a member! It seems to me it’s about time they joined.

Over the last several months, in customer assignments associated with the deployment of BI web services and data integration web services, I keep running into the same problem that everyone seems to be struggling with when it comes to implementing a Service Oriented Architecture.

That problem is one of multiple integration stacks. As I consult on a Europe-wide basis, I find this problem everywhere. Companies want to standardise on a single common set of infrastructure software, which includes common BI infrastructure (a gradual evolution to a common BI platform) and common business integration infrastructure (portal, process management, ESB, ESR, single sign-on etc.). Yet as they buy packaged applications, particularly from SAP and Oracle, and upgrade to Office 2007 only to find SharePoint coming in through another door, they find themselves with multiple software infrastructure stacks in the enterprise and additional complexity they had not planned for. Plugging together multiple stacks was not on the original agenda. Do you recognise this problem?

It seems strange that the attraction of SOA is simplicity and flexibility, and yet as you try to get closer to that you end up with more complexity. BI and data integration services in a SOA are much sought after, but I recently found myself having to deal with questions like “Which enterprise service registry should we use for managing BI services?” and “Should we define our business processes in our IBM WebSphere BPM software, the SAP NetWeaver BPM software or the Oracle Fusion one?”

There are many more of these kinds of questions that companies are struggling with at present. It seems that vendors are forcing additional infrastructure on enterprises rather than offering flexibility to run on the infrastructure of choice that the customer wants. Yet the business case to executives is that SOA is needed because it facilitates consolidation of IT infrastructure to reduce complexity and total cost of ownership. As one executive said to me recently “I am struggling to see the benefit”. I must admit that I found it difficult to disagree with him.

Yesterday’s announcement by IBM on its web site that it has signed an agreement to acquire DataMirror is more evidence that the software giants of the industry are starting to compete for more market share in the data management marketplace. In this announcement IBM stated its intention to use the DataMirror technology to strengthen its Information Server suite of data management tools, particularly in the areas of real-time change data capture, heterogeneous data replication and synchronisation, and high availability.

The trend here is clear: data management tools such as business vocabulary tools (aimed at business users), metadata discovery and mapping tools, data modelling, data profiling and cleansing, and data integration are all converging into a common platform of tools integrated on a common metadata repository. Why is this needed? The reason, of course, is that enterprises are pushing for a common toolset for any kind of data management, whether it be data consolidation for data warehousing and master data management, data federation for on-demand reporting, data replication or data synchronisation. It is natural to expand the use of these tools beyond popular areas like data warehousing into other application uses across the enterprise. Even unstructured data integration is now possible. The world of XML data integration is also on the increase, including RSS feeds as a data source. All of this is demanded at lower and lower latency. Data and metadata services in a SOA are a trend, but the challenge for most of us is to identify and prioritise business areas that need to exploit these services in order to improve data supply and data flow throughout the enterprise and between enterprises.
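Treating an RSS feed as a structured data source is straightforward in principle: parse the XML and flatten each item into a row suitable for integration. A minimal sketch using Python’s standard library (the feed content and field choices are invented for the example):

```python
import xml.etree.ElementTree as ET

rss = """<rss version="2.0"><channel>
  <title>Price updates</title>
  <item><title>Widget A</title><pubDate>Mon, 05 Nov 2007</pubDate></item>
  <item><title>Widget B</title><pubDate>Tue, 06 Nov 2007</pubDate></item>
</channel></rss>"""

def rss_to_rows(xml_text):
    """Flatten each <item> element into a dict, i.e. a row an
    integration tool could load alongside relational sources."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        rows.append({
            "title": item.findtext("title"),
            "published": item.findtext("pubDate"),
        })
    return rows

for row in rss_to_rows(rss):
    print(row)
```

In practice the feed would be fetched over HTTP on a schedule, which is exactly where the latency pressure mentioned above comes in: the polling interval becomes the freshness ceiling of the integrated data.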

Also as companies increasingly invest in software as a service (SaaS) offerings it means that access to corporate data housed outside the enterprise is on the increase. Data in SaaS applications needs to be kept consistent with data inside the enterprise and brought back inside the enterprise to integrate with internal operational data if you have an internally hosted data warehouse.

Well folks, it had to happen. Earlier this evening Microsoft announced it had entered the MDM market with the acquisition of Stratature, a private MDM company based in Georgia, USA. Although Microsoft is late into this market, I am sure this is the opening play from our friends in Redmond, Washington. I assume Microsoft will integrate Stratature with Microsoft SQL Server, Business Intelligence, PerformancePoint and Microsoft Office SharePoint Server to offer a co-ordinated MDM solution. I think for sure the master data definitions in any Microsoft MDM solution will end up in the SharePoint Business Data Catalog (BDC) as a way to get at master data from applications running on top of the SharePoint Server (e.g. Office). I have always felt that the BDC on its own, without common data definitions (which would be possible with MDM), means that users have to know the different data definitions for the same thing to clearly understand what they see via the portal or via Office applications. It was only a matter of time before Microsoft made a move into MDM. Now that it has happened, all four software giants (IBM, Microsoft, Oracle and SAP) are in this market along with best-of-breed independents. So let the battle begin.

Having spent some time recently making use of new releases of collaboration tools from IBM and Microsoft, it is pretty obvious that project management capability has jumped headfirst into these products for use in collaborative environments. In a sense it is no surprise, as it is clearly natural to want to collaborate with other team members on a project. Looking at IBM’s new Lotus Connections, for example, it is now possible to create activities and associate them with goals. It is also possible to assign due dates to an activity and break it down further by creating TO DO lists. Furthermore, I can then share these activities with others so that, as a team, you can collaborate to complete the activity and achieve the goal. Sound familiar? If you are working with performance management software, you should spot the similarity immediately.

You might say, so what? After all, Microsoft can integrate MS Project with SharePoint (which has been possible for some time) to offer similar capability. BEA and indeed Vignette can also support so-called “project” collaborative workspaces within their portal products and collaboration tools. For me, as an analyst and consultant in Business Intelligence and Business Integration, this synergy has been staring me in the face for ages, and yet I see very few people responsible for performance management in companies linking the two together. The strength of this synergy didn’t dawn on me until fairly recently, when I was working with a client, following what they do on performance management from CxO level down to operations staff. At executive level in my clients I often see the classic scorecards, dashboards, budgeting and planning that most mainstream BI vendors offer today, typically based on the balanced scorecard methodology. Budgeting and planning seem to still be stuck in the finance department in many cases. At the operational performance management level I am starting to see more business activity monitoring (BAM) implementations, typically associated with the Six Sigma methodology for process improvement. However, I have many clients that now aspire to linked hierarchies of objectives, linked hierarchies of plans and linked hierarchies of targets, so that from operations up to CxO level everyone is contributing towards a common business strategy. I see this requirement everywhere these days. But what I also get asked about a lot is how companies can achieve enterprise-wide budgeting and planning for all managers at all levels, from CxO to operations. After all, why should budgeting and planning only belong at executive level? It should be enterprise wide. By this I mean the capability for many to participate, with each manager’s plan and budget tied to their boss’s plans and budgets.
This is critical if everyone is going to execute on a common business strategy. Not convinced? Then I recommend you read the recent book Wikinomics: How Mass Collaboration Changes Everything by Don Tapscott and Anthony D. Williams to realise why the wisdom of crowds is becoming so important.

So where is this discussion going? Well, in a recent consulting assignment I was trying to understand planning from CxO level down to operations, so I duly followed what was happening. When I started talking to operations staff and operations managers, I asked where they kept their plans. In MS Project, they said: if you look here you can see all our activities that need to be carried out to achieve our goals at operational level in our area. This is how we do our operations planning to implement initiatives to improve performance. It turned out that there were lots of plans at operations level, all associated with project teams and all with activities and TO DO lists that needed to be performed to achieve targets and objectives. Then the penny dropped! If project management is now being integrated into collaborative workspaces, then should it not be the case that performance management tools responsible for scorecards, objectives, targets, planning etc. need to integrate with project-based Web 2.0 collaborative workspaces? The obvious answer is yes! Some performance management tools do support stand-alone collaboration (proprietary capability) and can import plans from Microsoft Project today, but it has to go a lot further. Performance management and collaboration tools have to deeply integrate. But open your minds further. A client CFO recently said to me that they needed cost-based plans, with costs to date assigned to plan activities, to keep the plans on budget. I think this is an obvious ‘common sense’ requirement.

So, if you look closely at what we have you will see the following:

·Performance management software with scorecards, objectives, KPIs, targets, plans associated with initiatives that state what ACTIVITIES will be performed to help achieve objectives, budgets associated with these plans, and dashboards

·Shared ACTIVITIES, objectives and due dates appearing in Web 2.0 releases of collaboration tools for project based collaborative workspaces

·ACTIVITIES in stand-alone project management tools

·Requirements for costs to be assigned to plans at ACTIVITY level

·ACTIVITY based costing as an analytic application offered by several BI vendors

Heaven help us if all these activities, scattered across all these technologies, are all managed in different ways.

Hmmmm! Come on you vendors out there! Link them all together PLEASE so that we can really get to the promised land of truly integrated performance management and enterprise-wide execution of business strategy with everyone pulling in the same direction. It’s got to happen. All I can say is watch this space for more mergers and acquisitions as it starts to come together!

Still today in this ‘modern’ world I wonder when common sense will prevail, so that enterprises that demand real guidance from their BI investments realise that one fundamental corporate barrier has to give way if true business benefit is to come from enterprise-wide performance management. That barrier is the refusal of lines of business to relinquish control over data integration budgets. At present many line of business BI projects re-invent the wheel as the same data is integrated again and again for different stand-alone BI systems. If this continues, BI systems will continue to run the risk of inconsistency, duplicate development and overspend on data integration and BI tool technologies.

Big deal, you might say. Business has to move and be agile and so can’t wait for enterprise-wide agreement. Most businesses have done OK so far even with line of business BI systems. Well, I cannot necessarily argue against all of that, but one thing is clear: the explosion of interest in data integration is most certainly upon us.

Evidence of this was in abundance this week as I attended the jam-packed CDI-MDM conference in London to give a joint talk on MDM with one of my clients, and found a lot of business managers with a massive thirst for understanding data integration. So I duly networked to find out the reasons for the interest. What materialised was a consistent message coming back from person after person: we are here so that we can understand how to get a single view of our customers, our employees, our suppliers, etc. Thinking to myself that this must be Groundhog Day, I was taken back to distant memories of being on a stage with Bill Inmon some 18 years back, talking about the very same subject that was then going to be solved by data warehousing.

So why, I asked, did they not think that data warehousing gave them this? The answer was crystal clear. “Who said we just needed a single version of the truth in analytical systems? What we need is integration of master data independent of applications so that this common data can be shared among operational systems as well as BI systems”. As one person put it, “data integration should not be confined just to BI systems – it is an enterprise wide problem”. Another said, “we have a business need to see across the enterprise to manage our operations”.

So I got the message. People are screaming for shared operational data accessible through shared master data services. But what about BI – are there problems there, I probed? Yes, came back the answers. Line of business data warehouses have partial duplication of data and data overlap. The result is that they still suffer from various levels of disagreement because projects develop their own data integration jobs, which can cause reconciliation problems. Does any of this resonate with you? It certainly does with me – goodness knows I have lost count of how often I have seen this over the years.

As one CEO put it to me recently, “In the ’90s we bought into client/server and open systems. We took our mainframe applications, re-wrote them and distributed them across lots of servers, and budget was devolved to each of my lines of business to get on with building systems to meet their business needs. Line of business reigned supreme. While distributing the applications may have been OK, whose idea was it to distribute the data! What a crazy idea that was!”

Well I certainly have sympathy with that. If there is one area that has to become a CxO level backed ENTERPRISE investment it is surely enterprise data management and data integration. We need it in operations, we need it in BI and we need it consistent from the ground up. So MDM for me is unstoppable but it can’t stop there. If we integrate master data and hold it centrally then why on earth would you want to leave the transaction data where it is? Surely the MDM momentum will eventually drag transaction data consolidation kicking and screaming with it for shared operational use. I wonder….it might take a while but I certainly wouldn’t bet against it.

The Performance Management (PM) market seems to be heating up of late, as mergers and acquisitions bring some PM vendor consolidation and vendors start to beef up their PM product lines. You can go back a few years to when all this started to happen, with many vendors offering dashboards and scorecards on top of their BI platforms. Early PM products were somewhat lightweight, starting out supporting just one thing, e.g. scorecards or planning or budgeting. BI vendors focussed on strategic performance management and tried to battle it out with ERP vendors for CFO mindshare. Meanwhile operational performance management was ignored. Even today it is difficult to find PM products that allow you to declare the full business strategy to the software, i.e. actually entering text for strategic business objectives, key initiatives, KPI owners, KPI targets, plans and activities, and budgets, and adding commentary on trends. Most just offered KPIs and traffic lights.

Early PM products had a lot of limitations, including databases separate from BI system data warehouses, which meant that they could only show summary data and offer limited (if any) drill-down into detail to understand why a traffic light has gone red. Planning and budgeting are key to most executives, and particularly to CFOs, so it is not surprising that we saw various acquisitions of planning and budgeting vendors to add to PM products that initially focussed on scorecards. Cognos’ acquisition of Adaytum comes to mind in this space.

More recently, as process management has started to take hold as a key element of service oriented architectures (SOA), the areas of Business Activity Monitoring and Activity Based Costing have come under the spotlight as companies ask questions about their operational process efficiency and the cost of the operational activities which form the steps in their processes. Interest in Activity Based Costing has been steadily growing over the last few years, to the extent that the link between ABC, planning and budgeting is now becoming clear. Companies want Performance Management software to thread together activity-based costs, plans and budgets.

I have lost count how many clients of mine now want to see costed plans against budgets at the activity level. So it is no surprise to me that ABC is growing.

Looking at the market, it may have been a visionary move by SAS when acquiring ABC Technologies to add to its PM offerings a few years back (or just plain good fortune). Also last year we saw Business Objects step into the Activity Based Costing market by acquiring ALG. Cognos acquired real-time Business Activity Monitoring (BAM) vendor Celequest fairly recently. Then Oracle stepped into the market by acquiring Hyperion to plug the PM gap in its product line, and just last week Business Objects announced their intent to acquire French PM vendor Cartesis.

So what is happening here? The answer is that we are seeing the consolidation of various pieces of the PM puzzle to create Performance Management Platforms that will sit on top of and integrate with BI platforms (typically from the same vendor). By Performance Management Platform I mean a complete suite of integrated tools for managing the business. This PM Platform is also likely to be integrated with Office applications, portal products and collaborative workspaces for sharing performance.

It takes me back to when we had a reporting tool from vendor A, an OLAP tool from vendor B, a mining tool from vendor C and an ETL tool from vendor D. Now we have complete BI platforms from one vendor. A key point here is that PM platforms are not the same as BI platforms but sit on top of the BI platform. Secondly PM is growing and lucrative while BI platforms are coming down in price fuelled by Microsoft competitive pressure, Open Source and even the DW appliance market. Also ETL is separating from the BI platform and going enterprise wide under the guise of Enterprise Data Management.

There is still a lot more to come in the PM consolidation space. Also it has to grow beyond BI and go enterprise wide by integrating with business process management and SOA as well as collaboration tools and content management.

For example, in many PM products today you cannot:

·Associate a business objective in a scorecard with business process activities

·Associate a performance initiative defined in a scorecard with a plan and business process activities

·Link a process model to a scorecard strategy map

·Use performance management software to pin-point where in a business process to make improvements to optimise performance

Occasionally living in the UK has its disadvantages when it comes to stateside technology announcements that happen during our evening. Already some of my fellow US-based experts on the B-Eye-Network have blogged about Oracle’s announcement of their purchase of Hyperion. This lives up to my blog at the beginning of the year on BI and Performance Management Trends for 2007, where I predicted that at least one of the major independent BI vendors would be acquired. Well, it’s only the beginning of March and we still have 10 months left in the year.

SAP have moved for a smaller BI vendor (Pilot) to beef up their analytic applications. Oracle have now gone further with the Hyperion acquisition. Oracle’s challenge is now to define a really clear BI roadmap for their customer base and help them shift from the older Discoverer and Reports products to the Enterprise Edition Siebel Analytics-based tools and now Hyperion. Integration of BI tools will be key, but Oracle have just gained a lot more mindshare in the finance departments of major corporations around the world, where Hyperion has a stronghold. With CFOs becoming increasingly powerful executives (at least in Europe) it could be a shrewd move by Oracle if they can deepen the integration of Hyperion with Oracle Financials, PeopleSoft and JD Edwards. While this is a battle for enterprise BI and Performance Management, it seems to me that getting the support of the CFO is critical.

In my opinion IBM has to move soon if they are going to compete here and so I don’t think we’re done yet. As for Microsoft…..who knows.

If any of you have teenage kids you’ll probably know what MySpace is, or at least have heard them talking about it. Well, try applying the concept inside a business and you begin to realise where we are heading here: employees with their own workspaces, who are also members of shared collaborative workspaces. The former is the employee’s personal workspace; the latter is where people working on projects together, and people interested in the same information, meet, share and collaborate.

Now add BI and Performance Management into the mix and you get “Intelligent Workspaces”, be they collaborative or personal. In intelligent workspaces (some may call them performance workspaces) you can search for reports and metrics, get real-time alerts, subscribe to RSS feeds on BI metrics and new reports that you are interested in, and collaborate with others to share BI. Individuals have MyScorecard with MyObjectives, MyDashboards, MyKPIs, MyAlerts. BI cannot do this on its own, yet this is what many companies are now demanding. So consider how you plan to integrate: integrate CPM and BI with portals, collaboration tools and content management to give us role-based intelligent workspaces, all in your browser or integrated into your office applications. If you are already there, please let me hear from you and please share your experiences. It would be great to hear from you. If not, drop me a note and I’ll try to help out. As always you can get me on info@intelligentbusiness.biz

Amid this fast-moving world of BI there are a lot of things on the agenda: BI integration with portals, operational BI, events, scorecards, text analytics, enterprise data management, master data management, SOA, process management, etc. With so much on the ‘to do’ list it is not surprising that some of us might miss a little piece of the puzzle that doesn’t get a lot of air play. Any guesses? Well, I would like to make a stand for rules engines. What about the business rules? We use them in deciding on business process behaviour, we use them to route messages on a service bus, we use them to decide how to present data on the screen and we use them for decision making. So where are they? For most of us they are locked away, buried in our application logic, and for financial services perhaps in some batch decisioning systems. Yet if there is one thing we need to do, it is to separate the rules from the systems that need them – especially in BI. The reason is that rules change, in many cases quite often. If we introduce a rules server then we can define business rules that test BI data and help us make decisions.

As we enter the world of automated decisions to assist us in business automation, this is one piece of technology we really need. But look beyond BI. You can use this SAME technology for dynamically changing process execution behaviour, for master data synchronisation, alerting and automated action taking, and a whole host of other things. Some interesting vendors out there include Corticon, Fair Isaac (Blaze Advisor), ILog, Pegasystems and SAS. In addition you can also find open source rules engines such as the Drools engine from JBoss. Combining rules engines with scoring models in a workflow can prove very powerful. While many of these vendors are focussed on process management, their role in automated decision making is just as useful. If you are using rules in automated decision making to improve business performance and optimisation, let me know at info@intelligentbusiness.biz.
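To make the idea of separating rules from application logic concrete, here is a minimal sketch in Python. This is not how any of the products named above work internally; the rule names, thresholds and the customer facts are invented purely for illustration:

```python
# Minimal sketch of externalising business rules from application code.
# Rule names, thresholds and the 'customer' facts are hypothetical examples.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # test applied to a set of facts
    action: str                        # decision to recommend if it fires

# Rules live in one place (a "rules server" in miniature), not in app logic,
# so the business can change them without redeploying every system.
RULES = [
    Rule("high_value_at_risk",
         lambda f: f["annual_spend"] > 10_000 and f["churn_score"] > 0.7,
         "offer retention discount"),
    Rule("credit_watch",
         lambda f: f["days_overdue"] > 30,
         "flag account for credit review"),
]

def decide(facts: dict) -> list[str]:
    """Return the actions of all rules whose conditions fire."""
    return [r.action for r in RULES if r.condition(facts)]

customer = {"annual_spend": 12_500, "churn_score": 0.82, "days_overdue": 5}
print(decide(customer))  # -> ['offer retention discount']
```

The point of the sketch is the separation: `RULES` can be edited (or loaded from a repository) without touching the systems that call `decide`, which is exactly the maintainability argument for a rules server.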

The process of getting the EII tool to learn about a data source is called mapping. From this exposure to the underlying sources, you can use the EII tool to create a virtual schema, which will be used on data access. All EII applications will then see and use the single virtual schema.

The technical base of the data sources can be relational databases, packaged applications, file servers, web services and potentially numerous other data stores, operational and decision support, e.g. data warehouses. Indeed, this will be a major criterion of your tool selection. EII platforms differ somewhat in the data source types supported.

Generally, one EII instance per environment, covering multiple data sources, is all that is necessary. If your users already use a portal to access information systems, the EII platform can become another of the underlying data stores accessed from the portal.

Applications built on the EII platform can use SQL, XQuery (for XML), ODBC, JDBC, and other APIs to access the heterogeneous data sources hidden behind the virtual view(s) defined in the EII platform.
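As a rough analogy for the virtual schema idea (not a real EII product), the sketch below uses a SQLite view to hide where the underlying data lives: applications query only the view, exactly as EII applications query only the virtual schema. The table, column and view names are invented:

```python
import sqlite3

# Rough analogy to an EII virtual view (not a real EII product): the view
# hides the underlying "sources"; applications query only the view.
# Table, column and view names are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE crm_customer (cust_id INTEGER, name TEXT);      -- "source 1"
    CREATE TABLE billing_invoice (cust_id INTEGER, amount REAL); -- "source 2"
    INSERT INTO crm_customer VALUES (1, 'ABC Electronics');
    INSERT INTO billing_invoice VALUES (1, 250.0), (1, 99.0);
    -- The 'virtual schema' that all applications see:
    CREATE VIEW v_customer_spend AS
        SELECT c.name, SUM(i.amount) AS total_spend
        FROM crm_customer c JOIN billing_invoice i USING (cust_id)
        GROUP BY c.name;
""")
print(db.execute("SELECT * FROM v_customer_spend").fetchall())
# -> [('ABC Electronics', 349.0)]
```

The difference in a real EII platform is that the two tables would live in physically separate, heterogeneous systems, with the mapping step described above teaching the tool about each one.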

The implementation of EII in an environment is often the first time that an organization is capable of providing users with access to an integrated view of both their operational and post-operational environments. A big decision in your EII architecture will be whether you want to expose this fact to the user, and for what kinds of use you should make such views available. Many organizations start out by leveraging their EII investment to rapidly produce operational and regulatory reports that require data from heterogeneous sources. More mature implementations of EII, in environments that have already been able to shield users from the underlying architecture elsewhere in the data access environment, should be able to continue this practice successfully.

Earlier today, Cognos announced that they had acquired operational BI and performance management vendor Celequest in an undisclosed deal. This is the first of the large independent BI vendors to step into the world of Business Activity Monitoring and Operational Performance Management. All I can say is that it is about time! This acquisition may well trigger similar acquisitions or announcements by other large BI vendors competing with Cognos, to allow them to push into this fast-growing market. The acquisition shows that Operational BI is beginning to become a mainstream issue, pushing BI to the center of the enterprise, plugged into core business operations and processes. This signals the shift from business intelligence to intelligent business.

This blog entry is co-authored by Mike Ferguson and William McKnight (link) and is being cross-posted on our blogs.

Much discussion abounds about when not to use enterprise information integration (EII). This blog looks at some situations that are not particularly well-suited to the use of EII technology as a solution. Please note that these criteria should be taken only as guidelines.

Generally speaking, EII is NOT suited to:

·Complex transformations, fuzzy matching and integrating high volumes of data

·A replacement for data warehousing

·Business process management

·Frequent federated query processing with single federated queries integrating data across a large number of data sources. Performance may become an issue as more and more data sources are added to a data integration server. Several vendors do support caching in order to counteract this problem but nevertheless if you plan to integrate data from a wide range of data sources in a single query you would be well advised to benchmark products and compare results before making any purchase

·High volume transaction processing (insert, update and delete) to update virtual EII views of data in multiple underlying systems. This is because update processing via EII is still in its infancy and can be subject to product-specific restrictions. Also, concurrent access to EII servers may cause problems as workload management is missing from many tools. It is recommended that you investigate carefully how well EII vendors support transaction processing and whether they support two-phase commit distributed transaction processing.

·Transaction processing when integrity constraints across data sources are complex and could cause update processing to fail

·A complete solution to enterprise master data management. EII products can potentially provide a virtual view of integrated master data IF the EII tool supports global unique identifiers (and the mapping of disparate keys to the global one) AND the matching process to integrate data from multiple master data systems of entry does not require complex fuzzy matching. EII may be offered as a component technology in an MDM solution, but it will not provide everything needed for a full solution

This blog entry is co-authored by Mike Ferguson and William McKnight (link) and is being cross-posted on our blogs.

This is the first in a series of entries on Enterprise Information Integration (EII). EII is gaining traction for enabling data integration without the need for the physical instantiation of the integration. In other words, EII adds integrated reporting capabilities while minimizing impact on existing systems. We have been selectively adding EII to our data warehouse architectures. Today, we’ll look at those situations when EII makes sense for data integration requirements.

1.Connecting structured data (e.g. data in a data warehouse) with unstructured data, taking advantage of EII’s strength of leaving in place data that could dramatically increase overall storage requirements if duplicated

2.When immediate change to the underlying data in response to updates through the view is desired (changing a copy of the data would not suffice)

3.When data transformation is relatively light or nonexistent and just getting the data together for integrated query is the biggest challenge

4.When the relatively poorer query performance of an EII query is acceptable (versus the obvious advantages of physically co-locating all the data for the query)

5.Some operational and regulatory reporting where the data needed is not completely integrated in one place

6.Integration of Performance Management software with multiple underlying line of business BI systems to allow a company to see performance management at the enterprise level (across line of business BI systems) using LOB metrics to calculate enterprise KPIs

EII is changing, and some of the disadvantages and restrictions will lessen over time, but it’s a chicken-and-egg situation. More end clients will need to incorporate small adaptations of EII in their decisioning environments to spur that growth. There may be opportunities and TCO propositions to federate your data acquisition requirements now using EII. Small-sized, lightly transformed, unstructured and interactive situations provide those opportunities.

Greetings all. I hope you have had a good holiday and I wish you all well for 2007. Given that it is this time of year I thought I would add some thoughts on trends in 2007 in the areas of business intelligence and performance management. So here goes!

Performance Management

1.Performance management (PM) will continue to grow rapidly with more and more companies realising that PM software needs to break free from just being deployed in finance departments and start to be deployed across many different areas of the business to monitor performance.

2.Companies will seek to integrate performance management scorecards and dashboards with enterprise portals, enterprise search and Office applications in order to make it easier to get at personalised metrics, i.e. My KPIs, My Objectives, My Targets.

3.Many companies are realising that Performance Management is not just a BI problem. Business performance can also be improved by integrating and optimising business processes. Therefore the worlds of Performance Management and Business Process Management have to come together. BI vendors will start this by introducing formally defined management processes (e.g. the planning process, the budgeting process, etc.) that allow businesses to implement formal processes for managing their business by leveraging BI services. These ‘management’ processes will likely be modelled (diagrammed) using the industry standard BPMN (Business Process Modelling Notation), stored in the industry standard XPDL, and run using BPEL (Business Process Execution Language) on industry standard BPEL process servers, which are now being extended to support human tasks (BPEL4People). Alternatively they may run using document workflow such as Microsoft Windows Workflow Foundation or Adobe LiveCycle Workflow. BI vendors may add their own proprietary workflow capabilities to PM products, but companies should insist on industry standards for workflow being followed, otherwise this will become a quagmire of proprietary workflow implementations that will not stitch together. This means that you may need business process management software from other vendors (e.g. BEA Systems, IDS Scheer, Intalio, IBM, Oracle, Microsoft, SAP, Tibco, webMethods, etc.) to implement management processes shipped by BI vendors. As long as the standards are followed by the BI vendors, you should be able to pick and choose whatever standards-compliant process management software you want.

4.Performance Management software vendors will start to partner with or acquire business activity monitoring (BAM) software vendors in order to integrate operational performance monitoring with strategic performance monitoring in order to offer both from a single management toolset.

5.Activity based costing (also known as Activity Based Management) will continue to push its way into the PM space. Both SAS (ABC Technologies) and Business Objects (ALG) have made acquisitions in this area and other PM vendors will have to follow to compete. It is obvious that ABC/ABM is needed as part of performance management because companies want to monitor strategic performance AND operational performance. Operational performance management includes monitoring the efficiency of processes with BAM and monitoring the cost of processes with ABC. Both BAM and ABC are needed to monitor the performance of operational business processes.

6.Performance Management Suites will emerge as a complete set of tools, applications and processes for managing the business at strategic, tactical and operational levels. These suites will include budgeting, planning, scorecards, dashboards, alerting, on-demand recommendations, management rules, BAM, ABC/ABM and business processes. Performance Management Suites integrated with portals and processes will start to enable enterprise-wide execution of business strategy (whereby everyone contributes) rather than just a few managers trying to execute the company strategy.

7.Further consolidation will occur in the market and we may see the acquisition of Performance Management vendors by business integration vendors looking to push further into the PM space and looking to bring together the worlds of PM and Business Process Management.

8.Packaged PM applications in vertical industries will start to be more prevalent and will be added to any packaged analytic applications to enrich solutions

Business Intelligence

1.Integration of BI with Enterprise portals and in particular (Microsoft SharePoint) will become increasingly popular. I am continually asked about this. Also recent presentations that I have given in this area have been heavily attended indicating that the need for personalised BI is upon us.

2.Predictive analytics will be back on the agenda. In particular, scoring and real-time decisioning (either event-driven or on-demand) is growing rapidly. Data mining is alive and well, and will enjoy a new lease of life in real-time operational business optimisation when companies realise the power of deployed predictive models that can be leveraged on-demand as services. The return on investment is not difficult to see when you realise that data mining is not just about a very few seriously smart business analysts building mining models. The real payback comes when you deploy the built models in mainstream operations, so that predictive and scoring models can be leveraged as web services on demand. Predictive analytics needs to be combined with rules engines to get maximum value from the deployed predictive models built by power-user business analysts.

3.Larger BI vendors that have no predictive analytics product may well acquire smaller vendors that have been making hay in the world of operational performance management.

4.BI in a Service Oriented Architecture will continue to grow as companies integrate BI into mainstream operational business processes

5.BI tools will continue to integrate with enterprise content management systems (ECMS) to store reports in a managed environment, as managed documents and/or as managed records in a records management system. This allows BI to be combined with related unstructured content for greater knowledge, and BI compliance reports in particular to be formally managed.

6.The use of search engines on BI will grow, to make it easier to find and access metrics and reports through a familiar, easy-to-use interface

7.Text/Search analytics will be a hot growth area in 2007 to derive valuable intelligence from unstructured content.

8.The battleground for Packaged Analytic applications has already shifted into vertical industries. This will continue to deepen.

9.At least one major independent BI vendor may be acquired by a software giant (IBM, Oracle, Microsoft, SAP) in 2007. Speculation is already rife that IBM and Oracle are out ‘window shopping’!

10.The price of BI platforms is being pushed downward by several forces including the Microsoft SQL Server 2005 impact, the open source BI vendors (e.g. Jaspersoft, Pentaho), and the DW appliance vendors, e.g. Data Allegro, GreenPlum, HP, IBM BCU, Kognitio, Netezza and others. BI vendors with high price points and ‘rental pricing’ may have to re-think their pricing strategy to compete otherwise customers may find cheaper equally good alternatives.

11.Compliance is still a major issue, and so reporting tools that leverage EII will be a differentiator in quickly producing the needed intelligence.

12.Companies that fail to get their data under control (data quality, common data names and definitions, common data integration tools, etc.) will likely suffer at the hands of competitors that have solved this problem.
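Trend 2 above talks about deploying predictive models for on-demand scoring and combining them with rules engines. A minimal sketch of that pattern follows; the logistic coefficients and field names are invented for illustration, and a real model would of course be trained by analysts and published as a web service:

```python
import math

# Sketch of an on-demand scoring service combined with a business rule.
# The logistic model coefficients below are invented for illustration.
WEIGHTS = {"intercept": -2.0,
           "days_since_last_order": 0.03,
           "complaints_90d": 0.9}

def churn_score(customer: dict) -> float:
    """Return a probability-like churn score in [0, 1] for one customer."""
    z = WEIGHTS["intercept"] + sum(
        WEIGHTS[k] * customer[k] for k in customer)
    return 1.0 / (1.0 + math.exp(-z))

def decide(customer: dict) -> str:
    # A simple business rule applied on top of the model's score -
    # the 'rules engine plus scoring model' combination the trend describes.
    return "retain" if churn_score(customer) > 0.5 else "no action"

print(decide({"days_since_last_order": 120, "complaints_90d": 2}))
# -> retain
```

The scoring function and the decision rule are deliberately separate pieces: the model can be retrained and the rule threshold changed independently, which is the operational payback the trend argues for.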

Informatica’s (http://www.informatica.com) announcement today of its intent to acquire ItemField serves to reinforce the trend that the two worlds of structured and unstructured data are coming together. While we have been using data integration infrastructure software for structured data for some time, most organisations have only tinkered with using data integration software to integrate unstructured content with structured data. Yet the amount of unstructured content that enterprises are struggling to manage continues to increase rapidly. While the use of content management systems to get control over unstructured content has increased significantly over the last few years, many companies are struggling even to get content into these systems just to manage it. In addition, dynamic integration of unstructured content for rendering into portals is also in demand.

It is often the case that companies want to associate unstructured content with structured data. An obvious example is in the area of records management whereby invoices, statements and other ‘operational documents’ need to be tied to customer data or supplier data. Insurance companies need to associate quote documents with customer or prospect data, banks need to associate account statements with customers etc. etc. All this bodes well for unstructured data integration specialist vendors such as Vamosa and Kapow who have been growing steadily and getting a lot of attention lately from IT professionals working with content management and portals technologies. These vendors may well start to get a lot of attention from structured data integration suitors over the next 12 months. The general trend here is towards enterprise data integration.

With more and more unstructured information not only on the public internet but also in the enterprise, the need to manage this information and extract knowledge from it is increasingly in demand in commercial enterprises. There are lots of reasons why this is of interest, including improving customer satisfaction by identifying customer concerns from customer conversations, identifying equipment failures from field service technician notes, and even identifying fraud from bogus emails. In addition there is a need to link unstructured information to master data. An example here would be relating documents such as invoices, or emails, to structured customer data. Emails on product quality issues also need to be related to product master data, for example. Call logs in call centres are yet another example.

Business thirst for new information, and for extracting additional knowledge for greater business insight, is still growing apace. From this, the business need is to be able to integrate and analyse structured and unstructured information, and so expand beyond traditional BI systems focussed on structured data into the world of unstructured analytics. This market is massive and it is clear that traditional BI vendors have still not fully switched on to its value, and so we see a lot of fairly unknown vendors (at least unknown to the traditional BI professional) pushing into the BI market and claiming market share. Vendors such as Attensity, Clarabridge, ClearForest, Endeca, FAST and Inxight are all doing well in this space. All these vendors are in the world of enterprise search and text analytics. It is no surprise that BI vendors are starting to form partnerships with these relatively new kids on the BI block, and you could easily see some acquisitions occurring here in the next 12-18 months as large BI vendors expand their platforms to gear up for the sea of unstructured content they may be asked to analyse.

Text mining and search analytics are just two examples that make it easier to find information and to generate dynamic taxonomies from search results, which can then be used to rapidly zoom in on what you are looking for. It is also true that text analytics can generate XML documents. So, for example, if a customer writes an email along the lines of “My name is Mike Ferguson and yesterday I bought a new Panasonic HD TV from ABC Electronics in Manchester……”, I can extract from this a customer name, a manufacturer name, a product name and a store name, which could be brought together in an XML document about a sales transaction. Customer detail could therefore be taken from this, and deeper insight extracted from content like emails, blogs, wikis, web chat, etc. Once this is extracted I can then start to analyse it with traditional BI tools and visualise new information and analyses. This extraction of terms from unstructured content can facilitate a richer, more productive multi-faceted search experience which, to any BI professional, looks like OLAP for search. Here the user can drill down and slice and dice search results any way they like to help navigate quickly to the information they want.
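
As an illustration of the idea (not how any particular text analytics product works), the sketch below uses simple regular expressions to pull the entities out of that example email and assemble them into an XML sales transaction document. Real products would use trained extraction models rather than hand-written patterns like these:

```python
import re
import xml.etree.ElementTree as ET

EMAIL = ("My name is Mike Ferguson and yesterday I bought a new "
         "Panasonic HD TV from ABC Electronics in Manchester.")

# Toy patterns for this one email; a real text-analytics product would
# use trained named-entity recognition models, not hand-written regexes.
PATTERNS = {
    "customer": r"My name is ([A-Z][a-z]+ [A-Z][a-z]+)",
    "product":  r"bought a new ([A-Z][\w]+ [A-Z]+ TV)",
    "store":    r"from ([A-Z][\w]+ [A-Z][\w]+) in",
    "location": r"in ([A-Z][a-z]+)\.",
}

def extract_sales_transaction(text: str) -> ET.Element:
    """Pull named entities out of free text and wrap them in XML."""
    root = ET.Element("salesTransaction")
    for tag, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            ET.SubElement(root, tag).text = match.group(1)
    return root

doc = extract_sales_transaction(EMAIL)
print(ET.tostring(doc, encoding="unicode"))
```

Once the entities are in XML like this, they can be loaded and analysed alongside structured data with conventional BI tooling.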

With social network tagging also exploding on the scene, we can also analyse popular tags and help people identify the dominant ways in which people are categorising relevant information that may be of interest to them.

This is a new area for BI and I expect it to grow very rapidly over the next year. I shall be writing an article on this in the next few months on the B-EYE-Network.

If you are already doing work in this area, please share your experiences. It would be great to hear some case studies.

Anyone browsing the web today would have to be blind not to notice the explosion of Web 2.0 technologies and, in particular, the rapid uptake of RSS (Rich Site Summary, now popularly known as Really Simple Syndication) and ATOM feeds that are springing up everywhere. RSS and ATOM keep us informed of any changes to content that we have registered an interest in, i.e. subscribed to. Modern browsers can regularly check the RSS and ATOM sites that you have subscribed to. Even Outlook in MS Office 2007 can organise feeds like your email boxes to keep you sane and organised around any new information that you are interested in. If you want, you can download popular feed aggregators such as NewsGator (http://www.newsgator.com). What is interesting about this is how such feeds could relate to business intelligence. After all, so many of us want to know when a new version of a report is available, or to subscribe to a metric so that we are automatically notified whenever it changes. Of course many BI tools already support alerting users when metrics change; this can be done by email or on a dashboard, for example. However, with reports and dashboards all being published to the web these days, it seems to me inevitable that BI vendors will add support for RSS or ATOM feed generation based on changes to BI metrics and BI content.

Where it also has major significance is in performance management. After all, being automatically kept up to date when metrics change allows people to notice changes in performance and take action if necessary. Being able to subscribe to automatic notification of metric changes via such a universally accepted and fast-growing mechanism can only help business users manage the business better. Not only that, but support for RSS and ATOM in BI content should allow us to distribute intelligence more widely and reach users who either have not got the time to use BI tools or are struggling with usability fears and confidence in BI tools. The integration of BI into portals is also pushing on this area, because several new releases of portal products now allow portal administrators to turn any portal page they like into an RSS feed. Therefore, as soon as BI (in the form of reports, scorecards, dashboards etc.) is published to a portal page, all subscribing users (assuming they are authorised) will get notified. I can see no reason why RSS items or ATOM entries in these feeds should not be new or changed BI metrics or BI documents (such as reports or dashboards).
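
The feed-generation idea is easy to prototype. Here is a minimal sketch, using only the Python standard library, that turns a single metric change into a one-item RSS 2.0 feed; the metric name and report URL are invented for illustration:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def metric_change_to_rss(metric_name: str, old_value: float,
                         new_value: float, report_url: str) -> str:
    """Build a minimal RSS 2.0 feed with one item describing a metric change."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "BI metric alerts"
    ET.SubElement(channel, "link").text = report_url
    ET.SubElement(channel, "description").text = "Changes to subscribed BI metrics"

    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = f"{metric_name} changed"
    ET.SubElement(item, "link").text = report_url
    ET.SubElement(item, "description").text = (
        f"{metric_name} moved from {old_value} to {new_value}")
    # RSS 2.0 expects RFC 822 style dates in pubDate.
    ET.SubElement(item, "pubDate").text = datetime.now(timezone.utc).strftime(
        "%a, %d %b %Y %H:%M:%S GMT")
    return ET.tostring(rss, encoding="unicode")

feed = metric_change_to_rss("Gross margin %", 41.2, 38.7,
                            "http://bi.example.com/reports/margin")
print(feed)
```

Any standard feed reader, browser or portal that understands RSS could then subscribe to output like this.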

It is also the case these days that search engines can search RSS and ATOM feeds which means that BI available through RSS and ATOM could be searched. Several BI vendors e.g. Hyperion, SAS, Business Objects and Cognos already have support for search engine integration in particular with Google and IBM. Business Objects recently expanded on this with their announcement of an Open Search Initiative that opens up partnerships with several other search engine and text analytics vendors.

If you are experimenting with RSS or ATOM and business intelligence, or have already got something up and running in production, I and the rest of the readers on the B-EYE-Network would really like to hear from you and benefit from your experiences. Let me know how you are using this and what technologies you are using. I would also like to hear from vendors on this topic.

As I attend the massive IBM Information on Demand conference here in Los Angeles, IBM today announced its new IBM Information Server (http://www-306.ibm.com/software/data/integration/info_server/architecture.html), which combines data replication, data discovery, data federation, data integration, data quality, metadata management and more. This is a very powerful product with the ability to support ETL for data warehousing systems and EII, and to support operational data integration and data management. I will write more about this announcement in upcoming blogs.

Over the last decade or more I have reviewed many different BI systems for clients. During that time, and still today, one problem has continually cropped up, irrespective of vertical industry: inconsistent metrics. You see it in different data marts, different BI tools and, of course, across spreadsheets that may access BI systems. This problem has plagued enterprises for years and, despite many efforts, is still there today. It is particularly acute in the Excel spreadsheets that fly around the enterprise as people supply intelligence in spreadsheet form to others. Historically this problem has often arisen when companies have built different BI systems at the line-of-business level rather than at an enterprise level. You might think that this is a crazy idea, but time and again it has happened as pressure has grown from line-of-business executives to deliver some kind of BI to support decision making in their particular area.

Take any large bank, for example. Here you may find separate product-based risk management BI systems for mortgage products, card products, loan products and savings products. Product-based risk management BI systems are a classic set-up in many financial services organisations, and it is only recently that many financial institutions have seen the need to revisit risk management at the customer level to see risk exposure across all products owned by a customer.

Another example is when an IT department has built one or more BI systems and then given the user base total freedom to select their own BI tools. This is the ‘Field of Dreams’ approach to building a data warehouse, i.e. the ‘build it and they will come’ strategy. The user base in this example may therefore be chock full of different BI tools chosen by different user departments to access the data stores that IT has built. A third example is the parallel build approach, whereby companies have spawned different BI teams around a large enterprise in order to speed up BI system development. These teams often end up building separate data marts as stand-alone initiatives, and so no common data definitions get used across BI systems.

Looking at these examples, there has clearly been plenty of opportunity over the years for inconsistency to creep into BI implementations. This has not happened deliberately. It is more an inadvertent consequence of stand-alone developments and a lack of communication across BI development teams and line-of-business departments.

The consequence today is that it is not uncommon to find different BI tools in an enterprise each offering up the same metric under different data names. You may also find that what looks like the same metric has different formulae in different instances because of different user interpretations. Worse still are different metrics with the same name and different formulae. This can cause significant confusion among business users, often meaning that they are not clear on what BI metrics mean. In situations like this it is not uncommon to see users resorting to creating their own spreadsheets with their own version of the metrics they need and their own version of the formulae. Then, of course, they start emailing people the attached spreadsheets, and so the rot sets in and inconsistency starts to spread, undermining all the investment in BI. I sense a few heads nodding out there; would I be right? So what can you do about it? Fundamentally, a major problem with many BI tools is that they offer up what is marketed as ‘flexibility’ by giving end users the chance to create their own metrics without policing this. So it is rare to see a BI tool either prevent users from creating duplicates of metrics that already exist or at least warn them that a metric already exists to avoid reinventing it. This lack of policing is at its worst among the Excel spreadsheet ‘junkies’ that exist out there.

Amidst a climate of increasingly stringent compliance regulations, this problem is beginning to seriously worry executives such as CFOs, who often carry ownership of the compliance problem. What they want are common metrics definitions, with all tools sharing access to those definitions. In particular, they want reporting tools and spreadsheets to have access to such common metrics definitions and to prevent authorised users from inadvertently creating the same metrics again and again without knowledge of what already exists. Is it any wonder, therefore, that common metadata is becoming more important year on year?

What we want are “metrics cops” that prevent users from creating inconsistent metrics definitions. If a user tries to create a metric, perhaps with a new name but with a formula that already exists elsewhere, a metrics cop would intercept the request and inform the user that a metric with the same formula already exists, and that they should re-use the commonly defined metric if they need it in a report. Equally, if two users each try to create a metric with the same name but different formulae, a metrics cop should once again intervene and prevent this from happening, so that clarity, consistency and compliance are all maintained. It was therefore with interest that I looked at the recent e-Spreadsheet 9 announcement from Actuate (http://www.actuate.com), which has the capability to generate spreadsheets for many different users while guaranteeing metrics re-use across all spreadsheets. Not only that, but if the same metric is needed in different spreadsheets, the product inserts the same formula into all spreadsheets that need it, thereby guaranteeing consistency. Several other BI vendors are beginning to look at this problem, but often only across their own BI tools. As we move towards shared metadata across all BI systems to enforce compliance, the need for metrics cop functionality in BI tools such as reporting and OLAP is becoming a must for many enterprises. Performance management tools and applications also need to enforce this.
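
In essence, a “metrics cop” is a policed registry keyed on both metric name and normalised formula. A minimal sketch of the idea (hypothetical, not any vendor's implementation) might look like this:

```python
class MetricsCop:
    """Registry that polices metric definitions for name/formula clashes."""

    def __init__(self):
        self._by_name = {}      # metric name -> formula
        self._by_formula = {}   # normalised formula -> metric name

    @staticmethod
    def _normalise(formula: str) -> str:
        # Crude normalisation: lower-case and strip whitespace so that
        # trivially different spellings of the same formula collide.
        return "".join(formula.lower().split())

    def register(self, name: str, formula: str) -> str:
        norm = self._normalise(formula)
        if name in self._by_name:
            if self._normalise(self._by_name[name]) != norm:
                raise ValueError(
                    f"'{name}' already exists with a different formula; "
                    f"rename it or reuse the existing definition")
            return name  # identical definition: harmless re-registration
        if norm in self._by_formula:
            # Same formula under a new name: redirect to the original metric.
            return self._by_formula[norm]
        self._by_name[name] = formula
        self._by_formula[norm] = name
        return name

cop = MetricsCop()
cop.register("Gross Margin", "(revenue - cogs) / revenue")
# A duplicate formula under a new name is redirected to the existing metric:
assert cop.register("GM Ratio", "(revenue - cogs) / revenue") == "Gross Margin"
```

A real implementation would sit on shared metadata so the same checks apply across every BI tool and spreadsheet, not just one registry in one tool.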

Therefore look closely to see how BI vendors stack up in terms of policing metric creation in their BI tools and performance management applications to help prevent your users from inadvertently contributing to metrics inconsistency, reconciliation and compliance problems. Also look to see how BI tools can discover what metrics already exist in your BI systems and how they can import metrics definitions from other technologies where they are defined. Finally check to see how BI platforms allow you to share common metrics definitions across 3rd party BI tools. This is especially important when it comes to Excel.

In my blog on MDM Straightforward Implementation or Iceberg Project, I highlighted the difference between master data integration and enterprise master data management (MDM). A key difference between the two is that enterprise MDM involves the MDM system being the single system of entry (SOE) as well as a system of record (SOR) while master data hubs persist integrated master data and are SORs while existing operational applications remain SOEs. I also highlighted the significant effort involved in transitioning from master data integration to enterprise master data management, and it is here that I want to focus in this blog.

The reason for singling out this transition is because another technology often being implemented in IT presents an opportunity for companies to start switching off screens, form fields, etc., to do with operational application SOEs and move towards a single SOE enterprise MDM system. That technology is enterprise portal technology. Example enterprise portal products include BEA AquaLogic Interaction Portal, IBM WebSphere Portal Server, Microsoft Office SharePoint Portal, Oracle Fusion Portal and SAP NetWeaver Portal.

As companies implement enterprise portal software, one of the tasks that needs to get done is the integration of applications into a single, personalised and integrated user interface served up by portal technology. For many operational applications that are master data SOEs today, integrating them into a portal often means that their user interface needs to be redeveloped to become a portlet-based user interface with multiple portlets appearing on portal pages served up to users. If existing operational master data SOE applications are slated to have their UIs redeveloped to be plugged into a portal, then MDM developers should seize the opportunity to request user interface changes to those systems in order to decommission application-specific master data entry screens and master data attributes on line-of-business application forms. They can then introduce equivalent master data forms or screens that maintain master data as portlets in the portal. These new portlets would directly maintain master data in MDM data hubs rather than updating line-of-business operational application local data stores. Note that portlets associated with MDM systems can co-exist on a portal page alongside operational application portlets, and so the user still sees that they can maintain master data as they did before. The difference here is that data entry on some portlets may be transaction data held in the application data store, while data entry on other portlets maintains master data in the MDM system. By doing this, the transition to enterprise MDM can take place gradually and ‘piggyback’ the budgeted and planned redevelopment of application user interfaces as they get integrated into enterprise portals. This approach enables two things to happen at once: application UI redevelopment for integration into a portal and gradual switch to using MDM maintenance portlets as a single SOE method to maintain data in the MDM system. 
Enterprise MDM systems can then synchronise changes to master data with other applications that make use of this data.

So don’t miss the portal opportunity as a vehicle to transition to enterprise MDM!

In response to the question from David Jackson (great question, by the way) on federated MDM, I have several thoughts on this kind of approach. The question is how these non-overlapping views are managed at different levels. In other words, are these virtual views rendered on demand by an EII tool, for example, or are they straightforward relational DBMS views on a persistent master data store (sometimes referred to as a hub) that has been established in the enterprise? That is the first question. The second is: “are the views using the same data names as the underlying master data, i.e. the common data names that would be associated with master data?”

Looking at the first example, if these non-overlapping MDM views are virtual and created ‘on the fly’ from disparate data sources by federated query EII tools, then the EII technology would need to support global IDs and be capable of mapping the disparate IDs associated with the master data in disparate systems to these global IDs. I am somewhat sceptical of this approach, mainly because of the limitations that some EII products place on enterprises. Staying with an EII federated query approach, the next question is what happens if you want to update these non-overlapping virtual views of disparate master data that have been rendered by EII. In that case, the EII product has to support heterogeneous distributed transaction processing across DBMS and non-DBMS sources. This is supported by some EII tools, but certainly not all of them. EII is still primarily used in a read-only capacity. I would be very interested in any experiences of companies using EII on its own to manage master data. Please share with us what you are doing out there!! In that sense, registry-approach MDM products (e.g. Purisma) may be a more robust way of dynamically assembling data from disparate systems on demand, but again the question is how the data is maintained, i.e. what is the system of entry (SOE)? Is the MDM system the SOE as well as a system of record, or are the line-of-business operational systems still SOEs?

If the master data is already integrated and persisted in an MDM data hub, then providing views of it is potentially achievable via straightforward relational views. Again, however, the question of view updateability comes to mind. This is a very well documented topic that stretches back to the ’80s, with writings from leading relational authorities such as Dr E.F. Codd and Chris Date. So the processes around maintaining non-overlapping views, which system or systems remain master data SOEs, and the technical approach taken to implement all this need to be considered.
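
To make the persisted-hub option concrete, here is a small sketch (using SQLite purely for illustration; table and column names are invented) of a non-overlapping relational view over a master data hub that keeps the common data names. Note that such views are read-only by default, which is exactly the updateability question raised above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_master (
        global_customer_id INTEGER PRIMARY KEY,
        customer_name      TEXT,
        country_code       TEXT,
        risk_segment       TEXT
    );
    INSERT INTO customer_master VALUES
        (1, 'Acme Ltd',  'GB', 'LOW'),
        (2, 'Widget SA', 'FR', 'HIGH');

    -- A non-overlapping view for one part of the enterprise, keeping the
    -- common (master) data names rather than local application aliases.
    CREATE VIEW uk_customers AS
        SELECT global_customer_id, customer_name, risk_segment
        FROM customer_master
        WHERE country_code = 'GB';
""")

rows = conn.execute("SELECT customer_name FROM uk_customers").fetchall()
print(rows)  # only UK master records are exposed through this view
```

In SQLite, making such a view updateable requires INSTEAD OF triggers, a small-scale echo of the wider point that update paths through views have to be designed deliberately.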

For me the bigger question is data names in these MDM views. You could argue that views (virtual or otherwise) allow you to render master data using different data names in every view. To be fair, David’s question stated non-overlapping views. In my opinion the data names in these views should remain as the common enterprise wide data names and data definitions associated with the master data. After all this is MASTER data that should retain common data names and definitions if at all possible. I accept that when subsets of master data are pushed out to disparate applications then the subset of data once consumed by the receiving application may end up being described using application specific data definitions (because that is how data in the application specific data model is defined). But if we are to create views of master data at different levels of the enterprise, in my opinion we should insist on common enterprise wide data names in these non-overlapping MDM views to uphold consistency and common understanding. Any portals that present this data or new applications and processes that consume it should if at all possible retain those common definitions. Again, in my opinion, common understanding and enterprise wide data definitions (i.e. master metadata) is king.

In fact, I would argue that without common data definitions (sometimes referred to as a shared business vocabulary) a federated MDM approach using non-overlapping views would fail, because it is the common metadata definitions that hold the whole thing together. This brings up another point: master data should be marked up using common data names wherever it goes, and metadata management is just as fundamental to success in any MDM strategy as the data content itself. A shared business vocabulary (SBV), and understanding the mappings between SBV common definitions for master data and the disparate definitions for it in disparate systems, is absolutely key.

Let me know what you think and thank you David for a truly excellent question.

Having travelled to San Francisco just over a week ago to speak at the Shared Insights (formerly DCI) Data Warehouse and Business Intelligence conference, the experience of going through Manchester and Heathrow Airport security was something I don’t think I’ll forget for a long time. I started on my travels on the Saturday morning just after the mid-week scare in the UK when everything went critical. It was the first time in my life that I have travelled on business without my laptop due to the policy of absolutely no hand baggage. I have to say I felt almost naked without my laptop, but I didn’t want to run the risk of losing it or having it crushed as it travelled through the checked baggage system given the fact that I have reached many a destination airport only to find that my luggage did not. So I took the safe route and took my presentation on EII In-Depth on a memory stick and packed that in my suitcase.

I have to say, as an airline passenger I don’t think I have ever felt so safe, given that security searched everyone (and were fairly pleasant in the process) at Manchester and again at Heathrow. Belts, shoes, jackets, watches and just about everything else was x-rayed. Nothing other than your travel documents and prescribed medicine was allowed on the plane, and so I watched every video British Airways had on offer! Even when we got on board, there was a one-hour delay while details of everyone who boarded were sent to the U.S. for checking before we were allowed to take off. When I got to the U.S. and picked up my suitcase, I was relieved to know that I had retrieved my PowerPoint presentation on the memory stick I packed in my suitcase and could fulfil my obligation of presenting at the conference. You might think I should have e-mailed the presentation in advance. Well, I did that too, but as many of you may agree, it is best to have a backup. Talk about secure! Having got through passport control, I was then selected at random for another check by U.S. customs, who looked through my luggage, asked me a few questions and then sent me on my way with “Welcome back to the U.S., sir.” By the time I got to my hotel, all I could think of was the seriously secure BI presentation that travelled with me through security on my memory stick. It was as if I was being protected so well in order to get me safely to the conference to do my presentation. I then dutifully turned up at the conference, having requested in advance a laptop to present from, which Shared Insights happily provided. Of course, I asked them if they had the presentation I emailed in advance, and the answer was NO! So the moral of this story is always take a backup, and thanks to airport security for looking after my memory stick!!

An article on EII In-Depth will be the feature article in the UK Business Intelligence Network newsletter. Be sure to subscribe (it is free!) so you can be one of the first to read all about EII. The newsletter will also feature a great article about the globalisation of business intelligence.

Let me know what you think and what you are doing with enterprise information integration (EII) and business intelligence here in the UK.

The announcement by Teradata (a division of NCR) that it is entering the Master Data Management marketplace will set heads turning in a fast-moving market that already has a number of vendors competing for business. Vendors already in this increasingly crowded market include:

Teradata’s first MDM offering is the Teradata Product Information Management (PIM) solution. No doubt customer information management will follow. The Teradata PIM solution is being rolled out after Teradata announced an agreement with I2 to roll its PIM solution onto the Teradata platform. There are obvious reasons why this makes sense for Teradata. Firstly, their message has always been about building enterprise data warehouses on the Teradata platform and a single version of the truth. Adding a master data management solution to this platform allows a core dimensional data source to sit on the same platform as a data warehouse, making it less costly to manage (no separate server needed, existing administration skills for Teradata shops, etc.) and easy to supply data to analytical systems. In addition, Teradata has added event-driven support over the last several years via integration with popular messaging systems. While this was originally implemented to introduce real-time, event-driven data capture into a data warehouse, there is nothing to stop Teradata exploiting it as a two-way mechanism so that changes to master data can be sent out of the Teradata MDM platform to other operational applications to keep them synchronised.

An obvious question concerns running transaction processing and MDM on a platform optimised for analysis and business intelligence. Even there, it seems that this should not be a problem for Teradata, since they have long had workload management and the ability to fence off hardware to dedicate to specific tasks. Furthermore, Teradata could instantly analyse changes to master data on their platform, even before moving that data into a data warehouse, to reveal early behavioural shifts. Not only that, but with master data and the data warehouse on the same platform it would be relatively easy to introduce versions of dimensions into an analytic data warehouse environment and quickly compare business metrics across different hierarchies over time.

All in all, the arrival of Teradata in the MDM market makes a lot of sense for them. The question is whether this MDM solution is implemented as a system of record only (i.e., changes to product data come through other operational applications, which remain systems of entry) or whether Teradata can convince prospects and existing customers to make it both a system of record and a system of entry. Doing the latter would take Teradata into new territory: that of operational transaction processing. In a world where many companies are looking to simplify and consolidate data, this move may well trigger significant growth for a company with a long track record in data.

Increasingly here in the UK, I am getting asked about whether master data management (MDM) has a role in business intelligence (BI). The obvious answer is that master data management systems can play a significant role in BI by supplying the dimension data needed in BI systems. Of course, MDM systems also provide operational systems with this core data and with changes to such data.

On the surface, MDM looks fairly straightforward: it seems to just involve integrating data into core data entity (e.g., product, customer and asset) data hubs and then managing change to such data. But is it that simple? After all, some vertical industries seem to be well down the road here (e.g., manufacturing) and are already supplying dimension data to BI systems from MDM data hubs. However, many people I talk to, across most vertical industries, are still trying to get to grips with the impact of MDM on their business. Why so, I wondered? As I have researched this problem more deeply, I am not surprised at the hesitation, as there is a lot more to this than first meets the eye. In the work I have been doing with clients, it is clear that introducing master data management hubs into a company is not the problem, as long as these hubs are simply systems of record (SORs) and not systems of entry (SOEs). If they are just systems of record, then this is not really master data management; it is master data integration. In other words, changes to master data continue to occur in existing operational applications, and these changes are trickle-fed to the master data hubs. However, if we want true MDM, then the MDM system has to become an SOE, and to make that happen a company has to undertake a huge amount of business analysis work and application change management to wrestle the problem of master data management to the ground. I am already clear that this is a multi-year project.

While there is a thirst for master data management applications such as customer information management, product information management etc., companies have to think about the consequences of these applications for their business and about the capital and IT resources needed to implement them. In my mind, there are all sorts of problems here. Looking at customer data, for example, the issue is really to do with parties (people or companies, for example). A person can have multiple identities (e.g., an employee number, a customer number, a national insurance number, etc.). Also, in some cases, specific vertical industries such as banking assign even more identifiers to a party. Banks often use product numbers to identify customers (e.g., a credit card number, a mortgage roll number, etc.), such that if you have two credit cards, a loan and a mortgage, and are an employee, you are identified differently across a range of line-of-business applications. A party, therefore, is awash with identifiers.

So the first problem is that you need to assign a global identifier to a master data entity, plus a mechanism to map all the other identifiers to the global one. Then there are the master data entity (e.g., customer, product, asset…) attributes that describe that entity. When looking at these attributes, the problem I run into is that many different line-of-business operational applications are systems of entry (SOEs) for these attributes. These existing disparate operational systems are therefore involved in distributed maintenance of that master data. In addition, in many cases, these disparate systems also double up as systems of record (SORs) for just the attributes they maintain. In other words, we have fractured master data, with parts of an entity (subsets of its attributes) maintained by different applications. The consequence for many companies today is that they have had to put in place FTP and messaging solutions to move subsets of master data between systems in order to synchronise it across applications. While this may have been manageable initially on a few systems, many companies are now at the point where they have FTP and message mania. We talk about XML messaging – well, we spelt it wrong. We should take out the M and pronounce it XL, as in Excel messaging!!
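
A minimal sketch of that first problem – a cross-reference that maps the many local identifiers for a party onto one global identifier – might look like the following (all system names and identifier values are invented for illustration):

```python
class IdentityXref:
    """Cross-reference mapping local party identifiers to a global ID."""

    def __init__(self):
        self._next_id = 1
        self._xref = {}  # (source_system, local_id) -> global_id

    def assign(self, *local_keys):
        """Register a set of local identifiers as one party; return its global ID."""
        existing = {self._xref[k] for k in local_keys if k in self._xref}
        if len(existing) > 1:
            raise ValueError("local identifiers span multiple parties")
        gid = existing.pop() if existing else self._next_id
        if gid == self._next_id:
            self._next_id += 1
        for key in local_keys:
            self._xref[key] = gid
        return gid

    def resolve(self, source_system, local_id):
        return self._xref[(source_system, local_id)]

xref = IdentityXref()
gid = xref.assign(("cards", "4929-xxxx"),
                  ("mortgage", "ROLL-0017"),
                  ("hr", "EMP-2231"))
# All of this party's line-of-business identifiers resolve to one global ID:
assert xref.resolve("hr", "EMP-2231") == gid
assert xref.resolve("cards", "4929-xxxx") == gid
```

Real MDM products add matching and survivorship logic on top of this, deciding when two local records with no shared key are in fact the same party.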

When I look at vendor offerings that talk about data hubs as a way to centralise master data, it is clear that these hubs can be systems of record but are a far cry from systems of entry, unless you switch off all existing ways to update such data in other applications. Therefore, all I can see is that master data hubs become the system of record while existing applications remain the systems of entry, maintaining changes to master data attributes. Most people I talk to can get their heads around that – i.e., line-of-business changes to master data are propagated to the master data hub. However, this is just master data integration. It is not MDM. MDM requires that the MDM system is both the system of record and the system of entry, and also that it manages and tracks all changes to such data over time. This includes things like product and customer hierarchies.

Companies therefore have to make sure that what they are buying can do all of this and not just master data integration. Even if an MDM solution can be both an SOR and an SOE, getting to that point is not easy. To implement this means that companies have to:

Identify and model all the business processes they have in place in order to know what process activities are associated with changing master data – i.e., all SOE activities

Identify and make a list of all functions in all existing operational applications that maintain specific master data attributes – i.e., that perform maintenance on that data

Identify all screens in all existing applications that allow users to maintain master data

Identify all duplicate functions across existing systems that are maintaining the same master data attributes so that potential conflicts in master data systems of entry are identified

Start a change management implementation program that systematically causes change to line of business systems in order to

Weed out duplicate or overlapping functionality to stop master data conflicts in existing systems of entry

Change line of business SOE applications to switch off data entry screens or to remove master data attributes from line of business application forms

Introduce equivalent forms or screens in the MDM system to directly maintain the master data in MDM data hubs (this is a change from many systems of entry to one system of entry)

Understand the impact of that kind of change on existing line of business applications (e.g., line of business SOE systems may update more than master data in a specific transaction in which case the business logic in the line of business application may also have to change)

Introduce data synchronisation from the MDM system of record back to the line of business systems to keep these applications in sync

Change business processes to cause change to master data using the new process

Introduce common master data services that applications should call to access and maintain master data in such data hubs

Change existing ETL jobs to start to take specific master data from the MDM system rather than line of business systems to populate BI system databases

Etc. etc.
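The last two steps – common master data services plus synchronisation back to line of business systems – can be sketched together. In this hypothetical example (all class, method and system names invented), applications call one service to maintain master data, and the hub pushes each change out to subscribed systems to keep them in sync:

```python
# Hypothetical sketch: a common master data service acting as the single
# system of entry, with changes propagated back to subscribed line of
# business systems. All names are invented for illustration.
from typing import Callable, Dict, List

ChangeCallback = Callable[[str, dict], None]

class MasterDataService:
    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}      # hub as system of record
        self.subscribers: List[ChangeCallback] = []

    def subscribe(self, callback: ChangeCallback) -> None:
        """A line of business system registers to receive synchronised changes."""
        self.subscribers.append(callback)

    def maintain_customer(self, global_id: str, changes: dict) -> None:
        """Single point of entry: apply a change, then propagate it outwards."""
        record = self.records.setdefault(global_id, {})
        record.update(changes)
        for notify in self.subscribers:
            notify(global_id, changes)

# A CRM system keeps a local copy in step with the hub.
crm_copy: Dict[str, dict] = {}
service = MasterDataService()
service.subscribe(lambda gid, ch: crm_copy.setdefault(gid, {}).update(ch))
service.maintain_customer("GID-000001", {"name": "Acme Ltd"})

assert service.records["GID-000001"]["name"] == "Acme Ltd"
assert crm_copy["GID-000001"]["name"] == "Acme Ltd"
```

In reality the propagation would go over messaging middleware rather than in-process callbacks, but the shape is the same: one system of entry, one system of record, many synchronised consumers.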

Looking at this, it is clear that in solving the MDM problem, the move towards a true enterprise MDM system, as opposed to just a master data hub that integrates master data, is going to be a long hard slog. This is a multi-year programme that is more like an “iceberg” project: there is a lot more to it than might first appear.

My concern here is that part of the way down this road, many companies could hit a major roadblock in the form of packaged applications that act as SOEs for master data. An example might be CRM applications maintaining customer data. How can companies remove/disable screens or alter on-line forms in these packaged applications to switch to a true MDM system? It assumes all packages can be customised. Maybe in first tier packaged applications (SAP, Oracle packages, etc.) this is possible, but in other applications reality tells me we’ll be lucky if it’s the case. So realistically, you may not get all the way to the true MDM promised land. The best thing is to keep advancing systematically as far as you can go.

Welcome to my blog. It seems strange that I should be writing a UK blog from Italy! Rapallo, to be precise. What can I say, if it’s Friday and you can’t figure out why your ETL job is not working, or if your metrics won’t calculate correctly on your OLAP server, just leave it, get out of the office and come to this stunning place for a weekend. You won’t regret it. Just fly to Genoa and get a cab – it’s about a 30-minute drive.

Anyway, I hope to help you stay in touch with hot topics and reality on the ground in the UK and European business intelligence markets, and to provide content, opinion and expertise on business intelligence (BI) and its related technologies. I would also relish it if you too would share your own valuable experiences on the discussion forums and surveys. Let’s hear what’s going on in BI in the UK. If you’ve got a gripe, let’s hear it. If you think a product is cool, let’s hear that too, and most of all, let’s hear what you’re using BI for in your business and what you want to see covered. I’ll try to explore established and newly emerging technologies in BI, including data integration, BI platform tools such as reporting, OLAP, dashboard builders, predictive analytics, and performance management software such as scorecarding and planning. I’ll also look at data warehouse appliances, data visualisation, portals, operational BI and BI applications.

I’ll start the ball rolling by looking at the platform market. Consolidation is upon us in the business intelligence market, with large BI vendors and software giants battling it out for market share. Business Objects, Cognos, Hyperion and SAS are the largest of the independent BI vendors competing with IBM, Microsoft, Oracle and SAP for the BI platform crown in most enterprises. Of course, there is plenty of action elsewhere from many other BI vendors. An example of that is the increasing number of data warehouse appliance offerings on the market from start-ups and established BI vendors including (in alphabetical order) Calpont, DATAllegro, Greenplum, IBM BCU, Netezza and SAP BI Accelerator – with more to come, I am sure.

Much talk over the last few years has been about the move to single vendor BI platforms. In the UK, I hear a lot of talk about it. Companies building BI systems for the first time are definitely going for single supplier BI platform solutions. However, companies with long established best-of-breed BI systems still need to be convinced, though they will most likely move slowly to fewer suppliers if there is value in doing so. I think Microsoft’s presence in the BI platform market with SQL Server 2005 BI platform tooling (SQL Server Integration Services, SQL Server Analysis Services, SQL Server Reporting Services, SQL Server Notification Services and SQL Server Report Builder) is having an impact on BI platform pricing. Certainly many companies talking to me have been raising the issue of pricing. Emerging open source BI initiatives (e.g., Jasper, Pentaho, BIRT and Palo, to name a few) are also bound to add to that pressure, forcing BI platform pricing down over the next 18 months. If this does happen, it is likely that vendors will compensate by shifting the value to performance management tools and applications, vertical BI applications, and the growing areas of data integration and enterprise data management.

Enterprise data management seems to be taking on a life of its own with data integration now breaking free of the BI platform and going enterprise-wide. Here we have ETL, data quality, EII, data modelling and metadata management all coming together into a single tool set. Much is being written about EAI, ETL and EII (or as I like to call it EIEIO !!), and I’ll look into these areas in upcoming blogs and articles.