In my initial post about doing iterative development for BI/DW projects, I mentioned that one of the challenges was developing a solid foundation while doing iterative development, especially in the first few iterations. If you are started from scratch on a new BI initiative, there is often a lot of work to do in getting environments established, developing processes for moving data and code between environments, and exploring and validating the source data to be used. Unfortunately, most of this work does not result in deliverables that an end user would consider valuable. Since, as part of each iteration, you want to have the possibility of delivering working functionality to the stakeholders, this can present a problem.

Since most end users consider working functionality to be something that they can see in a nice user interface, you need to look at ways to minimize the development time required to present data to the end user. Some of the common time-consuming tasks in the first couple of iterations are:

Establishing the environments

Exploring and validating the source data

Developing ETL processes to move and cleanse the data

There are really no quick workarounds to setting up the environments. In fact, I’ve usually found that taking shortcuts on the environments leads to much bigger problems down the road. However, what can be effective is to minimize the number of of environments that you deal with in each iteration. While theoretically you should be able to deploy to production in the first iteration of a project, it’s rare that this is actually needed. So instead of creating a development, QA, and production environment, consider only establishing the development and QA environments. I do think that having at least two environments is important, so that you can begin validating your deployment procedures.

Exploring and validating the source data is definitely important. In the first couple of iterations, though, it’s often necessary to limit and restrict what you explore. For example, a project I was involved in recently had some very serious issues with data quality. The source database did not enforce referential integrity, so a large percentage of the data in the source was not related to the rest of the data correctly. Rather than derailing the current iteration to completely research and resolve the data quality issues, the project team and the stakeholders made the decision to only import data that was related correctly. This enabled the project team to still present a set of working reports to the stakeholders at the end of the iteration, rather than not being able to demonstrate any working functionality. The subsequent iterations were adjusted to better reflect the level of data quality.

ETL processes can be time-consuming to develop, particularly if the organization does not already have a framework in place for the ETL processes. In the first couple of iterations, an alternative approach is to load data in a more manual fashion, using SQL scripts or direct manipulation to get an initial set of data populated. This has a couple of benefits. One, it allows the time for building a full ETL process to be spread across multiple iterations. Two, it allows the end users to get a look at the data (in a friendly user interface, if you follow the advice above) and validate that it is correct.

A key part of developing the foundation, while still providing value in each iteration, is accepting that you can’t do it all in a single iteration. The foundation development will have to be spread across multiple iterations, and it is acceptable to build some scaffolding code in order to deliver functionality in each iteration. Clearly, you want to minimize that amount of code that can’t be reused. Generally, I’ve found that with the Microsoft toolset for BI, it’s pretty easy to build incrementally, with minimal rework. However, even if your tools don’t support this as well, in my experience the downsides of having some code that gets replaced in a later iteration are far outweighed by the benefits of being able to demonstrate tangible progress to the stakeholders in each iteration of the project.

As I mentioned in my previous post, defining an appropriate scope is difficult when you are attempting to do iterative development on BI/DW projects. The problems with this are not unique to BI/DW projects, of course. All projects have scope issues. Most of the same scope management techniques that work well on traditional application development projects also work for BI/DW projects. There are two unique aspect of scope for BI/DW projects, though. One is that there is often significant work required “behind the scenes” to deliver even small pieces of visible end user functionality. The other is that there can be significant “hidden” scope in properly cleansing the data and making it useful from a business perspective. This can be challenging because the end users may have a perception that they are not getting very much functionality, particularly in the first several iterations of the project.

What are some of the hidden aspects of BI/DW projects? ETL and data profiling are two of the most common. I consider these hidden because the end users of a BI application rarely are intimately involved in the ETL process development. They may be involved in the data profiling, but often are not involved in the data cleansing that usually accompanies it. These are time-consuming activities that the users only get to appreciate indirectly, so they often don’t put very much value on it.

How can this be addressed? I’ve found that there’s two parts to it. First, you need to do a certain amount of education with the stakeholder on what happens behind the scenes, so that they have a better understanding of the effort that has to be expended. The benefits that they get from this effort need to be explained as well. Telling them that ETL processes are time-consuming to implement isn’t very effective unless you also explain the benefits of well-implemented ETL: cleaner, consistent, conformed data, with appropriate controls to verify that the right data is going into the data warehouse, and the bad data is being excluded.

However, education is not enough by itself. The second part of it is to show them that they can get incremental benefits. Again, as pointed out in the previous article, each iteration should deliver something of value to the users. It’s important to do this from the first iteration on, and to continue to do it consistently. One effective way to determine what would be of value for an iteration is to ask the users to decide on one or two questions that they want to be able to answer. The scope of the iteration becomes delivering the information that allows them to answer those questions. But what if the questions are complex, and you don’t feel that they can be addressed in a single iteration? I’ve generally found that if the questions are that complex, you can break them up into smaller questions and negotiate with the stakeholders to deliver them over multiple iterations.

This does require that the development team on the project has an agile mindset, and is focused on meeting the deliverables for the iteration. It also poses a more significant challenge when a project is in an initial phase, and the infrastructure is still being put in place. I’ll discuss this challenge further in my next post.

In conclusion, scope management is important on all projects, not just BI/DW projects. However, perceptions of scope may be more challenging on BI/DW challenges, because of the hidden nature of some of the activities. It’s important to communicate the value of these activities to the project stakeholders, and to demonstrate that you can consistently produce incremental deliverables, while still carrying out these valuable activities.

I’m a fan of agile development. Prior to focusing on business intelligence and data warehousing, I architected and developed client server and n-tier applications, and I found that agile development techniques delivered better end results. I believe that, in large part, this came about because of the focus on smaller, functional iterations over a more traditional waterfall approach. However, this approach is still not regularly used on BI projects. Since it can have a big impact on the success of your BI initiatives, I’d like to list some of the challenges that prevent adoption of this approach. I’ll go into more detail on each of these in my next few posts, and also cover some of the benefits you can see if you overcome the challenges.

First, though, let me define what I mean by an iteration1. An iteration is a short development cycle that takes a specific set of requirements and delivers working, useful functionality. Generally, I think the duration of an iteration should be 2 weeks to a month, but I don’t consider that an absolute rule. More than 2 months for an iteration, though, really impacts the agility of the project, so I prefer to keep them shorter. Delivering working functionality in an iteration doesn’t necessarily mean that it has to go to production. It does mean that it could go to production, if the project stakeholders choose to do that. Delivering useful functionality means that what’s been developed actually meets a stakeholder’s need.

There are a number of challenges that you might encounter in doing iterative development for BI. Fortunately, you can work around these, though it’s not always easy.

ScopeMany BI/DW initiatives are large and involve multiple systems and departments. Even smaller BI projects often have multiple audiences with differing needs and wants. Managing the scope of the effort appropriately can be challenging in that environment. This means that defining and managing scope is critical to successful iterative development.

Foundation developmentParticularly when BI projects are getting off the ground, they need a lot of foundational setup – environments, software installations, data profiling, and data cleansing. This poses definite problems for the first couple of iterations, especially when you take into account the guideline of delivering working and useful functionality with each iteration. The foundation needs to be built in conjunction with delivering useful functionality.

Existing infrastructure and architecture Iterative BI development requires an infrastructure for development that is flexible and adaptive, and it makes it much easier if existing solutions were architected to be easily modified. Sadly, this is not the case in many organizations. Existing process and infrastructure tends to be rigid and not support or encourage rapid development. Existing data warehouses tend to be monolithic applications that are difficult to modify to address changing business needs. And many BI development tools do not provide adequate support for rapidly changing BI applications.

Changing requirements2Changing requirements is a fact of life in most IT projects, and it’s definitely the case in BI/DW projects. While agile and iterative development can help address changing requirements, it still poses a challenge. As requirements shift, they can have a ripple effect on the BI infrastructure – adding a new field to a report may require changes to the database and ETL processes in addition to the report itself. This can make seemingly inconsequential changes take much longer than expected, and lower the adaptability of the project.

Design 2Due to the scope and complexity of many BI/DW initiatives, there is a tendency to get bogged down in design. This is particularly true when you consider that the database is a key part of BI/DW projects, and many BI developers feel most comfortable having a complete and stable data model before beginning development (which is perfectly reasonable). However, design by itself does not produce working functionality, so an iteration that involves nothing but design doesn’t really meet my definition of iterative.

After looking at all these challenges, you may feel that it’s pointless to try iterative development on BI projects. However, I’m happy to say that all these items can be addressed (granted, some of them are much more easily fixed than others). If you do make some changes, you can get a number of benefits, including the ability to rapidly accommodate change, deliver working functionality more quickly, and facilitate emergent requirements and design. The end result? Happy project stakeholders who are more satisfied with the results of their projects, and less wear and tear on the BI developers. Everybody wins!

Over the next couple of months, I’ll be posting more information about the challenges, and how you can work around them. Please keep reading, and if you have questions or comments, don’t hesitate to get in touch with me.

Notes1. If you are familiar with Scrum, you will notice some similarities between what I describe as an iteration and a sprint. Scrum has influenced my thinking on agile methodologies quite a bit, and I’m a big fan of it. However, because I don’t follow the definitions in Scrum rigidly, I’ve found it better to not use the same terminology to avoid confusion. If I do refer to a sprint, though, I will be referring to the Scrum definition (a 30-day increment, resulting in potentially shippable product).2. Please don’t take the above to mean that I don’t believe in requirements or design – I feel that both are vital to iterative development. However, the approach that many BI practitioners take to requirements and design does not lend itself to iterative development.