Agile BI Development in 2016: Where Are We?

Agile development was meant to be a cure for everything. It’s 2016 and Tomas Kratky asks the question: where are we?

BI departments everywhere are under pressure to deliver high-quality results and deliver them fast. At the same time, the typical BI environment is becoming more and more complex. Today we use not just standard relational databases with SQL interfaces but also NoSQL databases, Hadoop, and languages like Python or Java for data manipulation.

Another issue we have is a false perception of the work that needs to be done when a business user requests some data. Most business users think that preparing the data is only a tiny part of the work and that the majority of the work is about analyzing the data and later communicating the results. The reality is completely different. The communication and analysis of data is the tiny part, and the majority of the work is about data preparation. Being a BI guy is simply a tough job these days.

This whole situation has led to an ugly result: businesses are not happy with their data warehouses. We have probably all heard plenty of complaints about DWHs being costly, slow, and inflexible. But the reality is that DWHs are large, critical systems with many different stakeholders and requirements that change from day to day. A similar field, application software development, had the same issues with delivery, and there agile processes proved to be a good solution. So our goal is to take inspiration from that and learn how agile can be used in BI.

The Answer: Agile?

One very important note – agility is a really broad term, and today I am only going to speak about agile software development, which means two things from the perspective of a BI development team:

1. How to deliver new features and meet new requirements much faster

2. How to quickly change the direction of development

Could the right answer be agile development? It might be. Everything written in the Agile Manifesto makes sense, but what’s missing are implementation guidelines. And so this Manifesto was, a little bit later, enriched with so-called agile principles. As agile became very popular, we started to believe that agile was a cure for everything. This is a survey from 2009 which clearly demonstrates how popular agile was:

Source: Forrester/Dr. Dobb’s Global Developer Technographic, 2009

And it also shows a few of the many existing agile methodologies. According to some surveys from 2015, agile is currently being used by more than 80% or even 90% of development teams.

Semantic Gap

Later on, we realized that agile is not an ultimate cure. Tom Gilb, in his famous 2010 article “Value-Driven Development Principles and Values”, went a bit deeper. His thorough study of the failures, mistakes, and successes since the very beginning of the software industry made one thing clear: there is something called a semantic gap between business users and engineers, and this gap causes a lot of trouble. Gilb hit the nail on the head by saying one important thing: “Rapidly iterating in wrong directions is not progress.” Therefore, the requirements need to be treated very carefully as well.

But even with the semantic gap issue, agile can still be very useful. Over the last ten years the agile community has come up with several agile practices. They are simple-to-explain things that anyone can start doing to improve their software processes. And this is something you should definitely pay attention to. Here you can see agile practices sorted by popularity:

If you have ever heard about agile, these are probably no surprises for you. The typical mistake made by many early adopters of agile was simply being too rigid; I would call it “fanatic”. It was everything or nothing. But things do not work that way.

It’s Your Fault If You Fail

Each and every practice should be considered a recommendation, not a rule. Your responsibility is to decide if it works for you or not. Every company and every team is different, and if the system metaphor practice has no value for your team, just ignore it like we do. Are you unable to get constant feedback from business users? OK, then. Just do your best to get as much feedback as you need.

On the other hand, we’ve been doing agile for a long time, and we’ve learned that some practices (marked in red) are more important than others and significantly influence our ability to be really fast and flexible.

There are basically two groups of practices. The first group is about responsibility. A product owner is someone on your side who is able to make decisions about requirements and user needs, prioritize them, evaluate them, and verify them. It can be someone from the business group, but this job is very time-consuming, so more often the product owner will be the person on your BI team who knows the most about the business. Without such a person, your ability to make quick decisions will be very limited. Making a burndown list is a very simple practice that forces you to clearly define priorities and to select the features and tasks with the highest priority for the next release. And because your releases tend to be more frequent with agile, you can only ever pick a very limited number of tasks, which makes clear priorities vital.

The second group of critical practices is about automation. If your iterations are short, if you integrate the work of all team members on a daily basis and also want to test it to detect errors and correct them as early as possible, and if you need to deliver often, you will find yourself and your team in a big hurry without enough time to handle everything manually. So automation is your best friend. Your goal is to analyze everything you do and replace all manual, time-consuming activities with automated alternatives.

What Tools To Use?

Typical tools you can use include:

1. Modern Version Control Systems

A typical use case involves Git, SVN, or Team Foundation Server storing all pieces of your code, tracking versions and changes, merging different branches of code, etc. What you are not allowed to do is use shared file systems for that. Unfortunately, it is still quite a common practice among BI folks. Also, be careful about using BI tools that do not support easy, standard versioning. Do not forget that even if you draw pictures, models, or workflows and do not write any SQL, you are still coding.

So a good BI tool stores every piece of information in text-based files, for example XML. That means you can make them part of a codebase managed by Git, for example. A bad BI tool stores everything in binary and proprietary files, which can’t be managed effectively by any versioning system. Some tools support a kind of internal versioning, but those are still a big pain for you as a developer and they lead to fragmented version control.
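To get a feeling for how version-control friendly your own BI project is, you can scan its export folder for files that look binary. The following is only a minimal sketch: the function names are made up for illustration, and the "contains a NUL byte in the first few kilobytes" check is a simple heuristic I am assuming here, not a feature of any particular BI tool.

```python
from pathlib import Path


def looks_binary(sample: bytes) -> bool:
    """Heuristic (an assumption of this sketch): a NUL byte means binary content."""
    return b"\x00" in sample


def find_binary_artifacts(root: str, sample_size: int = 8192) -> list[str]:
    """Return relative paths of files under root whose first bytes look binary."""
    base = Path(root)
    flagged = []
    for path in sorted(base.rglob("*")):
        # Skip the repository's own metadata and anything that is not a file.
        if ".git" in path.parts or not path.is_file():
            continue
        with path.open("rb") as handle:
            sample = handle.read(sample_size)
        if looks_binary(sample):
            flagged.append(path.relative_to(base).as_posix())
    return flagged


if __name__ == "__main__":
    # Hypothetical usage: scan the current directory and list problem files.
    for name in find_binary_artifacts("."):
        print("hard to version:", name)
```

Anything this reports is a candidate for a text-based export format, if your tool offers one.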

2. Continuous Integration Tools

You’ll also need tools like Maven and Jenkins or PowerShell and TeamCity to build and deploy your BI packages rapidly and automatically.

3. Tools for Automated Code Analysis and Testing

I recommend using a framework like DbFit, at the very least for writing automated functional tests, and also a tool for static code analysis to enforce your company standards, best practices, and code conventions (Manta Checker is really good at that). And do not forget: you can’t refactor your code very often without proper testing automation.
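The idea behind an automated functional test is always the same: load a small, known input, run the transformation, and compare the result with what you expect. Here is a rough sketch of that pattern using Python's built-in sqlite3 module instead of DbFit. The orders table, the daily_revenue transformation, and all function names are invented for this example.

```python
import sqlite3


def load_daily_revenue(conn: sqlite3.Connection) -> None:
    """Hypothetical transformation: aggregate paid orders into a daily revenue table."""
    conn.execute("DROP TABLE IF EXISTS daily_revenue")
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date, SUM(amount) AS revenue
        FROM orders
        WHERE status = 'paid'
        GROUP BY order_date
        """
    )


def run_functional_test() -> list[tuple]:
    """Set up a known input, run the transformation, and return the actual output."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("2016-01-01", 100.0, "paid"),
            ("2016-01-01", 50.0, "paid"),
            ("2016-01-01", 999.0, "cancelled"),  # must be excluded from revenue
            ("2016-01-02", 20.0, "paid"),
        ],
    )
    load_daily_revenue(conn)
    rows = conn.execute(
        "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
    ).fetchall()
    conn.close()
    return rows


# The result we expect for the fixture rows above.
EXPECTED = [("2016-01-01", 150.0), ("2016-01-02", 20.0)]

if __name__ == "__main__":
    assert run_functional_test() == EXPECTED
    print("daily_revenue test passed")
```

Because the test runs against an in-memory database, a continuous integration server can execute it on every commit without touching a real DWH.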

4. Smart Documentation Tools

In the end, you can’t work on the parts of your system you do not understand. The best combination of tools you can get is something like a wiki to capture basic design ideas and a smart documentation tool that can automatically generate detailed documentation when needed. Today there are many very good IDEs that can generate diagrams, mainly control-flow and dependency diagrams. But we are BI guys, and there is one thing that is extremely useful for us: it is called data lineage, or you can call it data flow.

Simply put, it’s a diagram showing you how data flows and is transformed in your DWH. You need data lineage to perform impact analyses and what-if analyses as well as to refactor your code and existing data structures. There are almost no solutions on the market that are able to show you data lineage from your custom code (except MANTA of course).
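To make the idea of impact analysis more concrete: once you have the lineage, you can treat it as a directed graph and ask what sits downstream of the object you are about to change. The sketch below assumes the lineage is already available as a simple dictionary. The table and report names are invented, and this is not how any particular lineage tool works internally.

```python
from collections import deque

# Hypothetical lineage: each key feeds the tables or reports listed as its value.
LINEAGE = {
    "src.orders": ["stg.orders"],
    "src.customers": ["stg.customers"],
    "stg.orders": ["dwh.fact_sales"],
    "stg.customers": ["dwh.dim_customer"],
    "dwh.fact_sales": ["rpt.daily_revenue", "rpt.customer_value"],
    "dwh.dim_customer": ["rpt.customer_value"],
}


def downstream_impact(lineage: dict[str, list[str]], start: str) -> list[str]:
    """Return every object reachable from start, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in lineage.get(node, []):
            if target not in seen:  # visit each object only once
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


if __name__ == "__main__":
    # What is affected if we change the staging table for orders?
    print(downstream_impact(LINEAGE, "stg.orders"))
```

For this sample graph, changing stg.orders affects dwh.fact_sales and both reports, while changing a report affects nothing further downstream. The hard part in practice is extracting the graph from your code, not walking it.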

And that’s it. Of course, there are other, more advanced practices to support your agility, but I believe this basic stuff can be implemented quickly from the perspective of both processes and tools. I definitely suggest starting with a smaller, more experienced team, implementing the most important practices, playing around a little bit, and measuring the results of different approaches. I guarantee that you and your team will experience significant improvements in speed and flexibility very soon.
