DICE - Towards the Development of Data-Intensive Applications with Iterative Quality Enhancements

Focus Area

Recent years have seen rapid growth of interest in developing enterprise applications that use data-intensive technologies. However, quality assurance in the software engineering process for these applications is still in its infancy.

Model-driven engineering (MDE) often includes quality assurance (QA) techniques to ensure that software systems meet performance, reliability, and safety requirements through quality-driven design and iterative enhancement based on operational data. Quality-aware MDE support for data-intensive software systems is a challenging target, since existing models and QA techniques largely ignore properties of data such as volumes, velocities, or business values, and are therefore difficult to apply to Big Data applications. Furthermore, QA requires the ability to characterize the behaviour of technologies such as MapReduce, NoSQL, and stream-based processing, which are still poorly understood from a modelling perspective.

Market sector targets

Data-intensive technologies are important in many application domains, from predictive analytics to environmental monitoring, from e-government to smart cities.

Since the software development market is expected to be dominated by data-intensive cloud applications in the coming years, there is an urgent need for novel, highly productive software engineering methodologies capable of increasing the competitiveness of software vendors.

The DICE action intends to offer software vendors a quality-driven MDE tool-chain for developing data-intensive cloud applications. The action includes three demonstrators in the domains of news and media, maritime operations, and e-government, proving the versatility of the framework for a variety of end users.

Addressing new challenges for cloud, IoT, big data

The growing importance of Big Data applications now calls for extending MDE and QA methods to better support Big Data technologies, which raise specific challenges. Incorporating quality in complex data-intensive applications involves major business and technical challenges:

From a business perspective. After the rush to enter cloud and Big Data markets, small and medium-sized software providers now have to cope with steep learning curves in order to understand and enhance the quality of their products. They suffer from a shortage of skills in quality engineering, and face additional difficulties from the high cost and complexity of quality testing. Moreover, they need to deliver architectural changes iteratively whenever service-level agreement (SLA) constraints are not met.

From a technical perspective. Incorporating quality assessment for design enhancement requires developing at least the following assets: data-aware modelling abstractions, transformation methodologies, simulation and analysis tools, verification methods, and anti-pattern methodologies. All of these must be coordinated so that they can continuously assimilate runtime information about the data and its use by the application.

Data-intensive applications are often based on Hadoop/MapReduce, which implies that new abstractions need to be developed to model these applications and annotate them at design time with performance, reliability and cost requirements. These include, among others, models for data storage, replication and transportation; for components such as mappers and reducers; and for the directed acyclic graphs used to describe data transformations and data movements. These abstractions are important to provide a complete description of the design space of a Big Data application, and thus enable automated reasoning on the best architectural and deployment choices, taking into account the specificities of these software systems. Yet, extending QA tools to meet this goal is particularly challenging. For example, modelling MapReduce performance and reliability requires one to:

explicitly model the synchronization of the map and reduce processing phases;

characterize the impact of network latencies during the shuffle phases;

statistically characterize the execution time of each phase and its memory and storage requirements, which depend on data properties such as volumes.

This sets a high barrier to the use of these techniques by developers not explicitly trained in quality engineering.
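To give a rough flavour of what such a phase-level model involves, the following Monte Carlo sketch estimates the completion time of a hypothetical MapReduce job. It captures the three points above: the synchronization barrier (each phase ends with its slowest task), a shuffle cost with network latency, and execution times that depend on data volume. All rates, latencies, and task counts are illustrative assumptions, not measured Hadoop parameters.

```python
import random
import statistics

def simulate_job(input_gb, n_mappers=8, n_reducers=4, seed=None,
                 map_rate_gb_s=0.05, reduce_rate_gb_s=0.04,
                 net_rate_gb_s=0.1, shuffle_latency_s=(0.5, 2.0)):
    """Hypothetical phase-level model of one MapReduce job (times in seconds)."""
    rng = random.Random(seed)
    # Map phase: tasks run in parallel; the synchronization barrier means
    # the phase ends only when the slowest mapper finishes.
    split_gb = input_gb / n_mappers
    map_phase = max(split_gb / map_rate_gb_s * rng.uniform(0.8, 1.4)
                    for _ in range(n_mappers))
    # Shuffle phase: each reducer pays a network latency plus a transfer
    # time proportional to the data volume it pulls.
    part_gb = input_gb / n_reducers
    shuffle_phase = max(rng.uniform(*shuffle_latency_s) + part_gb / net_rate_gb_s
                        for _ in range(n_reducers))
    # Reduce phase: execution time again depends on the partition volume.
    reduce_phase = max(part_gb / reduce_rate_gb_s * rng.uniform(0.8, 1.4)
                       for _ in range(n_reducers))
    return map_phase + shuffle_phase + reduce_phase

# Statistical characterization: sample the completion-time distribution.
samples = [simulate_job(10.0, seed=i) for i in range(200)]
print(f"mean {statistics.mean(samples):.1f}s, p95 {sorted(samples)[189]:.1f}s")
```

Even this toy model shows why data properties matter: doubling the input volume lengthens every phase, while the `max` over parallel tasks makes stragglers dominate each phase's duration.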

We argue that addressing these issues and making quality-aware MDE accessible to developers of Big Data applications requires the design of an automated tool chain relying on UML meta-models annotated with information about data, data processing and data movements. The QA tool chain should cover simulation, verification and architectural optimization through feedback analysis.

More precisely, the focus of the DICE action is to define a quality-driven framework for developing data-intensive applications that leverage Big Data technologies hosted in private or public clouds. A novel profile and tools for data-aware, quality-driven development are needed, as well as a methodology distinguished by its quality assessment, architecture enhancement, agile delivery, and continuous testing and deployment, relying on principles from the emerging DevOps paradigm [2].

How can you turn big data into smart data?

We argue that novel models and annotations are needed to describe data and Big Data technologies with respect to the quality issues of efficiency, reliability and safety. Moreover, methods and tools should be designed to help satisfy quality requirements in data-intensive applications through iterative enhancement of their architecture design. That is, the data acquired during testing and operation will be deeply analysed to find quality pitfalls and outliers, leading to the identification of quality anti-patterns in the architecture and downstream design. The data will then be exploited to accelerate application refactoring, using agile software development and a delivery approach inspired by DevOps.
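A minimal sketch of the first step of this feedback loop might look as follows: monitoring data from testing or operation is scanned for SLA violations and outliers, whose locations then hint at where an anti-pattern may sit in the architecture. The operation names, thresholds, and data below are invented for illustration; the outlier rule (a measurement more than three times the median) is an assumed heuristic, not a DICE-defined method.

```python
import statistics

SLA_SECONDS = 2.0  # assumed SLA bound on mean response time

def find_quality_pitfalls(measurements, sla=SLA_SECONDS):
    """measurements maps an operation name to a list of response times (s)."""
    report = {}
    for op, times in measurements.items():
        median = statistics.median(times)
        report[op] = {
            # A mean above the SLA suggests a systematic design problem.
            "sla_violation": statistics.mean(times) > sla,
            # Sporadic spikes well above the median suggest outliers.
            "outliers": [t for t in times if t > 3 * median],
        }
    return report

# Hypothetical monitoring data gathered during testing/operation.
monitoring = {
    "ingest": [0.4, 0.5, 0.6, 0.5, 4.8],  # sporadic slow request
    "reduce": [2.5, 2.6, 2.4, 2.7, 2.5],  # consistently over the SLA
}
for op, findings in find_quality_pitfalls(monitoring).items():
    print(op, findings)
```

The two cases call for different refactorings: the consistent SLA violation points at a structural anti-pattern in the affected component, while the isolated outlier points at a transient condition (e.g., contention) to be investigated further.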