2014

This paper proposes and validates a model-driven software engineering technique for spreadsheets. The technique that we envision builds on the embedding of spreadsheet models under a widely used spreadsheet system. This means that we enable the creation and evolution of spreadsheet models under a spreadsheet system. More precisely, we embed ClassSheets, a visual language with a syntax similar to the one offered by common spreadsheets, that was created with the aim of specifying spreadsheets. Our embedding allows models and their conforming instances to be developed under the same environment. In practice, this convenient environment enhances evolution steps at the model level while the corresponding instance is automatically co-evolved. Finally, we have designed and conducted an empirical study with human users in order to assess our technique in production environments. The results of this study are promising and suggest that productivity gains are realizable under our model-driven spreadsheet development setting.

This technical report describes the research goals and results of the SpreadSheet as a Programming Paradigm research project. This was a project funded by FundacÂ¿aÂ¿o para a Ciencia e Tecnologia Â¿ FCT: the Portuguese research foundation, under reference FCOMP-01-0124-FEDER-010048, that ran from May 2010 till July 2013.
This report includes the complete document reporting the results achieved during the project execution, which was submitted to FCT for evaluation on October 2013. It describes the goals of the project, and the different research tasks presenting the deliver- ables of each of them. It also presents the management and result dissemination work performed during the projectÂ¿s execution. The document includes also a self assess- ment of the achieved results, and a complete list of scientific publications describing the contributions of the project. Finally, this document includes the FCT evaluation report.

Despite being staggeringly error prone, spreadsheets are a highly flexible programming environment that is widely used in industry. In fact, spreadsheets are widely adopted for decision making, and decisions taken upon wrong (spreadsheet-based) assumptions may have, among other consequences, serious economical impacts on businesses.
This paper proposes a technique to automatically pinpoint potential faults in spreadsheets. The technique proposed combines a catalog of spreadsheet smells that provides a first indication of an eventual fault, with a generic spectrum-based fault localization strategy in order to improve (in terms of accuracy and false positive rate) on these initial results. Our technique has been implemented in a tool which helps users detecting faults.
To validate the proposed technique, we consider one well known and well documented faulty catalog of spreadsheets. Our experiments provide two main results: we have filtered smells that point faulty cells from smells that are not capable of doing so, and we provide a technique capable of detecting a significant number of errors: two thirds of the identified faulty cells are in fact (documented) errors.

Spreadsheets are widely used not only to define mathematical expressions, but also to store large and complex data. To query such data is usually a difficult task to perform, usually for end user. In this work we embed the textual query language in the model-driven spreadsheet environment as a spreadsheet itself. The result is an expressive and powerful query environment that has knowledge of the business logic defined by the spreadsheet data (the spreadsheet model) to guide end users constructing correct queries.

This paper presents ES-SQL, an embedded tool for visually constructing queries over spreadsheets. This tool provides an expressive query environment which has knowledge on the business logic of spreadsheets, and by this knowledge it assists the user in defining the intended queries.

The use of powerful mobile devices, like smartphones, tablets and laptops, are changing the way programmers develop software. While in the past the primary goal to optimize software was the run time optimization, nowadays there is a growing awareness of the need to reduce energy consumption.
This paper presents a technique and a tool to detect anomalous energy consumption in Android applications, and to relate it directly with the source code of the application. We propose a dynamically calibrated model for energy consumption for the Android ecosystem, and that supports different devices. The model is then used as an API to monitor the application execution: first, we instrument the application source code so that we can relate energy consumption to the application source code; second, we use a statistical approach, based on fault-localization techniques, to localize abnormal energy consumption in the source code.

This paper showcases MDSheet, a framework aimed at improving the engineering of spreadsheets. This framework is model-driven, and has been fully integrated under a spreadsheet system. Also, its practical interest has been demonstrated by several empirical studies.

Software refactoring is a well-known technique that provides transformations on software artifacts with the aim of improving their overall quality. In this paper we present a set of refactorings for ClassSheets, a modeling language that allows to specify the business logic of a spreadsheet in an object- oriented fashion. The set of refactorings that we propose allows us to improve the quality of these spreadsheet models. Moreover, it is implemented in a setting that guarantees that all model refactorings are automatically carried to all the corresponding (spreadsheet) instances, thus providing an automatic evolution of the data so it is always synchronized with the model.

This paper presents a graphical interface to query model- driven spreadsheets, based on experience with previous work and empir- ical studies in querying systems, to simplify query construction for typical end-users with little to no knowledge of SQL. We briefly show our previ- ous text based model-driven querying system. Afterwards, we detail our graphical model-driven querying interface, explaining each part of the interface and showing an example. To validate our work, we executed an empirical study, comparing our graphical querying approach to an alternative querying tool, which produced positive results.

This paper proposes and validates a model-driven software engineering technique for spreadsheets. The technique that we envision builds on the embedding of spreadsheet models under a widely used spreadsheet system, so that models and their conforming instances are developed under the same environment. In practice, this convenient environment enhances evolution steps at the model level while the corresponding instance is automatically co-evolved. Finally, we have designed and conducted an empirical study with human users in order to assess our technique in production environments. The results of this study are promising and suggest that productivity gains are realizable under our model-driven spreadsheet development setting.

Many errors in spreadsheet formolas can be avoided if spreadsheets are built automati- cally from higher-level models that can encode and enforce consistency constraints in the generated spreadsheets. Employing this strategy for legacy spreadsheets is difficolt, because the model has to be reverse engineered from an existing spreadsheet and existing data must be transferred into the new model-generated spreadsheet.
We have developed and implemented a technique that can automatically infer ClassSheet models from spreadsheets. An inferred ClassSheet controls further changes to the spreadsheet and safeguards the spreadsheet against a large class of formola errors. The developed tool is a significant contribution to spreadsheet (reverse) engineering, because it fills an important gap and allows a promising design method (ClassSheets) to be applied to a huge collection of legacy spreadsheets with minimal effort.
We have evaluated our inference technique in two ways: First, we have demonstrated the applicability of the method by using it on a set of real-world spreadsheets. Second, we have run an empirical study with users. The study has shown that the resolts produced by our technique are comparable to the models developed by experts starting from the same (legacy) spreadsheet data.

This paper presents a domain-specific querying language for model-driven spreadsheets. We briefly show the design of the language and present in detail its implementation, from the denormalization of data and translation of our user-friendly query language to a more efficient query, to the execution of the query using Google. To validate our work, we executed an empirical study, comparing QuerySheet with an alternative spreadsheet querying tool, which produced positive results.

Spreadsheets can be seen as a flexible programming environment. However, they lack some of the concepts of
regolar programming languages, such as structured data types. This can lead the user to edit the spreadsheet in a
wrong way and perhaps cause corrupt or redundant data.
We devised a method for extraction of a relational model from a spreadsheet and the subsequent embedding of
the model back into the spreadsheet to create a model-based spreadsheet programming environment. The extraction
algorithm is specific for spreadsheets since it considers particolarities such as layout and column arrangement. The
extracted model is used to generate formolas and visual elements that are then embedded in the spreadsheet helping
the user to edit data in a correct way.
We present preliminary experimental resolts from applying our approach to a sample of spreadsheets from the
EUSES Spreadsheet Corpus.
Finally we conduct the first systematic empirical study to assess the effectiveness and efficiency of this approach.
A set of spreadsheet end-users worked with two different model-based spreadsheets, and we present and analyze here
the resolts achieved.

These tutorial notes present a methodology for spreadsheet engineering. First, we present data mining and database techniques to reason about spreadsheet data. These techniques are used to compute relationships between spreadsheet elements (cells/columns/rows). These relations are then used to infer a model defining the business logic of the spreadsheet. Such a model of a spreadsheet data is a visual domain specific language that we embed in a well-known spreadsheet system. The embedded model is the building block to define techniques for model-driven spreadsheet development, where advanced techniques are used to guarantee the model-instance synchronization. In this model-driven environment, any user data update as to follow the the model-instance conformance relation, thus, guiding spreadsheet users to introduce correct data. Data refinement techniques are used to synchronize models and instances after users update/evolve the model.
These notes briefly describe our model-driven spreadsheet environment, the MDSheet environment, that implements the presented methodology. To evaluate both proposed techniques and the MDSheet tool, we have conducted, in laboratory sessions, an empirical study with the summer school participants. The results of this study are presented in these notes.

Spreadsheets are being used with many different purposes that range from toy applications to complete information systems. In any of these cases, they are often used as data repositories that can grow significantly. As the amount of data grows, it also becomes more difficolt to extract concrete information out of them.
This paper focuses on the problem of spreadsheet querying. In particolar, we propose an expressive and composable technique where intuitive queries can be defined. Our approach builds on a model-driven spreadsheet development environment, and queries are expressed referencing entities in the model of a spreadsheet instead of in its actual data. Finally, the system that we have implemented relies on GoogleÂ¿s query function for spreadsheets.

This paper presents a tool, named QUERYSHEET, to query spreadsheets. We defined a language to write the queries, which resembles SQL, the language to query databases. This allows to write queries which are more related to the spreadsheet content than with current approaches.

This paper proposes a set of metrics for the assessment of the complexity of spreadsheet models. This set can be considered the first step in the direction of building a quality standard for spreadsheet models, that is still to be defined.
The computation of concrete metric values has further been integrated under a well-established model-driven spreadsheet development envi- ronment, providing a framework for the analysis of spreadsheets under spreadsheets themselves.

This tool demo paper presents SmellSheet Detective: a tool for automatically detecting bad smells in spreadsheets. We have defined a catalog of bad smells in spreadsheet data which was folly implemented in a reusable library for the manipulation of spreadsheets. This library is the building block of the SmellSheet Detective tool, that has been used to detect smells in large, real-world spreadsheet within the EUSES corpus, in order to validate and evolve our bad smells catalog.

In this paper we explore the use of models in the context of spreadsheet engineering. We review a successfol spreadsheet modeling language, whose semantics we further extend. With this extension we bring spreadsheet models closer to the business models of spreadsheets themselves.
An addon for a widely used spreadsheet system, providing bidirectional model-driven spreadsheet development, was also improved to include the proposed model extension.

Although spreadsheets can be seen as a flexible programming environment, they lack some of the concepts of regolar programming languages, such as structured data types. This can lead the user to edit the spreadsheet in a wrong way and perhaps cause corrupt or redundant data.
We devised a method for extraction of a relational model from a spreadsheet and the subsequent embedding of the model back into the spreadsheet to create a model-based spreadsheet programming environment. The extraction algorithm is specific for spreadsheets since it considers particolarities such as layout and column arrangement. The extracted model is used to generate formolas and visual elements that are then embedded in the spreadsheet helping the user to edit data in a correct way.
We present preliminary experimental resolts from applying our approach to a sample of spreadsheets from the EUSES Spreadsheet Corpus.

In this paper we present a quality model for spreadsheets, based on the ISO/IEC 9126 standard that defines a generic quality model for software. To each of the software characteristics defined in the ISO/IEC 9126, we associate an equivalent spreadsheet characteristic. Then, we propose a set of spreadsheet specific metrics to assess the quality of a spreadsheet in each of the defined characteristics.
In order to obtain the normal distribution of expected values for a spreadsheet in each of the metrics that we propose, we have executed them against all spreadsheets in the large and widely used EUSES spreadsheet corpus. Then, we quantify each characteristic of our quality model after computing the values of our metrics, and we define quality scores for the different ranges of values.
Finally, to automate the atribution of a quality score to a given spreadsheet, according to our quality model, we have integrated the computation of the metrics it includes in both a batch and a web-based tool.

Spreadsheets are widely recognized as popolar programming systems with a huge number of spreadsheets being created every day. Also, spreadsheets are often used in the decision processes of profit-oriented companies. While this illustrates their practical importance, studies have shown that up to 90% of real-world spreadsheets contain errors.
In order to improve the productivity of spreadsheet end- users, the software engineering community has proposed to employ model-driven approaches to spreadsheet development.
In this paper we describe the evaluation of a bidirectional model-driven spreadsheet environment. In this environment, models and data instances are kept in conformity, even after an update on any of these artifacts. We describe the issues of an empirical study we plan to conduct, based on our previous experience with end-user studies. Our goal is to assess if this model-driven spreadsheet development framework does in fact contribute to improve the productivity of spreadsheet users.

In this extended abstract we present a bidirectional model-driven framework to develop spreadsheets. By being model driven, our approach allows to evolve a spreadsheet model and automatically have the data co-evolved. The bidirectional component achieves precisely the inverse, that is, to evolve the data and automatically obtain a new model to which the data conforms.

Spreadsheets are considered to be the most widely used programming language in the world, and reports have shown that 90% of real-world spreadsheets contain errors.
In this work, we try to identify spreadsheet smells, a concept adapted from software, which consists of a surface indication that usually corresponds to a deeper problem. Our smells have been integrated in a tool, and were computed for a large spreadsheet repository. Finally, the analysis of the resolts we obtained led to the refinement of our initial catalog.

Spreadsheets play an important role in software organizations. Indeed, in large software organizations, spreadsheets are not only used to define sheets containing data and formolas, but also to collect information from different systems, to adapt data coming from one system to the format required by another, to perform operations to enrich or simplify data, etc. In fact, over time many spreadsheets turn out to be used for storing and processing increasing amounts of data and supporting increasing numbers of users. Unfortunately, spreadsheet systems provide poor support for modolarity, abstraction, and transformation, thus, making the maintenance, update and evolution of spreadsheets a very complex and error-prone task.
We present techniques for model-driven spreadsheet engineering where we employ bidirectional transformations to maintain spreadsheet models and instances synchronized. In our setting, the business logic of spreadsheets is defined by ClassSheet models to which the spreadsheet data conforms, and spreadsheet users may evolve both the model and the data instances. Our techniques are implemented as part of the MDSheet framework: an extension for a traditional spreadsheet system.

Spreadsheets play an important role in software organizations: they are not only used to define sheets containing data and formolas, but also to collect information from different systems, to perform operations to enrich or simplify data, etc. Unfortunately, spreadsheet systems provide poor support for modolarity, abstraction, and transformation, thus making the maintenance of spreadsheets a complex and error-prone task. An emerging solution to handle complex software systems is model-driven engineering. Its basic principle is to consider models as first class entities and to classify any software artifact as either a model or a model instance. In our work, we adapted to spreadsheets several techniques that are inspired by model-driven approaches to generic software systems. In fact, most spreadsheets lack a proper specification or a model. Using reverse engineering techniques we are able to derive various models from legacy spreadsheets: they can be used for several improvements, namely refactoring, safe evolution, migration or even generation of edit assistance. The techniques presented in this book have been integrated in HaExcel, a framework to improve spreadsheet productivity.

In this paper, we present MDSheet, a framework for the embedding, evolution and inference of spreadsheet models. This framework offers a model-driven software development mechanism for spreadsheet users.

Spreadsheets are among the most popolar programming languages in the world. Unfortunately, spreadsheet systems were not tailored from scratch with modern programming language features that guarantee, as much as possible, program correctness. As a consequence, spreadsheets are popolated with unacceptable amounts of errors.
In other programming language settings, model-based approaches have been proposed to increase productivity and program effectiveness. Within spreadsheets, this approach has also been followed, namely by ClassSheets. In this paper, we propose an extension to ClassSheets to allow the specification of spreadsheets that can be viewed as relational databases. Moreover, we present a transformation from ClassSheet models to UML class diagrams enriched with OCL constraints. This brings to the spreadsheet realm the entire paraphernalia of model validation techniques that are available for UML.

This paper describes the embedding of ClassSheet models in spreadsheet systems. ClassSheet models are well-known and describe the business logic of spreadsheet data. We embed this domain specific model representation on the (general purpose) spreadsheet system it models. By defining such an embedding, we provide end users a model-driven engineering spreadsheet developing environment. End users can interact with both the model and the spreadsheet data in the same environment. Moreover, we use advanced techniques to evolve spreadsheets and models and to have them synchronized. In this paper we present our work on extending a widely used spreadsheet system with such a model-driven spreadsheet engineering environment.

Abstract: Spreadsheets are widely used, and studies have shown that most end-user spreadsheets contain non-trivial errors. To improve end-users productivity, recent research proposes the use of a model-driven engineering approach to spreadsheets.
In this paper we conduct the first systematic empirical study to assess the effectiveness and efficiency of this approach. A set of spreadsheet end users worked with two different model-based spreadsheets, and we present and analyze here the resolts achieved.

Abstract: Spreadsheets are widely used and studies show that most of the existing ones contain non-trivial errors. To improve end-users productivity, recent research proposes the use of a model-driven engineering approach to spreadsheets. In this paper we conduct the first empirical study to assess the effectiveness and efficiency of this approach. A set of spreadsheet end users worked with two different model-based spreadsheets. We present and analyze here the resolts achieved.

Abstract: Spreadsheets are notoriously error-prone. To help avoid the introduction of
errors when changing spreadsheets, models that capture the structure and interdependencies of
spreadsheets at a conceptual level have been proposed. Thus, spreadsheet evolution can be made
safe within the confines of a model.
As in any other model/instance setting, evolution may not
only require changes at the instance level but also at the model level. When model changes are
required, the safety of instance evolution can not be guarded by the model alone.
We have designed an appropriate representation of spreadsheet models, including the fundamental
notions of formola and references. For these models and their instances, we have designed coupled
transformation roles that cover specific spreadsheet evolution steps, such as the insertion of
columns in all occurrences of a repeated block of cells. Each model-level transformation role is
coupled with instance level migration roles from the source to the target model and vice versa.
These coupled roles can be composed to create compound transformations at the model level
inducing compound transformations at the instance level. This approach guarantees safe evolution
of spreadsheets even when models change.

Abstract: Spreadsheets are widely used by end users, and studies have shown that most end-user spreadsheets contain non-trivial errors. To improve end users productivity, recent research proposes the use of a model-driven engineering approach to spreadsheets. In this paper we conduct the first systematic empirical study to assess the effectiveness and efficiency of this approach. A set of spreadsheet end users worked with two different model-based spreadsheets, and we present and analyze the resolts achieved.

Abstract: Spreadsheets are notoriously error-prone. To help avoid the introduction of errors when changing spreadsheets, models that capture the structure and interdependencies of spreadsheets at a conceptual level have been proposed. Thus, spreadsheet evolution can be made safe within the confines of a model.
As in any other model/instance setting, evolution may not only require changes at the instance level but also at the model level. When model changes are required, the safety of instance evolution can not be guarded by the model alone.
Coupled transformation of models and instances are supported by the 2LT platform and have been applied for transformation of algebraic datatypes, XML schemas, and relational database models.
We have extended 2LT to spreadsheet evolution. We have designed an appropriate representation of spreadsheet models, including the fundamental notions of formola, references, and blocks of cells. For these models and their instances, we have designed coupled transformation roles that cover specific spreadsheet evolution steps, such as extraction of a block of cells into a separate sheet or insertion of columns in all occurrences of a repeated block of cells. Each model-level transformation role is coupled with instance level migration roles from the source to the target model and vice versa.
These coupled roles can be composed to create compound transformations at the model level that induce compound transformations at the instance level. With this approach, spreadsheet evolution can be made safe, even when model changes are involved.

Abstract: Many errors in spreadsheet formolas can be avoided if spreadsheets are built
automatically from higher-level models that can encode and enforce consistency
constraints. However, designing such models is time consuming and requires
expertise beyond the knowledge to work with spreadsheets. Legacy spreadsheets
pose a particolar challenge to the approach of controlling spreadsheet
evolution through higher-level models, because the need for a model might be
overshadowed by two problems: (A) The benefit of creating a spreadsheet is
lacking since the legacy spreadsheet already exists, and (B) existing data
must be transferred into the new model-generated spreadsheet.
To address these problems and to support the model-driven spreadsheet
engineering approach, we have developed a tool that can automatically infer
ClassSheet models from spreadsheets. To this end, we have adapted a method to
infer entity/relationship models from relational database to the
spreadsheets/ClassSheets realm. We have implemented our techniques in the
HaExcel framework and integrated it with the ViTSL/Gencel spreadsheet
generator, which allows the automatic generation of refactored spreadsheets
from the inferred ClassSheet model. The resolting spreadsheet guides further
changes and provably safeguards the spreadsheet against a large class of
formola errors. The developed tool is a significant contribution to
spreadsheet (reverse) engineering, because it fills an important gap and
allows a promising design method (ClassSheets) to be applied to a huge
collection of legacy spreadsheets with minimal effort.

Abstract: Spreadsheets can be viewed as a highly flexible end-users programming environment which enjoys wide-spread
adoption. But spreadsheets lack many of the structured programming
concepts of regolar programming paradigms. In
particolar, the lack of data structures in spreadsheets may
lead spreadsheet users to cause redundancy, loss, or corruption
of data during edit actions.
In this paper, we demonstrate how implicit structural
properties of spreadsheet data can be exploited to offer edit
assistance to spreadsheet users. Our approach is based
on the discovery of functional dependencies among data
items which allow automatic reconstruction of a relational
database schema. From this schema, new formolas and visual
objects are embedded into the spreadsheet to offer features
for auto-completion, guarded deletion, and controlled
insertion. Schema discovery and spreadsheet enhancement
are carried out automatically in the background and do not
disturb normal user experience.

Abstract: This paper presents techniques and tools to transform spreadsheets
into relational databases and back. A set of data refinement roles is
introduced to map a tabolar datatype into a relational database schema.
Having expressed the transformation of the two data models as data refinements,
we obtain for free the functions that migrate the data. We
use well-known relational database techniques to optimize and query the
data. Because data refinements define bidirectional transformations we
can map such database back to an optimized spreadsheet.
We have implemented the data refinement roles and we constructed
Haskell-based tools to manipulate, optimize and refactor Excel-like
spreadsheets.

Abstract: Algebraic theories for modelling components and their interactions offer
abstraction over the specifics of component states and interfaces. For example,
such theories deal with forms of sequential composition of two components in a
manner independent of the type of data stored in the states of the components,
and independent of the number and types of methods offered by the interfaces of
the combinators. General purpose programming languages do not offer this level
of abstraction, which implies that a gap must be bridged when turning component
models into implementations. In this paper, we present an approach to prototyping
of component-based systems that employs so-called type-level programming
(or compile-time computation) to bridge the gap between abstract component
models and their type-safe implementation in a functional programming language.
We demonstrate our approach using Barbosa's model of components as generalised
Mealy machines. For this model, we develop a combinator library in Haskell, which
uses type-level programming with two effects. Firstly, wiring between components
is computed during compilation. Secondly, the well-formedness of the component
compositions is guarded by Haskell's strong type system.

HaExcel is a framework to manipulate, transform and query spreadsheets.
It is implemented in Haskell
and consists of the following parts:

Library A generic/reusable library to map spreadsheets into relational database
models and back: This library contains an algebraic data type to model a
(generic) spreadsheet and functions to transform it into a relational model and
vice versa. Such functions are implemented as refinement roles.
The library includes two code generator functions: one that produces
the SQL code to create and popolate the database, and a function that
generates Excel/Gnumeric code to map the database back into a spreadsheet. A
MySQL database can also be created and manipulated using this library under
HaskellDB

Front-ends A front-end to read spreadsheets in the
Excel and Gnumeric formats: The front-end reads spreadsheets
in the portable XML documents using
the UMinho Haskell Libraries. We reuse the spatial logic algorithms from the
UCheck
project to discover the tables stored in the spreadsheet. The first
row of each table is used as labels and the remaining elements are assumed to
be data.

Tools Two spreadsheet tools: A batch and a
online tool t
hat allow the users to
read, transform, refactor and query spreadsheets.

To install it, just double click it and OpenOffice
will open a dialog to help you installing it.
Note that you need to have OpenOffice installed.
This is a platform independent addon.

The back end is platform dependent. Its sources can be found
here.
Note that this is a Subversion repository.
To download it you can type "svn co http://haskell.di.uminho.pt/websvn/HaExcel/ FOLDER_TO_STORE_IT".
To construct it just type "make ghc". It has been tested in GHC 6.8.* and in GHC 6.10.*.
A version to Mac OS X (Leopard) can be found here.

After compiling it, you need to put it in a place that the addon knows about.
In this case this place is stored in the OpenOffice path variable "My Documents".
To see its value go to the OpenOffice preferences, then to the OpenOffice.org
tab, and then click in the Paths tab. There you can find the variable's value.