Systems and Software

Software and Large-Scale Distributed Information Processing Systems

Large-scale distributed information processing involves huge sets of nodes performing information acquisition, processing and communication, often in a self-organized manner, to achieve an ambitious overall system objective. Compounding the technological challenges, such processing must occur at high speed and with low power consumption. At the large scale envisioned in this proposal, our understanding of the process of building the architecture of complex systems is still in its infancy. The large number of devices in the resulting systems, the complexity of their structure, and their essentially distributed nature pose significant system-level challenges, such as designing algorithms for self-organizing large-scale communication and computation.

Algorithms and software play a major role in integrating the distributed components and guaranteeing the desired overall system behavior. In the past, software for such complex systems has been developed in a trial-and-error process that often caused large delays and cost overruns. In order to create new industries around such devices, we need to improve the development process to make it more predictable. We are on the verge of making fundamental advances in automating the production of software, from its design to its verification and deployment, including the ability to formally prove its correctness and security. We are also in the midst of a fundamental change in the way information is stored, distributed and processed: instead of the current static and relatively inflexible systems, we are moving towards more sophisticated interfaces, more powerful data manipulation languages, and an increasing ability to cope with the enormous amount of data including real-time streams.

The software infrastructure addresses questions of self-organization, reliability and resource usage in large distributed systems. At the system infrastructure level, it addresses the questions raised by large systems for processing, storing and retrieving vast amounts of data. The following sections review, for each of the relevant areas, the associated challenges and their relation to the three major application scenarios, namely wearable, ambient and space systems.

Self-Organization

In many of the applications considered within Nano-Tera.CH, devices will be expected to network themselves without centralized infrastructure in order to autonomously and adaptively react to changes in the environment. Self-organization by simple rules can be observed in nature, for example when flocks of birds adopt formations by following simple local rules. However, discovering such rules has been shown to have enormous computational complexity, so we cannot hope to exploit this approach directly for engineering.
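To make the "simple local rules" concrete, the alignment rule from flocking models can be sketched in a few lines. This is a toy illustration with invented parameters, not a model we propose; for brevity the neighbourhood is global, whereas a real flocking model would use spatial proximity.

```python
import random

def align(velocities, weight=0.5):
    """One update of the 'alignment' rule: each agent steers toward the
    mean velocity of its neighbours (here: all other agents)."""
    mean = sum(velocities) / len(velocities)
    return [v + weight * (mean - v) for v in velocities]

random.seed(0)
vels = [random.uniform(-1.0, 1.0) for _ in range(10)]
for _ in range(20):
    vels = align(vels)
spread = max(vels) - min(vels)   # agents end up nearly aligned
```

Each update halves the spread of velocities, so the agents converge to a common heading using purely local, identical rules; the engineering difficulty lies in going the other way, from a desired global behavior back to such rules.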

However, evolutionary computation can uncover such mechanisms and has been used, for example, to evolve self-organizing systems of robots. Recent work in computational game theory has shown how to design adaptive agents that converge on the Nash equilibria of a joint game whose structure can, to some degree, be designed. The emerging field of mechanism design is about optimizing the properties of these games, and thus of the behavior that will be exhibited by the collective system. Based on these principles, we will develop systematic techniques for the design of self-organizing adaptive mechanisms with well-defined properties, and their application to ambient intelligence such as building control.
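The convergence such designs rely on can be illustrated with best-response dynamics in a simple congestion game (a hypothetical two-resource game, invented for illustration): agents repeatedly switch to the cheaper resource and settle into a balanced pure Nash equilibrium.

```python
def best_response_dynamics(n_agents=10, rounds=50):
    """Agents repeatedly play their best response in a two-resource
    congestion game (cost = load on the chosen resource). In such
    potential games, best-response dynamics converge to a pure
    Nash equilibrium."""
    choice = [0] * n_agents                    # everyone starts on resource 0
    for _ in range(rounds):
        changed = False
        for i in range(n_agents):
            load = [choice.count(0), choice.count(1)]
            mine, other = choice[i], 1 - choice[i]
            if load[other] + 1 < load[mine]:   # switching strictly improves my cost
                choice[i] = other
                changed = True
        if not changed:
            break                              # no agent can improve: equilibrium
    return choice

choice = best_response_dynamics()              # converges to a balanced 5/5 split
```

Mechanism design would then tune the cost functions of such a game so that the equilibrium the agents converge to is the globally desired configuration.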

When components can explicitly coordinate through message exchange, it becomes possible to design self-diagnosing, self-configuring and self-repairing systems using algorithmic methods. Such autonomic computing systems have been used successfully, for example, in spacecraft. For many other applications within Nano-Tera.CH, the main challenge will be to develop reliable distributed versions of these algorithms. We will develop both centralized and decentralized tools for autonomic computing, including (1) self-diagnosis, (2) self-configuration, (3) error avoidance and (4) self-repair.
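A minimal centralized sketch of the heartbeat-based pattern behind (1) and (4) might look as follows; the class, node and task names are invented for illustration, and the distributed version of this logic is the actual research challenge.

```python
class AutonomicManager:
    """Centralized sketch: diagnose failed nodes from missing heartbeats
    and repair by migrating their tasks to healthy nodes."""

    def __init__(self, nodes, timeout):
        self.timeout = timeout
        self.last_seen = {n: 0 for n in nodes}
        self.tasks = {f"task-{i}": n for i, n in enumerate(nodes)}

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def failed_nodes(self, now):
        # self-diagnosis: nodes silent for longer than `timeout` are failed
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}

    def repair(self, now):
        # self-repair: migrate tasks away from failed nodes
        failed = self.failed_nodes(now)
        healthy = sorted(set(self.last_seen) - failed)
        for task, node in self.tasks.items():
            if node in failed and healthy:
                self.tasks[task] = healthy[0]

mgr = AutonomicManager(["A", "B", "C"], timeout=3)
for node in ("A", "C"):           # node B stops sending heartbeats
    mgr.heartbeat(node, now=3)
mgr.repair(now=5)                 # B is silent past the timeout: its task moves
```

In a decentralized version, every node would run the diagnosis loop over its neighbours and repair decisions would need distributed agreement, which is where the reliability questions raised above arise.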

Dependability

Many of the envisioned application domains, such as health, security and environment, crucially depend on the reliability, correctness, robustness, security and availability of the underlying software. These systems must guarantee reliable, safe, secure and predictable operation, even in the face of changing requirements, environmental changes during operation, and hardware and software faults.

Fault tolerance and graceful performance degradation are particularly important in applications involving large numbers of small devices, some of which are likely to fail or lose communication. Fault tolerance can be achieved by replication, redundancy and diversity, meaning replicated computation using independently developed implementations. We will investigate such techniques as well as their combination with autonomic computing principles that can take advantage of them.
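The basic mechanism, majority voting over diverse replicas, can be sketched as follows; the replica functions are hypothetical, with one of them deliberately faulty to show how the fault is masked.

```python
from collections import Counter

def vote(results):
    """Majority vote over replicated (ideally diverse) implementations.
    Masks a faulty minority; a tie is reported as a detected failure."""
    value, n = Counter(results).most_common(1)[0]
    if n > len(results) // 2:
        return value
    raise RuntimeError("no majority: too many replicas disagree")

# three independently developed implementations of the same function
# (hypothetical; sqrt_b is deliberately faulty)
def sqrt_a(x): return x ** 0.5
def sqrt_b(x): return x ** 0.5 + 1.0
def sqrt_c(x): return x ** 0.5

result = vote([f(9.0) for f in (sqrt_a, sqrt_b, sqrt_c)])   # 3.0
```

Diversity matters because independently developed replicas are less likely to share a common-mode fault that would outvote the correct answer.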

Future applications will increasingly rely on systems with limited resources, which are particularly vulnerable to attacks from malicious users. Data security will remain one of the primary concerns in information systems. In this program, we will develop and implement (1) cryptographic solutions that address embedded system security at different abstraction layers; (2) solutions for security in wireless sensor networks; (3) a methodology for security engineering; (4) techniques for verifying security properties by tracking information flow within applications and through the network.

While the traditional concerns of performance and features remain important, part of the emphasis must shift to guaranteeing software quality throughout specification, design, verification, integration, deployment and maintenance, including among others (1) techniques and automated tools for developing reusable components with a guarantee of quality (trusted components); (2) fault-tolerant and highly available systems. Particular emphasis will be placed on making these techniques easy to use by system implementers, either through integration into programming methodologies or through entirely new methods.

To avoid lengthy and inconclusive cycles of testing and repair, it is important to develop better design techniques rather than after-the-fact verification techniques. This includes (1) software architectures that guarantee certain properties by design; (2) preservation of properties by system composition (non-interference); (3) rich component interfaces that expose resource information (timing, power use); (4) composition and integration of heterogeneous subsystems (different concurrency semantics, hybrid systems).

The design of reliable systems should take advantage of progress in rooting software and system development in a mathematical basis, including: (1) formal specification; (2) computer-supported proofs of correctness, with links to the ongoing Grand Challenge on Verified Software; (3) advances in systematic testing; (4) model checking, constraint solving, decision procedures and automated theorem proving; (5) modular static analysis; (6) inclusion in programming languages of mechanisms allowing mathematical verification. A major challenge is that embedded systems are hybrid systems linking physical and operational domains. Verification, at least for some parts, will therefore have to take the physical aspects into account; this will also require language and verification techniques for enforcing constraints on timing and resources. Another challenge is the concurrent and distributed nature of software in this domain, which, combined with high reliability requirements, calls for new automated techniques for designing, analyzing and debugging concurrent and distributed software.
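As a toy illustration of the model-checking idea in (4), the sketch below exhaustively explores the reachable states of a two-process lock protocol and asserts mutual exclusion in every state; the protocol and its encoding are invented for illustration, and real model checkers handle vastly larger state spaces.

```python
def successors(state):
    """Transitions of a simple lock protocol. A state is
    (pc0, pc1, lock) with pc in {idle, waiting, critical}."""
    pcs, lock = [state[0], state[1]], state[2]
    for i in (0, 1):
        if pcs[i] == "idle":
            yield nextstate(pcs, i, "waiting", lock)
        elif pcs[i] == "waiting" and not lock:
            yield nextstate(pcs, i, "critical", True)    # acquire atomically
        elif pcs[i] == "critical":
            yield nextstate(pcs, i, "idle", False)       # release

def nextstate(pcs, i, pc, lock):
    new = list(pcs)
    new[i] = pc
    return (new[0], new[1], lock)

def check():
    """Explicit-state reachability: visit every reachable state and
    assert the safety property (never both processes critical)."""
    init = ("idle", "idle", False)
    seen, frontier = {init}, [init]
    while frontier:
        s = frontier.pop()
        assert not (s[0] == "critical" and s[1] == "critical"), s
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return len(seen)

reachable = check()    # 8 reachable states, mutual exclusion holds in all
```

The hybrid-systems challenge mentioned above is precisely that real state spaces also contain continuous physical quantities, which this purely discrete exploration cannot capture.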

Resource Awareness and Real-Time

One of the major characteristics of the envisioned application domains is the interaction between the physical environment in which the systems are embedded and the computational behavior of the system itself. Classically, physicality and resource usage have been abstracted away from computation. The interaction with sensors and actuators, the use of energy and other scarce system resources, as well as timely operation, have often been sacrificed for the benefit of developing models and methods for versatile and large-scale computing systems.

The Nano-Tera program raises the challenge of resource awareness in large-scale distributed embedded systems, with the central issue of the interdependence between software and resource usage: developing component and interface concepts that can express resource interaction.

Efficient use of resources: In wearable and ambient systems, the available energy is a scarce resource. Energy harvesting devices have been and will be used to convert various forms of ambient energy, e.g., solar energy, light, temperature differences and pressure changes, into a usable form. This shifts the fundamental question from how to use energy efficiently to when and how to use energy at all. A consequence is the need for control strategies that support graceful degradation of the available system services and manage the underlying applications on the basis of the current and predicted availability of energy. The solutions must continue to guarantee dependability in the presence of adaptive and self-organized behavior.
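One possible shape of such a control strategy, choosing a service level from stored energy plus predicted harvest, is sketched below; the service levels, energy costs and forecast values are assumptions made for illustration only.

```python
def service_level(battery, predicted_harvest, horizon):
    """Pick the highest service level whose per-step energy cost fits
    the budget of stored plus predicted harvested energy over the
    next `horizon` steps (illustrative policy and cost model)."""
    budget = battery + sum(predicted_harvest[:horizon])
    per_step = budget / horizon
    for name, cost in [("full", 5.0), ("reduced", 2.0), ("minimal", 0.5)]:
        if cost <= per_step:
            return name
    return "sleep"                 # gracefully degrade to doing almost nothing

level = service_level(battery=10.0,
                      predicted_harvest=[1.0] * 10,   # forecast per step
                      horizon=10)
# budget of 20 units over 10 steps -> 2 units/step -> "reduced"
```

The open problems lie in obtaining trustworthy harvest predictions and in proving that such adaptive policies still meet the dependability requirements of the application.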

System-wide properties: In ambient systems for building control and environmental monitoring, it is necessary to guarantee that sensor readings are transmitted within a certain time bound. In addition, ambient and wearable systems for multimedia applications require throughput guarantees for transmitted data streams. Sensor fusion and distributed control require time-predictable operation. This makes it necessary to develop modular, component-based methods to analyze and construct hardware and software systems that provide end-to-end guarantees of delay and throughput. Adaptivity to changes in global system requirements (new nodes entering the system, new applications starting, changes in the environment) calls for on-line methods that combine analysis, measurements, estimation and control.
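In the simplest compositional setting, per-component worst-case bounds combine additively for delay and by a minimum for throughput; the sketch below illustrates this with invented component names and numbers (real modular analyses, such as network calculus, are far richer).

```python
def end_to_end_delay(path):
    """Worst-case end-to-end delay: per-component delay bounds add up."""
    return sum(delay for _, delay, _ in path)

def bottleneck_throughput(path):
    """End-to-end throughput is limited by the slowest component."""
    return min(rate for _, _, rate in path)

# hypothetical path: (component, delay bound in ms, throughput in kbit/s)
path = [("sensor",     2.0,   250.0),
        ("radio link", 15.0,  100.0),
        ("gateway",    5.0,   1000.0),
        ("backbone",   8.0,   10000.0)]

delay_bound = end_to_end_delay(path)        # 30.0 ms
rate_bound = bottleneck_throughput(path)    # 100.0 kbit/s
meets_deadline = delay_bound <= 50.0        # a 50 ms requirement is met
```

The modularity is the point: when a new node enters or an application changes, only the affected component bounds need to be re-derived and the composition re-evaluated on-line.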

Real-time behavior: The strict timing demands and memory constraints of embedded applications call for new techniques for guaranteeing performance, in particular response time, in hard or soft real-time contexts. Distribution also raises a whole new set of software challenges, which require practical solutions for the needs of the Nano-Tera.CH applications.

Design and Concurrency

The new systems' scale requires a profound re-examination of software engineering techniques including:

project and configuration management;

distributed development, meaning both development of distributed software and distributed development among many teams and locations;

quality assurance for mission-critical and life-critical systems;

specific needs of scientific and engineering applications, which too often continue to use software development processes and techniques that do not benefit from advances in software engineering.

For large-scale information management, another challenge will be semantic interoperability, considering for example transferring data from the body network to other information systems.

The demands of highly parallel applications require better approaches to concurrency. The software for many of the Nano-Tera.CH applications will need to run in a concurrent, distributed or embedded mode. Yet concurrent programming, while increasingly needed and practiced (in particular with multi-threading and multi-core applications), still relies for the most part on techniques developed in the late nineteen-sixties and the seventies.
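The bounded-buffer producer-consumer pattern, essentially a monitor idea from that era, is still the workhorse of multi-threaded code today; the sketch below shows it with a modern thread-safe queue.

```python
import queue
import threading

def producer(q, items):
    for x in items:
        q.put(x)                  # blocks when the bounded buffer is full
    q.put(None)                   # sentinel: no more work

def consumer(q, results):
    while True:
        x = q.get()               # blocks when the buffer is empty
        if x is None:
            break
        results.append(x * x)

q = queue.Queue(maxsize=4)        # bounded buffer: the classic monitor example
results = []
t1 = threading.Thread(target=producer, args=(q, range(5)))
t2 = threading.Thread(target=consumer, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
# results == [0, 1, 4, 9, 16]
```

That four decades of hardware evolution are still served by this programming model is precisely the gap the program aims to address.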

Languages will play a fundamental role: not only languages for programming, but also languages for specification, as well as control languages for scripting, component composition, software management and configuration management of large-scale distributed systems and their implementations. The effort includes:

language mechanisms supporting programmer expressiveness as well as software reliability;