In this paper, we propose a design framework for distributed embedded control systems that ensures reliable execution and high quality of control even if some computation nodes fail. When a node fails, the configuration of the underlying distributed system changes and the system must adapt to this new situation by activating tasks at operational nodes. The task mapping as well as schedules and control laws that are customized for the new configuration influence the control quality and must, therefore, be optimized. The number of possible configurations due to faults is exponential in the number of nodes in the system. This design-space complexity leads to unaffordable design time and large memory requirements to store information related... (More)

In this paper, we propose a design framework for distributed embedded control systems that ensures reliable execution and high quality of control even if some computation nodes fail. When a node fails, the configuration of the underlying distributed system changes and the system must adapt to this new situation by activating tasks at operational nodes. The task mapping as well as schedules and control laws that are customized for the new configuration influence the control quality and must, therefore, be optimized. The number of possible configurations due to faults is exponential in the number of nodes in the system. This design-space complexity leads to unaffordable design time and large memory requirements to store information related to mappings, schedules, and controllers. We demonstrate that it is sufficient to synthesize solutions for a small number of base and minimal configurations to achieve fault tolerance with an inherent minimum level of control quality. We also propose an algorithm to further improve control quality with a priority-based search of the set of configurations and trade-offs between task migration and replication. (Less)

@inproceedings{e4027baa-885a-47db-bd76-bd796eb58a68,
abstract = {In this paper, we propose a design framework for distributed embedded control systems that ensures reliable execution and high quality of control even if some computation nodes fail. When a node fails, the configuration of the underlying distributed system changes and the system must adapt to this new situation by activating tasks at operational nodes. The task mapping as well as schedules and control laws that are customized for the new configuration influence the control quality and must, therefore, be optimized. The number of possible configurations due to faults is exponential in the number of nodes in the system. This design-space complexity leads to unaffordable design time and large memory requirements to store information related to mappings, schedules, and controllers. We demonstrate that it is sufficient to synthesize solutions for a small number of base and minimal configurations to achieve fault tolerance with an inherent minimum level of control quality. We also propose an algorithm to further improve control quality with a priority-based search of the set of configurations and trade-offs between task migration and replication.},
author = {Samii, Soheil and Bordoloi, Unmesh D. and Eles, Petru and Peng, Zebo and Cervin, Anton},
booktitle = {24th Euromicro Conference on Real-Time Systems (ECRTS), 2012},
isbn = {978-1-4673-2032-0},
issn = {1068-3070},
language = {eng},
pages = {68--77},
publisher = {IEEE--Institute of Electrical and Electronics Engineers Inc.},
title = {Control-Quality Optimization for Distributed Embedded Systems with Adaptive Fault Tolerance},
url = {http://dx.doi.org/10.1109/ECRTS.2012.40},
year = {2012},
}