(Re)CANcentrate — Improving Error Containment and Reliability of Communication Subsystems based on CAN by Means of Adequate Star Topologies

FOCUS

The Systems, Robotics and Vision Group (SRV) of the University of the Balearic Islands (UIB) and the Electronic Systems Lab (LSE-IEETA) of the University of Aveiro (UA) have developed two innovative star topologies that improve the error containment and fault tolerance of CAN: CANcentrate and ReCANcentrate. Owing to the specific characteristics of their hubs, CANcentrate and ReCANcentrate are fully compatible with existing CAN COTS (commercial off-the-shelf) components and are transparent to any application or protocol based on CAN. Both CANcentrate and ReCANcentrate have been the subject of a patent filing.

DESCRIPTION

Although CAN has been used for many years in a wide range of applications, thanks to its good dependability properties and its deterministic access delay, CAN networks still need to be improved to make them suitable for systems with more demanding dependability requirements.
In a bus topology, many components are attached to one another without appropriate error-containment mechanisms. Thus, a single fault in a network component (e.g., a connector, transceiver, cable or CAN controller) may generate errors that propagate throughout the communication subsystem. These errors can prevent many nodes from communicating, causing a generalized failure of the communication subsystem. In fact, bus topologies present multiple points of severe failure, i.e., multiple points at which a single fault can prevent more than one node from communicating.
In particular, the faults that can provoke a severe failure in CAN are stuck-at-recessive, stuck-at-dominant and bit-flipping faults. A stuck-at fault occurs when a network component (either in the medium or in a node) issues a constant bit value, e.g., a transceiver that, due to a short circuit, constantly delivers a logical “0”. In CAN, the logical “0” is referred to as the “dominant” bit, whereas the logical “1” is referred to as the “recessive” bit. Since dominant bits overwrite recessive bits, a stuck-at-dominant fault forces the medium to remain permanently at the dominant level, regardless of where the fault occurs. By contrast, a stuck-at-recessive fault can only prevent further communication when it occurs in the medium itself; a node stuck at recessive simply stops contributing dominant bits. A bit-flipping fault occurs when a network component sends random, erroneous bit values that corrupt any bit stream conveyed by the medium; typical causes are a medium partition or a bad solder joint.
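The fault model above follows directly from the wired-AND behaviour of the CAN medium. The following Python sketch illustrates it under simplifying assumptions: the bus is reduced to a function that resolves each bit time from the levels driven by the nodes, and timing, bit stuffing and electrical effects are ignored. The function names (`bus_level`, `transmit`) and the scenario are hypothetical and serve only to reproduce the cases described in the text.

```python
from typing import List, Optional

DOMINANT, RECESSIVE = 0, 1  # CAN logical levels: "0" is dominant, "1" recessive


def bus_level(node_bits: List[int], medium_fault: Optional[int] = None) -> int:
    """Resolve one bit time on a CAN bus modelled as a wired-AND.

    node_bits    -- the level each node drives during this bit time.
    medium_fault -- None for a healthy medium, or the constant level that a
                    stuck-at fault in the medium itself forces on the bus.
    """
    if medium_fault is not None:
        return medium_fault
    # A single dominant contribution overwrites any number of recessive ones.
    return DOMINANT if DOMINANT in node_bits else RECESSIVE


def transmit(node_streams: List[List[int]],
             medium_fault: Optional[int] = None) -> List[int]:
    """Return the bit stream observed by every node, one bit time at a time."""
    return [bus_level(list(bits), medium_fault) for bits in zip(*node_streams)]


frame = [0, 1, 1, 0, 1]          # bits sent by a healthy transmitter (node A)
idle = [RECESSIVE] * len(frame)  # a healthy node that is only listening (node B)

healthy = transmit([frame, idle, idle])                                  # [0, 1, 1, 0, 1]
node_stuck_dominant = transmit([frame, idle, [DOMINANT] * len(frame)])   # [0, 0, 0, 0, 0]
node_stuck_recessive = transmit([frame, idle, [RECESSIVE] * len(frame)]) # [0, 1, 1, 0, 1]
medium_stuck_recessive = transmit([frame, idle, idle], medium_fault=RECESSIVE)  # [1, 1, 1, 1, 1]
```

A node stuck at dominant erases every recessive bit for all nodes. A node stuck at recessive leaves the frame intact. A medium stuck at recessive swallows every dominant bit, so no frame can get through.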
Some solutions based on replicated bus architectures and bus guardians have tried to deal with these faults in CAN. Nevertheless, a replicated bus cannot prevent a faulty node from sending erroneous information to all media, and bus guardians cannot confine faults that occur in the medium, so both approaches still present multiple points of severe failure. Moreover, even when they are used together, replicated buses and bus guardians suffer from spatial-proximity failures and from common-mode failures.
In contrast, the centre of a star topology, e.g., a hub, has a privileged view of the communication among nodes, and could be used to detect and isolate faults occurring at media and nodes. Moreover, star topologies do not suffer from spatial proximity and common mode failures.
Some star topologies are already available for CAN. Most of them either include no error-detection mechanisms in the hub or can only detect ports stuck at dominant, and then only with a high detection latency. Some of these stars even impose strong limitations on the bit rate and star diameter or suffer from electrical drawbacks.
In order to improve fault confinement in CAN networks while overcoming these limitations, we first developed a new communication infrastructure called CANcentrate [1, 2, 5]. The major benefit of CANcentrate is that it incorporates enhanced error-detection and fault-isolation mechanisms into the star hub, thereby reducing the multiple points of severe failure of a bus topology to a single one: the hub. Therefore, the probability of a severe failure of the communications is significantly reduced, as we have shown in a recent study [6].
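The idea of a hub that couples its ports and isolates a faulty one can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the CANcentrate design: it assumes the hub sees each port's contribution separately in a bit-synchronous model, and it implements only one hypothetical rule (isolate a port that stays dominant for more than `max_dominant_run` consecutive bits). The class name `IsolatingHub`, the threshold value and the rule itself are illustrative. The actual CANcentrate mechanisms, described in [1, 2, 5], go beyond detecting ports stuck at dominant.

```python
from typing import List

DOMINANT, RECESSIVE = 0, 1


class IsolatingHub:
    """Toy star hub: couples the contributions of its enabled ports with a
    wired-AND and isolates any port that stays dominant for too long.

    max_dominant_run is a hypothetical threshold chosen for illustration; it
    is not the value or the detection rule used by the CANcentrate hub.
    """

    def __init__(self, n_ports: int, max_dominant_run: int = 20):
        self.max_dominant_run = max_dominant_run
        self.enabled = [True] * n_ports
        self.dominant_run = [0] * n_ports  # consecutive dominant bits per port

    def step(self, port_bits: List[int]) -> int:
        """Process one bit time and return the coupled level the hub retransmits."""
        for i, bit in enumerate(port_bits):
            if not self.enabled[i]:
                continue  # an isolated port is ignored from now on
            self.dominant_run[i] = self.dominant_run[i] + 1 if bit == DOMINANT else 0
            if self.dominant_run[i] > self.max_dominant_run:
                self.enabled[i] = False  # stuck-at-dominant suspected: isolate it
        # Only enabled ports take part in the wired-AND coupling.
        active = [bit for bit, ok in zip(port_bits, self.enabled) if ok]
        return DOMINANT if DOMINANT in active else RECESSIVE


# Port 0 toggles, port 1 only listens, port 2 is stuck at dominant.
hub = IsolatingHub(n_ports=3)
output = [hub.step([t % 2, RECESSIVE, DOMINANT]) for t in range(30)]
```

During the first 20 bit times the stuck port holds the coupled signal at dominant. On the 21st consecutive dominant bit it is isolated, after which the hub output follows port 0 again and the remaining nodes can resume communicating. Any real threshold must exceed the longest dominant sequence that legitimate CAN traffic, including error signalling, can produce, which is why detection latency is a design concern.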
CANcentrate yields benefits over the CAN bus for applications that require a communication infrastructure in which at least a minimum number of nodes can keep communicating with each other throughout a complete interval of time; e.g., in a factory plant, a fault in any component should jeopardize the communication capabilities of as few nodes as possible. Moreover, CANcentrate is also better than the CAN bus for applications that accept that only part of the nodes can still communicate, e.g., the intra-building communication network of a hotel, whose main objective is to keep serving the maximum number of rooms even when faults occur. Note that CANcentrate brings to CAN error-containment features similar to those that a hub brings to Ethernet networks.
However, safety-critical applications, such as those used in X-by-wire systems, usually require a highly reliable communication infrastructure, in which the probability that all nodes can communicate with each other throughout a complete interval of time is very high. To fulfil this requirement, we developed a new communication infrastructure called ReCANcentrate [3, 4], which keeps all the error-containment capabilities already achieved by CANcentrate, presents no points of severe failure, and tolerates hub and link faults. In fact, ReCANcentrate provides CAN with most of the fault-tolerance features that are typical of protocols such as TTP/C. Finally, note that both CANcentrate and ReCANcentrate are fully compliant with CAN, can be built using COTS components, and are transparent to any CAN-based application or protocol.
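The progression from bus to CANcentrate to ReCANcentrate can be illustrated with a deliberately simplified probability model. The sketch below assumes independent component failures, gives every component the same hypothetical failure probability over the mission interval, and, for the two star variants, counts only hub faults (links, ports and inter-hub connections are ignored). The function names and numbers are illustrative assumptions, not results from the authors' studies.

```python
def p_severe_bus(p_component: float, n_points: int) -> float:
    """Bus: a single fault in any of n independent points causes a severe failure."""
    return 1.0 - (1.0 - p_component) ** n_points


def p_severe_single_hub(p_hub: float) -> float:
    """Star whose hub isolates faulty ports and links: only the hub remains."""
    return p_hub


def p_severe_replicated_hubs(p_hub: float, n_hubs: int = 2) -> float:
    """Replicated hubs: every hub must fail (independent failures assumed)."""
    return p_hub ** n_hubs


# Hypothetical per-component failure probability over the mission interval.
p = 1e-3
bus = p_severe_bus(p, n_points=20)        # about 1.98e-2
star = p_severe_single_hub(p)             # 1e-3
replicated = p_severe_replicated_hubs(p)  # about 1e-6
```

Under these assumptions, removing the multiple points of severe failure and then replicating the one that remains (the hub) lowers the probability of a severe failure by orders of magnitude, because a severe failure then requires at least two independent faults. A quantitative comparison of CANcentrate against the CAN bus is reported in [6].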