Global Sources
EE Times-Asia

Enhance functional safety in embedded designs

Posted: 22 May 2013

Keywords: Embedded networked systems, Process control, Ethernet Switch

We also need to look at possible failure modes when remote code updates or other sensitive messages are sent over the network. Without a sufficient level of data protection, transmission errors or malicious attacks could alter program code execution, incorrectly adjust trigger levels or expose sensitive operating parameters. Standard error detection functions, such as a Cyclic Redundancy Check (CRC), can be used to protect messages from transmission errors. The Ethernet Switch automatically checks messages for errors using this technique. If required, the System Controller can implement additional Error Detection and Correction functions. Cryptographic protocols and standard encryption algorithms can be used to improve the security of network traffic within the system by securing the data in transit and authenticating remote facilities.
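As a rough illustration of the kind of additional error detection a controller could add on top of the switch's own checks, the sketch below appends a CRC-32 trailer to each message and verifies it on receipt. The function names and the 4-byte trailer layout are assumptions made for this example, not part of the design described in the article.

```python
import zlib
from typing import Optional


def frame_message(payload: bytes) -> bytes:
    """Append the CRC-32 of the payload as a 4-byte big-endian trailer."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def check_message(frame: bytes) -> Optional[bytes]:
    """Return the payload if the CRC trailer matches, otherwise None."""
    if len(frame) < 4:
        return None  # too short to contain a trailer
    payload, trailer = frame[:-4], frame[-4:]
    if zlib.crc32(payload) != int.from_bytes(trailer, "big"):
        return None  # corrupted in transit; caller should discard or re-request
    return payload


# Usage: a single flipped bit is detected.
frame = frame_message(b"set trigger level 42")
corrupted = bytearray(frame)
corrupted[3] ^= 0x01  # simulate a one-bit transmission error
print(check_message(frame) is not None)             # intact frame accepted
print(check_message(bytes(corrupted)) is None)      # corrupted frame rejected
```

A CRC only guards against accidental corruption. An attacker who alters a message can simply recompute the trailer, which is why the cryptographic measures mentioned above are needed for malicious threats.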

Single event upsets as a source of errors
The Single Event Upset (SEU) phenomenon was first discovered in 1979 by Intel and Bell Labs as failures in DRAMs and is attributed to stray alpha particles or neutrons 'flipping' the state of a memory cell. In 1999 Sun Microsystems noticed errors in the cache SRAMs of mission-critical servers. In space and aviation applications the effects of radiation on electronics are well understood, as operational altitudes have a higher neutron flux. However, SEUs are increasingly becoming a concern at sea level as well. The continuous drive to smaller semiconductor geometries reduces the charge stored in each SRAM cell, and the ever-increasing electronics content of fielded systems increases the likelihood of SEU-related SRAM errors. Note that Flash memories, which require a significantly higher energy level to 'flip' state, are immune to these types of SEU events.

Mitigation of errors via redundancy, design diversity
In safety-critical systems, redundancy is mandatory so that the system continues to operate properly in the event of a failure. Two well-known techniques are widely utilised: Dual Modular Redundancy (DMR) and Triple Modular Redundancy (TMR). In Dual Modular Redundancy, duplicate designs work in parallel. Each processing element receives the same input and a fail-safe certification engine checks the outputs for consistency. If a fault is identified, preventive action must be taken to avoid a failure. Triple Modular Redundancy uses three copies of the design and presents their outputs to a voting circuit, which sets the output to the state that receives the most votes. This can withstand the complete failure of one sub-system and allows a supervisor circuit to attempt to fix the fault or alert an operator.
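The voting step in TMR can be sketched in software as follows. This is a minimal model of a majority voter, assuming three channel outputs that can be compared for equality. The name `tmr_vote` and the returned list of disagreeing channels are illustrative choices, and a real voter would normally be implemented in hardware.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def tmr_vote(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], List[int]]:
    """Majority-vote three channel outputs.

    Returns (voted value, indices of channels that disagree with it).
    If no two channels agree, the voted value is None and every
    channel is reported as suspect.
    """
    if len(outputs) != 3:
        raise ValueError("TMR voter expects exactly three outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        return None, [0, 1, 2]  # no majority: treat as unrecoverable
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty


# Usage: channel 2 has failed; the voter masks it and reports it
# so that a supervisor can attempt a repair or alert an operator.
voted, suspects = tmr_vote([1, 1, 0])
print(voted, suspects)  # 1 [2]
```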

A design diversity methodology is sometimes employed to further improve reliability. Using this methodology, parallel designs are not just duplicated but perform the same function using different implementations. For example, an FPGA might be used for one of the designs and an MCU for the parallel design. This diversity in the target implementations increases reliability further, since errors related to complex design or implementation 'bugs' are unlikely to be duplicated in dramatically different targets.

Implementing redundancy in our example design
Let's take our example design and look at how we can significantly improve reliability by using the redundancy techniques previously described. Figure 2 shows the changes to the example system. To improve network reliability, we added redundant Ethernet connections to the upstream node and the downstream node. The new redundant power sub-system helps recover from a failure in the main supply: power switches over to a redundant supply if the main supply fails. The System Controller and Equipment Controller are now each implemented using a Dual Modular Redundancy (DMR) technique, as illustrated in the 'blow-up' of the Equipment Controller (the System Controller would use a similar technique). The controller functions are duplicated and compare logic is added to identify any outputs that do not 'agree'. When such an error is detected, the sub-system responsible for the error can be reset and diagnostics performed. This reduces the chance of the error resulting in a system failure. Note that the dual implementations of the System Controller use a design diversity technique: one controller is implemented with an MCU and the other with an FPGA. This provides additional reliability, since each implementation's error characteristics will be significantly different, and thus the chance of a common systematic error (for example, in their response to noise, temperature, voltage or timing differences, or even implementation 'bugs') is significantly reduced.
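The compare logic can be modelled in software along the following lines. This is only an analogy: in the design above the two channels are an MCU and an FPGA, whereas here two independently written Python functions stand in for them. The clamp function, `LIMIT`, `SAFE_OUTPUT` and `dmr_step` are hypothetical names chosen for this sketch.

```python
from typing import Callable, NamedTuple

LIMIT = 1000       # hypothetical output ceiling
SAFE_OUTPUT = 0    # hypothetical fail-safe value driven on a mismatch


def channel_a(x: int) -> int:
    """First implementation: clamp to 0..LIMIT using min/max."""
    return max(0, min(x, LIMIT))


def channel_b(x: int) -> int:
    """Second, independently written implementation: explicit branches."""
    if x < 0:
        return 0
    if x > LIMIT:
        return LIMIT
    return x


class DmrResult(NamedTuple):
    ok: bool      # False when the channels disagree
    output: int   # agreed value, or SAFE_OUTPUT on a mismatch


def dmr_step(x: int,
             a: Callable[[int], int] = channel_a,
             b: Callable[[int], int] = channel_b) -> DmrResult:
    """Run both channels on the same input and compare their outputs."""
    out_a, out_b = a(x), b(x)
    if out_a != out_b:
        # Compare logic cannot tell which channel is wrong, so it
        # falls back to a safe value and flags the error for reset
        # and diagnostics.
        return DmrResult(False, SAFE_OUTPUT)
    return DmrResult(True, out_a)


# Usage: inject a single-bit fault into channel B's output.
def faulty_b(x: int) -> int:
    return channel_b(x) ^ 1

print(dmr_step(500))               # channels agree
print(dmr_step(500, b=faulty_b))   # mismatch detected, safe output used
```

Unlike a TMR voter, DMR detects a disagreement but cannot mask it, which is why the sketch falls back to a safe output rather than picking one channel's result.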

Figure 2: Example design with DMR and design diversity.

SmartFusion2 SoC FPGAs
When implementing the example design from Figure 2, it would be possible to use separate components for the FPGA implementation of the controller, the MCU implementation and the Compare Logic blocks. However, these extra components create new possibilities for errors and system failures. An approach that integrates all of these functions into a single device has advantages for the system designer, namely better MTBF due to the reduction in component count, lower cost, and the ability to drive functional safety into smaller systems.
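The MTBF argument can be made concrete with the usual series-system approximation, in which every part must work, failure rates are constant, and the rates simply add. The sketch below assumes that model. The FIT figures (failures per 10^9 device-hours) are made-up placeholders for illustration, not data from the article or from any datasheet.

```python
from typing import Iterable


def series_mtbf_hours(fit_rates: Iterable[float]) -> float:
    """MTBF in hours for parts in series, given each part's rate in FIT.

    Assumes constant, independent failure rates, so the system rate
    is the sum of the part rates and MTBF = 1e9 / total FIT.
    """
    total_fit = sum(fit_rates)
    if total_fit <= 0:
        raise ValueError("total failure rate must be positive")
    return 1e9 / total_fit


# Hypothetical figures: three discrete parts versus one integrated device.
discrete = series_mtbf_hours([50.0, 40.0, 10.0])   # FPGA + MCU + compare logic
integrated = series_mtbf_hours([60.0])             # single device
print(discrete < integrated)  # fewer series elements, longer MTBF
```

Under this model each extra component, including its interconnect, adds to the total failure rate, which is the effect the paragraph above describes.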
