Managing single event upsets with C-slow retiming

Posted: 19 May 2015

Keywords: pipelining, C-slow retiming, single event upsets, processors, FPGA

Memory duplication guarantees that faulty memory content affects only a single design copy; the remaining design copies are not affected. Figure 3 illustrates that when an instruction or data word from an external memory reaches the incoming register with comparison logic, the proposed mechanism detects faulty data or instructions. Once a thread behaves differently, it can be recovered by using the method discussed in the next section, which can also help to clear spikes from incoming data streams.

Recovery
When an SEU is detected, safety-critical designs can restart or execute predefined software recovery routines. When using CSR, an on-the-fly recovery is possible. Figure 4 shows the CSR-ed design enhanced by an SEU detection circuit. When C >= 3, the SEU detection circuit uses a majority decoder to detect the failing thread by comparing the key register values of C identical threads. This is performed every C micro-cycles.
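The majority comparison described above can be sketched as a small behavioral model in Python. This is only an illustration under stated assumptions, not the circuit from the article: the function name `find_failing_threads` and the representation of the key register values as a plain list (one entry per thread) are hypothetical.

```python
from collections import Counter

def find_failing_threads(key_values):
    """Return the indices of threads whose key register value disagrees
    with the majority. An empty list means all threads agree.

    key_values: one register snapshot per identical thread (C >= 3).
    """
    c = len(key_values)
    if c < 3:
        raise ValueError("majority voting needs at least three threads")
    majority_value, votes = Counter(key_values).most_common(1)[0]
    if votes <= c // 2:
        # No strict majority: the failing thread cannot be identified.
        raise RuntimeError("no majority among thread values")
    return [i for i, v in enumerate(key_values) if v != majority_value]
```

For C=3, a call such as `find_failing_threads([0x1234, 0x1234, 0x1230])` singles out the third thread (index 2), while three identical values yield an empty list.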

Figure 4: On-the-fly recovery.

A modified write-enable sequence, controlled by a finite state machine (FSM), then overwrites the specific Rn register associated with the failing thread. This write control must also be combined with a specific Rn read sequence to establish an on-the-fly recovery mechanism.
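The effect of the overwrite can be modeled in software as follows. This is a minimal behavioral sketch, assuming each thread's registers are held in a list so that `thread_regs[t][n]` is thread t's copy of Rn; the function name `recover_registers` is hypothetical, and the FSM's actual write-enable and read sequencing is not modeled here.

```python
from collections import Counter

def recover_registers(thread_regs):
    """Overwrite minority register copies with the majority value, in place.

    thread_regs[t][n] is thread t's copy of register Rn.
    Returns the (thread, register) pairs that were rewritten.
    """
    repaired = []
    num_regs = len(thread_regs[0])
    for n in range(num_regs):
        # Collect the C copies of Rn, one per thread.
        copies = [regs[n] for regs in thread_regs]
        value, votes = Counter(copies).most_common(1)[0]
        if votes <= len(copies) // 2:
            raise RuntimeError(f"no majority for R{n}")
        for t, regs in enumerate(thread_regs):
            if regs[n] != value:
                regs[n] = value  # models the write into the failing thread's Rn
                repaired.append((t, n))
    return repaired
```

In hardware the correction happens within the register pipeline itself; the loop above only mirrors the end result, namely that the failing thread's copy of Rn ends up holding the value the other threads agree on.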

Results
In this portion of our discussions, two alternatives are compared. First, a system is instantiated three times to detect SEUs. Second, the system uses CSR to generate a time-redundant system as discussed. The results are based on empirical data and can be found in a workshop paper [1] as well as in an extended version [2].

Area on FPGAs: Instantiating a system three times can be assumed to require roughly three times the area of the original system. With CSR and C=3, the design costs almost no additional area on FPGAs compared with the original (single) system, because register and memory resources are freely available (assuming memory usage is not critical).

Area for ASICs: Here again, a triply redundant system needs three times the area of a single system. With CSR, put simply, only the registers (not the combinatorial logic) need to be replicated, which also results in a smaller area than the traditional alternative.
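The ASIC comparison can be written as a first-order model. This sketch assumes that the area splits cleanly into a combinatorial part and a register part, that CSR replicates each register C times, and that the detection circuit and routing overhead are ignored; the function names and the example numbers are hypothetical, not data from the article.

```python
def area_triplicated(comb_area, reg_area):
    """Three full instances: logic and registers are both tripled."""
    return 3 * (comb_area + reg_area)

def area_csr(comb_area, reg_area, c=3):
    """CSR: combinatorial logic is shared, only the registers are replicated."""
    return comb_area + c * reg_area

# Hypothetical split: 70 units of combinatorial logic, 30 units of registers.
print(area_triplicated(70, 30))  # 300
print(area_csr(70, 30))          # 160
```

The larger the share of combinatorial logic in the original design, the greater the saving this simplified model predicts for CSR.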

Performance: Because registers are inserted on the critical path (although the insertion is timing-driven), the maximum performance is most likely reduced, to approximately 90% of the original performance for C=3. This may not be the critical aspect of the project overall, however.

Power Consumption: Another very interesting point is the comparison of power consumption. Let's assume that instantiating three identical systems results in three times the power consumption of a single system. In the case of the proposed CSR solution, the signals don't toggle when identical threads are executed three times, which saves energy. On the other hand, the clock needs to run three times faster, which results in higher power consumption by the clock tree. Overall, the CSR-ed system consumes significantly less power than the alternative of instantiating the same system three times. Additionally, the smaller area of the CSR-ed system might make it possible to use a smaller FPGA in the project, and a smaller FPGA usually draws less IDD current.

Recovery: The solution discussed here allows the implementation of an on-the-fly recovery mechanism. This makes it possible to recover from an SEU within one cycle, which means that the execution time of complex software recovery routines can be avoided.

Conclusion
This column describes how to use C-Slow Retiming to generate a time-redundant digital design. The advantages of the proposed solution are significantly reduced area and less power consumption, as compared to the traditional technique of instantiating the same system multiple times. The on-the-fly (single cycle) recovery mechanism is an intriguing feature that can only be used when the system uses C-Slow Retiming.

Please feel very welcome to share your thoughts on this with me and the community. I'm especially interested to hear information about known projects where the discussed solution is already applied, or about anything that might make the usage of the discussed solution less applicable.

References
[1] T. Strauch, "Using C-Slow Retiming in Safety Critical and Low Power Applications", First International Workshop on FPGAs in Aerospace Applications, FASA 2014, 5 September 2014, Munich, Germany, pp. tbd [http://www.edaptix.com/FASA2014_Strauch_CSR_SEU.pdf]
[2] T. Strauch, "Running Identical Threads in C-Slow Retiming based Designs for Functional Failure Detection", Cornell University Library, arXiv:1502.01237 [http://arxiv.org/abs/1502.01237]

About the author
Tobias Strauch is a freelance contractor.

