Global Sources
EE Times-Asia
Stay in touch with EE Times Asia
EE Times-Asia > Processors/DSPs

Minimize concurrency errors in multicore design

Posted: 26 Mar 2012 ?? ?Print Version ?Bookmark and Share

Keywords:multicore processors? interleavings? concurrency?

Manufacturers of microprocessors are increasingly choosing multicore processors as the best approach to improving performance in computationally intensive embedded systems. This is the result of having reached the limits of performance gains that can be realized from miniaturization and integration.

However, realizing the potential performance gains with multicores has a cost. Software written to be single-threaded for a single core processor will realize little or no performance benefit when executed on a multicore processor. To take advantage of the cores it must be rewritten or adapted to use multithreading.

The key challenge is to keep the cores busy as much as possible, while ensuring that they coordinate access to shared resources properly. Unfortunately writing such code is much harder than writing single-threaded code. Concurrent code is vulnerable to entirely new kinds of programming defects, such as deadlocks or race conditions, and when these occur, they can manifest in ways that are difficult to diagnose. Traditional techniques for finding and eliminating concurrency errors may be ineffective.

A principal reason concurrency bugs are so difficult is because there is an enormous number of ways in which the events in the threads can be interleaved when those threads execute. As the number of threads or instructions increases, the number of interleavings increases exponentially. If thread A executes M instructions and thread B executes N instructions, there are N+MCN possible interleavings of the two threads. For example, given two trivial threads with ten instructions each, there are 184,756 possible interleavings of those instructions. Even with very small programs it is clear that it is next to impossible to test all combinations. Secondly, even if it is feasible to identify a single interleaving that leads to a failure, it can be very difficult to set up a repeatable test case that uses that particular interleaving because scheduling of threads is effectively nondeterministic. Consequently debugging concurrent programs can be very expensive and time consuming.

Insidious race conditions
Of all the concurrency defects that can occur, race conditions are probably the most pernicious. Race conditions have been responsible for numerous system failures over the years. The famous case of the Therac-25 radiation therapy machine featured a race condition that was responsible for the deaths of several patients [2]. Similarly, the 2003 Northeast blackout was exacerbated by a race condition that resulted in misleading information being communicated to the technicians [3].

There are several different kinds of race conditions. One of the most common and insidious formsdata racesis the class of race conditions involving concurrent access to memory locations. A data race occurs when there are two or more threads of execution that access a shared memory location, at least one thread is changing the data at that location, and there is no explicit mechanism for coordinating access. If a data race occurs it can leave the program in an inconsistent state.

Figure 1: Race condition leads to incorrect count of items on an assembly line.

An example of a data race is shown in figure 1. A manufacturing assembly line has entry and exit sensors, and maintains a running count of the items currently on the line. This count is incremented every time an item enters the line, and decremented every time an item reaches the end of the line and exits. If an item enters the line at the same time that another item exits, the count should be incremented and then decremented (or vice-versa) for a net change of zero. However, normal increment and decrement are not atomic operations: they are composed of a sequence of individual instructions that first load the value from memory, then modify it locally, and finally store it back in memory. If the updating transactions are processed in a multithreaded system without sufficient safeguards, a race condition can arise because the sensors read and write a shared piece of data: the count. The interleaving in figure 1 results in an incorrect count of 69. There are also interleavings that can result in an incorrect count of 71, as well as a number that correctly result in a count of 70.

1???2???3?Next Page?Last Page

Article Comments - Minimize concurrency errors in multi...
*? You can enter [0] more charecters.
*Verify code:


Visit Asia Webinars to learn about the latest in technology and get practical design tips.

Back to Top