Global Sources
EE Times-Asia
Stay in touch with EE Times Asia
EE Times-Asia > Embedded

Grasping parallelism and concurrency

Posted: 20 Mar 2014 ?? ?Print Version ?Bookmark and Share

Keywords:parallelism? concurrency? GPU? preemption? picothreads?

For most of us, the terms parallelism and concurrency are almost synonymous, and even if we feel they represent somewhat different concepts, we don't all agree on what makes one construct parallel and the other concurrent. But as we enter the era of multi-core, manycore, and GPU-based computing, it is useful to make a distinction between these two, and see how those distinctions affect the embedded programmer's world.

Concurrent programming has a long history in embedded systems, and in that context it represents the approach of using separate threads of control to manage the multiple activities taking place. Often these threads of control have different relative priorities, with interrupt handlers preempting non-interrupt code, and code requiring a higher invocation frequency or an earlier deadline preempting code with lower frequency or later deadlines. For threads of control managing activities with equivalent priorities, often a round-robin scheduling approach is used to ensure no activity gets ignored for too long. None of this requires multiple physical processors, and it can be seen overall as a way to share limited resources in an appropriate way when many activities need to be managed. It can also be seen as a way to simplify the logic of the programming, by allowing any given thread of control to be restricted to managing a single activity. Because introducing concurrent threads of control can simplify the logic of the overall program, it is all right if the language constructs to support this approach are themselves of somewhat heavier weight.

In contrast with concurrent programming, parallel programming is often more about solving a computationally-intensive problem by splitting it up into pieces, using an overall divide-and-conquer strategy, to take better advantage of multiple processing resources to solve a single problem. Rather than simplifying the logic, introducing parallel programming can often make the logic more complex, and therefore it is important that the parallel programming constructs be very light weight both syntactically and at run-time, or else the complexity and run-time overhead may outweigh the potential speed-up.

Figure 1: Work stealing approach using double ended queues.

Scheduling the threads
For both concurrent and parallel programming, the actual number of physical processors may vary from one run of the program to the next, so it is typical to provide a level of abstraction on top of the physical processors, typically provided by a scheduler. For concurrent programming, preemption is often used by the scheduler, where one thread is interrupted in the middle of its execution to allow a higher priority thread to take over the shared processing resource. For parallel programming, more often the goal is simply to get all of the work done in the shortest possible time, so preemption is not used as often, while overall throughput becomes the critical criterion of success.

For concurrent programming used to manage external activities of varying criticalities, a real-time scheduler often relies on priorities assigned using some sort of rate-monotonic or deadline-monotonic analysis, to ensure that all threads will meet their deadlines [3]. In contrast, for parallel programming, an approach called work stealing (figure 1) has emerged as a robust approach to balancing the load across multiple physical processors, while providing good locality of reference for a given processor, and good separation of access between processors.

Work stealing [2] is based on some relatively simple concepts, but often requires very careful coding of the underlying scheduler to achieve the required low overhead. The basic idea is that computations are broken up by the compiler into very light-weight threads (which we will call picothreads), and in the scheduler each physical processor has a server with its own double-ended queue (often called a deque) of picothreads (figure 1).

A typical picothread might represent the execution of a single iteration of a loop, or the evaluation of a single subexpression. Picothreads are automatically placed on the tail of the deque of the server that spawned them, and when a server finishes working on a given picothread, it removes the picothread it most recently added to its own deque (from the tail), and starts running that one. This last-in-first-out (LIFO) discipline is using the deque effectively as a stack of work to do.

1???2???3?Next Page?Last Page

Article Comments - Grasping parallelism and concurrency
*? You can enter [0] more charecters.
*Verify code:


Visit Asia Webinars to learn about the latest in technology and get practical design tips.

Back to Top