EE Times-Asia

Memory access ordering in complex embedded designs

Posted: 22 Dec 2014

Keywords: embedded systems, processor, Sequential Execution Model, SEM, compilers

The DSB instructions are required to ensure that the side effects of the context-changing operation (update page tables and invalidate TLB) are complete before the program continues. The ISB instruction is required to ensure that all subsequent instructions are loaded in the new context rather than the old.

Moving from single-core, single-threaded systems to multi-threaded, multicore systems opens another can of worms!

It is obviously important that multiple processors that share memory have a consistent and coherent view of the contents of that memory. Suppose one processor in a system updates two memory locations, X and Y, in that order. We might assume that other processors reading Y then X would read either:
- New values for both X and Y
- The old value for Y and the new value for X
- The old values for both X and Y

We should also be able to assume that they will NEVER see the new value of Y and the old value of X. We used to be able to make that assumption, when the SEM held across whole systems, but we cannot make that assumption any more.

In multicore/multi-processing systems, the processors almost certainly share regions of memory through which they communicate, share data, and pass messages. This could be a common heap, an operating system message pool, or a frame buffer shared between an application processor and a GPU.

In the following sequence, in which order are A and B loaded from memory?

LDR r0, [A]
LDR r1, [B]
ADD r2, r0, r1
STR r3, [C]

If B is cached and A is not, then B may actually be loaded before A (in the sense that the LDR for B will complete before the LDR for A). This may not matter, but it can if either variable is being updated by an external agent. If the two values are correlated in time, the program may then see a pair that never existed together in memory.
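As a rough illustration of how the load order could be pinned down from C, the sketch below uses C11 atomics. The variables `A` and `B` and the function name `load_a_then_b` are hypothetical stand-ins for the locations in the listing above. It also assumes that the compiler implements the fence with a DMB-class barrier on ARM, which is typical but not something the C standard promises.

```c
#include <stdatomic.h>

/* Hypothetical shared locations, assumed to be updated by another agent. */
static _Atomic int A = 1;
static _Atomic int B = 2;

/* Load A, then B, and return their sum (the ADD in the listing). */
int load_a_then_b(void)
{
    int a = atomic_load_explicit(&A, memory_order_relaxed);

    /* Acquire fence: the load of B below may not be reordered
     * ahead of the load of A above. */
    atomic_thread_fence(memory_order_acquire);

    int b = atomic_load_explicit(&B, memory_order_relaxed);
    return a + b;
}
```

Without the fence, both the compiler and the processor would be free to satisfy the load of B first.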

In order to provide controlled access to critical sections, we might implement some kind of lock or guard code that might look like this:

LDR r0, [S]
ADD r2, r0, #1
STR r2, [S]
MOV r3, #0
STR r3, [LOCK]

This assumes that S is updated in memory before LOCK. That may not be true if S is cached and LOCK is not, or if a write buffer chooses to re-order the writes. You might be tempted to fix this problem by placing the lock variable in device memory (or possibly device shared memory), but placing all shared memory in device regions is going to have unacceptable effects on performance.

The solution is either to use memory barriers or to employ a more robust form of locking. ARM provides exclusive access functionality via the LDREX/STREX pair of instructions. When used together, these allow a programmer to implement robust lock constructs as used by most popular operating systems.
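To make the barrier option concrete, here is a minimal C11 sketch of the S/LOCK update shown earlier. The names `S`, `LOCK` and `update_and_release`, and the encoding 1 = held, 0 = free, are assumptions made for illustration. The release fence plays the part that a DMB would play between the two STR instructions.

```c
#include <stdatomic.h>

static int S = 0;             /* shared data guarded by LOCK */
static _Atomic int LOCK = 1;  /* assumed encoding: 1 = held, 0 = free */

/* Update S, then free the lock, keeping the two writes in order. */
void update_and_release(void)
{
    S = S + 1;

    /* Release fence: the write to S above is ordered before the
     * store to LOCK below, as seen by other observers. */
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&LOCK, 0, memory_order_relaxed);
}
```

A reader that sees LOCK equal to 0 and then issues an acquire fence before reading S would be guaranteed to see the updated value.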

Load and store exclusive
ARM provides three exclusive access instructions:

LDREX: The load exclusive instruction carries out a load from an addressed memory location and also flags that location as reserved for exclusive access. The flag will be cleared by a subsequent store to that location.

STREX: The store exclusive instruction stores from a register to an addressed memory location and returns a value indicating whether the addressed location was reserved for exclusive access. If it wasn't, the store doesn't take place and memory is unchanged. The exclusive reservation is cleared regardless of whether the store succeeds or not.

CLREX: The clear exclusive instruction clears any exclusive access reservations in the memory system. It is intended for use in context switches.

Since an exclusive reservation is cleared by any subsequent store, exclusive or not, these instructions can be used by a lock construct, such as a mutex, to set a new value for a lock variable only if no other program has done so since this particular program checked its value.

A lock routine might look like this:

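; void lock(lock_t *addr)
lock
    LDREX   r1, [r0]        ; read current lock value and reserve the location
    CMP     r1, #LOCKED
    BEQ     lock            ; already held, try again

    MOV     r1, #LOCKED
    STREX   r2, r1, [r0]    ; try to store LOCKED; r2 = 0 on success
    CMP     r2, #0          ; if store failed, try again
    BNE     lock

    DMB                     ; keep critical-section accesses after the lock is taken
    BX      lr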

And the corresponding unlock function:

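; void unlock(lock_t *addr)
unlock
    DMB                     ; ensure accesses have completed
    MOV     r1, #UNLOCKED   ; UNLOCKED: assumed counterpart of LOCKED
    STR     r1, [r0]        ; a plain store is enough to clear the lock
    BX      lr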

Note that an STREX is not required to clear the lock, only when testing and setting it.
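For comparison, the same lock can be sketched in portable C11. On ARM cores that provide the exclusive access instructions, compilers typically turn the compare-exchange below into an LDREX/STREX loop, but that is an implementation detail rather than a guarantee. The names `spinlock_t`, `spin_try_lock`, `spin_lock` and `spin_unlock`, and the values of `LOCKED` and `UNLOCKED`, are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define UNLOCKED 0   /* assumed encoding */
#define LOCKED   1

typedef struct { _Atomic int value; } spinlock_t;

/* Try once to take the lock; returns true on success. */
bool spin_try_lock(spinlock_t *addr)
{
    int expected = UNLOCKED;
    /* Stores LOCKED only if the lock still holds UNLOCKED.
     * Acquire ordering on success keeps the critical section
     * from starting before the lock is taken. */
    return atomic_compare_exchange_strong_explicit(
        &addr->value, &expected, LOCKED,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until the lock has been taken. */
void spin_lock(spinlock_t *addr)
{
    while (!spin_try_lock(addr)) {
        /* busy-wait; a real system might yield or use WFE here */
    }
}

/* Free the lock with an ordinary (non-exclusive) store. */
void spin_unlock(spinlock_t *addr)
{
    /* Release ordering: accesses made while holding the lock are
     * ordered before the store that frees it. */
    atomic_store_explicit(&addr->value, UNLOCKED, memory_order_release);
}
```

As in the assembly version, only taking the lock needs an atomic read-modify-write. Releasing it is a single ordered store.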


Now consider the following C code:

int flag = BUSY;
int data = 0;

int somefunc(void)
{
    while (flag != DONE);
    return data;
}

void otherfunc(void)
{
    data = 42;
    flag = DONE;
}

What would you expect somefunc() to return? 42? Well, that's possible. In a multi-threaded system, so is 0!
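One possible repair, sketched here with C11 atomics rather than taken from the original class material, is to make the flag atomic and pair a release store with an acquire load. The definitions of `BUSY` and `DONE` are assumed, since the fragment above does not show them.

```c
#include <stdatomic.h>

enum { BUSY, DONE };          /* assumed values */

static _Atomic int flag = BUSY;
static int data = 0;

int somefunc(void)
{
    /* Acquire load: once DONE is observed, the earlier write to
     * data is guaranteed to be visible as well. */
    while (atomic_load_explicit(&flag, memory_order_acquire) != DONE) {
        /* wait for the other thread */
    }
    return data;
}

void otherfunc(void)
{
    data = 42;
    /* Release store: the write to data above cannot become
     * visible after the flag update. */
    atomic_store_explicit(&flag, DONE, memory_order_release);
}
```

With this pairing, a thread that leaves the loop in somefunc() can only return 42. Making the flag atomic also stops the compiler from hoisting the read of flag out of the loop.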

About the author
Chris Shore is Training Manager at ARM headquarters in Cambridge, UK. In that role for the past ten years he has been responsible for the worldwide customer technical training team, coordinating the delivery of over 80 courses per year to licensees all over the globe. He is passionate about ARM technology and, as well as teaching customers, regularly presents papers and workshops at engineering conferences. Starting out as a software engineer in 1986, his career has included software project management, consultancy, engineering management, and marketing. Chris holds an MA in Computer Science from Cambridge University. This article was presented at the Embedded Systems Conference as a part of a class that Chris Shore taught on "Memory Access Ordering in Complex Embedded Systems" (ESC-231).
