Checks for cache-coherency verification in complex SoCs

Posted: 16 Jul 2015

Keywords: Cache, Cache Coherency, verification, Memory Built-in Self-test, MBIST

A cache, in its simplest definition, is a faster memory that stores copies of data from frequently used main-memory locations. Multiprocessor systems now support shared memory in hardware, which raises a question: how can the different processors share the data held in each other's caches? The answer is cache coherency. Shared-memory systems implement a coherence protocol for this purpose. Coherency seeks to make the caches of a shared-memory system functionally available to all the processors. In this article, we present efficient checks for cache-coherency verification in complex SoCs.

Today's multiprocessors have several independent caches, including instruction and data caches and a translation lookaside buffer (TLB). An instruction cache speeds up executable instruction fetch, a data cache speeds up data fetch and store, and a TLB speeds up virtual-to-physical address translation for both executable instructions and data. The data cache is usually organized as a hierarchy of cache levels (L1, L2, etc.). Coherency is usually maintained at the granularity of cache blocks; that is, the hardware enforces coherence on a block-by-block basis.
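To make block granularity concrete, the following minimal sketch maps byte addresses to the cache block that contains them. The 64-byte line size and the function names are assumptions chosen for illustration; the article does not specify a line size.

```python
# Illustrative only: LINE_SIZE is an assumed value, not taken from the article.
LINE_SIZE = 64  # bytes per cache block; must be a power of two


def block_address(addr, line_size=LINE_SIZE):
    """Return the base address of the cache block containing addr."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line_size must be a positive power of two")
    # Clearing the low-order offset bits yields the block base address.
    return addr & ~(line_size - 1)


def same_block(addr_a, addr_b, line_size=LINE_SIZE):
    """True if both addresses fall in the same cache block."""
    return block_address(addr_a, line_size) == block_address(addr_b, line_size)
```

Under this assumption, two cores writing to different bytes of the same block still contend for the same unit of coherence, because the hardware tracks state per block rather than per byte.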

Consider a subsystem with two cores, core0 (shown as CPU0 in the figure) and core1 (shown as CPU1 in the figure). Each core has its own instruction cache (L1 Icache) and data cache (L1 Dcache), and the two cores share a common L2 data cache. The L2 cache is accessible to both cores via the Snoop Control Unit (SCU), a module that helps share the common pool of data between core0 and core1. This subsystem may be termed a cluster. An overall system comprises one or more such clusters. Cores in different clusters can access each other's L2 caches using the Cache Coherency Interface (CCI). Figure 1 shows these fundamentals in a two-cluster system.

Figure 1: A two-cluster system.

There are two basic types of coherency:

Intra-cluster coherency: When the cores within a cluster share data present in their L2 cache via the SCU, this is termed intra-cluster coherency.

Inter-cluster coherency: When cores in different clusters share data. For example, if core0 and core2 share data present in their L2 caches, the sharing is done via the CCI and is termed inter-cluster coherency.
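The distinction between the two types can be sketched as a small lookup. This is a minimal illustration that assumes two cores per cluster with core0 and core1 in the first cluster and core2 and core3 in the second; the function names and the core numbering of the second cluster are hypothetical.

```python
# Assumption: 2 cores per cluster, numbered consecutively (core0, core1 | core2, core3).
CORES_PER_CLUSTER = 2


def cluster_of(core_id):
    """Return the index of the cluster that contains core_id."""
    if core_id < 0:
        raise ValueError("core_id must be non-negative")
    return core_id // CORES_PER_CLUSTER


def coherency_type(core_a, core_b):
    """Classify data sharing between two distinct cores."""
    if core_a == core_b:
        raise ValueError("coherency is defined between two different cores")
    if cluster_of(core_a) == cluster_of(core_b):
        return "intra-cluster (via SCU)"
    return "inter-cluster (via CCI)"
```

A directed test plan could use such a classification to confirm that both the SCU path and the CCI path are exercised by at least one pair of cores.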

We now present the scenarios for thorough cache-coherency verification at the SoC level.

1. Setting up MMU and page tables for complete L1 caches for each core: A memory management unit (MMU) is a hardware unit through which all memory references pass; its primary job is translating virtual memory addresses to physical addresses. The MMU effectively performs virtual memory management while also handling memory protection, cache control and bus arbitration. We check the performance improvement obtained when cache lines are filled completely. Four combinations need to be checked to generate the overall cache performance numbers for each core. The cores' performance counters, or timers available on the SoC, are used to measure the performance difference between caches enabled and caches disabled. SystemVerilog (SV) assertions, along with directed test cases, are used to check data sanity and cache line fills.

Table: There are 4 possible combinations which need to be checked for generating the overall cache performance numbers for each core.
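As a rough illustration of how counter readings might be compared, the sketch below computes the speedup of each cache configuration relative to a baseline run. The configuration names and cycle counts in the usage note are placeholders, not the combinations from the article's table and not measured results.

```python
def speedups(cycles_by_config, baseline):
    """Return speedup of each configuration relative to the baseline run.

    cycles_by_config maps a configuration name to the cycle count read
    from a performance counter or timer for the same test workload.
    """
    if baseline not in cycles_by_config:
        raise KeyError(f"baseline {baseline!r} has no measurement")
    base_cycles = cycles_by_config[baseline]
    result = {}
    for name, cycles in cycles_by_config.items():
        if cycles <= 0:
            raise ValueError(f"cycle count for {name!r} must be positive")
        result[name] = base_cycles / cycles
    return result


def not_faster_than_baseline(cycles_by_config, baseline):
    """List configurations that show no improvement over the baseline."""
    ratios = speedups(cycles_by_config, baseline)
    return sorted(n for n, s in ratios.items() if n != baseline and s <= 1.0)
```

For example, with hypothetical readings `{"caches_off": 1000, "caches_on": 250}` and `"caches_off"` as the baseline, the cached run would show a speedup of 4.0, and a test could flag any cache-enabled configuration that appears in the list returned by `not_faster_than_baseline`.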

2. Verification of clock gating at cluster level: In general, when a core enters a low-power state (for example, ARM cores do so by executing the wfi instruction), its program counter stops and the core waits for an interrupt or event to bring it back into full-power mode. In a multi-core system, more power can be saved by sending both cores of a cluster into low power simultaneously and clock-gating the L1 and L2 caches and the SCU of that cluster. The problem arises if all four cores of the two-cluster system are participating in coherency. The cluster must first signal the other cluster that it and its cores are going out of coherency, and the data copies present in its caches must be moved to the invalid state. Only after following this protocol can the cluster move to the low-power state.
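The ordering constraint described above can be sketched as a small behavioural model that a checker might mirror. The class, its method names and its fields are hypothetical; the model only captures the preconditions named in the text (both cores idle, cluster out of coherency, cached copies invalid) and omits details such as writing back modified data.

```python
class Cluster:
    """Toy model of one cluster's low-power entry preconditions."""

    def __init__(self, name, num_cores=2):
        self.name = name
        self.core_in_wfi = [False] * num_cores  # True once a core has executed wfi
        self.in_coherency = True                # participating in coherency
        self.valid_lines = set()                # block addresses held in a valid state
        self.clock_gated = False

    def exit_coherency(self):
        """Leave coherency: move all cached copies to invalid, then signal exit."""
        self.valid_lines.clear()
        self.in_coherency = False

    def enter_wfi(self, core):
        """Mark one core as waiting for an interrupt or event."""
        self.core_in_wfi[core] = True

    def can_clock_gate(self):
        """All cores idle, cluster out of coherency, no valid lines left."""
        return (all(self.core_in_wfi)
                and not self.in_coherency
                and not self.valid_lines)

    def clock_gate(self):
        """Gate the cluster clocks, refusing if the protocol was not followed."""
        if not self.can_clock_gate():
            raise RuntimeError(f"{self.name}: low-power entry protocol violated")
        self.clock_gated = True
```

A directed test could drive the illegal sequence as well, for instance both cores in wfi while the cluster is still in coherency, and expect the check to reject clock gating.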
