Global Sources
EE Times-Asia

PCM progress report no. 5: Scaling issues

Posted: 23 Feb 2012

Keywords: phase change memory, solid state drive, scaling

Phase change memory (PCM) devices from Micron were the stars of a paper [1] presented by members of the Department of Computer Science and Engineering at the University of California, San Diego at the recent Hot Storage conference [2].

They described the results of testing a PCM-based solid state drive (SSD) called Onyx. This SSD uses what are described as Micron's "first generation" P8P (90nm) 16MB PCM devices. Onyx has a capacity of 10GB organized in 8 banks of 1.25GB, connected to a host system by a PCIe bus. Of that capacity, 8GB is allocated to data storage and 2GB to error correction. Figure 1 is a schematic of the high-level architecture of Onyx.

Figure 1: The Onyx high-level architecture, with the "brain" able to track up to 64 in-flight requests simultaneously.

Some concerns were raised that Onyx may not have been fully populated, since the system requires some 640 PCM devices, each with 16MB capacity. We have now had assurances from Adrian Caulfield, one of the authors of the paper [1], that the system was fully populated, with all 16 of its DIMMs carrying 40 PCM devices each. It represents the largest collection of PCMs that has been subjected to the rigors of assembly and shown to the public in a system; this is a significant PCM milestone.

As a prototype system, Onyx is based on the design of Moneta, an SSD that was designed in anticipation that some type of fast non-volatile memory would eventually become available. Moneta uses DRAM as a stand-in for that memory. Onyx replaces the DRAM with PCM, but it retains Moneta's highly optimized software stack to minimize latency and maximize concurrency.

In essence, the Onyx architecture employs eight memory controllers, each controlling 1GB of memory and linked on a 4GB/s ring that communicates with the "brain" of the system, which interfaces with the PCIe bus. The prototype system employs four FPGAs connected in a ring, with four DIMMs attached to each FPGA. The system clock frequency is 250MHz. Each DIMM carries 40 of Micron's 16MB P8P PCM devices. The DIMMs fit into a standard DIMM slot.
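The capacity figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the numbers given in the article, assuming binary units (1GB = 1024MB); the variable names are illustrative.

```python
# Figures quoted in the article.
DEVICE_MB = 16          # capacity of one P8P PCM device
DEVICES_PER_DIMM = 40
FPGAS = 4
DIMMS_PER_FPGA = 4
BANKS = 8

dimms = FPGAS * DIMMS_PER_FPGA                # 16 DIMMs
total_devices = dimms * DEVICES_PER_DIMM      # 640 PCM devices
total_gb = total_devices * DEVICE_MB / 1024   # assumes 1GB = 1024MB
per_bank_gb = total_gb / BANKS

print(total_devices, total_gb, per_bank_gb)
```

Under that assumption the totals agree with the article: 640 devices, 10GB overall, and 1.25GB per bank.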

Some of the techniques for dealing with PCM design challenges, "its own idiosyncrasies" [1], are worth commenting on. The first is the use of a "large capacitor" to ensure that PCM does not breach the fundamental definition of a non-volatile memory, i.e. that it does not lose data in the event of a mains failure. The use of the large capacitor is not quite as bad as it might at first appear. The PCM controller is able to provide two indications of the status of a write to PCM. One, called "late completion," indicates that the write is complete. The other, called "early completion," is provided when all the data is in the PCM buffers. Early completion allows Onyx to hide most of the write latency, but on its own it is vulnerable to power failure. In the event of a mains failure, the large capacitor holds enough energy to complete the write operation. The authors defend this position on the basis that flash can achieve the same thing. It is claimed that the use of early completion provides a peak bandwidth per PCM DIMM pair of 156MB/s for reads and 47.1MB/s for writes.
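The difference between the two completion signals can be illustrated with a toy model. This is a minimal sketch, not the Onyx controller: the class name, method names and the `capacitor_holds` flag are hypothetical, and the model only captures the ordering of events described above.

```python
from collections import deque


class PcmControllerSketch:
    """Toy model of early vs. late write completion (hypothetical names)."""

    def __init__(self):
        self.buffer = deque()   # writes accepted but not yet committed to PCM
        self.pcm = {}           # committed contents: address -> data
        self.events = []        # ordered log of completion signals

    def write(self, addr, data):
        # Early completion: signalled as soon as the data is buffered.
        self.buffer.append((addr, data))
        self.events.append(("early", addr))

    def drain_one(self):
        # Late completion: signalled once the data is actually in PCM.
        addr, data = self.buffer.popleft()
        self.pcm[addr] = data
        self.events.append(("late", addr))

    def power_fail(self, capacitor_holds=True):
        # With enough stored energy, buffered writes are finished;
        # without it, anything acknowledged early but not drained is lost.
        if capacitor_holds:
            while self.buffer:
                self.drain_one()
        else:
            self.buffer.clear()
```

In this model a host that acts on early completion sees the write as finished before the data reaches PCM, which is why the buffered writes have to survive a power failure.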

The next PCM "idiosyncrasy" with which the UC San Diego team had to deal is PCM wear out [2]. They cited discussions with the PCM manufacturer explaining the difference between the lifetime of a PCM and that of flash. Simply put, the PCM lifetime of 1 million cycles is an estimate of the number of programs per cell before the first bit error occurs in a large population of devices (no population number provided) without error correction. For flash, by contrast, lifetime is the number of program/erase cycles before the error-correcting scheme can no longer handle the errors.

To deal with the write lifetime and wear out problem, Onyx employs what is claimed to be the first real-system implementation of a "start-gap" wear-leveling scheme to avoid uneven PCM wear out. In operation, it slowly rotates the mapping between 4KB rows of PCM memory and their storage addresses. If the storage address of row x is n, after some interval it will become n+1, and so on. This does mean that memory content must periodically be rewritten. The start-gap interval used was 128. The scheme brings a new term into the memory lexicon, the "line vulnerability factor," defined as the number of writes to an address before it is rewritten by start-gap. In a system, the tradeoff is between vulnerability and the extra overhead for access and writing.
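The rotation described above can be sketched in a few lines. This is a minimal illustration of the general start-gap idea (one spare row plus a start register and a gap register), not the Onyx implementation; the class and attribute names are hypothetical, and reading the interval of 128 as "one gap move per 128 writes" is an assumption.

```python
class StartGapSketch:
    """Toy start-gap wear leveler over num_rows logical rows (hypothetical)."""

    def __init__(self, num_rows, gap_interval=128):
        self.n = num_rows
        self.interval = gap_interval      # writes between gap moves (assumed)
        self.start = 0                    # rotation offset
        self.gap = num_rows               # physical index of the spare row
        self.rows = [None] * (num_rows + 1)
        self.wear = [0] * (num_rows + 1)  # write count per physical row
        self.writes_since_move = 0

    def physical(self, logical):
        # Rotate by start, then skip over the gap row.
        pa = (logical + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa

    def read(self, logical):
        return self.rows[self.physical(logical)]

    def write(self, logical, data):
        pa = self.physical(logical)
        self.rows[pa] = data
        self.wear[pa] += 1
        self.writes_since_move += 1
        if self.writes_since_move >= self.interval:
            self.writes_since_move = 0
            self._move_gap()

    def _move_gap(self):
        if self.gap == 0:
            # Wrap around: the last row moves to slot 0 and start advances.
            src, dst = self.n, 0
            self.start = (self.start + 1) % self.n
            self.gap = self.n
        else:
            # Copy the row just below the gap into the gap.
            src, dst = self.gap - 1, self.gap
            self.gap -= 1
        self.rows[dst] = self.rows[src]   # the periodic rewrite
        self.wear[dst] += 1
        self.rows[src] = None
```

Each gap move costs one extra row write, which is the overhead side of the tradeoff; in exchange, repeated writes to one storage address land on different physical rows over time.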
