EE Times-Asia > Memory/Storage

Insights on using NAND flash in portable designs

Posted: 14 Nov 2007

Keywords: NAND flash, portable designs, solid-state memory, software development

By Lane Mason and Robert Pierce
Denali Software

As the raging success of Apple's iPod still rings in our ears, NAND flash memory is seen as the rising star of solid-state memory for portable and consumer applications.

This acceptance has largely been driven by NAND flash memory's rapid price decline over the past three to four years (approximately 50 percent per year), which, together with its other positive attributes, places it in a very competitive position relative to more established media such as tape, compact discs and magnetic hard disk drives.

In addition to its low cost per gigabyte, NAND flash is touted as more shock-resistant than CDs or HDDs, more compact (a smaller form factor) for the small configurations most of today's consumer applications need, and more systemically consistent with the rest of the solid-state electronic system (MPU or APU and analog circuitry). Flash memory requires no mechanical parts and no read sensor to track location on a disk. It also consumes far less power (operating and standby), a key advantage in this era of heightened power consciousness.

Even with all these good product attributes and its low cost, NAND flash memory still has some work to do to convince users that it is better than its storage alternatives.

Most flash users are concerned with single-level cell (SLC) and multi-level cell (MLC) endurance, because the mechanisms for storing charge in the oxide layer wear out with repeated program/erase cycling. If unlimited reads and writes are necessary, NAND flash memory will not work. If 100K program/erase cycles are good enough, SLC is the way to go. However, if 3K to 10K cycles are consistent with the lifetime of your NAND-containing product, then MLC might be a better choice.

Other concerns for users and system designers considering NAND flash memory are its inability to randomly access data, its high latency and its slow write times, all of which pose design hurdles and add system complexity. Some applications need better performance on one of these metrics and must stick with NOR flash, DRAM, high-speed SRAM, HDD, tape or CD.

Still other concerns within the industry are a lack of NAND interoperability, vendor-specific product specifications and a lack of other feature standards. This makes controller design a real challenge, especially when the user wants multisourcing, a clear road map to next-generation standards and products, and a road map to future performance. Although the Open NAND Flash Interface (ONFI) consortium has made real progress, major NAND makers Toshiba, Samsung and SanDisk still remain unsure whether this "rush to standards" path will lead to their own eventual prosperity or to their demise.

Toward the future
The jury is still out, and until it comes back with a standards road map agreed to by all major NAND players, excess work will be required to switch NAND suppliers, to plan future products according to enforceable NAND specifications and to design flash controllers that can serve more than one NAND master.

Mason and Pierce: NAND flash is the rising star of solid-state memory for portable apps.

All this terra incognita is steadily being filled in, however, at shows such as MemCon, hosted by Denali Software, which this year featured over 30 presentations about memory trends and products with a special focus on NAND flash memory. Intel and Samsung, in particular, are doing a lot of good technical work on the attributes of NAND flash memory in various applications and reported the results of their own studies on flash power use, failure rates (HDD and flash), shock resistance (HDD and flash), detailed first access and steady-state bandwidth for flash and HDD, as well as reinforcing the cost advantage of NAND with updated market data on pricing. Total cost of ownership (TCO) over the life of the product is being applied to design choices between NAND flash and alternatives with great effect.

The technical and marketing issues of 2006 are now laid to rest, for the most part, and new items have come forth to be understood as devices are scaled from 70nm to 50nm, as MLC becomes the dominant storage technique (and four-bit cells start to appear) and as NROM technology from Saifun and Spansion gains a larger standing in many applications. The performance and reliability changes resulting from these technical advances in NAND flash memories need to be understood and communicated to users and potential users in the market, for better or for worse. The details below can help in designing the best solution for a given application.

Getting started
When working with flash devices from multiple manufacturers, many features vary or are simply not supported across sources.

Command sequences (e.g. cache read), command values and the number of address cycles all differ. Some devices have multiple planes, with distinct command sets and copyback command sequences for single-plane and multiplane operation; block location information, ID location and pinouts also vary. These are just a few of the problems encountered when qualifying multiple sources.

NAND boot can be quite easy when dealing with SLC devices. Device manufacturers at this time still support a known-good boot block, meaning they guarantee that all of the bits in Block 0 are good. For MLC, this is currently still true, but you can never tell when a read disturb might occur, so some form of error correction code (ECC) should be used when booting. It is also possible that, in the future, some manufacturers might not support a known-good boot block. If that happens, booting will become much more difficult and will require additional circuitry for ECC correction and a multistage boot loader. One other key point: most applications do not require booting from the entire space of the NAND device.

Two examples of flash architectures.

Using multiple images is a good idea, as it serves several purposes: first, good code always exists in the flash array; second, ECC detection is much simpler than correction, which takes many more cycles, so detection-only booting is easier. If you have hardware ECC detect-and-correct, then correction is possible while loading the first image, and the backup images are needed only when the number of errored bits exceeds the number of correctable bits.

When deciding on error correction, several issues need to be taken into account: the NAND type, the spare area of the page, the overhead of the file system and the number of error-correction bits required for your device and application. The number of correctable bits should be as high as possible. Once you move to a multibit error-correction algorithm, such as a BCH or Reed-Solomon (RS) code, the time spent on ECC does not change drastically. Having selected a Hamming, BCH or RS code, you need to consider the latency: the number of cycles to execute detect and correct. With a Hamming code, this is usually done in real time, so no additional buffering is typically required. For BCH and RS codes, the latency varies widely with the configuration, from 300 to over 6,000 cycles per correction. When determining the correct buffering for your application, you also need to look at the bandwidth, or the number of sectors transferred at a given time. Denali offers a compilable ECC solution that can be configured for your application, with the number of detections, corrections and cycles all selectable.

Fresh approach
There are several flash controller designs to think about when considering a solution for one's needs. A totally flexible design means a mostly software solution, maximizing the ability to switch flash manufacturers and meet the requirements of each. In this approach, a "dumb" controller places the burden of NAND operation on the software. This is quite a common solution for many applications, but a fair amount of effort is needed to develop the low-level driver (LLD). The LLD is typically optimized for one device or a handful of devices, and the processor must execute much more code than in a hybrid or hardware solution.

The hybrid design places more intelligence in the hardware RTL, with the goal of improving performance, reducing processor overhead, and reducing software development and verification. This method removes the LLD development effort and, in many cases, can support quite a few suppliers, as long as the command state machines are well designed. A hardware solution with very limited software interaction is not very common in designs today. However, for limited or single-device support, this architecture will provide the highest bandwidth.

A complete system-level flash solution, including the VFFS, FTL and LLD software plus hardware: flash controller, data DMA, command DMA and best-in-class ECC.

The software is composed of three sections: (a) the virtual flash file system (VFFS), which acts as the interface to the RTOS or OS; (b) the flash translation layer (FTL), one of the most important sections in the software stack, where all of the operational control of the flash device is handled, along with subroutines for the specific application; and (c) the LLD, which is the hardware abstraction layer for the controller and, in certain cases, is where flash command processing exists. When selecting a processor, the overhead of the LLD needs to be taken into consideration, as should the type of wear leveling and block management used in the FTL. This overhead can be quite considerable and will have an impact on power, clock rate, performance and booting.

In a typical controller, the LLD moves the data by programmed I/O (PIO), meaning the firmware moves the data. Here, the data rate is tied to the clock rate; adding a data DMA removes part of this bottleneck. When reading flash data into the block cache (move #1), ECC is applied as needed, and then the page data is copied to the host's memory (move #2). The data DMA speeds up the first move, but the second move is still PIO.

An architecture without a DMA therefore benefits directly from processor speed (clock rate); as a result, more is better. There is, and always will be, firmware overhead: the time from when the FTL is called until the first word of data is delivered (typically called the "CMD to DRQ" time) is spent receiving and parsing the command and performing the mapping, flash control and ECC correction. At Denali Software, our extensive testing, completed last year, clearly established that this command overhead is negligible compared with the flash access time and flash transfer rate.

Our testing showed that a PIO controller design calls for a fast microprocessor clock to achieve decent bandwidth, while a DMA controller design allows a slower clock for lower-power applications.

About the authors
Lane Mason
is memory market analyst at Denali Software. Robert Pierce is senior director of flash products at Denali.
