Posted: 08 Jul 2013

Keywords: DRAM, NAND flash memory, memory scaling, PCM, STT-MRAM

We are currently investigating another increasingly significant obstacle to MLC NAND flash scaling: cell-to-cell program interference, which worsens as the parasitic capacitance between the cells' floating gates grows. Accurate characterisation and modelling of this phenomenon are needed to find effective techniques to combat it. In recent work, we leverage the read retry mechanism found in some flash designs to obtain measured threshold voltage distributions from state-of-the-art 2Y-nm (i.e., 20-24 nm) MLC NAND flash chips. We then use these measurements to characterise cell-to-cell program interference under various programming conditions. We show that program interference can be accurately modelled as additive noise following Gaussian-mixture distributions, which can be predicted with 96.8% accuracy using linear regression models. We use these models to develop and evaluate a read reference voltage prediction technique that reduces the raw flash bit error rate by 64% and increases flash lifetime by 30%.
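As a rough illustration of the two modelling steps described above, the Python sketch below (i) fits an ordinary least-squares line relating a victim cell's threshold-voltage shift to a neighbouring cell's programming-induced threshold-voltage change, and (ii) places a read reference voltage where the densities of two adjacent programmed states cross. The function names, the single-predictor linear model, and the simplification of each state's distribution to one Gaussian (rather than the Gaussian mixture described above) are illustrative assumptions; this is not the exact model or prediction technique used in the work.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y ~= slope * x + intercept.

    Assumed use: x holds a neighbour cell's programming-induced
    threshold-voltage change, y the resulting shift of the victim cell.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gaussian_crossing(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """Voltage between mu1 and mu2 where two Gaussian densities are equal.

    With equal priors and a single threshold between the two means, reading
    at this point minimises the chance of misreading one adjacent state as
    the other (single-Gaussian simplification, not a Gaussian mixture).
    """
    if not mu1 < mu2 or sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("expected mu1 < mu2 and positive sigmas")
    if math.isclose(sigma1, sigma2):
        return (mu1 + mu2) / 2
    # Equating the two log-densities gives a*v^2 + b*v + c = 0.
    a = 1 / sigma1**2 - 1 / sigma2**2
    b = 2 * (mu2 / sigma2**2 - mu1 / sigma1**2)
    c = mu1**2 / sigma1**2 - mu2**2 / sigma2**2 + 2 * math.log(sigma1 / sigma2)
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("densities do not cross")
    root = math.sqrt(disc)
    # Keep the root that lies between the two state means.
    for v in ((-b + root) / (2 * a), (-b - root) / (2 * a)):
        if mu1 <= v <= mu2:
            return v
    raise ValueError("no crossing between the two means")


def adjusted_read_reference(mu1: float, sigma1: float, mu2: float, sigma2: float,
                            shift1: float, shift2: float) -> float:
    """Hypothetical helper: re-place the read reference after applying
    predicted interference-induced mean shifts to both states."""
    return gaussian_crossing(mu1 + shift1, sigma1, mu2 + shift2, sigma2)
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` returns `(2.0, 1.0)`. With equal standard deviations, `gaussian_crossing` reduces to the midpoint of the two means; `gaussian_crossing(0.0, 1.0, 4.0, 2.0)` returns about 1.66, below the midpoint of 2.0, because when the spreads differ the reference moves toward the narrower distribution.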

Conclusion
We have described several research directions and ideas to enhance memory scaling via system- and architecture-level approaches: co-designing memory and other system components, and cooperating across multiple levels of the computing stack, including software, microarchitecture, and devices. We believe such approaches will become increasingly important and effective as the underlying memory technology nears its scaling limits at the physical level.

Acknowledgments
I would like to thank my PhD students Rachata Ausavarungnirun and Lavanya Subramanian for logistical help in preparing this manuscript. Many thanks to all my students in the SAFARI research group and to collaborators at Carnegie Mellon and other universities, all of whom contributed to the works outlined in this paper. Thanks also to our group's industrial sponsors over the past few years, including AMD, HP Labs, IBM, Intel, Nvidia, Oracle, Qualcomm, and Samsung. Some of the research reported here was also partially supported by GSRC, the Intel URO Memory Hierarchy Program, the Intel Science and Technology Centre on Cloud Computing, NIH, NSF, and SRC.

Part of the structure of this paper is based on talks on "Scaling the Memory System in the Many-Core Era" that I delivered at various venues between 2010 and 2013, including the 2011 International Symposium on Memory Management and ACM SIGPLAN Workshop on Memory System Performance and Correctness [54]. Section VII of this article is a much condensed and slightly revised version of the introduction of an invited article, titled "Error Analysis and Retention-Aware Error Management for NAND Flash Memory", that is to appear in a special issue of the Intel Technology Journal [7].

References
1. International Technology Roadmap for Semiconductors (ITRS), 2011.
2. Hybrid Memory Cube Consortium, 2012. http://www.hybridmemorycube.org.
3. C. Alkan et al. "Personalized copy-number and segmental duplication maps using next-generation sequencing." In Nature Genetics, 2009.
4. G. Atwood. "Current and emerging memory technology landscape." In Flash Memory Summit, 2011.
5. R. Ausavarungnirun et al. "Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems." In ISCA, 2012.
6. R. Bryant. "Data-intensive supercomputing: The case for DISC." CMU CS Tech. Report 07-128, 2007.
7. Y. Cai et al. "Error analysis and retention-aware error management for NAND flash memory." To appear in Intel Technology Journal, 2013.
8. Y. Cai et al. "FPGA-based solid-state drive prototyping platform." In FCCM, 2011.
9. Y. Cai et al. "Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis." In DATE, 2012.
10. Y. Cai et al. "Flash Correct-and-Refresh: Retention-aware error management for increased flash memory lifetime." In ICCD, 2012.
11. Y. Cai et al. "Threshold voltage distribution in MLC NAND flash memory: Characterization, analysis and modeling." In DATE, 2013.
12. K. Chang et al. "HAT: Heterogeneous adaptive throttling for on-chip networks." In SBAC-PAD, 2012.
13. E. Chen et al. "Advances and future prospects of spin-transfer torque random access memory." IEEE Transactions on Magnetics, 46(6), 2010.
14. E. Chung et al. "Single-chip heterogeneous computing: Does the future include custom logic, FPGAs, and GPUs?" In MICRO, 2010.
15. J. Condit et al. "Better I/O through byte-addressable, persistent memory." In SOSP, 2009.
16. R. Das et al. "Application-to-core mapping policies to reduce memory system interference in multi-core systems." In HPCA, 2013.
17. R. Das et al. "Application-aware prioritization mechanisms for on-chip networks." In MICRO, 2009.
18. R. Das et al. "Aergia: Exploiting packet latency slack in on-chip networks." In ISCA, 2010.
19. G. Dhiman. "PDRAM: A hybrid PRAM and DRAM main memory system." In DAC, 2009.
20. X. Dong et al. "Leveraging 3D PCRAM technologies to reduce checkpoint overhead for future exascale systems." In SC, 2009.
21. E. Ebrahimi et al. "Fairness via source throttling: A configurable and high-performance fairness substrate for multi-core memory systems." In ASPLOS, 2010.
22. E. Ebrahimi et al. "Parallel application memory scheduling." In MICRO, 2011.
23. E. Ebrahimi et al. "Prefetch-aware shared-resource management for multi-core systems." In ISCA, 2011.
24. M. Ekman. "A robust main-memory compression scheme." In ISCA, 2005.
25. R. Gallager. "Low density parity check codes." MIT Press, 1963.
26. B. Grot et al. "Preemptive virtual clock: A flexible, efficient, and cost-effective QoS scheme for networks-on-chip." In MICRO, 2009.
27. B. Grot et al. "Kilo-NOC: A heterogeneous network-on-chip architecture for scalability and service guarantees." In ISCA, 2011.
28. J. A. Joao et al. "Bottleneck identification and scheduling in multithreaded applications." In ASPLOS, 2012.
29. T. L. Johnson et al. "Run-time spatial locality detection and optimization." In MICRO, 1997.
30. K. Kim et al. "A new investigation of data retention time in truly nanoscaled DRAMs." IEEE Electron Device Letters, 30(8), Aug. 2009.
31. Y. Kim et al. "ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers." In HPCA, 2010.
32. Y. Kim et al. "Thread cluster memory scheduling: Exploiting differences in memory access behaviour." In MICRO, 2010.
33. Y. Kim et al. "A case for subarray-level parallelism (SALP) in DRAM." In ISCA, 2012.
34. Y. Koh. "NAND flash scaling beyond 20nm." In IMW, 2009.
35. E. Kultursay et al. "Evaluating STT-RAM as an energy-efficient main memory alternative." In ISPASS, 2013.
36. S. Kumar and C. Wilkerson. "Exploiting spatial locality in data caches using spatial footprints." In ISCA, 1998.
37. B. C. Lee et al. "Architecting phase change memory as a scalable DRAM alternative." In ISCA, 2009.
38. B. C. Lee et al. "Phase change memory architecture and the quest for scalability." Communications of the ACM, 53(7):99–106, 2010.
39. B. C. Lee et al. "Phase change technology and the future of main memory." IEEE Micro (Top Picks Issue), 30(1), 2010.
40. C. J. Lee et al. "Prefetch-aware DRAM controllers." In MICRO, 2008.
41. D. Lee et al. "Tiered-latency DRAM: A low latency and low cost DRAM architecture." In HPCA, 2013.
42. C. Lefurgy et al. "Energy management for commercial servers." In IEEE Computer, 2003.
43. J. Liu et al. "An experimental study of data retention behavior in modern DRAM devices: Implications for retention time profiling mechanisms." To appear in ISCA, 2013.
44. J. Liu et al. "RAIDR: Retention-aware intelligent DRAM refresh." In ISCA, 2012.
45. G. Loh. "3D-stacked memory architectures for multi-core processors." In ISCA, 2008.
46. A. Maislos et al. "A new era in embedded flash memory." In FMS, 2011.
47. J. Mandelman et al. "Challenges and future directions for the scaling of dynamic random-access memory (DRAM)." In IBM JR&D, 2002.
48. J. Meza et al. "A case for small row buffers in non-volatile main memories." In ICCD, 2012.
49. J. Meza et al. "Enabling efficient and scalable hybrid memories using fine-granularity DRAM cache management." IEEE CAL, 2012.
50. T. Moscibroda and O. Mutlu. "Memory performance attacks: Denial of memory service in multi-core systems." In USENIX Security, 2007.
51. T. Moscibroda and O. Mutlu. "Distributed order scheduling and its application to multi-core DRAM controllers." In PODC, 2008.
52. S. Muralidhara et al. "Reducing memory interference in multi-core systems via application-aware memory channel partitioning." In MICRO, 2011.
53. O. Mutlu. "Asymmetry everywhere (with automatic resource management)." In CRA Workshop on Adv. Comp. Arch. Research, 2010.
54. O. Mutlu et al. "Memory systems in the many-core era: Challenges, opportunities, and solution directions." In ISMM, 2011. http://users.ece.cmu.edu/omutlu/pub/onur-ismm-mspc-keynote-june-5-2011-short.pptx.
55. O. Mutlu and T. Moscibroda. "Stall-time fair memory access scheduling for chip multiprocessors." In MICRO, 2007.
56. O. Mutlu and T. Moscibroda. "Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems." In ISCA, 2008.
57. G. Nychis et al. "Next generation on-chip networks: What kind of congestion control do we need?" In HotNets, 2010.
58. G. Nychis et al. "On-chip networks from a networking perspective: Congestion and scalability in many-core interconnects." In SIGCOMM, 2012.
59. G. Pekhimenko et al. "Base-delta-immediate compression: A practical data compression mechanism for on-chip caches." In PACT, 2012.
60. G. Pekhimenko et al. "Linearly compressed pages: A main memory compression framework with low complexity and low latency." CMU SAFARI Tech. Report, 2012.
61. M. K. Qureshi et al. "Line distillation: Increasing cache capacity by filtering unused words in cache lines." In HPCA, 2007.
62. M. K. Qureshi et al. "Enhancing lifetime and security of phase change memories via start-gap wear leveling." In MICRO, 2009.
63. M. K. Qureshi et al. "Scalable high performance main memory system using phase-change memory technology." In ISCA, 2009.
64. S. Raoux et al. "Phase-change random access memory: A scalable technology." IBM JR&D, 52, Jul/Sep 2008.
65. V. Seshadri et al. "The evicted-address filter: A unified mechanism to address both cache pollution and thrashing." In PACT, 2012.
66. V. Seshadri et al. "RowClone: Fast and efficient in-DRAM copy and initialisation of bulk data." CMU SAFARI Tech. Report, 2013.
67. L. Subramanian et al. "MISE: Providing performance predictability and improving fairness in shared main memory systems." In HPCA, 2013.
68. M. A. Suleman et al. "Accelerating critical section execution with asymmetric multi-core architectures." In ASPLOS, 2009.
69. T. Treangen and S. Salzberg. "Repetitive DNA and next-generation sequencing: Computational challenges and solutions." In Nature Reviews Genetics, 2012.
70. A. Udipi et al. "Rethinking DRAM design and organisation for energy-constrained multi-cores." In ISCA, 2010.
71. H.-S. P. Wong. "Phase change memory." In Proceedings of the IEEE, 2010.
72. H. Xin et al. "Accelerating read mapping with FastHASH." In BMC Genomics, 2013.
73. J. Yang et al. "Frequent value compression in data caches." In MICRO-33, 2000.
74. D. Yoon et al. "Adaptive granularity memory systems: A tradeoff between storage efficiency and throughput." In ISCA, 2011.
75. D. Yoon et al. "The dynamic granularity memory system." In ISCA, 2012.
76. H. Yoon et al. "Row buffer locality aware caching policies for hybrid memories." In ICCD, 2012.
77. H. Yoon et al. "Data mapping and buffering in multi-level cell memory for higher performance and energy efficiency." CMU SAFARI Tech. Report, 2013.

Onur Mutlu is the Strecker Early Career Professor at Carnegie Mellon University. His research is in computer architecture, especially memory system design and management. He was a recipient of the IEEE Computer Society Young Computer Architect Award, Intel Early Career Faculty Award, and a number of best paper awards.

Note: This work was first presented at the 2013 International Memory Workshop and appears here courtesy of the IEEE.
