EE Times-Asia

Microsoft: DRAM errors cause PC crashes

Posted: 22 May 2007

Keywords: DRAM errors, PC crashes, Microsoft white paper

Desktop and notebook computers may need to adopt error-correcting code (ECC) memory to combat a rising number of system crashes caused by single-bit memory errors, according to a white paper by Microsoft Corp. The software giant raised the issue in a panel discussion on memory at the Windows Hardware Engineering Conference (WinHEC), although it admits its data on system failures is still inconclusive.

For about four years, Microsoft has been collecting data through its Online Crash Analysis (OCA) tool, which reports system crashes to a Microsoft website. About 18 months ago, it began sharing OCA data and the white paper with system makers and chipmakers. According to one source, the report said single-bit DRAM errors are now among the top ten causes of system failures.

Microsoft admits the data is still inconclusive because OCA does not provide enough detail about the types of systems that crash and the memory they use. As it tries to improve the tool, Microsoft is asking OEMs to help provide more data and to consider ECC memory in desktops and notebooks.

Extra cost
Today ECC memory is widely used in PC servers. But so far, desktop and notebook makers, as well as many chipmakers, have resisted the move because it would add cost in the form of extra DRAM chips on each module and upgraded memory controllers in chipsets.
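
For context on where the extra chips come from: conventional ECC modules store 8 check bits alongside every 64 data bits, a 72-bit word, so a rank built from x8 DRAMs needs nine chips instead of eight, roughly 12.5% more DRAM (8/64). The Python sketch below illustrates how those 8 bits can correct any single-bit error and detect any double-bit error. It assumes a textbook extended Hamming (SEC-DED) layout; actual chipsets may use other SEC-DED codes, and the names `encode`, `decode` and `is_data_position` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed layout (textbook extended Hamming / SEC-DED, not any specific chipset):
# position 0 holds an overall parity bit; positions 1..71 hold a Hamming code
# with check bits at the power-of-two positions 1, 2, 4, ..., 64.
DATA_BITS = 64
CODE_LEN = 72
CHECK_POSITIONS = [1 << i for i in range(7)]


def is_data_position(pos: int) -> bool:
    # Data bits occupy every position in 1..71 that is not a power of two.
    return pos > 0 and (pos & (pos - 1)) != 0


def encode(data: int) -> List[int]:
    """Expand a 64-bit data word into a 72-bit SEC-DED codeword (list of bits)."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit in 64 bits")
    code = [0] * CODE_LEN
    bit = 0
    for pos in range(1, CODE_LEN):
        if is_data_position(pos):
            code[pos] = (data >> bit) & 1
            bit += 1
    # Each check bit makes the parity of the positions it covers even.
    for p in CHECK_POSITIONS:
        code[p] = sum(code[pos] for pos in range(1, CODE_LEN) if pos & p) % 2
    # The overall parity bit makes the whole 72-bit word even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)  # work on a copy so the caller's word is untouched
    syndrome = 0
    for pos in range(1, CODE_LEN):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions; 0 for a valid codeword
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome < CODE_LEN:
        # Odd number of flips: assume exactly one, at position `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Two flips (even count, nonzero syndrome) or an impossible syndrome:
        # the error is detected but cannot be corrected.
        return None, "uncorrectable"
    data = 0
    bit = 0
    for pos in range(1, CODE_LEN):
        if is_data_position(pos):
            data |= code[pos] << bit
            bit += 1
    return data, status


if __name__ == "__main__":
    word = 0x0123_4567_89AB_CDEF
    stored = encode(word)
    stored[37] ^= 1  # simulate a single-bit upset in a data bit
    data, status = decode(stored)
    print(hex(data), status)  # 0x123456789abcdef corrected
```

Flipping any one of the 72 stored bits is corrected transparently; flipping two is flagged rather than silently returned as bad data.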

Some system makers in the audience at the WinHEC panel expressed support for a move to ECC, but DRAM makers on the panel remained skeptical.

"I think the problem is significant," said Jeff Galloway, an engineer in Hewlett-Packard's x86 server group. Microsoft has shown him data on HP crashes that appeared to come from single-bit DRAM errors and were all on systems not running a Windows Server OS, he added.

"The industry needs to do something about this," Galloway said. "Microsoft got ECC into servers by requiring it for a Windows Server logo, and I think they should do the same thing for desktops and notebooks now," he added.

"This kind of forum is one way we can engage OEMs in what we should do going forward," said Son VoBa, a principal program manager in Microsoft's Windows Server group who led the panel discussion. "ECC may be only one way to address the problem," he added.

The single-bit errors are typically traced to neutron radiation from so-called cosmic rays striking individual capacitors in a DRAM and changing their charge state. DRAM makers say that effect has actually been diminishing over time and that the errors could have come from a variety of sources, including chipsets.

"We have seen reductions [in soft error rates] with each of the last several process technology generations," said Dean Klein, VP of market development for Micron.

DRAM makers, including Samsung and Qimonda, also note that SDRAM and DDR1 memories provided ECC capabilities that notebooks and desktops did not use. Thus, when the standard was set for today's DDR2 memories, engineers eliminated ECC to save costs associated with the unused feature.

Better approach?
One memory maker suggested a better approach would be to create a retry facility in the DDR4 interface standard now in the works. A Samsung spokesman said the DDR4 group is in the early stages of discussing a feature for monitoring the memory I/O interface.
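
The article does not describe how such a retry facility would work, so the following Python sketch is only a hypothetical illustration of the general detect-and-retry idea, not the DDR4 proposal: a checksum (here zlib's CRC-32, chosen as an assumption) travels with each burst, the receiver checks it, and a mismatch triggers a re-send. `noisy_link`, `read_with_retry`, `LinkError` and `MAX_RETRIES` are invented for illustration.

```python
import random
import zlib
from typing import Tuple

MAX_RETRIES = 3  # hypothetical retry budget, not a figure from any standard


class LinkError(Exception):
    """Raised when every attempt fails its checksum."""


def noisy_link(burst: bytes, flip_prob: float, rng: random.Random) -> bytes:
    """Model a transfer that flips one random bit with probability flip_prob."""
    data = bytearray(burst)
    if rng.random() < flip_prob:
        i = rng.randrange(len(data) * 8)
        data[i // 8] ^= 1 << (i % 8)
    return bytes(data)


def read_with_retry(stored: bytes, flip_prob: float,
                    rng: random.Random) -> Tuple[bytes, int]:
    """Return (burst, retries used); re-send whenever the CRC check fails."""
    # Computed at the source over what is stored; assumed to arrive intact.
    expected = zlib.crc32(stored)
    for attempt in range(MAX_RETRIES + 1):
        received = noisy_link(stored, flip_prob, rng)
        # CRC-32 detects every single-bit flip, so a match means a clean transfer.
        if zlib.crc32(received) == expected:
            return received, attempt
    raise LinkError("burst still corrupt after %d retries" % MAX_RETRIES)


if __name__ == "__main__":
    rng = random.Random(2007)
    burst = bytes(range(64))
    try:
        data, retries = read_with_retry(burst, flip_prob=0.3, rng=rng)
        print("delivered intact after", retries, "retries:", data == burst)
    except LinkError as exc:
        print(exc)
```

The trade-off shows up when the stored data is already wrong: in this model a retry protects the burst in transit on the interface, but if a bit has flipped in the DRAM array itself, the checksum is computed over the corrupted value and the retry faithfully delivers bad data. Correcting errors at rest is the job ECC does.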

Peter Glaskowsky, an analyst with Envisioneering, said Microsoft pushed for adopting ECC to combat soft errors in the mid-1990s, but OEMs resisted. They refused to take on the cost of the shift, arguing that Windows failures caused more crashes than DRAM soft errors did.

Now that the Windows OS is becoming more stable, it makes sense that the company would reopen the issue. However, it is unclear whether soft errors have become significant enough to convince OEMs to change this time, he added.

- Rick Merritt
EE Times
