Global Sources
EE Times-Asia
Stay in touch with EE Times Asia
EE Times-Asia > FPGAs/PLDs

Guide to hardware troubleshooting (Part 1)

Posted: 30 Jan 2014 ?? ?Print Version ?Bookmark and Share

Keywords:debug? CAD tools? PCB? FPGA? bug?

Don't worry about or speculate on the cause, just focus on reproducing the bug. Be careful at this point. Sometimes, similar but different issues can appear to be caused by the same bug, with one masking the other. If other bugs are uncovered while looking for the initial bug, make a note of them and go back to them later, but don't get side-tracked.

In the example shown in the figure, the bug was extremely sporadic, so the first part of the exercise was to write software that exercised the system vigorously, sampling at the maximum rate This reduced the time between failures and made analysis easier.

Step 4: Gather the evidence
Be methodical and document what you see and what happens. Don't theorise at this point about causes, just create a table of what aggravates or alleviates the bug as well what as has no effect. Be aware that multiple bugs can have the same symptoms, which can produce contradictory evidence.

Step 5: Try the easy stuff first
So you can reproduce the problem, look for the easy explanations first. For instance, are the connectors wired back-to-front? Are the chip pin-outs as per the data sheet? Are clocks running at the correct frequency? Very often, bugs are caused by mistakes that look dumb only in hindsight.

Remember when you did the design work, you had probably thousands of small decisions to make; most of these were correct. If the error proves to be "obvious" you can correct it and skip to step 9.

Step 6: Break the problem down
So now the easy things have been checked out. You can reproduce the problem, but perhaps only occasionally, or there are conflicting messages. It is important to remember that In complex systems, multiple bugs can show the same symptoms on the surface, but require different cures.

To help clarify the issue, eliminate as much of the system as possible that doesn't appear to be relevant to the bug. For instance, you could power down devices on the PCB that are unrelated, or unplug cables to other boards. Do this while retesting. If the bug suddenly stops when an unrelated module is taken out of the equation, you have a smoking gun. Document it. Try to reproduce the bug again, with the module and then again without it.

In the DSP system described earlier (figure), the only symptom was that the interrupts from the FPGA sometimes just stopped coming in. We wrote a simple program that just thrashed the system, reading from the interrupt status registers at the highest rate possible with the sample rate as high as possible, but without any other accesses to other parts of the system.

The frequency of the bug dramatically increased and a bug in the FPGA was found, relating to the system reading the interrupt status register at the same time as a new interrupt arrived. We fixed the issue and put the board on for a 24-hour soak test. After 23? hours, the bunting was up and we were starting to push the champagne corks out of the bottles....then it crashed again.

Step 7: Talk it over with a colleague
When dealing with what seem to be intractable bugs, just talking it over with someone else can often help, even when that person is from a different engineering discipline. Explaining what you see to another can be all that is required for you to see the bug from a different point of view and realise a crucial fact. At the very least you may come up with inconsistencies that need ironing out or receive suggestions of other things to try.

?First Page?Previous Page 1???2???3?Next Page?Last Page

Article Comments - Guide to hardware troubleshooting (P...
*? You can enter [0] more charecters.
*Verify code:


Visit Asia Webinars to learn about the latest in technology and get practical design tips.

Back to Top