IT Component Failure Analysis

Unevenly mounted chip shredded conductive graphite grease

Cards of a particular type were failing at a higher than expected rate. Our analysis revealed an unevenly mounted “flip chip”. When the heat sink and its graphite-loaded compound were screwed down to make contact with the chip, one corner of the chip gouged fragments out of the compound. When the card was in use, these conductive fragments were blown across the surface of the card, causing shorts at adjacent components.

Lid of heat sink, showing how the graphite 'grease' was scoured.

Power strip failures traced to manufacturer’s lack of stress testing

The client was experiencing many failures of multi-socket power strips mounted in the rear of server cabinets, each delivering power to multiple servers. Our analysis of a failed power strip showed a failed internal power supply circuit card, with a ruptured capacitor casing, “melted” resistors and a charred circuit card. The electrolytic capacitor had clearly overheated, and its case had ruptured at the weakest point. Whether this was a consequence of the unit operating at excessive temperature or of an overload condition is unclear, but the resulting circuit failure caused the downstream resistors to overheat and fail, hence their “melted” appearance. In addition, each output was fused at a value that allowed the power strip to be run above its rating.

The power strips were located at the rear of the cabinets, usually the hottest part. When all servers were running, the heat load, combined with the strips’ inadequate cooling, caused the chemicals in the capacitor to expand and rupture the surrounding case. Although the capacitor was rated at 100 °C, the power strips may not have been tested under full load in realistically warm conditions such as those found on the exhaust side of server cabinets.
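The fusing observation above comes down to simple arithmetic: if the sum of the individual outlet fuse ratings exceeds the rating of the strip, the fuses alone will not stop the strip from being overloaded. The Python sketch below illustrates that check. The function names and all numeric values are hypothetical and chosen only for illustration; the actual fuse values and strip rating from this case are not stated here.

```python
def worst_case_current_a(outlet_fuse_a: float, outlet_count: int) -> float:
    """Total current the outlets could draw before any outlet fuse blows."""
    return outlet_fuse_a * outlet_count


def fusing_permits_overload(outlet_fuse_a: float, outlet_count: int,
                            strip_rating_a: float) -> bool:
    """True if the outlet fuses alone allow more current than the strip rating."""
    return worst_case_current_a(outlet_fuse_a, outlet_count) > strip_rating_a


# Hypothetical example values, not taken from the case study.
OUTLET_FUSE_A = 10.0   # assumed fuse rating per outlet, in amps
OUTLET_COUNT = 8       # assumed number of outlets on the strip
STRIP_RATING_A = 32.0  # assumed overall strip rating, in amps

total = worst_case_current_a(OUTLET_FUSE_A, OUTLET_COUNT)
print(f"Worst-case draw permitted by fuses: {total} A")
print("Fusing permits overload:",
      fusing_permits_overload(OUTLET_FUSE_A, OUTLET_COUNT, STRIP_RATING_A))
```

A check of this kind only flags that overload is possible; whether it occurs in practice depends on the real load and on the temperature in the cabinet.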

Image of a resistor that has overheated.

KVM switch failures linked to excessive EM fields

A client was experiencing frequent failures of rack-mounted KVM switches. A failed switch could be recovered by replacing its modular power supply. We measured voltages and found that the power supply fault was accompanied by excessive radiated EM fields and that, within a short time, the output voltage dropped below 5 V, the threshold needed to operate the switch.

We tested and compared candidate supplies and found that many were unable to sustain the claimed voltage at maximum rated output. By varying load resistances and operating candidate supplies at maximum rated conditions, we were able to reproduce all the failure symptoms and to identify a suitable power supply that performed as required. The KVM issue was resolved.
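The load test described above can be sketched in Python. The example uses Ohm's law to choose the load resistance that draws a given fraction of a supply's rated current, then checks whether the measured output voltage stays at or above the 5 V operating threshold. This is a minimal sketch, not the procedure actually used: the function names, the rated values and the voltage readings are all hypothetical.

```python
def load_resistance_ohm(voltage_v: float, current_a: float) -> float:
    """Ohm's law: resistance that draws current_a at voltage_v."""
    if current_a <= 0:
        raise ValueError("current must be positive")
    return voltage_v / current_a


def sustains_threshold(readings_v, threshold_v: float = 5.0) -> bool:
    """True if every measured voltage is at or above the threshold."""
    return all(v >= threshold_v for v in readings_v)


# Hypothetical supply rating, assumed for illustration only.
RATED_VOLTAGE_V = 5.0
RATED_CURRENT_A = 2.0

# Load resistances for 25%, 50% and 100% of rated current.
for fraction in (0.25, 0.5, 1.0):
    r = load_resistance_ohm(RATED_VOLTAGE_V, RATED_CURRENT_A * fraction)
    print(f"{int(fraction * 100)}% load: {r} ohm")

# Hypothetical voltage readings at those three loads for two candidates.
candidates = {
    "supply_a": [5.10, 5.05, 5.01],
    "supply_b": [5.08, 4.90, 4.60],
}
for name, readings in candidates.items():
    verdict = "suitable" if sustains_threshold(readings) else "unsuitable"
    print(name, verdict)
```

With these made-up readings, the second candidate falls below the threshold as the load increases, which is the kind of behaviour the comparison was designed to expose.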

Faulty injection moulding led to fans malfunctioning

Cooling fans in mainframe servers suffered a number of early-life failures in which the fan blades disintegrated. Our investigation found problems with the material flow path in the injection moulding process, which had produced ‘cold welds’ instead of a continuous bulk of material. These flaws then propagated in use, leading to fracture and failure.

A cold weld fracture

How RITEL Uncovered Reasons for Failure - Case Studies

Reliable IT Environments Limited

Contact Us

+44 (0) 2380 361156

Emergency: +44 (0) 7815 185778

info@ritel.co.uk
