IT Component Failure Analysis

It is not always obvious whether a breakdown is caused by a global environmental issue or a more localised component failure. We can examine failed circuit cards, IT components and cabinet components to track down the root cause of a failure, be it an environmental problem, a manufacturer fault or an installation error. Reliable IT Environments (RITEL) is independent and not aligned with any component manufacturer, so our analysis is truly objective.


How RITEL Uncovered Reasons for Failure - Case Studies

Unevenly mounted chip shredded conductive graphite grease

Cards of a particular type were experiencing a greater than expected failure rate. Our analysis revealed an unevenly mounted “flip chip”. When the heat sink and graphite-loaded compound were screwed down to make contact with the chip, one corner of the chip gouged fragments out of the compound. When the card was in use, the fragments were blown across the surface of the card, causing shorts at adjacent components.

Lid of heat sink displayed showing how graphite 'grease' was scoured.

Power strip failures traced to manufacturer’s lack of stress testing

The client was experiencing many failures of multi-socket power strips, which were mounted in the rear of server cabinets and delivered power to multiple individual servers. Our analysis of a failed power strip showed a failed internal power supply circuit card, with a ruptured casing around the capacitor, “melted” resistors and a charred circuit card. The electrolytic capacitor had clearly overheated and failed, and its case had ruptured at the weakest point. Whether this was a consequence of the unit being at excessive temperature or of an overload condition is unclear. However, the resulting circuit failure caused overheating and failure in downstream resistors, hence their “melted” appearance. In addition, each output was fused at a value which allowed the power strip to be run above its rated performance.

The power strips were located at the rear of the cabinets, usually the hottest part. When all servers were running, the heat load, in combination with the strips’ inadequate cooling, caused the chemicals in the capacitor to expand and rupture the surrounding case. Although the capacitor was rated at 100 °C, the power strips may not have been tested under full load in realistically warm conditions such as those found on the exhaust side of server cabinets.
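To illustrate why testing at realistic temperatures matters, the sketch below applies the widely used rule of thumb that the expected life of an aluminium electrolytic capacitor roughly halves for every 10 °C rise in operating temperature. It is a general approximation, not a measurement from this investigation: the function name and the 2,000-hour base life are hypothetical, and only the 100 °C rating comes from the case above.

```python
def estimated_life_hours(base_life_hours: float,
                         rated_temp_c: float,
                         operating_temp_c: float) -> float:
    """Rule-of-thumb life estimate for an aluminium electrolytic capacitor.

    Assumes life doubles for every 10 degC the part runs below its rated
    temperature and halves for every 10 degC above it. Real parts should
    be assessed against the manufacturer's datasheet.
    """
    return base_life_hours * 2 ** ((rated_temp_c - operating_temp_c) / 10.0)


# Hypothetical figure: a part specified for 2,000 hours at its rated temperature.
BASE_LIFE_HOURS = 2000.0
RATED_TEMP_C = 100.0  # rating quoted in the case study

for operating_temp_c in (70.0, 90.0, 100.0, 110.0):
    hours = estimated_life_hours(BASE_LIFE_HOURS, RATED_TEMP_C, operating_temp_c)
    print(f"{operating_temp_c:5.0f} degC -> about {hours:,.0f} hours")
```

Under this approximation, a component that looks comfortable at room temperature can lose most of its expected life in the exhaust air at the rear of a fully loaded cabinet.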

Image of a resistor that has overheated.

KVM switch failures linked to excessive EM fields

A client was experiencing frequent failures of rack-mounted KVM switches. A failed switch could be recovered by replacing its modular power supply. We measured voltages and found that the power supply fault was accompanied by excessive radiated EM fields and that, within a short time, the output voltage dropped below 5 V, the threshold needed to operate the switch.

We tested and compared candidate supplies and found that many were unable to sustain the claimed voltage at maximum rated output. By varying load resistances and operating candidate supplies at maximum rated conditions we were able to reproduce all the failure symptoms and also identify a suitable power supply which performed as required. The KVM issue was resolved.
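The comparison described above can be sketched in a few lines. This is a minimal illustration, not the test procedure actually used: the readings, the nominal voltage, the rated current and all names are hypothetical, and only the 5 V operating threshold comes from the case study. Each reading pairs a load resistance with the output voltage measured under that load, and a supply passes if it stays at or above the threshold for every load up to its rated maximum.

```python
from dataclasses import dataclass

MIN_OPERATING_VOLTS = 5.0  # threshold quoted in the case study


@dataclass
class Reading:
    load_ohms: float  # resistance placed across the supply output
    volts: float      # output voltage measured under that load


def rated_load_ohms(nominal_volts: float, rated_amps: float) -> float:
    """Smallest load resistance the supply is rated to drive (Ohm's law)."""
    return nominal_volts / rated_amps


def holds_voltage(readings, nominal_volts, rated_amps,
                  min_volts=MIN_OPERATING_VOLTS) -> bool:
    """True if every reading within the rated load range stays at or above min_volts."""
    limit = rated_load_ohms(nominal_volts, rated_amps)
    in_range = [r for r in readings if r.load_ohms >= limit]
    if not in_range:
        raise ValueError("no readings within the rated load range")
    return all(r.volts >= min_volts for r in in_range)


# Hypothetical readings for two candidate supplies, each claimed to be 5 V at 2 A.
supply_a = [Reading(10.0, 5.10), Reading(5.0, 5.05), Reading(2.5, 5.01)]
supply_b = [Reading(10.0, 5.08), Reading(5.0, 4.95), Reading(2.5, 4.60)]

for name, readings in (("A", supply_a), ("B", supply_b)):
    verdict = "suitable" if holds_voltage(readings, 5.0, 2.0) else "sags below threshold"
    print(f"Supply {name}: {verdict}")
```

Stepping the load resistance down towards the rated limit is what exposes a supply that meets its claimed voltage only when lightly loaded.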

Faulty injection moulding led to fans malfunctioning

A number of early-life failures were experienced with cooling fans in mainframe servers: the fan blades disintegrated. On investigation we found issues with the path of material flow in the injection moulding process, which had resulted in “cold welds” instead of a continuous bulk of material. These flaws then propagated in use, leading to fracture and failure.

A crack in the moulding of a curved piece of casing.

Reliable IT Environments Limited

© Copyright 2022
Reliable IT Environments Limited
All Rights Reserved

Contact Us

+44 (0) 2380 361156 

Emergency: +44 (0) 7815 185778