Manufacturing defect in Cisco DIMMs causes server crashes

A manufacturing defect in several Cisco dual in-line memory modules (DIMMs) puts systems at risk of crashing.

The defect is present in 16GB, 32GB and 64GB models from 2020. Cisco advises users of its DIMMs to run the Serial Number Validation Tool to check whether their modules are among the faulty ones. The tool is available only to logged-in users and requires a serial number. DIMM serial numbers can be retrieved in several ways, and Cisco has published a guide for each method (see ‘How To Identify Affected Products’).
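For readers who want to collect serial numbers before opening the validation tool, the following is a minimal sketch and not one of Cisco's documented methods. It assumes a Linux host where `dmidecode -t memory` is available and parses that command's text output. The function name `parse_dimm_inventory` and the sample data are hypothetical and serve only as illustration.

```python
def parse_dimm_inventory(dmidecode_text):
    """Return one dict per populated 'Memory Device' record in dmidecode output."""
    dimms = []
    # dmidecode separates records with blank lines.
    for block in dmidecode_text.split("\n\n"):
        lines = [line.strip() for line in block.splitlines()]
        # Only DMI type 17 records carry the exact title "Memory Device".
        if "Memory Device" not in lines:
            continue
        fields = {}
        for line in lines:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        size = fields.get("Size", "")
        # Empty slots report "No Module Installed" instead of a capacity.
        if not size or size.startswith("No Module"):
            continue
        dimms.append({
            "locator": fields.get("Locator", ""),
            "size": size,
            "manufacturer": fields.get("Manufacturer", ""),
            "serial": fields.get("Serial Number", ""),
        })
    return dimms


# Invented sample data in the general shape of `dmidecode -t memory` output.
SAMPLE = """\
Handle 0x0020, DMI type 17, 40 bytes
Memory Device
    Locator: DIMM_A1
    Size: 32 GB
    Manufacturer: ExampleVendor
    Serial Number: 0123ABCD

Handle 0x0021, DMI type 17, 40 bytes
Memory Device
    Locator: DIMM_A2
    Size: No Module Installed
    Serial Number: NO DIMM
"""

if __name__ == "__main__":
    for dimm in parse_dimm_inventory(SAMPLE):
        print(dimm["locator"], dimm["size"], dimm["serial"])
```

On a real system, the output of `sudo dmidecode -t memory` would be passed to the function in place of `SAMPLE`. The resulting serial numbers still have to be entered into Cisco's tool, since only Cisco can confirm whether a module is affected.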

DIMMs supply memory in various systems, including PCs, servers and printers. Cisco’s faulty DIMMs generate memory errors over time. Cisco’s RAS (reliability, availability and serviceability) functionality is supposed to resolve such errors. Because of the defect, the error messages disappear after a repair attempt while the underlying problem remains. Over time, the DIMM provides less and less usable memory, and it may eventually fail, crashing the entire system.

Solution

Cisco advises users to remove faulty DIMMs from their systems as quickly as possible, as that is the only way to prevent system failure. A faulty DIMM can be replaced immediately with a new or alternative model. In some circumstances, the new DIMM may continue to generate errors. “Check whether the connectors are in place. Seating is the most common cause of errors after a replacement,” the company said.

Cisco says the problem has been fixed in newer DIMMs. On May 4, the company introduced Predictive Networks, an AI technology for predicting and solving network problems. Predictive Networks is meant to tackle bandwidth spikes and vulnerable configurations. The big question is whether it can also handle manufacturing defects.
