Network Fault Management Description
March 11, 2010
Faults can be defined as any failure or outage in the network. They can be system related or service related, and they are often masked as a downstream effect of a combination of the two. Proactive fault analysis is an essential component of a network management deployment. The same type of data that is collected for performance management can be used for proactive fault analysis. However, the timing and use of this data differ between proactive fault management and performance management.
Proactive fault management is how an ideal network management system achieves the goals you have set. Its connection to performance management lies in the baseline and the data variables that you are using. Proactive fault management is the conceptual area that ties together fault, performance, and change management in an ideal, effective network management system by integrating customized events, an event correlation engine, trouble ticketing, and statistical analysis of the baseline data.
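As a rough illustration of how baseline data could feed proactive fault detection, the following Python sketch flags a sample that rises well above the baseline. The function name `exceeds_baseline`, the multiplier `k=3.0`, and the sample error rates are all assumptions made for this example; they do not come from any particular network management product.

```python
from statistics import mean, stdev


def exceeds_baseline(baseline, sample, k=3.0):
    """Return True when `sample` is more than `k` standard deviations
    above the mean of the baseline samples.

    `k` is an illustrative default, not a recommended setting.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    threshold = mean(baseline) + k * stdev(baseline)
    return sample > threshold


# Hypothetical interface error rates (errors per minute) collected
# during normal operation.
baseline_error_rates = [2, 3, 2, 4, 3, 2, 3, 3]

print(exceeds_baseline(baseline_error_rates, 3))   # within the normal range
print(exceeds_baseline(baseline_error_rates, 12))  # well above the baseline
```

The same collected data serves both purposes: performance management would trend it over time, while proactive fault management compares each new sample against the baseline as it arrives.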
Network Fault Management Elements to Consider
Requirements | Now | Later | Not Sure |
Establish a fault baseline | | | |
Up/Down status monitoring | | | |
Track and report when error rates exceed thresholds | | | |
Establish a notification schema utilizing email, pager, and trouble ticketing tools (This hasn’t been done yet) | | | |
Implement an Event Correlation tool with Root Cause Analysis capability | | | |
Historical Fault Reporting | | | |
Fault Prioritization | | | |
De-Duplication (filtering and suppressing multiple reports of the same event) | | | |
Fail Over Detection (notification when a fault tolerance event occurs) | | | |
Fault alarm generation and tracking | | | |
Help Desk software interface | | | |
Fault Correlation | | | |
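Two of the requirements above, De-Duplication and Fault Prioritization, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any specific tool: the `FaultEvent` fields, the 300-second suppression window, the convention that severity 1 is the most urgent, and the device names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultEvent:
    device: str
    fault_type: str
    severity: int     # assumption: 1 is the most urgent
    timestamp: float  # seconds since an arbitrary start


def deduplicate(events, window=300.0):
    """Suppress repeats of the same (device, fault_type) that arrive
    within `window` seconds of the last report that was kept."""
    last_kept = {}
    kept = []
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.device, event.fault_type)
        previous = last_kept.get(key)
        if previous is None or event.timestamp - previous >= window:
            kept.append(event)
            last_kept[key] = event.timestamp
    return kept


def prioritize(events):
    """Order events by severity first, then by arrival time."""
    return sorted(events, key=lambda e: (e.severity, e.timestamp))


# Hypothetical event stream: one link failure reported three times,
# plus a lower-severity error-rate alarm from another device.
events = [
    FaultEvent("router-1", "link_down", 1, 0.0),
    FaultEvent("router-1", "link_down", 1, 30.0),
    FaultEvent("switch-7", "high_crc_errors", 3, 45.0),
    FaultEvent("router-1", "link_down", 1, 400.0),
]

for event in prioritize(deduplicate(events)):
    print(event.device, event.fault_type, event.severity, event.timestamp)
```

In this sketch the report at 30 seconds is suppressed as a duplicate, while the report at 400 seconds falls outside the window and is forwarded again. A real deployment would tune the window and the severity scale to its own notification schema.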