networking, security

Network Fault Management Description

March 11, 2010

A fault is any failure or outage in the network. Faults can be system related or service related, and they often surface as a downstream symptom of a combination of the two. Proactive fault analysis is an essential component of a network management deployment. The same type of data that is collected for performance management can be used for proactive fault analysis. However, the timing and use of this data differ between proactive fault management and performance management.

Proactive fault management is how an ideal network management system achieves the goals you have set for it. Its connection to performance management is through the baseline and the data variables that you are using. Proactive fault management is the conceptual area that ties together fault, performance, and change management in an ideal, effective network management system by integrating customized events, an event correlation engine, trouble ticketing, and the statistical analysis of the baseline data.
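To make the role of the baseline concrete, here is a minimal sketch of one way baseline statistics could drive a proactive fault check. It assumes the fault threshold is the baseline mean plus a multiple of the standard deviation; that rule, the function names, and the sample error rates are all illustrative assumptions, not something prescribed by any particular network management product.

```python
from statistics import mean, stdev


def baseline_threshold(samples, k=3.0):
    """Return the baseline mean plus k sample standard deviations."""
    return mean(samples) + k * stdev(samples)


def find_faults(baseline, current, k=3.0):
    """Return (index, value) pairs from `current` that exceed the threshold."""
    limit = baseline_threshold(baseline, k)
    return [(i, value) for i, value in enumerate(current) if value > limit]


# Hypothetical interface error rates (errors per minute), for illustration only.
baseline_rates = [2, 3, 2, 4, 3, 2, 3, 3]
current_rates = [3, 2, 15, 3]

# Each flagged sample could become a customized event or a trouble ticket.
suspect_samples = find_faults(baseline_rates, current_rates)
print(suspect_samples)
```

The same collected data serves performance management when it is trended over weeks; here it is checked sample by sample as it arrives, which is the difference in timing and use described above.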

Network Fault Management Elements to Consider

 

| Requirements | Now | Later | Not Sure |
| Establish a fault baseline | | | |
| Up/Down status monitoring | | | |
| Track and report when error rates exceed thresholds | | | |
| Establish a notification schema utilizing email, pager, and trouble ticketing tools (This hasn’t been done yet) | | | |
| Implement an Event Correlation tool with Root Cause Analysis capability | | | |
| Historical Fault Reporting | | | |
| Fault Prioritization | | | |
| De-Duplication (filtering and suppressing multiple reports of the same event) | | | |
| Fail Over Detection (notification when a fault tolerance event occurs) | | | |
| Fault alarm generation and tracking | | | |
| Help Desk software interface | | | |
| Fault Correlation | | | |
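
Two of the requirements listed above, de-duplication and fault prioritization, can be sketched in a few lines. This is only an illustration under stated assumptions: the `FaultEvent` fields, the 300-second suppression window, and the convention that a lower severity number is more urgent are all hypothetical choices, not taken from any specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultEvent:
    device: str       # hypothetical device name
    kind: str         # hypothetical fault type, e.g. "link_down"
    severity: int     # assumption: lower number means more urgent
    timestamp: float  # seconds since some reference point


def deduplicate(events, window=300.0):
    """Suppress repeats of the same (device, kind) reported within
    `window` seconds of the last occurrence that was kept."""
    last_kept = {}
    kept = []
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.device, event.kind)
        previous = last_kept.get(key)
        if previous is None or event.timestamp - previous > window:
            kept.append(event)
            last_kept[key] = event.timestamp
    return kept


def prioritize(events):
    """Order events by severity first, then by arrival time."""
    return sorted(events, key=lambda e: (e.severity, e.timestamp))


events = [
    FaultEvent("router-1", "link_down", 1, 100.0),
    FaultEvent("router-1", "link_down", 1, 130.0),  # repeat inside the window
    FaultEvent("switch-7", "high_error_rate", 3, 90.0),
    FaultEvent("router-1", "link_down", 1, 500.0),  # outside the window
]

queue = prioritize(deduplicate(events))
for event in queue:
    print(event.device, event.kind, event.severity, event.timestamp)
```

In this sketch the repeat at 130 seconds is suppressed, while the report at 500 seconds is kept because it falls outside the window and may indicate a new occurrence. A real event correlation engine would apply richer rules than a fixed time window.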