How to effectively operate mission critical facilities
Achieving uninterrupted mission critical facility operation requires both standardized employee training and detailed documentation.
Mission critical environments are just that: critical to business functions. Mission critical facilities cannot tolerate shutdowns or interruptions, even during planned maintenance, which makes proper preparation vital to reducing human error, equipment failure, and downtime. The successful operation of mission critical data center facilities requires process standardization, especially in the areas of training and documentation. Executing these functions properly in support of equipment maintenance activities addresses a primary root cause of downtime.
The goal of every mission critical facility is to operate safely, reliably, and efficiently at its design capacity. Most studies of downtime in mission critical environments come to the same conclusion: human error is a leading cause. While there is no way to completely eliminate human error and its negative effects on business productivity, there are a number of steps facility managers can take to greatly reduce its frequency and impact. The most reliable method is to invest in effective documentation and training programs, which will provide the basis for improving accuracy, consistency, and reliability.
Documentation and reporting
Nearly all critical facility operations have some level of documentation in place; however, some documentation programs do not meet the needs of mission critical environments. Considering the importance of accurate and current documentation to the reliable operation of the facility, a strong program standard is warranted. Structured documentation programs have a cost that varies according to system complexity, the facility automation scheme, and the level of change management needed to achieve the reliability and uptime goals of the enterprise.
Mission critical facilities are delivered with a considerable volume of documentation, but sustaining operations effectively depends on having the right type of documentation. Typically, the detailed procedures needed to perform important daily functions are missing or incomplete.
Proper documentation requires the following:
- Detailed written procedures for all operations and maintenance activities including:
  - Emergency operating procedures (EOP)
  - Standard operating procedures (SOP)
  - Methods of procedure (MOP)
  - Administrative procedures (AP)
- Site walk-through procedures
- Facility work rules
- Change management processes and procedures
- Accurate and up-to-date drawings and schedules
- Report templates
  - Weekly, monthly, and quarterly reports on facility operations and system capacities
  - Incident reports
  - Failure analysis
  - Lessons learned
Employee training should be a priority when new staff are hired, and should be repeated at regular intervals so that all personnel stay up to date on changes in industry standards and organizational best practices. Properly trained employees understand how the plant works, how to safely operate and maintain the plant equipment, and what to do when equipment and systems don’t function as expected. Thorough, accurate, and readily accessible documentation is both the foundation of this knowledge and the means of implementing it. Yet a comprehensive documentation and training program, while crucial, is rarely achieved in mission critical environments.
What constitutes “proper training”? A best practice approach is to implement a multilevel training program that aligns each site operating procedure to a specific level of certification. This ensures that all operating and maintenance procedures are conducted or supervised by fully qualified personnel. Certification is achieved through a rigorous evaluation program, with regular recertification required. Such a program requires a wide variety of materials and methods, such as:
- Theory of operation for major equipment and systems
- Training modules for EOPs, SOPs, and MOPs
- Drills for EOPs
- Exams for various training levels
It all starts with the most difficult aspect of any training program: developing the training materials. This effort cannot begin, however, without timely and accurate information from the design and construction teams on the equipment configuration, the basis of design, the sequence of operations, and the as-built configuration. While this information may seem readily available, it is often poorly documented and delivered late. This is a major issue for both the commissioning and operations teams.
The main reason for the lack of effective training programs is the time and expense of development and training activities. This is a short-sighted view, however, as the cost and effort are largely offset by the resulting increased uptime, lower maintenance costs, and decreased employee turnover. A proper documentation and training program is as important to achieving the required facility performance, efficiency, and reliability goals as the quality of the system design itself.
An effective multilevel training program can be broken down into four certification levels:
- Level 1: Basic knowledge and emergency response
  - Level goal: Train an employee capable of properly responding to emergency situations.
  - Training covers:
    - Administrative functions
    - Theory of operation
    - Daily routines
    - Security policies
    - Emergency procedures
- Level 2: Intermediate knowledge and frequent procedures
  - Level goal: Provide focused teaching of critical systems in order for the employee to begin participating in routine work practices.
  - Training covers:
    - Technical critical systems equipment knowledge
    - Frequently performed and/or elementary operational procedures
    - Frequently performed maintenance procedures
- Level 3: Advanced knowledge and infrequent procedures
  - Level goal: Broaden training to include noncritical systems, and provide additional in-depth training on critical systems.
  - Training covers:
    - Technical noncritical systems equipment knowledge
    - Infrequently performed maintenance procedures
    - Infrequently performed and/or moderately difficult operational procedures
- Level 4: Subject matter expertise on specific systems
  - Level goal: Train employees to become subject matter experts so they in turn will be able to train new employees.
  - Training covers:
    - Select, technically difficult procedures throughout the facility
    - Specialized outside training
    - Training course development
    - Training delivery
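For teams that track qualifications in software, the idea of aligning each procedure to a certification level can be pictured as a simple check. The following Python sketch is illustrative only: the names `CertLevel`, `Procedure`, `Technician`, and `can_perform`, and the example procedure, are hypothetical and not drawn from any particular facility's system. It assumes a higher level includes the qualifications of the levels below it, and that an unqualified technician may work under a qualified supervisor.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class CertLevel(IntEnum):
    """The four certification levels of the multilevel program."""
    BASIC = 1         # basic knowledge and emergency response
    INTERMEDIATE = 2  # critical systems, frequent procedures
    ADVANCED = 3      # noncritical systems, infrequent procedures
    EXPERT = 4        # subject matter expertise, training delivery


@dataclass(frozen=True)
class Procedure:
    name: str
    required_level: CertLevel  # level this procedure is aligned to


@dataclass(frozen=True)
class Technician:
    name: str
    level: CertLevel  # highest level currently certified


def can_perform(tech: Technician, procedure: Procedure,
                supervisor: Optional[Technician] = None) -> bool:
    """Allow the work if the technician is qualified, or if a
    qualified supervisor is overseeing it."""
    if tech.level >= procedure.required_level:
        return True
    return supervisor is not None and supervisor.level >= procedure.required_level


# Hypothetical example: an infrequent maintenance procedure aligned to Level 3.
ups_bypass = Procedure("UPS maintenance bypass (example)", CertLevel.ADVANCED)
new_hire = Technician("New hire", CertLevel.BASIC)
lead = Technician("Shift lead", CertLevel.EXPERT)

print(can_perform(new_hire, ups_bypass))        # False
print(can_perform(new_hire, ups_bypass, lead))  # True
```

Using an ordered enumeration keeps the rule to a single comparison; a site where levels are not strictly cumulative would need a per-procedure list of qualifications instead.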
Training doesn’t end after an employee has qualified and become certified at a certain level. It’s vital to continuously supplement that knowledge with lessons learned from all available sources, particularly the direct experience of the facility technical workforce. This new information is incorporated into the training program and formalized in the recertification process. Ongoing emergency response drills test skills and responsiveness, keeping employees at peak readiness to handle any emergent events in the mission critical environment.
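Regular recertification is easier to enforce when due dates are tracked explicitly. The sketch below shows one way to flag personnel whose certification has lapsed. The 365-day interval, the function names, and the sample records are assumptions made for illustration; the actual interval is a site policy decision.

```python
from datetime import date, timedelta

# Assumed recertification interval; set this to match site policy.
RECERT_INTERVAL = timedelta(days=365)


def recertification_due(certified_on: date, today: date,
                        interval: timedelta = RECERT_INTERVAL) -> bool:
    """Return True once the interval since the last certification has elapsed."""
    return today >= certified_on + interval


def overdue(records: dict, today: date,
            interval: timedelta = RECERT_INTERVAL) -> list:
    """Return the sorted names of personnel who are due for recertification.

    `records` maps a person's name to the date of their last certification.
    """
    return sorted(
        name for name, certified_on in records.items()
        if recertification_due(certified_on, today, interval)
    )


# Hypothetical records for two technicians.
records = {
    "tech_a": date(2023, 1, 10),
    "tech_b": date(2024, 3, 1),
}
print(overdue(records, today=date(2024, 6, 1)))  # ['tech_a']
```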
Achieving uninterrupted mission critical facility operation requires more than an investment in redundant critical infrastructure systems. It also requires a commitment of both money and time to their sustained operation, which depends on properly documenting the environment and training staff to conduct regularly scheduled, standardized maintenance on all facility equipment.
The cost of these programs should be considered necessary to fulfill the critical mission and to protect the original infrastructure investment. The cost of creating and consistently implementing high-quality employee training and conducting effective maintenance is offset by increased uptime, longer asset life, more efficient system operations, and less employee turnover.
As senior vice president of critical environment services, Woolley oversees the operation of all on-site facility operations and maintenance programs at data center solutions provider Lee Technologies, a subsidiary of Schneider Electric. He also leads the quality system group, which establishes and continuously improves the company’s service offerings, and is responsible for the company’s environmental health and safety program. He has been involved in the mission critical facilities management field for more than 20 years and has extensive experience in building technical service programs in addition to managing operations for more than 50 data centers throughout his career.