How to effectively operate mission critical facilities

Achieving uninterrupted mission critical facility operation requires both standardized employee training and detailed documentation.


Specialized technicians monitor the data center power and cooling systems, which provide alerts to any system issues. This type of technician is trained to respond to a wide variety of failure scenarios using emergency operating procedures.

Mission critical environments are just that: critical to business functions. Mission critical facilities cannot tolerate shutdowns or interruptions, even during planned maintenance, which makes proper preparation vital to reducing human error, equipment failure, and downtime. The successful operation of mission critical data center facilities requires process standardization, especially in the important areas of training and documentation. Properly executing these functions in support of equipment maintenance activities can alleviate a primary root cause of downtime.

The goal of every mission critical facility is to operate safely, reliably, and efficiently at its design capacity. Most studies of downtime in mission critical environments come to the same conclusion: human error is a leading cause. While there is no way to completely eliminate human error and its negative effects on business productivity, there are a number of steps facility managers can take to greatly reduce its frequency and impact. The most reliable method is to invest in effective documentation and training programs, which will provide the basis for improving accuracy, consistency, and reliability. 

Documentation and reporting

Personnel conduct a vibration measurement on a rotary UPS system. This information is used to perform an analysis as part of the reliability-centered maintenance program at Lee Technologies (part of Schneider Electric). Courtesy: Schneider Electric

Nearly all critical facility operations have some level of documentation in place; however, some documentation programs do not meet the needs of mission critical environments. Considering the importance of accurate and current documentation to the reliable operation of the facility, a strong program standard is warranted. Structured documentation programs have a cost that varies according to system complexity, the facility automation scheme, and the level of change management needed to achieve the reliability and uptime goals of the enterprise.

Mission critical facilities are delivered with a considerable volume of documentation, but effectively sustaining operations is dependent on the right type of documentation. Typically, the detailed procedures needed to perform important daily functions are missing or incomplete. 

Proper documentation requires the following: 

  • Detailed written procedures for all operations and maintenance activities including:

    • Emergency operating procedures (EOP)
    • Standard operating procedures (SOP)
    • Methods of procedure (MOP)
    • Administrative procedures (AP)

  • Site walk-through procedures
  • Facility work rules
  • Change management processes and procedures
  • Accurate and up-to-date drawings and schedules
  • Report templates

    • Weekly, monthly, quarterly reports on facility operations and system capacities
    • Incident reports
    • Failure analysis
    • Lessons learned
    • Near-misses.  


This technician is performing an inspection on a diesel generator fuel system during a preventative maintenance event. Proper training allows this inspection to be performed in-house instead of being subcontracted to an outside source. Courtesy: Schneider Electric

Employee training should be a priority when new staff is hired, and should be conducted at regular intervals to ensure all personnel are up to date on any changes in industry standards and organizational best practices. Properly trained employees understand how the plant works, how to safely operate and maintain the plant equipment, and what to do when equipment and systems don’t function as expected. Thorough, accurate, and readily accessible documentation is both the foundation of this knowledge and the means of implementing it. Even so, establishing a comprehensive documentation and training program remains a crucial but rarely achieved goal in mission critical environments.

What constitutes “proper training”? A best practice approach is to implement a multilevel training program that aligns each site operating procedure to a specific level of certification. This ensures that all operating and maintenance procedures are conducted or supervised by fully qualified personnel. Certification is achieved through a rigorous evaluation program, with regular recertification required. Such a program requires a large variety of materials and methods, such as: 

  • Theory of operation for major equipment and systems
  • Training modules for EOPs, SOPs, and MOPs
  • Drills for EOPs
  • Exams for various training levels 

Personnel are performing a switching procedure using an approved method of procedure. Note that they are using the “pilot-copilot” method of stepping through the checklist. Courtesy: Schneider Electric

It all starts with the most difficult aspect of any training program: developing the training materials. However, this effort cannot begin without timely and accurate information from the design and construction teams on the equipment configuration, the basis of design, the sequence of operations, and the as-built configuration. While this may seem to be readily available information, often it is poorly documented and late to be delivered. This is a major issue for both the commissioning and operations teams.

The main reason for the lack of effective training programs is the time and expense of development and training activities. This is a short-sighted view, however, as the cost and effort are largely offset by the resulting increased uptime, lower maintenance costs, and decreased employee turnover. The fact is that a proper documentation and training program is as important a consideration to achieving the required facility performance, efficiency, and reliability goals as the quality of the system design itself. 

An effective multilevel training program can be broken down into four certification levels: 

  • Level 1: Basic knowledge and emergency response

    • Level goal: Train an employee capable of properly responding to emergency situations.
    • Training covers:

      • Administrative functions
      • Theory of operation
      • Daily routines
      • Security policies
      • Emergency procedures.

  • Level 2: Intermediate knowledge and frequent procedures

    • Level goal: Provide focused teaching of critical systems in order for the employee to begin participating in routine work practices.
    • Training covers:

      • Technical critical systems equipment knowledge
      • Frequently performed and/or elementary operational procedures
      • Frequently performed maintenance procedures.

  • Level 3: Advanced knowledge and infrequent procedures

    • Level goal: Broaden training to include noncritical systems, and provide additional in-depth training on critical systems.
    • Training covers:

      • Technical noncritical systems equipment knowledge
      • Infrequently performed maintenance procedures
      • Infrequently performed and/or moderately difficult operational procedures.

  • Level 4: Subject matter expertise on specific systems

    • Level goal: Train employees to become subject matter experts so they in turn will be able to train new employees.
    • Training covers:

      • Select, technically difficult procedures throughout the facility
      • Specialized outside training
      • Training course development
      • Training delivery. 
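
The alignment of each procedure to a certification level could be recorded in a simple lookup so that a work order is only released when the performer, or a supervisor, is qualified. The following is a minimal sketch under stated assumptions: the names `CertLevel`, `Procedure`, and `may_proceed`, and the example procedure, are hypothetical and are not part of any program described in this article.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class CertLevel(IntEnum):
    """The four certification levels listed above."""
    BASIC = 1         # basic knowledge and emergency response
    INTERMEDIATE = 2  # critical systems, frequent procedures
    ADVANCED = 3      # noncritical systems, infrequent procedures
    EXPERT = 4        # subject matter expertise on specific systems


@dataclass(frozen=True)
class Procedure:
    name: str
    required_level: CertLevel  # assumption: each site assigns its own levels


def may_proceed(procedure: Procedure,
                performer_level: CertLevel,
                supervisor_level: Optional[CertLevel] = None) -> bool:
    """Return True if the performer, or a supervisor, holds the required level."""
    if performer_level >= procedure.required_level:
        return True
    return supervisor_level is not None and supervisor_level >= procedure.required_level


# Illustrative procedure only; not taken from a real site.
switchgear_mop = Procedure("Switchgear maintenance MOP", CertLevel.ADVANCED)

print(may_proceed(switchgear_mop, CertLevel.BASIC))                    # False
print(may_proceed(switchgear_mop, CertLevel.BASIC, CertLevel.EXPERT))  # True
```

Using an ordered enumeration keeps the rule "conducted or supervised by fully qualified personnel" to a single comparison. A real program would likely also track certification per system rather than per person.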

Personnel prepare for switchgear maintenance that is performed three times per year. This is a potentially hazardous procedure that can only be performed with a detailed procedure and by personnel with thorough training in this maintenance operation.

Training doesn’t end after an employee has qualified and become certified at a certain level. It’s vital to continuously supplement that knowledge with lessons learned from all available sources, particularly the direct experience of the facility technical workforce. This new information is incorporated into the training program and formalized in the recertification process. To test skills and responsiveness, ongoing emergency response drills keep employees at peak readiness to handle any emergent events in the mission critical environment.
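
Regular recertification implies tracking when each technician was last certified. The sketch below shows one way this might be checked. The one-year interval, the function names, and the sample records are all assumptions made for illustration; the article states only that recertification is required regularly.

```python
from datetime import date, timedelta

# Assumed interval for illustration; each program would set its own.
RECERT_INTERVAL = timedelta(days=365)


def recert_due(last_certified: date, today: date,
               interval: timedelta = RECERT_INTERVAL) -> bool:
    """Return True once the interval since the last certification has elapsed."""
    return today - last_certified >= interval


def overdue_staff(records: dict, today: date) -> list:
    """Return the sorted names of technicians whose certification has lapsed.

    `records` maps a technician name to the date of their last certification.
    """
    return sorted(name for name, last in records.items() if recert_due(last, today))


# Hypothetical records.
records = {
    "tech_a": date(2023, 1, 10),
    "tech_b": date(2023, 11, 2),
}
print(overdue_staff(records, date(2024, 3, 1)))  # ['tech_a']
```

A list like this could feed the scheduling of exams and emergency response drills, so that lapsed certifications are caught before a procedure is assigned.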

Achieving uninterrupted mission critical facility operation requires more than an investment in redundant critical infrastructure systems. It also requires both a financial investment and time commitment in their sustained operation, which stems from properly documenting the environment and training staff in conducting regularly scheduled, standardized maintenance on all facility equipment. 

The cost of these programs should be considered necessary to fulfill the critical mission and to protect the original infrastructure investment. The cost of creating and consistently implementing high-quality employee training and conducting effective maintenance is offset by increased uptime, longer asset life, more efficient system operations, and less employee turnover. 

As senior vice president of critical environment services, Woolley oversees the operation of all on-site facility operations and maintenance programs at data center solutions provider Lee Technologies, a subsidiary of Schneider Electric. He also leads the quality system group, which establishes and continuously improves the company’s service offerings, and is responsible for the company’s environmental health and safety program. He has been involved in the mission critical facilities management field for more than 20 years and has extensive experience in building technical service programs in addition to managing operations for more than 50 data centers throughout his career.
