April 1, 2017
Back in May of 2016, I had just joined WCU as the new Vice President for the Information Services & Technology (IS&T) Division. During my first few weeks, while I was still learning which building was which and trying hard to remember the names of the hundreds of people I was meeting, the IS&T Division was moving forward with projects already underway at the time of my hire. I needed to study, learn quickly, stay humble, and ask lots of questions.
One of the items that was well underway by the time I arrived was a Disaster Recovery (or DR) test. The IS&T Division here at WCU has a practice of running a formal DR test of some component of our technology infrastructure every year, usually in May. In May of 2016, our testing would involve our distributed mass storage, which was deployed in two separate campus data centers and designed for resilience and fault tolerance. In the May 2016 exercise, under the watchful eye of an auditor from our State System of Higher Education, we aimed to verify that the full failure of one side was a survivable event and that we could continue to provide IT services to campus should such a failure actually occur. The testing, which involved multiple sub-tests and took place over several hours, was successful, much to the credit of the team I was lucky enough to be joining. We used that DR test to exercise our recovery practices, confirm our documentation, and make sure that multiple people on staff had a working understanding of how our fault-tolerant distributed storage design worked.
Most responsible IT organizations run such disaster recovery tests on a regular basis, and annually is a fairly common schedule. We design IT infrastructure to serve our communities and, within the limits of the funding available, we design it to survive predictable failures such as technology component failures, building power failures, or even severe damage (flooding, fires) to a location like a data center. While these DR tests are very important and really must take place, there is a related planning and testing discipline focused on Mission Continuity.
Mission Continuity (MC) is about broader planning and the processes used to keep the mission of an organization on course after a severe adverse event. Rather than testing (only) technology components, MC planning and testing focuses on how the University can continue to serve our community and our primary mission of student success. MC might involve well-designed fault-tolerant technology and DR testing, but it also involves having and understanding good plans that go well beyond what technology organizations do. MC involves people from all around the University coordinating and responding in order to maintain the ability to deliver services.
This May, IS&T will work with Chester County's Department of Emergency Services to hold a tabletop exercise that is really more about MC than DR. A tabletop exercise is a drill in which a scenario is presented to the group and guided conversations address how to respond to it. The entire exercise takes place around a table (or, for a large group, around a few tables). The focus of this exercise will be a campus emergency and how, working together, we can respond in support of mission continuity. The Chester County team will work with a group of us representing IS&T, Facilities, Academic Affairs, and more.
After several years of technology-focused DR testing, I am very pleased that we are now beginning to bring focus to the larger topic of Mission Continuity. At a moment in which a very encouraging level of inter-Divisional collaboration and cooperation is taking hold, we can learn and work together to protect the interests of the University and our students.