Flying High: Rethinking Resilience Amidst Industry-Wide Computer Outages

May 16, 2019

Hundreds of flights were recently delayed when multiple US airlines were forced to ground operations after a third-party vendor, which they rely on to help balance the weight on planes before takeoff, experienced system-wide computer outages. Just a few weeks earlier, computer outages at reservation systems company Sabre impacted multiple airlines and travelers around the world.

There may be no other industry as heavily regulated, or as affected by outages, as the airline industry, and for good reason. Real lives, safety, security, money, time and emotions are on the line 24 x 7 x 365, and all depend on near-perfect operations and uptime.

Despite the high pressure to ensure uninterrupted and safe experiences, the airline business has been incredibly vulnerable to IT outages. These outages have severe business implications, ranging from frustrated customers, to damaged brand reputation, to an inability to execute revenue-generating operations. Yet many airlines have come to accept outages as the norm.

Sungard AS, which has tracked airline outages back to 2007, shows outages increasing over time, with a whopping 10 in 2018, the second highest on record. While reasons for outages vary, travelers can shun an airline after just one IT failure. Sungard’s research, conducted by Qualtrics, reports that 65% of consumers will not travel with an airline after experiencing a technical problem that led to a flight delay.

IT’s complicated

IT can be exceptionally complex in the airline industry, and IT outages or disruptions can happen for many reasons. Third-party dependencies, severe weather impacting data centers, mergers and acquisitions that suddenly marry two different IT environments, old hardware combined with new software, outdated legacy systems, technology upgrades, patches or transformations, connectivity issues, simple human error, cyberattacks and power outages can all be to blame.

Outages can last for hours and affect everything from operations, dispatching systems, flight paperwork and booking to flight check-ins, boarding and the ability to communicate with the FAA. And while some outages are out of airlines’ control, much more can be done to make airlines less vulnerable from an IT perspective. Airlines do not have to accept outages as a cost of doing business.

Shifting to More Modern, Resilient Strategies

In an ‘always-on’ world, IT leaders are constantly challenged to maximize resources and mitigate the risks of downtime and data loss. The following are a few ways airlines can ensure they’re doing everything they can to minimize the risks of outages. 

      Modernize DR plans with resilience: Innovative airline IT organizations are modernizing their disaster recovery (DR) plans to go beyond traditional backup technologies and add resilience and continuous data protection. IT resilience technology combines continuous availability, workload mobility, and multi-cloud agility to help organizations withstand disruption, seamlessly adopt new technology, and drive transformation forward.

      Add geodiversity: It’s important to have geodiversity among your data centers in case one is impacted by severe weather, which could knock out legacy systems and primary data. Having one or two more data centers in a different part of the country offers more diversity of systems to protect and replicate data so that it’s constantly available and not impacted by an outage. With IT resilience technology, critical applications and data can be migrated out of harm’s way within minutes to another data center or cloud, if needed.

      Staff up: Investing in top disaster recovery talent can quickly improve the strength and capabilities of an IT organization. Experience in disaster situations is valuable and helps ensure IT teams stay cool under pressure. The business case for investing in more people can often be made once C-level executives understand the bottom-line impact of downtime or outages.

      Practice makes perfect: DR plans should include potential ‘what if’ scenarios and must be laid out clearly, with owners and step-by-step instructions. Plans must be signed off by all department heads and by your board of directors. This involves extended coordination and continuous communication, which in times of trouble will help make the response as seamless as possible.

      Test regularly: Test your systems often to ensure they are working as needed. Regular testing lets everyone involved make incremental improvements.

No matter how big or small an outage, it’s important that organizations do what they can today and prepare for the future. The difference between being down for hours or days versus minutes or seconds is the difference between a solid disaster recovery plan and one that is outdated, barely tested or even non-existent.

Ultimately, the goal is to maintain regular business operations so that customers do not experience any interruption or frustration. The simplicity provided by cloud-based advancements is making disaster recovery systems much more resilient and capable of withstanding frequent, modern IT disruptions and threats.

Caroline Seymour is the product marketing director for Zerto, based in Boston, where she leads overall product marketing strategy and execution. Prior to Zerto, Caroline was at IBM for nine years. She has a wealth of experience in enterprise software from the many roles she has held in Europe and North America, covering pre-sales, product marketing and partner marketing.