Brian Vigna

Written by: Brian Vigna

Instructional Designer and Security Awareness Training Specialist

 
On July 19, 2024, one of the largest computer outages ever occurred. Details are still trickling in, and the impact is still being felt, particularly at airports, where a single plane malfunction can cause a ripple effect of delays and cancellations. In this case the cause was not severe weather but an unprecedented combination of reliance on a single point of failure (computers running Microsoft Windows) and a lack of continuity planning.
 

Details of the Incident

So, what exactly happened on the fateful morning of July 19? Information on the incident is still being gathered, and novel issues are still being reported by Microsoft-reliant companies almost two weeks after the initial outage. What we do know at this point is that what Microsoft referred to as a “security update” (engineered by cybersecurity company CrowdStrike) was pushed out in the early morning hours of July 19 to Windows systems running CrowdStrike’s Falcon security software. Within minutes the newly updated systems began to crash. The faulty update disrupted the boot process, rendering computers useless as they failed to properly load the Windows operating system. The result was something IT folks call the Blue Screen of Death, or BSoD. The update was rolled back (cancelled) within hours, but the damage was done: millions of computers had already downloaded and installed the defective update.
 

Impacts of the Outage

The impact was immediate and widespread. In a recent blog post, Microsoft estimated the outage affected 8.5 million Windows devices. Microsoft was quick to note that this number makes up less than one percent of all Windows machines (take this with a grain of salt, as Microsoft is in full PR crisis mode). What made the damage particularly newsworthy was the kind of businesses that felt the brunt of the impact: airlines, banking institutions, and healthcare providers, among many others. While banks and healthcare have been reported to be the most disrupted businesses (with estimated losses of $1.15 billion and $1.94 billion, respectively), airlines and stranded passengers stole the headlines. Images of families sleeping on airport floors, angry customers berating gate agents, and sparse information on what the airlines were doing to fix the issue became the top national news story. The story didn’t end quickly, either: some airlines are still canceling flights as of July 30, as this story is being written.
 

Weakness Revealed

The outage was not a cyberattack, but the immediate impact was very similar to one. The outage resulted in a complete inability to access important data, services, and systems, much like a ransomware attack. So, what are big companies like Delta, American Airlines, JP Morgan, Wells Fargo, etc. doing to prevent this from happening again? While inquiries into the cause and continued mitigation are top of mind for these companies, their IT and leadership teams are diving headfirst into another conversation: single points of failure. The big takeaway is that allowing your company to rely on a single entity (like Microsoft Windows) or a single device can be a costly mistake. Companies are thinking about diversification, less reliance on a single source, and how to stand up contingency plans in the event of a future outage. The lesson learned for these businesses is that the cost of a single outage can be massive (current estimates are over $5 billion for the CrowdStrike outage).
 
What can smaller businesses learn from this incident and from what the Fortune 500 companies are doing in its aftermath? Most importantly, they can reduce the impact of a similar event by focusing on business continuity planning (BCP). BCP is not a new concept, and it is not used solely in the IT industry, but it is a growing part of the cybersecurity landscape. BCP is a planning process that helps a company prepare for and respond to potential threats. The plan’s goal is to protect people and assets and to ensure the business can maintain or resume operations during a disaster of any kind. BCP is a valuable tool against outages, regardless of their source. It helps to identify single points of failure, determine potential impacts, and then create or enhance systems that are more diverse and resilient. The global companies impacted will be spending millions of dollars on BCP over the coming months and years after feeling the pain of the CrowdStrike outage, and small businesses should be thinking about it just as much. BCP can work for even the smallest businesses, and it can be adapted to the needs, size, and risk level of any business.
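For readers who like to see an idea in concrete form, the first two BCP steps described above (identifying single points of failure and gauging their potential impact) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inventory, the business functions, and the provider names such as "ISP A" and "Processor X" are all hypothetical, and a real plan would start from a company's own list of systems and vendors.

```python
from collections import defaultdict

# Hypothetical inventory (an assumption for illustration only):
# business function -> capability it needs -> interchangeable providers.
INVENTORY = {
    "check-in": {"endpoint OS": ["Windows"], "network": ["ISP A", "ISP B"]},
    "payments": {"endpoint OS": ["Windows"], "card processor": ["Processor X"]},
    "scheduling": {"endpoint OS": ["Windows", "Linux"], "network": ["ISP A", "ISP B"]},
}


def single_points_of_failure(inventory):
    """Map each sole provider to the business functions that stop if it fails."""
    impact = defaultdict(set)
    for function, capabilities in inventory.items():
        for providers in capabilities.values():
            # A capability backed by exactly one provider has no fallback.
            if len(set(providers)) == 1:
                impact[providers[0]].add(function)
    return {provider: sorted(functions) for provider, functions in impact.items()}


def rank_by_impact(spofs):
    """Order providers by how many functions depend solely on them."""
    return sorted(spofs.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    for provider, functions in rank_by_impact(single_points_of_failure(INVENTORY)):
        print(f"{provider}: {len(functions)} function(s) at risk -> {', '.join(functions)}")
```

In this made-up inventory, Windows comes out on top because two functions have no alternative to it, while scheduling is not flagged because it can fall back to a second operating system. The ranking is deliberately crude: it counts affected functions rather than dollars, which is the kind of refinement a fuller impact analysis would add.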
 
Want to keep reading about Business Continuity Planning? Check out these resources:

 
Interested in speaking with professionals about BCP, cybersecurity, or IT issues? Contact IES today!