Exploring The Thorn in IT’s Side: The Cost of Downtime

Tim Flower

VP, DEX Strategy

Published

October 27, 2022

If you work in IT, you know what it’s like to operate under leadership’s ever-present microscope. When businesses invest so heavily in their workplace technology and IT departments, they want to see those investments pay off and make the workplace more efficient – and especially more cost-efficient. But there’s one cost that has long plagued even the most well-equipped support teams: the cost of unplanned downtime.

The DEX Hub editorial team recently sat down with digital transformation expert and co-host of the DEX Show, Tim Flower, to discuss this timely issue of IT downtime costs.

Read on to learn how organizations can create smarter IT budgets, mitigate the costs of unplanned downtime, and much more!

Q: On occasion, organizations have to plan time to do upgrades or patches that create downtime for employees – how much can these planned downtime events cost an organization?

Tim: Whether planned or unplanned, downtime has a major impact on an organization. Downtime means a loss of revenue opportunity, even if there is a plan in place to navigate the gap in availability. If a revenue-generating function is unavailable, then customers aren’t able to transact business with the organization.

To minimize the impact of planned downtime, the best course of action is to schedule very short outages, ideally during low-traffic hours, typically overnight. However, if you are a global business, there is no “overnight” – you need to be available 24/7. So, in those situations, lost opportunities to generate revenue are unavoidable. When you do deploy in the “off-hours”, you need to have IT resources on hand who can test, validate, and deploy, which means incurring higher IT expenses than you would during normal business hours.

Finally, there is a need for high availability in application designs, where the app and its data are able to fail over to a synchronized copy while maintenance activities occur on the primary instance. These designs are costly and drive IT expenses even higher.

Q: While they are planned, they may not be planned months in advance. So, how can organizations budget for these expenses?

Tim: The first conversation around planned downtime should be with your stakeholders, where you can discuss and prioritize which applications are impacted. By creating a priorities list, organizations can adjust the necessary schedules, take any downtime overlap into account, and set appropriate maintenance windows as needed.

This will also help identify the mission-critical global apps that require high availability in addition to disaster recovery, which are two different and distinct designs. The shortlist of apps requiring high availability designs will allow organizations to budget for the design and implementation work necessary to avoid planned downtime.
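To make the idea of a priorities list and overlapping maintenance windows concrete, here is a minimal Python sketch. It is only an illustration: the `Window` type, the `find_overlaps` helper, the app names, priorities, and dates are all hypothetical and do not come from the interview.

```python
from datetime import datetime
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Window(NamedTuple):
    """A planned maintenance window for one application (hypothetical model)."""
    app: str
    priority: int  # 1 = most business-critical, as agreed with stakeholders
    start: datetime
    end: datetime


def find_overlaps(windows: List[Window]) -> List[Tuple[str, str]]:
    """Return pairs of app names whose windows overlap, most critical app first."""
    ordered = sorted(windows, key=lambda w: w.priority)
    return [
        (a.app, b.app)
        for a, b in combinations(ordered, 2)
        # Two intervals overlap when each one starts before the other ends.
        if a.start < b.end and b.start < a.end
    ]


# Illustrative data only.
windows = [
    Window("payments", 1, datetime(2022, 11, 5, 1, 0), datetime(2022, 11, 5, 2, 0)),
    Window("crm", 2, datetime(2022, 11, 5, 1, 30), datetime(2022, 11, 5, 3, 0)),
    Window("intranet", 3, datetime(2022, 11, 6, 1, 0), datetime(2022, 11, 6, 2, 0)),
]
conflicts = find_overlaps(windows)
print(conflicts)
```

In this made-up schedule, the payments and CRM windows collide, so one of them would need to move before the plan goes to stakeholders.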

Q: Of course, not all downtime is planned. What are the costs associated with the dreaded unplanned downtime?

Tim: Unplanned outages indicate a much bigger issue. They mean that something in the design, implementation, or update/upgrade process is unknown to those supporting it, and a failure has occurred. Unplanned outages can happen at any time, and as anyone in IT can tell you, it is usually during critical business hours – which of course impacts revenue. Unplanned downtime also means that the employees paid to do a job are not able to perform that work. So, the major costs are lost revenue, lost employee productivity, and a damaged reputation for IT due to its inability to maintain stability and reliability, and perhaps even damage to brand reputation if the outage affects customer interaction.

Each of these costs compounds as the wait for a resolution drags on and as the count of outages accrues over time. This culminates in one additional cost – employee sentiment. When employees aren’t able to do their jobs, it drives frustration and lack of job satisfaction. Employees like to succeed, and they don’t like things getting in their way. Unplanned downtime can therefore lead to morale issues and even increased rates of employee attrition – both of which cost organizations in the end.
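As a rough illustration of how the two most directly measurable costs, lost revenue and lost employee productivity, might be tallied for a single outage, consider the Python sketch below. It is a deliberately simple linear model under stated assumptions: the function name, its parameters, and every figure are hypothetical, and reputational and morale costs are left out because they are much harder to quantify.

```python
def estimate_downtime_cost(
    outage_hours: float,
    revenue_per_hour: float,
    affected_employees: int,
    loaded_hourly_wage: float,
    productivity_loss: float = 1.0,
) -> dict:
    """Estimate the direct cost of one outage (hypothetical linear model).

    productivity_loss is the fraction of work the affected employees
    cannot do during the outage (0.0 = none, 1.0 = all of it).
    """
    if not 0.0 <= productivity_loss <= 1.0:
        raise ValueError("productivity_loss must be between 0.0 and 1.0")

    lost_revenue = outage_hours * revenue_per_hour
    lost_productivity = (
        outage_hours * affected_employees * loaded_hourly_wage * productivity_loss
    )
    return {
        "lost_revenue": lost_revenue,
        "lost_productivity": lost_productivity,
        "total": lost_revenue + lost_productivity,
    }


# Illustrative figures only: a 2-hour outage, 200 employees half-blocked.
example = estimate_downtime_cost(
    outage_hours=2.0,
    revenue_per_hour=10_000.0,
    affected_employees=200,
    loaded_hourly_wage=50.0,
    productivity_loss=0.5,
)
print(example)
```

Summing such estimates across every outage in a quarter gives a floor for the real cost, since it excludes the reputation and sentiment effects described above.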

Q: How can an organization budget for unplanned downtime?

Tim: Most organizations don’t plan for downtime because they aren’t expecting failure. What companies instead focus on is making sure they have the right technologies in place to ensure there aren’t any unplanned outages. To do this right, organizations should budget for technology testing and validation of change, and increased awareness of the complexities that exist in their production environment. These each play critical roles in reducing downtime.

The scope of applications in the conversation about unplanned downtime is larger than in the “planned downtime” category. Unplanned outages can impact any category of applications and the entire business in many ways. So, any budget should be spent on new platforms that increase the ability to test and validate change. It is a much more strategic position than just adding headcount to handle the increase in failures.

Q: Since most companies aren’t planning for unplanned downtime as you said, what would be the best way to mitigate downtime expenses?

Tim: The goal is always to avoid unplanned outages, or, at the very least, shorten the duration of a wide-scale outage. To do this, organizations should start with testing, which requires a full awareness of how the change interacts with both the technology platform and the end user of the app. This can be extremely difficult in a lab, even with advanced simulators and synthetic transactions. Budgeting for platforms that provide robust visibility into the underlying test environment both before and after the change is critical to understanding the potential for success or failure.

The second step is to enable full visibility of the production environment that will be receiving the change. Once a device, application, or profile is deployed into production, it becomes very susceptible to unplanned changes, and the environment around it can morph from the original design. If the original design is flawed, the risk of failure is high, and different app versions will introduce new and unplanned conflicts. In addition, failures of prior planned changes will leave devices in an unanticipated state. Unknown settings or components may function as-is but fail after a change is made. Visibility into the production environment is critical for successful and low-impact change.

The third area of concern, and one that doesn’t get much discussion, is the unknown variable. For example, if an employee installs a new application that isn’t part of the company’s existing ecosystem, it could cause a conflict. A well-meaning IT technician changing a setting while troubleshooting an old issue may leave the new change vulnerable to failure. The manual upgrade or downgrade of a dependent application may cause a compatibility failure. The great unknown is often the biggest risk.
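One way to picture this kind of visibility is a comparison between the configuration a device was designed to have and what is actually found on it before a change goes out. The Python sketch below is a minimal illustration under assumed data: the `find_drift` helper, the setting names, and the version values are hypothetical and do not describe any particular product.

```python
ABSENT = "<absent>"  # marker for a setting that exists on only one side


def find_drift(baseline: dict, observed: dict) -> dict:
    """Return {setting: (expected, actual)} for every setting that differs."""
    drift = {}
    for key in sorted(baseline.keys() | observed.keys()):
        expected = baseline.get(key, ABSENT)
        actual = observed.get(key, ABSENT)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Illustrative data only: the designed state versus one device in production.
baseline = {"vpn_client": "5.1", "browser": "106", "disk_encryption": "on"}
observed = {
    "vpn_client": "5.1",
    "browser": "104",           # manually downgraded dependency
    "disk_encryption": "on",
    "unapproved_tool": "1.0",   # installed outside the company ecosystem
}
drift = find_drift(baseline, observed)
print(drift)
```

Devices that report any drift could be held back from a rollout or tested separately, since they are the ones most likely to be in an unanticipated state.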

Q: Any final thoughts?

Tim: Mitigation of downtime is more than just increased rigor at the Change Advisory Board (CAB) or requiring IT to be more diligent in its testing. It requires budgeting for platforms that increase visibility and awareness of the current and future state environments, and that ultimately lead to more intelligent planning and decision-making, which results in more reliable changes.