What is resilience?



Aside from the fact that operational resilience is mandated by law as of January 2025 (yes, next year), having your systems and applications available to your customers whenever they need your services is always a good idea. Customers, both existing and new ones, typically prefer smooth operations over new functionality. If you have any roadblocks in your current customer journey, then solving those is also part of operational resilience (and excellence).

Does this mean you should not market new products or services? Of course not! Solving a customer journey roadblock is ensuring that your company is resilient. The Happy Meal is a prime example: it solved a product roadblock for small children and a profits roadblock for the company. For more info, just google it. But before you bring a new service online, be sure that it can withstand the punches that will be thrown at it. 

What is resilience? 

Resilience is the art of making sure your services are available to your customers whenever they can use them. Note I did not say 24/7/365. Your business may require that, but perhaps your systems need "only" to be available during "normal" business hours.

Resilient systems can withstand adverse events that impair their ability to perform normal functions and, as in the case of the Happy Meal, increased peak demand. Events can include simple breakdowns (such as a failed storage device, a dropped internet connection, or a file that fails to load) or something worse, like a cyber attack or a larger failure in your data center.

Your client does not care what the cause is; what counts for the client is, "Can I access your service (or buy that meal for my kid)?"

Resilience entails several aspects:

  • availability
  • performance
  • right-sizing
  • hardening
  • restore-ability
  • testing
  • monitoring
  • management and governance

It is tempting to apply these aspects only to your organization's IT or technical parts. That is insufficient. Your operations, management, and even sales must ensure that the services rendered result in happy clients and happy shareholders or owners. The reason is that resilient operations are a symphony: no single department or set of actions will achieve this on its own. When product development works with the technical teams to develop a resilient flow at the right level for its earning potential, you maximize profits.

This synergy ensures that you invest exactly the right level of resources. There are no exaggerated technical or operational elements for ancillary services. That frees resources to ensure your main services receive the full attention they deserve.

Resilience, in other words, is the result of a mindset and a way of operating that helps your business remain at the top of its game and provides a top service to clients while keeping the bottom line in the black. 

Why do we need to spend on this?

I mean, if it ain't broke, don't fix it. That old adage is true, and yet not. Services can remain up and running for a long time with single points of failure. But can you afford to have them break at any time? If yes, and your customers don't mind waiting for you to patch things up, then you can "risk-accept" that situation. But how realistic is that these days? If I cannot buy it at your shop today, I'll more than likely get it from another. If I'm in a contract with you, yet you cannot deliver, we will have a conversation, or at the very least, a moment of disappointment. If you have enough "disappointments," you will lose the customer. Lose enough customers, and you will have a reputational problem or worse.

We don't like to spend resources on something that "may" go wrong. That is why we do risk assessments: to determine the true cost of non-delivery and the likelihood of it happening. There are different ways to deal with that assessment's outcome. Not everything needs to have double the number of people working on it, just in case one resigns. Not every system needs an availability of 99.999%.
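To make those availability percentages tangible, here is a minimal sketch that converts a target into a yearly downtime budget. The function name and the business-hours figure (52 weeks of 5 days at 8 hours) are my own assumptions for illustration, not part of any standard or law.

```python
# Hypothetical helper: how much downtime per year does an availability target allow?
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_minutes_per_year(availability_pct: float,
                              service_hours: float = HOURS_PER_YEAR) -> float:
    """Minutes per year a service may be down and still meet its target.

    service_hours is the number of hours per year the service is expected
    to be available (defaults to around the clock).
    """
    return service_hours * 60 * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Around-the-clock service at different targets.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% -> {minutes:.1f} minutes of downtime per year")

    # Assumed "normal" business hours: 52 weeks x 5 days x 8 hours.
    business_hours = 52 * 5 * 8
    minutes = downtime_minutes_per_year(99.9, business_hours)
    print(f"99.9% during business hours -> {minutes:.1f} minutes per year")
```

For an around-the-clock service, 99% leaves roughly 3.65 days of downtime per year, while 99.999% leaves only a little over five minutes. That gap is why the target should follow from the risk assessment rather than from habit.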

But sometimes, we do not have a choice. When lives are at stake, like in medical or aviation services, being sorry is not a good starting point. The same goes for financial services. DORA and NIS2 in the EU, the CEA, FISMA, and GLBA in the US, and ESPA in Japan, to name a few, are laws that require your company, if it is active in the relevant regulated sectors, to comply and ensure that your services continue to perform.

Most of these elements have one thing in common: we need to know what is important for our service delivery and what is not.

Business service

That brings us to the core subject of what needs to be resilient. The answer is very short and very complex at the same time. It is the service that you offer to your customers that must meet resilience levels.

Take the example of a hospital. When there is a power outage, the most critical systems must continue operating for a given period. That also means that enough capable staff must be present to operate that equipment. It even means that the routes leading to the hospital should remain available; if not by road, then, for example, by helicopter. If these access routes are unavailable, an alternate hospital should be able to take on the workload.

Not everything here in this example is the responsibility of the hospital administrators! This is why the management and governance parts of the resilience ecosystem are so important in the bigger picture. 

If we look at the financial sector, the EU DORA (Digital Operational Resilience Act) specifically states that you must start with your business services. Like many others, the financial sector can no longer function without its digital landscape. If a bank is unexpectedly disconnected from its payment network, especially SWIFT, it will not be long before there are existential issues. A trading department stands to lose millions if the trading system fails. 

Look in your own environment; you will see many such points. What if your internet connection goes down, and you rely on it for most of your business? How long can you afford to be out? How long before your clients notice and take action? Do you supply a small but critical service to an institution? Then you may fall under the aforementioned laws (these are called third-party requirements, and your client may be required to follow them).

But also, outside of the technology, we see points in the supply chain that require resilience. Do you still rely on a single person or provider for a critical function? Do you have backup procedures if the tech stops working, yet your clients require you to continue to service them? 

In all these and other cases, you must know what your critical services are so that you can analyze the requirements and put the right measures in place.

Once you have defined your critical business services and have analyzed their operational requirements, you can start to look at what you need to implement the aforementioned areas of availability, monitoring, hardening, and others. Remember we're still at the level of business service. The tech comes later and will require a deeper analysis. 

In conclusion

Resilient operations ensure that you continue to function, at the right price, in the face of adverse events. If you can, resilience starts at the business level from the moment of product conception. If the products have long been developed, look at how they are delivered to the client and upgrade operations, resources, and tech where needed.

In some cases, you are legally required to undertake this exercise. But in all cases, it is important that you understand your business services and the needs of your clients and put sufficient resources in the right places of your delivery chain. 

If you want to discuss this further, please contact me for a free talk.

 
