While the world adjusts to a “new normal” during the COVID-19 pandemic, the word “resilience” is making headlines, applied to people, companies, and societies. We hear it used to describe self-healing power grids, water systems, memes, and shows on Netflix.
But what is resilience, really? How is it different from other words we use to describe well-functioning systems, such as reliability and robustness?
I prefer Erik Hollnagel’s definition of resilience: “The intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances so that it can sustain required operations under both expected and unexpected conditions.”
In other words, how quickly can a system bounce back from a bad situation to get back to a good (or at least better) state?
Since resilience includes the ability to recover from the unexpected, not just the expected, you can’t build a system once and expect it to keep functioning indefinitely. It’s always a work in progress, because the real world keeps presenting situations nobody anticipated.
By that definition, a “self-healing grid” that routes power around faults isn’t fully resilient on its own.
Imagine a tree falls on the power line by your house and cuts off your power. Your neighbors will hopefully keep theirs (which is good, of course). But your recovery isn’t complete until a team of line workers shows up, removes the tree, tests all the nearby equipment, and turns your lights back on. For you, the source of that recovery is human (even more so if other things go wrong at the same time, such as the tree also damaging cable and fiber lines, which requires coordination between multiple companies).
Set Your Team Up for Success
In fact, Resilience Engineering is a discipline focused on recognizing the key role humans play: if our primary source of resilience is human, what do we need to do to set our people up to succeed as much as possible?
What is interesting about measures of success is that they often trade off against one another. Take robustness (how much a system can absorb before failing): building a highly robust system often means building a less resilient one, and vice versa.
Control room operators manage the grid daily through a variety of situations, from the mundane to the emergency. They are the “guardian angels” of the grid, protecting all of the crews working in the field, and all the homes and businesses that depend on electricity. Since resilience in power systems depends on these people in the control room and the field, our mission is to do everything we can to set them up to be successful.
That’s why it’s critical to apply principles from neuroscience and human factors research when building software for operators, so that they can see, process, and understand data, manage multiple simultaneous problems in an emergency, and make and implement decisions more effectively. Utilities have to look end-to-end: at what the operators are seeing, at the cultures in which they work, and at the tools, habits, and expectations they bring to their work. If all of those were to align, a utility could truly be resilient.
Figure 1: ResilientGrid Resiliency Management System™ (RMS™)
It’s one of the reasons that ResilientGrid designs systems like the Resiliency Management System™ (RMS™) by working closely with these operators, making sure it gives them strong support. Whenever we can remove an inefficiency or help an operator achieve a goal, we find that it not only makes their work easier, but also frees up mental capacity to process what’s happening on the system. That helps them build stronger situational awareness and gives them the space to think through what could go wrong next, so they can prevent it from ever happening. By integrating traditional and emerging data sources (SCADA, EMS, DMS, weather, maps, etc.), the RMS provides a single-pane-of-glass view that helps operators visualize challenges and make better-informed decisions.
As discussed, the primary source of resilience is human. Here are some suggestions that may help you and your organization improve resilience. We’re looking forward to upcoming blog posts on these topics!
Near Miss Reporting drives an organization to pay close attention to “small signals” and “near misses” as opportunities to identify and reduce its latent weaknesses. Quite often, these weaknesses are hidden in normal operations, so emergencies like those posed by COVID-19 are great opportunities to better understand and address them (in a non-punitive way). Doing so may not only help you get through this situation, but also leave you better prepared for whatever comes next.
High Reliability Organizational Culture can create environments where people have habits of strong collaboration and problem solving in both normal and emergency situations.
Just Culture can create an environment where people feel safer raising the risks they see and can count on their organization to address those risks appropriately. It fights “outcome bias” (judging a decision by what happened, rather than by how risky it was), and applies the same set of rules across an organization for handling the same types of risks.
Please check back for future posts to continue the conversation. Message us or tweet us at @Resilient_Grid.