Preparing Your Fleet for Outages and Crises
Jul 24, 2024
A major outage last week caused by a faulty update from CrowdStrike disrupted global industries, highlighting the fragility of our digital systems. Learn what steps, if any, you can take to prepare your fleet for such unexpected disruptions, ensuring your operations remain resilient and your team stays safe.
Transcript
Welcome to The Fleet Code, a podcast brought to you by Fleetio where we dive into the latest fleet industry trends, technologies and best practices.
My name is Zach Searcy and today I want to dive into a topic that is still very fresh on our minds.
Last week, there was a major outage that halted many shipping processes across the whole world. A security platform called CrowdStrike released a faulty update that shut down industries from banking to hospitals to airlines, even to emergency dispatch units.
Some of us – and by that, I really just mean myself – are referring to this as the digital blockage of the Suez Canal. One digital boat released a bad line of code that turned sideways and we all had to deal with the repercussions. You can tell that I really put a lot of thought into that comparison.
But I use that comparison to say that things like this happen – more often than we'd like – and often outside of our control. And without a proper plan in place, the impact of something like this on your fleet could be felt for years to come.
Let's get into it.
What Happened: The CrowdStrike Outage
So let's start by talking about what happened. Early Friday morning, CrowdStrike – a cybersecurity company – released a faulty update that brought down Microsoft Windows systems around the world. I didn't fully understand the impact until somebody described it to me as "this is what we expected Y2K to feel like." It's been called the biggest IT outage in history and, childhood trauma aside, this update had significant ramifications across many industries.
Airlines were grounded. Shipping companies were halted. Emergency services went offline. Hospitals lost all digital charting. Banks were unable to process payments. And worst of all, Universal Studios in Osaka, Japan couldn't sell tickets.
Needless to say, it was a huge event, and it shows just how fragile our systems can be in the digital age. Fortunately, the service disruptions were temporary and many systems were back online by the end of the day.
Preparing Your Fleet for Outages
As evidenced by how widespread this outage was, there's not much that fleets could have done to prevent it. There were some ways to get systems restored quickly, but those involved applying a temporary workaround on every single impacted device. Many around the world just had to wait it out until a proper fix was in place.
And while this one was on a global scale, outages can come in many different forms – from natural disasters, poor power grids, nearby construction, or it could even be hyper local to your organization or your business systems.
I wanted to take a few minutes and chat through some of the takeaways from this event for fleets, and recommend some changes to your processes that can help you think like a prepper when something like this happens again.
And I want to be clear that I have no ill feelings towards preppers, but for the sake of comparison, it makes me think of food rations in a basement that you hope to never have to touch. And I like to hope that you never have to touch the rations of your preparation when it comes to global technology outages.
Anyways, let's talk about what you can do for the next outage.
Steps for Fleet Managers During Outages and Crises
1. Ensure the safety of your fleet team
Before we get into action items, let's remember that your number one priority as a fleet manager is to guarantee the safety of your team. All of these steps can be implemented to try and continue your fleet management operations, but if the surrounding environment that created an outage is not safe for your fleet, then everything else is moot.
This specific outage was purely digital, but if it's an outage due to an environmental disaster, then you need to assess the external situation before you decide to pursue anything internally.
2. Implement systems with data backups to minimize long-term impact
Now that I've gotten that disclaimer out of the way, here are some steps you can take preemptively to minimize the impact for future outages.
First, you should make sure that you're implementing systems with proper backups in place. It's one thing to lose access for a short period of time, but it's another entirely if the outage is due to lost data and you're not able to restore your systems once the outage subsides.
This doesn't mean that you have to store everything on a local hard drive, but rather that you should make sure you're working with providers who have proper data management processes in place.
We were fortunate that Fleetio, our fleet maintenance management software, was not impacted by this outage, but if it were to go offline for some other reason, we can be confident that all the data would still be available, because it's backed up in two different time zones.
Data Backup Best Practices
On the topic of data backups, I talked with John Anderson, the Security and Compliance Manager at Fleetio to get a few best practices for those of you at home.
- Create Frequent Backups of Important Data: First, he recommended that you set up frequent backups of your most important data – every 24 hours if possible.
- Distribute Your Data Geographically: Next, he recommended that you back up your data in multiple time zones. You don't want to store all your data in Florida, only for a hurricane to hit and temporarily knock it offline. Instead, store data in Florida, but back it up in California.
- Implement a Backup Testing Schedule: And lastly, he said that you should test your data backups at least once a year, but as often as quarterly if you can. People often create backups, but lose all the data anyways when they go to restore a backup and realize the process is broken.
By ensuring that your data is secure and backed up in several physical locations, you can be confident that any operational impact from an outage is limited to the duration of the outage, rather than having to locate and restore historical records after the fact.
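As a rough illustration, the three practices above (frequent backups, geographic distribution, regular restore tests) could be expressed as an automated check. This is a minimal sketch, not Fleetio's process: `BackupRecord`, `check_backup_policy`, the region names, and the default thresholds (24 hours, two regions, one year) are all hypothetical names and values chosen to mirror the recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupRecord:
    """Hypothetical description of one backup copy (not a real provider's schema)."""
    region: str                  # where the copy is stored, e.g. "florida"
    created_at: datetime         # when the backup was taken
    last_restore_test: datetime  # when a restore from this copy was last verified


def check_backup_policy(backups, now,
                        max_age=timedelta(hours=24),
                        min_regions=2,
                        max_test_gap=timedelta(days=365)):
    """Return a list of problems; an empty list means every check passed."""
    if not backups:
        return ["no backups found"]

    problems = []

    # Frequent backups: the newest copy should be no older than max_age.
    newest = max(b.created_at for b in backups)
    if now - newest > max_age:
        problems.append("newest backup is older than the allowed age")

    # Geographic distribution: copies should live in at least min_regions places.
    regions = {b.region for b in backups}
    if len(regions) < min_regions:
        problems.append("backups are stored in too few regions")

    # Restore testing: the most recent restore test should be recent enough.
    latest_test = max(b.last_restore_test for b in backups)
    if now - latest_test > max_test_gap:
        problems.append("no restore test within the allowed window")

    return problems


# Example: two fresh copies in different regions, restore-tested 60 days ago.
now = datetime(2024, 7, 24, 12, 0)
backups = [
    BackupRecord("florida", now - timedelta(hours=6), now - timedelta(days=60)),
    BackupRecord("california", now - timedelta(hours=6), now - timedelta(days=60)),
]
print(check_backup_policy(backups, now))  # → []
```

Passing `max_test_gap=timedelta(days=90)` would enforce the stricter quarterly testing schedule instead of the annual minimum.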
3. Have a manual reporting system in place that you can fall back on
Another step that you can take as a fleet manager is to have a system of analog processes that you can use for a short period of time, just to endure the outage. And not just to have these processes, but to make sure your team knows how to use these processes.
My wife is a nurse at a hospital (that was impacted by this, by the way) and a hospital can't just stop operations, so they practice paper charting once a month. By having their team familiar with these paper processes, they're able to adjust quickly and continue their patient care.
Once their outage is complete, they have a recovery timeline for efficiently getting all charts and notes taken during the outage uploaded into their digital charting system.
I'm not saying that you need to have fire drills every month and force your team to go back to paper inspection forms just so they're ready in a time of crisis. Instead, have clear and available standard operating procedures around how your fleet operates during individual events, and make sure you communicate these steps with your team, so they know where to find the SOPs and what their roles are during this time.
You should have multiple binders with detailed instructions, as well as any forms or templates necessary to continue tracking your fleet operation during the outage. Once the outage is complete, you can go back through your notes to get the missing information uploaded into your fleet management system.
4. Have multiple communication channels in place in case any go offline
And just as you should be communicating these processes in advance, you should also have clear lines of communication ready during an outage event.
You should have multiple channels for broadcasting messages to team members – whether through email, internal communications, text chains, or physical meetings. There are many different ways that you can communicate, each with varying vulnerabilities and degrees of reliability. However, by having all of these systems in place in advance, you can almost always find one broadcasting channel that is not impacted by an outage.
I feel like I just justified Ryan's idea for WUPHF in The Office and I want to briefly apologize for that.
In your message to the team, you should clarify the problem, the plan, any resources related to the plan, and multiple ways to contact you in case they have any additional problems or questions.
Communication is often the first thing to break down in the event of an outage, because we're all spoiled with immediate availability through channels like Slack and Microsoft Teams. Establishing clear communication channels allows you to be nimble and adjust operations on the fly.
The main reason that Peyton Manning was a great quarterback is that he was able to read a defense before the snap and adjust the play according to what he saw. As a fleet manager in a crisis or an outage event like this, your job is to see what's happening as it's happening and call an audible to adjust your plan and set your team up for success.
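For teams that automate their alerts, the message contents and redundant channels described in this section could be sketched roughly as follows. This is only an illustration under stated assumptions: `OutageNotice`, `broadcast`, and the stub channel functions are hypothetical and stand in for whatever email, text, or chat tools a fleet actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OutageNotice:
    """Hypothetical message template: problem, plan, resources, contacts."""
    problem: str
    plan: str
    resources: List[str] = field(default_factory=list)
    contacts: List[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [f"PROBLEM: {self.problem}", f"PLAN: {self.plan}"]
        if self.resources:
            lines.append("RESOURCES: " + "; ".join(self.resources))
        if self.contacts:
            lines.append("CONTACT: " + "; ".join(self.contacts))
        return "\n".join(lines)


def broadcast(notice: OutageNotice,
              channels: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Try every channel and return the names of those that delivered."""
    text = notice.render()
    delivered = []
    for name, send in channels.items():
        try:
            if send(text):
                delivered.append(name)
        except Exception:
            # A channel that is down should not stop the others from being tried.
            continue
    return delivered


# Stub channels for illustration only; real senders would call actual services.
def email_down(text: str) -> bool:
    raise ConnectionError("mail server unreachable")


def sms_ok(text: str) -> bool:
    return True


notice = OutageNotice(
    problem="Fleet management system is offline",
    plan="Switch to paper inspection forms from the SOP binder",
    resources=["SOP binder in the dispatch office"],
    contacts=["Shop phone", "Manager's cell"],
)
print(broadcast(notice, {"email": email_down, "sms": sms_ok}))  # → ['sms']
```

Sending on every channel rather than stopping at the first success reflects the idea that each channel has different vulnerabilities, so at least one is likely to get through.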
5. Remember to take care of yourself
But also, remember that you aren't expected to solve every issue yourself. Many times things are out of your control. Even the best crisis response plan is going to be challenged in a situation like this and you do not need to take personal responsibility to right the ship.
Lean into your strengths. Surround yourself with people who fill in your weaknesses (I'm not a data security expert, that's why I talked to ours, John, earlier because I knew that I would make a fool of myself if I tried to pretend to be the expert). And realize that sometimes things are going to go wrong, and you just need to wait it out and figure out how to fit the pieces back together after everything stabilizes.
And even if you can take control of a situation like this, it might be an extremely challenging period and you will need to create space to recharge. Being a fleet manager often requires you to always be on call, and you can easily end up feeling burnt out or worse.
Remember to take care of yourself and disconnect from work when you get a chance, because situations like this are always temporary and the fleet will still need your oversight when normalcy is restored.
Key Takeaways
That's all I have for today's episode of The Fleet Code. I hope your fleet was minimally impacted by last week's outage and that you had a nice restful weekend.
Here are the key takeaways from today's episode:
- Above all else, make sure the environment is safe for your team to operate in.
- Implement and test your data backups to reduce the chance of a long-term impact to your fleet.
- Have a plan in place, as well as resources to help implement the plan, so your team knows exactly what to do.
- Establish communication channels early so that you can keep your team updated on the latest.
- And last, but absolutely not least, make sure you take care of yourself.
As a reminder, The Fleet Code is brought to you by Fleetio. If you're looking for the best way to run reports and track the success of your fleet operation, Fleetio's fleet management system brings all of your fleet data into one system so you can set goals and surface any gaps immediately. You can learn more about Fleetio at fleetio.com - that's f-l-e-e-t-i-o.com.
I've included a few resources around today's topic in the episode description in case you want to add that to your underground closet of fleet management rations.
Make sure you subscribe to The Fleet Code on your podcast platform of choice to keep up with the latest tips and tricks for fleet managers. Leave a review or rating if you're into that kind of thing. If you have a topic that you'd like us to cover, send us an email at podcast@fleetio.com and let us know. Subscribe to our newsletter and follow @fleetio on social media for even more fleet management best practices.