Pardon the Interruption: Jul 23rd Downtime Postmortem

Early this morning (U.S. time), Fleetio was inaccessible for almost 5 hours. Although nothing was wrong with our software or servers, our service was still unavailable for an extended period. That’s not acceptable to us, and it’s not in line with our vision to be the best cloud-based fleet management software provider.

Here’s what we learned, and how we’re fixing it going forward.

What happened?

DNS is sort of like the phonebook of the internet. When you type “fleetio.com” into your browser, DNS is how the browser finds out which server to go to.
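For the technically curious, that lookup can be sketched in a few lines of Python. This is only an illustration: `lookup_address` is a hypothetical helper name, and the default resolver is the standard library's `socket.gethostbyname`.

```python
import socket


def lookup_address(hostname, resolver=socket.gethostbyname):
    """Return the IP address for hostname, or None if the lookup fails.

    `resolver` is injectable so the sketch can be exercised without a network.
    """
    try:
        return resolver(hostname)
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when DNS cannot answer.
        return None
```

When the DNS provider cannot answer, the lookup fails and the browser never learns which server to contact, even if that server is running normally.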

We were using a 3rd-party vendor called Zerigo to manage DNS settings for Fleetio. We were happy with their service until this morning when we couldn’t reach our servers through the normal channels.

A quick check showed that Zerigo was in the midst of a denial-of-service attack, which made their service unavailable. Thousands upon thousands of otherwise perfectly fine websites were simultaneously inaccessible as a result, including Fleetio.

We were able to switch to a different 3rd-party DNS vendor fairly quickly. However, DNS changes can sometimes take hours to update across the internet. As of around 8:45 AM Central U.S. time this morning, things were pretty much back to normal.
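The delay comes from caching: resolvers across the internet remember an answer until its time-to-live (TTL) runs out, and only then ask again. The toy model below sketches that behavior. The class name, hostnames, addresses, and the 4-hour TTL are all assumptions chosen for illustration, not Fleetio's actual settings.

```python
class CachingResolver:
    """Toy resolver that remembers an answer until its TTL expires."""

    def __init__(self, authoritative, ttl_seconds):
        self.authoritative = authoritative  # dict: hostname -> address
        self.ttl_seconds = ttl_seconds
        self._cache = {}  # hostname -> (address, expires_at)

    def resolve(self, hostname, now):
        cached = self._cache.get(hostname)
        if cached is not None and now < cached[1]:
            # Still fresh: the authoritative record is not consulted.
            return cached[0]
        address = self.authoritative[hostname]
        self._cache[hostname] = (address, now + self.ttl_seconds)
        return address


# Hypothetical record and TTL, for illustration only.
records = {"app.example.test": "192.0.2.10"}
resolver = CachingResolver(records, ttl_seconds=4 * 3600)

first = resolver.resolve("app.example.test", now=0)
records["app.example.test"] = "198.51.100.20"  # the record is changed
during_ttl = resolver.resolve("app.example.test", now=3600)     # 1 hour later
after_ttl = resolver.resolve("app.example.test", now=4 * 3600)  # TTL expired
```

In this sketch, `during_ttl` is still the old address, and only `after_ttl` picks up the new one. Each resolver on the internet runs its own clock like this, which is why a change appears gradually rather than all at once.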

Going forward

From now on, we will use multiple 3rd-party DNS vendors so that if one vendor has an issue, another will pick up the slack. We should have done this in the first place, and we have certainly learned a valuable lesson.
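The idea behind using more than one vendor can be sketched as a simple failover loop. In practice, resolvers do this themselves by trying the other name servers listed for a domain when one does not respond. The function and vendor names below are hypothetical, and this is a simplified model rather than our actual configuration.

```python
def resolve_with_failover(hostname, providers):
    """Try each provider's lookup function in order; return the first answer.

    `providers` is a list of (name, lookup) pairs, where lookup(hostname)
    returns an address or raises OSError when that vendor is unreachable.
    """
    errors = []
    for name, lookup in providers:
        try:
            return name, lookup(hostname)
        except OSError as exc:
            errors.append(f"{name}: {exc}")
    raise OSError("all DNS providers failed: " + "; ".join(errors))


def vendor_a_down(hostname):
    # Simulates a vendor that is under attack and not answering.
    raise OSError("timed out")


def vendor_b_up(hostname):
    # Simulates a healthy second vendor (hypothetical address).
    return "192.0.2.10"


answered_by, address = resolve_with_failover(
    "app.example.test",
    [("vendor-a", vendor_a_down), ("vendor-b", vendor_b_up)],
)
```

Here the first vendor fails and the second one answers, so the lookup still succeeds. Only if every vendor is down at the same time does the lookup fail.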

We will also review all parts of our infrastructure to find and resolve any other risks. As a software-as-a-service (SaaS) provider of fleet management software, it’s our mission to ensure Fleetio is available when you need it. You can be certain that we will do whatever it takes to make that happen.