What I learned from the OVHcloud data center fire

Posted on May 28, 2021 · 7 mins read

I have been an OVH customer for more than five years now. I run some apps and deploy a bunch of lightweight websites for myself or my clients on their cloud. This includes my website, other apps' landing pages, ongoing startup POCs, client demos, and more.
In March 2021, a fire broke out at one of OVH's datacenters and took down the majority of services running there. https://www.reuters.com/article/us-france-ovh-fire-idUSKBN2B20NU

Unfortunately, I was one of the affected users, and all my apps and servers running on VPSs went down.

Fortunately, the disaster didn't affect the serious apps I run; those were all hosted on dedicated servers.
Seeing my apps disappear instantly taught me a lot about how to prepare for a disaster and how to react as quickly as possible.


The majority of apps I hosted were not dependent on any particular host: they are either containerized apps or static bundles.

Here's a minimal set of measures to take into account to deal with such a situation. They will not protect the datacenter from fire, but they will at least speed up your web app's recovery.

Notify your users/clients

When your online service or client app goes down, the first thing to do is to notify your customers. Regardless of the reason, bad things happen, and when something goes wrong a message should be sent to all the users concerned.
Show people that you care, even with a short, helpful message.
We have already seen many web platforms go down recently. Remember Slack, Gmail, Google Calendar, GitHub …
So think about a bulk mailing service for future incidents; you can always find one for a few pennies.

Do not trust cloud provider

No doubt the cloud is the de facto choice for hosting web apps today. I have been working with medium to big clients for a while now; in the last seven years, only one chose to build and deploy on its own infrastructure. But the cloud is just a cluster of physical servers (similar to your desktop, but more powerful) that can go down at any time. So review your cloud provider's SLA and DR services, and never take their service for granted.

Automate almost everything

From code commit to new-version deployment, all tasks should be automated and packaged within your project. For CI/CD you may take a look at Drone, a lightweight tool that runs CI/CD pipelines in a fully containerized mode. There are a lot of other cool tools for automating almost everything in your application's lifecycle.
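As a sketch, a Drone pipeline is described in a `.drone.yml` file at the root of your repo. The image names, commands, and registry repo below are assumptions for a hypothetical Node.js app; adapt them to your stack:

```yaml
kind: pipeline
type: docker
name: default

steps:
  # run the test suite on every push
  - name: test
    image: node:16
    commands:
      - npm ci
      - npm test

  # build and push a container image (plugins/docker is Drone's Docker plugin)
  - name: publish
    image: plugins/docker
    settings:
      repo: myorg/myapp        # hypothetical registry repo
      tags: ${DRONE_COMMIT_SHA}
```

Every step runs in its own container, so the pipeline itself is portable across hosts.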

Package manager / Dependencies upgrading

All of your application's dependencies must be ready for a fresh installation at zero cost. Use the package manager that suits your web app: package.json for Node, Gemfile for Ruby, requirements.txt for Python, and so on. Keep your dependencies at their latest versions and upgrade them regularly. A license change in a single dependency, for instance, once affected hundreds of Rails-based websites.
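For instance, a minimal `package.json` for a hypothetical Node app declares everything a fresh host needs, so `npm install` reproduces the same environment anywhere (names and versions below are illustrative):

```json
{
  "name": "my-app",
  "version": "1.0.0",
  "scripts": {
    "start": "node server.js"
  },
  "dependencies": {
    "express": "^4.18.2"
  }
}
```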

Containerization is not an option

Containers (Docker or the like) are no longer an option. Your application should run as a container so you can move it from one host to another with zero changes to your code.
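As an illustration, here is a minimal Dockerfile for a hypothetical Node.js app; with this file in the repo, any host that runs Docker can run the app unchanged (the file names and port are assumptions):

```dockerfile
FROM node:16-alpine
WORKDIR /app
# install dependencies first to take advantage of layer caching
COPY package*.json ./
RUN npm ci --omit=dev
# copy the rest of the source
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
```

Build it with `docker build -t my-app .` and run it on any host with `docker run -p 3000:3000 my-app`.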

Backups

The most important asset in a web app is the data. Setting up a cron job to back up your data must be considered from day one.
Data backups include:

  • Databases

  • File uploads

  • Logs

  • Errors & analytics

Below is a boilerplate script that backs up a PostgreSQL database and compresses the dump with gzip. The script assumes a Postgres server running in a container. Replace each <some_value> placeholder with your own values.

#!/bin/sh
DIRNAME=<your_backup_folder_location>
# create the backup folder if it doesn't exist yet
if [ ! -d "$DIRNAME" ]
then
       echo "backup folder does not exist yet, creating it"
       mkdir -p "$DIRNAME"
else
       echo "backup folder already exists"
fi
BACKUP_FILE_NAME="backup-$(date +'%y-%m-%d-%H-%M')"
PATH_TO_FILE="$DIRNAME/$BACKUP_FILE_NAME"
# dump the database into a dated file
docker exec <container_name> /usr/bin/pg_dump -U <your_pg_user> <your_db_name> > "$PATH_TO_FILE"
# compress the dump
gzip -c "$PATH_TO_FILE" > "$PATH_TO_FILE.gz"
# send the compressed file to your remote
rsync -a "$PATH_TO_FILE.gz" <your_remote_file_location>
# CLEAN folder
# PUSH to another server
# SEND EMAIL …

Note that <your_remote_file_location> can be an SSH remote, an S3 bucket, or any other remote backup server.


You can configure a simple cron job using crontab at any interval you wish.
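For example, assuming the script above is saved at /opt/scripts/backup.sh (a hypothetical path), this crontab entry runs it every day at 02:00 and appends its output to a log:

```
# edit your crontab with: crontab -e
# m h dom mon dow  command
0 2 * * * /opt/scripts/backup.sh >> /var/log/backup.log 2>&1
```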

Healthchecker

One of the easy steps you can take right now is to set up a health-check system for your app. You can use a lightweight service that pings your server or your API and notifies you when things go wrong (slow response, no response, …). Check healthchecks.io for more info.
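healthchecks.io works the other way around from a classic monitor: you create a check in its dashboard, get a ping URL, and hit that URL from your job; if the pings stop arriving on schedule, you get alerted. As a sketch, the backup cron job from above could report success like this (the script path and check UUID are placeholders):

```
# crontab entry: ping healthchecks.io only when the backup succeeds
0 2 * * * /opt/scripts/backup.sh && curl -fsS -m 10 --retry 5 https://hc-ping.com/<your-check-uuid> > /dev/null
```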

503 status page

When your website goes down, your domain must stay reachable. Configure your deployment to display a maintenance page with a 503 status code when the app is down.

Just imagine you own a physical store and one day you close the door for some reason: you cannot serve any customers, but your store is still there.
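One common way to do this, assuming nginx sits in front of your app, is a flag file that flips the site into maintenance mode (the domain, paths, and port below are hypothetical):

```nginx
server {
    listen 80;
    server_name example.com;              # your domain

    # serve the maintenance page for any 503 we generate
    error_page 503 @maintenance;

    location / {
        # `touch /var/www/maintenance.on` enables maintenance mode,
        # removing the file disables it
        if (-f /var/www/maintenance.on) {
            return 503;
        }
        proxy_pass http://127.0.0.1:3000;  # your app
    }

    location @maintenance {
        root /var/www;
        rewrite ^ /maintenance.html break;
    }
}
```

The nice property of the flag file is that you can switch to the maintenance page even while the app behind nginx is completely gone.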

Status page

A status page is a nice feature that shows the status and health of each of your services.
Check out this list

Other dedicated solutions

There are many dedicated solutions for full disaster recovery. Almost all cloud providers offer their own DR solution; you can use their service or look at other products.


This was a list of quick remedies to help you catch up after a sudden disaster that can shut down your services and wipe out your data.