I deleted the production database by accident 💥

October 17, 2020



Today at around 10:45pm CET, after a couple of glasses of red wine, I deleted the production database for my online product (KeepTheScore.com, an online scoreboard app) by accident 😨. Over 300,000 scoreboards and their associated data were vaporised in an instant. By the way, I’m a one-man show building a software product for a living.

Thankfully, my database is a managed database from DigitalOcean, which means DigitalOcean automatically backs it up once a day. After 5 minutes of blind panic, I put the website into maintenance mode and worked on restoring a backup. At around 11:15pm CET, 30 minutes after the disaster, I went back online. However, 7 hours of scoreboard data were gone forever 😵.

To be precise, any scoreboards created or scores added on 17 October 2020 between 15:47 CET and 23:21 CET have been lost. I am extremely sorry about this.
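Because the restore came from a once-a-day backup, everything written after that snapshot was lost. As a quick sanity check, the length of the loss window follows directly from the two timestamps quoted above (both CET, on 17 October 2020):

```python
from datetime import datetime

# Start and end of the loss window, as stated in this post (CET, same day)
loss_start = datetime(2020, 10, 17, 15, 47)
loss_end = datetime(2020, 10, 17, 23, 21)

lost_window = loss_end - loss_start
print(lost_window)  # 7:34:00
```

That is roughly the "7 hours" mentioned above. With daily backups, the worst case is close to a full day.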

Production Disaster

What happened?

The function that wiped the database was meant to drop and recreate all tables in my local development database. Because of a misconfiguration, it ran against the production database instead and wiped it.

Here is the code that caused the disaster:

```python
def database_model_create():
    """Only works on localhost to prevent catastrophe"""
    # Credentials are read from the development config class
    database = config.DevelopmentConfig.DB_DATABASE
    user = config.DevelopmentConfig.DB_USERNAME
    password = config.DevelopmentConfig.DB_PASSWORD
    port = config.DevelopmentConfig.DB_PORT
    # The host is hardcoded to localhost
    local_db = PostgresqlDatabase(database=database, user=user, password=password, host='localhost', port=port)
    # Drop every table (destroying all data), then recreate them empty
    local_db.drop_tables([Game, Player, Round, Score, Order])
    local_db.create_tables([Game, Player, Round, Score, Order])
    print('Initialized the local database.')
```

The host is hardcoded to localhost, so the function should only ever touch the database on my development machine. However, because of an incorrect environment variable setting, it ended up running against the live database instead. With Python Flask, you must run export FLASK_ENV=development to make sure you are running in a development environment. Argh 🙈.
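One detail worth checking, assuming the models are peewee models (the PostgresqlDatabase call suggests so, but that is an assumption): peewee's Database.drop_tables() and create_tables() call each model's own drop_table()/create_table(), which run against the database that model is bound to, not necessarily the object the method was called on. A hardcoded 'localhost' therefore may not protect anything if the models were bound to the production database by the active config. A minimal sketch of an extra safety net follows; the names assert_safe_to_reset, reset_database, UnsafeDatabaseReset and LOCAL_HOSTS are hypothetical and not part of Flask, peewee or the original code.

```python
import os

# Hypothetical allow-list of hosts that are considered safe to wipe
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


class UnsafeDatabaseReset(RuntimeError):
    """Raised when a destructive reset is attempted outside development."""


def assert_safe_to_reset(env, db_host):
    """Raise unless we are explicitly in development and the target host is local."""
    if env != "development":
        # An unset or wrong FLASK_ENV blocks the reset instead of falling through
        raise UnsafeDatabaseReset(f"FLASK_ENV is {env!r}, expected 'development'")
    if db_host not in LOCAL_HOSTS:
        # Pass the host the models are actually bound to, not the intended one
        raise UnsafeDatabaseReset(f"database host {db_host!r} is not local")


def reset_database(drop_and_create, db_host, env=None):
    """Call drop_and_create() only after both safety checks pass."""
    if env is None:
        env = os.environ.get("FLASK_ENV")
    assert_safe_to_reset(env, db_host)
    drop_and_create()
```

Usage would look like reset_database(lambda: ..., db_host=..., env=...), where the callable wraps the actual drop/create calls. The design choice is to fail closed: both the environment and the real target host must look like development, so a single wrong setting is not enough to wipe anything.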

What have I learned? Why won’t this happen again?

I’ve learned that having a function that deletes your database is too dangerous to have lying around. The problem is that you can never really test the safety mechanisms properly, because testing them would mean pointing a gun at the production database.
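If such a function has to exist at all, a common mitigation is to make it hard to trigger by accident, for example by requiring the operator to retype the name of the database that is about to be wiped. A minimal sketch; confirm_destructive_action and the database name used below are hypothetical, and ask defaults to Python's built-in input().

```python
def confirm_destructive_action(db_name, db_host, ask=input):
    """Return True only if the operator retypes the exact database name.

    The prompt shows the host, so a production address is hard to miss.
    """
    answer = ask(
        f"About to DROP ALL TABLES in {db_name!r} on {db_host!r}.\n"
        "Type the database name to continue: "
    )
    # Exact match only: an empty answer or a reflexive 'y' does not count
    return answer.strip() == db_name
```

A reset script would then start with something like if not confirm_destructive_action("scores_dev", "localhost"): return. It does not make the function safe, it only adds a deliberate step between a tired operator and the drop.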

I’ve learned that having a backup that allows a quick recovery is absolutely essential. Thanks, DigitalOcean, for making this part reliable and simple.

I’ve learned that even a disaster can have some upsides. This blog post generated a lot of interest. When life gives you citrus fruits… and so on.

The truth is, I can never be 100% sure that something like this won’t happen again. Computers are just too complex, and there are days when the complexity gremlins win. However, I will figure out what went wrong and ensure that this particular error doesn’t happen again.
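For this particular error, one option is to make configuration selection fail loudly when FLASK_ENV is missing or misspelled, rather than quietly using a default (in the Flask versions of that time, an unset FLASK_ENV is treated as "production"). A minimal sketch; the two config classes and select_config are hypothetical stand-ins for the app's real config module, and the production host is a placeholder.

```python
import os


class DevelopmentConfig:
    # Hypothetical stand-in for config.DevelopmentConfig
    DB_HOST = "localhost"


class ProductionConfig:
    # Hypothetical; the real production host is not given in the post
    DB_HOST = "db.example.invalid"


CONFIGS = {
    "development": DevelopmentConfig,
    "production": ProductionConfig,
}


def select_config(environ=None):
    """Pick the config class for FLASK_ENV, refusing to guess if it is missing."""
    if environ is None:
        environ = os.environ
    env = environ.get("FLASK_ENV")
    if env not in CONFIGS:
        # No silent default: a missing or misspelled value stops the app
        raise RuntimeError(f"FLASK_ENV must be one of {sorted(CONFIGS)}, got {env!r}")
    return CONFIGS[env]
```

The trade-off is that production deployments must now set FLASK_ENV=production explicitly. In return, a forgotten export fails at startup instead of silently pointing development tooling at live data.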

Some perspective

Thankfully, nobody’s job is at risk due to this disaster. I am not going to fire the developer, because I am the developer.

Also, this web app is just a side project (Update: this is no longer true). It’s not the software that’s running a power plant. Nonetheless, I have many users, some of them paying customers, and I try my very best to make them happy. Today I let those users down, and that hurts.

The wonderful irony is that just 4 days earlier I had tweeted a hilarious meme about deleting your production database.

Epilogue

This post generated an active and controversial discussion on Hacker News. The top comment was:

I’m appalled at the way some people here receive an honest postmortem of a human fuck-up. The top 3 comments, as I write this, can be summarized as “no, it’s your fault and you’re stupid for making the fault”. This is not good! We don’t want to scare people into writing less of these. We want to encourage people to write more of them. An MBA style “due to a human error, we lost a day of your data, we’re tremendously sorry, we’re doing everything in our power yadayada” isn’t going to help anybody.

Yes, there’s all kinds of things they could have done to prevent this from happening. Yes, some of the things they did (not) do were clearly mistakes that a seasoned DBA or sysadmin would not make. Possibly they aren’t seasoned DBAs or sysadmins. Or they are but they still made a mistake.

This stuff happens. It sucks, but it still does. Get over yourselves and wish these people some luck.

You can follow my journey on LinkedIn.

Photo by Niko Virtanen, licensed under Creative Commons.