The January 2014 outage: a buggy upgrade script disrupted Dropbox for two days
January 2014
A subtle bug in a maintenance script reinstalled the operating system on a small number of active production database machines. Dropbox went offline on Friday 10 January 2014, and full service was not restored until Sunday 12 January.
What happened
On Friday 10 January 2014, around 5:30 PM Pacific, Dropbox went down during what was supposed to be routine OS-upgrade maintenance. A subtle bug in the upgrade script caused it to reinstall the operating system on a small number of machines that were still actively serving production traffic, including some master-replica database pairs. With those databases knocked out, the service became unavailable.
Dropbox restored most functionality within about three hours, but full recovery dragged on because some of the affected MySQL databases were very large and slow to rebuild from backups. Core service was not fully restored until Sunday 12 January, roughly two days after the incident began. In a detailed engineering post-mortem, VP of Infrastructure Akhil Gupta explained the root cause and the fixes: a new verification step requiring machines to confirm their own state before executing destructive commands, and a tool to parallelize MySQL binary-log replay so future restores would be faster.
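The first of those fixes, having a machine confirm its own state before a destructive command runs, can be illustrated with a minimal sketch. This is not Dropbox's implementation: the `MachineState` fields, the `verify_safe_to_reinstall` check, and the `reinstall` wrapper are all hypothetical names chosen for illustration, and a real check would query the host rather than trust a pre-filled record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineState:
    """Hypothetical self-reported state of a host; field names are illustrative."""
    hostname: str
    serving_traffic: bool
    active_db_connections: int
    marked_for_reinstall: bool


class UnsafeOperation(Exception):
    """Raised when a host's own state contradicts the planned destructive action."""


def verify_safe_to_reinstall(state: MachineState) -> None:
    """Raise unless the host reports it is idle and was scheduled for reinstall."""
    reasons = []
    if not state.marked_for_reinstall:
        reasons.append("host is not marked for reinstall")
    if state.serving_traffic:
        reasons.append("host reports it is serving production traffic")
    if state.active_db_connections > 0:
        reasons.append(f"{state.active_db_connections} active database connections")
    if reasons:
        raise UnsafeOperation(f"{state.hostname}: " + "; ".join(reasons))


def reinstall(state: MachineState, do_reinstall) -> bool:
    """Run the destructive step only after the host-side check passes."""
    try:
        verify_safe_to_reinstall(state)
    except UnsafeOperation as err:
        print(f"skipped: {err}")
        return False
    do_reinstall(state.hostname)  # caller-supplied destructive action
    return True
```

The point of the design is that the decision to wipe a machine no longer rests only on the orchestrating script's list of targets; the target itself gets a veto, so a wrong list fails closed.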
Dropbox stressed that no file data was lost: the affected databases held metadata, and users' files were never at risk.
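The second fix, parallelizing binary-log replay, rests on a general idea: changes that do not depend on each other can be replayed concurrently, provided order is preserved within each independent group. The sketch below shows that idea only. It does not parse real MySQL binary logs or reflect how Dropbox's tool works; the `(schema, statement)` event shape, the assumption that no statement spans schemas, and the `apply_statement` callback are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_schema(events):
    """Group (schema, statement) events by schema, keeping original order per group."""
    groups = defaultdict(list)
    for schema, statement in events:
        groups[schema].append(statement)
    return dict(groups)


def replay_in_parallel(events, apply_statement, max_workers=4):
    """Replay each schema's statements in order, with schemas running concurrently.

    Assumes no statement touches more than one schema; a real tool would
    have to detect and serialize such cross-schema changes.
    """
    groups = partition_by_schema(events)

    def replay_one(schema):
        for statement in groups[schema]:
            apply_statement(schema, statement)
        return schema, len(groups[schema])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Returns a mapping of schema name to number of statements replayed.
        return dict(pool.map(replay_one, groups))
```

Under these assumptions, restore time is bounded by the largest single group rather than by the total log volume, which is why this kind of tooling shortens recovery for deployments with many independent databases.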
Impact
The outage was one of the most visible reliability failures in Dropbox's history, taking down a service that tens of millions of people and businesses relied on for access to their files for the better part of a weekend. It became a widely cited case study in how a single automation bug can cascade into a multi-day outage, and in the value of public, technical post-mortems. The slow database restore also exposed how recovery time, not just failure prevention, is a core reliability concern for cloud storage.