Less than two hours after I logged into the admin end of this website, which is powered by WordPress, the site went offline, with a 502 Bad Gateway error. What the…?
So I logged in via ssh and noticed a serious lag between typing a letter and seeing it appear on the screen. I had a problem. And with latency like that, running top would not help. After verifying that all the server applications were running, I checked disk usage using df -h. The output hinted at the problem I was facing.
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda         40G   38G     0 100% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            992M   12K  992M   1% /dev
tmpfs           201M  320K  200M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1002M     0 1002M   0% /run/shm
none            100M     0  100M   0% /run/user
The server was out of disk space! But this is a Cloud server with 40 GB of disk space and only serving a couple of websites. What could have consumed all the disk space?
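One way to answer that question directly is to list the largest files under a suspect directory. The following is only a minimal sketch in Python, not what I actually ran at the time; the function name `largest_files` and the `/var/www` path in the usage note are assumptions for illustration.

```python
import heapq
import os


def largest_files(root, count=10):
    """Return up to `count` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat measures a symlink itself rather than its target.
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return heapq.nlargest(count, sizes)
```

Calling `largest_files("/var/www")` and printing each pair would show the biggest offenders first. Walking a whole filesystem this way can be slow on a server that is already struggling, so it is probably better pointed at one directory at a time.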
Just to recover some space and see if I could get the site back online, I decided to delete a few zip files that had served their purpose. Another df -h after that revealed that I had recovered 500 MB.
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda         40G   37G  500M  99% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            992M   12K  992M   1% /dev
tmpfs           201M  336K  200M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1002M     0 1002M   0% /run/shm
none            100M     0  100M   0% /run/user
That brought the site back online and put a smile on my face. And just at that point I recalled having installed a backup plugin. Perhaps the backups were the culprits. An ls -lh in the backup directory confirmed my suspicion.
-rw-r--r-- 1  93M Feb  8 03:00 database-backup-1417314413.sql
-rw-r--r-- 1  14M Jan 25 03:05 backup-1417314413-complete-2015-01-25-03-00-51.zip
-rw-r--r-- 1  11G Feb  1 03:08 backup-1417314413-complete-2015-02-01-03-00-02.zip
-rw-r--r-- 1 3.4G Feb  8 03:00 backup-1417314413-complete-2015-02-08-03-00-00.zip
Two huge backup files were in the directory and the latest had a timestamp that matched when the site went offline. Since I had already downloaded the previous backup, I deleted it from the server. Another df -h broadened my smile.
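Deleting old backups by hand works once, but it could be automated. Here is a rough sketch in Python that keeps only the newest few archives in a directory. The function name `prune_backups`, its parameters, and the filename pattern (guessed from the listing above) are all assumptions, and it defaults to a dry run so nothing is removed by accident.

```python
import glob
import os


def prune_backups(directory, pattern="backup-*-complete-*.zip", keep=1, dry_run=True):
    """Find all but the `keep` newest files matching `pattern`.

    Returns the stale paths, newest first. Files are deleted only
    when dry_run is False.
    """
    paths = glob.glob(os.path.join(directory, pattern))
    # Newest first, judged by modification time.
    paths.sort(key=os.path.getmtime, reverse=True)
    stale = paths[keep:]
    if not dry_run:
        for path in stale:
            os.remove(path)
    return stale
```

Something like this should only run with `dry_run=False` once the older archives have been copied off the server, which is the same condition I relied on before deleting mine.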
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda         40G   27G   11G  72% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            992M   12K  992M   1% /dev
tmpfs           201M  336K  200M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1002M     0 1002M   0% /run/shm
none            100M     0  100M   0% /run/user
Now the site is running like it’s supposed to. Letting the server run out of disk space was an oversight. The main lesson here is that I need a better backup strategy, probably one that backs up to a Cloud storage service. The other lesson is that I need a script that reports disk usage by email daily, because this should never happen again.
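The daily disk usage report mentioned above could look something like the following. This is a minimal sketch in Python under several assumptions: the function names `build_report` and `send_report` are made up for illustration, the 80% warning threshold is arbitrary, and it assumes a mail server is listening on localhost port 25, which may not be true on a given server.

```python
import shutil
import smtplib
from email.message import EmailMessage


def build_report(path="/", warn_percent=80):
    """Return (subject, body) describing usage of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    percent = usage.used / usage.total * 100
    status = "WARNING" if percent >= warn_percent else "OK"
    gib = 1024 ** 3
    subject = f"[{status}] Disk usage on {path}: {percent:.0f}%"
    body = (
        f"Filesystem holding {path}\n"
        f"Total: {usage.total / gib:.1f} GiB\n"
        f"Used:  {usage.used / gib:.1f} GiB\n"
        f"Free:  {usage.free / gib:.1f} GiB\n"
    )
    return subject, body


def send_report(sender, recipient, path="/", host="localhost", port=25):
    """Email the report through an SMTP server (assumed: localhost:25)."""
    subject, body = build_report(path)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

A call such as `send_report("root@example.com", "admin@example.com")`, with placeholder addresses swapped for real ones, could then be scheduled once a day with cron. The percentage may differ slightly from the Use% column in df, which accounts for blocks reserved for root.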