I awoke early Sunday morning (December 4, 2016) to find that this website was not reachable. Visitors and I were getting the dreaded (for a website owner) 500 error message.
I went to bed just after 2:00 a.m. that Sunday, so between that time and 7:00 a.m. when I woke up, something had gone wrong. I manage my own servers, so there’s no technical support to call. I had to do something, because a server that’s down means revenue will be down at the end of the month.
Based on experience, the first thing I did after logging into the server was check whether MySQL was running. It was not. A database-driven website has to have the database application up and running or there’ll be no connection. An attempt to restart MySQL failed, with an output that included a line about not being able to write to disk or something of that nature.
That pointed to a full-disk situation. Not good! Now I had to find out what application or applications had chewed up whatever free disk space I had left.
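If you want to confirm a full disk before digging any further, df will tell you. The snippet below is only a minimal sketch: the helper name usage_percent and the 90% threshold are assumptions for illustration, not something from that morning's session.

```shell
#!/bin/sh
# Print the percentage of space used on the filesystem holding a path.
# usage_percent is a hypothetical helper name, not a standard command.
usage_percent() {
    # -P forces the POSIX one-line-per-filesystem format;
    # column 5 is the "Capacity" percentage (for example 97%)
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

pct=$(usage_percent /)
# 90 is an arbitrary warning threshold chosen for this example
if [ "$pct" -ge 90 ]; then
    echo "Root filesystem is ${pct}% full"
else
    echo "Root filesystem is at ${pct}%"
fi
```

A value at or near 100 here matches the "cannot write to disk" symptom described above.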
That’s when I thought about the easiest tool to use when troubleshooting disk usage on a Linux server – ncdu.
If you’re not familiar with it, ncdu is an ncurses interface for du, the tool used for estimating file space usage on Linux distributions. du is installed out of the box, but ncdu is not, so if you wish to use it, first install it using one of the following commands.
# Install ncdu on Ubuntu- or Debian-based distributions
sudo apt install ncdu

# Install ncdu on Fedora or CentOS
sudo dnf install ncdu
So I had a problem (my website was down) that was the result of a full disk on the server. There were two solutions – one temporary (delete files not needed on the server) and another, much deeper one (find out what application was writing too much data to disk). The temporary solution was easy, and in about five minutes I had freed up almost 300 MB of disk space. That was more than enough to restart MySQL (and other services) and get the website back up.
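If you need candidates for that kind of quick cleanup, listing the largest files is one place to start. This is a sketch rather than a record of the files I removed: the helper name largest_files, the +100M threshold and the /var starting point are assumptions, and -printf requires GNU find.

```shell
#!/bin/sh
# List regular files above a size threshold under a directory, biggest first.
# largest_files is a hypothetical helper name; -printf needs GNU find.
largest_files() {
    dir=$1
    min=${2:-+100M}
    # -xdev stays on one filesystem; %s is the size in bytes, %p the path
    find "$dir" -xdev -type f -size "$min" -printf '%s\t%p\n' 2>/dev/null |
        sort -rn | head -n 20
}

# Example: files larger than 100 MiB under /var
largest_files /var +100M
```

Review the list before deleting anything; a large file is not necessarily an unneeded one.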
That out of the way, I deployed ncdu to get at the root cause of the problem. The application can be run from any directory, including the root directory. The output will tell you the disk usage of each file and directory, with the ability to drill down into any listed directory.
So I began my troubleshooting journey by typing the following command:
# Run ncdu on a Linux distribution
ncdu /
The output, and the main interface of ncdu, is shown in Figure 1. It’s a very simple and easy application to use – the arrow keys, ENTER/Return key and a few other keys are all you need to navigate its interface. Just by looking at what was displayed on the screen, I knew right away where most of the disk space was being used. But what I had completely forgotten was that I had assigned 4 GB of disk space to a swap file. Because DigitalOcean no longer recommends configuring swap on their servers, that’s 4 GB that I could reclaim. So installing and running ncdu had already paid dividends.
To drill down into a directory, you only have to select it and press ENTER.
Doing that on the /var directory further narrowed down where most of the disk space was being used. What I saw on this screen surprised me – OSSEC was using up almost 14 GB of disk space. That’s too much, and hints at a misconfiguration of OSSEC. If OSSEC is new to you, it’s a host-based IDS. More about it here.
By drilling much further down into the OSSEC directory, it was easy to see that OSSEC was the application chewing up far more disk space than it’s supposed to. So the problem had been identified. I’ll have to do something about OSSEC within the next few days.
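The same drill-down can be approximated without ncdu, which helps on a server where you can't install anything. The following is a rough sketch assuming GNU du and sort; dir_breakdown is a made-up helper name, and /var is just the directory from this post.

```shell
#!/bin/sh
# Show the size of a directory and each immediate subdirectory, largest first.
# dir_breakdown is a hypothetical helper name; needs GNU du and GNU sort.
dir_breakdown() {
    # -x: stay on one filesystem; -h: human-readable sizes;
    # --max-depth=1: report only the directory and its direct children
    du -xh --max-depth=1 "$1" 2>/dev/null | sort -rh | head -n 15
}

dir_breakdown /var
```

Run it again on whichever subdirectory tops the list to go one level deeper, the way pressing ENTER does in ncdu.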
By drilling down into the directory holding the website’s data, I was able to find a few archive files that I could delete. These were files from last year that I didn’t need any more. One was a 1.2 GB archive that was neatly tucked away in the bowels of WordPress. Now it’s gone.
So that’s how I spent the early hours of Sunday, December 4, 2016. It pays to know what to do when trouble strikes. One final comment on ncdu: though it provides much-needed information about each file and directory, it does not tell you when a resource was last modified or accessed, something you can get when calling du directly with its --time option. For example, if you run du as given in the following command:
# Run du with the --time option
du -chs --time /var/www/* | sort -rn | head
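The output should look something like the one below. By default, --time reports the last modification time of each entry (or of the newest file beneath it); pass --time=atime to see the last access time instead. Those timestamps are useful when looking for signs of unauthorized access. Also note that sort -rn compares only the leading number, not the K or G suffix, which is why the 8.0K entry is listed above the 5.7G ones; sort -rh orders human-readable sizes correctly.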
# Output of du with the --time option
8.0K    2015-07-13 11:13    /var/www/html
5.7G    2016-12-05 17:58    /var/www/sites
5.7G    2016-12-05 17:58    total
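If it's access times and a true size ordering you're after, a variant along these lines may be closer to what you want. Treat it as a sketch assuming GNU du and sort; du_by_atime is a made-up helper name, and access times are only as reliable as the filesystem's mount options (noatime or relatime) allow.

```shell
#!/bin/sh
# Per-entry sizes with last access times, sorted by actual size.
# du_by_atime is a hypothetical helper name; needs GNU du and GNU sort.
du_by_atime() {
    # --time=atime shows the newest access time found under each entry;
    # sort -rh understands size suffixes such as K, M and G
    du -sh --time=atime "$1"/* 2>/dev/null | sort -rh | head
}

du_by_atime /var/www
```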
Hope you find this little piece about ncdu useful. And I hope your Sunday didn’t start out like mine.
I like xdiskusage, which gives a nice graphic presentation of directory sizes.