ncdu for the Win

While reading through some internal tool enhancement tickets at work the other day, I happened across a quick mention of a command line tool I hadn't yet seen, but which proved to have immediate value. The tool is ncdu, 'NCurses Disk Usage', which, as the man page states, "...provides a fast way to see what directories are using your disk space."

In the process of onboarding new sites to Acquia Cloud, it's not always clear where the lines have been drawn with regard to separating out code and media assets for a site. Drupal itself is quite flexible about where media can be stored, and custom PHP opens up the possibilities completely. Version control is not so forgiving, as loading media into a VCS can make a repository unusably slow. In order to maintain holistic efficiency for a project, it's helpful to know whether bulky files are stashed somewhere in an application. With this piece of the puzzle, it's possible to divert media assets out of the repo and into the file system.

This is where ncdu comes in. Regular old du is a handy tool indeed, but it requires a lot of iterative manual steps to walk an entire directory tree. By contrast, ncdu drops you into an interactive screen with a simple graph showing where the heaviest files are located. You can quickly navigate through the tree and find those big files in no time! Note: calculating disk usage on a large file system is still going to take some time to crunch.
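For a rough non-interactive approximation of what ncdu shows, plain du plus sort works in a pinch. A minimal sketch (the target path is a placeholder, and the du/sort flags assume GNU coreutils):

```shell
# Interactive browse of a directory tree (install ncdu first, e.g. apt-get install ncdu):
#   ncdu /path/to/docroot
# Rough non-interactive equivalent: heaviest entries first.
TARGET="${1:-.}"                              # placeholder path; defaults to the current dir
du -sh "$TARGET"/* 2>/dev/null | sort -rh | head -n 10
```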

GoAccess Plugin: UTC Support Added

In other news, the GoAccess Shell Plugin is freshly outfitted with UTC time specification. In addition to being able to filter by a start time X hours or minutes ago, you can now pass a UTC argument to hit a specific time range in an access log. To have the script return values starting at UTC 9:30am, you can pass --time=0930. The new change is particularly helpful for homing in on that 3-5 minute period of downtime where you want to determine if there were any anomalies in the traffic pattern.
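Under the hood, a spec like 0930 has to become a timestamp that can be compared against log entries. A minimal sketch of that conversion (my own illustration, not the plugin's actual code; variable names are assumptions, and the date call assumes GNU coreutils):

```shell
T=0930                                   # value passed via --time=0930
HH=${T%??}; MM=${T#??}                   # split HHMM into hour and minute
# Build an Apache-style timestamp prefix for today at that UTC time.
START=$(date -u +"[%d/%b/%Y:${HH}:${MM}:00")
echo "$START"
```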

I made a bunch of big changes to the GoAccess shell plugin, including improved argument handling, fewer environment-specific assumptions, and more configurable time filtering. I also converted the plugin from a sourced script to a regular bash script. This meant the script's components had to be reordered into a sequential flow, but it also makes the script more portable and easier to fire up.

The script options now support short and long call values, courtesy of Stack Overflow. So the log file location can be designated with either -l or --location, and the same is true of each option. I also made the options more intuitive to use by ensuring that options such as --report and --open can be called without having to pass an associated extra value such as '1' or 'yes'.
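The short/long pairing can be handled with a plain while/case loop. The sketch below illustrates the pattern, not the plugin's actual code, and the variable names are my assumptions:

```shell
parse_opts() {
  REPORT=0 OPEN=0 LOCATION=""
  while [ $# -gt 0 ]; do
    case "$1" in
      -l|--location) LOCATION="$2"; shift ;;   # option with a required value
      --report)      REPORT=1 ;;               # boolean flags take no extra value
      --open)        OPEN=1 ;;
      *) echo "unknown option: $1" >&2 ;;
    esac
    shift
  done
}

parse_opts --location /var/log/apache2 --report
echo "location=$LOCATION report=$REPORT open=$OPEN"
```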

The time filter defaults to showing results for the past hour, but this can be altered in various ways. You could specify that you want results from 3 hours ago with -b3. By default, the end value is 'now', but this can also be customized by passing -d10M to specify that results should span a 10-minute period following the start time. Time units and integer values are parsed with regex and sed, respectively.
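The unit/integer split for a spec like -d10M can be sketched as below, echoing the regex-plus-sed approach just described (the names and exact patterns are my assumptions, not the plugin's actual code):

```shell
SPEC="10M"                                         # e.g. the value from -d10M
UNIT=$(printf '%s' "$SPEC" | sed 's/[0-9]//g')     # strip digits  -> M
VAL=$(printf '%s' "$SPEC" | sed 's/[^0-9]//g')     # strip letters -> 10
case "$UNIT" in
  H|h) echo "span: $VAL hour(s)" ;;
  M|m) echo "span: $VAL minute(s)" ;;
  *)   echo "unrecognized unit: $UNIT" >&2 ;;
esac
```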

Many of the big changes were made over the course of several hours at a location where I was unable to iteratively test. It was a bit scary to diverge so far from a working copy of the script, but in the end I think it allowed me to be more adventurous with the direction of the script. The subsequent debugging wasn't as involved as I had anticipated.

The remaining TODOs are to add support for parallel remote connections, support for reading gzipped log files, and support for filtering by arbitrary UTC (HH:MM) time values.

I just checked off another item on the TODO list - set up VPN. This was so simple and quick, I wish I'd done it earlier.

I basically just ran through the instructions at the following 3 links:

Mostly the same instructions apply for Ubuntu 12.10 and OpenVPN 1.8.5.

I made some good progress on the GoAccess plugin over the past few days. Many of the kinks have been ironed out and making access.log reports has never been so easy :)

It's a pretty simple plugin, but it does the job well. My favorite parts of the script are a fun bit of regex that's just aesthetically pleasing, the awk date filter, and the overall flow of execution.

Fun Regex:

```bash
# Matches '1', 'y', 'Y', or any capitalization of 'yes'.
# (The classes are [yY], not [y|Y]: a pipe inside brackets matches a literal '|'.)
yes='^1$|^[yY]([eE][sS])?$'
```
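As a quick sanity check (my addition, not from the original post), the pattern can be exercised with grep -E; note the character classes are written [yY] here, since a pipe inside brackets matches a literal '|':

```shell
yes='^1$|^[yY]([eE][sS])?$'
for ans in 1 y YES yEs no; do
  if printf '%s\n' "$ans" | grep -Eq "$yes"; then
    echo "$ans: affirmative"
  else
    echo "$ans: no match"
  fi
done
```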

awk Date Filter:

```bash
cmDATE=`date -u -v-"$goINTVAL"H +\[%d/%b/%Y:%H:%M:%S` # OSX date command format (-v)
ssh -F $HOME/.ssh/config "$goSRV" \
  "sudo awk -v Date=$cmDATE '\$4 > Date { print \$0 }' /var/log/$tech/$file"
```
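The -v flag in that date call is BSD/OSX-specific; on Linux, GNU date expresses the same relative offset with -d. This equivalent is an aside of mine, not part of the original script (goINTVAL is the script's hours-ago value):

```shell
goINTVAL=1
# GNU date equivalent of: date -u -v-"$goINTVAL"H +\[%d/%b/%Y:%H:%M:%S
cmDATE=$(date -u -d "-${goINTVAL} hours" +\[%d/%b/%Y:%H:%M:%S)
echo "$cmDATE"
```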

An excellent log analysis tool that I picked up recently from a blog post by my colleague, Amin Astaneh, is the GoAccess interactive web log analyzer. Out of the box, you can unleash GoAccess on raw or piped log data to reveal an array of interesting traffic patterns that might otherwise take some serious piping skills to crack - I covered some of these in a recent blog post.

Working at Acquia, we do diagnostic work with log files across lots of servers, web applications, and stack technologies. To aid our efforts at log analysis, I started a GoAccess shell plugin to enable traversing various servers and types of log files using aliased GoAccess commands over secure ssh tunnels.

At first I was a little disappointed at the limitations of the GoAccess project, but the simplicity leaves a lot of room for extensibility.