A little over a year ago, I wrote a quick blog post about deploying Hubot with Docker. A lot has changed with Hubot and Docker since that time, so I decided to revisit the build.

The new implementation I whipped up consists of three main components:

  1. Yeoman-generated Hubot
  2. Base Docker image
  3. Dockerfile for configuring Hubot

The Hubot ‘Getting Started’ instructions walk us through generating a deployable Hubot with Yeoman. Once generated, the code can be stashed away somewhere until we’re ready to pull it into a Docker image. In this case I committed the code to GitHub.
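
For reference, the Yeoman steps boil down to roughly the following (the bot directory name is just a placeholder):

npm install -g yo generator-hubot
mkdir bot && cd bot
yo hubot   # answer the prompts for owner, name, description, and adapter
git init && git add . && git commit -m 'Initial Hubot'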

Now that we have a bot defined, we can build a new Docker image to deploy and run the bot. The base Docker image (below) installs Node.js, pulls in our bot repo, and runs npm install, but notice we’re not deploying any configuration yet:

# DOCKER-VERSION  1.3.2

FROM ubuntu:14.04
MAINTAINER Nathaniel Hoag, info@nathanielhoag.com

ENV BOTDIR /opt/bot

RUN apt-get update && \
  apt-get install -y wget && \
  wget -q -O - https://deb.nodesource.com/setup | bash - && \
  apt-get install -y git build-essential nodejs && \
  rm -rf /var/lib/apt/lists/* && \
  git clone --depth=1 https://github.com/nhoag/bot.git ${BOTDIR}

WORKDIR ${BOTDIR}

RUN npm install

Anyone can use or modify the build to create their own Docker images:

git clone git@github.com:nhoag/doc-bot.git
# Optionally edit ./doc-bot/Dockerfile
docker build -t="id/hubot:tag" ./doc-bot/
docker push id/hubot

At this point, we have an image ready to go and just need to sprinkle in some configuration to make sure our bot is talking to the right resources. This is where we’ll make use of the bot-cfg repo, which contains yet another Dockerfile:

# DOCKER-VERSION        1.3.2

FROM nhoag/hubot
MAINTAINER Nathaniel Hoag, info@nathanielhoag.com

ENV HUBOT_PORT 8080
ENV HUBOT_ADAPTER slack
ENV HUBOT_NAME bot-name
ENV HUBOT_GOOGLE_API_KEY xxxxxxxxxxxxxxxxxxxxxx
ENV HUBOT_SLACK_TOKEN xxxxxxxxxxxxxxxxxxxxx
ENV HUBOT_SLACK_TEAM team-name
ENV HUBOT_SLACK_BOTNAME ${HUBOT_NAME}
ENV PORT ${HUBOT_PORT}

EXPOSE ${HUBOT_PORT}

WORKDIR /opt/bot

CMD bin/hubot

Here we’re extending the public nhoag/hubot image created earlier by adding our private credentials as environment variables. Once this is populated with real data, the last steps are to build and run the updated image.

Below is the full deployment process that should give you a new Slack-integrated Hubot:

  1. docker pull nhoag/hubot
  2. git clone git@github.com:nhoag/bot-cfg.git
  3. vi ./bot-cfg/Dockerfile (configure ENVs)
  4. docker build -t="nhoag/hubot:live" ./bot-cfg/
  5. docker run -d -p 45678:8080 nhoag/hubot:live
  6. Add the public Hubot address to your Slack Hubot Integration (e.g. http://2.2.2.2:45678/)
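
Not part of the original walkthrough, but a quick way to sanity-check the deployment is to confirm the container is up and tail the logs while the Slack adapter connects (substitute your container ID):

docker ps                      # confirm the container is running and 45678->8080 is mapped
docker logs -f <container-id>  # watch Hubot start up and connect to Slack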

Happy chatting!

Update – 2014-12-08

Small optimization to bot-cfg to remove command arguments in favor of environment variables.


Update: It turns out I have multiple accounts and was reviewing a secondary account I’d forgotten about and had barely used. Still, it’s a useful exercise to consider the worst case of data loss in a black-box cloud system. Digging deeper into the topic of efficient and distributed notes, I found that Brett Terpstra has put an incredible amount of time and effort into evolving this space. Nothing yet feels fully baked, but tools such as PopClip (with awesome extensions), nvALT, Bullseye, and GistBox provide a lot of interesting avenues.


A few weeks ago, it seemed my Evernote account was unexpectedly truncated - it went from hundreds of notes to a mere handful. Turns out I was looking at the wrong account - D’oh! Without realizing the mistake, I was suddenly very motivated to find a transparent and robust system for keeping notes.

My personal notes are mostly excerpts from daily tech ramblings - passages and one-liners from projects, emails, chat transcripts, and the Web. I leverage notes to recall information sources, as fodder for blogging, to remember tricky tech solutions and problems, and to share information with friends and colleagues at opportune moments. All the regular stuff.

The (presumed) data loss provided motivation to investigate alternatives. I can’t yet say that my search is anywhere near complete, but following are some thoughts about Smallest Federated Wiki and IPython Notebook, along with musings around a simpler alternative.

Smallest Federated Wiki


Update: I recently listened to the JavaScript Jabber episode on Federated Wiki (no longer ‘Smallest’). It’s worth a listen if you’re interested in distributed information systems.


Smallest Federated Wiki is an impressive distributed information system with a lot of potential to revolutionize the wiki-sphere. The major blocker for me is the investment required to learn how to use it correctly. It has a dense UI and is about as amorphous as they come. Up front, that reads as a deep investment of time that may or may not get my needs met.

IPython Notebook

IPython Notebook has so far been easy and fun to set up and use. It maps to my expectations pretty well and is very pluggable. It can provide functionality and an experience very similar to Evernote, and it can also execute code, which is an awesome bonus. There are some missing features that, if addressed, would make it even better for this use case, but as I’m discovering with additional use, IPython is full of fun surprises.

IPython Notebook doesn’t have built-in functionality for creating local directories, comprehensive search, or note sharing. But it’s easy enough to add or make up for these missing features with plugins, in-notebook code execution, and straight bash.
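
As an example of papering over the missing search, plain grep goes a long way, since notebooks are just JSON files on disk (the notebook directory here is an assumption):

# List notebooks that mention a term, case-insensitively
grep -ril 'docker' ~/notebooks --include='*.ipynb'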

One thing I really miss from Evernote is the ability to embed an HTML snapshot of a webpage in a note. IPython provides embedded iframes, but this doesn’t protect against a page going away.

There are implementations of IPython for Ruby and PHP, which add further power to in-notebook computation.

For backups I set up an S3 bucket with syncing courtesy of the AWS CLI, like so:

aws --profile=profile-id s3 sync . s3://bucket-id --delete
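
To make the backup hands-off, the same command could run on a schedule; a cron entry along these lines would do it (the path, profile, and bucket are placeholders):

# crontab -e
0 * * * * cd /path/to/notebooks && aws --profile=profile-id s3 sync . s3://bucket-id --delete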

Simpler Alternative

Getting back to basics, there are plenty of alternatives for assembling a simple notebook repository. The most straightforward approach would be to write or paste locally in $editor and commit to a $vcs repository. This is stable, can be backed up anywhere, and is version controlled. For sharing, specific notes can be piped to GitHub Gist:

gist -c -p < path/to/file
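
This assumes the gist gem is installed and authorized against your GitHub account, which is roughly:

gem install gist
gist --login   # cache an OAuth token for your GitHub account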

For increased compatibility and consistency across mediums (notebook, Gist, static website), it’s probably not a bad idea to compose notes in Markdown. There are a few tools for automating conversions to Markdown, but it’ll take some investigation to identify whether the below options are any good:

The Fast 404 Drupal contributed module project page provides a lot of context for why 404s are expensive in Drupal:

… On an ‘average’ site with an ‘average’ module load, you can be looking at 60-100MB of memory being consumed on your server to deliver a 404. Consider a page with a bad .gif link and a missing .css file. That page will generate 2 404s along with the actual load of the page. You are most likely looking at 180MB of memory to server that page rather than the 60MB it should take.

The explanation goes on to describe how Drupal 7 ships with a rudimentary mechanism for reducing the impact of 404s. You may have seen the below code while reviewing settings.php:

<?php
$conf['404_fast_paths_exclude'] = '/\/(?:styles)\//';
$conf['404_fast_paths'] = '/\.(?:txt|png|gif|jpe?g|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp)$/i';
$conf['404_fast_html'] = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>404 Not Found</title></head><body><h1>Not Found</h1><p>The requested URL "@path" was not found on this server.</p></body></html>';

But wouldn’t it be nice to actually see the difference between all of these implementations? Thanks to the magic of open source, we can!

Nearly a month ago now, Mark Sonnabaum posted a Gist with instructions for generating flame graphs from XHProf-captured Drupal stacks. The technique converts XHProf samples to a format that can be read and interpreted by Brendan Gregg’s excellent FlameGraph tool.
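
At a high level, the pipeline is: capture sampled stacks with XHProf, collapse them into FlameGraph’s folded “stack;frames count” format per the Gist, and render the result with flamegraph.pl. A rough sketch (xhprof-collapse.rb is a stand-in for the conversion step from the Gist, not a real script name):

git clone https://github.com/brendangregg/FlameGraph.git
# Collapse XHProf sample output into folded "func1;func2;func3 count" lines
./xhprof-collapse.rb xhprof-samples.txt > stacks.folded
./FlameGraph/flamegraph.pl stacks.folded > 404-flamegraph.svg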

I set up a local Drupal site in three 404 configurations (unmitigated, default 404, Fast 404) and tested them one at a time. One difficulty with testing is that the default XHProf sampling interval is 0.1 seconds. This was sufficient for unmitigated 404s, but I had to make a lot of requests to capture a stack with the Fast 404 module in place.

The flame graph screenshots below corroborate what we would expect, with unmitigated 404s being the tallest stack of the bunch, the Drupal core 404 implementation showing a favorable reduction, and Fast 404 showing the shortest stack. We can also extrapolate that adding various contrib modules will push the stacks even higher with unmitigated 404s.

Click each image below for an interactive flame graph.

Unmitigated 404s

Unmitigated 404 Flame Graph

Drupal Core 404 Implementation

Drupal 404 Flame Graph

Fast 404

Fast 404 Flame Graph

Rickshaw Graph

If you go searching for interesting online datasets, one of the first you’ll encounter is the Death Row data from the Texas Department of Criminal Justice. There are several datasets listed on the Death Row Information page, including Executed Offenders.

To convert this information into an infographic, the first step is converting the HTML table to a format that’s easy to load and parse. To that end, I used the Ruby gem Wombat to scrape the Executed Offenders table and convert it to JSON.

The following code builds a hash from the Executed Offenders table:

table = Wombat.crawl do
  base_url "http://www.tdcj.state.tx.us"
  path "/death_row/dr_executed_offenders.html"

  headline xpath: "//h1"

  people 'xpath=//table/tbody/tr', :iterator do
    execution "xpath=./td[1]"
    first_name "xpath=./td[5]"
    last_name "xpath=./td[4]"
    race "xpath=./td[9]"
    age "xpath=./td[7]"
    date "xpath=./td[8]"
    county "xpath=./td[10]"
    statement "xpath=./td[3]/a/@href"
  end
end

The above code iterates through all of the table rows (tr) and grabs data from the data cells we’re interested in.

We could stop with the above code in terms of parsing, but at the time I generated this script I was also thinking about analyzing final statements. Since last statements are stored on separate pages referenced from the Executed Offenders table, this next code section scrapes each last statement and replaces the statement link from the table with the actual statement.

data = {}
table['people'].each do |x|
  last_words = Wombat.crawl do
    base_url "http://www.tdcj.state.tx.us"
    path "/death_row/#{x['statement']}"
    statement xpath: "//p[text()='Last Statement:']/following-sibling::p"
  end
  x['statement'] = last_words['statement']
  x['gender'] = 'male'
  unless x['execution'].nil?
    data[x['execution']] = x
    data[x['execution']].delete('execution')
  end
end

At the tail end of the above code block is a bit of cleanup to remove duplicate data and to slightly shrink the hash.

The next code block writes the hash to disk as JSON:

File.open('dr.json', 'w') do |f|
  f.puts data.to_json
end

Now that we have the data in an easily digestible format, the next step is to generate a display. I used the Rickshaw JavaScript toolkit - a D3.js wrapper - to convert the data into an infographic.

I repurposed much of the Rickshaw Interactive Real-Time Data example. The crux of this project was parsing the JSON data into the correct format for use with Rickshaw.

I used CoffeeScript to define and compile JS assets. A limitation of Rickshaw is an inability to cope with unset values (I initially expected these might be treated as zeros). With this in mind, the first step was to populate every possible x-axis value with zeros to avoid errors.

Below are three functions that initialize all of the data points with zeros:

timeAxis = ->
  time = {}
  for year in [1982..2014]
    time[year] = 0
  time

races = ->
  ['White', 'Black', 'Hispanic', 'Other']

preload = ->
  time = {}
  for t in races()
    time[t] = timeAxis()
  time

The last steps are to add the real data, and to build up the chart components. I read in the JSON file with jQuery ($.getJSON file, (data) ->), and ran the data through a couple of quick conversions before building the chart:

pop = (data) ->
  struct = preload()
  for k, v of data
    yr = /\d{4}$/.exec v['date']
    struct[v['race']][yr[0]]++
  struct

tally = (arg) ->
  count = {}
  for t in races()
    count[t] = []
  # arg is the race -> year -> count structure returned by pop()
  for a, b of arg
    for r, s of b
      # Rickshaw expects points shaped like { x: epoch-seconds, y: value }
      z = new Date(r)
      m = z.getTime() / 1000
      h = { x: m, y: s }
      count[a].push h
  count

I’ll spare you the chart code here since it’s fairly lengthy and well documented, but the full code for this project is available here, and the final product can be viewed here. Note that the default chart zoom shows the year ‘2000’ twice. I haven’t looked into this much yet, but the correct year (2010) appears in place of the second ‘2000’ value on zooming in.

Overall, I found Rickshaw to be a fun library with an excellent API. It does have limitations, but is a good choice for representing time series data. If you need more options for chart type, see NVD3.js or straight D3.js.

Over the last couple of weekends I converted my blog from Octopress to straight Jekyll (still hosting on S3). There wasn’t any particular reason behind the move, but I was curious to know more about the differences between the two platforms, wanted to try out a new theme, and just generally enjoy these types of migrations.

Overall, there aren’t many differences between the two platforms. As many have stated before, the major difference is that Octopress comes with more functionality out of the box, but at the cost of increased complexity. Octopress is a great way to get into static sites, but after gaining some experience I really enjoyed paring down and digging into Jekyll.

There were a couple of fun tasks with the conversion, mostly around setting up redirects and deprecating XML documents.

Redirects

One of the main differences between the old and new versions of my site is the way tags are handled. On the old site they were found at blog/categories/{category}, but on the new site they live at tags/#{category}. There are several plugins for generating redirects; in my case I wanted to automate the process and set up redirects for a defined set of paths. The Jekyll Pageless Redirects plugin got me most of the way there.

The Jekyll Pageless Redirects plugin introduced a couple of issues, with the resolution detailed in an issue thread on the project. Basically, you’ll want to apply the changes referenced there, and ensure that paths are specified as follows (note the leading forward slash):

/origin : /destination

With a working redirect implementation, the next step was to define the list of redirects. Here’s the one-liner bash script I used:

ls -1 path/to/categories | while read tag ; do echo "/blog/categories/$tag : /tags/#$tag" >> _redirects.yml ; done

The above script lists all of the category directories from the original site and pipes them one-by-one into a while loop that echoes each value in the desired YAML format and appends the result to the _redirects.yml file.

Deprecating Feeds

Similar to the category redirects, the old site generated an XML feed for each category. These feeds are not included in the new Jekyll site, and I don’t see a need for them to continue. Rather than letting them simply disappear, it’s easy enough to hijack the redirect plugin to perform the small additional task of deprecating each of these feeds, so that most feed readers will interpret the resource as officially gone.

I added the below code to the plugin and was good to go:

# snip

retire_xml = 'atom.xml'

# snip

retire_xml_path = File.join(alias_dir, retire_xml)

# snip

File.open(File.join(fs_path_to_dir, retire_xml), 'w') do |file|
  file.write(xml_template)
end

# snip

(retire_xml_path.split('/').size + 1).times do |sections|
  @site.static_files << PagelessRedirectFile.new(@site, @site.dest, retire_xml_path.split('/')[1, sections + 1].join('/'), '')
end

# snip

def xml_template()
  <<-EOF
<?xml version="1.0"?>
<redirect>
  <newLocation />
</redirect>
  EOF
end

# snip

And with that I pushed up to S3 and am fully on Jekyll, with the small exception that I still use the Octopress gem to generate new posts.