What did you learn today?

Being a good software developer requires you to continuously learn new languages, skills, techniques and tools. Here are a few tips to get you started and keep you learning.


Branch out beyond your existing core skills

You most certainly know how to do some things very well.

You have a programming language that you know by heart. There is a framework you always use to build websites. You have your environment and tools set up exactly the way you like.

You are productive. Being productive makes you happy. It provides a sense of accomplishment. It pays the bills.

As humans, we are naturally biased to use the tools that we know well in order to solve a problem at hand. We do this because it is immediately rewarding.

On the other hand, learning is often hard, confusing and time consuming. If anything, we are naturally discouraged from learning.

Because technology rapidly changes, the value of your existing knowledge rapidly declines.

The only way to deal with this fact is to always be putting yourself in a position that requires you to learn and use new skills.

If you do this enough, you will soon find that mastering a new skill, and the newfound productivity that comes with it, becomes its own preferable reward.

Learn everything you can through deep immersion

The fastest way to learn a new spoken language is to go to a country where that language is spoken and learn while immersed within every aspect of the culture.

The same holds true for learning a new programming language. The fastest way to learn is to immerse yourself in the relevant programming community.

You can do that by:

* reading blog posts and books
* watching videos and listening to podcasts
* attending or speaking at user groups and conferences
* connecting with other developers over Twitter and IRC
* reading and writing code for open source projects
* joining a team at work already using the language on an existing project

It can be counter-intuitive and even scary to jump right into the deep end of a pool, but it really is the quickest way to learn to swim.

Copy code that other developers have written

Copy-and-paste programmers are often derided, but every good developer has well-worn copy and paste keys.

It can be really challenging to write code from scratch when using a new technology.

Instead, it is better to copy working code written by other developers, even if you do not fully understand the code.

Having a working example will allow you to tinker, test, and deconstruct the code until you can understand each piece.

While you should not copy and paste code directly for production use, it is a great way to write prototype code and learn new programming patterns.

Read, write, read, write, read, write

When you begin to learn a new language, you will most likely spend most of your time reading code.

When you begin to become proficient in using the language, you will most likely spend most of your time writing code.

Both of these situations are common anti-patterns.

You can greatly accelerate your learning by writing code that utilizes the patterns and techniques that you read about, right after reading about them.

In order to continue learning after becoming proficient, you need to read code other people have written in order to find new patterns and techniques to use in your own code.

You should always be reading as much code as you write and vice versa.

Show off your code, get feedback, revise

It is a well-known psychological bias that, when we are faced with a decision, we believe more strongly that one choice is correct after we have made that choice, simply because we made it.

Writing code involves a lot of small decisions. Together, these facts represent a significant obstacle to improving the code we write.

With each decision you make, you become more and more confident in the outcome, and can often come to the point of being unable to see any other way to write the code.

Because we can so easily become blind to our own code, we must get other developers to review our code and suggest alternative approaches and improvements that might be hard for us to otherwise see.

Authors never write books without editors and rewrites. Developers never write code without reviewers and revisions.

Teach in order to master

The best way to master a new technology is to find an opportunity to teach someone that technology.

There are many ways to do this, some easy, and some more involved:

* telling a colleague
* giving a demonstration
* writing a blog post
* writing documentation
* giving a presentation
* organizing a workshop
* pair programming
* code reviews

Because each of these activities will require you to spend as much time organizing your own thoughts as time spent actually teaching, you are sure to benefit as much as your pupils.

Adapted from a blog post of mine on the Square Root internal engineering blog

Dynamic matchers in RSpec

RSpec has a neat feature that can improve the readability of your tests called Dynamic Predicate Matchers. These are matchers that are created on the fly for the particular class under test. Consider the following simple class:

class Foo

  attr_accessor :bar, :baz

  def valid?
    bar == true && baz == false
  end

end

The valid? method is a predicate method. Predicate methods are, by convention, methods that end with a question mark and return a boolean. They are frequently methods that report on the internal state of an object. In a test, RSpec will automatically generate matchers that leverage those predicate methods. Here are two example tests that use a dynamically generated matcher be_valid:

RSpec.describe Foo do

  it "is valid when bar is true" do
    foo = described_class.new
    foo.bar = true
    foo.baz = false
    expect(foo).to be_valid
  end

  it "is invalid when bar is false" do
    foo = described_class.new
    foo.bar = false
    foo.baz = false
    expect(foo).to_not be_valid
  end

end

The expectations read much more like English. Compare

expect(foo).to be_valid

to the alternative

expect(foo.valid?).to be true
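To build intuition for how a matcher like be_valid can exist without ever being defined, here is a toy sketch of the underlying idea in plain Ruby. This is not RSpec's actual implementation; it only illustrates how a be_<something> message can be translated on the fly into a call to <something>? on the target.

```ruby
# Toy sketch (NOT RSpec's real code): any be_<predicate> message is
# intercepted and turned into a lambda that calls <predicate>? on a target.
module ToyMatchers
  def self.method_missing(name, *args)
    if name.to_s.start_with?('be_')
      predicate = name.to_s.sub(/\Abe_/, '') + '?'
      ->(target) { target.public_send(predicate) }
    else
      super
    end
  end

  def self.respond_to_missing?(name, include_private = false)
    name.to_s.start_with?('be_') || super
  end
end

class Foo
  attr_accessor :bar, :baz

  def valid?
    bar == true && baz == false
  end
end

foo = Foo.new
foo.bar = true
foo.baz = false
ToyMatchers.be_valid.call(foo) # => true
```

RSpec performs a similar translation for you inside expect(...).to, which is why any predicate method on the class under test gets a matcher for free.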

Hasten the import of large tables into MySQL

You may find that someday you are working on a production application and want to do some testing in your local environment using production data. Furthermore, this application may have a very large MySQL database, with tables that have many millions of rows. So, you export that database from your production environment into a SQL file using mysqldump and copy it to your local computer. However, when you go to import that database into MySQL (cat database_dump.sql | mysql -uroot), it takes many hours.

This is not unusual for large databases, but there may be something that can easily be done to significantly cut down on the import time. Now, I am neither a DBA nor a MySQL wizard, and so with all that follows: buyer beware. It seems from some research online that there is only one true answer on how to optimize MySQL: it depends. Not only that, it depends on many, many things, including MySQL configuration, available memory, usage patterns, the operating and file system, schema design, table size, and possibly even what you had for lunch the day prior.

Now let's say that you are working with a database which contains a large, heavily indexed InnoDB table. If you inspect the SQL dump, you will find a CREATE TABLE statement. It might look something like this:

CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `address_detail_id` int(11) NOT NULL,
  `billing_detail_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `name` (`id`,`name`),
  KEY `address` (`address_detail_id`),
  KEY `billing` (`billing_detail_id`),
  KEY `foreign` (`id`,`address_detail_id`,`billing_detail_id`),
  KEY `covering` (`id`,`name`,`address_detail_id`,`billing_detail_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Shortly thereafter will be a large number of INSERT INTO statements pumping the data into your newly created table. Note that this table has a number of KEY attributes, indicating the presence of indexes. Indexes can speed up SELECT queries, but they do so by trading storage space for speed. You might simply think of an index as a presorted version of the data in your table that makes it easier for MySQL to find a particular piece when searching for it.

When you insert data into an indexed table, MySQL must not only store your data but also sort it. It turns out that for small InnoDB tables, this can be done efficiently, to the point that it does not significantly impact the time to insert data. Reportedly, however, inserting data into indexed MyISAM tables, even smaller ones, is slower than inserting it into tables without those indexes. This explains why you will find statements like these in your SQL file surrounding the insert statements for any given table:

/*!40000 ALTER TABLE `users` DISABLE KEYS */;
...
/*!40000 ALTER TABLE `users` ENABLE KEYS */;

These signal MySQL to disable the indexes while inserting data, for speed. The indexes are built when they are re-enabled. Note that those commands are not commented out; rather, that is the syntax for conditional execution of commands depending on the server version. Apparently this command was introduced in MySQL 4.0.0.

Now, for whatever reason, this command does not disable indexing on InnoDB tables, even though it is included in the dump. Possibly that is because InnoDB is quite good at inserting data into indexed tables. However, for large InnoDB tables, inserting data into indexed tables is still significantly slower than inserting into unindexed ones.

So, I ran a benchmark on my local Mac OS X (10.10) environment with a default MySQL 5.6.22 install. The benchmark (1) created an indexed InnoDB table, (2) inserted varying amounts of generated data, (3) exported the data using mysqldump, and (4) measured the time to import that SQL dump. As can be seen in the plot below in red, at around 3M rows the rate of inserts dropped significantly.

Comparison Chart

At this point in time, I really don't know what is causing the slowdown. From my light research, I might hazard a guess that the memory requirements of the indexing algorithm grow beyond the available resources, so the exact position of the knee and the amount of slowdown likely depend on the configuration and hardware. Here is the benchmarking code if you would like to try it out.

To speed up the import process for large indexed InnoDB tables, I created a tool called Hasten. This tool alters a SQL dump so that it will import faster. It does this by removing the indexes from all table definitions and then adding the indexes back at the end of the import. If you review the plot above, you will see that there is a dramatic reduction in import time for large tables. Hasten is written in Ruby; if you have Ruby on your system, you only need to install the gem

gem install hasten

and then insert Hasten into your import command like so

cat DUMPFILE | hasten | mysql -uUSER -pPASSWORD DATABASE
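The core transformation can be sketched in a few lines of Ruby. To be clear, this is not Hasten's actual code; it is a naive simplification that assumes secondary KEY lines sit on their own lines inside a single CREATE TABLE block, but it shows the idea: strip the KEY definitions out of the table, then replay them as ALTER TABLE ... ADD statements after the inserts.

```ruby
# Naive sketch of the idea behind Hasten (not its real implementation):
# remove secondary "KEY" lines from CREATE TABLE statements and append
# equivalent ALTER TABLE ... ADD KEY statements at the end of the dump,
# so the indexes are built once, after all the data is inserted.
def defer_indexes(dump)
  table = nil
  deferred = []
  body = dump.lines.reject do |line|
    table = Regexp.last_match(1) if line =~ /\ACREATE TABLE `(\w+)`/
    if table && line =~ /\A\s*KEY (.+?),?\s*\z/
      deferred << "ALTER TABLE `#{table}` ADD KEY #{Regexp.last_match(1)};"
      true # drop this line from the table definition
    else
      false
    end
  end.join
  # Remove the comma left dangling on the last remaining column definition.
  body = body.gsub(/,\n\) ENGINE=/, "\n) ENGINE=")
  body + deferred.join("\n") + "\n"
end
```

Hasten itself also has to cope with multiple tables, UNIQUE keys, and the dump's conditional comments, which is why it exists as a gem rather than a one-liner.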

Useful Sublime Text 3 Packages for a Rubyist

Sublime Text is an extensible editor. To maximize my productivity, I have found that using Packages and organizing a set of custom key bindings has been very important. There are a large number of Packages available that add support for code highlighting and snippets in various languages, graphical theming, linting, autocompletion, and custom build tools. These can all be found on Package Control, a web directory of Packages. Here is a list of my most used Packages and the custom key bindings that I have set up to utilize them.

Origami

Origami is a package that augments functionality around creating and manipulating panes. It provides a set of commands to create, move to, resize, and zoom them. I already have a set of key bindings memorized for pane navigation in Tmux, and so I've set them up similarly in Sublime:

{ "keys": ["ctrl+b", "p"], "command": "prev_view" },
{ "keys": ["ctrl+b", "n"], "command": "next_view" },
{ "keys": ["ctrl+b", "o"], "command": "focus_neighboring_group" },
{ "keys": ["ctrl+b", "z"], "command": "zoom_pane", "args": {"fraction": 0.8} },
{ "keys": ["ctrl+b", "s"], "command": "resize_pane", "args": {"orientation": "cols"} },

{ "keys": ["ctrl+b", "c"], "command": "create_pane_with_file", "args": {"direction": "right"} },
{ "keys": ["ctrl+b", "x"], "command": "destroy_pane", "args": {"direction": "self"} },

{ "keys": ["ctrl+b", "up"], "command": "travel_to_pane", "args": {"direction": "up"} },
{ "keys": ["ctrl+b", "right"], "command": "travel_to_pane", "args": {"direction": "right"} },
{ "keys": ["ctrl+b", "down"], "command": "travel_to_pane", "args": {"direction": "down"} },
{ "keys": ["ctrl+b", "left"], "command": "travel_to_pane", "args": {"direction": "left"} },

{ "keys": ["ctrl+b", "shift+up"], "command": "carry_file_to_pane", "args": {"direction": "up"} },
{ "keys": ["ctrl+b", "shift+right"], "command": "carry_file_to_pane", "args": {"direction": "right"} },
{ "keys": ["ctrl+b", "shift+down"], "command": "carry_file_to_pane", "args": {"direction": "down"} },
{ "keys": ["ctrl+b", "shift+left"], "command": "carry_file_to_pane", "args": {"direction": "left"} },

Github Tools

Github Tools is a package that provides commands to interact with the GitHub repository that you are editing in Sublime. I find myself frequently needing to share references to code with colleagues. Github Tools makes it easy to generate a GitHub URL to the code you have selected in Sublime. It also provides some useful commands to create, edit, and load Gists directly in Sublime. I group all my GitHub commands behind a single meta key ctrl+g in the style of Tmux:

{ "keys": ["ctrl+g", "g"], "command": "public_gist_from_selection" },
{ "keys": ["ctrl+g", "p"], "command": "private_gist_from_selection" },
{ "keys": ["ctrl+g", "o"], "command": "open_gist_in_editor" },
{ "keys": ["ctrl+g", "w"], "command": "open_gist_in_browser" },
{ "keys": ["ctrl+g", "v"], "command": "open_remote_url" },
{ "keys": ["ctrl+g", "c"], "command": "copy_remote_url" },
{ "keys": ["ctrl+g", "b"], "command": "blame" },
{ "keys": ["ctrl+g", "h"], "command": "history" },

CTags

CTags provides a way to easily generate, navigate and search an index of language objects found in your active Sublime project. This is most useful for navigating directly to function or constant definitions in files. This Package requires that you install and configure a tag generation tool. The default setup is configured for Exuberant CTags, but I use Ripper Tags for Ruby and configure it as follows using RVM:

{ "command": "source $HOME/.bashrc && rvm-auto-ruby -S ripper-tags" }

and set up key bindings behind the meta key ctrl+t.

{ "keys": ["ctrl+t", "t"], "command": "navigate_to_definition" },
{ "keys": ["ctrl+t", "f"], "command": "search_for_definition" },
{ "keys": ["ctrl+t", "r"], "command": "rebuild_tags" },

Shell Command

Shell Command is a package that allows you to execute arbitrary commands in a shell and place the output in a scratch buffer (rather than a panel), making it easily viewable. In its most flexible usage, you simply type the command in a pop-up window. After the output has been generated in a scratch buffer, you can rerun the command in the same window with a context-specific key binding. I have set up the key bindings behind the meta key ctrl+c:

{ "keys": ["ctrl+c", "c"], "command": "shell_command" },
{
  "keys": ["c"],
  "command": "shell_command_refresh",
  "context": [{ "key": "setting.ShellCommand" }]
},

By default the shell does not load your shell configuration. So, in order to use commands such as Bundler or Rake, I have set up a custom key binding to allow me to run commands with my configured version of Ruby through RVM:

{
  "keys": ["ctrl+c", "r"],
  "command": "shell_command",
  "args": {
    "command_prefix": "source $HOME/.bashrc && rvm-auto-ruby -S",
    "prompt": "Shell Command"
  }
},

The real power of Shell Command is setting up custom key bindings for your most frequently used shell commands, such as viewing a process list or tailing particular logs. For example:

{ // Process list
  "keys": ["ctrl+c", "p"],
  "command": "shell_command",
  "args": {
    "command": "ps xcro user,pid,%cpu,cputime,%mem,command | head -n 28",
  }
},

will show a process list. Then, custom key bindings for the Shell Command context can be used to take action on the output of the command. For example, with the following key binding, you can kill a process by selecting the process number in the buffer and typing 'k'.

{ // Send SIGKILL to a process number selected
  // in a Shell Command Window
  "keys": ["k"],
  "command": "shell_command",
  "args": {
    "command": "kill -9",
    "region": "arg"
  },
  "context": [{ "key": "setting.ShellCommand" }]
},

There is a lot more flexibility and room for customization provided by this package, so I encourage you to check out Shell Command.

Replacement File Browser

File Browser is an excellent replacement for the default Sublime sidebar. In particular, it adds numerous key bindings for creating and manipulating files, eliminating the need to use the mouse for directory navigation and basic file operations. Here is the key binding to open the FileBrowser at my preferred location on the left-hand side:

{
  "keys": ["ctrl+d"],
  "command": "dired",
  "args": {
    "immediate": true,
    "single_pane": true,
    "other_group": "left",
    "project": true
  }
},

but it can also be set up on the right-hand side:

SublimeFileBrowser Screenshot2

Web Access

I have found the following four Packages very handy for accessing web content based on text selected inside Sublime. I have set up the key bindings behind the meta key ctrl+w:

Open URL

Open URL allows you to open your web browser to the URL highlighted in Sublime.

{ "keys": ["ctrl+w", "o"], "command": "open_url" },

Google Search

Google Search allows you to google any content highlighted in Sublime.

{ "keys": ["ctrl+w", "g"], "command": "google_search" },

Goto Documentation

Goto Documentation allows you to intelligently search for help documentation on the web using the automatically determined scope of the highlighted text in Sublime. In other words, if you are editing a Ruby file, it will search the Ruby core documentation.

{ "keys": ["ctrl+w", "h"], "command": "goto_documentation" },

HTTP Requester

HTTP Requester is an amazing package that allows you to execute arbitrary HTTP requests and to get the request response in a scratch buffer. It is very useful for interacting with APIs. It supports making requests using all the HTTP verbs, setting headers, and completing forms.

{ "keys": ["ctrl+w", "e"], "command": "http_requester" },

You can select either a simple URL or a detailed request. For example, selecting the following text in a buffer and triggering a request

POST http://posttestserver.com/post.php
Content-type: application/x-www-form-urlencoded
POST_BODY:
variable1=avalue&variable2=1234&variable3=anothervalue

will POST a form to the specified URL and return the body of the response in a new scratch buffer, along with detailed response information, like so:

200 OK
Date:Wed, 31 Dec 2014 20:08:45 GMT
Server:Apache
Access-Control-Allow-Origin:*
Vary:Accept-Encoding
Content-Length:141
Content-Type:text/html

Latency: 77ms
Download time:0ms

Successfully received 3 post variables.

Rendering

Here are three packages that I use to work with Markdown and SQL.

Markdown Preview

Markdown Preview is a Package that will render a Markdown document that you are editing and open it in your browser. It supports either the Python or GitHub renderers. Because I primarily edit Markdown in GitHub repositories, I prefer the latter.

{
  "keys": ["ctrl+m"],
  "command": "markdown_preview",
  "args": {
    "target": "browser",
    "parser": "github"
  }
},

SQL

First, SQL Beautifier simply improves the formatting of SQL. I find it extremely useful when working with long queries taken from logs or profilers. Simply select a poorly formatted query in Sublime and trigger the formatter.

{ "keys": ["ctrl+s", "b"], "command": "sql_beautifier" },

Then, SQL Exec is a Package that allows you to execute queries selected in Sublime against a SQL database and returns the results in a panel view. It requires a bit of tedious configuration of your database connections, but it is useful when working in a relatively stable development environment. For more serious work with SQL I prefer SQL Pro.

  { "keys": ["ctrl+s", "c"], "command": "sql_list_connection" },
  { "keys": ["ctrl+s", "e"], "command": "sql_execute" },
  { "keys": ["ctrl+s", "h"], "command": "sql_history" },
  { "keys": ["ctrl+s", "q"], "command": "sql_query" },
  { "keys": ["ctrl+s", "s"], "command": "sql_show_records" },
  { "keys": ["ctrl+s", "d"], "command": "sql_desc" },

BuildView

Sublime has a convenient build system that allows you to trigger (super+b) a shell command to build a file or execute a test suite. The output of the build command is piped into a Sublime panel. I prefer to have the output of a build placed into a scratch buffer instead, and that is exactly the functionality that the BuildView Package provides. To use it you must override your build key binding.

{
  "keys": ["super+b"],
  "command": "build",
  "context": [{
    "key": "build_fake",
    "operator": "equal",
    "operand": true
  }]
},

Linting

I find that I am using linting in Ruby and JavaScript more and more frequently. There are various linting packages available for these languages (and others too), but I have found the following two Packages to be the best for me.

Rubocop

The Rubocop package provides bindings for the RuboCop static code analyzer for Ruby. You first need to install and configure RuboCop, which can take a bit of effort to get configured for your preferred style. By default the Rubocop package automatically marks issues in your Ruby buffer, but I prefer to disable this

{
  "mark_issues_in_view": false,
}

and instead bind a key to trigger the RuboCop analysis.

{
  "keys": ["ctrl+l", "r"],
  "command": "chain",
  "args": {
    "commands": [
      ["rubocop_check_single_file"],
      ["hide_panel", {"cancel": true}]
    ]
  }
},

Normally, the RuboCop output will be piped to a Sublime panel, but because I use BuildView, the output is piped to a scratch buffer instead. For whatever reason, it annoyingly leaves the panel open. To solve this problem, I use the Chain of Command Package to trigger a hide_panel command after triggering RuboCop.

JSLint

The JSLint Package provides linting from Douglas Crockford's JSLint code quality tool for JavaScript. It requires you to have Node.js installed and configured on your system and available in your executable path. By default, it will run each time a JavaScript file is saved. I prefer to disable this feature

{
    "run_on_save" : false
}

and instead bind a key to trigger the JSLint analysis.

{
  "keys": ["ctrl+l", "j"],
  "command": "chain",
  "args": {
    "commands": [
      ["jslint"],
      ["hide_panel", {"cancel": true}]
    ]
  }
},

Again, note the use of the Chain of Command Package to trigger a hide_panel command after triggering JSLint.

RSpec Testing

I most frequently use RSpec for testing and the RSpec Package provides a build system configuration, syntax highlighting, code snippets, and a useful key binding that allows you to bounce back and forth between a file and its spec file.

{ "keys": ["super+period"], "command": "open_rspec_file", "args": {} },

Key Bindings

Lastly, to learn and remember all of these key maps, I use the Keymaps Package. It provides a nice cheat sheet that summarizes all of the available key bindings, as well as a convenient search window for when you have forgotten a particular key binding.

{ "keys": ["ctrl+?"], "command": "cheat_sheet" },
{ "keys": ["ctrl+/"], "command": "find_keymap" },

Hash Tricks in Ruby

Here are a few tricks for using Hashes in Ruby.

Sort your hash

As of Ruby 1.9, Hashes became ordered by default due to a change in their implementation. However, the sort method for Hashes returns an array of [key, value] pairs, likely a holdover from when Hashes were unordered.

hash = {f: 4, a: 2, r: 1 }
hash.sort # => [[:a, 2], [:f, 4], [:r, 1]]

To sort a hash and get a hash back there are a few approaches:

Hash[hash.sort]
hash.sort.to_h # Ruby >= 2.1
hash.sort_by{ |k, v| k }.to_h # sort by key
# => {:a=>2, :f=>4, :r=>1}
hash.sort_by{ |k, v| v }.to_h # sort by value
# => {:r=>1, :a=>2, :f=>4}

Hashes all the way down

Sometimes you need to create a tree-like data structure. We can take advantage of Hashes in Ruby to accomplish this elegantly. The Hash constructor accepts a default block that will be executed when the hash is accessed by a key that does not have a corresponding hash value. Take, for example, this identity hash, which returns the corresponding hash value for a key if the value has been set, and otherwise returns the key itself.

identity = Hash.new { |hash, key| key }
identity[:a] = 1
identity[:a] #=> 1
identity[:b] #=> :b

Going one step further, in the default block we can store the value object in the hash so that subsequent calls fetch the object from the hash instead of creating a new one each time.

identity = Hash.new { |hash, key| hash[key] = key }
value = identity[:a]
value # => :a
value.object_id # => 362728
identity[:a].object_id # => 362728

Now if instead of returning the key, we return a new hash, we have a two level tree using nested hashes.

tree = Hash.new { |hash, key| hash[key] = {} }
tree[:a] #=> {}
tree[:a][:x] = 'Foo'
tree[:a][:y] = 'Bar'
tree[:b][:x] = 'Baz'
tree[:b][:y] = 'Qux'
tree # => {
  :a => {
    :x => 'Foo',
    :y => 'Bar'
  },
  :b => {
    :x => 'Baz',
    :y => 'Qux'
  }
}

But note that the depth is limited to two levels because the nested hashes return nil for unknown keys.

tree[:a][:z][:j] # => NoMethodError: undefined method `[]' for nil:NilClass

We can address this by ensuring that all hashes in the tree initialize new hashes when an unknown key is accessed. This can be accomplished by reusing the default block of the root node of the tree for each new hash that we construct. The Hash method default_proc provides access to the default block as a Proc object. If, each time we construct a new hash, we pass in the default proc of the parent hash, we get a tree that grows endlessly.

teams = Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }

Note that we pass the default proc as a block to the Hash constructor by converting it using the & operator. This technique allows us to construct arbitrarily sized tree structures on the fly. It is especially useful if we do not know exactly how deep the tree needs to be in advance, or if it needs to grow in size over time.

teams[:hockey][:western][:pacific] = ["sharks", "oilers"]
teams[:hockey][:western][:central] = ["blues", "stars"]
teams[:hockey][:eastern][:metropolitan] = ["penguins", "flyers"]
teams[:hockey][:eastern][:atlantic] = ["redwings", "bruins"]

teams # => {
  :hockey => {
    :western => {
      :pacific => [
        [0] "sharks",
        [1] "oilers"
      ],
      :central => [
        [0] "blues",
        [1] "stars"
      ]
    },
    :eastern => {
      :metropolitan => [
        [0] "penguins",
        [1] "flyers"
      ],
      :atlantic => [
        [0] "redwings",
        [1] "bruins"
      ]
    }
  }
}

Memoizing return values of methods with parameters

It makes sense to store the result of a costly calculation when it is likely to be needed again in the future. In the context of a class, it is a Ruby idiom to store this value in an instance variable:

class Numbers
  def pi
    @pi ||= begin
      ... costly calculations ...
    end
  end
end

This technique, called memoization, hides the fact that all calls after the first will fetch the computed value from the instance variable rather than compute the number again.
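To make the idiom concrete and runnable, here is a version where a crude Leibniz-series approximation stands in for the elided costly calculation (a hypothetical choice for illustration), with a counter that makes it visible that the work happens only once:

```ruby
# Runnable illustration of ||= memoization. The Leibniz series below is
# a hypothetical stand-in for a genuinely costly calculation.
class Numbers
  attr_reader :calculations

  def initialize
    @calculations = 0
  end

  def pi
    @pi ||= begin
      @calculations += 1
      # Crude approximation: pi = 4 * (1 - 1/3 + 1/5 - 1/7 + ...)
      4 * (0...10_000).sum { |k| (-1.0)**k / (2 * k + 1) }
    end
  end
end

numbers = Numbers.new
numbers.pi            # computed on the first call
numbers.pi            # fetched from @pi thereafter
numbers.calculations  # => 1
```

One caveat worth remembering: because ||= re-runs the block whenever the stored value is nil or false, this idiom is only safe for methods that never legitimately return those values.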

When a method takes one or more parameters, we can use the default block of a hash to achieve memoization in a way that is parameter dependent.

class Numbers
  def greatest_common_denominator(*args)
    @gcd ||= Hash.new do |hash, array|
      hash[array] = begin
        ... costly calculations ...
      end
    end
    @gcd[args.sort]
  end
end

Here, a new hash is stored in the instance variable. When the method is called, the arguments, in the form of an array, are used as the key into the hash. If those particular arguments have not previously been passed to the method, and thus to the hash, the hash calls the default block to compute the value and store it. Any subsequent calls with those parameters fetch the previously computed value from the hash instead of computing it again. Note that for methods where the ordering of parameters is not important, like the method in the above example, we sort the arguments before keying the hash to further reduce the number of times the calculation must be made.
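The same pattern can be seen end to end by filling the elided calculation with a concrete computation; Euclid's gcd (via Ruby's built-in Integer#gcd) is a hypothetical stand-in here, and the counter only exists to demonstrate the caching:

```ruby
# Concrete version of the hash-backed memoization pattern above.
# Integer#gcd stands in for the costly calculation; @calculations
# exists only to show how often the default block actually runs.
class Numbers
  attr_reader :calculations

  def initialize
    @calculations = 0
  end

  def greatest_common_denominator(*args)
    @gcd ||= Hash.new do |hash, array|
      @calculations += 1
      hash[array] = array.reduce { |a, b| a.gcd(b) }
    end
    @gcd[args.sort] # sorted, so (12, 18) and (18, 12) share one entry
  end
end

numbers = Numbers.new
numbers.greatest_common_denominator(12, 18) # => 6, computed
numbers.greatest_common_denominator(18, 12) # => 6, fetched from the hash
numbers.calculations                        # => 1
```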

String Templates

The % String operator is useful for inserting data into strings with a specifiable format. For example, formatting a floating point number

"Pi = %.5f" % Math::PI   # => "Pi = 3.14159"

or zero padding integers

"%04d" % 45 # => "0045"

Less well known is that % also accepts a Hash. %{} placeholders in the string are replaced by their corresponding hash values. I call this the Madlibs feature because it creates a simple string templating system.

variables = {:animal => 'fox', :action => 'jumps'}
template = "The quick brown %{animal} %{action} over the lazy dog"
puts template % variables
# => The quick brown fox jumps over the lazy dog

Word Substitution

The gsub String method replaces text in a string. It accepts a Regex to define the match and a string to define the replacement.

quote = 'The quick brown fox jumps over the lazy dog'
puts quote.gsub(/brown/, 'red')
# => "The quick red fox jumps over the lazy dog"

This works for a single [match, replacement] pair. If we want to make multiple replacements in a string, we can take advantage of the fact that gsub can accept a replacement hash. When a match is found, the replacement is taken as the value from the hash when the match is used as a key.

By matching on any word /\w+/ and using an identity hash populated with the desired replacements, gsub provides a clean way to make an arbitrary number of word substitutions in a string.

replacements = {'dog' => 'pig', 'fox' => 'cat'}
replacements.default_proc = ->(h, k) { k }
puts quote.gsub(/\w+/, replacements)
# => "The quick brown cat jumps over the lazy pig"

Cataloging

A hash can be used to catalog objects from a collection by a given attribute. If we have a collection of objects

Book = Struct.new(:title, :author)
books = [
  Book.new('The Stand', 'Stephen King'),
  Book.new('The Shining', 'Stephen King'),
  Book.new('Green Eggs and Ham', 'Dr. Seuss'),
  Book.new('The World of Ice & Fire', 'George R. R. Martin')
]

those objects can be cataloged by building a hash of arrays, where the arrays are initialized via the default block only as needed.

def catalog(collection, by:)
  catalog = Hash.new { |hash, key| hash[key] = [] }
  collection.each_with_object(catalog) do |item, catalog|
    catalog[item.send(by)] << item
  end
end

puts catalog(books, by: :author) # =>
{
  "Stephen King"=>[
    #<struct Book title="The Stand", author="Stephen King">,
    #<struct Book title="The Shining", author="Stephen King">
  ],
  "Dr. Seuss"=>[
    #<struct Book title="Green Eggs and Ham", author="Dr. Seuss">
  ],
  "George R. R. Martin"=>[
    #<struct Book title="The World of Ice & Fire", author="George R. R. Martin">
  ]
}

Smart strategies for the strategy pattern

The Strategy Pattern can make the behavior of a class extensible without requiring modification of the class definition. Does that sound strange? Consider the following very simple example

require 'json'
require 'yaml'

class Document

  attr_accessor :body

  def initialize(body)
    self.body = body
  end

  def parse_json
    JSON.parse(body)
  end

  def parse_yaml
    YAML.load(body)
  end

end

This Document class can be used to parse both JSON and YAML content in order to create Ruby objects (hashes in this example).

doc = Document.new <<EOS
{
  "a": "one",
  "b": "two",
  "c": "three"
}
EOS
puts doc.parse_json #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

doc = Document.new <<EOS
---
  'a': 'one'
  'b': 'two'
  'c': 'three'
EOS
puts doc.parse_yaml #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

Now let us say that we want to add the ability to parse XML content. The current design requires the addition of a parse_xml method. One way to avoid modifying the Document class is to choose a design based on the Strategy Pattern. Instead of specifying the parsing algorithm in the class, we inject the parsing algorithm into the class, decoupling the Document class from the parsing algorithm. In the following example, we inject a lambda that encapsulates a parsing algorithm.

require 'json'
require 'yaml'

class Document

  attr_accessor :body, :parser

  def parse
    parser.call(body)
  end

end

doc = Document.new
doc.parser = ->(body) { JSON.parse(body) }
doc.body = <<EOS
{
  "a": "one",
  "b": "two",
  "c": "three"
}
EOS
puts doc.parse #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

While the Document class does not currently do much, its definition is decoupled from that of the parsing algorithm. This allows us to create other parsing strategies, for example ones that handle YAML or XML content, and to use those strategies with the Document class unmodified.
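As a quick sketch of that flexibility, here is the same lambda-based Document (repeated so the snippet runs on its own) parsing YAML instead of JSON, with no change to the class:

```ruby
require 'yaml'

class Document

  attr_accessor :body, :parser

  def parse
    parser.call(body)
  end

end

# Swap in a YAML parsing strategy without touching Document
doc = Document.new
doc.parser = ->(body) { YAML.load(body) }
doc.body = "---\n'a': 'one'\n'b': 'two'\n"
puts doc.parse #=> {"a"=>"one", "b"=>"two"}
```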

Especially when dealing with more complex algorithms, it is common to create classes to define the strategies. For example

require 'json'
require 'yaml'

class Document

  attr_accessor :body, :parser

  def parse
    parser.parse body
  end

end

class JSONStrategy

  def self.parse(body)
    JSON.parse(body)
  end

end

doc = Document.new
doc.parser = JSONStrategy
doc.body = <<EOS
{
  "a": "one",
  "b": "two",
  "c": "three"
}
EOS
puts doc.parse #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

While this design has its advantages, it requires the programmer to know which strategies can be used with the Document class. Ideally, we would like the flexibility to extend the abilities of the Document class without modifying its definition, and to make the class intelligent enough to know which strategies are available and usable at any given time.

Imagine an AutoParser that can auto select an appropriate strategy (from a list of known strategies) given a particular document. The usage of the AutoParser might look like this

doc = AutoParser::Document.new <<EOS
---
  'a': 'one'
  'b': 'two'
  'c': 'three'
EOS
puts doc.strategy #=> AutoParser::Strategies::YAML
puts doc.parse #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

doc = AutoParser::Document.new <<EOS
{
  "a": "one",
  "b": "two",
  "c": "three"
}
EOS
puts doc.strategy #=> AutoParser::Strategies::JSON
puts doc.parse #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

Here the YAML strategy is chosen for the YAML document and the JSON strategy for the JSON document, without the need to specify the document format in advance.

To achieve this, we move the Document class into a module called AutoParser, and place the strategy classes into a submodule called Strategies.

require 'json'
require 'yaml'

module AutoParser

  class Document

    attr_accessor :body
    attr_writer :strategies

    def initialize(body)
      self.body = body
    end

    def strategies
      @strategies || AutoParser::Strategies.to_a
    end

    def strategy
      strategies.detect{ |strategy| strategy.available?(body) }
    end

    def parse
      strategy.parse body
    end

  end

  module Strategies

    def self.to_a
      self
        .constants
        .map { |c| self.const_get c }
        .select { |o| o.is_a? Class }
    end

    class Base

      def self.parse(body)
        raise
      end

      def self.available?(body)
        !!parse(body)
      rescue
        false
      end

    end

    class JSON < Base

      def self.parse(body)
        ::JSON.parse(body)
      end

    end

    class YAML < Base

      def self.parse(body)
        ::YAML.load(body)
      end

    end

  end

end

Simultaneously, we add an available? method to each strategy class (here via inheritance from a base class, since the method is identical for both strategies). The Document class queries this method to determine whether a strategy is appropriate for the given document body. By default, every class in the Strategies module is considered until one reports availability, but a specific list of strategies can also be injected through the strategies writer after instantiating a Document. In this way the class is extensible: adding a new strategy class to the Strategies module requires neither modification of Document nor injection.
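A minimal sketch of strategy injection, using a condensed copy of the AutoParser above so the snippet runs on its own:

```ruby
require 'json'
require 'yaml'

# Condensed AutoParser, as defined earlier in this post
module AutoParser

  class Document

    attr_accessor :body
    attr_writer :strategies

    def initialize(body)
      self.body = body
    end

    def strategies
      @strategies || AutoParser::Strategies.to_a
    end

    def strategy
      strategies.detect { |strategy| strategy.available?(body) }
    end

    def parse
      strategy.parse body
    end

  end

  module Strategies

    def self.to_a
      constants.map { |c| const_get(c) }.select { |o| o.is_a? Class }
    end

    class Base
      def self.parse(body)
        raise
      end

      def self.available?(body)
        !!parse(body)
      rescue
        false
      end
    end

    class JSON < Base
      def self.parse(body)
        ::JSON.parse(body)
      end
    end

    class YAML < Base
      def self.parse(body)
        ::YAML.load(body)
      end
    end

  end

end

# Inject an explicit list of strategies via the strategies writer,
# overriding the default list drawn from the Strategies module
doc = AutoParser::Document.new('{"a": "one"}')
doc.strategies = [AutoParser::Strategies::JSON]
puts doc.strategy #=> AutoParser::Strategies::JSON
puts doc.parse    #=> {"a"=>"one"}
```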

A plethora of ways to instantiate a Ruby object

Ruby is a very flexible language and there are many ways to instantiate an object. There are pros and cons for each making them more or less appropriate in various use cases. Consider the task of defining a Paragraph class to track the style of a paragraph DOM element. A simple class definition and usage pattern might look like this

class Paragraph

  attr_accessor :font, :size, :weight, :justification

end

p = Paragraph.new
p.font = 'Times'
p.size = 14
p.weight = 300
p.justification = 'right'

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Times, 14, 300, right

The instantiated object uses instance variables to maintain the state and defines public getter and setter methods that allow you to update the paragraph style at any time. This is a very flexible approach, but it does not enforce a complete style definition. You might run into problems if a consumer requires such and does not appropriately handle properties with nil values. To address this concern, it is not unusual to enforce completeness by setting up all state upon instantiating the object through the use of an initializer.

class Paragraph

  attr_reader :font, :size, :weight, :justification

  def initialize(font, size, weight, justification)
    @font = font
    @size = size
    @weight = weight
    @justification = justification
  end

end

p = Paragraph.new('Times', 14, 300, 'right')

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Times, 14, 300, right

In this example, Ruby will check the number of arguments passed to the initialize method against its arity, which ensures that all the style attributes are set upon instantiation. However, this approach is already becoming unwieldy due to the number of parameters, the strict parameter ordering requirement, and the need to memorize that ordering. A Ruby idiom that addresses these concerns passes a single hash to the initialize method. For example
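That arity check is worth seeing once: passing the wrong number of arguments fails immediately rather than producing a half-initialized object. (A two-attribute Paragraph is used here to keep the sketch short.)

```ruby
class Paragraph

  attr_reader :font, :size

  def initialize(font, size)
    @font = font
    @size = size
  end

end

begin
  Paragraph.new('Times') # one argument short
rescue ArgumentError => e
  puts e.message #=> wrong number of arguments (given 1, expected 2)
end
```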

class Paragraph

  attr_reader :font, :size, :weight, :justification

  def initialize(style)
    @font = style.fetch(:font, 'Helvetica')
    @size = style.fetch(:size, 12)
    @weight = style.fetch(:weight, 200)
    @justification = style.fetch(:justification, 'right')
  end

end

p = Paragraph.new(font: 'Times', weight: 300)

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Times, 12, 300, right

This approach reduces the cognitive load on the developer by allowing the attributes to be set with an unordered list of key/value pairs. It also minimizes the number of pairs required by setting reasonable defaults for each style attribute.
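The same fetch idiom can also enforce required attributes: calling fetch without a default raises a KeyError when the key is missing. A sketch using a hypothetical StrictParagraph class:

```ruby
class StrictParagraph

  attr_reader :font, :size

  def initialize(style)
    @font = style.fetch(:font)     # required: raises KeyError if absent
    @size = style.fetch(:size, 12) # optional: falls back to 12
  end

end

p StrictParagraph.new(font: 'Times').size #=> 12

begin
  StrictParagraph.new(size: 14)
rescue KeyError => e
  puts e.message #=> key not found: :font
end
```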

Alternatively, in Ruby 2.1, we can take advantage of Keyword Arguments to clarify the method signature.

class Paragraph

  attr_reader :font, :size, :weight, :justification

  def initialize(font: 'Helvetica',
                 size: 12,
                 weight: 200,
                 justification: 'right')

    %w{font size weight justification}.each do |attribute|
      instance_variable_set :"@#{attribute}", binding.local_variable_get(attribute)
    end

  end

end

p = Paragraph.new(font: 'Times', weight: 300)

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Times, 12, 300, right

Here the method parameters and their defaults are captured in the method signature instead of being buried in the method body. This can improve the usability of the class, especially if an automated documentation system is in use.

Sometimes, you may want to encourage a more declarative instantiation. Enlisting the use of a meaningfully named Struct to capture object state can help achieve this. For example

class Paragraph

  Style = Struct.new :font, :size, :weight, :justification

  def style
    @style ||= Style.new('Helvetica', 12, 200, 'right')
  end

  def initialize
    yield style
  end

end

p = Paragraph.new do |style|
  style.font = 'Times'
  style.size = 16
  style.weight = 300
end

puts "#{p.style.font}, #{p.style.size}, #{p.style.weight}, #{p.style.justification}"
# => Times, 16, 300, right

While not much different from the first example (using only attribute accessors), the usage makes it clear that these are style attributes being initialized. If the style method is made private, a Paragraph can no longer be restyled after creation, effectively making it immutable, which may be advantageous in some cases.
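A sketch of that private-style variation, with simple reader methods delegating to the struct (the reader methods are my own addition, needed because style itself is no longer callable from outside):

```ruby
class Paragraph

  Style = Struct.new :font, :size

  def initialize
    yield style
  end

  def font
    style.font
  end

  def size
    style.size
  end

  private

  # Private: callable from initialize and the readers, but not from outside
  def style
    @style ||= Style.new('Helvetica', 12)
  end

end

paragraph = Paragraph.new { |style| style.font = 'Times' }
puts paragraph.font #=> Times

begin
  paragraph.style # restyling from outside is no longer possible
rescue NoMethodError
  puts 'style is private'
end
```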

Taking this one step further, a custom Domain Specific Language (DSL) can be created to achieve a more human readable interface.

class Paragraph

  Style = Struct.new :font, :size, :weight, :justification

  def style
    @style ||= Style.new('Helvetica', 12, 200, 'right')
  end

  def initialize &block
    instance_eval &block
  end

  def write(parameters)
    style.font = parameters.fetch(:using, 'Helvetica')
    style.size = parameters.fetch(:at, 12)
  end

end

p = Paragraph.new do
  write using: 'Times', at: 14
end

puts "#{p.style.font}, #{p.style.size}, #{p.style.weight}, #{p.style.justification}"
# => Times, 14, 200, right

Sometimes we don't have control over how an object is instantiated. The class might be defined in a third party library or already in use in our own code, making it difficult to change. In such a case, we can use the Builder pattern by defining a class that creates objects for us. In this way, we can create an interface of our own choosing. For example, let us imagine that the Paragraph class is defined as follows

class Paragraph

  attr_reader :font, :size, :weight, :justification

  def initialize(font, size, weight, justification)
    @font = font
    @size = size
    @weight = weight
    @justification = justification
  end

end

and cannot be altered. We can define a Builder class that creates Paragraph objects for us, but allows us to set the style attributes in a block.

require 'ostruct'

class Builder

  def self.configure(klass, &block)
    return unless block_given?
    struct = OpenStruct.new
    struct.instance_eval &block
    defaults[klass] = struct.to_h
  end

  def self.create(klass, &block)
    struct = OpenStruct.new defaults[klass]
    struct.instance_eval &block if block_given?
    parameters = defaults[klass].keys.map{ |k| struct[k] }
    klass.new(*parameters)
  end

  def self.defaults
    @defaults ||= {}
  end

  private_class_method :defaults

end

With this in place, we can set sensible defaults, which are tracked by the Builder. The pre-existing Paragraph class has no defaults.

Builder.configure(Paragraph) do
  self.font = 'Helvetica'
  self.size = 14
  self.weight = 200
  self.justification = 'right'
end

We can then see that when a Paragraph is created, it reflects those defaults.

p = Builder.create(Paragraph)

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Helvetica, 14, 200, right

and that those defaults can be overridden at creation time.

p = Builder.create(Paragraph) do
  self.font = 'Times'
  self.size = 16
end

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Times, 16, 200, right

Thus, with relatively little extra work and no impact on the existing Paragraph class, we can improve the way in which we instantiate Paragraph objects, adding features such as default attribute values.

Simple internal DSLs in Ruby

It seems that creating a Domain Specific Language (DSL) is considered both all the rage and an overused scourge. In Ruby, it is really easy to create one, and I suspect that is why DSLs are a popular tool for Rubyists. Although I've used many DSLs, I have never built one of my own. I have always had the desire to write my own programming language, but am daunted by the difficulty of crafting an elegant language that does not break down for all but the simplest cases, let alone writing an efficient language parser.

Anyway, if we focus on writing an Internal DSL, one that is built on top of and leverages a host language, we can accomplish this in Ruby with a simple instance_eval.

module DSL
  def self.enable(klass, &block)
    container = klass.new
    container.instance_eval(&block)
  end
end

Here I create a DSL module with a single enable method that accepts a class that defines the DSL methods and a block of code. A new instance of the class specifying the DSL is created and the block that is passed in is evaluated in the context of the class, thus making the DSL methods available within the block.

If we wanted to create a DSL for a pseudo reverse Polish notation (RPN) calculator, we would simply define a class with methods that define the operations in the language. For example:

class Calculator

  def initialize
    self.stack = []
  end

  def push value
    stack.push value
  end

  def add
    calculate { stack.pop + stack.pop }
  end

  def subtract
    calculate do
      a = stack.pop
      b = stack.pop
      b - a
    end
  end

  def multiply
    calculate { stack.pop * stack.pop }
  end

  def divide
    calculate do
      a = stack.pop
      b = stack.pop
      b / a
    end
  end

  private

    attr_accessor :stack

    def calculate &block
      result = block.call
      stack.push result
      return result
    end

end

Then using the DSL is as simple as calling DSL.enable with the Calculator class and a block of RPN, as shown in the following RSpec tests. Note that the result of the RPN operations is given as the return value of the call to DSL.enable.

describe 'Calculator' do

  it 'should add two numbers' do

    result = DSL.enable Calculator do
      push 1
      push 2
      add
    end

    expect(result).to eq(3)

  end

  it 'should divide two numbers' do

    result = DSL.enable Calculator do
      push 6
      push 2
      divide
    end

    expect(result).to eq(3)

  end

  it 'should handle multiple operations' do

    result = DSL.enable Calculator do
      push 3
      push 6
      push 2
      divide
      multiply
    end

    expect(result).to eq(9)

  end

end

Not only does implementing the DSL in this way provide access to the operators, but it can also hold state by way of instance variables (stack in this example).

Review of Metaprogramming Ruby 2

I recently read Metaprogramming Ruby 2 and gave an overview presentation to the PeopleAdmin engineering team. While this book is listed as an advanced text for Ruby developers, it contains an extensive explanation of an important part of Ruby, the Object Model. While the book covers the Object Model as a lead-in to discussing metaprogramming, I believe this explanation would be useful for any developer except those just learning Ruby.

The first part of the book covers various aspects of the Object Model, including the organization of classes and modules, the ins and outs of methods, blocks and procs, and the process of method and constant lookup. This is done through an easy-to-read story of two developers pairing to solve a series of programming problems, which serves nicely to present code examples that demonstrate the target concepts.

The second part of the book tells three stories that demonstrate the pros and cons of metaprogramming in practice. The first centers around the way in which ActiveRecord developers leveraged metaprogramming to make the ActiveRecord API elegant and to incrementally improve the performance of the library. Any developer who has worked in Rails will find this retrospective discussing the evolution of ActiveRecord from 1.0 to 4.0 interesting in its own right. The other stories focus on ActiveSupport Concerns and the use and abuse of the alias_method_chain method in the context of Rails.

Metaprogramming is considered a dirty word in many circles. Although these techniques are very powerful, that power can, without care, come with serious costs. The author does a good job of not only arguing that metaprogramming is a tool that should be in a Rubyist's tool belt, but also provides appropriate usage patterns and makes the reader aware of best practices that help to minimize these costs.

This is one of the things that made me glad to have read through this book. The author names each metaprogramming pattern (he calls them spells) and collects them into an appendix (a grimoire). Although I have used many of these metaprogramming approaches in various projects, I never extracted and named them myself. Now that I see them as patterns, I suspect it will be easier to reach into the tool belt and apply them to future problems.

Generating Descriptive Statistics in Ruby and Rails

The core Ruby libraries do not provide an easy way to calculate simple descriptive statistics on collections of numbers. However, this can be easily achieved using the DescriptiveStatistics gem. First, install the gem with gem install descriptive_statistics. Then, once you require DescriptiveStatistics, all Enumerable objects will respond to the new statistical methods. For example

require 'descriptive_statistics'
data = [2,6,9,3,5,1,8,3,6,9,2]
data.number # => 11.0
data.sum # => 54.0
data.mean # => 4.909090909090909

data = {a: 1, b: 2, c: 3, d: 4, e: 5}
data.mean #=> 3.0
data.variance #=> 2.0

require 'set'
data = Set.new([1,2,3,4,5])
data.median #=> 3.0
data.standard_deviation #=> 1.4142135623730951

data = Range.new(1,5)
data.sum #=> 15.0
data.mean #=> 3.0

Statistical methods also accept blocks, which can be used to make calculations on individual attributes of objects in a collection. For example

require 'descriptive_statistics'
LineItem = Struct.new(:price, :quantity)
cart = [ LineItem.new(2.50, 2), LineItem.new(5.10, 9), LineItem.new(4.00, 5) ]
total_items = cart.sum(&:quantity) # => 16.0
total_price = cart.sum{ |i| i.price * i.quantity } # => 70.9

DescriptiveStatistics can be used with Ruby on Rails, but some care must be taken. The ActiveSupport library, which is required by Ruby on Rails, extends the Ruby core with a number of useful additional methods. One of these methods, sum, conflicts with the version provided by DescriptiveStatistics.

To use DescriptiveStatistics with Ruby on Rails, you will need to use one of the safe methods described in the Readme, which do not monkey-patch the Enumerable module. The simplest method is to use the module methods directly. First, add DescriptiveStatistics to your Gemfile, requiring the safe extension.

source 'https://rubygems.org'

gem 'rails', '4.1.7'
gem 'descriptive_statistics', '~> 2.4.0', :require => 'descriptive_statistics/safe'

Then after a bundle install, the DescriptiveStatistics module methods will be available to operate on collections of objects, including ActiveRecord objects.

DescriptiveStatistics.mean([1,2,3]) # => 2.0
DescriptiveStatistics.mean(User.all, &:age) # => 19.428571428571427

Alternatively, you can extend DescriptiveStatistics on an individual collection and call the methods as needed.

users = User.all.extend(DescriptiveStatistics)
mean_age = users.mean(&:age) # => 19.428571428571427
mean_age_in_dog_years = users.mean { |user| user.age / 7.0 } # => 2.7755102040816326

This approach will supersede the ActiveSupport-defined methods only on the extended collection, avoiding potential conflicts on other collections, where the ActiveSupport methods remain available.