Do it with style

Writing code that follows established style guides, uses common programming idioms and heeds best practices as part of the development process can significantly improve the developer workflow and the quality of the resulting software.


Odds are that your colleagues will spend far more time reading your code than you will spend writing it.

Consider that not only must your code be reviewed and tested by your colleagues, it will almost certainly be modified, multiple times, in the future by other developers, possibly long after you have stopped working on it.

Furthermore, as new developers join the team, they will likely read your code just to become familiar with its operation.

If you write a library that gets open sourced, your code may be read by hundreds or even thousands of other developers.

When you write code, you can save your company many more hours by making your code easy to read and understand than by writing it as fast as possible.

You can make your code easy to read and understand by following the company style, using common idioms in the programming language, and following established best practices.

Do it with style

Programming style is a set of rules established and agreed upon by a team of developers that serves to constrain the syntax of the code that is developed by that team.

When the entire team follows the style set out in an agreed upon style guide, developers immediately know how to write code, and can more easily read that code.

Good style, in some cases, can be subjective. In other cases, good style has been determined over time through experience.

What is important is that a team chooses a single style and follows that style.
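
As a small, hypothetical Ruby illustration (people here stands in for any collection), both of the following definitions are perfectly valid; a style guide simply settles which block form the team uses:

# One-line block with braces
def names
  people.map { |person| person.name }
end

# Multi-line block with do/end
def names
  people.map do |person|
    person.name
  end
end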

Use idioms

A programming idiom is a commonly used way to write a simple block of code that accomplishes a single function.

These recurring patterns in code can be easily recognized and reused.

When idioms are used as building blocks of more complex functionality, the complexity is more easily understood.

Some languages, like Ruby, embrace flexibility and diversity of expression as an explicit goal, while others, such as JavaScript, are under constant evolution and as a result also offer many ways to achieve the same goal.

Although this flexibility can be powerful in some circumstances, to make your code easier to understand, it is best to stick with standard idioms unless required to do otherwise.

Learning to recognize and use idioms in a language requires that you spend as much time reading and analyzing code written by other developers as you do writing your own code.

If you take the time to learn these idioms, you will find not only that your code is easier to understand, but also that you can write it more easily using these patterns.
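
For example, in Ruby (a hypothetical illustration), transforming a collection is idiomatically expressed with Enumerable#map rather than an explicit loop and accumulator:

# Non-idiomatic: an explicit loop with a manual accumulator
squares = []
for number in [1, 2, 3]
  squares << number * number
end

# Idiomatic Ruby: the same transformation expressed with Enumerable#map
squares = [1, 2, 3].map { |number| number * number }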

Follow best practices

Best practices are a set of guidelines, developed over time and justified by experience, that aim to improve the quality, maintainability and extensibility of code while minimizing its complexity.

Best practices set out the preferred way to develop software among many possible ways, and their value often only becomes obvious once the complexity of the code reaches a certain level.
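
As a small, hypothetical Ruby sketch (order and warehouse stand in for real collaborators), consider guard clauses, a commonly recommended practice whose value only becomes apparent as conditions accumulate:

# Nested conditionals: manageable with one condition, increasingly hard to follow
def ship(order)
  if order.paid?
    if order.in_stock?
      warehouse.dispatch(order)
    end
  end
end

# Guard clauses: keep the happy path flat and easy to extend
def ship(order)
  return unless order.paid?
  return unless order.in_stock?
  warehouse.dispatch(order)
end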

In some cases, following best practices requires writing more code and taking more time to develop software than might otherwise be required.

Because of this, it is easy to fall into the trap of writing code in a way that ignores these best practices, especially when writing code under time pressure.

However, if you persist in following these best practices, even in the early stages of development or when implementing simple functionality, doing so will become automatic and no longer slow you down.

Put it into practice

As developers we have too many things to remember and writing functional code is already challenging as it is.

Thus it is not always easy for any of us to keep good style, common idioms and best practices in mind when writing code.

So, when you review code in pull requests, please take the time to think about these factors and make helpful suggestions based on them to the author.

Adapted from a blog post of mine on the Square Root internal engineering blog

Buy or Build: Third-party code evaluation

When tasked with implementing a new feature of any significant size, you are often faced with the decision of whether to use a library from a third party, or write your own solution. Here are a few factors that you should consider when making your choice.

The State of Development

  • How long has the library existed?
  • Is the library actively developed?
  • How often are new versions of the library released?
  • Has the library had a stable release?
  • Are breaking changes made to the library frequently?
  • Does the project follow semantic versioning?
  • Is there a published road map for future work with dates?
  • Are there pull requests that are frequently merged, or left unmerged?
  • Are there outstanding Github issues?

Library Dependencies

  • How many dependencies does the library have?
  • Are the dependencies up to date?
  • Are the required libraries of equal quality?

Quality

  • How well written is the library?
  • Does the library follow an established style guide?
  • Are code quality metrics available?
  • What is the level of test coverage?
  • Does the library have any known bugs?
  • Does the library follow industry standards?

Technology

  • Is the library based on a sound algorithmic approach?
  • What are the CPU, memory and network demands?
  • Is the approach novel, untested or well established?
  • Are there any potential security risks that could be introduced by the library?

Documentation / Ease of Use

  • Does the library have documentation, or will you be required to read the code to use it?
  • Are there learning resources such as books, training videos, or blog posts that can help you learn to use the library?
  • How long will it take you to learn the library's API?
  • Are there people on your team already familiar with the library?

Popularity / Community

  • What is the reputation of the authors for writing software?
  • How many contributors are there on the project?
  • What is the leadership model of the development team?
  • If the library is hosted on Github, how many stars and forks does it have?
  • Is the project company sponsored or are contributors paid to work on the library?
  • Is this library used in production by any companies?

Your Use Case

  • Does the library meet all the requirements?
  • Does the library provide more functionality than you will need?
  • Is the implementation efficient enough to handle your needs now, and into the future?
  • Can the library support extension in the future?
  • Are there switching costs that would prohibit moving to another library if needed?
  • Is there a license that enables reuse?
  • Does the library have licensing fees?

Adapted from a blog post of mine on the Square Root internal engineering blog

The code is only one half of a commit

Taking the time to add meaningful commit messages along with your code can create a lot of value for your team. These commit messages will help other developers review, understand and extend your code.


While either debugging existing code or adding new features that must integrate with existing code, I have often found it useful to review the Git history. Git blame gives us the commit SHA and author for each line of code in a file. Selecting the commit SHA of a relevant line of code, and then using Git log to show the related commit history can provide very useful context for the code by way of the commit messages. Ideally, the history is full of descriptive and relevant commit messages that help you to understand the motivation for the changes made to the code.

A clean commit history and concise commit messages can help other developers understand, debug and extend code.

Achieving this benefit requires an extra step in your workflow. Because it makes sense to focus on getting the code right first, I suggest that the final step in your workflow before opening a pull request is to update your commit history by rewriting your commit messages and squashing unnecessary commits.

To revise commit history there are a number of Git tools which can be useful. Git commit with the --amend flag will allow you to revise the files and commit message included in the last commit. Git rebase with the --interactive flag will, among other things, allow you to choose individual commit messages to update, as well as allow you to combine (squash) or split existing commits.

Because these tools recreate the commit history of your branch, if you have already pushed the branch to a shared repository such as Github, you will need to use Git push with the --force-with-lease flag. This will overwrite your previous history on the shared repository, making it available for other developers to review and use. The --force-with-lease flag helps to ensure you don't accidentally overwrite someone else's commits on a shared branch. It is a best practice to only push with force to a branch that you alone are working on.

Using Git intentionally with the goal of creating clear and meaningful commit messages can be extremely useful for any developers working with your code in the future. Like any form of code documentation, crafting it takes time, and neglecting to do so builds up technical debt. Accruing this debt saves time now, but makes working with your code harder and slower in the future.

The de facto standard for formatting Git commit messages is given by Tim Pope:

Capitalized, short (50 chars or less) summary

More detailed explanatory text, if necessary. Wrap it to about 72 characters or so. In some contexts, the first line is treated as the subject of an email and the rest of the text as the body. The blank line separating the summary from the body is critical (unless you omit the body entirely); tools like rebase can get confused if you run the two together.

Write your commit message in the imperative: "Fix bug" and not "Fixed bug" or "Fixes bug." This convention matches up with commit messages generated by commands like git merge and git revert.

It is important to explain not only the changes being made to the code but also the motivation for these particular changes. This is your opportunity as the code author to explain to all future developers why you chose to solve the problem the way you did. If you answer the following questions for each of your commits, your code will be easier to understand, debug and extend in the future.

  • Why is this change necessary?
  • How does this change address the issue?
  • What are the side effects of this change?

Adapted from a blog post of mine on the Square Root internal engineering blog

Enterprise Failure

Developing enterprise software is a complicated and error prone activity. Expect failures to occur. As developers, the single most important thing we can do to minimize failure is to always be reducing the complexity of our software.


Being an enterprise software developer is hard. Sometimes it feels like we are placing each new feature on a leaning Jenga tower that is about to tip over and fall to the ground.

Weekly, we are tasked with adding new features to a complex system built by other people, of various skill levels, over a period of many years, always under a time constraint.

Paralyzed in our fear of breaking something, we submit the smallest possible code update that will get that feature working.

The problem with this approach is that just like in Jenga, it gets harder and harder and at some point impossible to add new features without something breaking.

Why do we do it anyway? Because, if we break something now, we know full well that accusatory eyes from around the company are on us as the responsible party.

This is wrong thinking.

Enterprise software is complex. Failures will occur. This is expected.

It is no single person's fault when something breaks; it is an Enterprise failure.

Every person in the company is responsible - developers, testers, architects, managers, vice presidents and chief executives.

However, every person has a role in preventing failure. Our role, as developers, is not simply to add new features. Our role is also to build failure resistant software.

The single most valuable thing we can do to make our software resist failure is to reduce complexity. Complex software is hard to understand, modify, extend, debug and test.

Each and every time you submit code, the overall code base should be less complex, not more complex. This is not easy to achieve. It is a constant battle. It is the crux of our job.

Through constant refactoring, conscientious design, and the application of established best practices, we can reduce complexity to do our part to reduce failures.

Don't fear failures, fear the complexity that leads to failure.

Adapted from a blog post of mine on the Square Root internal engineering blog

What did you learn today?

Being a good software developer requires you to continuously learn new languages, skills, techniques and tools. Here are a few tips to get you started and keep you learning.


Branch out beyond your existing core skills

You most certainly know how to do some things very well.

You have a programming language that you know by heart. There is a framework you always use to build websites. You have your environment and tools set up exactly the way you like.

You are productive. Being productive makes you happy. It provides a sense of accomplishment. It pays the bills.

As humans, we are naturally biased to use the tools that we know well in order to solve a problem at hand. We do this because it is immediately rewarding.

On the other hand, learning is often hard, confusing and time consuming. If anything, we are naturally discouraged from learning.

Because technology rapidly changes, the value of your existing knowledge rapidly declines.

The only way to deal with this fact is to always be putting yourself in a position that requires you to learn and use new skills.

If you do this enough, you will soon find that the act of mastering a new skill, and the newfound productivity that comes with it, becomes its own reward.

Learn everything you can through deep immersion

The fastest way to learn a new spoken language is to go to a country where that language is spoken and learn while immersed within every aspect of the culture.

The same holds true for learning a new programming language. The fastest way to learn is to immerse yourself in the relevant programming community.

You can do that by:

* reading blog posts and books
* watching videos and listening to podcasts
* attending or speaking at user groups and conferences
* connecting with other developers over Twitter and IRC
* reading and writing code for open source projects
* joining a team at work already using the language on an existing project

It can be counter-intuitive and even scary to jump right into the deep end of a pool, but it really is the quickest way to learn to swim.

Copy code that other developers have written

Copy-and-Paste programmers are often derided, but every good developer has well worn copy and paste keys.

It can be really challenging to write code from scratch when using a new technology.

Instead, it is better to copy working code written by other developers, even if you do not fully understand the code.

Having a working example will allow you to tinker, test, and deconstruct the code until you can understand each piece.

While you should not copy and paste code directly for production use, it is a great way to write prototype code and learn new programming patterns.

Read, write, read, write, read, write

When you begin to learn a new language, you will most likely spend most of your time reading code.

When you begin to become proficient in using the language, you will most likely spend most of your time writing code.

Both of these situations are common anti-patterns.

You can greatly accelerate your learning by writing code that utilizes the patterns and techniques that you read about, right after reading about them.

In order to continue learning after becoming proficient, you need to read code other people have written in order to find new patterns and techniques to use in your own code.

You should always be reading as much code as you write and vice versa.

Show off your code, get feedback, revise

It is a well known psychological bias that once we have made a choice, we believe more strongly that it was the correct one, simply because we made it.

Writing code involves a lot of small decisions. Together, these facts represent a significant obstacle to improving the code we write.

With each decision you make, you become more and more confident in the outcome, and can often come to the point of being unable to see any other way to write the code.

Because we can so easily become blind to our own code, we must get other developers to review our code and suggest alternative approaches and improvements that might be hard for us to otherwise see.

Authors never write books without editors and rewrites. Developers never write code without reviewers and revisions.

Teach in order to master

The best way to master a new technology is to find an opportunity to teach someone that technology.

There are many ways to do this, some easy, and some more involved:

* telling a colleague
* giving a demonstration
* writing a blog post
* writing documentation
* giving a presentation
* organizing a workshop
* pair programming
* code reviews

Because each of these activities will require you to spend as much time organizing your own thoughts as time spent actually teaching, you are sure to benefit as much as your pupils.

Adapted from a blog post of mine on the Square Root internal engineering blog

Dynamic matchers in RSpec

RSpec has a neat feature, Dynamic Predicate Matchers, that can improve the readability of your tests. These are matchers that are created on the fly for the particular class under test. Consider the following simple class:

class Foo

  attr_accessor :bar, :baz

  def valid?
    bar == true && baz == false
  end

end

The valid? method is a predicate method. Predicate methods are, by convention, methods that end with a question mark and return a boolean. They are frequently methods that report on the internal state of an object. In a test, RSpec will automatically generate matchers that leverage those predicate methods. Here are two example tests that use a dynamically generated matcher be_valid:

RSpec.describe Foo do

  it "is valid when bar is true" do
    foo = described_class.new
    foo.bar = true
    foo.baz = false
    expect(foo).to be_valid
  end

  it "is invalid when bar is false" do
    foo = described_class.new
    foo.bar = false
    foo.baz = false
    expect(foo).to_not be_valid
  end

end

The expectations read much more like English. Compare

expect(foo).to be_valid

to the alternative

expect(foo.valid?).to be true
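
The same machinery also covers predicate methods written with a has_ prefix: for a method like Hash#has_key?, or a hypothetical has_errors? method on a class of your own, RSpec generates matchers spelled have_key and have_errors.

expect({ a: 1 }).to have_key(:a)   # calls has_key?(:a) under the hood
expect(form).to_not have_errors    # would call form.has_errors? (form and the method are hypothetical)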

Hasten the import of large tables into MySQL

You may find that someday you are working on a production application and want to do some testing in your local environment using production data. Furthermore, you may find that this application has a very large MySQL database, with tables that have many millions of rows. So, you export that database from your production environment into a SQL file using mysqldump and copy it to your local computer. However, when you go to import that database into MySQL, like so: cat database_dump.sql | mysql -uroot, it takes many hours to import.

This is not unusual for large databases, but there may be something that can easily be done to significantly cut down on the import time. Now, I am neither a DBA nor a MySQL wizard, so with all that follows: buyer beware. It seems from some research online that there is only one true answer to how to optimize MySQL: it depends. Not only that; it depends on many, many things, including the MySQL configuration, available memory, usage patterns, the operating and file system, schema design, table size, and possibly even what you had for lunch the day prior.

Now let's say that you are working with a database which contains a large, heavily indexed InnoDB table. If you inspect the SQL dump, you will find a CREATE TABLE statement. It might look something like this:

CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `address_detail_id` int(11) NOT NULL,
  `billing_detail_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `name` (`id`,`name`),
  KEY `address` (`address_detail_id`),
  KEY `billing` (`billing_detail_id`),
  KEY `foreign` (`id`,`address_detail_id`,`billing_detail_id`),
  KEY `covering` (`id`,`name`,`address_detail_id`,`billing_detail_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Shortly thereafter will be a large number of INSERT INTO statements pumping the data into your newly created table. Note that this table has a number of KEY attributes, indicating the presence of indexes. Indexes can speed up select queries, but they do so by trading storage space for speed. You might simply think of an index as a presorted version of the data in your table that makes it easier for MySQL to find a particular piece when searching for it.

When you insert data into an indexed table, MySQL must not only store your data, but also sort it. It turns out that for small InnoDB tables, this can be done efficiently enough that it does not significantly impact the time to insert data. Reportedly, inserting data into indexed MyISAM tables, even smaller ones, is slower than inserting it into a table without those indexes. This explains why you will find statements like these in your SQL file surrounding the insert statements for any given table:

/*!40000 ALTER TABLE `users` DISABLE KEYS */;
...
/*!40000 ALTER TABLE `users` ENABLE KEYS */;

This signals to MySQL to disable the indexes while inserting data, for speed. The indexes are built when they are re-enabled. Note that these commands are not commented out; rather, this is the syntax for conditional execution of commands depending on the server version. Apparently this command was introduced in MySQL 4.0.0.

Now, for whatever reason, this command does not disable indexing on InnoDB tables, even though it is included in the dump. Possibly that is because InnoDB is quite good at inserting data into indexed tables. However, for large InnoDB tables, it does not insert data into indexed tables as rapidly as into unindexed tables.

So, I ran a benchmark on my local Mac OS X (10.10) environment with a default MySQL 5.6.22 install. The benchmark (1) created an indexed InnoDB table, (2) inserted varying amounts of generated data, (3) exported the data using mysqldump, and (4) measured the time to import that SQL dump. As can be seen in the plot below in red, at around 3M rows the rate of inserts dropped significantly.

Comparison Chart

At this point in time, I really don't know what is causing the slowdown. From my light research, I might hazard a guess that the memory requirements of the indexing algorithm grow beyond the available resources. So, the exact position of the knee and the amount of slowdown likely depend on the configuration and hardware. Here is the benchmarking code if you would like to try it out.

To speed up the import process for large indexed InnoDB tables, I created a tool called Hasten. This tool alters a SQL dump so that it will import faster. It does this by removing the indexes from all table definitions and then adding them back at the end of the import. If you review the plot above, you will see that this gives a dramatic reduction in import time for large tables. Hasten is written in Ruby, and if you have Ruby on your system you only need to install the gem

gem install hasten

and then insert Hasten into your import command like so

cat DUMPFILE | hasten | mysql -uUSER -pPASSWORD DATABASE
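
To give a sense of what such a transformation involves, here is a rough Ruby sketch of the idea. It is an illustration only, not Hasten's actual implementation: it buffers each CREATE TABLE statement, drops its secondary KEY definitions, and emits equivalent ALTER TABLE statements after the rest of the dump has been processed.

# Sketch only: defer secondary indexes in a mysqldump file until after the data load.
deferred = Hash.new { |hash, table| hash[table] = [] }
buffer = nil
table = nil

ARGF.each_line do |line|
  if line =~ /^CREATE TABLE `([^`]+)`/
    table = $1
    buffer = [line]
  elsif buffer && line =~ /^\)/
    # Re-emit the table definition without the deferred keys, stripping the
    # comma left dangling on the last remaining line.
    columns = buffer[1..-1].map { |column| column.sub(/,\s*$/, '') }
    puts buffer.first, columns.join(",\n"), line
    buffer = nil
  elsif buffer && line =~ /^\s*(UNIQUE KEY|KEY)\s+(.+?),?\s*$/
    deferred[table] << "ADD #{$1} #{$2}"
  elsif buffer
    buffer << line.chomp
  else
    puts line
  end
end

# Rebuild the indexes once all the data has been inserted.
deferred.each do |name, keys|
  puts "ALTER TABLE `#{name}` #{keys.join(', ')};"
end

Hasten itself handles considerably more detail than this sketch, so prefer the gem for real dumps.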

Grep your bash history commands and execute one

While working in the Bash shell it is common to want to repeat a command that you have recently executed. Bash keeps a history of executed commands (persisted in the .bash_history file in your home directory) that you can view by simply typing history.

> history
1  ls
2  cd ~
3  ls .*
4  cat .bash_history
5  history

This will output a list of commands prefixed with an identification number. If you only want to see the last N entries in the history, type history N.

> history 4
3  ls .*
4  cat .bash_history
5  history
6  history 4

To execute a command from your history, you can use the history expansion ! followed by the identification number.

> !4
cat .bash_history

Note that the !4 expands to cat .bash_history which is echoed to the terminal before being executed. You can also use !! as a shortcut for executing the last command. This avoids having to type the identification number, which is often more than one character, depending on the length of your history.

A more convenient method of executing a command is to use the ! expansion followed by a matching string. For example:

> !cat
cat .bash_history

executes the last command that begins with cat. Note that the matching string cannot contain any spaces.

You can get a lot of mileage out of these expansions, but you may run into a couple of problems. First, your history will grow. Reviewing all those entries for the one you want can be tedious, especially given that there will be many duplicate commands. Second, the identification numbers will get longer and less convenient to type.

To solve the first problem, you can pipe the output of history to grep so that you review only those commands that match a pattern. For example:

history | grep mplayer

will show all the previous incantations of mplayer. A convenient alias that you can add to your .bashrc file (located in your home directory) is:

alias gh='history | grep '

which will shorten the previous command to:

gh mplayer

This is quite useful, but you will note that there are still duplicate entries and the numbers are not necessarily consecutive. To address these problems, I have created a shell function that will return a list of the top ten commands matching a specified pattern and make it very easy to execute one of them. For example:

> ghf brew
1 brew install rcm
2 brew install karabiner
3 brew install z
4 brew install wget mplayer
5 brew install wget --with-iri
6 brew install wget
7 brew install pv
8 brew install phantomjs
9 brew install mplayer
10 brew install imagemagick

and then I can use the !! shell expansion to choose one of the 10 commands to execute:

> !! 5
brew install wget --with-iri

Note the space between the !! and the identification number.

Here is the full text of the ghf function, which can be added to your .bashrc file so that ghf is available in your shell. I hope you find it useful!

# ghf - [G]rep [H]istory [F]or top ten commands and execute one
# usage:
#  Most frequent command in recent history
#   ghf
#  Most frequent instances of {command} in all history
#   ghf {command}
#  Execute {command-number} after a call to ghf
#   !! {command-number}
function latest-history { history | tail -n 50 ; }
function grepped-history { history | grep "$1" ; }
function chop-first-column { awk '{for (i=2; i<NF; i++) printf $i " "; print $NF}' ; }
function add-line-numbers { awk '{print NR " " $0}' ; }
function top-ten { sort | uniq -c | sort -r | head -n 10 ; }
function unique-history { chop-first-column | top-ten | chop-first-column | add-line-numbers ; }
function ghf {
  if [ $# -eq 0 ]; then latest-history | unique-history; fi
  if [ $# -eq 1 ]; then grepped-history "$1" | unique-history; fi
  if [ $# -eq 2 ]; then
    `grepped-history "$1" | unique-history | grep ^$2 | chop-first-column`;
  fi
}

Useful Sublime Text 3 Packages for a Rubyist

Sublime Text is an extensible editor. To maximize my productivity, I have found that using Packages and organizing a set of custom key bindings has been very important. There are a large number of Packages available that add support for code highlighting and snippets in various languages, graphical theming, linting, autocompletion, and custom build tools. These can all be found on Package Control, a web directory of Packages. Here is a list of my most used Packages and the custom key bindings that I have set up to utilize them.

Origami

Origami is a package that augments functionality around creating and manipulating panes. It provides a set of commands to create, navigate between, resize and zoom panes. I already have a set of key bindings memorized for pane navigation in Tmux, and so I've set them up similarly in Sublime:

{ "keys": ["ctrl+b", "p"], "command": "prev_view" },
{ "keys": ["ctrl+b", "n"], "command": "next_view" },
{ "keys": ["ctrl+b", "o"], "command": "focus_neighboring_group" },
{ "keys": ["ctrl+b", "z"], "command": "zoom_pane", "args": {"fraction": 0.8} },
{ "keys": ["ctrl+b", "s"], "command": "resize_pane", "args": {"orientation": "cols"} },

{ "keys": ["ctrl+b", "c"], "command": "create_pane_with_file", "args": {"direction": "right"} },
{ "keys": ["ctrl+b", "x"], "command": "destroy_pane", "args": {"direction": "self"} },

{ "keys": ["ctrl+b", "up"], "command": "travel_to_pane", "args": {"direction": "up"} },
{ "keys": ["ctrl+b", "right"], "command": "travel_to_pane", "args": {"direction": "right"} },
{ "keys": ["ctrl+b", "down"], "command": "travel_to_pane", "args": {"direction": "down"} },
{ "keys": ["ctrl+b", "left"], "command": "travel_to_pane", "args": {"direction": "left"} },

{ "keys": ["ctrl+b", "shift+up"], "command": "carry_file_to_pane", "args": {"direction": "up"} },
{ "keys": ["ctrl+b", "shift+right"], "command": "carry_file_to_pane", "args": {"direction": "right"} },
{ "keys": ["ctrl+b", "shift+down"], "command": "carry_file_to_pane", "args": {"direction": "down"} },
{ "keys": ["ctrl+b", "shift+left"], "command": "carry_file_to_pane", "args": {"direction": "left"} },

Github Tools

Github Tools is a package that provides commands to interact with the Github repository that you are editing in Sublime. I find myself frequently needing to share references to code with colleagues. Github tools makes it easy to generate a URL on Github to code you have selected in Sublime. It also provides some useful commands to create, edit, and load Gists directly in Sublime. I group all my Github commands behind a single meta key ctrl+g in the style of Tmux:

{ "keys": ["ctrl+g", "g"], "command": "public_gist_from_selection" },
{ "keys": ["ctrl+g", "p"], "command": "private_gist_from_selection" },
{ "keys": ["ctrl+g", "o"], "command": "open_gist_in_editor" },
{ "keys": ["ctrl+g", "w"], "command": "open_gist_in_browser" },
{ "keys": ["ctrl+g", "v"], "command": "open_remote_url" },
{ "keys": ["ctrl+g", "c"], "command": "copy_remote_url" },
{ "keys": ["ctrl+g", "b"], "command": "blame" },
{ "keys": ["ctrl+g", "h"], "command": "history" },

CTags

CTags provides a way to easily generate, navigate and search an index of language objects found in your active Sublime project. This is most useful for navigating directly to function or constant definitions in files. This Package requires that you install and configure a tag generation tool. The default setup is configured for Exuberant CTags, but I use Ripper Tags for Ruby and configure it as follows using RVM:

{ "command": "source $HOME/.bashrc && rvm-auto-ruby -S ripper-tags" }

and set up key bindings behind the meta key ctrl+t.

{ "keys": ["ctrl+t", "t"], "command": "navigate_to_definition" },
{ "keys": ["ctrl+t", "f"], "command": "search_for_definition" },
{ "keys": ["ctrl+t", "r"], "command": "rebuild_tags" },

Shell Command

Shell Command is a package that allows you to execute arbitrary commands in a shell and place the output in a scratch buffer (rather than a panel), making it easily viewable. In its most flexible usage, you simply type the command in a pop-up window. After the output has been generated in a scratch buffer, you can rerun the command in the same window with a context specific key binding. I have set up the key bindings behind the meta key ctrl+c:

{ "keys": ["ctrl+c", "c"], "command": "shell_command" },
{
  "keys": ["c"],
  "command": "shell_command_refresh",
  "context": [{ "key": "setting.ShellCommand" }]
},

By default the shell does not load your shell configuration. So, in order to use commands such as Bundle or Rake, I have set up a custom key binding that allows me to run commands with my configured version of Ruby through RVM:

{
  "keys": ["ctrl+c", "r"],
  "command": "shell_command",
  "args": {
    "command_prefix": "source $HOME/.bashrc && rvm-auto-ruby -S",
    "prompt": "Shell Command"
  }
},

The real power of Shell Command is setting up custom key bindings for your most frequently used shell commands, such as viewing a process list or tailing particular logs. For example:

{ // Process list
  "keys": ["ctrl+c", "p"],
  "command": "shell_command",
  "args": {
    "command": "ps xcro user,pid,%cpu,cputime,%mem,command | head -n 28",
  }
},

will show a process list. Then, custom key bindings for the Shell Command context can be used to take action on the output of the command. For example, with the following key binding, you can kill a process by selecting the process number in the buffer and typing 'k'.

{ // Send SIGKILL to a process number selected
  // in a Shell Command Window
  "keys": ["k"],
  "command": "shell_command",
  "args": {
    "command": "kill -9",
    "region": "arg"
  },
  "context": [{ "key": "setting.ShellCommand" }]
},

There is a lot more flexibility and room for customization provided by this package, so I encourage you to check out Shell Command.

Replacement File Browser

File Browser is an excellent replacement for the default Sublime sidebar. In particular, it adds numerous key bindings for creating and manipulating files, eliminating the need to use the mouse for directory navigation and basic file operations. Here is the key binding to open the FileBrowser at my preferred location on the left hand side:

{
  "keys": ["ctrl+d"],
  "command": "dired",
  "args": {
    "immediate": true,
    "single_pane": true,
    "other_group": "left",
    "project": true
  }
},

but it can also be set up on the right hand side:

[Screenshot: SublimeFileBrowser open on the right hand side]

Web Access

I have found the following four Packages very handy for accessing web content based on content selected inside of Sublime. I have set up the key bindings behind the meta key ctrl+w:

Open URL

Open URL allows you to open your web browser to the URL highlighted in Sublime.

{ "keys": ["ctrl+w", "o"], "command": "open_url" },

Google Search

Google Search allows you to google any content highlighted in Sublime.

{ "keys": ["ctrl+w", "g"], "command": "google_search" },

Goto Documentation

Goto Documentation allows you to intelligently search for help documentation on the web using the automatically determined scope of the highlighted text in Sublime. In other words, if you are editing a Ruby file, it will search the Ruby core documentation.

{ "keys": ["ctrl+w", "h"], "command": "goto_documentation" },

HTTP Requester

HTTP Requester is an amazing package that allows you to execute arbitrary HTTP requests and to get the request response in a scratch buffer. It is very useful for interacting with APIs. It supports making requests using all the HTTP verbs, setting headers, and completing forms.

{ "keys": ["ctrl+w", "e"], "command": "http_requester" },

You can simply select a URL, or a more detailed request description. For example, selecting the following text in a buffer and triggering a request

POST http://posttestserver.com/post.php
Content-type: application/x-www-form-urlencoded
POST_BODY:
variable1=avalue&variable2=1234&variable3=anothervalue

will POST a form to the specified URL and return the body of the response in a new scratch buffer, along with detailed response information, like so:

200 OK
Date:Wed, 31 Dec 2014 20:08:45 GMT
Server:Apache
Access-Control-Allow-Origin:*
Vary:Accept-Encoding
Content-Length:141
Content-Type:text/html

Latency: 77ms
Download time:0ms

Successfully received 3 post variables.

Rendering

Here are three packages that I use to work with Markdown and SQL.

Markdown Preview

Markdown Preview is a Package that will render the Markdown document that you are editing and open it in your browser. It supports either the Python or the Github renderer. Because I primarily use it to edit Markdown in Github repositories, I prefer the latter.

{
  "keys": ["ctrl+m"],
  "command": "markdown_preview",
  "args": {
    "target": "browser",
    "parser": "github"
  }
},

SQL

First, SQL Beautifier simply improves the formatting of SQL. I find it extremely useful when working with long queries taken from logs or profilers. Simply select a poorly formatted query in Sublime and trigger the formatter.

{ "keys": ["ctrl+s", "b"], "command": "sql_beautifier" },

Then, SQL Exec is a Package that allows you to execute queries selected in Sublime against a SQL database and returns the results in a panel view. It requires a bit of tedious configuration of your database connections, but is useful for working in a relatively stable development environment. For more serious work with SQL I prefer SQL Pro.

  { "keys": ["ctrl+s", "c"], "command": "sql_list_connection" },
  { "keys": ["ctrl+s", "e"], "command": "sql_execute" },
  { "keys": ["ctrl+s", "h"], "command": "sql_history" },
  { "keys": ["ctrl+s", "q"], "command": "sql_query" },
  { "keys": ["ctrl+s", "s"], "command": "sql_show_records" },
  { "keys": ["ctrl+s", "d"], "command": "sql_desc" },

BuildView

Sublime has a convenient build system that allows you to trigger (super+b) a shell command to build a file or execute a test suite. The output of the build command is piped into a Sublime Panel. I prefer to have the output of a build placed into a scratch buffer instead, and that is exactly the functionality that the BuildView Package provides. To use it you must override your build key binding.

{
  "keys": ["super+b"],
  "command": "build",
  "context": [{
    "key": "build_fake",
    "operator": "equal",
    "operand": true
  }]
},

Linting

I find that I am using linting in Ruby and JavaScript more and more frequently. There are various linting packages available for these languages (and others too), but I have found the following two Packages to be the best for me.

Rubocop

The Rubocop package provides bindings for the Rubocop static code analyzer for Ruby. You first need to install and configure Rubocop, which can take a bit of effort to get it configured for your preferred style. By default the Rubocop package automatically marks issues in your Ruby buffer, but I prefer to disable this

{
  "mark_issues_in_view": false,
}

and instead bind a key to trigger the Rubocop analysis.

{
  "keys": ["ctrl+l", "r"],
  "command": "chain",
  "args": {
    "commands": [
      ["rubocop_check_single_file"],
      ["hide_panel", {"cancel": true}]
    ]
  }
},

Normally, the Rubocop output will be piped to a Sublime Panel, but because I use BuildView, the output is piped to a scratch buffer instead. For whatever reason, it annoyingly leaves the panel open. To solve this problem, I use the Chain of Command Package to trigger a hide_panel command after triggering Rubocop.

JSLint

The JSLint Package provides linting from Douglas Crockford's JSLint quality tool for JavaScript. It requires you to have Node.js installed and configured on your system and in your executable path. By default, it will run each time a JavaScript file is saved. I prefer to disable this feature

{
    "run_on_save" : false
}

and instead bind a key to trigger the JSLint analysis.

{
  "keys": ["ctrl+l", "j"],
  "command": "chain",
  "args": {
    "commands": [
      ["jslint"],
      ["hide_panel", {"cancel": true}]
    ]
  }
},

Again, note the use of the Chain of Command Package to trigger a hide_panel command after triggering JSLint.

RSpec Testing

I most frequently use RSpec for testing and the RSpec Package provides a build system configuration, syntax highlighting, code snippets, and a useful key binding that allows you to bounce back and forth between a file and its spec file.

{ "keys": ["super+period"], "command": "open_rspec_file", "args": {} },

Key Bindings

Lastly, to learn and remember all of these key maps, I use the Keymaps Package. It provides a nice cheat sheet that summarizes all of the available key bindings, as well as a convenient search window useful for when you have forgotten a particular key binding.

{ "keys": ["ctrl+?"], "command": "cheat_sheet" },
{ "keys": ["ctrl+/"], "command": "find_keymap" },

Hash Tricks in Ruby

Here are a few tricks for using Hashes in Ruby.

Sort your hash

As of Ruby 1.9, Hashes preserve insertion order due to a change in their implementation. However, the sort method for Hashes returns an array of [key, value] pairs, likely as a holdover from when Hashes were unordered.

hash = {f: 4, a: 2, r: 1 }
hash.sort # => [[:a, 2], [:f, 4], [:r, 1]]

To sort a hash and get a hash back there are a few approaches:

Hash[hash.sort]
hash.sort.to_h # Ruby >= 2.1
hash.sort_by{ |k, v| k }.to_h # sort by key
# => {:a=>2, :f=>4, :r=>1}
hash.sort_by{ |k, v| v }.to_h # sort by value
# => {:r=>1, :a=>2, :f=>4}

Hashes all the way down

Sometimes you need to create a tree-like data structure. We can take advantage of Hashes in Ruby to accomplish this elegantly. The Hash constructor accepts a default block that will be executed when the hash is accessed by a key that does not have a corresponding hash value. Take, for example, this identity hash, which returns the corresponding hash value for a key if the value has been set, and otherwise returns the key itself.

identity = Hash.new { |hash, key| key }
identity[:a] = 1
identity[:a] #=> 1
identity[:b] #=> :b

Going one step further, in the default block we can store the value object in the hash so that subsequent calls fetch the object from the hash instead of creating a new one each time.

identity = Hash.new { |hash, key| hash[key] = key }
value = identity[:a]
value # => :a
value.object_id # => 362728
identity[:a].object_id # => 362728

Now if instead of returning the key, we return a new hash, we have a two level tree using nested hashes.

tree = Hash.new { |hash, key| hash[key] = {} }
tree[:a] #=> {}
tree[:a][:x] = 'Foo'
tree[:a][:y] = 'Bar'
tree[:b][:x] = 'Baz'
tree[:b][:y] = 'Qux'
tree # => {
  :a => {
    :x => 'Foo',
    :y => 'Bar'
  },
  :b => {
    :x => 'Baz',
    :y => 'Qux'
  }
}

But note that the depth is limited to two levels because the nested hashes return nil for unknown keys.

tree[:a][:z][:j] # => NoMethodError: undefined method `[]' for nil:NilClass

We can address this by assuring that all hashes in the tree initialize new hashes when an unknown key is accessed. This can be accomplished by reusing the default block of the root node of the tree for each new hash that we construct. The Hash method default_proc provides us access to the default block as a Proc object. If each time we construct a new hash, we pass the default proc of the parent hash, we get a tree that grows endlessly.

teams = Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }

Note that we pass the default proc as a block to the Hash constructor by converting it using the & operator. This technique allows us to construct arbitrarily sized tree structures on the fly. It is especially useful if we do not know exactly how deep the tree needs to be in advance, or if it needs to grow in size over time.

teams[:hockey][:western][:pacific] = ["sharks", "oilers"]
teams[:hockey][:western][:central] = ["blues", "stars"]
teams[:hockey][:eastern][:metropolitan] = ["penguins", "flyers"]
teams[:hockey][:eastern][:atlantic] = ["redwings", "bruins"]

teams # => {
  :hockey => {
    :western => {
      :pacific => [
        [0] "sharks",
        [1] "oilers"
      ],
      :central => [
        [0] "blues",
        [1] "stars"
      ]
    },
    :eastern => {
      :metropolitan => [
        [0] "penguins",
        [1] "flyers"
      ],
      :atlantic => [
        [0] "redwings",
        [1] "bruins"
      ]
    }
  }
}

Memoizing return values of methods with parameters

It makes sense to store the result of a costly calculation when it is likely to be needed again in the future. In the context of a class, it is a Ruby idiom to store this value in an instance variable:

class Numbers
  def pi
    @pi ||= begin
      ... costly calculations ...
    end
  end
end

This technique, called memoization, hides the fact that all calls after the first call to the method will fetch the computed value from the instance variable rather than compute the number again.

When a method takes one or more parameters, we can use the default block of a hash to achieve memoization in a way that is parameter dependent.

class Numbers
  def greatest_common_denominator(*args)
    @gcd ||= Hash.new do |hash, array|
      hash[array] = begin
        ... costly calculations ...
      end
    end
    @gcd[args.sort]
  end
end

Here, a new hash is stored in the instance variable and when the method is called, the arguments to the method, in the form of an array, are used as the key to the hash. If those particular arguments have not been previously passed to the method and thus the hash, the hash will call the default block to compute and store the value in the hash. Any subsequent calls using those parameters will fetch the previously computed value from the hash instead of computing the value again. Note that for methods where the ordering of parameters is not important, like the method in the above example, we sort the arguments before keying the hash to further reduce the number of times the calculation must be made.
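
To make the pattern concrete, here is a small sketch in which Ruby's built-in Integer#gcd stands in for the costly calculation:

class Numbers
  def greatest_common_denominator(*args)
    @gcd ||= Hash.new do |hash, array|
      hash[array] = array.reduce { |a, b| a.gcd(b) }
    end
    @gcd[args.sort]
  end
end

numbers = Numbers.new
numbers.greatest_common_denominator(12, 18, 24) # => 6, computed and stored
numbers.greatest_common_denominator(24, 18, 12) # => 6, fetched from the hash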

String Templates

The % String operator is useful for inserting data into strings with a specifiable format. For example, formatting a floating point number

"Pi = %.5f" % Math::PI   # => "Pi = 3.14159"

or zero padding integers

"%04d" % 45 # => "0045"

Less well known is that % also accepts a Hash. Keys called out in the string with %{} are replaced by their corresponding hash values. I call this the Madlibs feature because it creates a simple string templating system.

variables = {:animal => 'fox', :action => 'jumps'}
template = "The quick brown %{animal} %{action} over the lazy dog"
puts template % variables
# => The quick brown fox jumps over the lazy dog

Word Substitution

The gsub String method replaces text in a string. It accepts a Regex to define the match and a string to define the replacement.

quote = 'The quick brown fox jumps over the lazy dog'
puts quote.gsub(/brown/, 'red')
# => "The quick red fox jumps over the lazy dog"

This works for a single [match, replacement] pair. If we want to make multiple replacements in a string, we can take advantage of the fact that gsub can accept a replacement hash. When a match is found, the replacement is taken as the value from the hash when the match is used as a key.

By matching on any word (/\w+/) and using an identity hash populated with the desired replacements, gsub provides a clean way to make an arbitrary number of word substitutions in a string.

replacements = {'dog' => 'pig', 'fox' => 'cat'}
replacements.default_proc = ->(h, k) { k }
puts quote.gsub(/\w+/, replacements)
# => "The quick brown cat jumps over the lazy pig"

Cataloging

A hash can be used to catalog objects from a collection by a given attribute. If we have a collection of objects

Book = Struct.new(:title, :author)
books = [
  Book.new('The Stand', 'Stephen King'),
  Book.new('The Shining', 'Stephen King'),
  Book.new('Green Eggs and Ham', 'Dr. Seuss'),
  Book.new('The World of Ice & Fire', 'George R. R. Martin')
]

those objects can be cataloged by building a hash of arrays, where the arrays are initialized via the default block only as needed.

def catalog(collection, by:)
  cataloged = Hash.new { |hash, key| hash[key] = [] }
  collection.each_with_object(cataloged) do |item, catalog|
    catalog[item.send(by)] << item
  end
end

puts catalog(books, by: :author) # =>
{
  "Stephen King"=>[
    #<struct Book title="The Stand", author="Stephen King">,
    #<struct Book title="The Shining", author="Stephen King">
  ],
  "Dr. Seuss"=>[
    #<struct Book title="Green Eggs and Ham", author="Dr. Seuss">
  ],
  "George R. R. Martin"=>[
    #<struct Book title="The World of Ice & Fire", author="George R. R. Martin">
  ]
}

Smart strategies for the strategy pattern

The Strategy Pattern can make the behavior of a class extensible without requiring modification of the class definition. Does that sound strange? Consider the following very simple example

require 'json'
require 'yaml'

class Document

  attr_accessor :body

  def initialize(body)
    self.body = body
  end

  def parse_json
    JSON.parse(body)
  end

  def parse_yaml
    YAML.load(body)
  end

end

This Document class can be used to parse both JSON and YAML content in order to create Ruby objects (hashes in this example).

doc = Document.new <<EOS
{
  "a": "one",
  "b": "two",
  "c": "three"
}
EOS
puts doc.parse_json #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

doc = Document.new <<EOS
---
  'a': 'one'
  'b': 'two'
  'c': 'three'
EOS
puts doc.parse_yaml #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

Now let us say that we want to add the ability to parse XML content. The current design requires the addition of a parse_xml method. One way to avoid modification of the document class would be to choose a design based on the Strategy Pattern. Instead of specifying the parsing algorithm in the class, we inject the parsing algorithm into the class. This will decouple the Document class from the parsing algorithm. In the following example, we inject a lambda that encapsulates a parsing algorithm.

require 'json'
require 'yaml'

class Document

  attr_accessor :body, :parser

  def parse
    parser.call(body)
  end

end

doc = Document.new
doc.parser = ->(body) { JSON.parse(body) }
doc.body = <<EOS
{
  "a": "one",
  "b": "two",
  "c": "three"
}
EOS
puts doc.parse #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

While the Document class does not currently do much, its definition is decoupled from that of the parsing algorithm. This allows us to create other parsing strategies, for example ones that handle YAML or XML content, and to use those strategies with the Document class unmodified.
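
For example, a YAML strategy is just another lambda injected at runtime, with no change to the Document class:

doc = Document.new
doc.parser = ->(body) { YAML.load(body) }
doc.body = <<EOS
---
  'a': 'one'
  'b': 'two'
  'c': 'three'
EOS
puts doc.parse #=> {"a"=>"one", "b"=>"two", "c"=>"three"}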

Especially when dealing with more complex algorithms, it is common to create classes to define the strategies. For example

require 'json'
require 'yaml'

class Document

  attr_accessor :body, :parser

  def parse
    parser.parse body
  end

end

class JSONStrategy

  def self.parse(body)
    JSON.parse(body)
  end

end

doc = Document.new
doc.parser = JSONStrategy
doc.body = <<EOS
{
  "a": "one",
  "b": "two",
  "c": "three"
}
EOS
puts doc.parse #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

While the design has its advantages, it requires the programmer to know which strategies can be used with the Document class. Ideally, we would like to have the flexibility to extend the abilities of the Document class without modification of the definition and make the class intelligent enough to know which strategies are available and usable at any given time.

Imagine an AutoParser that can auto select an appropriate strategy (from a list of known strategies) given a particular document. The usage of the AutoParser might look like this

doc = AutoParser::Document.new <<EOS
---
  'a': 'one'
  'b': 'two'
  'c': 'three'
EOS
puts doc.strategy #=> AutoParser::Strategies::YAML
puts doc.parse #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

doc = AutoParser::Document.new <<EOS
{
  "a": "one",
  "b": "two",
  "c": "three"
}
EOS
puts doc.strategy #=> AutoParser::Strategies::JSON
puts doc.parse #=> {"a"=>"one", "b"=>"two", "c"=>"three"}

Here the YAML strategy is chosen for the YAML document and the JSON strategy for the JSON document, without the need to specify the document format in advance.

To achieve this, we move the Document class into a module called AutoParser, and place the strategy classes into a submodule called Strategies.

require 'json'
require 'yaml'

module AutoParser

  class Document

    attr_accessor :body
    attr_writer :strategies

    def initialize(body)
      self.body = body
    end

    def strategies
      @strategies || AutoParser::Strategies.to_a
    end

    def strategy
      strategies.detect{ |strategy| strategy.available?(body) }
    end

    def parse
      strategy.parse body
    end

  end

  module Strategies

    def self.to_a
      self
        .constants
        .map { |c| self.const_get c }
        .select { |o| o.is_a? Class }
    end

    class Base

      def self.parse(body)
        raise
      end

      def self.available?(body)
        !!parse(body)
      rescue
        false
      end

    end

    class JSON < Base

      def self.parse(body)
        ::JSON.parse(body)
      end

    end

    class YAML < Base

      def self.parse(body)
        ::YAML.load(body)
      end

    end

  end

end

Simultaneously, we add an available? method to each strategy class (in this case through inheritance from a base class; the method is the same for both strategies). This method is queried by the Document class to determine whether a strategy is appropriate for the given Document body. All strategies in the Strategies module will be considered until one is found that reports availability, so new strategies can be added to the module without modifying the Document class. Alternatively, a specific array of target strategies can be injected through the strategies writer. In this way the class is extensible without requiring modification, while still allowing explicit control when needed.
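
For example, the candidate strategies for a particular document can be narrowed to a specific list through the strategies writer shown above:

doc = AutoParser::Document.new('{"a": "one"}')
doc.strategies = [AutoParser::Strategies::JSON]
puts doc.strategy #=> AutoParser::Strategies::JSON
puts doc.parse    #=> {"a"=>"one"}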

A plethora of ways to instantiate a Ruby object

Ruby is a very flexible language and there are many ways to instantiate an object. There are pros and cons for each, making them more or less appropriate in various use cases. Consider the task of defining a Paragraph class to track the style of a paragraph DOM element. A simple class definition and usage pattern might look like this

class Paragraph

  attr_accessor :font, :size, :weight, :justification

end

p = Paragraph.new
p.font = 'Times'
p.size = 14
p.weight = 300
p.justification = 'right'

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Times, 14, 300, right

The instantiated object uses instance variables to maintain its state and defines public getter and setter methods that allow you to update the paragraph style at any time. This is a very flexible approach, but it does not enforce a complete style definition. You might run into problems if a consumer requires a complete definition and does not appropriately handle properties with nil values. To address this concern, it is not unusual to enforce completeness by setting up all state when the object is instantiated, through the use of an initializer.

class Paragraph

  attr_reader :font, :size, :weight, :justification

  def initialize(font, size, weight, justification)
    @font = font
    @size = size
    @weight = weight
    @justification = justification
  end

end

p = Paragraph.new('Times', 14, 300, 'right')

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Times, 14, 300, right

In this example, Ruby will check the number of arguments passed to the initialize method against its arity, which ensures that all the style attributes are set upon instantiation. However, this approach is already becoming unwieldy due to the number of parameters, the strict parameter ordering requirement, and the need to memorize that ordering. A Ruby idiom that addresses these concerns passes a single hash to the initialize method. For example

class Paragraph

  attr_reader :font, :size, :weight, :justification

  def initialize(style)
    @font = style.fetch(:font, 'Helvetica')
    @size = style.fetch(:size, 12)
    @weight = style.fetch(:weight, 200)
    @justification = style.fetch(:justification, 'right')
  end

end

p = Paragraph.new(font: 'Times', weight: 300)

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Times, 12, 300, right

This approach reduces the cognitive load on the developer by allowing the attributes to be set with an unordered list of key/value pairs. It also minimizes the number of pairs required by setting reasonable defaults for each style attribute.

Alternatively, in Ruby 2.1, we can take advantage of Keyword Arguments to clarify the method signature.

class Paragraph

  attr_reader :font, :size, :weight, :justification

  def initialize(font: 'Helvetica',
                 size: 12,
                 weight: 200,
                 justification: 'right')

    %w{font size weight justification}.each do |attribute|
      eval "@#{attribute} = #{attribute}"
    end

  end

end

p = Paragraph.new(font: 'Times', weight: 300)

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Times, 12, 300, right

Here the method parameters and their defaults are captured in the method signature instead of being buried in the method definition. This can improve the usability of the class, especially if an automated documentation system is in use.

Sometimes, you may want to encourage a more declarative instantiation. Enlisting the use of a meaningfully named Struct to capture object state can help achieve this. For example

class Paragraph

  Style = Struct.new :font, :size, :weight, :justification

  def style
    @style ||= Style.new('Helvetica', 12, 200, 'right')
  end

  def initialize(&block)
    yield style
  end

end

p = Paragraph.new do |style|
  style.font = 'Times'
  style.size = 16
  style.weight = 300
end

puts "#{p.style.font}, #{p.style.size}, #{p.style.weight}, #{p.style.justification}"
# => Times, 16, 300, right

While not much different from the first example (which used only attribute accessors), the usage makes it clear that it is the style attributes that are being initialized. If the style method is then made private, Paragraph becomes immutable from the outside, which may be advantageous in some cases.

Taking this one step further, a custom Domain Specific Language (DSL) can be created to achieve a more human readable interface.

class Paragraph

  Style = Struct.new :font, :size, :weight, :justification

  def style
    @style ||= Style.new('Helvetica', 12, 200, 'right')
  end

  def initialize &block
    instance_eval &block
  end

  def write(parameters)
    style.font = parameters.fetch(:using, 'Helvetica')
    style.size = parameters.fetch(:at, 12)
  end

end

p = Paragraph.new do
  write using: 'Times', at: 14
end

puts "#{p.style.font}, #{p.style.size}, #{p.style.weight}, #{p.style.justification}"
# => Times, 14, 200, right

Sometimes we don't have control over how an object is instantiated. The class might be defined in a third party library or already in use in our own code, making it difficult to change. In such a case, we can use the Builder pattern by defining a class that creates objects for us. In this way, we can create an interface of our own choosing. For example, let us imagine that the Paragraph class is defined as follows

class Paragraph

  def initialize(font, size, weight, justification)
    @font = font
    @size = size
    @weight = weight
    @justification = justification
  end

end

and cannot be altered. We can define a Builder class that creates Paragraph objects for us, but allows us to set the style attributes in a block.

require 'ostruct'

class Builder

  def self.configure(klass, &block)
    return unless block_given?
    struct = OpenStruct.new
    struct.instance_eval &block
    defaults[klass] = struct.to_h
  end

  def self.create(klass, &block)
    struct = OpenStruct.new defaults[klass]
    struct.instance_eval &block if block_given?
    parameters = defaults[klass].keys.map{ |k| struct[k] }
    klass.new(*parameters)
  end

  # private does not apply to methods defined with def self., so
  # private_class_method is used to keep defaults internal to Builder
  def self.defaults
    @@defaults ||= {}
  end
  private_class_method :defaults

end

With this in place, we can set sensible defaults, which are tracked by the Builder; the pre-existing Paragraph class itself has no notion of defaults.

Builder.configure(Paragraph) do
  self.font = 'Helvetica'
  self.size = 14
  self.weight = 200
  self.justification = 'right'
end

We can then see that when a Paragraph is created, it reflects those defaults.

p = Builder.create(Paragraph)

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Helvetica, 14, 200, right

and that those defaults can be overridden at creation time.

p = Builder.create(Paragraph) do
  self.font = 'Times'
  self.size = 16
end

puts "#{p.font}, #{p.size}, #{p.weight}, #{p.justification}"
# => Times, 16, 200, right

Thus, with relatively little extra work and no impact on the existing Paragraph class, we can improve the way in which we instantiate Paragraph objects, adding features such as default attribute values.

Simple internal DSLs in Ruby

It seems that creating a Domain Specific Language (DSL) is considered both all the rage and an overused scourge. In Ruby, it is really easy to create one, and I suspect that is why they are a popular tool for Rubyists. Although I've used many DSLs, I have never built one of my own. I have always had the desire to write my own programming language, but am daunted by the difficulty of crafting an elegant language that does not break down beyond the simplest cases, let alone writing an efficient language parser.

Anyway, if we focus on writing an internal DSL, one that is built within and leverages a host language, we can accomplish this in Ruby with a simple instance_eval.

module DSL
  def self.enable(klass, &block)
    container = klass.new
    container.instance_eval(&block)
  end
end

Here I create a DSL module with a single enable method that accepts a class defining the DSL methods and a block of code. A new instance of that class is created, and the block that is passed in is evaluated in the context of that instance, thus making the DSL methods available within the block.

If we wanted to create a DSL for a pseudo reverse Polish notation (RPN) calculator, we would simply define a class with methods that define the operations in the language. For example:

class Calculator

  def initialize
    self.stack = []
  end

  def push value
    stack.push value
  end

  def add
    calculate { stack.pop + stack.pop }
  end

  def subtract
    calculate do
      a = stack.pop
      b = stack.pop
      b - a
    end
  end

  def multiply
    calculate { stack.pop * stack.pop }
  end

  def divide
    calculate do
      a = stack.pop
      b = stack.pop
      b / a
    end
  end

  private

    attr_accessor :stack

    def calculate &block
      result = block.call
      stack.push result
      return result
    end

end

Then using the DSL is as simple as calling DSL.enable with the Calculator class and a block of RPN, as shown in the following RSpec tests. Note that the result of the RPN operations is returned by the call to DSL.enable.

describe 'Calculator' do

  it 'should add two numbers' do

    result = DSL.enable Calculator do
      push 1
      push 2
      add
    end

    expect(result).to eq(3)

  end

  it 'should divide two numbers' do

    result = DSL.enable Calculator do
      push 6
      push 2
      divide
    end

    expect(result).to eq(3)

  end

  it 'should handle multiple operations' do

    result = DSL.enable Calculator do
      push 3
      push 6
      push 2
      divide
      multiply
    end

    expect(result).to eq(9)

  end

end

Not only does implementing the DSL in this way provide access to the operators, but it can also hold state by way of instance variables (stack in this example).

Intentional Git: May the Git --force be with you

I recently gave a presentation on the fundamentals of Git workflow to the engineering team at PeopleAdmin. I have been spending more and more time learning and working with Git. The reasons for this are twofold. First, our Git repositories are large and have a long history. The code has been written and revised over the last 7 years. Second, the number of developers working on a project is always greater than one. Thus it is not unusual to run into the need to deal with code conflicts using advanced Git tools.

Whether debugging existing code or adding new features that must integrate with it, I have often found it useful to review the Git history. Git blame gives us the commit SHA and author for each line of code in a file. Selecting the commit SHA of a relevant line of code, and then using Git log to show the related commit history, can provide very useful context for the code by way of the commit messages. Ideally, the history is full of descriptive and relevant commit messages that help you to understand the motivation for the changes made to the code.
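
For example, the lookup might go something like this (the file path and commit SHA here are placeholders):

# Show the commit SHA and author responsible for each line of the file
git blame lib/document.rb

# Review the commit message and the history leading up to that commit
git log 1a2b3c4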

The de facto standard for formatting Git commit messages is given by Tim Pope:

Capitalized, short (50 chars or less) summary

More detailed explanatory text, if necessary.  Wrap it to about 72
characters or so.  In some contexts, the first line is treated as the
subject of an email and the rest of the text as the body.  The blank
line separating the summary from the body is critical (unless you omit
the body entirely); tools like rebase can get confused if you run the
two together.

Write your commit message in the imperative: "Fix bug" and not "Fixed bug"
or "Fixes bug."  This convention matches up with commit messages generated
by commands like git merge and git revert.

It is important to explain not only the changes themselves but also the motivation for those changes. Caleb Thompson lists the questions that Thoughtbot requires to be addressed in commit messages:

* Why is this change necessary?
* How does this change address the issue?
* What are the side effects of this change?

Our team, like many, follows a Github/Git workflow that involves feature branching, pull requests, code review and automated merging into a master branch. This workflow significantly smooths the process of developing software. However, it does not place much, if any, pressure on crafting well formatted or meaningful commit messages. There are three views for each Pull Request (PR): the Discussion, Commits and Files Changed tabs. The Discussion tab captures the PR description, links to commits and any comments made during review. The Commits tab shows a list of links to commits and their respective commit summary lines. The Files Changed tab shows a code diff for the entire PR.

When providing a code review, it is common for a developer to review only the PR description and the code diff for the entire PR, without examining any of the individual commit messages. There are many reasons for this. First, the description is front and center and answers many of the questions that should be answered in the commit messages. Second, accessing the full commit messages requires multiple clicks and takes you away from the PR page. Finally, it is not uncommon, especially among developers new to Git, to only add (never revise) commits, and to do so with very brief and sometimes meaningless commit messages. For example, I'm sure you have seen commits that only fix typos. When this linear commit process changes code added earlier in the same branch, whether to fix typos or, more seriously, to revise implementation details, the earlier commits show code additions that no longer reflect the state of the code in master once the PR is merged.

Although the Github process does not encourage it, a clean commit history and concise commit messages can help other developers understand, debug and extend your code. Achieving this benefit requires conscious effort and extra steps in your workflow. Because it makes sense to focus on getting the code right first, I suggest that the final step in your workflow (before opening a PR) be updating the commit history. To revise commit history there are a number of lesser-used Git tools that can be useful. Git commit with the --amend flag allows you to revise the files and commit message included in the last commit. Git rebase with the --interactive flag will, among other things, allow you to choose individual commit messages to update, as well as combine (squash) or split existing commits. Because these tools recreate the commit history of your local branch, if you have already pushed the branch to a shared repository (for example on Github), you will need to use Git push with the --force flag. This will overwrite your previous history on the shared repository, making it available for other developers to review and use.
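
As a rough sketch (the remote and branch names are placeholders), that final history-cleanup step might look like this:

# Fold the latest fixes into the previous commit and reword its message
git commit --amend

# Reword, squash or split the commits that are unique to this branch
git rebase --interactive origin/master

# The local history has been rewritten, so a forced push is required
git push --force origin my-feature-branch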

When multiple developers work together on a single feature, the chances for conflicts increase. More often than not, the developers commit directly to a single feature branch, and attempt to resolve conflicts in the branch along the way. In practice, this approach works well enough when the developers are communicating effectively and working closely together. However, it has the side effect of further encouraging a linear commit style. This is because Git is a distributed revision control system. When one developer makes a local commit and pushes it to a shared repository, it becomes available for other developers to pull into their local repositories. If the first developer alters this commit and then pushes it to the shared repository again, there is some chance that a second developer will have already progressed, having added code that depends on the first developer's original commit. The branches in the local repositories have now diverged, and the differences will need to be resolved.

To resolve the differences, the second developer can attempt to pull in the first developer's new commit and adjust the code additions to suit, or the second developer can force push the branch to the shared repository, overwriting the first developer's new commit. Either way, this results in a less than optimal way to collaborate. An often expressed Git practice is "Do not rewrite public history". This is because updating commits in a shared repository can lead to conflicts that are not necessarily easy to resolve. However, this does not prevent you from rewriting the commit history once all the code is complete, just before issuing a pull request. Do not be afraid of using the Git --force, just be sure to use it at the appropriate time.

Using Git intentionally, with the goal of creating clear and meaningful commit messages, can be extremely useful for any developers working with your code in the future. While, like any form of code documentation, crafting it takes time, neglecting to do so will build up technical debt. Accruing this debt saves time now, but will make working with your code harder and slower in the future. Whether you choose to craft quality commit messages or not, it should be a conscious decision driven by a cost/benefit analysis and the needs of your business. Do not let your choice of, or unfamiliarity with, your tools dictate this decision for you.

Review of Metaprogramming Ruby 2

I recently read Metaprogramming Ruby 2 and gave an overview presentation to the PeopleAdmin engineering team. While this book is listed as an advanced text for Ruby developers, it contains an extensive explanation of an important part of Ruby, the Object Model. Although the book covers the Object Model as a lead-in to discussing metaprogramming, I believe this explanation would be useful for any developer except those just learning Ruby.

The first part of the book covers various aspects of the Object Model, including the organization of classes and modules, the ins and outs of methods, blocks and procs, as well as the process of method and constant lookup. This is done through an easy to read story of two developers pairing to solve a series of programming problems, which serves nicely to present code examples that demonstrate the target concepts.

The second part of the book tells three stories that demonstrate the pros and cons of metaprogramming in practice. The first centers around the way in which ActiveRecord developers leveraged metaprogramming to make the ActiveRecord API elegant and to incrementally improve the performance of the library. Any developer who has worked in Rails will find this retrospective discussing the evolution of ActiveRecord from 1.0 to 4.0 interesting in its own right. The other stories focus on ActiveSupport Concerns and the use and abuse of the alias_method_chain method in the context of Rails.

Metaprogramming is considered a dirty word in many circles. Although these techniques are very powerful, that power can, without care, come with serious costs. The author does a good job of not only arguing that metaprogramming is a tool that should be in a Rubyist's tool belt, but also provides appropriate usage patterns and makes the reader aware of best practices that help to minimize these costs.

This is one of the things that made me glad to have read through this book. The author names each metaprogramming pattern (he calls them spells) and collects them into an appendix (a grimoire). Although I have used many of these metaprogramming approaches in various projects, I did not extract and name them. Now that I see them as patterns, I suspect it will be easier to reach into the tool belt and apply them to future problems.

Generating Descriptive Statistics in Ruby and Rails

The core Ruby libraries do not provide an easy way to calculate simple descriptive statistics on collections of numbers. However, this can be easily achieved using the DescriptiveStatistics Gem. First, install the gem with gem install descriptive_statistics. Then, once you require DescriptiveStatistics, all objects that extend Enumerable will begin to respond to the new statistical methods. For example

require 'descriptive_statistics'
data = [2,6,9,3,5,1,8,3,6,9,2]
data.number # => 11.0
data.sum # => 54.0
data.mean # => 4.909090909090909

data = {a: 1, b: 2, c: 3, d: 4, e: 5}
data.mean #=> 3.0
data.variance #=> 2.0

require 'set'
data = Set.new([1,2,3,4,5])
data.median #=> 3.0
data.standard_deviation #=> 1.4142135623730951

data = Range.new(1,5)
data.sum #=> 15.0
data.mean #=> 3.0

Statistical methods also accept blocks, which can be used to make calculations on individual attributes of objects in a collection. For example

require 'descriptive_statistics'
LineItem = Struct.new(:price, :quantity)
cart = [ LineItem.new(2.50, 2), LineItem.new(5.10, 9), LineItem.new(4.00, 5) ]
total_items = cart.sum(&:quantity) # => 16.0
total_price = cart.sum{ |i| i.price * i.quantity } # => 70.9

DescriptiveStatistics can be used with Ruby on Rails, but some care must be taken. The ActiveSupport library, which is required by Ruby on Rails, extends the Ruby core with a number of useful additional methods. One of these methods, sum, conflicts with the method of the same name provided by DescriptiveStatistics.

To use DescriptiveStatistics with Ruby on Rails, you will need to use one of the safe methods described in the Readme which do not monkey patch the Enumerable module. The simplest method is to use the module methods directly. First, add DescriptiveStatistics to your Gemfile, requiring the safe extension.

source 'https://rubygems.org'

gem 'rails', '4.1.7'
gem 'descriptive_statistics', '~> 2.4.0', :require => 'descriptive_statistics/safe'

Then after a bundle install, the DescriptiveStatistics module methods will be available to operate on collections of objects, including ActiveRecord objects.

DescriptiveStatistics.mean([1,2,3]) # => 2.0
DescriptiveStatistics.mean(User.all, &:age) # => 19.428571428571427

Alternatively, you can extend DescriptiveStatistics on an individual collection and call the methods as needed.

users = User.all.extend(DescriptiveStatistics)
mean_age = users.mean(&:age) # => 19.428571428571427
mean_age_in_dog_years = users.mean { |user| user.age / 7.0 } # => 2.7755102040816326

This approach will supersede the ActiveSupport-defined methods only on the extended collection, avoiding any potential conflicts on other collections, where the ActiveSupport methods will still be available.

The Hard Part of Software Development is not the Software

Talk presented at the February 2014 Austin on Rails Meeting

I am a programmer, but in the tradition of Michael Weston, I am a total hack. I have been programming for 25 years and in that time I have learned how to get things done, but the one thing I know is that I am an awful software developer. However, I want to get better.

There are a lot of ways that I have tried to get better over the years. I learned to use new languages, new frameworks, new APIs, new patterns, new libraries, new tools. All of these things helped me get stuff done.

Unfortunately, none of these things have really made me a better software developer. The one thing I haven't done much of is develop software as part of a large team. I think we can all understand that there are a lot of benefits to working with a team on software, including:

  • Accomplish big and complex projects
  • Opportunities to learn from each other
  • Camaraderie is motivating
  • Someone to help you out when you make a mistake

So about 6 months ago I joined PeopleAdmin as a Ruby Developer. PeopleAdmin is a company that provides HR solutions to universities using a Software as a Service model. We have a very large and complex Rails codebase that is 7+ years old, and we host terabytes worth of data. The Engineering team consists of about 25 people. It is a top notch team and I'm honored to be part of it.

Based on all of your experiences, you probably already have a rough idea of what it takes to make a good team: knowledgeable and reliable people, good communication, trust and respect, and ownership. It's probably an uncontroversial statement that being a good team member is a very important part of being a good software developer, but working with other people can be hard, and sometimes things go wrong, very wrong.

There are many reasons for this including:

  • Some people seek individual recognition / gain
  • Being polite is hard, it takes extra effort
  • All you really care about is the software
  • That person you just naturally do not get along with

So I want to tell you two stories from my experiences at PeopleAdmin that demonstrate lessons that I have learned from the mistakes that I made working with the team. They center on these two themes, recognition of excellence and empathy for failure.

Recognition of excellence

So when I first joined PeopleAdmin, one of the other developers was wrapping up a project and getting it into production. He had used a new language, Go, and an SOA approach to solve a problem that had been a sore spot in the product for a long while. It improved our throughput and added functionality to our application, which was a big win.

The one thing I noticed was the heaping helping of praise he received over and over from the team. At the time I did not think, "oh, he did great work". I thought, "I would love that to be me receiving the praise". I took a cursory look at the work and thought, "I can do that". And honestly, I do not think that I even complimented him on what was actually really excellent work.

Since starting at PeopleAdmin I've been working on a large project that is nearing completion now. At a recent demonstration of the project, I received a heaping helping of praise. That was a really good feeling. However, a new junior developer made an offhand comment in passing that made me think he didn't appreciate the complexity of the problem or value the hard work I had put into creating a solution.

And it clicked. It was only then that I realized the mistake I had made in judging the value of my colleague's work from before. I had underestimated the amount of work and the complexity of the problem from the outside looking in. It looked simple. One takeaway is that we are naturally biased to think simple things are simple to create.

Another takeaway from the story is that recognition of excellence is not a zero-sum game. This has always been a hard one for me to wrap my head around, and I think it's a natural bias to believe that recognizing someone else's excellence and your own are mutually incompatible. I think that is partly because of the competitive nature of our society, so that bias is built into the DNA of a lot of us.

Although it might be natural to compete with your teammates, it is actually counterproductive. One of the great benefits of being on a team is its support and praise. You need to offer the same in return.

Empathy for failure

Second story. Early on in my time at PeopleAdmin, one of the developers committed code that broke the build. They sheepishly admitted to it and went about fixing it. And I thought, "that was a silly mistake, they should have been more careful", and I may have downgraded my perception of the developer's skills. I was new, trying to get a feel for the skill levels of the various developers on the team.

Shortly thereafter, I am 90% sure that by messing around with something I knew little to nothing about, I corrupted our staging environment. This caused a mountain of extra work for the team and a delay to all our QA work in progress. I was thinking through all the things that set me up to fail, excuses and such.

What I realized was that I had exhibited the fundamental attribution error. It is a well known and pervasive psychological bias in which we tend to attribute the cause of a failure to the actor's shortcomings more frequently than to the situational factors that led to the failure.

However, this only holds true when judging others, not ourselves. We overwhelmingly attribute our own failures to situational factors. This is the actor/observer bias.

For example, when you see someone trip, you think they are clumsy, but when you trip, you ask, "who put that in my way?"

A more relevant example: when you see a bug, it is easy to say, "who's the moron?" rather than, "oh man, this is a really complex system we're working on".

Takeaways

  • Recognize excellence freely because creating simple things is hard
  • Have empathy for failure because failure is, more often than not, due to the circumstances surrounding the failure

Internalize these and you will be a better team member and thus a better software developer!