Writing a Gem with native extensions

29 August 2018

There are many reasons you might want to write a Gem using native extensions. Performance is perhaps the most obvious. CPU-heavy tasks, such as number crunching, can be rewritten in C and will often run many times faster than the equivalent Ruby code. For the daring among us, you can use multiple threads, GPUs, etc.

Another is to re-use existing code. Whether that be legacy code that is critical to your business, or a third-party library that happens to do exactly what you need, a native extension can give you access to that functionality from Ruby.

Examples

Let’s make this more concrete with some examples of well-known gems that rely on native extensions:

Byebug - a debugger for Ruby that uses Ruby’s TracePoint API for execution control and the Debug Inspector API for call stack navigation. Written as a C extension for speed.
Nokogiri - an HTML and XML parser. Uses native libraries for speed and to ensure standards compliance.
RMagick - bindings for the ImageMagick image manipulation library.
sqlite3 - bindings for the SQLite3 database engine.

My motivation

My motivation for learning about native extensions is to improve the performance, and memory footprint, of a data structure I implemented using Ruby. This data structure is called a partially persistent tree.

A persistent data structure preserves the previous version of itself when modified, and partial persistence implies that only the latest version can be updated (as opposed to full persistence, where updates can be performed on any previous version). So what I’ve implemented is essentially a versioned tree.
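To make the idea of versions concrete, here is a minimal Ruby sketch of a partially persistent binary search tree. It uses path copying, which is only one way to get persistence and is not necessarily the technique used in the tree described in this post. The names Node and VersionedTree are invented for this illustration.

```ruby
# Hypothetical illustration only: a partially persistent binary search tree
# built with path copying. Node and VersionedTree are invented names.
Node = Struct.new(:key, :left, :right)

class VersionedTree
  def initialize
    @roots = [nil] # version 0 is the empty tree
  end

  # Number of the most recent version.
  def latest
    @roots.length - 1
  end

  # Updates always apply to the latest version (partial persistence) and
  # produce a new version; earlier versions are left untouched.
  def insert(key)
    @roots << insert_into(@roots.last, key)
    latest
  end

  # Any version, old or new, can be queried.
  def include?(key, version = latest)
    node = @roots.fetch(version)
    while node
      return true if key == node.key

      node = key < node.key ? node.left : node.right
    end
    false
  end

  private

  # Copies only the nodes on the path from the root to the insertion point;
  # all other subtrees are shared between versions.
  def insert_into(node, key)
    return Node.new(key, nil, nil) if node.nil?
    return node if key == node.key

    if key < node.key
      Node.new(node.key, insert_into(node.left, key), node.right)
    else
      Node.new(node.key, node.left, insert_into(node.right, key))
    end
  end
end

tree = VersionedTree.new
v1 = tree.insert(5)
v2 = tree.insert(3)
puts tree.include?(3, v1) # false: version 1 predates the insert of 3
puts tree.include?(3, v2) # true
```

Each insert allocates new nodes only along one root-to-leaf path, so old versions stay readable without copying the whole tree.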

To improve performance, and reduce memory usage, I have been reimplementing the core functionality in C++, exposing it to Ruby via a native extension. This code is currently incomplete, but the challenge has been interesting enough to warrant writing this post, and to do a talk at the Melbourne Ruby meetup.

Clipboard access

Persistent data structures are a vast subject, so to make this all more accessible, the remainder of this post steps through a more contained example. We’ll use a C library called libclipboard to access a user’s clipboard from Ruby code.

For those who want to jump ahead and see all of the code in one place, you can find it on GitHub at:

https://github.com/tristanpenman/simple-clipboard

The libclipboard API

libclipboard is a cross-platform clipboard library, which provides the following functions for interacting with a user’s clipboard:

clipboard_new - create a context through which to access the user’s clipboard
clipboard_free - free any memory allocated by clipboard_new
clipboard_text - read the contents of the clipboard as text, if possible
clipboard_set_text - replace the contents of the clipboard with new text

Before we can use libclipboard from our own code, we’ll need to install it.

Installing libclipboard

The following instructions assume you are working in a UNIX-based environment, with git, cmake and a compiler toolchain installed. On Mac OS X, this can be achieved by installing Xcode (and its command-line tools) and Homebrew, and then using Homebrew to install CMake.

With these prerequisites met, you should be able to run these commands to compile and install libclipboard:

git clone https://github.com/jtanx/libclipboard
cd libclipboard
mkdir build
cd build
cmake ..
make -j4
sudo make install

Once that is complete, you can test that it is working:

./bin/clip_sample1 -s hello
./bin/clip_sample1

The second command should print out ‘hello’.

Extending Ruby using C

We’re going to use C to create a module called SimpleClipboard. This module will contain two methods, get_text and set_text. get_text will return the current contents of the clipboard, as text. set_text will replace the contents of the clipboard, but its return value will be the previous contents of the clipboard.

Big picture

Here’s the code we’re working towards:

#include <ruby.h>
#include <libclipboard.h>
#include "extconf.h"

static clipboard_c *cb = NULL;

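VALUE set_text(VALUE _self, VALUE val) {
    /* Raises TypeError unless val is a Ruby String. */
    Check_Type(val, T_STRING);
    /* Capture the current clipboard contents so they can be returned. */
    VALUE result = Qnil;
    char *text = clipboard_text(cb);
    if (NULL != text) {
        result = rb_str_new(text, strlen(text));
        free(text);
    }
    /* StringValueCStr yields a NUL-terminated C string for libclipboard. */
    if (false == clipboard_set_text(cb, StringValueCStr(val))) {
        rb_raise(rb_eRuntimeError, "Failed to write to clipboard.");
    }
    return result;
}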

VALUE get_text(VALUE _self) {
    VALUE result = Qnil;
    char *text = clipboard_text(cb);
    if (NULL != text) {
        result = rb_str_new(text, strlen(text));
        free(text);
    }
    return result;
}

void Init_simple_clipboard() {
    cb = clipboard_new(NULL);
    if (NULL == cb) {
        rb_raise(rb_eRuntimeError, "Failed to create clipboard context.");
    }
    VALUE mod = rb_define_module("SimpleClipboard");
    rb_define_module_function(mod, "get_text", get_text, 0);
    rb_define_module_function(mod, "set_text", set_text, 1);
}

If you aren’t fluent in C, this will look… complex. To break this down, we’ll start from the bottom.

Initialisation

When Ruby first loads a native extension, it will look for a function called Init_{extname} where {extname} is the name of the extension. This gives the extension an opportunity to define modules, classes, etc. and to do any other initialisation that is required. We will call our extension ‘simple_clipboard’, so this function will be named Init_simple_clipboard.

Here, we define a module called ‘SimpleClipboard’, and store a reference to it in mod. We then define two module methods, get_text and set_text, that take 0 arguments, and 1 argument, respectively:

void Init_simple_clipboard() {
    cb = clipboard_new(NULL);
    if (NULL == cb) {
        rb_raise(rb_eRuntimeError, "Failed to create clipboard context.");
    }
    VALUE mod = rb_define_module("SimpleClipboard");
    rb_define_module_function(mod, "get_text", get_text, 0);
    rb_define_module_function(mod, "set_text", set_text, 1);
}

Note that we also call clipboard_new to set up a context through which to access the clipboard. This is required by the libclipboard library. If this fails, we raise a RuntimeError.
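If the C API feels opaque, it may help to see a rough Ruby analogue of what the Init function sets up. rb_define_module_function behaves much like Ruby’s module_function: the method becomes callable directly on the module. The module name SimpleClipboardSketch and the placeholder method bodies below are invented for illustration and do not touch the clipboard.

```ruby
# Rough Ruby analogue of what Init_simple_clipboard sets up. The module name
# and method bodies are placeholders, not the real extension.
module SimpleClipboardSketch
  module_function

  def get_text
    nil # placeholder: the C version reads from libclipboard
  end

  def set_text(_val)
    nil # placeholder: the C version returns the previous contents
  end
end

# Callable directly on the module, like SimpleClipboard.get_text
p SimpleClipboardSketch.respond_to?(:get_text)

# The arity matches the last argument given to rb_define_module_function
p SimpleClipboardSketch.method(:get_text).arity
p SimpleClipboardSketch.method(:set_text).arity
```

The last argument to rb_define_module_function plays the role of the parameter list in a Ruby def, which is why get_text is registered with 0 and set_text with 1.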

Reading from the clipboard

Moving further up, we have the C implementation of the get_text method:

VALUE get_text(VALUE _self) {
    VALUE result = Qnil;
    char *text = clipboard_text(cb);
    if (NULL != text) {
        result = rb_str_new(text, strlen(text));
        free(text);
    }
    return result;
}

This C function returns a VALUE, which can refer to any Ruby value. We set a default return value of nil. We then call clipboard_text to get the current contents of the clipboard; this returns NULL when no text is available, in which case we leave the result as nil. The other tricky thing here is that we can’t just return the string (char *text) returned from libclipboard. We first need to turn it into a VALUE using rb_str_new. rb_str_new takes two arguments: a pointer to a string (an array of char), and the number of bytes to copy from that array.

Once we’re done with the string, we free it, and then we can return the VALUE we created using rb_str_new.

Note that we use the Ruby convention of prepending an underscore to the name of unused parameters.
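One detail that can surprise callers: strings created with rb_str_new are tagged with Ruby’s binary encoding (ASCII-8BIT) rather than UTF-8. The sketch below simulates that in plain Ruby, without the extension, and shows one way a caller might re-tag the result. It assumes the clipboard text is UTF-8, which is an assumption here rather than something established above.

```ruby
# Simulate the bytes handed back by the extension. String#b returns a copy
# tagged as binary, which is how strings built with rb_str_new are tagged.
raw = "caf\u00e9".b

puts raw.encoding == Encoding::BINARY # true
puts raw.bytesize                     # 5, because the accented e is 2 bytes

# Assumption: the clipboard text is UTF-8. Re-tag a copy rather than
# mutating the original, then check the bytes really are valid UTF-8.
text = raw.dup.force_encoding(Encoding::UTF_8)
puts text.valid_encoding?             # true
puts text.length                      # 4 characters
```

An alternative would be to tag the string on the C side when it is created, so that Ruby callers never see the binary encoding.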

Writing to the clipboard

Writing to the clipboard is similar, but the first line of the function is worth highlighting:

Check_Type(val, T_STRING);

val is the value given to the function by the Ruby interpreter. This must be a string, and the Check_Type macro is used to raise a TypeError if that is not the case.
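For readers more comfortable in Ruby, the argument checks in set_text can be approximated as follows. This is a hypothetical pure-Ruby stand-in (check_clipboard_text is an invented name), covering both the Check_Type call and the StringValueCStr conversion, which raises an ArgumentError for strings containing a NUL byte.

```ruby
# Hypothetical pure-Ruby stand-in for the argument checks in set_text.
def check_clipboard_text(val)
  # Mirrors Check_Type(val, T_STRING): anything but a String is a TypeError.
  unless val.is_a?(String)
    raise TypeError, "wrong argument type #{val.class} (expected String)"
  end

  # Mirrors StringValueCStr, which refuses strings with embedded NUL bytes
  # because a C string would be silently truncated at the first one.
  raise ArgumentError, 'string contains null byte' if val.include?("\0")

  val
end

begin
  check_clipboard_text(42)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```

Doing these checks before touching the clipboard means a bad argument fails loudly instead of writing garbage.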

extconf.rb

Finally, some Ruby!

The ‘extconf.rb’ file should contain:

require 'mkmf'

# Leading spaces keep these separate from any flags mkmf has already set
$LOCAL_LIBS << ' -lclipboard'

if RUBY_PLATFORM =~ /darwin/
  $LDFLAGS << ' -framework AppKit'
end

create_header
create_makefile 'simple_clipboard'

Once you’ve created this file, you can run it:

ruby extconf.rb

This configures the build parameters needed to compile our native extension, and generates several files:

extconf.h
Makefile
mkmf.log

The one we care about right now is ‘Makefile’. The make command will look for this file in the current directory, and use the definitions within it to compile our native extension.

Running make on a Mac, you’ll see something like this:

compiling simple_clipboard.c
linking shared-object simple_clipboard.bundle

Now we can use IRB to load the simple_clipboard extension:

require './simple_clipboard'
SimpleClipboard.methods

You should see a method list something like this:

=> [:get_text, :set_text, :<=>, :module_exec, :class_exec, :<=, :>=,
:==, :===, :include?, :included_modules, :ancestors, :name,
:public_instance_methods, :instance_methods, :private_instance_methods,
:protected_instance_methods, :const_get, :constants, :const_defined?,
:const_set, :class_variables, :class_variable_get, :remove_class_variable,
:class_variable_defined?, :class_variable_set, :private_constant, ...

To test out set_text, first copy some text (such as ‘this text’). Now, run the following in IRB:

SimpleClipboard.set_text 'test'

The result should be the previous contents of the clipboard (‘this text’, or whatever you placed on the clipboard). Now call:

SimpleClipboard.get_text

The result should now be ‘test’.

Putting this into a gem

Structure

This is the structure for our simple_clipboard gem:

ext/
    simple_clipboard/
        extconf.rb
        simple_clipboard.c
lib/
    simple_clipboard/
        version.rb
    simple_clipboard.rb
simple_clipboard.gemspec
LICENSE
README.md

The key difference from a regular gem is that we have an ‘ext’ directory, for any native extensions.

Our ‘simple_clipboard.gemspec’ file also looks pretty normal. We only have to add a line to specify extensions, and add our .c source file to the list of files to be bundled in the gem:

require File.expand_path("../lib/simple_clipboard/version", __FILE__)

Gem::Specification.new do |s|
  s.name = 'simple_clipboard'
  s.version = SimpleClipboard::VERSION
  s.date = '2018-07-24'
  s.summary = 'Simple clipboard example gem'
  s.authors = ['Tristan Penman']
  s.email = ['tristan@tristanpenman.com']
  s.licenses = ['MIT']
  s.homepage = 'https://www.github.com/tristanpenman/simple-clipboard'
  s.extensions = ['ext/simple_clipboard/extconf.rb']
  s.files = [
    'ext/simple_clipboard/simple_clipboard.c',
    'lib/simple_clipboard.rb',
    'lib/simple_clipboard/version.rb'
  ]
  s.require_paths = ['lib']
end

lib/simple_clipboard/version.rb:

module SimpleClipboard
  VERSION = '0.0.1'
end

lib/simple_clipboard.rb:

# Ensure that native extension is loaded
require 'simple_clipboard/simple_clipboard'
require 'simple_clipboard/version'

Build

Building the gem is straightforward. Note that the native extension is not actually compiled at this point.

gem build simple_clipboard.gemspec

No surprises in the output:

  Successfully built RubyGem
  Name: simple_clipboard
  Version: 0.0.1
  File: simple_clipboard-0.0.1.gem

Install

Installing the gem is when things get more interesting:

gem install simple_clipboard-0.0.1.gem

We can see in the output that this is when the native extension is actually compiled. And this is why installing gems with native extensions is a common pain point for new Ruby users:

  Building native extensions.  This could take a while...
  Successfully installed simple_clipboard-0.0.1
  Parsing documentation for simple_clipboard-0.0.1
  Done installing documentation for simple_clipboard after 0 seconds
  1 gem installed

Once the gem is installed, we can try it out in IRB:

2.3.3 :001 > require 'simple_clipboard'
 => true
2.3.3 :002 > SimpleClipboard::VERSION
 => "0.0.1"
2.3.3 :003 > SimpleClipboard::get_text
 => "gem install simple_clipboard-0.0.1.gem"
2.3.3 :004 > SimpleClipboard::set_text "Hello world"
 => "gem install simple_clipboard-0.0.1.gem"
2.3.3 :005 > SimpleClipboard::get_text
 => "Hello world"

Testing

Testing a native extension is a little trickier than usual. Before running any tests, we need to compile our code and put the bundle somewhere that Ruby can find it (ideally not with our system-level gems).

Here is the approach that I used to get RSpec working (the choice of RSpec is due to the fact that the test suite for my partially persistent tree was written using RSpec). We’re going to create a Rakefile, and use the Rake::ExtensionTask class from the ‘rake-compiler’ gem to automatically compile our native extension before running tests.

To do this, we need two new files - a Gemfile and a Rakefile:

Gemfile:

source 'https://rubygems.org'

# Dependencies specified in simple_clipboard.gemspec
gemspec

Rakefile:

require "bundler/gem_tasks"
require "rspec/core/rake_task"
require 'rake/extensiontask'

desc "simple_clipboard test suite"
RSpec::Core::RakeTask.new(:spec) do |t|
  t.pattern = "spec/*_spec.rb"
  t.verbose = true
end

gemspec = Gem::Specification.load('simple_clipboard.gemspec')
Rake::ExtensionTask.new do |ext|
  ext.name = 'simple_clipboard'
  ext.source_pattern = "*.{c,h}"
  ext.ext_dir = 'ext/simple_clipboard'
  ext.lib_dir = 'lib/simple_clipboard'
  ext.gem_spec = gemspec
end

task :default => [:compile, :spec]

We also need to update ‘simple_clipboard.gemspec’ to include some additional development dependencies:

require File.expand_path("../lib/simple_clipboard/version", __FILE__)

Gem::Specification.new do |s|
  s.name = 'simple_clipboard'
  s.version = SimpleClipboard::VERSION
  s.date = '2018-07-24'
  s.summary = 'Simple clipboard example gem'
  s.authors = ['Tristan Penman']
  s.email = 'tristan@tristanpenman.com'
  s.licenses = ['MIT']
  s.homepage = 'https://www.github.com/tristanpenman/simple-clipboard'
  s.extensions = ['ext/simple_clipboard/extconf.rb']
  s.files = [
    'ext/simple_clipboard/simple_clipboard.c',
    'lib/simple_clipboard.rb',
    'lib/simple_clipboard/version.rb'
  ]
  s.require_paths = ['lib']

  # Required to run tests
  s.add_development_dependency "rspec", ">= 2.13.0"
  s.add_development_dependency "rake", ">= 1.9.1"
  s.add_development_dependency "rake-compiler", ">= 0.8.3"
end

Doing this gives us several rake tasks we can use for native extension development:

The default task (just running rake) will both compile the code, and run tests
Running rake compile will just compile the code
And running rake spec will just run the test suite

The reason we don’t simply make compile a dependency of spec is to avoid unnecessary compile steps when we are just writing or iterating on tests.
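The Rakefile looks for specs under spec/, but no spec is shown in this post. As a rough idea of what one might assert, here is the core round-trip check written as plain Ruby. FakeClipboard and round_trip_ok? are invented names; the fake is an in-memory stand-in so the sketch runs without the compiled extension.

```ruby
# In-memory stand-in with the same two methods as SimpleClipboard, so the
# check below runs without the compiled extension. Hypothetical name.
module FakeClipboard
  @text = nil

  def self.get_text
    @text
  end

  def self.set_text(val)
    previous = @text
    @text = val.dup
    previous
  end
end

# The behaviour a spec would assert: set_text returns the old contents and
# get_text then returns the new ones.
def round_trip_ok?(clipboard)
  clipboard.set_text('first')
  previous = clipboard.set_text('second')
  previous == 'first' && clipboard.get_text == 'second'
end

puts round_trip_ok?(FakeClipboard) # true
```

In a real suite the same check would be wrapped in an RSpec example and given SimpleClipboard instead of the fake. Bear in mind that doing so overwrites whatever is on your clipboard.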

Resources

I’m going to wrap up this post with a list of resources that I have found helpful while learning about Ruby internals, and doing native extension development.

Ruby Under a Microscope

One of my favourite references has been Pat Shaughnessy’s book Ruby Under a Microscope. While not specifically about native extensions, this book goes into plenty of detail about the inner workings of several Ruby implementations. This is the kind of knowledge that will help guide your intuition about Ruby performance.

Official Documentation

The Ruby and RubyGems documentation is also a good place to start.

Blogs and web sites

Aaron Bedra’s Extending Ruby guide:

http://aaronbedra.com/extending-ruby

Chris Lalancette’s in-depth series on writing Ruby extensions in C.

Ruby Native Extensions in C, starter gem

I found this repo on GitHub useful when figuring out how to get RSpec to work:

https://github.com/neilslater/ruby_nex_c

Complete example

Finally, you can find the complete code for the ‘simple_clipboard’ gem on GitHub:

https://github.com/tristanpenman/simple-clipboard