Extending Ruby with C++

16 September 2018

In my previous post, I covered the basics of writing a Ruby gem with native extensions. We saw how C can be used to enhance the functionality of the Ruby language. This post takes the idea a step further, and looks at how to extend Ruby with C++.

Throughout this post, we’ll look at several examples, all of which are available on GitHub.

Essentials

Let’s start with the essentials. As C++ programmers, what do we need to know to write high quality Ruby extensions?

Ruby

Well, it doesn’t hurt to know Ruby. Since Ruby is a dynamically typed, object-oriented language, with garbage collection built into the VM, it requires a different kind of thinking than regular C++ development. If nothing else, remember that when you write a Ruby extension in C or C++, you are essentially injecting concepts into a shamelessly object-oriented programming environment.

Ruby C API

It is also important to be familiar with the Ruby C API. This API is extensive enough that, for most projects, you will only need to use a small portion of it. For this post, here are some key concepts that you will find useful:

Ruby values, which are dynamically typed, are represented in C using a data type called VALUE. Classes and modules are also values.
Whenever you define a Ruby class or module, you will want to assign it to a VALUE variable in your C/C++ code. This is so that you can pass it as an argument to other Ruby API functions.
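Values are managed by Ruby’s garbage collector, not by reference counting. MRI uses a mark-and-sweep collector, so any VALUE that your native code stores outside the C stack (in a global variable or a C++ object, for example) must be made visible to the GC, or it may be collected while still in use.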
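To make the idea of a dynamically typed VALUE a little more concrete, here is a simplified, stand-alone model of one trick MRI uses. A VALUE is a pointer-sized word: small integers are stored directly in it with the lowest bit set, while references to heap objects are aligned pointers whose lowest bit is clear. The helpers below (int2fix, fixnum_p, fix2long) are hypothetical stand-ins that only mirror the INT2FIX, FIXNUM_P and FIX2LONG macros from ruby.h. The real headers also use other tag patterns for values such as nil, true, false and Symbols, so treat this as a sketch rather than a description of any particular Ruby version.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, stand-alone model of MRI's Fixnum tagging. This is not ruby.h:
// the helper names only mirror the real INT2FIX, FIXNUM_P and FIX2LONG macros.
using VALUE = std::uintptr_t;  // a VALUE is a pointer-sized word

constexpr VALUE FIXNUM_FLAG = 0x1;  // low bit set => the word holds an Integer

// Like INT2FIX: make room for the tag bit, then set it.
VALUE int2fix(std::intptr_t n) {
    return (static_cast<VALUE>(n) << 1) | FIXNUM_FLAG;
}

// Like FIXNUM_P: heap objects are aligned, so their low bit is always 0.
bool fixnum_p(VALUE v) {
    return (v & FIXNUM_FLAG) != 0;
}

// Like FIX2LONG: a signed (arithmetic) shift drops the tag and keeps the sign.
std::intptr_t fix2long(VALUE v) {
    assert(fixnum_p(v));  // only meaningful for tagged integers
    return static_cast<std::intptr_t>(v) >> 1;
}
```

With this scheme, int2fix(42) produces 85 (0x55), and fix2long turns it back into 42 without any memory allocation.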

The Ruby C API also includes some macros for working with VALUEs. For example the following two macros can be used for type-checking:

RB_TYPE_P(obj, T_STRING) - returns true if obj is a String
Check_Type(obj, T_STRING); - raises a TypeError (in the Ruby VM) unless obj is a String

The Definitive Guide to Ruby’s C API covers these concepts, and many others, in much more detail.

Three main approaches

This post looks at several ways we can extend Ruby using native code:

Inlined native code
Foreign Function Interface
Native extensions

For each of these, we’ll work through an example, look at some pros and cons, and consider how well C++ is supported.

Inlined native code

The first approach we’ll look at is including inline C++ code in a Ruby script. This is made possible by the RubyInline gem, which can be installed using the command:

gem install RubyInline

To show how this works, we’ll start with a relatively simple example - using functions from the <iostream> header to write a string to the console (more specifically, stdout). This is the script we’ll start with:

require 'inline'

module Simon
  # This is using a Ruby block to build inlined functionality
  inline(:C) do |builder|
    builder.add_compile_flags '-x c++', '-lstdc++'
    builder.include '<iostream>'
    builder.c_singleton '
    void says(const char * str) {
        std::cout << str << std::endl;
    }'
  end
end

This code creates a module called Simon. In the call to builder.c_singleton, RubyInline uses some clever pattern matching to figure out that it should create a singleton method called says, that takes a string as an argument. There is a similar method, builder.c, which can be used to define instance methods.

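The call to builder.add_compile_flags is also important, because those flags are required to compile C++ code: -x c++ tells the compiler to treat the generated source as C++, and -lstdc++ links against the C++ standard library.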

As a slightly less trivial example, this code uses a random number generator from <random> to implement a dice-roll simulator:

require 'inline'

module Dice
  inline(:C) do |builder|
    builder.add_compile_flags '-x c++', '-std=c++14', '-lstdc++'
    builder.include '<random>'

    builder.prefix '
      // Seed with a real random value, if available
      static std::random_device r;
      // Uniform distribution over the faces of a six-sided die
      static std::default_random_engine e1(r());
      static std::uniform_int_distribution<int> uniform_dist(1, 6);'

    builder.c_singleton '
      VALUE roll(int count) {
          VALUE result = rb_ary_new2(count);
          for (int i = 0; i < count; ++i) {
              VALUE v = INT2NUM(uniform_dist(e1));
              rb_ary_store(result, i, v);
          }
          return result;
      }'
  end
end

In this case, we’ve used builder.prefix to provide additional code that will be included in the C++ translation unit, but will not be automatically exposed as a Ruby method.

Another variation we see here is that the roll function uses Ruby’s VALUE type directly, rather than relying on RubyInline’s type mapping.

Using this module is as simple as:

# Print the outcome of three dice rolls
puts Dice.roll 3
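Because the C++ half of this example is ordinary standard-library code, it can be exercised outside Ruby too. The following is a hypothetical plain-C++ counterpart of roll (the name roll_dice is an assumption for illustration) that uses the same <random> pieces as the builder.prefix code, but returns a std::vector<int> instead of building a Ruby Array with rb_ary_new2 and rb_ary_store.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical plain-C++ counterpart of Dice.roll: the same standard-library
// pieces as the builder.prefix code, but returning a std::vector<int>.
std::vector<int> roll_dice(int count) {
    assert(count >= 0);
    // Seed the engine once from a non-deterministic source, if one is available.
    static std::random_device r;
    static std::default_random_engine e1(r());
    // Uniform distribution over the faces of a six-sided die.
    static std::uniform_int_distribution<int> uniform_dist(1, 6);

    std::vector<int> result;
    result.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        result.push_back(uniform_dist(e1));
    }
    return result;
}
```

As in the prefix code, the engine and distribution are static, so the engine is seeded once rather than on every call.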

Pros and cons

Perhaps the most compelling reason to use RubyInline is that code generation can be completely dynamic. This would be dangerous for production code, but has pretty interesting applications in areas such as computational art. This is where the combination of Ruby’s flexibility with C++’s performance could really shine.

There are several drawbacks to this approach. The first is that compilation happens at runtime, incurring some initial overhead. Furthermore, compilation failures and syntax errors will only be detected at runtime. And despite some clever type mapping built into RubyInline, you will still need a robust understanding of Ruby’s C API.

Foreign Function Interface

Another option is a Foreign Function Interface (FFI), such as the ffi gem.

A Foreign Function Interface allows code written in one language (in this case, Ruby) to access functions written in another language (e.g. C, C++ or Rust).

There are some limitations to this approach, in particular the requirement that native code be available as a shared library on the user’s operating system. This can be great for native code or libraries that are already packaged in this way, but less practical if you have to write additional glue code in C++.

For our simple example, you will need to install the ‘ffi’ gem:

gem install ffi

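Because FFI can only call functions that already exist in a shared library, we cannot use <iostream> here (there is no compiled glue code to hold it). We can, however, invoke the C library function puts directly: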

require 'ffi'

module Simon
  # Include ffi functionality as a 'mixin'
  extend FFI::Library

  # Link with libc
  ffi_lib 'c'

  # Define a method called 'says' that takes a string and prints it to stdout using 'puts'
  attach_function :says, :puts, [ :string ], :int
end

Simon.says 'Hello'
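If the native code you want to call is C++ rather than C, the usual approach is to export a small C-compatible layer from the shared library, since FFI looks functions up by their plain symbol names and C++ mangles those names. The sketch below wraps a Simon class behind extern "C" functions and an opaque handle; simon_new, simon_says and simon_free are hypothetical names chosen for this example.

```cpp
#include <iostream>
#include <new>

// Same shape as the Simon class used later in this post.
class Simon {
public:
    void says(const char * str) {
        std::cout << str << std::endl;
    }
};

// Hypothetical C-compatible glue. extern "C" turns off C++ name mangling, so
// these functions are exported under their plain names and can be found by FFI.
extern "C" {

// Returns an opaque handle (or nullptr on allocation failure). C++ exceptions
// must not cross this boundary, hence the non-throwing form of new.
void * simon_new() {
    return new (std::nothrow) Simon();
}

// Ignores null arguments rather than crashing the Ruby process.
void simon_says(void * handle, const char * str) {
    if (handle != nullptr && str != nullptr) {
        static_cast<Simon *>(handle)->says(str);
    }
}

// Releases a handle created by simon_new (deleting nullptr is a no-op).
void simon_free(void * handle) {
    delete static_cast<Simon *>(handle);
}

}  // extern "C"
```

Built as a shared library (for example with g++ -shared -fPIC), these functions could be bound from Ruby with calls such as attach_function :simon_new, [], :pointer and attach_function :simon_says, [:pointer, :string], :void. The Ruby code then becomes responsible for calling simon_free on each handle, which is exactly the kind of detail that makes FFI more subtle than the puts example suggests.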

Pros and cons

Be aware that, while the example above is simple, FFI techniques are a broad and complex topic. There are subtleties in their implementation, such as exception handling and the mapping of data types, that you will want to understand before using them in critical code.

Despite these concerns, there are some good reasons for using FFI. Because the code that you write is ordinary Ruby, it is easier to port FFI code to other Ruby implementations, such as JRuby and Rubinius. And as long as your end users have installed the ‘ffi’ gem, your Ruby code can depend on it for C library interoperability without compiling code.

Native extensions

Finally, we have native extensions.

A native extension is typically a collection of C code that is compiled into a bundle (a shared library) that can then be loaded by the Ruby VM. Although these bundles could in theory be provided to users directly, they are not platform-agnostic, and managing all of the different builds would be a burden on developers. So native extensions are typically compiled as part of the installation of a Ruby Gem. Examples of Gems that take this approach are:

nokogiri - an HTML and XML parser. Uses native libraries for speed and to ensure standards compliance.
RMagick - bindings for the ImageMagick image manipulation library.
sqlite3 - bindings for the SQLite3 database engine.

We won’t cover the steps necessary to include a native extension in a Ruby Gem here - I have covered that in an earlier post. But we will take a look at how you can compile a native extension that includes C++ code.

Basics

To begin with, we need two files:

extconf.rb
{ext_name}.c - where {ext_name} is the name of our extension in snake_case.

For this example the extension name will be ‘simon_native’, so ‘extconf.rb’ should contain:

require 'mkmf'
create_makefile 'simon_native'

The extension itself is written in C, using Ruby’s C API. And since our extension is called ‘simon_native’, we will need a file called ‘simon_native.c’, containing:

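#include <stdio.h>
#include <ruby.h>

/* Simon.says(str): print str to stdout, followed by a newline */
VALUE says(VALUE _self, VALUE str) {
    Check_Type(str, T_STRING);  /* raise TypeError unless str is a String */
    puts(StringValueCStr(str));
    return Qnil;
}

/* Called by the Ruby VM when simon_native is loaded */
void Init_simon_native(void) {
    VALUE mod = rb_define_module("Simon");
    const int num_args = 1;
    rb_define_module_function(mod, "says", says, num_args);
}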

I won’t go into the details of Ruby’s C API here, except to direct your attention to the name of the second function in that file (Init_simon_native). This function will be called by the Ruby VM when the extension is loaded, and is named using the convention of Init_{ext_name} so that the Ruby VM can actually find it.
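The naming rule itself is simple enough to sketch. The hypothetical helper below (not MRI’s actual loader code) derives the entry point name from an extension’s file name by dropping the directory and extension, then prefixing Init_. The same rule explains why the C++ versions of this extension later in the post declare their Init_ functions extern "C": otherwise the compiler would mangle the symbol name and the VM would not find it.

```cpp
#include <string>

// Hypothetical helper illustrating the Init_{ext_name} convention: the entry
// point is "Init_" followed by the file name, minus directory and extension.
std::string init_function_name(const std::string & path) {
    // Drop everything up to and including the last directory separator.
    std::string::size_type slash = path.find_last_of("/\\");
    std::string base = (slash == std::string::npos) ? path : path.substr(slash + 1);
    // Drop the extension (.so, .bundle, .dll, ...).
    std::string::size_type dot = base.find('.');
    if (dot != std::string::npos) {
        base.erase(dot);
    }
    return "Init_" + base;
}
```

For example, both simon_native.so and ./ext/simon_native.bundle map to Init_simon_native.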

We can generate a Makefile for the extension using:

ruby extconf.rb

Then we can compile the code:

make

Once this is compiled, we can use IRB to test the extension:

require './simon_native'
Simon.says 'Hello'

Note that unlike regular ‘require’ statements, this one is prefixed with ‘./’, so that the Ruby VM knows to look in the current directory for the extension.

Rice basics

Time to introduce a library called Rice. Rice is first and foremost a C++ wrapper for Ruby’s C API. Using Rice, we can re-write the example above using C++:

#include <iostream>
#include "rice/Module.hpp"

using namespace Rice;

void says(const char * str) {
    std::cout << str << std::endl;
}

extern "C"
void Init_simon_native_rice() {
    define_module("Simon")
        .define_module_function("says", &says);
}

To compile this, put that code in a file called ‘simon_native_rice.cpp’ (simon_native_rice is the name that we’ll give to this version of the extension). You will also need to update ‘extconf.rb’:

require 'mkmf-rice'
create_makefile 'simon_native_rice'

Now we can generate the new Makefile:

ruby extconf.rb

And compile the code:

make

In IRB, we can test out the extension:

require './simon_native_rice'
Simon.says 'Hello'

Rice wrappers

What if we want to wrap an existing C++ class? Do we have to manually define a Ruby class, along with a constructor, and wrappers for the functions we want to expose?

Sort of… While wrapping the class is unavoidable, Rice allows us to achieve this more compactly. Assume we have ‘simon.hpp’:

#pragma once
#include <iostream>

class Simon {
public:
    void says(const char * str) {
        std::cout << str << std::endl;
    }
};

Rice allows us to wrap this using its class and function templates:

#include "rice/Data_Type.hpp"
#include "rice/Constructor.hpp"
#include "simon.hpp"

using namespace Rice;

extern "C"
void Init_simon_native_rice_wrapper() {
    Data_Type<Simon> rb_cSimon =
        define_class<Simon>("Simon")
            .define_constructor(Constructor<Simon>())
            .define_method("says", &Simon::says);
}

We’ll put this in a file called ‘simon_native_rice_wrapper.cpp’ (that’s a mouthful), and update ‘extconf.rb’ appropriately:

require 'mkmf-rice'
create_makefile 'simon_native_rice_wrapper'

After following the same build steps as before, we can test this in IRB, seeing that it behaves like a Ruby class:

require './simon_native_rice_wrapper'
simon = Simon.new
simon.says 'Hello'

An interesting part of this example is the call to define_constructor, and what happens when you run the code after commenting out or removing that line:

2.3.3 :001 > require './simon_native_rice_wrapper'
=> true
2.3.3 :002 > simon = Simon.new
TypeError: allocator undefined for Simon
    from (irb):2:in `new'
    from (irb):2
    ...

Normally, when you create a class in Ruby, it is given a default constructor (the initialize method). Calls to new on the class will allocate memory for the class instance, then call initialize, with the variable self pointing to the new instance.

In the call to define_class above, a Ruby class is created, but Rice will immediately undef the default allocator and initialize method. This provides a clean slate for your code to come and define its own behaviour for class construction.
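A toy model can help make this sequence concrete. The sketch below is not how Ruby or Rice implement it; ClassModel and new_instance are hypothetical names, and a std::runtime_error stands in for Ruby’s TypeError. It captures only the two steps described above: new fails if the class has no allocator, and otherwise allocates an instance and then runs initialize on it. Supplying both pieces again is the job of the define_constructor call in the Rice example.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>

// Toy model only: not Ruby's or Rice's actual implementation.
struct Object {
    std::string greeting;  // stand-in for per-instance state
};

struct ClassModel {
    std::string name;
    // An empty std::function plays the role of an undefined allocator or initialize.
    std::function<std::unique_ptr<Object>()> allocator;
    std::function<void(Object &)> initialize;

    // Models Class#new: allocate first, then run initialize on the new instance.
    std::unique_ptr<Object> new_instance() const {
        if (!allocator) {
            // Ruby raises TypeError here; a standard exception stands in for it.
            throw std::runtime_error("allocator undefined for " + name);
        }
        std::unique_ptr<Object> obj = allocator();
        if (initialize) {
            initialize(*obj);
        }
        return obj;
    }
};
```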

Pros and cons

The obvious benefit of Rice is that we can write actual native extensions using C++. This gives us maximum flexibility in terms of how we write and interact with native code and the libraries on a user’s system. That comes at the cost of some added complexity.

Also worth noting is that, even if you end up using Rice’s more advanced wrappers for C++ classes, you will still need to understand Ruby’s C API to write robust and performant native extensions. And on top of this, you’ll need to study Rice in detail, to see how all the pieces fit together.

References

There’s a lot to digest in this post, and even more online, or in print form. While I have provided various links throughout this post, I’m going to list a few general resources that I have found helpful:

For anyone doing serious work with Ruby, I recommend reading Pat Shaughnessy’s book Ruby Under a Microscope. This will give you an understanding of the inner workings of MRI, JRuby and Rubinius, and will hopefully lead to your code being more performant and robust.
Maxwell Anselm’s The Definitive Guide to Ruby’s C API is a good introduction to Ruby’s C API and the concepts you need to understand when writing native extensions.
And when you’re ready to package your code as a Gem, Aaron Bedra’s Extending Ruby will help you get started there.

Tristan Penman
