Tristan Penman's Blog

Roll Your Own Ruby Type Checking: Part 1

26 December 2022
Last Updated: 10 April 2023

Earlier this year, I was talking with a colleague about Sorbet and other type checking features that are available for Ruby. Although we were both familiar with the use of gradual typing in other languages such as TypeScript and Python, neither of us really knew how this would work under the hood.

This post begins with an example from the Sorbet documentation, and builds up the machinery to support a simplistic type checking implementation. While our implementation will not be production-ready like Sorbet, I hope that the exploration will be enlightening.

Later posts will refine the type checking implementation, making it more versatile, and capable of being used with more sophisticated Ruby programs.

Type Checking

But first, what is type checking? Or more specifically, what does it mean to check types?

In a dynamically-typed language such as Ruby, the types of variables, parameters, etc., are not known until runtime. That is, they can vary depending on how your program is run. A variable that is a String in some cases may be Numeric in others. The purpose of a type checker is to detect bugs that may occur due to mismatched types. An example of this is passing a String as an argument when a number is expected.
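To make this concrete, here is a minimal sketch of the kind of bug a type checker aims to catch. The double method is a hypothetical example invented for illustration; it is not taken from Sorbet or from the rest of this post.

```ruby
# Hypothetical helper: intended to work on numbers only.
def double(value)
  value * 2
end

p double(21)    # => 42
p double('21')  # => "2121" (String#* repeats the string, so no error is raised)

# Other mismatches do fail, but only once the offending line is executed:
begin
  1 + '2'
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```

The second call is arguably the more dangerous one, because the program carries on with a value of the wrong type.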

Where does Sorbet fit into this?

Sorbet

Sorbet is a powerful type checker, built specifically for Ruby. It is implemented using Ruby language features, unlike other type checkers, which are often based on pre-processors and other tricks. The Sorbet ecosystem also provides plugins for IDE integration and additional development tooling.

Let’s look at a simple example, showing how Sorbet is used. When we visit the Sorbet website we’re greeted with this little example, illustrating just how easy it is to add type checking to both new and existing code:

class Main
  extend T::Sig

  sig do
    params(
      name: String
    ).returns(
      Integer
    )
  end
  def main(name)
    puts "Hello, #{name}!"
    name.length
  end
end

This could also be written more compactly:

class Main
  extend T::Sig

  sig { params(name: String).returns(Integer) }
  def main(name)
    puts "Hello, #{name}!"
    name.length
  end
end

We can see from this example that Sorbet provides type checking via a method called sig, which looks a bit like an annotation that you might see in a language such as Java.

As is common in the Ruby ecosystem, this one-line type annotation is built upon some non-trivial Ruby magic. In this post, we’ll attempt to re-invent our own annotation machinery, and use it to build a simple type checker inspired by Sorbet.

Repeater

For the examples in this post, we’ll work with a toy class called Repeater. All this class does is repeat a string str a given number of times count, with each copy of the string separated by separator. Each version of this class will be given its own number, e.g. Repeater1, Repeater2.

The original implementation looks like this:

class Repeater
  def repeat(str, count, separator: '')
    Array.new(count, str).join(separator)
  end
end

This class is very easy to test from IRB, or from a script:

puts Repeater.new.repeat('test', 3, separator: ', ')

In this case, we’ll get this output:

test, test, test

Now let’s say we want to add some functionality to this class, so that we print its positional arguments before the body is executed, and print out the return value before it is returned to the caller.

Here’s the most direct way we could implement this:

class Repeater1
  def repeat(str, count, separator: '')
    puts "before: [\"#{str}\", #{count}]"
    ret = Array.new(count, str).join(separator)
    puts "after: #{ret}"
    ret
  end
end

Now if we were to run the following:

puts Repeater1.new.repeat('test', 3, separator: ', ')

It would produce:

before: ["test", 3]
after: test, test, test
test, test, test

If we now allow ourselves a little bit of indirection, this can be made more generic by lifting the method body into a lambda:

class Repeater2
  def repeat(*args, **kwargs)
    puts "before: #{args}"

    # lambda literal
    fn = -> (str, count, separator: '') do
      Array.new(count, str).join(separator)
    end

    ret = fn.call(*args, **kwargs)
    puts "after: #{ret}"
    ret
  end
end

What is unusual about Repeater2 is that the repeat method no longer uses the original parameter list. Instead, it uses *args and **kwargs. These are called rest parameters: *args collects the positional arguments and **kwargs collects the keyword arguments. They allow us to capture all of the arguments, and to print the positional ones using puts. The code from the original repeat method is captured within a lambda literal, where we see the original parameter list.
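As a small aside, the following sketch shows what the two rest parameters actually receive. The capture method is hypothetical and exists only for this illustration.

```ruby
# Hypothetical method, used only to show what *args and **kwargs capture.
def capture(*args, **kwargs)
  [args, kwargs]
end

args, kwargs = capture('test', 3, separator: ', ')
p args                # => ["test", 3]
p kwargs[:separator]  # => ", "

# Both can be forwarded unchanged to another callable, as Repeater2 does:
fn = ->(str, count, separator: '') { Array.new(count, str).join(separator) }
result = fn.call(*args, **kwargs)
puts result           # prints: test, test, test
```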

If we were to run the following line of code:

puts Repeater2.new.repeat('test', 3, separator: ', ')

It will produce the same output as Repeater1:

before: ["test", 3]
after: test, test, test
test, test, test

Hooks

Of course, we are working towards an annotation-based style, and would like to simplify Repeater2 so that the code to run before and after the method body is more clearly separated. Ideally something like this:

before ->(*args, **kwargs) do
  puts "before: #{args}"
end
after ->(returns) do
  puts "after: #{returns}"
end
def repeat(str, count, separator: '')
  Array.new(count, str).join(separator)
end

In this snippet, before is a hook that is called with the arguments supplied to the method, but before the body of the method is executed. And after is a hook that is called after the body is executed, with its return value.

If we consider that Ruby class declarations are executed from top to bottom, and existing methods can be called from within the class body, then we see that before and after could be provided by extending a module:

class MyClass
  extend Hooks
  # methods from Hooks are available now
  # ...
end
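Here is a minimal sketch of that idea in isolation. The Announcer module and its announce method are hypothetical stand-ins for hooks such as before and after; the point is only that methods gained through extend can be called immediately, while the class body is still being evaluated.

```ruby
# Hypothetical module; announce stands in for hooks like before/after.
module Announcer
  def announce(message)
    # @announcements is an instance variable of the class object itself
    (@announcements ||= []) << message
  end

  def announcements
    @announcements || []
  end
end

class Example
  extend Announcer   # Announcer's methods become class-level methods

  announce 'first'   # runs right now, as the class body is evaluated
  def foo; end
  announce 'second'
end

p Example.announcements  # => ["first", "second"]
```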

How does the Hooks module work?

We can see from the snippets above that we need to define a module that provides before and after methods. We’ll call this module Hooks. The before and after methods are very simple setters that store a lambda in an instance variable on the class itself (not a @@ class variable), so that it can be used later. We’ll call this the hook context for a method.

module Hooks
  def before(tag)
    @before = tag
  end

  def after(tag)
    @after = tag
  end

We can then use Ruby’s powerful meta-programming functionality to enhance method declaration, in such a way that the next method to be defined will execute within our new hook context.

We’ll do this by implementing method_added, which is automatically called whenever a method is added to a class:

  def method_added(name)
    # short-circuit, to avoid infinite loop
    return unless @before || @after

    # reset hook context, but store current values
    before = @before
    @before = nil
    after = @after
    @after = nil

    # capture the original method
    meth = instance_method(name)

    # wrap the original method with calls to before and after hooks
    define_method(name) do |*args, **kwargs, &block|
      before.call(*args, **kwargs) if before
      ret = meth.bind(self).call(*args, **kwargs, &block)
      after.call(ret) if after
      ret
    end
  end
end

define_method is used to replace the newly added method with a block that executes our before and after hooks. We have to be careful to clear @before and @after before calling define_method, otherwise this code will keep triggering method_added, and we’ll end up with a stack overflow.
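The same three building blocks (method_added, instance_method and define_method) can be tried out on their own. The sketch below is not part of the Hooks module; CallCounter and Greeter are hypothetical names, and it uses an explicit @wrapping flag as its guard rather than clearing a hook context.

```ruby
# Hypothetical module that counts calls to every method defined after extend.
module CallCounter
  def call_counts
    @call_counts ||= Hash.new(0)
  end

  def method_added(name)
    # guard: define_method below triggers method_added again
    return if @wrapping

    @wrapping = true
    original = instance_method(name) # UnboundMethod for the new method
    counts = call_counts
    define_method(name) do |*args, **kwargs, &block|
      counts[name] += 1
      original.bind(self).call(*args, **kwargs, &block)
    end
    @wrapping = false
  end
end

class Greeter
  extend CallCounter

  def greet(name)
    "Hello, #{name}!"
  end
end

g = Greeter.new
g.greet('Ruby')
g.greet('World')
p Greeter.call_counts[:greet]  # => 2
```

Without the guard, the define_method call would re-enter method_added for the same method name, and the recursion would never terminate.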

We can now implement Repeater3, extending the Hooks module:

class Repeater3
  extend Hooks

  before -> (*args, **kwargs) do
    puts "before: #{args}"
  end
  after -> (returns) do
    puts "after: #{returns}"
  end
  def repeat(str, count, separator: '')
    Array.new(count, str).join(separator)
  end
end

Let’s check that it works as expected, using the same test code as earlier:

puts Repeater3.new.repeat('test', 3, separator: ', ')

This should produce exactly the same output:

before: ["test", 3]
after: test, test, test
test, test, test

Type Checking

The next step is to develop this into a convenient, if somewhat primitive, type checking system.

Our first thought may be to use our Hooks module to write bespoke type checking code like this:

before ->(str, count, separator: nil) do
  raise "invalid str" unless str.is_a? String
  raise "invalid count" unless count.is_a? Numeric
  # parentheses matter here: `is_a? String || ...` would parse as is_a?(String || ...)
  raise "invalid separator" \
    unless separator.nil? || separator.is_a?(String)
end
after ->(ret) do
  raise "invalid return value" unless ret.is_a? String
end
def repeat(str, count, separator: '')
  Array.new(count, str).join(separator)
end

While this may work, it would be time-consuming, error-prone, and not particularly easy to maintain. We would like to streamline this so that we can write something more compact, like this:

typedef do
  params(
    String, Numeric, separator: String
  ).returns(
    String
  )
end
def repeat(str, count, separator: '')
  Array.new(count, str).join(separator)
end

There’s a fair bit going on here. Firstly, we’re no longer using before and after. Instead we’re using a hook called typedef. Within the typedef call, we’re also using methods called params and returns.

This is getting a bit closer to the interface offered by Sorbet:

sig do
  params(
    str: String, count: Integer, separator: String
  ).returns(
    String
  )
end
def repeat(str, count, separator: '')
  Array.new(count, str).join(separator)
end

Method Parameters

One of the interesting problems here is that we’re using both positional parameters (str and count) and keyword parameters (separator).

Sorbet uses some additional metaprogramming magic to handle these. However, for the purposes of this experiment, we’ll keep things simple by handling positional and keyword parameters separately.

Types

What we end up with is a Types module that provides two methods, params and returns, which can be used to specify the types for a method. These methods both return self, which is what allows them to be chained.
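The chaining trick is easy to see in isolation. The Signature class below is a hypothetical recorder that only mirrors the shape of params and returns; it is not how Sorbet or the Types module is implemented.

```ruby
# Hypothetical recorder, mirroring the shape of params/returns.
class Signature
  attr_reader :arg_types, :ret_type

  def params(*arg_types)
    @arg_types = arg_types
    self # returning self is what makes chaining possible
  end

  def returns(ret_type)
    @ret_type = ret_type
    self
  end
end

sig = Signature.new.params(String, Numeric).returns(String)
p sig.arg_types  # => [String, Numeric]
p sig.ret_type   # => String
```

If params returned anything other than self, the call to returns would be sent to the wrong object.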

Here is the new Types module in all its glory:

module Types
  def params(*arg_types, **kwarg_types)
    @arg_types, @kwarg_types = arg_types, kwarg_types
    self
  end

  def returns(ret_type)
    @ret_type = ret_type
    self
  end

  def typedef
    # Allows typedef to be passed a block. The block runs in the class body,
    # where params and returns are available because the class extends Types
    yield
  end

  def method_added(name)
    # short-circuit, to avoid infinite loop
    return unless @arg_types || @kwarg_types || @ret_type

    # reset hook context, but store current values
    arg_types, kwarg_types, ret_type = @arg_types, @kwarg_types, @ret_type
    @arg_types, @kwarg_types, @ret_type = nil, nil, nil

    # capture the original method
    meth = instance_method(name)

    # wrap the original method with type checks
    define_method(name) do |*args, **kwargs, &block|
      # check positional arguments
      arg_types.each_with_index do |type, idx|
        raise "Invalid type for arg in pos #{idx}; expected: #{arg_types[idx]}" \
          unless args[idx].is_a? type
      end

      # check keyword arguments
      kwarg_types.each do |key, type|
        raise "Invalid type for kwarg '#{key}'; expected #{kwarg_types[key]}" \
          unless kwargs[key].is_a? type
      end

      ret = meth.bind(self).call(*args, **kwargs, &block)

      # check return type
      raise "Invalid return type, expected #{ret_type.name}" \
        unless ret.is_a? ret_type

      ret
    end
  end
end
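Every check in the wrapper above relies on is_a?, so it is worth being clear about what that method accepts. The following sketch uses a hypothetical type_ok? helper that mirrors the check; the behaviour shown is that of Ruby’s built-in Object#is_a?.

```ruby
# Hypothetical helper that mirrors the check used inside the wrapper.
def type_ok?(value, type)
  value.is_a?(type)
end

p type_ok?(3, Numeric)       # => true  (Integer is a subclass of Numeric)
p type_ok?(3.5, Numeric)     # => true  (so is Float)
p type_ok?('3', Numeric)     # => false
p type_ok?([], Enumerable)   # => true  (included modules also match)
p type_ok?(nil, String)      # => false (what an omitted keyword argument looks like)
```

The last case matters for keyword arguments: the wrapper reads kwargs[key], which is nil when the caller relies on a default value, so a declared keyword type is only satisfied when the keyword is passed explicitly.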

Putting it together

Now we can implement the Repeater class using our home-grown type checking system:

class Repeater4
  extend Types

  typedef do
    params(
      String, Numeric, separator: String
    ).returns(
      String
    )
  end
  def repeat(str, count, separator: '')
    Array.new(count, str).join(separator)
  end
end

If we pass valid arguments, as per previous examples, we should see the expected output:

test, test, test

But if we replace the count argument with a string, we’ll see our type checking in action:

begin
  puts Repeater4.new.repeat("test", "3", separator: ", ")
rescue StandardError => e
  puts "Error: #{e}"
end

The output should look something like:

Error: Invalid type for arg in pos 1; expected: Numeric

Closing Thoughts

If you take away anything from this post, hopefully it is that Ruby allows you to build annotation-like constructs, using features that are part of the language.

These annotations can be evaluated when classes are loaded, and Ruby’s rich support for metaprogramming means that it is possible to do much more than simply annotate methods.

As for the type checker we constructed in this post, it is currently very simplistic, and not particularly useful. However, it lays the groundwork for future posts, in which we’ll develop a more complete type checker.

Code

All of the code for this post can be found in my ruby-type-checking repo on GitHub.