Roll Your Own Ruby Type Checking: Part 2

13 May 2023

In my last post on Ruby Type Checking, we looked at how metaprogramming could be used to implement an annotation-style syntax in Ruby. We then leveraged these techniques to build a simple runtime type checker, heavily inspired by Sorbet. Unfortunately, our type checker is woefully incomplete. In this post, we take another step towards making it useful.

The code for this post can be found in my ruby-type-checking repo on GitHub.

Recap

The examples in the first post were based on a toy class called Repeater. This class includes one method, repeat, whose sole purpose is to repeat a string (str) a given number of times (count), with each copy of the string separated by another string (separator). We made this more interesting by deciding that str and count should be positional parameters, while separator would be a keyword parameter.

By the end of the first post, our final type checked example looked like this:

class Repeater4
  extend Types

  typedef do
    params(
      String, Numeric, separator: String
    ).returns(
      String
    )
  end
  def repeat(str, count, separator: '')
    Array.new(count, str).join(separator)
  end
end

Limitations

Our type checker was complete enough to support the Repeater example. However, it is still limited in various ways. In particular, it is unable to represent the full range of parameter types supported by Ruby. At a high level, these include:

Rest Parameters - Rest Parameters allow an arbitrary number of arguments to be passed to a method. e.g. the method def my_func(tag, *args) requires one argument, followed by zero or more additional arguments. Our type checker does not support this.
Optional Parameters - These are positional parameters or keyword parameters with default values.
Block Parameters - Methods can accept a block as an argument, and can call yield to pass control to that block. This is commonly used for methods like each or map on arrays.

Our type system is also quite simplistic, and it isn’t difficult to imagine a couple of ways in which it could be improved:

Compound Types - It would be nice to be able to represent compound types. These are types that come from combining other types, e.g. OneOf(String, Numeric).
Array and Hash Types - It is also common to pass arrays and hashes as method arguments, and we would like some way to represent this, e.g. Array(String) or Hash(key1: String, key2: Numeric).

Addressing all of these limitations would be too much for one post, so we’ll focus on Rest Parameters and Optional Parameters for now. This alone will bring us much closer to having a useful type checker.

Adding support for Block Parameters, or Compound, Array and Hash Types will be the topic of a future post.

Rest Parameters

Let’s begin with Rest Parameters. If we revisit the code from our first type checker, we can see that we validate positional arguments by iterating over an array of expected argument types:

def check_types(args, kwargs, arg_types, kwarg_types)
  # check positional arguments
  arg_types.each_with_index do |type, idx|
    raise "Invalid type for arg #{idx}; expected: #{arg_types[idx]}" \
      unless args[idx].is_a? type
  end

  # check keyword arguments
  kwarg_types.each do |key, type|
    raise "Invalid type for kwarg '#{key}'; expected #{kwarg_types[key]}" \
      unless kwargs[key].is_a? type
  end
end

Keyword arguments are validated using a similar approach, but in that case we’re iterating over the keys in a Hash.

This implementation assumes that a method will always accept a fixed number of positional arguments, followed by zero or more keyword arguments, like our repeat method:

def repeat(str, count, separator: '')
  Array.new(count, str).join(separator)
end

Unfortunately, this isn’t the complete picture. We may have a method like send_messages below, that accepts a single positional argument, followed by a variable number of positional arguments:

def send_messages(recipient_id, *messages)
  messages.each do |message|
    send_message(recipient_id, message)
  end
end

The asterisk (splat operator) means that, within our method body, messages will actually refer to an array containing all additional positional arguments.
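To see this binding in isolation, here is a small sketch. The collect_messages method is a hypothetical stand-in for send_messages; it returns what it received instead of sending anything, so we can inspect how the arguments were bound:

```ruby
# Hypothetical stand-in for send_messages: returns its arguments
# instead of sending anything, so we can see how they were bound.
def collect_messages(recipient_id, *messages)
  [recipient_id, messages]
end

p collect_messages(42)                  # => [42, []]
p collect_messages(42, 'hello', 'bye')  # => [42, ["hello", "bye"]]

# An existing array can be expanded into separate arguments with a splat.
queued = ['one', 'two']
p collect_messages(42, *queued)         # => [42, ["one", "two"]]
```

Note that messages is always an array, even when no extra arguments are supplied. This is what makes it awkward to describe with a single entry in a positional list of types.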

How does Sorbet do it?

Clearly, we’ll need a different approach, one that allows us to annotate *messages with a type, even though it may represent zero or more positional arguments.

The way Sorbet approaches this is to assign types to parameter names, even those that are positional. Therefore, the type for a variable-length argument list such as *messages is a type that must be satisfied by all elements in the array. The Sorbet solution looks like this:

sig do
  params(
    # Integer describes a required positional argument
    recipient_id: Integer,
    # String describes every single value of the `messages` array,
    # since `messages` uses the splat operator
    messages: String
  ).void
end
def send_messages(recipient_id, *messages)
  messages.each do |message|
    send_message(recipient_id, message)
  end
end

We’ll need to make some changes to our type checker to support this.

Optional Parameters

The other case we are going to address is optional parameters. Optional parameters are those that have default values, and may be omitted. There are two kinds that we need to cater for:

Optional positional parameters - These must come after required positional parameters, but before additional/rest parameters and keyword parameters.
Optional keyword parameters - These must come after positional parameters, but can appear anywhere in a sequence of keyword parameters.
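As a quick illustration of both kinds, consider the sketch below. The greet method and its parameter names are made up for this example; the point is only to show which arguments may be omitted and how the defaults fill in:

```ruby
# Hypothetical method with an optional positional parameter (greeting),
# a rest parameter (extras) and two optional keyword parameters.
def greet(name, greeting = 'Hello', *extras, punctuation: '!', shout: false)
  text = "#{greeting}, #{name}#{punctuation}"
  text = text.upcase if shout
  [text, extras]
end

p greet('Ada')                           # => ["Hello, Ada!", []]
p greet('Ada', 'Hi')                     # => ["Hi, Ada!", []]
p greet('Ada', 'Hi', 1, 2, shout: true)  # => ["HI, ADA!", [1, 2]]
p greet('Ada', punctuation: '?')         # => ["Hello, Ada?", []]
```

From the type checker's point of view, the interesting part is that the number of arguments at the call site no longer tells us directly which parameter each one belongs to.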

This makes things more complicated. At this stage, it’s worth digging into the different kinds of method parameters that Ruby supports. One way we can do this is to use reflection to inspect the parameters for a method.

Parameter Types

Here is a simple example, which includes six different parameter types (blocks are excluded). This method uses reflection to print out the type for each of its parameters:

def test1(a, b = 2, *c, d:, e: 6, **f)
  method(__method__).parameters.each do |param|
    puts param.to_s
  end
  nil
end

When we call it:

test1(1, 2, 3, 4, d: 5, e: 6, f: 7, g: 8)

we’ll get a list of parameters and what kind they are:

[:req, :a]
[:opt, :b]
[:rest, :c]
[:keyreq, :d]
[:key, :e]
[:keyrest, :f]

This gives us six kinds:

req - required positional
opt - optional positional (has default value)
rest - additional positional arguments
keyreq - required keyword parameter
key - optional keyword parameter
keyrest - additional keyword parameters
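Since a type checker needs to treat these kinds differently, it can be handy to group parameter names by kind. The following is a rough sketch of one way to do that; params_by_kind and the Sample class are hypothetical names, not part of the type checker itself:

```ruby
# Hypothetical helper: groups a method's parameter names by kind,
# using the [kind, name] pairs returned by #parameters.
def params_by_kind(meth)
  meth.parameters
      .group_by { |kind, _name| kind }
      .transform_values { |pairs| pairs.map { |_kind, name| name } }
end

# Hypothetical class with the same signature as test1.
class Sample
  def run(a, b = 2, *c, d:, e: 6, **f); end
end

kinds = params_by_kind(Sample.instance_method(:run))
kinds.each { |kind, names| puts "#{kind}: #{names.join(', ')}" }
# prints:
# req: a
# opt: b
# rest: c
# keyreq: d
# key: e
# keyrest: f
```

Here we use instance_method rather than method, so that the parameters can be inspected without creating an instance or calling the method.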

The weird naming here is a reflection of Ruby’s history, with different keyword parameter types being introduced at different times.

Binding

Now we can go a step further, and use reflection to see how arguments are bound to parameter names:

def test2(a, b = 2, *c, d:, e: 6, **f)
  method(__method__).parameters.each do |param|
    puts [
      # parameter name
      param[1].to_s,
      # argument
      binding.local_variable_get(param[1].to_s)
    ].to_s
  end
  nil
end

Let’s see what happens when we omit arguments for any optional parameters:

test2(1, d: 5, f: 7, g: 8)

Running this, we get:

["a", 1]
["b", 2]
["c", []]
["d", 5]
["e", 6]
["f", {:f=>7, :g=>8}]

We can see that b and e have been bound to their default values. We can also see that the rest parameter c is bound to an empty array, as there were no extra positional arguments.

New Type Checker

We can put all of this together to write a new version of our type checker. Here is the core of the Types module:

module Types
  # we accept just a hash of key-values, mapping parameters to argument types
  def params(**arg_types)
    @arg_types = arg_types
    self
  end

  def returns(ret_type)
    @ret_type = ret_type
    self
  end

  def typedef
    yield
  end

  def method_added(name)
    # short-circuit, to avoid infinite loop
    return unless @arg_types || @ret_type

    # reset hook context, but store current values
    arg_types, ret_type = @arg_types, @ret_type
    @arg_types, @ret_type = nil, nil

    # capture the original method
    meth = instance_method(name)
    params = meth.parameters

    # wrap the original method with type checks
    define_method(name) do |*args, **kwargs, &block|
      Helpers::check_positional_args(args, arg_types, params)
      Helpers::check_keyword_args(kwargs, arg_types, params)
      ret = meth.bind(self).call(*args, **kwargs, &block)
      Helpers::check_return_value(ret, ret_type) if ret_type
      ret
    end
  end
end
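If the method_added hook is unfamiliar, the wrapping technique may be easier to follow in isolation. The sketch below uses a hypothetical CallLog module that records calls rather than checking types; log_next_method, calls and Greeter are all names invented for this illustration:

```ruby
# Hypothetical module illustrating the wrapping technique only:
# it records calls instead of checking types.
module CallLog
  def log_next_method
    @log_next = true
  end

  def calls
    @calls ||= []
  end

  def method_added(name)
    # define_method below triggers method_added again, so bail out
    # unless a wrap was explicitly requested
    return unless @log_next

    @log_next = false

    original = instance_method(name)
    log = calls

    # replace the method with a wrapper that forwards everything
    define_method(name) do |*args, **kwargs, &block|
      log << [name, args, kwargs]
      original.bind(self).call(*args, **kwargs, &block)
    end
  end
end

class Greeter
  extend CallLog

  log_next_method
  def greet(name, punctuation: '!')
    "Hello, #{name}#{punctuation}"
  end

  # no log_next_method call, so this method is left alone
  def unwrapped
    :untouched
  end
end

puts Greeter.new.greet('Ada', punctuation: '?')  # prints: Hello, Ada?
p Greeter.calls
```

The Types module above follows the same pattern, with type checks taking the place of the call log.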

To make this more maintainable, the actual type checking code has been moved into a Helpers module. It’s here that we implement our new name-based type checker:

module Types
  module Helpers
    class << self
      def check_positional_args(args, arg_types, params)
        arg_type = nil
        args.each_with_index do |arg, idx|
          param = params[idx]
          if param && [:req, :opt, :rest].include?(param[0])
            param_type = param[0]
            param_name = param[1]

            # Updated as long as there are positional parameter names to
            # consume. Once there aren't any more to consume, we must be
            # checking additional arguments, and can keep using whatever
            # the last type was for that.
            arg_type = arg_types[param_name] unless arg_types[param_name].nil?

            # Happens if there are positional arguments without a corresponding
            # type for the rest parameter
            raise "Missing type for #{param[0]} parameter `#{param_name}`" \
              if arg_type.nil?
          end

          raise "Invalid arg at position #{idx}, expected #{arg_type}" \
            unless arg.is_a?(arg_type)
        end
      end

      def check_keyword_args(_kwargs, _arg_types, _params)
        # TODO: not implemented yet
      end

      def check_return_value(ret, ret_type)
        raise "Invalid return type, expected #{ret_type.name}" \
          unless ret.is_a? ret_type
      end
    end
  end
end
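To experiment with the name-based lookup outside of the Types module, here is a standalone sketch of the same idea. It is a simplified variant rather than the code above: positional_errors and Mailer are hypothetical names, and it collects error strings instead of raising, so that several calls can be compared side by side:

```ruby
# Standalone sketch of the positional check. Returns a list of error
# strings instead of raising. Not the Helpers implementation itself.
def positional_errors(args, arg_types, params)
  # names of positional parameters, in declaration order
  names = params.select { |kind, _name| %i[req opt rest].include?(kind) }
                .map { |_kind, name| name }
  current = nil
  errors = []
  args.each_with_index do |arg, idx|
    # Past the last positional name, keep reusing the previous type,
    # which is the rest parameter's type if the method has one.
    current = arg_types.fetch(names[idx], current) if idx < names.length
    if current.nil?
      errors << "no type for arg #{idx}"
    elsif !arg.is_a?(current)
      errors << "arg #{idx}: expected #{current}, got #{arg.class}"
    end
  end
  errors
end

# Hypothetical class with the send_messages signature from earlier.
class Mailer
  def send_messages(recipient_id, *messages); end
end

params = Mailer.instance_method(:send_messages).parameters
types  = { recipient_id: Integer, messages: String }

p positional_errors([7, 'hi', 'bye'], types, params)  # => []
p positional_errors([7, 'hi', :oops], types, params)  # => ["arg 2: expected String, got Symbol"]
p positional_errors(['7'], types, params)             # => ["arg 0: expected Integer, got String"]
```

The second call shows the rest type being applied to every extra argument: the third argument has no parameter name of its own, so it is checked against the type given for messages.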

Keyword Parameters

Finally, we can tackle validation of keyword parameters, using our new approach. This is slightly trickier than it may sound, as we would like to detect when a method is called with additional keyword arguments that are either unexpected, or of an incorrect type.

To do this, we make a copy of the keyword argument hash supplied to the method, and we remove arguments from this hash as they are determined to be correct. Anything remaining at the end must be an additional keyword argument, and should be validated using the additional keyword argument type (if specified).

We can re-open our Helpers module, to implement this:

module Types
  module Helpers
    class << self
      def check_keyword_args(kwargs, arg_types, params)
        # don't modify the original hash
        kwargs = kwargs.clone

        arg_type = nil
        params.each do |param|
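          param_type = param[0]
          param_name = param[1]

          # positional and block parameters are validated elsewhere; without
          # this, they would fall through to the 'Unexpected param type' case
          next if %i[req opt rest block].include?(param_type)

          arg_type = arg_types[param_name] unless arg_types[param_name].nil?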

          if param_type == :keyrest
            # only have keyrest params left, so we can break out
            break
          elsif param_type == :keyreq
            raise "Invalid value for required kw param `#{param_name}`; expected #{arg_type}" unless \
              kwargs[param_name].is_a?(arg_type)
          elsif param_type == :key
            raise "Invalid value for optional kw param `#{param_name}`; expected #{arg_type}" unless \
              !kwargs.include?(param_name) || kwargs[param_name].is_a?(arg_type)
          else
            raise "Unexpected param type: #{param_type}"
          end

          # make sure we can detect extra kw params when they're not expected
          arg_type = nil
          kwargs.delete(param_name)
        end

        raise 'Unexpected extra kw params' \
          if kwargs.keys.length > 0 && arg_type.nil?

        kwargs.keys.each do |kwarg|
          raise "Invalid value for extra kw param `#{kwarg}`; expected #{arg_type}" unless \
            kwargs[kwarg].is_a?(arg_type)
        end
      end
    end
  end
end
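The copy-and-delete approach can also be tried on its own. Below is a standalone sketch under a few assumptions: keyword_errors and Report are hypothetical names, every named keyword parameter is assumed to have a type, and errors are collected as strings rather than raised. It is a simplified variant, not the Helpers code above:

```ruby
# Standalone sketch of the keyword check. Validated keys are removed
# from a copy of the hash; whatever is left over must be an extra.
def keyword_errors(kwargs, arg_types, params)
  remaining = kwargs.dup # don't modify the caller's hash
  errors = []
  rest_type = nil

  params.each do |kind, name|
    case kind
    when :keyreq, :key
      if remaining.key?(name)
        value = remaining.delete(name)
        type = arg_types.fetch(name) # assumes every named keyword has a type
        errors << "kw `#{name}`: expected #{type}" unless value.is_a?(type)
      elsif kind == :keyreq
        errors << "missing required kw `#{name}`"
      end
    when :keyrest
      rest_type = arg_types[name]
    end
  end

  # anything still in the hash was not matched by a named parameter
  remaining.each do |key, value|
    if rest_type.nil?
      errors << "unexpected kw `#{key}`"
    elsif !value.is_a?(rest_type)
      errors << "extra kw `#{key}`: expected #{rest_type}"
    end
  end
  errors
end

# Hypothetical class with required, optional and rest keyword parameters.
class Report
  def render(title:, width: 80, **labels); end
end

params = Report.instance_method(:render).parameters
types  = { title: String, width: Integer, labels: String }

p keyword_errors({ title: 'Q1' }, types, params)
# => []
p keyword_errors({ title: 'Q1', width: 'wide' }, types, params)
# => ["kw `width`: expected Integer"]
p keyword_errors({ title: 'Q1', x: 'ok', y: 3 }, types, params)
# => ["extra kw `y`: expected String"]
```

In the last call, x and y are not named parameters, so both remain in the copied hash and are checked against the type given for labels.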

Closing Thoughts

What I hope you take away from this post is that very few changes are required to make our type checker work with a wide range of Ruby parameter types. We saw how easy it is to use reflection to tinker with Ruby’s parameter binding logic, and it was by conducting these experiments that we discovered the tricks that would allow us to implement our type checker.

Stay tuned for the next post in this series, in which we’ll explore some of the limits of runtime type checking in Ruby…

Tristan Penman