Roll Your Own Ruby Type Checking: Part 2
In my last post on Ruby Type Checking, we looked at how metaprogramming could be used to implement an annotation-style syntax in Ruby. We then leveraged these techniques to build a simple runtime type checker, heavily inspired by Sorbet. Unfortunately, our type checker is woefully incomplete. In this post, we take another step towards making it useful.
Recap
The examples in the first post were based on a toy class called `Repeater`. This class includes one method, `repeat`, whose sole purpose is to repeat a string (`str`) a given number of times (`count`), with each copy of the string separated by another string (`separator`). We made this more interesting by deciding that `str` and `count` should be positional parameters, while `separator` would be a keyword parameter.
By the end of the first post, our final type checked example looked like this:
class Repeater4
  extend Types

  typedef do
    params(
      String, Numeric, separator: String
    ).returns(
      String
    )
  end

  def repeat(str, count, separator: '')
    # Array.new takes the size first, then the value to fill with
    Array.new(count, str).join(separator)
  end
end
Limitations
Our type checker was complete enough to support the `Repeater` example. However, it is still limited in various ways. In particular, it is unable to represent the full range of parameter types supported by Ruby. At a high level, these include:
- Rest Parameters - Rest Parameters allow an arbitrary number of arguments to be passed to a method. e.g. the method `def my_func(tag, *args)` requires one argument, followed by zero or more additional arguments. Our type checker does not support this.
- Optional Parameters - These are positional parameters or keyword parameters with default values.
- Block Parameters - Methods can accept a block as an argument, and can call `yield` to pass control to that block. This is commonly used for methods like `each` or `map` on arrays.
Our type system is also quite simplistic, and it isn’t difficult to imagine a couple of ways in which it could be improved:
- Compound Types - It would be nice to be able to represent compound types. These are types that come from combining other types, e.g. `OneOf(String, Numeric)`.
- Array and Hash Types - It is also common to pass arrays and hashes as method arguments, and we would like some way to represent this, e.g. `Array(String)` or `Hash(key1: String, key2: Numeric)`.
Addressing all of these limitations would be too much for one post, so we’ll focus on Rest Parameters and Optional Parameters for now. This alone will bring us much closer to having a useful type checker.
Adding support for Block Parameters, or Compound, Array and Hash Types will be the topic of a future post.
Rest Parameters
Let’s begin with Rest Parameters. If we revisit the code from our first type checker, we can see that we validate positional arguments by iterating over an array of expected argument types:
def check_types(args, kwargs, arg_types, kwarg_types)
  # check positional arguments
  arg_types.each_with_index do |type, idx|
    raise "Invalid type for arg #{idx}; expected: #{arg_types[idx]}" \
      unless args[idx].is_a? type
  end

  # check keyword arguments
  kwarg_types.each do |key, type|
    raise "Invalid type for kwarg '#{key}'; expected #{kwarg_types[key]}" \
      unless kwargs[key].is_a? type
  end
end
Keyword arguments are validated using a similar approach, but in that case we’re iterating over the keys in a Hash.
This implementation assumes that a method will always accept a fixed number of positional arguments, followed by zero or more keyword arguments, like our `repeat` method:
def repeat(str, count, separator: '')
  # Array.new takes the size first, then the value to fill with
  Array.new(count, str).join(separator)
end
Unfortunately, this isn’t the complete picture. We may have a method like `send_messages` below, that accepts a single positional argument, followed by a variable number of positional arguments:
def send_messages(recipient_id, *messages)
  messages.each do |message|
    send_message(recipient_id, message)
  end
end
The asterisk (splat operator) means that, within our method body, `messages` will actually refer to an array containing all additional positional arguments.
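To see this binding in isolation, here is a minimal sketch. The `collect_messages` method is a hypothetical stand-in for `send_messages` that simply returns what it received, so it does not assume any real message-sending code.

```ruby
# Hypothetical stand-in for send_messages: it returns its arguments so
# that we can inspect how Ruby binds them.
def collect_messages(recipient_id, *messages)
  [recipient_id, messages]
end

p collect_messages(42)               # prints [42, []]
p collect_messages(42, 'hi')         # prints [42, ["hi"]]
p collect_messages(42, 'hi', 'bye')  # prints [42, ["hi", "bye"]]
```

Note that `messages` is always an array, even when no extra arguments are supplied.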
How does Sorbet do it?
Clearly we’ll need to take a different approach, that allows us to annotate `*messages` with a type, even though it may represent zero or more positional arguments.
The way Sorbet approaches this is to assign types to parameter names, even those that are positional. Therefore, the type for a variable-length argument list such as `*messages` is a type that must be satisfied by all elements in the array. The Sorbet solution looks like this:
sig do
  params(
    # Integer describes a required positional argument
    recipient_id: Integer,
    # String describes every single value of the `messages` array,
    # since `messages` uses the splat operator
    messages: String
  ).void
end
def send_messages(recipient_id, *messages)
  messages.each do |message|
    send_message(recipient_id, message)
  end
end
We’ll need to make some changes to our type checker to support this.
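Put another way, a rest parameter is valid when every extra argument matches its single declared type. A minimal sketch of that rule follows; `rest_args_valid?` is a hypothetical helper name used only for illustration, and is not part of Sorbet or of our type checker.

```ruby
# Hypothetical helper: true when every element captured by a rest
# parameter is an instance of the one type declared for it.
def rest_args_valid?(values, type)
  values.all? { |value| value.is_a?(type) }
end

p rest_args_valid?(%w[hello goodbye], String) # prints true
p rest_args_valid?(['hello', 42], String)     # prints false
p rest_args_valid?([], String)                # prints true
```

An empty array passes trivially, which fits the idea that a rest parameter may receive zero arguments.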
Optional Parameters
The other case we are going to address is optional parameters. Optional parameters are those that have default values, and may be omitted. There are two kinds that we need to cater for:
- Optional positional parameters - These must come after required positional parameters, but before additional/rest parameters and keyword parameters.
- Optional keyword parameters - These must come after positional parameters, but can appear anywhere in a sequence of keyword parameters.
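As a quick illustration of both kinds, consider the sketch below. The `greet` method and its parameter names are made up for this example; the optional keyword parameter is deliberately declared before the required one, to show that Ruby accepts either order.

```ruby
# Hypothetical method with a required and an optional positional
# parameter, followed by an optional and a required keyword parameter.
def greet(name, greeting = 'Hello', shout: false, punctuation:)
  text = "#{greeting}, #{name}#{punctuation}"
  shout ? text.upcase : text
end

puts greet('Ada', punctuation: '!')                    # prints Hello, Ada!
puts greet('Ada', 'Hi', punctuation: '?', shout: true) # prints HI, ADA?
```

A type checker therefore has to accept calls in which `greeting` and `shout` are missing entirely.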
This makes things more complicated. At this stage, it’s worth digging into the different kinds of method parameters that Ruby supports. One way to do this is to use reflection to inspect a method’s parameters.
Parameter Types
Here is a simple example, which includes six different parameter types (blocks are excluded). This method uses reflection to print out the type for each of its parameters:
def test1(a, b = 2, *c, d:, e: 6, **f)
  method(__method__).parameters.each do |param|
    puts param.to_s
  end
  nil
end
When we call it:
test1(1, 2, 3, 4, d: 5, e: 6, f: 7, g: 8)
we’ll get a list of parameters and what kind they are:
[:req, :a]
[:opt, :b]
[:rest, :c]
[:keyreq, :d]
[:key, :e]
[:keyrest, :f]
This gives us six kinds:
- `req` - required positional
- `opt` - optional positional (has default value)
- `rest` - additional positional arguments
- `keyreq` - required keyword parameter
- `key` - optional keyword parameter
- `keyrest` - additional keyword parameters
The weird naming here is a reflection of Ruby’s history, with different keyword parameter types being introduced at different times.
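The same information is available without calling the method, because `parameters` can be read from a `Method` object directly. The sketch below groups parameter names by kind; `sample` and `params_by_kind` are hypothetical names chosen for this example. It also adds a block parameter, which Ruby reports with a seventh kind, `block`.

```ruby
# Hypothetical method covering every kind of parameter, including a block.
def sample(a, b = 2, *c, d:, e: 6, **f, &g); end

# Hypothetical helper: maps each parameter kind to the names of that kind.
def params_by_kind(meth)
  meth.parameters
      .group_by { |kind, _name| kind }
      .transform_values { |pairs| pairs.map { |_kind, name| name } }
end

kinds = params_by_kind(method(:sample))
p kinds[:req]   # prints [:a]
p kinds[:rest]  # prints [:c]
p kinds[:block] # prints [:g]
```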
Binding
Now we can go a step further, and use reflection to see how arguments are bound to parameter names:
def test2(a, b = 2, *c, d:, e: 6, **f)
  method(__method__).parameters.each do |param|
    puts [
      # parameter name
      param[1].to_s,
      # argument
      binding.local_variable_get(param[1].to_s)
    ].to_s
  end
  nil
end
Let’s see what happens when we omit arguments for any optional parameters:
test2(1, d: 5, f: 7, g: 8)
Running this, we get:
["a", 1]
["b", 2]
["c", []]
["d", 5]
["e", 6]
["f", {:f=>7, :g=>8}]
We can see that `b` and `e` have been bound to their default values. We can also see that the rest parameter `c` is bound to an empty array, as there were no extra positional arguments.
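If it is more convenient to work with the bindings as data, the same reflection calls can build a Hash keyed by parameter name. This is only a sketch of that idea; `bound_arguments` is a hypothetical method name, and the block form of `to_h` assumes Ruby 2.6 or later.

```ruby
# Hypothetical variant of test2 that returns a Hash mapping each
# parameter name to the value bound to it, instead of printing.
def bound_arguments(a, b = 2, *c, d:, e: 6, **f)
  method(__method__).parameters.to_h do |_kind, name|
    [name, binding.local_variable_get(name)]
  end
end

bound = bound_arguments(1, d: 5, f: 7, g: 8)
p bound[:b] # prints 2
p bound[:c] # prints []
p bound[:e] # prints 6
```

A name-to-value mapping like this lines up naturally with a name-to-type mapping, which is the shape of annotation we are about to adopt.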
New Type Checker
We can put all of this together to write a new version of our type checker. Here is the core of the `Types` module:
module Types
  # we accept just a hash of key-values, mapping parameters to argument types
  def params(**arg_types)
    @arg_types = arg_types
    self
  end

  def returns(ret_type)
    @ret_type = ret_type
    self
  end

  def typedef
    yield
  end

  def method_added(name)
    # short-circuit, to avoid infinite loop
    return unless @arg_types || @ret_type

    # reset hook context, but store current values
    arg_types, ret_type = @arg_types, @ret_type
    @arg_types, @ret_type = nil, nil

    # capture the original method
    meth = instance_method(name)
    params = meth.parameters

    # wrap the original method with type checks
    define_method(name) do |*args, **kwargs, &block|
      Helpers::check_positional_args(args, arg_types, params)
      Helpers::check_keyword_args(kwargs, arg_types, params)
      ret = meth.bind(self).call(*args, **kwargs, &block)
      Helpers::check_return_value(ret, ret_type) if ret_type
      ret
    end
  end
end
To make this more maintainable, the actual type checking code has been moved into a `Helpers` module. It’s here that we implement our new name-based type checker:
module Types
  module Helpers
    class << self
      def check_positional_args(args, arg_types, params)
        arg_type = nil
        args.each_with_index do |arg, idx|
          param = params[idx]
          if param && [:req, :opt, :rest].include?(param[0])
            param_type = param[0]
            param_name = param[1]
            # Updated as long as there are positional parameter names to
            # consume. Once there aren't any more to consume, we must be
            # checking additional arguments, and can keep using whatever
            # the last type was for that.
            arg_type = arg_types[param_name] unless arg_types[param_name].nil?
            # Happens if there are positional arguments without a corresponding
            # type for the rest parameter
            raise "Missing type for #{param[0]} parameter `#{param_name}`" \
              if arg_type.nil?
          end
          raise "Invalid arg at position #{idx}, expected #{arg_type}" \
            unless arg.is_a?(arg_type)
        end
      end

      def check_keyword_args(_kwargs, _arg_types, _params)
        # TODO: not implemented yet
      end

      def check_return_value(ret, ret_type)
        raise "Invalid return type, expected #{ret_type.name}" \
          unless ret.is_a? ret_type
      end
    end
  end
end
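The core idea in `check_positional_args` is the lookup from an argument's position to its expected type. The standalone sketch below isolates that lookup. It is a simplified variation rather than the code above: `expected_type_at` and `POSITIONAL_KINDS` are hypothetical names, and it only falls back to the last parameter when that parameter is a rest parameter.

```ruby
# Hypothetical method to inspect; the body is irrelevant here.
def send_messages(recipient_id, *messages); end

POSITIONAL_KINDS = %i[req opt rest].freeze

# Hypothetical helper: the declared type for the argument at position idx,
# or nil when no type applies.
def expected_type_at(idx, params, arg_types)
  positional = params.select { |kind, _name| POSITIONAL_KINDS.include?(kind) }
  kind, name = positional[idx]
  # Past the end of the named parameters: reuse the rest parameter, if any.
  kind, name = positional.last if kind.nil? && positional.last&.first == :rest
  arg_types[name]
end

params = method(:send_messages).parameters
types = { recipient_id: Integer, messages: String }
p expected_type_at(0, params, types) # prints Integer
p expected_type_at(1, params, types) # prints String
p expected_type_at(5, params, types) # prints String
```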
Keyword Parameters
Finally, we can tackle validation of keyword parameters, using our new approach. This is slightly trickier than it may sound, as we would like to detect when a method is called with additional keyword arguments that are either unexpected, or of an incorrect type.
To do this, we make a copy of the keyword argument hash supplied to the method, and we remove arguments from this hash as they are determined to be correct. Anything remaining at the end must be an additional keyword argument, and should be validated using the additional keyword argument type (if specified).
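The copy-and-remove idea can be sketched on its own before wiring it into the type checker. In this simplified, hypothetical version, `leftover_kwargs` takes the supplied keyword arguments and a Hash of declared keyword types, and returns whatever was not declared.

```ruby
# Hypothetical helper: validates declared keyword arguments and returns
# the ones that were not declared, without touching the caller's hash.
def leftover_kwargs(kwargs, declared_types)
  remaining = kwargs.dup
  declared_types.each do |name, type|
    next unless remaining.key?(name) # omitted optional argument
    raise TypeError, "Invalid value for `#{name}`; expected #{type}" \
      unless remaining[name].is_a?(type)
    remaining.delete(name)
  end
  remaining
end

supplied = { d: 5, f: 7, g: 8 }
extras = leftover_kwargs(supplied, { d: Integer, e: Integer })
p extras.keys   # prints [:f, :g]
p supplied.keys # prints [:d, :f, :g]
```

Whatever is left in `extras` is what the additional keyword argument type, if one was declared, needs to be checked against.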
We can re-open our `Helpers` module to implement this:
module Types
  module Helpers
    class << self
      def check_keyword_args(kwargs, arg_types, params)
        # don't modify the original hash
        kwargs = kwargs.clone
        arg_type = nil
        params.each do |param|
          param_type = param[0]
          param_name = param[1]
          # positional and block parameters are not keyword arguments,
          # so skip them rather than treating them as unexpected
          next if [:req, :opt, :rest, :block].include?(param_type)

          arg_type = arg_types[param_name] unless arg_types[param_name].nil?
          if param_type == :keyrest
            # only have keyrest params left, so we can break out
            break
          elsif param_type == :keyreq
            raise "Invalid value for required kw param `#{param_name}`; expected #{arg_type}" unless \
              kwargs[param_name].is_a?(arg_type)
          elsif param_type == :key
            raise "Invalid value for optional kw param `#{param_name}`; expected #{arg_type}" unless \
              !kwargs.include?(param_name) || kwargs[param_name].is_a?(arg_type)
          else
            raise "Unexpected param type: #{param_type}"
          end
          # make sure we can detect extra kw params when they're not expected
          arg_type = nil
          kwargs.delete(param_name)
        end
        raise 'Unexpected extra kw params' \
          if kwargs.keys.length > 0 && arg_type.nil?
        kwargs.keys.each do |kwarg|
          raise "Invalid value for extra kw param `#{kwarg}`; expected #{arg_type}" unless \
            kwargs[kwarg].is_a?(arg_type)
        end
      end
    end
  end
end
Closing Thoughts
Hopefully what you will take away from this post is that relatively few changes were required to make our type checker work with a wider range of Ruby parameter types. We also saw how easy it is to use reflection to inspect Ruby’s parameter binding logic. It was by conducting these experiments that we discovered the tricks we would need to implement our type checker.
Code
As before, all of the code for this post can be found in my ruby-type-checking repo on GitHub.
Next Time…
We’ve covered a lot of ground today, but there are a number of topics that we are yet to explore, which were mentioned earlier in this post. These include Block Parameters, Compound Types, as well as Array and Hash Types. These will all be explored in future posts.
In the next post in this series, we’ll explore some of the limits of runtime type checking in Ruby…