Roll Your Own Ruby Type Checking: Part 1
Earlier this year, I was talking with a colleague about Sorbet and other type-checking features that are available for Ruby. Although we were familiar with gradual typing in other languages such as TypeScript and Python, neither of us really knew how this would work under the hood.
This post begins with an example from the Sorbet documentation, and builds up the machinery to support a simplistic type-checking implementation. While our implementation will not be production-ready like Sorbet, I hope that the exploration will be enlightening.
Later posts will refine the type-checking implementation, making it more versatile, and capable of being used with more sophisticated Ruby programs.
Type-checking
But first, what does it mean to check types?
In a dynamically-typed language such as Ruby, the types of variables, parameters, and so on are typically determined at runtime. That is, they can vary depending on how your program is run. A variable that is a String in some cases may be Numeric in other cases. The purpose of a type-checker is to detect bugs caused by mismatched types, such as passing a String as an argument when a number is expected.
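To make the failure mode concrete, here is a minimal sketch. The double and increment methods are hypothetical and exist only for illustration. One mismatch goes unnoticed and produces a surprising value, while the other only fails once the offending line actually runs:

```ruby
# Hypothetical example: `double` expects a number, but nothing enforces that.
def double(n)
  n + n
end

double(21)    # an Integer works as intended
double('21')  # a String is accepted too, silently returning "2121"

# Other mismatches only surface when the line is executed.
def increment(n)
  n + 1
end

begin
  increment('21')
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```

A type-checker aims to flag both calls, rather than leaving them to be discovered in production.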
Where does Sorbet fit into this?
Sorbet
Sorbet is a powerful type-checker, designed specifically for Ruby. Its annotations are written using ordinary Ruby language features, unlike other type-checkers based on pre-processors and other tricks. The Sorbet ecosystem also provides plugins for IDE integration and additional development tooling.
Let’s look at some code… When we visit the Sorbet website we’re greeted with this little example, illustrating just how easy it is to add type-checking to both new and existing code:
extend T::Sig
sig {params(name: String).returns(Integer)}
def main(name)
  puts "Hello, #{name}!"
  name.length
end
We can see from this example that Sorbet provides type-checking through the use of sig annotations.
As you would probably expect from the Ruby ecosystem, this little snippet is built upon quite a bit of magic. In this post, we’ll attempt to re-invent our own annotation machinery, and use it to build a simple type-checker.
NOTE: All of this is just one way that type-checking could be implemented. This should not be considered a description of how Sorbet itself works!
Repeater
For the examples in this post, we’ll work with a toy class called Repeater. All this class does is repeat a string str a given number of times count, with each copy of the string separated by separator. Each version of this class will be given its own number, e.g. Repeater1 below:
class Repeater1
  def repeat(str, count, separator: '')
    Array.new(count, str).join(separator)
  end
end
This class is very easy to test from IRB, or from a script:
puts Repeater1.new.repeat('test', 3, separator: ', ')
In this case, the output will look like:
test, test, test
Now let’s say we want to add some functionality to this class, so that we print its arguments before the body is executed, and print out the return value before it is returned to the caller.
This is how we could implement this:
class Repeater2
  def repeat(*args, **kwargs)
    puts "before: #{args}, #{kwargs}"
    fn = -> (str, count, separator: '') do
      Array.new(count, str).join(separator)
    end
    # forward the keyword arguments too, so that separator reaches the lambda
    ret = fn.call(*args, **kwargs)
    puts "after: #{ret}"
    ret
  end
end
What is unusual about this class is that the repeat method does not directly specify its parameters. Instead, it uses *args and **kwargs so that we can easily print out the arguments using puts. This will print the actual arguments, not just those that the method expects. The code from the original repeat method is captured within a lambda literal that specifies the actual parameters.
Now if we were to run the following line of code:
puts Repeater2.new.repeat('test', 3, separator: ', ')
It would produce the following output:
before: ["test", 3], {:separator=>", "}
after: test, test, test
test, test, test
Hooks
Of course, we would like to simplify Repeater2 so that the code to run before and after the method body is more clearly separated, using an annotative style. Ideally something like this:
before -> (*args, **kwargs) { puts "before: #{args}, #{kwargs}" }
after -> (returns) { puts "after: #{returns}" }
def repeat(str, count, separator: '')
  Array.new(count, str).join(separator)
end
In this snippet, before is a hook that is called with the arguments supplied to the method, but before the body of the method is executed. And after is a hook that is called with the value returned by the method.
If we consider that Ruby class declarations are executed from top to bottom, and existing methods can be called from within the class body, then we see that before and after could be provided by extending a module.
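Both of those properties can be observed directly. The following is a small sketch using a hypothetical Announcer module (not part of the Hooks code in this post). Each call to announce runs while the class body is being evaluated, and records whether greet has been defined yet:

```ruby
# Hypothetical module, for illustration only.
module Announcer
  # Record a label together with whether `greet` exists at this point.
  def announce(label)
    @snapshots ||= []
    @snapshots << [label, method_defined?(:greet)]
  end

  def snapshots
    @snapshots || []
  end
end

class Example
  extend Announcer        # Announcer's methods become class-level methods

  announce 'before greet' # runs immediately, before greet is defined
  def greet
    'hello'
  end
  announce 'after greet'  # runs after greet has been defined
end

p Example.snapshots
```

The first snapshot reports that greet is not yet defined, and the second reports that it is, which is exactly the ordering a hook mechanism needs to rely on.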
How does the Hooks module work?
We can see from the snippet above that we need to define a module that provides before and after methods. We’ll call this module Hooks. before and after are very simple setter methods that store a lambda in an instance variable on the class, so that it can be used later. We’ll call this the hook context for a method.
module Hooks
  def before(tag)
    @before = tag
  end
  def after(tag)
    @after = tag
  end
We can then use Ruby’s powerful meta-programming functionality to augment method declaration. In this case, we’re making use of method_added, which is called whenever a method is added to a class:
  def method_added(name)
    # short-circuit, to avoid infinite loop
    return unless @before || @after
    # reset hook context, but store current values
    before = @before
    @before = nil
    after = @after
    @after = nil
    # capture the original method
    meth = instance_method(name)
    # wrap the original method with calls to before and after hooks,
    # forwarding keyword arguments as well as positional ones
    define_method(name) do |*args, **kwargs, &block|
      before.call(*args, **kwargs) if before
      ret = meth.bind(self).call(*args, **kwargs, &block)
      after.call(ret) if after
      ret
    end
  end
end
define_method is used to replace the newly added method with a block that wraps the original method with calls to our before and after hooks. We have to be careful to clear @before and @after before calling define_method, otherwise this code will keep triggering method_added, and we’ll end up with a stack overflow.
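The re-triggering behaviour is easy to confirm in isolation. This sketch uses a hypothetical Tracker module (not the Hooks module above) that simply records each method_added notification, showing that define_method fires the callback just as def does:

```ruby
# Hypothetical module that records every method_added notification.
module Tracker
  def added
    @added ||= []
  end

  def method_added(name)
    added << name
  end
end

class Tracked
  extend Tracker

  def first; end                    # triggers method_added(:first)
  define_method(:second) { }        # define_method triggers it too
  define_method(:first) { :again }  # redefining a method triggers it again
end

p Tracked.added  # [:first, :second, :first]
```

Because redefinition also fires the callback, a wrapper installed from inside method_added needs a guard of some kind. Clearing the hook context first is the approach taken in this post.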
We can finally implement Repeater3, extending the Hooks module:
class Repeater3
  extend Hooks
  before -> (*args, **kwargs) { puts "before: #{args}, #{kwargs}" }
  after -> (returns) { puts "after: #{returns}" }
  def repeat(str, count, separator: '')
    Array.new(count, str).join(separator)
  end
end
Let’s check that it works as expected, using the same test code as earlier:
puts Repeater3.new.repeat('test', 3, separator: ', ')
This should produce exactly the same output:
before: ["test", 3], {:separator=>", "}
after: test, test, test
test, test, test
Type Checking
The next step is to develop this into a convenient, if somewhat primitive, type-checking system.
Our first thought may be to use our Hooks module to write bespoke type-checking code like this:
before -> (str, count, separator: '') do
  raise "invalid str" unless str.is_a? String
  raise "invalid count" unless count.is_a? Numeric
  raise "invalid separator" \
    unless separator.is_a?(String) || separator.nil?
end
after -> (ret) do
  raise "invalid return value" unless ret.is_a? String
end
def repeat(str, count, separator: '')
  Array.new(count, str).join(separator)
end
While this may work, it would be time-consuming, error prone, and not particularly easy to maintain. We would like to streamline this so that we write something more compact, like this:
typedef { params(String, Numeric, separator: String).returns(String) }
def repeat(str, count, separator: '')
  Array.new(count, str).join(separator)
end
There’s a fair bit going on here. Firstly, we’re no longer using before and after. Instead we’re using a hook called typedef. Within the typedef call, we’re also using methods called params and returns.
This is getting a bit closer to the interface offered by Sorbet:
sig {
  params(str: String, count: Integer, separator: String).returns(String)
}
def repeat(str, count, separator)
  Array.new(count, str).join(separator)
end
Method Parameters
One of the interesting problems here is that we’re using both positional parameters (str and count) and named/keyword parameters (separator).
Sorbet uses some additional metaprogramming magic to handle these the same way (this is not the only reason for that magic, but one that particularly stands out). However, for the purposes of this experiment, we’ll keep things simple by handling them separately.
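Ruby itself already distinguishes the two kinds of parameter, which is what makes handling them separately straightforward. As a rough illustration (ParameterDemo and capture are hypothetical names, and this is not a description of what Sorbet does), Method#parameters reports the kind of each parameter, and a *args, **kwargs signature splits the supplied arguments along the same lines:

```ruby
# Hypothetical class, for illustration only.
class ParameterDemo
  def repeat(str, count, separator: '')
    Array.new(count, str).join(separator)
  end
end

# :req marks required positional parameters, :key marks optional keywords.
p ParameterDemo.instance_method(:repeat).parameters
# [[:req, :str], [:req, :count], [:key, :separator]]

# Splat and double-splat separate positional from keyword arguments.
def capture(*args, **kwargs)
  [args, kwargs]
end

positional, keywords = capture('test', 3, separator: ', ')
p positional  # ["test", 3]
p keywords.keys  # [:separator]
```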
Types
What we end up with is a Types module that provides two methods, params and returns, which can be used to specify the types for a method. These methods both return self, which is what allows them to be chained.
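The chaining idea can be shown on its own with a hypothetical Signature class (an illustration only; the Types module in this post stores the same information on the class being defined instead):

```ruby
# Hypothetical builder, for illustration only.
class Signature
  attr_reader :arg_types, :ret_type

  def params(*arg_types)
    @arg_types = arg_types
    self  # returning self lets the caller continue the chain
  end

  def returns(ret_type)
    @ret_type = ret_type
    self
  end
end

sig = Signature.new.params(String, Numeric).returns(String)
p sig.arg_types  # [String, Numeric]
p sig.ret_type   # String
```

If params returned anything other than self, the call to returns would be sent to the wrong object and the chain would break.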
Here is the new Types module in all its glory:
module Types
  def params(*arg_types, **kwarg_types)
    @arg_types, @kwarg_types = arg_types, kwarg_types
    self
  end
  def returns(ret_type)
    @ret_type = ret_type
    self
  end
  def typedef
    yield
  end
  def method_added(name)
    # short-circuit, to avoid infinite loop
    return unless @arg_types || @kwarg_types || @ret_type
    # reset hook context, but store current values
    arg_types, kwarg_types, ret_type = @arg_types, @kwarg_types, @ret_type
    @arg_types, @kwarg_types, @ret_type = nil, nil, nil
    # capture the original method
    meth = instance_method(name)
    # wrap the original method with type checks
    define_method(name) do |*args, **kwargs, &block|
      # check positional arguments
      (arg_types || []).each_with_index do |type, idx|
        raise "Invalid type for arg in pos #{idx}; expected: #{type}" \
          unless args[idx].is_a? type
      end
      # check keyword arguments that were actually supplied;
      # omitted keywords fall back to the method's own defaults
      (kwarg_types || {}).each do |key, type|
        raise "Invalid type for kwarg '#{key}'; expected: #{type}" \
          if kwargs.key?(key) && !kwargs[key].is_a?(type)
      end
      ret = meth.bind(self).call(*args, **kwargs, &block)
      # check return type, if one was declared
      raise "Invalid return type; expected: #{ret_type}" \
        unless ret_type.nil? || ret.is_a?(ret_type)
      ret
    end
  end
end
Putting it together
Now we can implement the Repeater class using our home-grown type-checking system:
class Repeater4
  extend Types
  typedef { params(String, Numeric, separator: String).returns(String) }
  def repeat(str, count, separator: '')
    Array.new(count, str).join(separator)
  end
end
If we pass valid arguments, as per previous examples, we should see the expected output:
test, test, test
But if we replace the count argument with a string, we’ll see our type-checking in action:
begin
  puts Repeater4.new.repeat("test", "3", separator: ", ")
rescue StandardError => e
  puts "Error: #{e}"
end
The output should look something like:
Error: Invalid type for arg in pos 1; expected: Numeric
Closing Thoughts
If you take away anything from this post, hopefully it is that Ruby allows you to build annotation-like constructs, using features that are part of the language.
These annotations can be evaluated when classes are loaded, and Ruby’s rich support for metaprogramming means that it is possible to do much more than simply annotate methods.
As for the type-checker we constructed in this post, it is currently very simplistic, and not particularly useful. However, it lays the groundwork for future posts, in which we’ll develop a more complete type-checker.
Code
All of the code for this post can be found in my ruby-type-checking repo on GitHub.