Roll Your Own Ruby Type Checking: Part 1
Earlier this year, I was talking with a colleague about Sorbet and other type checking features that are available for Ruby. Although we were both familiar with the use of gradual typing in other languages such as TypeScript and Python, neither of us really knew how this would work under the hood.
This post begins with an example from the Sorbet documentation, and builds up the machinery to support a simplistic type checking implementation. While our implementation will not be production-ready like Sorbet, I hope that the exploration will be enlightening.
Later posts will refine the type checking implementation, making it more versatile, and capable of being used with more sophisticated Ruby programs.
Type Checking
But first, what is type checking? Or more specifically, what does it mean to check types?
In a dynamically typed language such as Ruby, the types of variables, parameters, etc. are not known until runtime. That is, they can vary depending on how your program is run. A variable that is a String in some cases may be Numeric in other cases. The purpose of a type checker is to detect bugs that may occur due to mismatched types. An example of this is passing a String as an argument when a number is expected.
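To make the problem concrete, here is a minimal sketch of such a bug. The double method is a hypothetical helper invented for this illustration; the point is that Ruby runs both calls without complaint, even though the second is almost certainly a mistake.

```ruby
# Hypothetical helper, used only for illustration.
def double(value)
  value * 2
end

p double(21)   # => 42
p double("21") # => "2121" (String#* repeats the string; no error is raised)
```

A type checker aims to report the second call as an error rather than let it silently produce the wrong value.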
Where does Sorbet fit into this?
Sorbet
Sorbet is a powerful type checker, built specifically for Ruby. Its type annotations are written using ordinary Ruby language features, unlike those of other type checkers, which are often based on pre-processors and other tricks. The Sorbet ecosystem also provides plugins for IDE integration and additional development tooling.
Let’s look at a simple example showing how Sorbet is used. When we visit the Sorbet website, we’re greeted with this little example, illustrating just how easy it is to add type checking to both new and existing code:
class Main
extend T::Sig
sig do
params(
name: String
).returns(
Integer
)
end
def main(name)
puts "Hello, #{name}!"
name.length
end
end
This could also be written more compactly:
class Main
extend T::Sig
sig { params(name: String).returns(Integer) }
def main(name)
puts "Hello, #{name}!"
name.length
end
end
We can see from this example that Sorbet provides type checking via a method called sig, which looks a bit like an annotation that you might see in a language such as Java.
As is common in the Ruby ecosystem, this one-line type annotation is built upon some non-trivial Ruby magic. In this post, we’ll attempt to re-invent our own annotation machinery, and use it to build a simple type checker inspired by Sorbet.
Repeater
For the examples in this post, we’ll work with a toy class called Repeater. All this class does is repeat a string str a given number of times count, with each copy of the string separated by separator. Each version of this class will be given its own number, e.g. Repeater1, Repeater2.
The original implementation looks like this:
class Repeater
def repeat(str, count, separator: '')
Array.new(count, str).join(separator)
end
end
This class is very easy to test from IRB, or from a script:
puts Repeater.new.repeat('test', 3, separator: ', ')
In this case, we’ll get this output:
test, test, test
Now let’s say we want to add some functionality to this class, so that we print its positional arguments before the body is executed, and print out the return value before it is returned to the caller.
Here’s the most direct way we could implement this:
class Repeater1
def repeat(str, count, separator: '')
puts "before: [\"#{str}\", #{count}]"
ret = Array.new(count, str).join(separator)
puts "after: #{ret}"
ret
end
end
Now if we were to run the following:
puts Repeater1.new.repeat('test', 3, separator: ', ')
It would produce:
before: ["test", 3]
after: test, test, test
test, test, test
If we now allow ourselves a little bit of indirection, this can be made more generic by lifting the method body into a lambda:
class Repeater2
def repeat(*args, **kwargs)
puts "before: #{args}"
# lambda literal
fn = -> (str, count, separator: '') do
Array.new(count, str).join(separator)
end
ret = fn.call(*args, **kwargs)
puts "after: #{ret}"
ret
end
end
What is unusual about Repeater2 is that the repeat method no longer uses the original parameter list. Instead, it uses *args and **kwargs. These are called rest parameters: *args captures all positional arguments and **kwargs captures all keyword arguments, which lets us print the positional arguments using puts. The code from the original repeat method is captured within a lambda literal, where we see the original parameter list.
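As a small sketch of how rest parameters split up a call, the hypothetical capture method below (not part of Repeater2) simply returns what each parameter received:

```ruby
# Hypothetical method that reports what the two rest parameters captured.
def capture(*args, **kwargs)
  [args, kwargs]
end

args, kwargs = capture('test', 3, separator: ', ')

p args              # => ["test", 3]
p kwargs.keys       # => [:separator]
p kwargs[:separator] # => ", "
```

The positional arguments arrive as an Array and the keyword arguments as a Hash, which is why Repeater2 can forward them unchanged with fn.call(*args, **kwargs).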
If we were to run the following line of code:
puts Repeater2.new.repeat('test', 3, separator: ', ')
It would produce the same output as Repeater1:
before: ["test", 3]
after: test, test, test
test, test, test
Hooks
Of course, we are working towards an annotation-based style, and would like to simplify Repeater2 so that the code to run before and after the method body is more clearly separated. Ideally something like this:
before ->(*args, **kwargs) do
puts "before: #{args}"
end
after ->(returns) do
puts "after: #{returns}"
end
def repeat(str, count, separator: '')
Array.new(count, str).join(separator)
end
In this snippet, before is a hook that is called with the arguments supplied to the method, but before the body of the method is executed. And after is a hook that is called after the body is executed, with its return value.
If we consider that Ruby class declarations are executed from top to bottom, and existing methods can be called from within the class body, then we see that before and after could be provided by extending a module:
class MyClass
extend Hooks
# methods from Hooks are available now
# ...
end
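The following sketch illustrates both points: that a class body runs from top to bottom, and that methods gained via extend can be called directly inside it. The Announcer module and Example class are hypothetical names invented for this illustration.

```ruby
# Hypothetical module; its methods become class-level methods via extend.
module Announcer
  def announce(label)
    # self is the extending class, so we can ask it what is defined so far.
    (@log ||= []) << "#{label}: greet defined? #{method_defined?(:greet)}"
  end

  def log
    @log || []
  end
end

class Example
  extend Announcer

  announce 'before' # runs while the class body is being evaluated
  def greet
    'hello'
  end
  announce 'after'
end

p Example.log
# => ["before: greet defined? false", "after: greet defined? true"]
```

The first call runs before greet exists and the second runs after it, which is exactly the ordering an annotation placed above a method relies on.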
How does the Hooks module work?
We can see from the snippets above that we need to define a module that provides before and after methods. We’ll call this module Hooks. The before and after methods are very simple setters that store a lambda in an instance variable of the class that extends the module, so that it can be used later. We’ll call this the hook context for a method.
module Hooks
def before(tag)
@before = tag
end
def after(tag)
@after = tag
end
We can then use Ruby’s powerful meta-programming functionality to enhance method declaration, in such a way that the next method to be defined will execute within our new hook context.
We’ll do this by implementing method_added, which is automatically called whenever a method is added to a class:
def method_added(name)
# short-circuit, to avoid infinite loop
return unless @before || @after
# reset hook context, but store current values
before = @before
@before = nil
after = @after
@after = nil
# capture the original method
meth = instance_method(name)
# wrap the original method with calls to before and after hooks
define_method(name) do |*args, **kwargs, &block|
before.call(*args, **kwargs) if before
ret = meth.bind(self).call(*args, **kwargs, &block)
after.call(ret) if after
ret
end
end
end
define_method is used to replace the newly added method with a block that executes our before and after hooks. We have to be careful to clear @before and @after before calling define_method, otherwise this code will keep triggering method_added, and we’ll end up with a stack overflow.
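To see why that guard matters, here is a minimal sketch showing that define_method triggers method_added just like def does. The Tracker module and Tracked class are hypothetical and exist only to record the notifications.

```ruby
# Hypothetical module that records every method_added notification.
module Tracker
  def added
    @added ||= []
  end

  def method_added(name)
    added << name
    super
  end
end

class Tracked
  extend Tracker

  def first
    :original
  end

  # Redefining the method with define_method fires method_added again.
  define_method(:first) { :redefined }
end

p Tracked.added     # => [:first, :first]
p Tracked.new.first # => :redefined
```

Because the redefinition is reported as a second notification, a method_added implementation that always calls define_method would keep re-entering itself.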
We can now implement Repeater3, extending the Hooks module:
class Repeater3
extend Hooks
before -> (*args, **kwargs) do
puts "before: #{args}"
end
after -> (returns) do
puts "after: #{returns}"
end
def repeat(str, count, separator: '')
Array.new(count, str).join(separator)
end
end
Let’s check that it works as expected, using the same test code as earlier:
puts Repeater3.new.repeat('test', 3, separator: ', ')
This should produce exactly the same output:
before: ["test", 3]
after: test, test, test
test, test, test
Type Checking
The next step is to develop this into a convenient, if somewhat primitive, type checking system.
Our first thought may be to use our Hooks module to write bespoke type checking code like this:
before ->(str, count, separator: '') do
raise "invalid str" unless str.is_a? String
raise "invalid count" unless count.is_a? Numeric
raise "invalid separator" \
unless separator.is_a?(String) || separator.nil?
end
after ->(ret) do
raise "invalid return value" unless ret.is_a? String
end
def repeat(str, count, separator: '')
Array.new(count, str).join(separator)
end
While this may work, it would be time-consuming, error-prone, and not particularly easy to maintain. We would like to streamline this so that we write something more compact, like this:
typedef do
params(
String, Numeric, separator: String
).returns(
String
)
end
def repeat(str, count, separator: '')
Array.new(count, str).join(separator)
end
There’s a fair bit going on here. Firstly, we’re no longer using before and after. Instead, we’re using a hook called typedef. Within the typedef call, we’re also using methods called params and returns.
This is getting a bit closer to the interface offered by Sorbet:
sig do
params(
str: String, count: Integer, separator: String
).returns(
String
)
end
def repeat(str, count, separator: '')
Array.new(count, str).join(separator)
end
Method Parameters
One of the interesting problems here is that we’re using both positional parameters (str and count) and keyword parameters (separator).
Sorbet uses some additional metaprogramming magic to handle these. However, for the purposes of this experiment, we’ll keep things simple by handling positional and keyword parameters separately.
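For reference, Ruby itself can report which parameters are positional and which are keywords, via the parameters method on a method object. The sketch below uses a hypothetical ParameterDemo class with the same repeat signature; our implementation in this post does not use this information, it is shown only to make the distinction visible.

```ruby
# Hypothetical class with the same signature as our repeat method.
class ParameterDemo
  def repeat(str, count, separator: '')
    Array.new(count, str).join(separator)
  end
end

p ParameterDemo.instance_method(:repeat).parameters
# => [[:req, :str], [:req, :count], [:key, :separator]]
```

Here :req marks a required positional parameter and :key marks an optional keyword parameter.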
Types
What we end up with is a Types module that provides two methods, params and returns, which can be used to specify the types for a method. These methods both return self, which is what allows them to be chained.
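The chaining idea can be sketched in isolation. The Signature class below is a hypothetical stand-in, separate from the Types module, that uses the same return-self pattern:

```ruby
# Hypothetical builder, unrelated to the Types module, showing the same idea.
class Signature
  attr_reader :arg_types, :ret_type

  def params(*arg_types)
    @arg_types = arg_types
    self # returning self lets the next call chain onto this one
  end

  def returns(ret_type)
    @ret_type = ret_type
    self
  end
end

sig = Signature.new.params(String, Numeric).returns(String)

p sig.arg_types # => [String, Numeric]
p sig.ret_type  # => String
```

If params returned anything other than self, the call to returns would be sent to the wrong object and the chain would break.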
Here is the new Types module in all its glory:
module Types
def params(*arg_types, **kwarg_types)
@arg_types, @kwarg_types = arg_types, kwarg_types
self
end
def returns(ret_type)
@ret_type = ret_type
self
end
def typedef
# Allows typedef to be passed a block to execute within the context
# of the Types module
yield
end
def method_added(name)
# short-circuit, to avoid infinite loop
return unless @arg_types || @kwarg_types || @ret_type
# reset hook context, but store current values
arg_types, kwarg_types, ret_type = @arg_types, @kwarg_types, @ret_type
@arg_types, @kwarg_types, @ret_type = nil, nil, nil
# capture the original method
meth = instance_method(name)
# wrap the original method with type checks
define_method(name) do |*args, **kwargs, &block|
# check positional arguments
arg_types.each_with_index do |type, idx|
raise "Invalid type for arg in pos #{idx}; expected: #{type}" \
unless args[idx].is_a? type
end
# check keyword arguments
kwarg_types.each do |key, type|
raise "Invalid type for kwarg '#{key}'; expected: #{type}" \
unless kwargs[key].is_a? type
end
ret = meth.bind(self).call(*args, **kwargs, &block)
# check return type
raise "Invalid return type; expected: #{ret_type}" \
unless ret.is_a? ret_type
ret
end
end
end
Putting it together
Now we can implement the Repeater class using our home-grown type checking system:
class Repeater4
extend Types
typedef do
params(
String, Numeric, separator: String
).returns(
String
)
end
def repeat(str, count, separator: '')
Array.new(count, str).join(separator)
end
end
If we pass valid arguments, as per previous examples, we should see the expected output:
test, test, test
But if we replace the count argument with a string, we’ll see our type checking in action:
begin
puts Repeater4.new.repeat("test", "3", separator: ", ")
rescue StandardError => e
puts "Error: #{e}"
end
The output should look something like:
Error: Invalid type for arg in pos 1; expected: Numeric
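The check fails because is_a? follows the class hierarchy, and a String is not part of the Numeric hierarchy. A quick sketch of the behaviour our checker relies on:

```ruby
p 3.is_a?(Numeric)   # => true  (Integer is a subclass of Numeric)
p 3.5.is_a?(Numeric) # => true  (so is Float)
p "3".is_a?(Numeric) # => false (a String is never Numeric)
p nil.is_a?(String)  # => false
```

The last line hints at one limitation of this simple approach: if a caller omits separator and relies on its default, kwargs[:separator] is nil inside the wrapper, so the keyword check would reject the call.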
Closing Thoughts
If you take away anything from this post, hopefully it is that Ruby allows you to build annotation-like constructs, using features that are part of the language.
These annotations can be evaluated when classes are loaded, and Ruby’s rich support for metaprogramming means that it is possible to do much more than simply annotate methods.
As for the type checker we constructed in this post, it is currently very simplistic, and not particularly useful. However, it lays the groundwork for future posts, in which we’ll develop a more complete type checker.
Code
All of the code for this post can be found in my ruby-type-checking repo on GitHub.