Tristan Penman's Blog

Roll Your Own Ruby Type Checking: Part 3

20 May 2023

In parts one and two of this series, we built a basic runtime type checker for Ruby. The design of our type checker is heavily inspired by Sorbet. So much so that we run up against some of the same limitations as Sorbet. This post focuses on one of those limitations - the inability to perform type checks on parameters with default values. To better understand this, we’ll look at several examples in both Ruby and Python.

Ruby Parameter Types

Recall from part two the six parameter types that our type checker is currently designed to support. We used reflection to derive this list:

  • req - required positional
  • opt - optional positional (has default value)
  • rest - additional positional arguments
  • keyreq - required keyword parameter
  • key - optional keyword parameter
  • keyrest - additional keyword parameters

The two that concern us in this post are opt and key, as both of these represent parameters that have a default value.
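As a quick refresher, here is a minimal sketch of how reflection exposes these parameter types. The method name example and its parameter names are made up for illustration. Method#parameters is the standard Ruby call:

```ruby
# A hypothetical method that uses all six parameter types.
def example(a, b = 1, *c, d:, e: 2, **f)
end

params = method(:example).parameters
p params
# => [[:req, :a], [:opt, :b], [:rest, :c], [:keyreq, :d], [:key, :e], [:keyrest, :f]]

# The two kinds that carry default values, and so concern us here.
with_defaults = params.select { |kind, _name| %i[opt key].include?(kind) }
p with_defaults
# => [[:opt, :b], [:key, :e]]
```

Notice that reflection tells us a parameter has a default, but not what the default expression is or what it evaluates to.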

So what’s the actual issue here? Well, it turns out that neither Sorbet nor our type checker is able to check the types of those default values!

Breaking Sorbet

I originally wanted to title this post Breaking Sorbet and to publish it independently of this series. But it wouldn’t be fair to call out Sorbet for this, as the limitation stems from decisions made in the design of Ruby’s optional parameters. In fact, I only discovered this issue while testing my own type checker under different scenarios.

To see how this affects type checking in Sorbet, check out the following example:

require 'sorbet-runtime'

class Main
  extend T::Sig

  sig do
    params(kw: String).void
  end
  def self.main(kw = Time.now)
    puts "kw: #{kw}"
    puts "kw.is_a?(String): #{kw.is_a?(String)}"
  end
end

Here we’re using Time.now as the default value for the kw parameter of the main method, and this expression will be evaluated every time the method is called without an argument.

First we’ll try calling the main method with an invalid argument:

begin
  Main.main(123)
rescue StandardError => e
  puts "Error: #{e}"
end

This will fail as expected:

Error: Parameter 'kw': Expected type String, got type Integer with value 123
Caller: ./part-3.rb:36
Definition: ./part-3.rb:29

If we call it without an argument, it will use the default value (Time.now) that we’ve defined:

loop do
  Main.main
  sleep 1
end

Somehow, this works:

kw: 2023-05-13 09:49:07 +1000
kw.is_a?(String): false
kw: 2023-05-13 09:49:08 +1000
kw.is_a?(String): false
kw: 2023-05-13 09:49:09 +1000
kw.is_a?(String): false

Sorbet’s type check is satisfied, even though the output clearly shows that the value is not a Ruby String.

But why?

So why is the type check succeeding here?

Recall that our type checker works by wrapping a method. The wrapper checks the types of the arguments it receives, before passing them on to the original method. This works fine when there are arguments to check. But when the caller relies on a default value, the argument is absent, so the wrapper has nothing to check. The Ruby interpreter only evaluates the expression for the default value once the original method is called. The only way we would be able to inspect that value would be to inject code into the method, which is far beyond the scope of our type checker.

In a nutshell, default values are evaluated at the time that a method is invoked.
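To make this concrete, here is a stripped-down sketch of a wrapping type checker. It is not the implementation from this series. The TinyChecker module, its check_first_arg helper, and the Greeter class are all hypothetical, and the helper only checks the first positional argument:

```ruby
# Hypothetical, simplified stand-in for a wrapping type checker.
module TinyChecker
  def check_first_arg(name, type)
    original = instance_method(name)
    define_method(name) do |*args, &blk|
      # The wrapper only sees arguments that the caller actually supplied.
      if !args.empty? && !args.first.is_a?(type)
        raise TypeError, "expected #{type}, got #{args.first.class}"
      end
      # Defaults are filled in here, inside the original method call.
      original.bind(self).call(*args, &blk)
    end
  end
end

class Greeter
  extend TinyChecker

  def greet(name = Time.now)
    name.class
  end
  check_first_arg :greet, String
end

puts Greeter.new.greet("hello")  # String
puts Greeter.new.greet           # Time, because the wrapper had nothing to check
```

When greet is called without an argument, args is empty inside the wrapper, so the Time value only comes into existence after the check has already been skipped.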

The Python Way

Interestingly, this is not necessarily how other languages handle default values. In Python, for example, default values are evaluated once, at the time that a function is defined. We can run a little experiment (in Python of course) to see how this behaves:

def arr():
  return [1,2,3]

def test(a = arr()):
  a.append(4)
  print(a)

test()
test([8,9,10])
test()

You might expect this to produce the following:

[1, 2, 3, 4]
[8, 9, 10, 4]
[1, 2, 3, 4]

But instead we get this:

[1, 2, 3, 4]
[8, 9, 10, 4]
[1, 2, 3, 4, 4]

This shows us that:

  1. The list is passed to the function body by reference, so the append in the first call mutates the default value itself.
  2. The arr() function is only called once, at the time that the test function was defined.

This can lead to bugs that are very confusing and hard to track down, and it is notorious for tripping up novice Python programmers!
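For comparison, here is a sketch of the same experiment in Ruby. The names fresh_list and append_four are made up for this example:

```ruby
def fresh_list
  [1, 2, 3]
end

# The default expression is evaluated on every call that omits the argument.
def append_four(list = fresh_list)
  list << 4
end

p append_four              # => [1, 2, 3, 4]
p append_four([8, 9, 10])  # => [8, 9, 10, 4]
p append_four              # => [1, 2, 3, 4]
```

Because Ruby calls fresh_list again for each invocation, the third call starts from a brand new array. This is the flip side of the behaviour that makes default values invisible to our wrapper.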

Solutions?

Is there anything we can do about this? Well, maybe…

It may be possible with some significant changes to the Ruby interpreter, such as the addition of more powerful reflection and metaprogramming functionality. This would allow purely Ruby-based type checkers greater access to the method invocation process. This could have applications for performance monitoring and other low-level problems.

Another approach that I would like to explore is the use of runtime code generation - effectively reading the bytecode that is generated for a method, and modifying it such that the type checking occurs inside the method body. However, this lies far outside the scope of this series, so don’t expect to see that for a while!