Tristan Penman's Blog

Preprocessing JSON... using JSON

15 November 2020

Many years ago, I was faced with the problem of constructing CloudFormation templates as part of an automated build pipeline. One of the requirements was that these JSON-formatted templates had to reference AWS resources that belonged to existing CFN stacks. There are many ways to do this, but one of the ideas I experimented with was a JSON templating system, based on simple ‘directives’, inspired by the JSON Reference specification. I’ve recently published the code as a Python package, and thought it would be interesting to write a post about what it can do. The posts finishes up with some thoughts about what it could do…

JSON Reference: the inspiration

The inspiration for this project really came from the JSON Reference specification. The idea of a JSON Reference is that you can include a special object in a JSON Document, which acts as a reference to other JSON content. External content is located using a URI scheme that is extended with JSON Pointers, which can be used to identify specific parts of a JSON document.

Here is a simple example of both concepts:

{ "$ref": "http://example.com/example.json#/foo/bar" }

When a JSON Reference is found in a document, a parser would be expected to fetch the external content, and embed it within the current document.

This got me thinking. What other keywords, or ‘directives’, could be used?

Directives

I don’t know if ‘directive’ is the most intuitive name for this concept, but it was the first thing that came to mind, since I was thinking about the problem in terms of ‘preprocessor directives’.

$exec

THe first directive that came to mind was $exec, which could be used to include the output of an external program in a JSON document, as a string value.

For example:

{
  "timestamp": {
    "$exec": [ "/bin/date" ]
  }
}

The value of an $exec directive is an array containing the path to a binary, and its command line arguments. Program output is captured as a string, and injected into the document. So in the simple example above, we would get a JSON document like this:

{
  "timestamp": "Thu 25 Sep 2014 15:30:40 AEST"
}

A limitation to be aware of is that $exec does not run in a shell, so pipes, redirection, and other shell built-ins are not available.

$join

Now that we can handle external input, we need to manipulate it.

Much like functions that join arrays of strings in various programming languages, the value (i.e. arguments) of a $join directive is an array containing two elements. The first element is an array containing the items to be joined, and the second is an optional delimiter.

As a simple example, we can use the $join directive to combine an array of three strings:

{
  "$join": [
    [ "A", "B", "C" ],
    ", "
  ]
}

This will produce the string:

"A, B, C"

The delimiter is optional, so this would work as well:

{
  "$join": [ "A", "B", "C" ]
}

Yielding:

"ABC"

$join + $exec

On its own, this is not particularly useful. However, when combined with $exec, we can do more interesting things:

{
  "$join": [
    [
      "Current time: ",
      { "$exec": [ "/bin/date" ] }
    ]
  ]
}

Once resolved, this would be produce a string similar to this:

"Current time: Thu 25 Sep 2014 15:30:40 AEST"

$join with arrays

If you choose to use an array as the delimiter, $join will perform array concatenation. This requires that all of the items to be joined resolve to arrays. So, for example, you could use a JSON Reference to concatenate several arrays from external files:

{
  "$join": [
    [
      { "$ref": "file://abc.json" },
      [ "1", "2", "3" ],
      { "$ref": "file://xyz.json" }
    ],
    [ "*", "*" ]
  ]
}

Assuming abc.json and xyz.json contain the arrays ["a", "b", "c"] and ["x", "y", "z"] respectively, the output would look like this:

[ "a", "b", "c", "*", "*", "1", "2", "3", "*", "*", "x", "y", "z" ]

You could also provide an empty array as the delimiter, to perform vanilla concatentation.

$merge

We’ve seen that we can use $exec and $join to work with strings and arrays, but what can we do with objects?

Using a $merge directive, we can combine an array of objects, with the attributes of later objects taking precedence over those provided by earlier objects.

Here is an example of a $merge directive that combines two objects:

{
  "$merge": [
    {
      "A": 1,
      "B": 2
    },
    {
      "B": 3
    }
  ]
}

When resolved, this would produce the following output:

{
  "A": 1
  "B": 3
}

As you would expect, the value of B defined by the first object has been replaced by the value of B in the second object.

JSON References + CloudFormation

Finally, I wanted to explore how JSON References could be used to do more than just reference static documents.

Seeing as my original motivation for writing this library was to automate the construction of CloudFormation (CFN) templates that needed to reference resources existing CFN stacks, I added a new cfn:// URI scheme to the JSON Reference resolver.

This allows CloudFormation resources to be identified using a specific URI format:

cfn://<stack-name>[@region]/<logical-name>[/[attribute]]

Support for this particular URI scheme is not built-in to the library itself, but is instead implemented in a example Python program that is included in the package. I did this to show that it is easy to extend the library in various ways.

Future development?

I decided to keep the initial set of directives very simple. This project had a pretty limited scope, and was intended to demonstrate an idea as much as it was to solve a particular problem. That said, there are clearly other operations that could be implemented as JSON Preprocessor directives.

$cond (and friends)

The most obvious being a conditional operation of some kind ($cond). A simple implementation of this could be based on equality testing directives (i.e. $eq to compare two JSON values), and branch depending on whether or not an expression evaluates to true:

{
  "$cond": [
    { "$eq": [ { "$exec": [ "/some/script" ] }, "expected_value" ] },
    { result: "expression is true" },
    { result: "expression is false" }
  ]
}

This could be end up being quite flexible.

$except

Another potentially useful directive is $except, which could take an array containing two arguments. The first would be an object (or something that resolves to an object) that is to be included in the JSON document. And the second argument would be an array of keys to be removed from that object, or an object containing key-value pairs to be removed from the object.

Here is how I see that working:

{
  "$except": [
    {
      "a": 1,
      "b": 2
    },
    [ "b" ]
  ]
}

This will remove the key “b” from the object, yielding:

{
  "a": "1"
}

When an object is used, only matching key-value pairs would be removed. So this example would produce the same result, because the value associated with the key “a” does not match, but “b” does.

{
  "$except": [
    {
      "a": 1,
      "b": 2
    },
    {
      "a": 0,
      "b": 2
    }
  ]
}

Wrapping up

That’s pretty much all I wanted to share about the project. I’m hoping that now the code is out in the open, someone will find it useful, and that potential collaborators might even come along suggest their own directives.

And in case you missed the link earlier, you can find the code here. Enjoy!