Parse, don't validate — Elixir edition, Part Two

August 15, 2020

Previously…

It’s been a couple of months since we published our previous post on the topic of “Parsing, not validation”. In it, we sketched out a proposal of what truly smart constructors could look like in Elixir.

We took the approach of separating the deserialization of data on the wire (which can happen anywhere, for example in a protocol-specific Controller) from the construction of our domain objects. Crucially, we wanted to make sure that every object we construct is semantically valid, completely obviating the need for functions such as validate or valid?.

A smart constructor is a function which takes a primitive data type and returns either {:ok, <the_valid_object>} when an object can be constructed from the input, or {:error, <the_reason_for_the_error>} when the input does not specify a valid object.
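As a concrete illustration of that shape, here is a minimal sketch of a smart constructor for a hypothetical Percentage type. The module, its field, and the error message are assumptions made for this example, not code from the previous post:

```elixir
defmodule Percentage do
  # Hypothetical domain type: a number in the closed interval [0, 100].
  @enforce_keys [:value]
  defstruct [:value]

  @type t :: %__MODULE__{value: number()}

  # Smart constructor: the one sanctioned way to build a %Percentage{}.
  @spec new(any()) :: {:ok, t()} | {:error, String.t()}
  def new(value) when is_number(value) and value >= 0 and value <= 100 do
    {:ok, %__MODULE__{value: value}}
  end

  # Anything else is rejected with a descriptive reason.
  def new(value), do: {:error, "not a percentage: #{inspect(value)}"}
end
```

Elixir does not stop other code from writing a %Percentage{} literal directly, so the guarantee rests on treating new/1 as the only sanctioned way to build one.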

But wait, isn’t that the type of a parser?

Bottom-up

A lot has been written on the topic of parsing in computer science, but we’re going to take a more naïve approach and say that a parser is simply a function that takes an unstructured input and either produces a more structured output, or fails.

We’ll capture this fact using the following type definition:

@type parser(a) :: (any() -> Result.t(a, any()))

That is: a parser of a is a function that accepts any conceivable input and returns either a successfully parsed value of type a, or some kind of error. We’re not going to dwell on the error type – some interesting things could be shipped in the error type, but for now we’ll make do with descriptive strings.

Result.t is defined as:

@type t(a, b) :: {:ok, a} | {:error, b}
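Put together, the two types could live in a small module like the following hypothetical MyResult (a sketch only; the post itself uses an existing Result module). The point it illustrates is that a parser is nothing more exotic than a one-argument function, so even a captured function such as &MyResult.ok/1 qualifies:

```elixir
defmodule MyResult do
  # Hypothetical stand-in for the post's Result module, showing only the
  # two types and the two constructors used so far.
  @type t(a, b) :: {:ok, a} | {:error, b}

  # A parser of `a`: accepts any term, returns a result carrying an `a`.
  @type parser(a) :: (any() -> t(a, any()))

  @spec ok(a) :: t(a, any()) when a: var
  def ok(value), do: {:ok, value}

  @spec error(b) :: t(any(), b) when b: var
  def error(reason), do: {:error, reason}
end
```

Because ok/1 accepts anything and wraps it unchanged, the capture &MyResult.ok/1 is already a valid parser(any()).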

A simple parser

The following function meets our criteria for being a parser:

def id(input) do
  Result.ok(input)
end

Given any input, it simply returns that input as a valid parse result. This one is not immediately useful, so let’s look at a more practical parser.

def number(input) do
  case is_number(input) do
    true -> Result.ok(input)
    false -> Result.error("not a number: #{inspect input}")
  end
end

Let’s try it out. Do integer and float inputs satisfy this parser?

iex> number(0)
{:ok, 0}
iex> number(23.23)
{:ok, 23.23}

Sure enough, they do. How about other Elixir data types?

iex> number(%{"hello" => "world"})
{:error, "not a number: %{\"hello\" => \"world\"}"}
iex> number(:ok)
{:error, "not a number: :ok"}
iex> number("234")
{:error, "not a number: \"234\""}

Let’s extend this parser to only accept non-negative numbers.

def non_neg_number(input) do
  case is_number(input) and input >= 0 do
    true -> Result.ok(input)
    false -> Result.error("not a non-negative number: #{inspect input}")
  end
end

Now, this parser is strictly more useful, not only detecting a type for us, but also constraining the values to a particular range.

iex> non_neg_number(2)
{:ok, 2}
iex> non_neg_number(-243)
{:error, "not a non-negative number: -243"}
iex> non_neg_number("mary had a little lamb")
{:error, "not a non-negative number: \"mary had a little lamb\""}

For completeness, let’s clamp down the number to the interval [0,100):

def in_range_0_100(input) do
  case is_number(input) and input >= 0 and input < 100 do
    true -> Result.ok(input)
    false -> Result.error("not a number in range [0,100): #{inspect input}")
  end
end

This parser will now only succeed when passed a very narrow range of numeric inputs:

iex> in_range_0_100(3)
{:ok, 3}
iex> in_range_0_100(100)
{:error, "not a number in range [0,100): 100"}
iex> in_range_0_100(99.999999)
{:ok, 99.999999}
iex> in_range_0_100([:a, :list])
{:error, "not a number in range [0,100): [:a, :list]"}

Now, if you said that this parser is getting weirdly specific and slightly unwieldy, you’d be right. But if you think it’s still looking quite readable, let’s say we now need to parse both numbers in the range [0,100) and strings which parse to numbers in the same range.

We’ll implement this by piling even more functionality onto a single, specific parser:

def numberlike_in_range_0_100(input) do
  case is_number(input) and input >= 0 and input < 100 do
    true -> Result.ok(input)
    false ->
      case is_binary(input) && Float.parse(input) do
        {num, ""} when num >=0 and num < 100 -> Result.ok(num)
        _ -> Result.error("not numberlike in range [0,100): #{inspect input}")
      end
  end
end

Sure enough, this does what we expect it to do:

iex> numberlike_in_range_0_100(1)
{:ok, 1}
iex> numberlike_in_range_0_100("1")
{:ok, 1.0}
iex> numberlike_in_range_0_100(-344)
{:error, "not numberlike in range [0,100): -344"}
iex> numberlike_in_range_0_100("-344")
{:error, "not numberlike in range [0,100): \"-344\""}
iex> numberlike_in_range_0_100("99.999999")
{:ok, 99.999999}
iex> numberlike_in_range_0_100(:foo)
{:error, "not numberlike in range [0,100): :foo"}

However, the code is not particularly satisfying to look at or work with. Additionally, every numeric string is parsed as a Float ("1" comes back as 1.0), because keeping integer strings as integers would take yet another nested case expression or a Cartesian product of case patterns.
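For illustration, here is roughly what keeping integer strings as integers would cost in the same monolithic style. This is a hypothetical variant (the Monolithic module name is ours), shown only to make the extra nesting visible:

```elixir
defmodule Monolithic do
  # Hypothetical variant of numberlike_in_range_0_100 that keeps integer
  # strings as integers; note the extra level of nesting it costs.
  def numberlike_in_range_0_100(input) do
    error = {:error, "not numberlike in range [0,100): #{inspect(input)}"}

    cond do
      is_number(input) ->
        if input >= 0 and input < 100, do: {:ok, input}, else: error

      is_binary(input) ->
        case Integer.parse(input) do
          # A whole integer string inside the range.
          {n, ""} when n >= 0 and n < 100 ->
            {:ok, n}

          # A whole integer string outside the range.
          {_, ""} ->
            error

          # Not an integer string: fall back to parsing it as a float.
          _ ->
            case Float.parse(input) do
              {f, ""} when f >= 0 and f < 100 -> {:ok, f}
              _ -> error
            end
        end

      true ->
        error
    end
  end
end
```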

The promise of functional programming is that we can use composition rather than extension to build bigger programs. It would be nice if we could compose many small, generic parsers into a larger, more specific one. It turns out we can apply Railway-oriented programming to do just that.

Our Result module has several interesting functions, but here we’ll take advantage of one: and_then/2. It helps us write code only for the “happy path” through our parsers, passing the result of a successful parse to the next parser, but “bailing out” on the first error. Let’s see how it can help us refactor our numberlike_in_range_0_100 function.
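To make the bail-out behaviour concrete, here is a minimal hand-rolled sketch of what an and_then/2 with these semantics could look like. The Railway module name is hypothetical, and the real Result.and_then/2 may differ in details:

```elixir
defmodule Railway do
  # Hypothetical sketch of and_then/2: feed a successful value into the
  # next step, or pass an error through without running the step.
  @spec and_then({:ok, a} | {:error, e}, (a -> {:ok, b} | {:error, e})) ::
          {:ok, b} | {:error, e}
        when a: var, b: var, e: var
  def and_then({:ok, value}, fun) when is_function(fun, 1), do: fun.(value)
  def and_then({:error, _} = error, _fun), do: error
end
```

The second clause never calls fun, which is exactly what makes a chain of and_then/2 calls stop at the first error and hand that error back untouched.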

Breaking out the specifics

We can decompose all our requirements regarding the input, which must be:

a number OR a string parseable to a number
non-negative
smaller than 100

To keep things simple, let’s at first ignore the “string parseable to a number” requirement, and simply line up all our numeric specifications. Like this:

# inspect/1 renders any term; plain interpolation would raise for maps or tuples
def a_number(input) when is_number(input), do: Result.ok(input)
def a_number(input), do: Result.error("not a number: #{inspect input}")

def non_negative(input) when input >= 0, do: Result.ok(input)
def non_negative(input), do: Result.error("not >= 0: #{inspect input}")

def smaller_than_100(input) when input < 100, do: Result.ok(input)
def smaller_than_100(input), do: Result.error("not < 100: #{inspect input}")

def number_in_range(input) do
  a_number(input)
  |> Result.and_then(&non_negative/1)
  |> Result.and_then(&smaller_than_100/1)
end

Now, let’s try to use this parser:

iex> number_in_range(3)
{:ok, 3}
iex> number_in_range(:atom)
{:error, "not a number: :atom"}
iex> number_in_range(1111)
{:error, "not < 100: 1111"}
iex> number_in_range(-1)
{:error, "not >= 0: -1"}

As you can see, if every parser in the chain returns {:ok, value}, then number_in_range returns a successful result carrying the value produced by the last step.

If any of the parsers fail, we bail out and return the failed parser’s error message to the caller.
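For comparison, Elixir’s built-in with special form can express the same railway without any helper library. This is a sketch of that alternative (the WithChain module name is ours), reusing the three checks with inspect/1 in the error messages:

```elixir
defmodule WithChain do
  # Same three checks as number_in_range/1, chained with `with` instead of
  # Result.and_then/2. The first non-matching step is returned as-is.
  def number_in_range(input) do
    with {:ok, n} <- a_number(input),
         {:ok, n} <- non_negative(n),
         {:ok, n} <- smaller_than_100(n) do
      {:ok, n}
    end
  end

  defp a_number(input) when is_number(input), do: {:ok, input}
  defp a_number(input), do: {:error, "not a number: #{inspect(input)}"}

  defp non_negative(input) when input >= 0, do: {:ok, input}
  defp non_negative(input), do: {:error, "not >= 0: #{inspect(input)}"}

  defp smaller_than_100(input) when input < 100, do: {:ok, input}
  defp smaller_than_100(input), do: {:error, "not < 100: #{inspect(input)}"}
end
```

When a step returns {:error, reason}, the pattern {:ok, n} fails to match and with returns that error unchanged, mirroring the Result.and_then/2 chain.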

So now, let’s tackle the hardest requirement: the input can be either a number or a string that can be read as a number. Conceptually, we’d like a parser that accepts the union of the inputs accepted by a number parser and by a string-to-number parser. What would this look like?

First of all, we need to be aware that Elixir has separate functions for parsing integers and floats (Integer.parse/1 and Float.parse/1). So our string-to-number parser will itself need to be a union of a string-to-float parser and a string-to-integer parser. Let’s write the basic building blocks first.

def float_string(input) do
  case is_binary(input) && Float.parse(input) do
    {n, ""} -> Result.ok(n)
    _ -> Result.error("not a string representation of a float: #{inspect input}")
  end
end

iex> float_string("23.0")
{:ok, 23.0}
iex> float_string("23")
{:ok, 23.0}
iex> float_string("-23")
{:ok, -23.0}
iex> float_string(23.0)
{:error, "not a string representation of a float: 23.0"}
iex> float_string(%{foo: :bar})
{:error, "not a string representation of a float: %{foo: :bar}"}

Now, let’s write the analogous parser for integers.

def integer_string(input) do
  case is_binary(input) && Integer.parse(input) do
    {n, ""} -> Result.ok(n)
    _ -> Result.error("not a string representation of an integer: #{inspect input}")
  end
end

iex> integer_string("234")
{:ok, 234}
iex> integer_string("234.0")
{:error, "not a string representation of an integer: \"234.0\""}
iex> integer_string(1)
{:error, "not a string representation of an integer: 1"}
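Both building blocks match on {n, ""} rather than {n, _} for a reason: Integer.parse/1 and Float.parse/1 return the unparsed remainder of the string next to the value, and only an empty remainder means the whole input was a number. A small hypothetical helper, ParseRemainder.whole/2, makes the three possible outcomes explicit:

```elixir
defmodule ParseRemainder do
  # Hypothetical helper: Integer.parse/1 and Float.parse/1 return
  # {value, rest} or :error; only an empty rest counts as a full parse.
  def whole(mod, string) when mod in [Integer, Float] and is_binary(string) do
    case mod.parse(string) do
      {value, ""} -> {:ok, value}
      {_value, rest} -> {:error, {:trailing_input, rest}}
      :error -> {:error, :no_number}
    end
  end
end
```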

Okay! These are all the building blocks we need, so let’s now create a parser that is the union of all three of them: a_number, integer_string, and float_string.

def numeric_string(input) do
  union_parser = union([&a_number/1,
                        &integer_string/1,
                        &float_string/1])
  union_parser.(input)
end

Where does union come from? Easy, it’s part of the Data.Parser module. It does exactly what we need: tries to apply each parser it received, in order, and uses the first successful result OR bails with an appropriate message.
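To demystify it, here is a hedged sketch of how a union combinator with the behaviour just described could be written by hand. MyUnion is a hypothetical module, and its {:no_parser_applies, input} error is our own simplification; the real Data.Parser.union/1 reports failures with a richer Error.DomainError struct:

```elixir
defmodule MyUnion do
  # Hypothetical union combinator: try each parser in order, return the
  # first success, or a single error if none of them applies.
  @spec union([(any() -> {:ok, any()} | {:error, any()})]) ::
          (any() -> {:ok, any()} | {:error, any()})
  def union(parsers) when is_list(parsers) do
    fn input ->
      Enum.reduce_while(parsers, {:error, {:no_parser_applies, input}}, fn parser, acc ->
        case parser.(input) do
          {:ok, _} = ok -> {:halt, ok}
          {:error, _} -> {:cont, acc}
        end
      end)
    end
  end
end
```

Enum.reduce_while/3 stops at the first {:halt, ...}, so parsers after the first successful one are never called, and the order of the list decides which parse wins.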

Now, we can take a look at our entire parsing program:

defmodule NumericBetween0and99 do
  alias FE.Result
  import Data.Parser, only: [union: 1]

  def new(input) do
    numeric_string(input)
    |> Result.and_then(&non_negative/1)
    |> Result.and_then(&smaller_than_100/1)
  end

  defp numeric_string(input) do
    union_parser = union([&a_number/1,
                          &integer_string/1,
                          &float_string/1])
    union_parser.(input)
  end

  defp integer_string(input) do
    case is_binary(input) && Integer.parse(input) do
      {n, ""} -> Result.ok(n)
      _ -> Result.error("not a string representation "
                      <>"of an integer: #{inspect input}")
    end
  end

  defp float_string(input) do
    case is_binary(input) && Float.parse(input) do
      {n, ""} -> Result.ok(n)
      _ -> Result.error("not a string representation "
                      <>"of a float: #{inspect input}")
    end
  end

  # inspect/1 renders any term; plain interpolation would raise for maps or tuples
  defp a_number(input) when is_number(input), do: Result.ok(input)
  defp a_number(input), do: Result.error("not a number: #{inspect input}")

  defp non_negative(input) when input >= 0, do: Result.ok(input)
  defp non_negative(input), do: Result.error("not >= 0: #{inspect input}")

  defp smaller_than_100(input) when input < 100, do: Result.ok(input)
  defp smaller_than_100(input), do: Result.error("not < 100: #{inspect input}")
end

The new/1 function is our only public function and the main workhorse. It contains a Result pipe: our railway-oriented program. The first step of the pipe says: apply a parser that is the union of a_number, integer_string, and float_string. If any one of these parses the input successfully, pass the resulting number on to non_negative, and if that too succeeds, pass the result to smaller_than_100, returning whatever comes out of this last step.

…And presto! We now have a module that encapsulates all our parsing rules (built rather verbosely by hand for the sake of learning), and exposes just one function: precisely our smart constructor! Now we can be certain that every value that is created by NumericBetween0and99.new/1 is exactly what it claims to be. No more validations!

Here’s how it works:

iex> NumericBetween0and99.new(1)
{:ok, 1}
iex> NumericBetween0and99.new("99.2345")
{:ok, 99.2345}
iex> NumericBetween0and99.new("-234")
{:error, "not >= 0: -234"}

And if we try to parse input which does not satisfy our union parser, we get a swanky detailed DomainError structure:

iex> NumericBetween0and99.new(:hello)
{:error,
 %Error.DomainError{
   caused_by: :nothing,
   details: %{
     input: :hello,
     parsers: [#Function<2.78889351/1 in NumericBetween0and99.numeric_string/1>,
      #Function<3.78889351/1 in NumericBetween0and99.numeric_string/1>,
      #Function<4.78889351/1 in NumericBetween0and99.numeric_string/1>]
   },
   reason: :no_parser_applies
 }}

Leveraging built-in `Data.Parser`s

We did a lot of work building out our smart constructor by hand, but we can save ourselves a lot of time by using the parsers bundled with the Data package.

In particular, the parser generator function predicate/2 turns any predicate function into a parser. Its first argument is our predicate function, while the second is either a default error OR a function that receives the bad input and builds the error. We’ll use that second form to retain our descriptive error messages, this time in the form of tagged tuples.
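Conceptually, such a parser generator is small. The following is a sketch of the described behaviour under a hypothetical MyPredicate module, not the library’s actual source:

```elixir
defmodule MyPredicate do
  # Hypothetical stand-in for a predicate-to-parser generator. The error
  # argument is either a fixed value or a 1-arity function of the input.
  def predicate(pred, error) when is_function(pred, 1) do
    fn input ->
      # Any truthy predicate result accepts the input unchanged.
      if pred.(input) do
        {:ok, input}
      else
        {:error, build_error(error, input)}
      end
    end
  end

  defp build_error(error, input) when is_function(error, 1), do: error.(input)
  defp build_error(error, _input), do: error
end
```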

Our module can be golfed like this:

defmodule NumericBetween0and99 do
  alias FE.Result
  import Data.Parser, only: [union: 1, predicate: 2]

  def new(input) do
    union([predicate(&is_number/1, &({:not_a_number, &1})),
           &integer_string/1,
           &float_string/1]).(input)
    |> Result.and_then(predicate(&(&1>=0), &({:is_negative, &1})))
    |> Result.and_then(predicate(&(&1<100), &({:is_100_or_more, &1})))
  end

  defp integer_string(input) do
    case is_binary(input) && Integer.parse(input) do
      {n, ""} -> Result.ok(n)
      _ -> Result.error("not a string representation "
                      <>"of an integer: #{inspect input}")
    end
  end

  defp float_string(input) do
    case is_binary(input) && Float.parse(input) do
      {n, ""} -> Result.ok(n)
      _ -> Result.error("not a string representation "
                      <>"of a float: #{inspect input}")
    end
  end
end

iex> NumericBetween0and99.new(333)
{:error, {:is_100_or_more, 333}}
iex> NumericBetween0and99.new("33")
{:ok, 33}
iex> NumericBetween0and99.new("-33.3")
{:error, {:is_negative, -33.3}}

Looks pretty good, but we still have those repetitive float_string and integer_string functions. They differ only in the Elixir module whose parse/1 they call, so let’s replace both with a single parser-builder in the new style. The interface is quite simple: take the module, and return a function that takes a single argument and returns a Result.t().

def string_repr_of(mod) when mod in [Integer, Float] do
  fn input ->
    case is_binary(input) && mod.parse(input) do
      {n, ""} -> Result.ok(n)
      _ -> Result.error("not a string representation "
                      <>"of a #{inspect mod}: #{inspect input}")
    end
  end
end

Now, we can call string_repr_of/1 with a module name and get back a ready-made parser (a closure over that module). It works like this:

iex> string_repr_of(Float).("-1.1")
{:ok, -1.1}
iex> string_repr_of(Float).(:foo)
{:error, "not a string representation of a Float: :foo"}
iex> string_repr_of(Integer).("1.1")
{:error, "not a string representation of a Integer: \"1.1\""}
iex> string_repr_of(Integer).("1")
{:ok, 1}
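string_repr_of/1 is one instance of a general pattern: a function that takes some configuration and returns a parser closure. As a hypothetical illustration (the Builders module and its error tuples are ours), the hard-coded non_negative and smaller_than_100 checks from earlier could become builders parameterised by their bounds:

```elixir
defmodule Builders do
  # Hypothetical parser-builders: each call returns a fresh 1-arity parser
  # closure that captures its bound.
  def less_than(limit) when is_number(limit) do
    fn
      input when is_number(input) and input < limit -> {:ok, input}
      input -> {:error, {:not_less_than, limit, input}}
    end
  end

  def at_least(limit) when is_number(limit) do
    fn
      input when is_number(input) and input >= limit -> {:ok, input}
      input -> {:error, {:not_at_least, limit, input}}
    end
  end
end
```

With these, Builders.at_least(0) and Builders.less_than(100) play the roles of non_negative and smaller_than_100.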

…And it turns out that our string_repr_of parser-builder is part of the BuiltIn suite of parsers that come with Data.Parser. It’s called string_of and returns a more detailed error struct on failure, but works the same, in principle.

We can now compose all these parsers together and arrive at a rather succinct smart constructor:

defmodule NumericBetween0and99 do
  alias FE.Result
  import Data.Parser, only: [union: 1, predicate: 2]
  import Data.Parser.BuiltIn, only: [string_of: 1]

  def new(input) do
    union([predicate(&is_number/1, &({:not_a_number, &1})),
           string_of(Integer),
           string_of(Float)]).(input)
    |> Result.and_then(predicate(&(&1>=0), &({:is_negative, &1})))
    |> Result.and_then(predicate(&(&1<100), &({:is_100_or_more, &1})))
  end
end
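To show that nothing magical happens inside the library, here is a dependency-free sketch of the same constructor. The private and_then/2, union/1, predicate/2, and string_of/1 helpers are simplified, hypothetical stand-ins written for this example; they mimic the behaviour described above but are not the Data.Parser or FE.Result API, and their error values are our own:

```elixir
defmodule NumericBetween0and99Standalone do
  # Hypothetical, dependency-free sketch of the final constructor.
  def new(input) do
    union([predicate(&is_number/1, &({:not_a_number, &1})),
           string_of(Integer),
           string_of(Float)]).(input)
    |> and_then(predicate(&(&1 >= 0), &({:is_negative, &1})))
    |> and_then(predicate(&(&1 < 100), &({:is_100_or_more, &1})))
  end

  # Run the next parser only if the previous step succeeded.
  defp and_then({:ok, value}, parser), do: parser.(value)
  defp and_then({:error, _} = error, _parser), do: error

  # First parser to succeed wins; otherwise report that none applied.
  defp union(parsers) do
    fn input ->
      Enum.find_value(parsers, {:error, {:no_parser_applies, input}}, fn parser ->
        case parser.(input) do
          {:ok, _} = ok -> ok
          {:error, _} -> nil
        end
      end)
    end
  end

  # Turn a predicate into a parser; on_error builds the error from the input.
  defp predicate(pred, on_error) do
    fn input -> if pred.(input), do: {:ok, input}, else: {:error, on_error.(input)} end
  end

  # Accept only strings that Integer.parse/1 or Float.parse/1 consume fully.
  defp string_of(mod) when mod in [Integer, Float] do
    fn
      input when is_binary(input) ->
        case mod.parse(input) do
          {n, ""} -> {:ok, n}
          _ -> {:error, {:not_a_string_of, mod, input}}
        end

      input ->
        {:error, {:not_a_string_of, mod, input}}
    end
  end
end
```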

Summing it up

We’ve gone from manually specifying a railway-oriented parsing function for our very special numeric data point to leveraging the power of parser combinators — and specifying the same data point in a more declarative fashion.

You might rightly complain that the original hand-written code is shorter, has fewer dependencies, and does its job faster than our declarative smart constructor.

In this artificial scenario, it might indeed be more prudent to go with the more straightforward approach; we’ve overengineered our numeric bracket function quite a bit.

However, as we hope to show you in the next installment, the declarative method really shines when the data to parse becomes more complex, or exhibits internal dependencies. Stay tuned as we turn to parse, not validate, some very gnarly JSON payloads into correct-by-construction Elixir structs!