Parse, don't validate — Elixir edition, Part Two
Previously…
It’s been a couple of months since we published our previous post on the topic of “Parsing, not validation”. In it, we sketched out a proposal of what truly smart constructors could look like in Elixir.
We took the approach of separating the deserialization of data on the wire (which can happen anywhere, like in a protocol-specific Controller) from the construction of our domain objects. Crucially, we wanted to make sure that all objects which we construct are semantically valid, completely obviating the need for functions such as validate or valid?.
A smart constructor is a function which takes a value of a primitive data type and returns either {:ok, <the_valid_object>} when an object can be constructed from the input, or {:error, <the_reason_for_the_error>} when the input does not specify a valid object.
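To make the definition concrete, here is a minimal sketch of a smart constructor. The Percentage module and its value field are hypothetical, invented purely for illustration; the point is only the shape of the return value.

```elixir
defmodule Percentage do
  # Hypothetical domain object, used only to illustrate the idea.
  @enforce_keys [:value]
  defstruct [:value]

  # The smart constructor: the intended way to obtain a %Percentage{}.
  def new(input) when is_number(input) and input >= 0 and input <= 100 do
    {:ok, %__MODULE__{value: input}}
  end

  # Anything else is rejected with a reason instead of a half-valid struct.
  def new(input), do: {:error, "not a percentage: #{inspect(input)}"}
end
```

Code holding a %Percentage{} that came from new/1 never needs to re-check it. Note that the guarantee rests on convention: nothing in Elixir stops other code from writing %Percentage{value: 500} by hand.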
But wait, isn’t that the type of a parser?
Bottom-up
A lot has been written on the topic of parsing in computer science, but we’re going to take a more naïve approach and say that a parser is simply a function that takes an unstructured input and either produces a more structured output, or fails.
We’ll capture this fact using the following type definition:
@type parser(a) :: (any() -> Result.t(a, any()))
That is: a parser of a is a function that accepts any conceivable input and returns either a successfully parsed value of type a, or some kind of error. We’re not going to dwell on the error type: some interesting things could be shipped in it, but for now we’ll make do with descriptive strings.
Result.t is defined as:
@type t(a, b) :: {:ok, a} | {:error, b}
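The snippets that follow call Result.ok/1 and Result.error/1 (later in the post the module is aliased from FE.Result). To run them without that dependency, a tiny stand-in is enough, assuming these helpers do nothing more than wrap a value in the matching tuple. The module name ResultSketch is hypothetical and is not the library's API.

```elixir
defmodule ResultSketch do
  # Mirrors the type from the text: either {:ok, a} or {:error, b}.
  @type t(a, b) :: {:ok, a} | {:error, b}

  # A parser of `a` accepts any input and returns a result.
  @type parser(a) :: (any() -> t(a, any()))

  # Wrap a successfully parsed value.
  @spec ok(a) :: t(a, any()) when a: var
  def ok(value), do: {:ok, value}

  # Wrap the reason a parse failed.
  @spec error(b) :: t(any(), b) when b: var
  def error(reason), do: {:error, reason}
end
```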
A simple parser
The following function meets our criteria for being a parser:
def id(input) do
  Result.ok(input)
end
Given any input, it simply returns that input as a valid parse result. This one is not immediately useful, so let’s look at a more practical parser.
def number(input) do
  case is_number(input) do
    true -> Result.ok(input)
    false -> Result.error("not a number: #{inspect input}")
  end
end
Let’s try it out. Do integer and float inputs satisfy this parser?
iex> number(0)
{:ok, 0}
iex> number(23.23)
{:ok, 23.23}
Sure enough, they do. How about other Elixir data types?
iex> number(%{"hello" => "world"})
{:error, "not a number: %{\"hello\" => \"world\"}"}
iex> number(:ok)
{:error, "not a number: :ok"}
iex> number("234")
{:error, "not a number: \"234\""}
Let’s extend this parser to only accept non-negative numbers.
def non_neg_number(input) do
  case is_number(input) and input >= 0 do
    true -> Result.ok(input)
    false -> Result.error("not a non-negative number: #{inspect input}")
  end
end
Now, this parser is strictly more useful, not only detecting a type for us, but also constraining the values to a particular range.
iex> non_neg_number(2)
{:ok, 2}
iex> non_neg_number(-243)
{:error, "not a non-negative number: -243"}
iex> non_neg_number("mary had a little lamb")
{:error, "not a non-negative number: \"mary had a little lamb\""}
For completeness, let’s restrict the number to the half-open interval [0,100):
def in_range_0_100(input) do
  case is_number(input) and input >= 0 and input < 100 do
    true -> Result.ok(input)
    false -> Result.error("not a number in range [0,100): #{inspect input}")
  end
end
This parser will now only succeed when passed a very narrow range of numeric inputs:
iex> in_range_0_100(3)
{:ok, 3}
iex> in_range_0_100(100)
{:error, "not a number in range [0,100): 100"}
iex> in_range_0_100(99.999999)
{:ok, 99.999999}
iex> in_range_0_100([:a, :list])
{:error, "not a number in range [0,100): [:a, :list]"}
Now, if you said that this parser is getting weirdly specific and slightly unwieldy, you’d be right. But if you think it’s still looking quite readable, let’s say we now need to parse both numbers in the range [0,100) and strings which parse to numbers in the same range.
We’ll implement this by piling even more functionality onto a single, specific parser:
def numberlike_in_range_0_100(input) do
  case is_number(input) and input >= 0 and input < 100 do
    true -> Result.ok(input)
    false ->
      case is_binary(input) && Float.parse(input) do
        {num, ""} when num >= 0 and num < 100 -> Result.ok(num)
        _ -> Result.error("not numberlike in range [0,100): #{inspect input}")
      end
  end
end
Sure enough, this does what we expect it to do:
iex> numberlike_in_range_0_100(1)
{:ok, 1}
iex> numberlike_in_range_0_100("1")
{:ok, 1.0}
iex> numberlike_in_range_0_100(-344)
{:error, "not numberlike in range [0,100): -344"}
iex> numberlike_in_range_0_100("-344")
{:error, "not numberlike in range [0,100): \"-344\""}
iex> numberlike_in_range_0_100("99.999999")
{:ok, 99.999999}
iex> numberlike_in_range_0_100(:foo)
{:error, "not numberlike in range [0,100): :foo"}
However, the code is not particularly satisfying to look at and work with. Additionally, we parse every numeric string as a Float to avoid yet another nested case expression or Cartesian case pattern matching.
The promise of functional programming is that we can use composition over extension to create bigger programs. It would be nice if we could compose many small, generic parsers into a larger, more specific one. It turns out we can apply railway-oriented programming to do just that.
Our Result module has several interesting functions, but here we’ll take advantage of one: and_then/2. It helps us write code only for the “happy path” through our parsers, passing the result of a successful parse to the next parser, but “bailing out” on the first error. Let’s see how it can help us refactor our numberlike_in_range_0_100 function.
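The behaviour just described fits in a two-clause function. This is a sketch of the semantics, assuming and_then/2 takes a result and a one-argument function that itself returns a result; it is not the library's source, and the module name AndThenSketch is hypothetical.

```elixir
defmodule AndThenSketch do
  # Happy path: hand the unwrapped value to the next parser.
  def and_then({:ok, value}, fun) when is_function(fun, 1), do: fun.(value)

  # First error: skip every remaining step and pass the error through untouched.
  def and_then({:error, _} = error, _fun), do: error
end
```

Because an error is returned unchanged by every later and_then/2 call, a pipeline of them reports the first failure and never runs the steps after it.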
Breaking out the specifics
We can decompose all our requirements regarding the input, which must be:
- a number OR a string parseable to a number
- non-negative
- smaller than 100
To keep things simple, let’s at first ignore the “string parseable to a number” requirement, and simply line up all our numeric specifications. Like this:
def a_number(input) when is_number(input), do: Result.ok(input)
def a_number(input), do: Result.error("not a number: #{inspect input}")
def non_negative(input) when input >= 0, do: Result.ok(input)
def non_negative(input), do: Result.error("not >= 0: #{inspect input}")
def smaller_than_100(input) when input < 100, do: Result.ok(input)
def smaller_than_100(input), do: Result.error("not < 100: #{inspect input}")
def number_in_range(input) do
  a_number(input)
  |> Result.and_then(&non_negative/1)
  |> Result.and_then(&smaller_than_100/1)
end
Now, let’s try to use this parser:
iex> number_in_range(3)
{:ok, 3}
iex> number_in_range(:atom)
{:error, "not a number: :atom"}
iex> number_in_range(1111)
{:error, "not < 100: 1111"}
iex> number_in_range(-1)
{:error, "not >= 0: -1"}
As you can see, if every parser in the chain returns an {:ok, value} tuple, then number_in_range exits with a successful parse and the final result.
If any of the parsers fail, we bail out and return the failed parser’s error message to the caller.
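For comparison, Elixir's built-in with special form expresses the same railway without any Result library: each <- clause must match {:ok, _}, and the first value that does not match is returned as the result of the whole expression. The module name WithSketch is hypothetical, and the three step functions are repeated (with inspect/1 in their error messages) so the block runs on its own.

```elixir
defmodule WithSketch do
  def number_in_range(input) do
    # Each step runs only if the previous one returned {:ok, value};
    # an {:error, reason} falls straight through as the overall result.
    with {:ok, n} <- a_number(input),
         {:ok, n} <- non_negative(n) do
      smaller_than_100(n)
    end
  end

  defp a_number(input) when is_number(input), do: {:ok, input}
  defp a_number(input), do: {:error, "not a number: #{inspect(input)}"}

  defp non_negative(input) when input >= 0, do: {:ok, input}
  defp non_negative(input), do: {:error, "not >= 0: #{inspect(input)}"}

  defp smaller_than_100(input) when input < 100, do: {:ok, input}
  defp smaller_than_100(input), do: {:error, "not < 100: #{inspect(input)}"}
end
```

The and_then/2 pipeline buys us something with does not: each step is an ordinary function value that can be stored, passed around and combined, which is what the rest of this post builds on.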
So now, let’s tackle the hardest requirement: the input can be either a number or a string that can be read as a number. Conceptually, we’d like a parser that accepts the union of the inputs accepted by a number parser and by a string-to-number parser. What would this look like?
First of all, we need to be aware that Elixir has separate functions for parsing integers and floats (Integer.parse/1 and Float.parse/1). So our string-to-number parser will itself need to be a union of a string-to-float parser and a string-to-integer parser. Let’s write the basic building blocks first.
def float_string(input) do
  case is_binary(input) && Float.parse(input) do
    {n, ""} -> Result.ok(n)
    _ -> Result.error("not a string representation of a float: #{inspect input}")
  end
end
iex> float_string("23.0")
{:ok, 23.0}
iex> float_string("23")
{:ok, 23.0}
iex> float_string("-23")
{:ok, -23.0}
iex> float_string(23.0)
{:error, "not a string representation of a float: 23.0"}
iex> float_string(%{foo: :bar})
{:error, "not a string representation of a float: %{foo: :bar}"}
Now, let’s write the analogous parser for integers.
def integer_string(input) do
  case is_binary(input) && Integer.parse(input) do
    {n, ""} -> Result.ok(n)
    _ -> Result.error("not a string representation of an integer: #{inspect input}")
  end
end
iex> integer_string("234")
{:ok, 234}
iex> integer_string("234.0")
{:error, "not a string representation of an integer: \"234.0\""}
iex> integer_string(1)
{:error, "not a string representation of an integer: 1"}
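Both parsers insist on an empty remainder ({n, ""}) because Integer.parse/1 and Float.parse/1 do not fail on trailing garbage: they consume the longest valid prefix and hand back the rest. These standard-library checks show why that match matters, and why Float.parse/1 on its own would also accept integer strings.

```elixir
# Integer.parse/1 returns the parsed prefix plus whatever it could not consume.
{234, ".0"} = Integer.parse("234.0")
{42, "abc"} = Integer.parse("42abc")
:error = Integer.parse("abc")

# Float.parse/1 reads integer strings too, but always yields a float.
{23.0, ""} = Float.parse("23")
{-23.0, ""} = Float.parse("-23")
{1.5, "kg"} = Float.parse("1.5kg")
:error = Float.parse("kg")
```

This matters once the two are combined: if integer_string is tried before float_string, "23" parses to the integer 23; in the opposite order it would come back as 23.0.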
Okay! These are all the building blocks we need, so let’s now create a parser that represents the union of the three alternatives. Let’s see if you can spot them in the code:
def numeric_string(input) do
  union_parser = union([&a_number/1,
                        &integer_string/1,
                        &float_string/1])
  union_parser.(input)
end
Where does union come from? Easy, it’s part of the Data.Parser module. It does exactly what we need: it tries to apply each parser it received, in order, and uses the first successful result OR bails with an appropriate message.
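The behaviour described for union can be sketched with Enum.find_value/3. This is an assumption-laden stand-in, not Data.Parser's implementation: in particular, on failure it returns a plain tagged tuple rather than the Error.DomainError struct the library produces. UnionSketch and the two example parsers are hypothetical.

```elixir
defmodule UnionSketch do
  # Build a parser that tries each parser in order and returns the first
  # {:ok, _} result; if none succeeds, it fails with a tagged tuple.
  def union(parsers) when is_list(parsers) do
    fn input ->
      Enum.find_value(parsers, {:error, {:no_parser_applies, input}}, fn parser ->
        case parser.(input) do
          {:ok, _} = ok -> ok
          {:error, _} -> nil
        end
      end)
    end
  end
end

# Usage: accept either a number or a string holding an integer.
number = fn
  x when is_number(x) -> {:ok, x}
  x -> {:error, {:not_a_number, x}}
end

integer_string = fn
  x when is_binary(x) ->
    case Integer.parse(x) do
      {n, ""} -> {:ok, n}
      _ -> {:error, {:not_an_integer_string, x}}
    end

  x ->
    {:error, {:not_an_integer_string, x}}
end

numeric = UnionSketch.union([number, integer_string])
```

With this sketch, numeric.("7") returns {:ok, 7}, while numeric.(:hello) returns {:error, {:no_parser_applies, :hello}}.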
Now, we can take a look at our entire parsing program:
defmodule NumericBetween0and99 do
  alias FE.Result
  import Data.Parser, only: [union: 1]
  # The smart constructor, and the module's only public function.
  def new(input) do
    numeric_string(input)
    |> Result.and_then(&non_negative/1)
    |> Result.and_then(&smaller_than_100/1)
  end
  # Accepts a number, an integer string or a float string, tried in that order.
  defp numeric_string(input) do
    union_parser = union([&a_number/1,
                          &integer_string/1,
                          &float_string/1])
    union_parser.(input)
  end
  defp integer_string(input) do
    case is_binary(input) && Integer.parse(input) do
      # Only accept the string if it was consumed entirely.
      {n, ""} -> Result.ok(n)
      _ -> Result.error("not a string representation "
                        <> "of an integer: #{inspect input}")
    end
  end
  defp float_string(input) do
    case is_binary(input) && Float.parse(input) do
      {n, ""} -> Result.ok(n)
      _ -> Result.error("not a string representation "
                        <> "of a float: #{inspect input}")
    end
  end
  defp a_number(input) when is_number(input), do: Result.ok(input)
  defp a_number(input), do: Result.error("not a number: #{inspect input}")
  defp non_negative(input) when input >= 0, do: Result.ok(input)
  defp non_negative(input), do: Result.error("not >= 0: #{inspect input}")
  defp smaller_than_100(input) when input < 100, do: Result.ok(input)
  defp smaller_than_100(input), do: Result.error("not < 100: #{inspect input}")
end
The new/1 function is our only public function and the main workhorse. It contains a Result pipe: our railway-oriented program. The first step of the pipe says: apply a parser that is the union of a_number, integer_string, and float_string. If any one of these parses successfully, pass the parsed value on to non_negative, and if that too is successful, pass the result to smaller_than_100, returning whatever comes out of this last step.
…And presto! We now have a module that encapsulates all our parsing rules (built rather verbosely by hand for the sake of learning), and exposes just one function: precisely our smart constructor! Now we can be certain that every value that is created by NumericBetween0and99.new/1 is exactly what it claims to be. No more validations!
Here’s how it works:
iex> NumericBetween0and99.new(1)
{:ok, 1}
iex> NumericBetween0and99.new("99.2345")
{:ok, 99.2345}
iex> NumericBetween0and99.new("-234")
{:error, "not >= 0: -234"}
And if we try to parse input which does not satisfy our union parser, we get a swanky, detailed DomainError structure:
iex> NumericBetween0and99.new(:hello)
{:error,
%Error.DomainError{
caused_by: :nothing,
details: %{
input: :hello,
parsers: [#Function<2.78889351/1 in NumericBetween0and99.numeric_string/1>,
#Function<3.78889351/1 in NumericBetween0and99.numeric_string/1>,
#Function<4.78889351/1 in NumericBetween0and99.numeric_string/1>]
},
reason: :no_parser_applies
}}
Leveraging built-in Data.Parsers
We did a lot of work building out our smart constructor by hand, but we can save ourselves a lot of time by using the parsers bundled with the Data package.
In particular, the parser generator function predicate/2 will take any predicate function and map it onto parser semantics. The first argument to predicate/2 is our predicate function, while the second is either a default error OR a function that receives the bad input. We’ll leverage that second argument to retain our nice error messages, this time in the form of tagged tuples.
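Going only by the description above, a predicate-to-parser adapter could look like the sketch below. It is an assumption about the behaviour, not Data.Parser's source: the second argument is treated as an error-building function when it is a one-argument function, and as a fixed error value otherwise. PredicateSketch is a hypothetical name.

```elixir
defmodule PredicateSketch do
  # Turn a boolean predicate into a parser: pass the input through unchanged
  # when the predicate holds, otherwise produce an error.
  def predicate(pred, error) when is_function(pred, 1) do
    fn input ->
      if pred.(input) do
        {:ok, input}
      else
        {:error, build_error(error, input)}
      end
    end
  end

  # A one-argument function receives the bad input; anything else is used as-is.
  defp build_error(error, input) when is_function(error, 1), do: error.(input)
  defp build_error(error, _input), do: error
end

# Usage, mirroring the tagged-tuple errors used below:
non_negative = PredicateSketch.predicate(&(&1 >= 0), &{:is_negative, &1})
```

Here non_negative.(3) yields {:ok, 3} and non_negative.(-3) yields {:error, {:is_negative, -3}}.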
Our module can be golfed like this:
defmodule NumericBetween0and99 do
  alias FE.Result
  import Data.Parser, only: [union: 1, predicate: 2]
  def new(input) do
    union([predicate(&is_number/1, &({:not_a_number, &1})),
           &integer_string/1,
           &float_string/1]).(input)
    |> Result.and_then(predicate(&(&1 >= 0), &({:is_negative, &1})))
    |> Result.and_then(predicate(&(&1 < 100), &({:is_100_or_more, &1})))
  end
  defp integer_string(input) do
    case is_binary(input) && Integer.parse(input) do
      {n, ""} -> Result.ok(n)
      _ -> Result.error("not a string representation "
                        <> "of an integer: #{inspect input}")
    end
  end
  defp float_string(input) do
    case is_binary(input) && Float.parse(input) do
      {n, ""} -> Result.ok(n)
      _ -> Result.error("not a string representation "
                        <> "of a float: #{inspect input}")
    end
  end
end
iex> NumericBetween0and99.new(333)
{:error, {:is_100_or_more, 333}}
iex> NumericBetween0and99.new("33")
{:ok, 33}
iex> NumericBetween0and99.new("-33.3")
{:error, {:is_negative, -33.3}}
Looks pretty good, but we still have those repetitive float_string and integer_string functions. They differ only in the Elixir module used for the parse/1 call, so let’s convert them to the new style of parser. The interface is quite simple: return a function that takes a single argument and returns a Result.t().
def string_repr_of(mod) when mod in [Integer, Float] do
  fn input ->
    case is_binary(input) && mod.parse(input) do
      {n, ""} -> Result.ok(n)
      _ -> Result.error("not a string representation "
                        <> "of a #{inspect mod}: #{inspect input}")
    end
  end
end
Now, we can call the string_repr_of/1 function with a module name and get back a ready-to-use parser, much as if we had partially applied a two-argument function. It works like this:
iex> string_repr_of(Float).("-1.1")
{:ok, -1.1}
iex> string_repr_of(Float).(:foo)
{:error, "not a string representation of a Float: :foo"}
iex> string_repr_of(Integer).("1.1")
{:error, "not a string representation of a Integer: \"1.1\""}
iex> string_repr_of(Integer).("1")
{:ok, 1}
…And it turns out that our string_repr_of parser-builder is part of the BuiltIn suite of parsers that come with Data.Parser. It’s called string_of and returns a more detailed error struct on failure, but works the same, in principle.
We can now compose all these parsers together and arrive at a rather succinct smart constructor:
defmodule NumericBetween0and99 do
  alias FE.Result
  import Data.Parser, only: [union: 1, predicate: 2]
  import Data.Parser.BuiltIn, only: [string_of: 1]
  def new(input) do
    union([predicate(&is_number/1, &({:not_a_number, &1})),
           string_of(Integer),
           string_of(Float)]).(input)
    |> Result.and_then(predicate(&(&1 >= 0), &({:is_negative, &1})))
    |> Result.and_then(predicate(&(&1 < 100), &({:is_100_or_more, &1})))
  end
end
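For experimenting without the FE and Data dependencies, the sketch below reproduces this constructor with small local stand-ins for and_then/2, union/1, predicate/2 and string_of/1. They follow the behaviour described in this post, but their names, signatures and error shapes (for example {:no_parser_applies, input} instead of an Error.DomainError struct) are assumptions, not the libraries' actual APIs. NumericBetween0and99Sketch is a hypothetical name.

```elixir
defmodule NumericBetween0and99Sketch do
  # Same composition as the library-based version above; every combinator
  # below is a simplified local stand-in.
  def new(input) do
    union([predicate(&is_number/1, &{:not_a_number, &1}),
           string_of(Integer),
           string_of(Float)]).(input)
    |> and_then(predicate(&(&1 >= 0), &{:is_negative, &1}))
    |> and_then(predicate(&(&1 < 100), &{:is_100_or_more, &1}))
  end

  # Railway step: continue on {:ok, _}, pass errors through untouched.
  defp and_then({:ok, value}, parser), do: parser.(value)
  defp and_then({:error, _} = error, _parser), do: error

  # First successful parser wins; otherwise report that none applied.
  defp union(parsers) do
    fn input ->
      Enum.find_value(parsers, {:error, {:no_parser_applies, input}}, fn parser ->
        case parser.(input) do
          {:ok, _} = ok -> ok
          {:error, _} -> nil
        end
      end)
    end
  end

  # Lift a boolean predicate into a parser with a custom error builder.
  defp predicate(pred, to_error) do
    fn input ->
      if pred.(input), do: {:ok, input}, else: {:error, to_error.(input)}
    end
  end

  # Parse a string that must be entirely an Integer or a Float.
  defp string_of(mod) when mod in [Integer, Float] do
    fn
      input when is_binary(input) ->
        case mod.parse(input) do
          {n, ""} -> {:ok, n}
          _ -> {:error, {:not_a_string_of, mod, input}}
        end

      input ->
        {:error, {:not_a_string_of, mod, input}}
    end
  end
end
```

Because the stand-ins are private helpers, the public surface is still just new/1.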
Summing it up
We’ve gone from manually specifying a railway-oriented parsing function for our very special numeric data point to leveraging the power of parser combinators — and specifying the same data point in a more declarative fashion.
You might rightly complain that the original code is shorter, has fewer dependencies, and does its job faster than our declarative smart constructor.
In this artificial scenario, it might indeed be more prudent to go with the more straightforward approach: we’ve overengineered our numeric bracket function quite a bit.
However, as we hope to show you in the next installment, the declarative method really shines when the data to parse becomes more complex, or exhibits internal dependencies. Stay tuned as we parse, not validate, some very gnarly JSON payloads into correct-by-construction Elixir structs!