Parse, don't validate — Elixir edition
Rationale
For historical reasons, web stacks today often rely on a ‘validation’ step for data integrity. This step checks that incoming data meets particular criteria before it is persisted by the app. We’d like to present a completely different approach, inspired by the clean code methodology and Domain-Driven Design (DDD), that continues the line of thought presented in “Parse, don’t validate”.
Why are we trying to convince you to abandon persistence-related validations (and all validations in general)?
Here’s why:
- If you’re validating your data just before you persist it, all the layers above the persistence layer have already been exposed to potentially malicious or malformed payloads, and parts of your application may already have acted on them. This is bad for business domain coherence and very bad for security.
- If you tie your data validation to a particular storage mechanism, you unnecessarily couple what should be pure business logic to the arbitrary, non-business requirements of your database technology. It also becomes awkward to store data in several different backends: you may need a separate validation pass for each one, provided by a different library.
- If you’re even thinking about validation as a discrete step, you’ve already implicitly accepted the two points above. You’ve admitted that your application cannot trust the data it operates on. We want to convince you to abandon the concept of validation completely and take control of your models by making them correct by construction. Once that is achieved, your applications will be able to trust their own data, because invalid data will not even be expressible.
The program
Here’s a fictional piece of code that we’ll modify to demonstrate how dropping validation can improve your software. We need to take an address from a web form, persist it, and then use it to send a parcel.
# log_error/1 and return_error_to_user/1 stand in for app-specific helpers.
with {:ok, street} <- Access.fetch(params, :street),
     {:ok, city} <- Access.fetch(params, :city),
     {:ok, postal_code} <- Access.fetch(params, :postal_code),
     address = %Address{street: street, city: city, postal_code: postal_code} do
  if Address.valid?(address) do
    :ok = Repo.Address.insert(address)
    :ok = ParcelService.send_parcel(address)
  else
    log_error("invalid_address, #{inspect address}")
    return_error_to_user("address is invalid")
  end
else
  error_from_with ->
    return_error_to_user(error_from_with)
end
The code above has some problems. It has three exit points, two of which represent errors encountered when assembling the data. It creates an Address and performs some actions only when the Address is valid, but it also acts on the address when it’s not valid. Even logging invalid data could potentially lead to a breach. Here’s what we’re aiming for instead:
- We will prevent the app from using invalid addresses, because it simply won’t be possible to construct them.
- We will make it clear which errors are the result of faulty input data (parse errors), and which are caused by malfunctions in other modules.
The root cause of the problem:
All of the issues in the code snippet above come from the fact that the Address struct can contain any information: it is just a container for three string fields. Also, after the address is persisted we no longer know whether it’s valid. If an Address was returned from this function, the upstream receiver would need to validate it again every time they wanted to use it. This exposes the fact that we’re not encoding validity in any way.
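This minimal sketch makes the problem concrete. With a plain defstruct, each of the constructions below is accepted by both the compiler and the runtime, even though neither produces a usable address. The sample values are made up for illustration.

```elixir
defmodule Address do
  # Nothing but a container: no types, no rules, no required fields.
  defstruct [:street, :city, :postal_code]
end

# Both of these compile and run without complaint.
empty_address = %Address{}
nonsense_address = %Address{street: 42, city: :los_angeles, postal_code: nil}
```

Any code that receives such a struct has no way of telling it apart from a real address without checking it again.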
Solving the problem with a smart constructor
There’s just one thing we need to do in order to solve the issues mentioned above. We need to ensure that an Address struct can only be created in a controlled way, so that once it’s created we can be certain that it represents a valid address.
We start by deciding that our struct can only be created in the module where it is defined. By convention, this happens in a function called new or create that returns {:ok, %Address{}} if creation was successful (so the data must be valid) or {:error, term()} otherwise.
To prevent other modules from creating instances of Address, we’re going to use a special type annotation, @opaque, in the Address module:
@opaque t :: %__MODULE__{street: String.t(),
                         city: String.t(),
                         postal_code: String.t()}
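With our type defined like this, Dialyzer will flag any code outside the Address module that constructs or deconstructs %Address{} structs where an Address.t() is expected. This is a static check: the compiler and the runtime still accept such code, so the guarantee only holds if Dialyzer runs as part of your build.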
This doesn’t come without a price: Dialyzer will also flag code outside the module that reads struct fields using dot notation. Still, we believe it’s worth it. Here’s a simplified implementation of the Address module, with placeholder business rules, including the crucial new function:
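defmodule Address do
  defstruct [:street, :city, :postal_code]

  @opaque t :: %__MODULE__{street: String.t(),
                           city: String.t(),
                           postal_code: String.t()}

  @spec new(map()) :: {:ok, t()} | {:error, atom()}
  def new(params) do
    # Placeholder business logic: every field must be present and be a
    # non-empty string. Real rules (e.g. postal code formats) would go here.
    with {:ok, valid_street} <- fetch_non_empty_string(params, :street),
         {:ok, valid_city} <- fetch_non_empty_string(params, :city),
         {:ok, valid_postal_code} <- fetch_non_empty_string(params, :postal_code) do
      {:ok, %__MODULE__{street: valid_street,
                        city: valid_city,
                        postal_code: valid_postal_code}}
    else
      _ -> {:error, :validation_failed}
    end
  end

  # Hypothetical helper standing in for the elided business logic.
  defp fetch_non_empty_string(params, key) do
    case Map.fetch(params, key) do
      {:ok, value} when is_binary(value) and value != "" -> {:ok, value}
      _ -> :error
    end
  end
end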
With this function, we can now refactor our original code to eliminate the possibility of invalid Addresses leaking out into the wider application:
case Address.new(params) do
  {:ok, valid_address} ->
    :ok = Repo.Address.insert(valid_address)
    :ok = ParcelService.send_parcel(valid_address)

  {:error, validation_error} ->
    log_error(validation_error)
    return_error_to_user(validation_error)
end
Notice how there isn’t even a variable named address that could be mistaken for valid data. We know that the address inside the {:ok, _} tuple is valid, and we make that explicit in the naming.
Now there is no chance that someone will accidentally use an invalid Address, as the log_error call in our original code did:
if Address.valid?(address) do
  :ok = Repo.Address.insert(address)
  :ok = ParcelService.send_parcel(address)
else
  log_error("invalid_address, #{inspect address}")
  return_error_to_user("address is invalid")
end
Trying to instantiate an Address from outside the Address module angers Dialyzer:
lib/example.ex:3: The specification for 'Elixir.MyApp':illegal_construction/0 has an opaque subtype
'Elixir.Address':t() which is violated by the success typing
() -> #{'__struct__' := 'Elixir.Address', 'city' := ... }
done (warnings were emitted)
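The offending code isn’t shown here. The sketch below is a hypothetical reconstruction that should trigger a warning of this kind. Only the module and function names are taken from the warning; the body is an assumption, and the exact wording of the warning may vary between Dialyzer versions. The code compiles and runs normally, and only Dialyzer reports the violation.

```elixir
defmodule Address do
  defstruct [:street, :city, :postal_code]

  @opaque t :: %__MODULE__{street: String.t(),
                           city: String.t(),
                           postal_code: String.t()}
end

defmodule MyApp do
  # Hypothetical function: it promises an Address.t() but builds the struct
  # directly instead of going through a smart constructor such as Address.new/1.
  @spec illegal_construction() :: Address.t()
  def illegal_construction do
    %Address{street: "1 Sunset Blvd.", city: "Los Angeles", postal_code: "90046"}
  end
end
```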
Now, let’s see how it’s done.
Under the hood
The trick is to stop thinking about validating data at all, and to frame the creation of new instances of data as a parsing problem. Unstructured data comes in, and either it satisfies our parsers, yielding a successful parse result {:ok, result_type}, or something fails, and that failure becomes our unsuccessful result: {:error, the_error}.
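This minimal sketch in plain Elixir shows why parsing composes better than validating. It assumes a parser is simply a function returning {:ok, value} or {:error, reason}. The names ParserSketch, string/0, and_then/2 and non_blank_string/0 are hypothetical and are not the API of the library used below.

```elixir
defmodule ParserSketch do
  # In this sketch, a parser is any function of the form
  #   input -> {:ok, parsed_value} | {:error, reason}

  # Accepts binaries and rejects everything else.
  def string do
    fn
      value when is_binary(value) -> {:ok, value}
      _other -> {:error, :not_a_string}
    end
  end

  # Chains two steps: runs `parser`, then hands its result to `next`
  # (a function returning {:ok, _} | {:error, _}). The first error wins.
  def and_then(parser, next) do
    fn input ->
      case parser.(input) do
        {:ok, value} -> next.(value)
        {:error, _reason} = error -> error
      end
    end
  end

  # A refined parser built by composition: a string that isn't blank.
  def non_blank_string do
    and_then(string(), fn value ->
      if String.trim(value) == "" do
        {:error, :blank_string}
      else
        {:ok, value}
      end
    end)
  end
end
```

For example, ParserSketch.non_blank_string().("90046") returns {:ok, "90046"}. Passing 9000 returns {:error, :not_a_string}, and passing "   " returns {:error, :blank_string}. A successful result carries the parsed value forward, so later steps never see raw input.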
Our little data utility library has a built-in parser combinator for parsing maps into structs. The syntax should be self-explanatory:
defmodule Address do
  defstruct [:street, :city, :postal_code]

  @opaque t :: %__MODULE__{}

  # Errors are Error.DomainError structs (see the iex session below), not atoms.
  @spec new(map()) :: {:ok, t()} | {:error, term()}
  def new(params) do
    # Each field is paired with the parser its raw value must satisfy.
    Data.Constructor.struct(
      [
        {:street, Data.Parser.BuiltIn.string()},
        {:city, Data.Parser.BuiltIn.string()},
        {:postal_code, Data.Parser.BuiltIn.string()}
      ],
      __MODULE__,
      params
    )
  end
end
Now, our function either successfully parses input maps according to our specs, or returns errors containing descriptions of where parsing failed:
iex(2)> Address.new(%{street: "1 Sunset Blvd.",
                      city: "Los Angeles",
                      postal_code: "90046"})
{:ok,
 %Address{city: "Los Angeles", postal_code: "90046", street: "1 Sunset Blvd."}}
iex(3)> Address.new(%{city: "Los Angeles", postal_code: "90046"})
{:error,
 %Error.DomainError{
   caused_by: :nothing,
   details: %{
     field: :street,
     input: %{city: "Los Angeles", postal_code: "90046"}
   },
   reason: :field_not_found_in_input
 }}
iex(4)> Address.new(%{street: "1 Sunset Blvd.",
                      city: "Los Angeles",
                      postal_code: 9000})
{:error,
 %Error.DomainError{
   caused_by: {:just,
    %Error.DomainError{caused_by: :nothing, details: %{}, reason: :not_a_string}},
   details: %{
     field: :postal_code,
     input: %{city: "Los Angeles", postal_code: 9000, street: "1 Sunset Blvd."}
   },
   reason: :failed_to_parse_field
 }}
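Data.Constructor.struct comes from the authors’ own utility library, and its internals aren’t covered here. As a rough mental model only, here is a hypothetical sketch of how a combinator of this kind could work. It assumes field parsers are plain functions returning {:ok, value} or {:error, reason}. StructSketch.build/3 and its tuple-shaped errors are simplifications invented for this sketch. The real library returns the richer Error.DomainError structs shown above.

```elixir
defmodule StructSketch do
  # Hypothetical, simplified stand-in for a combinator like
  # Data.Constructor.struct/3. `fields` is a list of {field, parser} pairs;
  # each parser returns {:ok, value} | {:error, reason}.
  def build(fields, module, input) when is_map(input) do
    result =
      Enum.reduce_while(fields, {:ok, %{}}, fn {field, parser}, {:ok, parsed} ->
        case Map.fetch(input, field) do
          :error ->
            {:halt, {:error, {:field_not_found_in_input, field}}}

          {:ok, raw_value} ->
            case parser.(raw_value) do
              {:ok, value} -> {:cont, {:ok, Map.put(parsed, field, value)}}
              {:error, reason} -> {:halt, {:error, {:failed_to_parse_field, field, reason}}}
            end
        end
      end)

    # Only a fully parsed set of fields ever reaches the struct.
    case result do
      {:ok, parsed} -> {:ok, struct!(module, parsed)}
      {:error, _} = error -> error
    end
  end

  def build(_fields, _module, _input), do: {:error, :not_a_map}
end

defmodule Address do
  defstruct [:street, :city, :postal_code]

  @opaque t :: %__MODULE__{street: String.t(), city: String.t(), postal_code: String.t()}

  @spec new(map()) :: {:ok, t()} | {:error, term()}
  def new(params) do
    StructSketch.build(
      [street: string(), city: string(), postal_code: string()],
      __MODULE__,
      params
    )
  end

  # Field-level parser: accepts binaries only.
  defp string do
    fn
      value when is_binary(value) -> {:ok, value}
      _other -> {:error, :not_a_string}
    end
  end
end
```

With this sketch, the three calls from the iex session return {:ok, %Address{...}}, {:error, {:field_not_found_in_input, :street}} and {:error, {:failed_to_parse_field, :postal_code, :not_a_string}} respectively. The key property matches the library’s behavior: the struct is built only after every field has parsed, so a half-valid Address never exists.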
Data.Constructor.struct gives us a toolset for creating smart constructors and ditching the idea of validations completely. However, we skimmed over the details of our ‘internal’ parsers, which are responsible for the fields in Address structs.
In our next post, we’ll go into these parsers in more depth and demonstrate that we can take domain modeling to a higher level by modeling the component parts of Addresses as well. There’s no reason for them to remain plain String.t() values.
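A postal code could get its own smart constructor, following the same pattern as Address. This is a sketch only and not necessarily how the follow-up post does it. It assumes US-style five-digit ZIP codes, like the 90046 used above. The names PostalCode, new/1 and value/1 are hypothetical.

```elixir
defmodule PostalCode do
  # Assumption for this sketch: a postal code is a US ZIP code,
  # i.e. exactly five ASCII digits.
  defstruct [:value]

  @opaque t :: %__MODULE__{value: String.t()}

  @spec new(term()) :: {:ok, t()} | {:error, atom()}
  def new(input) when is_binary(input) do
    if String.match?(input, ~r/\A[0-9]{5}\z/) do
      {:ok, %__MODULE__{value: input}}
    else
      {:error, :invalid_postal_code}
    end
  end

  def new(_input), do: {:error, :not_a_string}

  # Accessor: callers read the value through the module instead of
  # reaching into the opaque struct with dot notation.
  @spec value(t()) :: String.t()
  def value(%__MODULE__{value: value}), do: value
end
```

A parser like PostalCode.new/1 could then replace the plain string parser for the :postal_code field, so an Address could only ever hold a well-formed postal code.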