The Virtue of Busy Waiting
Sleepy, flaky tests
We all sometimes come across the dreaded Process.sleep/1 function in Elixir test code. You might also encounter its older brother, :timer.sleep/1, but you surely know that these two functions are both after the same thing: they actively work to undermine your Red-Green-Refactor state of flow, and aim to destroy your test suite’s determinism.
We feel that even one call to Process.sleep()
in a test suite is one too many,
and we’d like to showcase an approach which we use to clean up “sleepy”
tests. This approach will help you regain your flow, while removing
non-determinism from your tests. It also produces a pleasant side effect: it forces you to think more systematically about the properties of the system you’re testing.
Let’s solve the “sleepy test” problem.
Let’s just doze off for a few moments here…
The systems that we work with nowadays are de facto distributed systems, even if we don’t consciously design them that way. Caches, databases, job queues, metrics and log aggregators, third-party services, as well as our own client devices all comprise “our application”, and as such, require testing.
Many of these subsystems are asynchronous: writing a document to an eventually-consistent store doesn’t guarantee it will be there when we subsequently query for it; setting up a small job in our async pipeline doesn’t mean the results will be ready for us right away – even if the task itself is trivial.
And so it happens that when we write our tests, we do the laziest thing imaginable: rest and doze off a bit.
given_i_insert_a_document(db, %{"title": "test_doc", "content": "abc"})
Process.sleep(250) # wait for db to persist & replicate
x = when_i_query_by_title(db, "test_doc")
assert x == {:ok, %{"title": "test_doc", "content": "abc"}}
We fine-tune our sleep time to be “just enough” for the db to do the necessary paperwork, and we proceed with writing subsequent test cases, happy with our “simple” solution. This deceptively simple solution usually works. It works well enough for us to get productive and forget about why we needed it in the first place.
Things are peachy until we start noticing two worrying occurrences.
- Sometimes our assertions fail with errors like:
  {:error, "title 'test_doc' not found in db"}
- Our test suite is taking longer and longer to finish
The first problem comes from the fact that we can’t be certain that the database will indeed finish processing and replicating our data within the 250 milliseconds that we specified. Sometimes it might take 259ms for our test record to become available. If we’d allowed for 10 extra milliseconds of sleep in our test, it would have passed. Of course, we know that this is just a spurious timing issue, so we’ll re-run the failing test.
This fact represents “secret knowledge” about the system: it will cause problems on CI servers, overloaded machines (we’ve all tried running our test suites while videoconferencing) and it will cause problems when onboarding new developers. It also reduces our confidence in our test suite. A failing test should reliably indicate an issue with the system under test. If it doesn’t, its value as a test is greatly diminished.
The second problem is that as our test cases increase in number, the little
moments of time that our computer spends sleeping begin to compound. make test
starts taking longer and longer, leading developers to find shortcuts,
run tests less often, or write tests after writing the code. All worrying
signs. Ideally, writing a test should be easier and more fun than writing the
implementation, so that developers stay incentivized. Tests that take ages to run are
not fun, and only incentivize us not to run them.
Fortunately, the solution to both of these problems is easy and fun.
Introducing: Polling
If you’re a parent and own a car, you’re probably familiar with the staple of family road trips: “Are we there yet?”. If you’re not, you might recall posing the question yourself, in your youth, to the great irritation of your parents. Luckily, the computer won’t get annoyed with us asking about the same thing over and over again. Let’s take advantage of this fact, and write some code to do the asking for us.
defmodule Wait do
def until(p) do
p.() || until(p)
end
end
We can now call Wait.until with a predicate function p, and if this function returns false, we’ll happily invoke it again, repeatedly, until we get the answer we’re looking for. “ARE WE THEEEERE YET??”
With this pattern, we can continue asking the question for as long as it takes. For example, here’s how we could wait until the virtual machine generates a high-enough unique integer:
Wait.until(fn ->
IO.inspect(System.unique_integer([:positive])) > 100
end)
This pattern is called busy-waiting, because instead of sleeping quietly and conserving resources, we keep the computer busy with repeated calls to our predicate function. In most production systems, this is an antipattern you want to avoid due to its extreme wastefulness.
In test code, however, we shouldn’t be concerned with implementation efficiency, but with correctness, determinism, and turnaround time. Busy-waiting helps immensely with all of these, as we hope to demonstrate below.
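In its bare form, Wait.until will spin forever if the condition never becomes true, and it re-invokes the predicate as fast as the CPU allows. In practice we’d bound the number of attempts and pause briefly between them. Here’s a minimal sketch of such a variant – the attempts and interval parameters and their defaults are our own choices for illustration, not part of the Wait module above:

defmodule BoundedWait do
  # Retry the predicate up to `attempts` times, pausing `interval` milliseconds
  # between tries. Raises if the predicate never returns a truthy value.
  def until(predicate, attempts \\ 100, interval \\ 10)

  def until(_predicate, attempts, _interval) when attempts <= 0 do
    raise "condition was not met in time"
  end

  def until(predicate, attempts, interval) do
    result = predicate.()

    if result do
      result
    else
      Process.sleep(interval)
      until(predicate, attempts - 1, interval)
    end
  end
end

Raising on exhaustion matters: a condition that never becomes true should fail the test loudly rather than hang it forever.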
Here are some patterns where we apply our Liveness library to make our tests more reliable and faster, while eliminating hand-written sleep statements.
Example 1: Ensuring resource availability
Let’s say we’re testing an external resource that’s represented by a GenServer. This resource is a singleton process, registered with the system under a specific name.
defmodule ExternalAdder do
use GenServer
require Logger
def start_link do
GenServer.start_link(__MODULE__, [], name: __MODULE__)
end
def add(x,y), do: GenServer.call(__MODULE__, {:add, x, y})
def init([]) do
Process.flag(:trap_exit, true)
{:ok, :no_state}
end
def handle_call({:add, x, y}, _, state) do
{:reply, x + y, state}
end
def terminate(_, _) do
Logger.info("closing at #{DateTime.utc_now()}")
:ok
end
end
Below, we have a test suite…
Code.require_file("external_adder.exs")
ExUnit.start()
defmodule ExternalAdderTestNoSleep do
use ExUnit.Case
test "addition identity" do
{:ok, _} = ExternalAdder.start_link()
assert ExternalAdder.add(1,0) == 1
end
test "addition commutativity" do
{:ok, _} = ExternalAdder.start_link()
assert ExternalAdder.add(1,2) == 3
assert ExternalAdder.add(2,1) == 3
end
end
…that will often fail with the following message:
$ elixir ./external_adder_test_no_sleep.exs
1) test addition identity (ExternalAdderTest)
external_adder_test_no_sleep.exs:12
** (MatchError) no match of right hand side value: {:error, {:already_started, #PID<0.115.0>}}
code: {:ok, _} = ExternalAdder.start_link()
stacktrace:
external_adder_test_no_sleep.exs:13: (test)
Finished in 0.1 seconds (0.1s on load, 0.00s on tests)
2 tests, 1 failure
Why is this suite flaky?
When the first test finishes, it sends an exit signal to the GenServer process registered under the name ExternalAdder. However, this signal is sent asynchronously, and the ExternalAdder process gets to execute its terminate callback function. In our case, this involves logging a diagnostic message. As it turns out, it often happens that the terminate callback is still executing when the second test case is started.
The call to ExternalAdder.start_link()
now fails, because the GenServer
registered in the previous test case is still in the process of
terminating. The BEAM runtime will not let us start another process with the
same name until the previous one is successfully unregistered. (Read more
about registering and unregistering
here.)
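We can watch this race from an IEx shell (a hypothetical session is sketched below): right after the first test finishes, the name is still taken, and only once terminate/2 has finished does it become free again.

# Immediately after the first test process exits:
Process.whereis(ExternalAdder)
# => #PID<0.115.0>   (the old server, still running its terminate/2 callback)

# A few milliseconds later, once the old server has fully exited:
Process.whereis(ExternalAdder)
# => nil             (the name is unregistered; start_link can succeed again)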
We know for a fact that a new ExternalAdder
should start up at some point,
because the old one was linked to the previous test case process, which is done
executing. We are running into a race condition which we could simply wait
out. Let’s take this approach and see if it works:
Code.require_file("external_adder.exs")
ExUnit.start()
defmodule ExternalAdderTestSleep do
use ExUnit.Case
test "addition identity" do
Process.sleep(10)
{:ok, _} = ExternalAdder.start_link()
assert ExternalAdder.add(1,0) == 1
end
test "addition commutativity" do
Process.sleep(10)
{:ok, _} = ExternalAdder.start_link()
assert ExternalAdder.add(1,2) == 3
assert ExternalAdder.add(2,1) == 3
end
end
Indeed! The barely noticeable sleep of 10 milliseconds at the start of each
test case gives the terminating ExternalAdder
enough time to print out its
diagnostic message and unregister. Now the entire test suite passes.
All is fine and dandy… or is it? Let’s simulate several days’ worth of development and testing by running our test suite in a loop:
$ while (elixir ./external_adder_test_sleep.exs); do sleep 1; done
This will run the tests until they fail. And sure enough, after several tries, we hit the error from before:
.
11:38:25.281 [info] closing at 2019-10-22 11:38:25.266176Z
1) test addition identity (ExternalAdderTestSleep)
external_adder_test_sleep.exs:7
** (MatchError) no match of right hand side value: {:error, {:already_started, #PID<0.116.0>}}
code: {:ok, _} = ExternalAdder.start_link()
stacktrace:
external_adder_test_sleep.exs:9: (test)
Finished in 0.06 seconds (0.03s on load, 0.03s on tests)
2 tests, 1 failure
If you’ve used Process.sleep
to wait out race conditions in your tests
before, you’ll know what this means. From time to time, the shutdown process of
our GenServer
happens to take longer than 10ms. Let’s give it a more generous
allowance of 50ms.
Code.require_file("external_adder.exs")
ExUnit.start()
defmodule ExternalAdderTestSleep50ms do
use ExUnit.Case
test "addition identity" do
Process.sleep(50)
{:ok, _} = ExternalAdder.start_link()
assert ExternalAdder.add(1,0) == 1
end
test "addition commutativity" do
Process.sleep(50)
{:ok, _} = ExternalAdder.start_link()
assert ExternalAdder.add(1,2) == 3
assert ExternalAdder.add(2,1) == 3
end
end
Now, this test suite seems to work every time – we can leave it running in a loop and it keeps passing. If this were just one isolated test case, it would be an acceptable solution, but it doesn’t give us great confidence in the test passing on another developer’s overloaded machine, or on a CI server.
Given enough time, this test suite will fail.
Let’s see how we can use a busy loop to remove the sleeps from our test and gain confidence in its reproducibility.
Imagine if we could say: “I expect this expression to eventually run successfully, even if it fails a couple of times before”. Now, let’s see how we would use it to state that we want our ExternalAdder to eventually be start_linked successfully.
eventually(fn -> {:ok, _pid} = ExternalAdder.start_link() end)
It turns out we can do just that! The Liveness library provides our higher-order eventually function, which will take its input and try executing it multiple times, until it succeeds.
Code.require_file("external_adder.exs")
ExUnit.start()
defmodule ExternalAdderTestEventually do
use ExUnit.Case
import Eventually
test "addition identity" do
eventually(fn -> {:ok, _} = ExternalAdder.start_link() end)
assert ExternalAdder.add(1,0) == 1
end
test "addition commutativity" do
eventually(fn -> {:ok, _} = ExternalAdder.start_link() end)
assert ExternalAdder.add(1,2) == 3
assert ExternalAdder.add(2,1) == 3
end
end
Now, we can have confidence that our tests will pass in all kinds of situations, including on overloaded dev machines or CI servers.
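A detail worth noticing: the function we pass to eventually contains a pattern match, so when start_link returns {:error, {:already_started, pid}}, the match raises a MatchError. The fact that these tests pass tells us eventually retries not only when the function returns a falsy value, but also when it raises. Conceptually it behaves something like the sketch below – our own simplification, not the library’s actual implementation:

defmodule EventuallySketch do
  # Run `fun` until it returns a truthy value, treating falsy results and
  # raised exceptions alike as "not there yet". Gives up after `attempts`
  # tries, pausing `interval` milliseconds between them.
  def eventually(fun, attempts \\ 250, interval \\ 20)

  # On the final attempt, let any exception or falsy result surface,
  # so the test fails with a meaningful error.
  def eventually(fun, attempts, _interval) when attempts <= 1, do: fun.()

  def eventually(fun, attempts, interval) do
    result =
      try do
        fun.()
      rescue
        _error -> false
      end

    if result do
      result
    else
      Process.sleep(interval)
      eventually(fun, attempts - 1, interval)
    end
  end
end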
Example 2: Too many sleeping tests
Let’s suppose that our development team is happy with the “good enough” solution of sleeping to wait out race conditions, and they’ve all agreed that a test suite failing every now and again is an acceptable price to pay.
Even if we’re happy with flaky and nondeterministic tests (which we shouldn’t be), there is another danger lurking here.
Here’s what happens when the number of “sleepy” test cases increases in our codebase.
Code.require_file("external_adder.exs")
ExUnit.start()
defmodule ExternalAdderTest100CasesSleep do
use ExUnit.Case
for i <- 1..100 do
@tag number: i
test "addition identity for #{i}", %{number: n} do
Process.sleep(50)
{:ok, _} = ExternalAdder.start_link()
assert ExternalAdder.add(n,0) == n
end
end
end
Finished in 5.3 seconds (0.2s on load, 5.1s on tests)
100 tests, 0 failures
It works, but 100 simple tests took over 5 seconds to complete – no surprise, since 100 tests × 50 ms of mandatory sleep is already 5 seconds of pure waiting. And remember, we still don’t have any guarantees that at some point, some of these tests won’t randomly fail, for instance due to CPU time stolen by the hypervisor.
Let’s see if the situation would improve if we used eventually
instead:
Code.require_file("external_adder.exs")
ExUnit.start()
defmodule ExternalAdderTest100CasesEventually do
use ExUnit.Case
import Eventually
for i <- 1..100 do
@tag number: i
test "addition identity for #{i}", %{number: n} do
eventually(fn -> {:ok, _} = ExternalAdder.start_link() end)
assert ExternalAdder.add(n,0) == n
end
end
end
Finished in 0.5 seconds (0.2s on load, 0.2s on tests)
100 tests, 0 failures
Much better. Remember that here we’re using the eventually defaults: up to 250 attempts, one every 20 milliseconds – an upper bound of roughly five seconds per call.
To gain more speed, we can try to decrease the polling interval and increase the number of attempts.
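Note that 1,000 attempts every 5 ms keeps roughly the same five-second ceiling, but each retry that does happen now costs only 5 ms instead of 20 ms.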
Code.require_file("external_adder.exs")
ExUnit.start()
defmodule ExternalAdderTest100CasesEventuallyShortInterval do
use ExUnit.Case
import Eventually
for i <- 1..100 do
@tag number: i
test "addition identity for #{i}", %{number: n} do
eventually(fn -> {:ok, _} = ExternalAdder.start_link() end, 1_000, 5)
assert ExternalAdder.add(n,0) == n
end
end
end
Finished in 0.4 seconds (0.3s on load, 0.1s on tests)
100 tests, 0 failures
An extra 20% faster! We’ve gone from a rather-untrustworthy, slow test suite to a deterministic, fast test suite.
Example 3: Assertion on eventually consistent results (e.g. db/queue reads)
Test assertions are where the eventually
pattern really shines. Very often we
come across situations where we’re testing an asynchronous or eventually
consistent system.
As part of the test, we perform some actions, and then we check that these actions had the intended effect. For example, we trigger an asynchronous job via an API request, and we expect the job result to land in an S3 bucket.
This type of test often involves excessive sleep statements, put in place by different developers working on different machines. As with the previous examples, these tests are often flaky as well – there’s no guarantee that sleeping for N seconds will be enough for our system to process the data and upload results to a remote file server.
On top of that, the tests take longer and longer the more cases we need to support.
To demonstrate this, let’s build a very simple asynchronous system using built-in OTP functionality. The system accepts two numbers that we wish to add, and it eventually writes out the result of their addition to a new file. The caller is responsible for reading the result from the file.
defmodule AdditionJob do
use GenServer
@flush_timeout 500 # Change me to >600 and see the sleep_test fail!
def start_link(), do: GenServer.start_link(__MODULE__, [])
def schedule_addition(pid, x, y), do: GenServer.call(pid, {:add, x, y})
def init([]), do: {:ok, :no_state}
def handle_call({:add, x, y}, _, state) do
{output_filename, 0} = System.cmd("mktemp", [])
output_filename = String.trim(output_filename)
handle = File.open!(output_filename,
[:write, {:delayed_write, 64, @flush_timeout}])
IO.binwrite(handle, "#{x + y}")
{:reply, {:ok, output_filename}, state}
end
end
Eventual writes are simulated using the {:delayed_write, 64, @flush_timeout} option passed to File.open!. The Erlang VM buffers writes until the buffer holds more than 64 bytes or the oldest buffered data is @flush_timeout milliseconds old, and only then flushes it to disk. Our system will use a @flush_timeout of 500 ms.
Here’s how we’d test this system with a sleep statement:
Code.require_file("addition_job.exs")
ExUnit.start()
defmodule AdditionJobSleepTest do
use ExUnit.Case
test "addition result is eventually available in output file" do
# given
{:ok, task} = AdditionJob.start_link()
# when
{:ok, output_file} = AdditionJob.schedule_addition(task, 1, 2)
# then
Process.sleep(600)
assert File.read(output_file) == {:ok, "3"}
end
end
When we run it, we can see that it indeed works and reads the correct result from a file, and takes 0.6 seconds to run. Let’s see what it would look like if we used Liveness.eventually.
Code.require_file("addition_job.exs")
ExUnit.start()
defmodule AdditionJobEventuallyTest do
use ExUnit.Case
import Eventually
test "addition result is eventually available in output file" do
# given
{:ok, task} = AdditionJob.start_link()
# when
{:ok, output_file} = AdditionJob.schedule_addition(task, 1, 2)
# then
assert eventually(fn -> File.read(output_file) == {:ok, "3"} end)
end
end
We’ve removed the sleep statement and replaced it with a more deterministic (and descriptive!) assertion:
assert eventually(fn -> File.read(output_file) == {:ok, "3"} end)
but apart from this, we don’t seem to have gained much. With both this test assertion and the previous one, we need to wait at least 500 milliseconds for the results to show up in our output file.
However, the benefits of using an eventual assertion will become apparent to us further down the road. Here’s one such situation.
Suppose the specification of our AdditionJob
changes, and the eventual write
now takes a bit longer to complete. Let’s say we have to bump the
@flush_timeout
to 750 ms. If we do this, all our test cases that depend
on a hard-coded sleep of 600 ms will fail, and we’ll have to go through them
and change them.
Meanwhile, our eventual assertion keeps on working. The assertion says that eventually the results will be in the file, and that fact about our system has not changed, despite changes in its internal parameters.
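And if an even slower write ever exceeded eventually’s default ceiling of roughly five seconds, we could raise the attempt count in this one assertion – using the same optional arguments shown in Example 2 – instead of hunting down hard-coded sleeps across the whole suite:

assert eventually(fn -> File.read(output_file) == {:ok, "3"} end, 1_000, 20)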
Liveness.eventually asserts Liveness Properties
The correctness of programs can be checked using a conjunction of safety and liveness properties, as proposed by Leslie Lamport.
Informally, the safety properties of a system define states that the system is not allowed to end up in. Liveness properties, on the other hand, say that even if the system is not in a given state yet, it will eventually reach it at some point in the future.
In the language of the test cases we write to accompany our software, every time we use assert, we are verifying safety properties, because we have some expectation about the state of the system at the given moment. More explicitly, we say: “this is the permissible state of the system now. All other states mean that this safety property has been violated and the test must fail.”
When we compose assert with eventually, we are declaring something like the following: “the system may be in whatever state at the moment, that’s ok. But we expect the system to reach this desired state at some point in the future.” This brings us into the domain of liveness properties.
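Using the assertion from Example 3, the two flavors sit side by side like this:

# Safety: the result must be in the file right now, otherwise the test fails.
assert File.read(output_file) == {:ok, "3"}

# Liveness: the result must appear at some point before eventually gives up.
assert eventually(fn -> File.read(output_file) == {:ok, "3"} end)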
Of course, the eventually
combinator will only keep retrying for so long, so
it’s possible that we won’t witness our system reaching the desired state
before our time runs out. This is a false negative, and a known fact about
practically checking liveness properties: you can’t know for sure that the
system can reach a desired state until you actually see the system in that
particular state. Model-based & formal methods offer other ways of verifying
liveness properties, but these are far outside the scope of ExUnit.
The #sleeplesstesting challenge
If you’re sick of flaky and sleepy tests in your Elixir projects, how about taking the #sleeplesstesting challenge? We dare you to grep your test directories for occurrences of the word sleep, and see what happens when you replace sleeps with declarative busy waits.
Ping us on Twitter and tweet your results with the hashtag #sleeplesstesting!