There are many good reasons to write automated tests:

They can help drive, or provide feedback on, the implementation.
They act as living documentation / proof of what the program does.
They prevent us from putting bugs in front of users.

But, to me, the most valuable purpose of automated tests is the confidence they inspire.
When I have a suite of good automated tests, I’m confident it will tell me about any bugs I introduce before my users do.
It gives me the confidence to change my code without worrying about breaking it.
In other words – the tests enable confident refactoring (defined as changing the code’s internals, to make it more readable, extensible, etc., without changing its behaviour).

That’s why I was quite surprised when my colleague, Nitish, pointed out how the tests I was writing were hindering, not helping, refactoring.
Here’s a short, completely made-up, example:

As part of our e-commerce application, we have a shopping cart, where we place items. We can then calculate the total amount to be paid for all the items in the cart:

class Cart
  def total
    cost = @items.sum(&:price)
    shipping =
    return cost + shipping
  end
end

Fairly straightforward: the total to pay is the sum of all items’ prices plus shipping costs.
The calculation of shipment method and cost is delegated to the Shipment class.

class Shipment
  def initialize(_items)
  end

  def cost
    15 # TODO: some complex logic that takes into account number, and size of, items
  end
end

Now, as is inevitable, some evil “business” person decides to ruin our beautiful code.
They tell us that premium users will get a discount if they purchase more than 5 items.

OK, so technically, this means that the code to handle Shipment now has to know about the user that owns the cart. Let’s make that change.
Staying true to Kent Beck’s “make the change easy, then make the easy change”, and working in small, safe steps, we make a small change:

class Cart
  def total
    cost = @items.sum(&:price)
    shipping =, @user).cost # provide user object to shipment
    return cost + shipping
  end
end

class Shipment
  def initialize(_items, _user)
  end

  def cost
    15 # TODO: some complex logic that takes into account number, and size of, items
  end
end

This is a textbook refactoring: the behaviour of the code hasn’t changed. It’s not even reading the new user parameter yet. It returns the same result as it did before.

However, a test now fails:

#<Shipment (class)> received :new with unexpected arguments

Let’s have a look at that failing test:

RSpec.describe Cart do
  describe '#total' do
    before do
      # `Shipment` is a complex class;
      # we don't want the tests for `Cart` to fail whenever `Shipment` changes.
      # It's enough to just validate that the correct interaction happens.
      fake_shipment = instance_double(Shipment, cost: shipment_cost)
      allow(Shipment).to receive(:new).with(items).and_return(fake_shipment)
    end

    let(:items) { [double(price: 5), double(price: 10)] }
    let(:shipment_cost) { 20 }

    it 'is the sum of all prices plus shipping costs' do
      expect( eq(5 + 10 + 20)
    end
  end
end

Can you spot the problem here?

The fake Shipment object that is set up in the test doesn’t yet account for the new user argument.
No problem, just add some more mocking and we’ll be off on our merry way:

allow(Shipment).to receive(:new).with(items, user).and_return(fake_shipment)

That’s what I’ve done for years. Often, with much more complex class interactions, requiring many more changes to tests. Until Nitish pointed out that this is a waste of time.

My tests were essentially a copy-paste of my production code.
I only sprinkled some doubles, allows, and expects on top.
My tests were not going to find bugs, because all interactions were faked and will always return a constant value.
And, as we just saw, they were getting in the way of, rather than helping with, refactoring.
He called it the “Soviet police-station” style of testing.
The process to work with these tests is like so:

  • Refactor production code.
  • Observe some tests go red.
  • Take those tests to the back room.
  • Beat the sh*t out of them to make them look like the production code.
  • Tests finally confess to being green.

In other words – when a test fails, the “solution” is to copy over the implementation into the test, rather than fix an actual bug.

What purpose do those tests even serve?

They may have helped us drive the implementation at one point.
But they’re documenting the implementation, rather than what the application actually does.
They will certainly never find any bugs, because they’re just faking everything.

They will either stay green forever, or, if they do go red – only make us beat them up again rather than fix anything in our application.

Test behaviour, not functions

What’s the alternative to the Soviet police-station type of useless testing?
We can verify the desired behaviour of the system, rather than individual functions and methods.
For example, instead of the test

   #total
      is the sum of all prices plus shipping costs

we can have something like

   inserting an item into the cart 
      updates the total to be the sum of all prices plus shipping costs

What’s the difference?

We’re not interested in testing a specific method (total) on a specific class (Cart). This is an internal detail of our implementation. It’s not what the users of our software care about.
What they do care about is, when they put an item into the cart, then they can see an updated total amount.
So long as this holds true, then the implementation details are irrelevant.
Our test should remain green, regardless of the implementation we choose.

(There’s much more to be said about this, and Ian Cooper said it much better in his life-changing conference talk, “TDD, Where Did It All Go Wrong”.)

How does this work in practice?

What would be a good implementation of a test that verifies the application’s behaviour?
Well, that depends on our architecture.
If we use something like ports and adapters / hexagonal architecture, then our application’s control flow would look something like:

user request (e.g. click on UI, API request) --> adapter (e.g. web controller) --> port (e.g. domain service) --> core application code

In this case, a natural place to test the logic of our application will be the port.
The port exposes an API to accomplish a certain task (such as “add item”, “remove item” or “get cart total”). This is exactly what the user is trying to do.
It’s also decoupled from any fast-changing presentation / transport concerns.
Testing at the port level allows us to test the entire flow of a user request, which is just what we want:
We want to test the behaviour of the application, as seen by the user.

What if I’m using a different architecture?

Then you’re wrong, and you should re-architect your application.
Just (mostly) kidding!

For other architectures, we’ll need to figure out where our “business logic” begins.
This is the point in the code that we want to verify.
It’s quite possible that this beginning is intermingled with presentation concerns (for example, controllers in Rails’s implementation of MVC).

In this case, one option is to bite the bullet and test everything – the business logic, intermingled with other logic.
This is my first choice when using Rails – I use the built-in methods for testing Rails controllers.
And it works reasonably well. It can break down in a couple of instances:

  1. When the ‘non-business’ (e.g. presentation) logic keeps changing.
    For example – the calculation doesn’t change, but the HTML generated by the controller does.
    In this case, we’re forced back to our old role as Soviet cops, beating up the tests until they pass.
  2. Performance issues.
    Operations such as serializing JSON or rendering HTML may be time-consuming. If these are included as part of the executed code in tests, then the tests may become slow.

To solve those issues, it’s possible to introduce a “port” (for example, a “domain service” or “action” class). This leaves behind the presentation (or whatever) logic, and encapsulates all ‘business’-y logic in a more testable class.

P.S. We can still write isolated unit tests

The fact that we prefer behavioural tests doesn’t mean that we can’t also test in isolation, where it makes sense:

  1. Complex pieces of logic may be easier and clearer to test separately. (Think about the logic to calculate shipment costs, above. It could depend on a huge number of variables, such as the number of items, their size, who the user is, the delivery address…)
  2. If we want to use mocks and stubs to drive our design (aka the “London” school of TDD).
    That’s a useful technique for discovering the implementation, and how the class design and interactions may look.
    Personally, I’d consider deleting these tests afterwards. They’ve already served their purpose of guiding our design. From now on, as we’ve seen, they will only serve to hinder changes to that design. They’re certainly not going to find any bugs.

Closing words

Tests should be an enabler for well-maintained code.
Tests should give us confidence that, if they are passing, then the application works as intended.
Additionally, they shouldn’t get in the way of changing the code.

Tests that verify implementation hinder us from changing our code, as well as being less confidence-inspiring.
Tests that verify behaviour enable us to change and evolve our code, as the application’s functionality, and our understanding of the domain, evolve.


4 thoughts on “Testing anti-pattern: The Soviet police-station”

  1. This is a great read and so validating! I am a stickler for tests, and my philosophy is to write generous end-to-end tests and sprinkle in a few unit tests to help debug complicated algorithms. The e2e tests allow me to use coverage tools to determine where dead code lives and where I need more tests. I used to work for a billion-dollar company that tested every little thing, but no one knew how to refactor, because every line was unit-tested in 3 places, so coverage tools were completely pointless. If your *user* can’t execute the code, it doesn’t need to be there.


  2. Great post.
    Personally I am very conservative with my usage of mocks. I avoid them for anything business logic related if possible. They are however essential when dealing with clients that make external calls.

    Since I worked with you and Nitish I’ve been working in Go. Mocks are substantially safer there, as Go’s type system ensures that mocks have a far more robust contract than in Ruby. Still not perfect, and it does fall apart when you have multiple mocks that mimic a concrete implementation.


  3. That’s why we write functional tests along with unit tests. Unit tests have their own merit, but on their own they are not enough for big projects.

