In America, developers write tests to “cover” their production code. In TDD Russia, developers write production code to cover their tests

Recently, an enterprising engineer on my team added a code coverage check to our CI pipeline. On every pull request, a bot posts a comment like this:

Test coverage report
Total statements added in this PR: X
Total statements added covered by tests: Y
Coverage % in this PR: Z%

It’s a neat little reminder for developers to make sure our testing is up to par.

After a while, I noticed that many of my PRs have 100% test coverage*.
Ever since then, many (read: zero) colleagues have approached me and, with eyes filled with wonderment, asked:
“Wow uselessdev, how are you able to produce such consistently thorough tests?”

Which got me thinking – how am I able to produce such consistently thorough tests? I never set out to achieve 100% coverage. I only learn about it after the fact, when the bot calculates the coverage percentages.

In fact, it’s not a hard question to answer – it’s because of the way I work.
Whenever possible, I try to practice TDD:

I think of a use-case that the software needs to support.
I write an automated test that verifies it.
Inevitably the test fails.
I write the minimal amount of code to make the test pass.
Repeat.

Have you spotted the answer? It all hinges on one word – “minimal”.

Another way to put it –
I don’t write code unless there’s a red test that requires it.

If you think about it, I’m not writing tests to cover production code.
Rather, it’s the other way round – I write production code to “cover” test cases.
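Here’s what one turn of that loop might look like, as a minimal TypeScript-flavoured sketch (the discount rule, the names, and the Jest-style syntax are all made up for illustration). Every branch in the production code below exists only because a red test demanded it:

import { describe, expect, test } from "@jest/globals";

// Production code, written after the tests below went red.
// The discount branch only appeared when the second test demanded it.
function finalPrice(orderTotal: number): number {
  return orderTotal > 100 ? orderTotal * 0.9 : orderTotal;
}

describe("finalPrice", () => {
  test("orders of 100 or less are not discounted", () => {
    expect(finalPrice(80)).toBe(80);
  });

  test("orders over 100 get a 10% discount", () => {
    expect(finalPrice(120)).toBeCloseTo(108);
  });
});

If no test ever demands a branch, that branch simply never gets written.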

This doesn’t only have the nice effect of achieving high code coverage. It also helps me avoid a common pitfall that all developers are tempted by: writing speculative code.

Speculative code

I define “speculative code” as “code that was written, but was never executed, by the developer”.

This may sound confusing. Even if a developer doesn’t write a test for every line of code, surely they click around in their browser, or execute the code in some other way, to verify that it works as intended.

That’s generally true, but we also often write some code “just in case”.

Take this piece of code, for example:

var user = usersRepository.find(userId);
var dataFromSome3rdParty = ThirdPartySDK.getData(userId);

This code works as intended.

“Well, actually”, says a knowledgeable reviewer, “ThirdPartySDK is just some random place on the internet that is out of our control. What happens if it’s down? Or has a bug?”

So the diligent developer adds some error handling:

var user = usersRepository.find(userId);
try {
    var dataFromSome3rdParty = ThirdPartySDK.getData(userId);
} catch (Exception err) {
    logger.error(err, "Cannot find data for " + userId);
    return "Sorry, we weren't able to find 3rd party data for " + user.emailAddress;
}

And everybody goes home happy.

Only, on our local machine, the sandbox environment of ThirdPartySDK always returns successfully. So the error handling code gets shipped without ever being executed.

Another example of speculative code:

var email = request.body?.user?.emailAddress;
var isValid = validateEmail(email);

Can you spot the code that the developer has never executed?
It’s pretty subtle. It’s those `?.`s (aka the safe navigation operator).

Whenever a developer ran this code on their machine, `body` had a `user` property, and `user` had an `emailAddress` property. They never “used” the safe navigation functionality.

OK, but what’s so bad about some extra code for added safety?

Yes, we don’t expect these edge-case situations to happen, but better safe than sorry, right?

Well, any line of code that we write has a potential to have a bug hiding in it.
And if we never run that line of code, we have no chance to discover that bug.

And, it just so happens, that both of the code samples above have a bug in them.
Have you spotted them? Go back and have a look.

In the first example, if userId doesn’t exist in our system, then usersRepository.find will return null. And the 3rd-party SDK will throw a NotFoundOnThirdPartyError.

And our error-handling code will try to read user.emailAddress in order to provide a meaningful message to the user. Oops!

Errors that happen inside error-handling code are a special type of hell.

In the second example, the developer never saw what happens if either the `user` or the `emailAddress` property is missing.

What actually happens is that `email` evaluates to null. And validateEmail may, or may not, blow up when given a null argument.

Back to TDD

So, if we want to avoid shipping code that we never tested, we must test all the code we write.

When thinking “what if X happens”, we should make sure that we actually see what happens when X happens.

It could be using an automated test. You may choose to keep the test, or delete it before committing the code.
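For instance, here’s a sketch (Jest-style, with invented names) of actually seeing what happens when the 3rd party blows up for a user we don’t know. Running a test like this against the original snippet is exactly what would have surfaced the hidden null bug, and the null-safe read below is one possible fix that such a red test forces:

import { expect, test } from "@jest/globals";

type User = { emailAddress: string };

// The earlier example, with its dependencies injected so a test can control them.
function getThirdPartyData(
  userId: string,
  usersRepository: { find(id: string): User | null },
  thirdParty: { getData(id: string): unknown },
): unknown {
  const user = usersRepository.find(userId);
  try {
    return thirdParty.getData(userId);
  } catch {
    // The original version read user.emailAddress here, and blew up when user was null.
    return "Sorry, we weren't able to find 3rd party data for " + (user?.emailAddress ?? userId);
  }
}

test("apologises when the 3rd party throws for a user we don't know", () => {
  const emptyRepository = { find: () => null };
  const brokenThirdParty = { getData: () => { throw new Error("NotFoundOnThirdParty"); } };

  const result = getThirdPartyData("no-such-user", emptyRepository, brokenThirdParty);

  expect(result).toContain("Sorry");
});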

It doesn’t have to be an automated test, though. It’s possible to “generate” test cases by fiddling with a function’s input, or with your own code, to simulate a “rare” scenario.

The idea is to make sure not to ship untested code. There are many ways to do it, but TDD is the most thorough one.

TDD makes us cover our tests with production code, and helps us avoid speculative, untested code.

*Footnote: the usefulness of test coverage

Whether test coverage is a good, or useful, metric, is beyond the scope of this article.
I just want to clarify here – I don’t think that test coverage should be a target.
It can’t tell us if we’re doing a good job testing our code, but it can tell us if we’re doing a bad job of it.
Meaning – 80%, 90% or 100% test coverage doesn’t guarantee that our code is well tested.
However, 50% test coverage does guarantee that our code is not well tested.

If I were a CTO… My manifesto for running a tech business

This post is a documentation of the way I, personally, think that a successful software organization could be structured and run.

I previously wrote about the way a successful individual developer could work.
This post is taking a much broader view of the entire tech organization.
You’ll find that the different principles and practices described here are similar, identical, or enable the ones mentioned in the above post.
As always, the opinions here are influenced by well-known best practices (DevOps, agile), and by my own ~15 years of experience in different software organizations.
It’s also important to mention that these opinions are not influenced by actual experience of holding a senior leadership role. So this post is quite one-sided in favour of the “internal” tech organization, without much (or any) consideration of the CTO’s role as part of the wider management team.

This is meant to serve as a living document – I’m sure it’ll change based on readers’ feedback, and my own learnings and observations.

Goals (The “Why?”)

I don’t currently, nor do I ever intend to, serve as a CEO, CTO, or any other ‘chief’.
So this isn’t an instruction manual for my future self.
I also don’t presume to be a “consultant” or “executive coach” (yet?).

I’ll never have the authority to actually implement all the items in this list. But I think it’s still valuable to put in writing, for a few reasons:

  1. To put my own thoughts in order – writing down this list will force me to articulate my “philosophy”, and to clarify it to myself. Clarifying my values and priorities is a valuable exercise. Especially for times such as job searching, when I consider whether a company is a good fit for me.
  2. To use in my own little domain – even though I’ll never be a “top dog”, I may have the opportunity to lead a team again. Some of the principles and practices outlined here can be implemented even at a small scale.
  3. To influence others – I hope to influence the thoughts and actions of my employers in the direction I believe is right. Even if this document itself is not enough to affect change, it could serve as a starting point for a conversation.

Principles

This is a high-level overview of “how we win” – if we succeed at the below, then we will win as a team.
They are outcomes, or metrics, rather than concrete actions and steps (see “Practices” below for a breakdown of these principles into actionable items).

You won’t find anything ground-breaking here; as I mentioned, this is built on top of well-established philosophies.

Generative culture

(sometimes referred to as “Westrum organizational culture”).

This is an organizational culture that is goal- and mission-driven, fosters collaboration, encourages risk taking, and implements novel ideas.
It is informed by the belief that employees are internally motivated.
Meaning – everyone wants to do a good job. There’s no need to “force” workers to do a good job. (This is known as “Theory Y”.)
If we espouse this theory, then there’s no need for management to overly supervise, check up on, or impose limitations on employees. Rather, management should give employees the necessary tools, knowledge, and training to succeed.
Some concrete examples of this may be:

  1. Team autonomy in what they do – projects and tasks are decided on by the people who do the work. Management is responsible for priorities, vision, and “big picture”. Not the everyday work.
  2. Team autonomy in how they work – teams are free to choose how they go about achieving their goals. Scrum / kanban / waterfall / anarchy.. whatever gets good, consistent results.
  3. Trust – no “code owners” that must approve every change. No requirement for X number of “approvals”. We employ grown-ups – they won’t start riffing on main, committing bugs and spaghetti code, just because they can. We trust them to be responsible, and to come up with quality mechanisms that work for them, without forcing anything on them.
  4. Failure (such as a bug, production outage, miscommunication with a customer) does not lead to punishment. After all, the person(s) who made the mistake had the best of intentions. This means that the system they operated in allowed for that mistake to happen (even in the case where that system tasked them with doing a job that they’re not qualified for).
    Therefore, failure is an opportunity to learn and improve, for the company as well as the individual.

Psychological safety

This is the belief that one will not be punished or humiliated for speaking up with ideas, questions, concerns, or mistakes, and the team is safe for interpersonal risk taking (definition by Dr. Amy Edmondson).
That’s a core aspect of a generative culture. In a generative culture, we rely on individuals, not “leadership”, to come up with ideas, initiatives and the execution that drive the company forward. This cannot be done if employees don’t feel safe expressing their opinions.

Some other required attributes of a generative organization include:

  • Autonomy
  • Trust
  • Continuous learning (and improvement)
  • Cross-team collaboration

The “Four Key Metrics”

DevOps Research and Assessment (DORA) has consistently found that excelling at these four metrics leads to excelling in business outcomes (profitability, market share, customer satisfaction, employee satisfaction, and more):

Delivery

Deployment Frequency – how often you put new code in front of customers
Lead Time for Changes – how long it takes from the first commit on a developer’s machine, until that code is in front of customers

Stability

Time to Restore Services – the time between introducing a failure (bug / outage), and resolving it.
Change Failure Rate – % of deployments that cause a failure in production.
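To make these definitions concrete, here’s a rough sketch of how the four numbers could be computed from a list of deployment records (the types and field names here are invented for illustration, not DORA’s official method):

type Deployment = {
  deployedAt: Date;
  firstCommitAt: Date;    // earliest commit included in this deployment
  causedFailure: boolean; // did this deployment cause a production incident?
  restoredAt?: Date;      // when that incident was resolved, if there was one
};

function fourKeyMetrics(deployments: Deployment[], periodInDays: number) {
  if (deployments.length === 0) throw new Error("no deployments in this period");

  const hours = (ms: number) => ms / 36e5;
  const failures = deployments.filter((d) => d.causedFailure);

  return {
    deploymentFrequency: deployments.length / periodInDays, // deployments per day
    leadTimeForChangesHours:
      deployments.reduce((sum, d) => sum + hours(d.deployedAt.getTime() - d.firstCommitAt.getTime()), 0) /
      deployments.length,
    timeToRestoreHours: failures.length === 0 ? 0 :
      failures.reduce((sum, d) => sum + hours((d.restoredAt ?? d.deployedAt).getTime() - d.deployedAt.getTime()), 0) /
      failures.length,
    changeFailureRate: failures.length / deployments.length, // fraction of deployments
  };
}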

Moving these needles in the right direction requires being really good at quite a few behaviours and practices, as outlined below.
While they are not “principles” per-se, they are very helpful high-level goals that we can use as guidance.

Feedback

We recognise that we are often wrong. But we don’t know that we’re wrong, or in what way.
Therefore, we solicit, we value, and we act on, feedback. We aim to get feedback, and act on it, as quickly as possible.
This has multiple manifestations –

  1. Feedback about whether our product meets customers’ needs
  2. Feedback about whether our software behaves as intended
  3. Feedback about the quality of our code
  4. Feedback about us, our processes, and tools

Practices

The above principles are nothing in their own right. Only daily behaviours and incentive structures can bring a principle to life.
Here are some concrete practices and processes that, I believe, help realise the above principles:

Organization

Product teams / reverse Conway manoeuvre

Conway’s law dictates that our software structure will reflect the company’s organizational structure.
So, if we want to create a software architecture of independent, loosely-coupled components, then we need to structure our organization in such a way.
This would look different in every problem domain. But generally, a team is assigned a cohesive, independent sub-domain of the company’s business. For example – a “loans” team, a “savings” team, a “mortgages” team, and so on.
Each team has the responsibility for, and the personnel / tools to, provide the best loans / savings / mortgage software product. Starting from ideation, up to maintaining a service in production.

The team may be asked to provide some big picture outcome (e.g. “x% more savings account customers”, or “y% less churn for mortgage holders”). But the way they go about it is up to the team itself.

Realizes principles:

Generative culture / autonomy, trust – by making teams self-sufficient. They’re not dependent on anyone outside the team (e.g. QA team, ops team) to accomplish their goals.
Four key metrics / delivery – by removing dependencies and coupling, there’s less need for communication and coordination. Teams are free to work as quickly as they’d like.

Communication – Communities of practice

While teams are autonomous, none of them is an island. Teams still need to effectively work together, communicate about what they’re doing, coordinate, etc.
Additionally, learnings from one team (e.g. how to solve a specific problem) can be applicable to other teams.

The “standard” approach to these needs is a hierarchical one, traversing the organizational “tree”:
if team A needs to coordinate with team B, then it will go up through team A’s manager, who will go to team A’s director, who will go to their VP, who will go down to team B’s director, who will go to team B’s manager.
This approach is wasteful, and contradicts the principles of autonomy and theory Y.

An alternative would be to create structures where teams and individuals can communicate directly.
This could be communities of practice (e.g. “frontend devs”, “DB administrators”), technical all-hands (e.g. weekly open engineering meeting), or ad-hoc working groups. The details should be self-organized by the team(s) for whatever works for them. Management’s role is to allow the time and space (and encouragement) for these structures to emerge.

Realizes principles:

Generative culture / autonomy – even when coordination outside the team is necessary, the team has the autonomy to choose how to do that.
Generative culture / collaboration – allowing (and encouraging) direct communication, rather than hierarchical one, increases collaboration.
Generative culture / continuous learning – by providing opportunities for individuals and teams to learn from each other.

Management

We already touched on many things that management does not do – supervision, validation, tactical decision-making.
So what does management do, in our fairytale, rainbows-and-unicorns organization?
The main responsibilities of management are, generally, twofold:

  1. Provide context and “big picture” – making sure that everyone in the organization knows what the overall company goals and priorities are. So that they’re able to prioritise their own work accordingly.
    Making connections between different parts of the organization (e.g. “oh, you’re doing project X? well, Joanne from marketing is doing project Y which is related. You should talk!”)
  2. Reinforce the desired organizational culture – not by talking about it, but by consciously incentivizing desired behaviour.
    Some examples:
    • When a mistake / outage occurs, celebrate it as an opportunity to learn and improve, rather than playing blame games. Encourage teams to look at how the system / work processes can be improved to prevent the next issue.
    • Proactively reward employees who exemplify desired principles. Get rid of employees who don’t.
    • Back teams up when they need to invest resources in improving their processes, even in the face of external pressure (e.g. customer requests)
    • Back teams up even when they’re going in a different direction than what the people in management would’ve done
    • Share their own mistakes and vulnerabilities openly, to promote a culture of psychological safety

As you can see, the job of management is extremely important, but not large in volume.
This means that the organization requires fewer managers to function (for example, there’s no need for “directors” that have several teams under them, or “VPs” that have several directors under them, etc.)

Realizes principles:

Generative culture – this style of management allows individuals, not management, to generate value for the company (hence, “generative” culture).

Processes

As said earlier, each team can choose whatever workflow works for them.
There are a few common guidelines that are helpful across the board, though:

Relationship with the customer

Everyone on the team is expected to engage with, and learn from, customers.
Customers are not hidden away from developers behind product managers, business analysts etc.
This has the added benefit of removing menial tasks from product people (such as acting as a go-between for developers and customers, or writing down work tickets). They are free to focus on value-adding activities, such as customer behaviour analysis, market research, forward-planning, etc.

Realizes principles:

Feedback – developers have access to direct feedback from customers about what works and what doesn’t.

Technical

There are many technical practices required to achieve the above principles (especially the “four metrics”). DORA has a comprehensive list of them. I’ll only mention the ones that I’ve found especially important or valuable:

Continuous integration and delivery

In order to understand as quickly as possible whether our software behaves as intended, we must integrate all changes as frequently as possible, and check whether the software indeed behaves as it should.

In order to understand as quickly as possible whether the changes we’ve made are useful to customers, we must put them in front of customers as quickly (and frequently) as possible.

The pursuit of continuous integration and delivery is beneficial in itself.
It forces us to improve in many aspects of our work – automated testing, configuration and source management (to maintain safety while going fast), loose coupling (to avoid teams being blocked) etc.

Delivery pipeline as a first-class citizen

If we can’t (safely) deploy our software, then our customers can’t benefit from anything that we do. In this case, there’s no point to any other activity (e.g. developing a new feature, or fixing a bug), as it will never reach customers.

However obvious this seems, it has profound implications. It means that any “blockage” of our deployment pipeline (bad configuration, flaky tests, even significant slowing down of the pipeline) is as bad as a customer-facing outage. (Actually, it is a customer-facing outage. The customer does not get the functionality that they should.)

Realizes principles:

Feedback (at multiple levels)
The four key metrics / delivery

Automated tests / test-driven development

I’ve actually seen an organization that did great on delivery metrics (e.g. multiple deployments per day), without emphasizing automated tests. As expected, their stability metrics (e.g. number of bugs) were incredibly poor. And it was noticeable – that company lost multiple contracts because customers were dissatisfied with the quality of the software.
If we aim to be able to release frequently, with confidence, we must have a reliable test suite.
Writing tests-first also provides invaluable feedback about our software design.

Realizes principles:

Feedback – about whether our software behaves as intended
The four key metrics – all of them. Testing decreases the odds of introducing a bug (i.e change failure rate). But it also gives us the confidence to deploy rapidly, without lengthy manual verification.

Fearless refactoring

Many of us have an aversion to changing working code. Whether it’s because we don’t see the value (it’s working; so what if it’s hard to read?), or because we’re afraid of the consequences (i.e. introducing a bug).

However, if we aim for excellence in delivery and reliability, we can’t accept code that is hard (for us) to maintain.
Code that’s hard to maintain means slower speed (since it takes longer to change). It also jeopardizes our reliability (because it makes it easier to introduce a bug).

Therefore, we must encourage (and expect) developers to improve code that is difficult to understand or to change.
More than that – it’s also important to change code based on an improved understanding of the problem:
We’ve all seen cases where code was built to accommodate use case X, when use case Y is what the customer actually needed. So the code implements use case X, with some hacks and workarounds to make it behave like Y.
This is another case of code that’s hard to understand and maintain, and must be changed.

There are many more valuable technical practices. But, I believe that the teams will find them for themselves, if they aim to improve on the principles and practices already mentioned.
For example –

  • observability and monitoring – a team will naturally invest in those areas if it aims to improve its reliability metrics
  • Change management, version control, deployment automation – a team will naturally invest in those areas if it aims to improve its delivery metrics

Conclusion

If you’ve read this far, and you are not my mother, then thank you very much for bearing with me.
(If you are my mother, then hi mum!)

You may be interested in reading some of DORA’s research, or the “Accelerate” book.
This post has turned out to be a sort of poor-person’s reader’s digest of the DORA materials.

My software development manifesto

This blog post details the ideal process I would like to follow when working as a software developer. It lists the activities I find most beneficial at an hourly, daily, and weekly basis.
Many of the systems and processes below I’ve followed myself, and found useful. Others – I only had the opportunity to read or hear about, but have not tried.

Like any ideal, it’s not always fully achievable, or even realistic at points. But it’s important for me to have a “north star” to aim towards, so I know which direction I’d like to move in.

The audience for this post is:

  1. Me: To clarify my own thoughts, and to refer back to when thinking about making changes in how I work.
  2. Colleagues, team-mates, managers: To articulate what my agenda is, what kind of changes I may propose to our working arrangements, and why.
  3. Anyone else: To gather feedback, suggestions, or hear about their own experiences with these patterns and practices.

I’ll organize the processes I like to follow into different timeframes, or “feedback loops”.
Knowing whether we’re on the right track or not as soon as possible is one of the most important things in our work. Therefore, quick, tight feedback loops are paramount.

You’ll notice a high degree of commonality between the different loops. Essentially, it’s the same process, only at different scales: make a small step, verify, put it out of your head, move to the next step. Frequently stop and evaluate whether we’re on the right track. Repeat.

Inner loop: Implementation. Time frame: minutes

This is the core software development loop – Make a small change, verify that it works. And another one, and another. Then commit to source control, repeat.
I start this loop by writing an automated test that describes and verifies the behaviour I’m implementing*. Then I’d write the minimal amount of unstructured, “hacky” code to make that test pass.
And then another one, and another one. Over and over.
This is a long series of very very small steps. (For example – running tests on every 1-2 lines of code changed, committing every 1-10 lines of code changed.)

I would defer any “refactoring” or “tidying up” until the last possible moment. Usually after I’ve implemented all the functionality in this particular area.
That may even take a few days (and a few PRs).
That’s because I’m always learning, as I implement more functionality. I’m learning about the business problem I’m solving. About the domain. About the details of the implementation.
I’d like to refactor once I have the maximum level of knowledge, and not before.

Personal experience: I found that the only way to verify every single small change, dozens of times an hour, is with automated tests. The alternatives (e.g. going to the browser and clicking around) are too cumbersome.
I love working in this way. I can make progress very quickly without context-switching between code and browser.
I can commit a small chunk of functionality, forget about it, and move on. Thus decreasing my cognitive load.
Additionally, automated tests give me a very high degree of confidence.
Ideally, I’d push code to production without ever opening a browser (well, maybe just once or twice..)

*A short appendix on testing:
I mentioned that I’d like to test the behaviour I’m trying to implement.
I don’t want to test (as is often the case with “isolated unit tests”) the implementation details (e.g. “class A calls class B’s method X with arguments 1, 2, 3”).
Testing the implementation doesn’t provide a high degree of confidence that the software behaves as intended.
It also hinders further changes to the software (I wrote a whole blog post about this).

My ideal tests would test the user-facing output of the service I’m working on (e.g. a JSON API response, or rendered HTML).
I would only fake modules that are outside of the system (e.g. database, 3rd party APIs, external systems).
But everything within the scope of the system behaves like it would in production. Thus, providing a high degree of confidence.
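To illustrate, such a test might look something like this (Express + supertest + Jest-style syntax; the route, the repository, and the data are invented for the example). Only the database at the system’s edge is faked, and the assertion is on the JSON a user would actually receive:

import express from "express";
import request from "supertest";
import { expect, test } from "@jest/globals";

// Only the module at the system's edge (the database) is faked.
const fakeUsersRepository = {
  find: async (id: string) => (id === "42" ? { id: "42", emailAddress: "jane@example.com" } : null),
};

// Everything inside the system is wired up the same way as in production.
function buildApp(usersRepository: typeof fakeUsersRepository) {
  const app = express();
  app.get("/users/:id", async (req, res) => {
    const user = await usersRepository.find(req.params.id);
    if (user) {
      res.json(user);
    } else {
      res.status(404).json({ error: "not found" });
    }
  });
  return app;
}

test("GET /users/:id returns the user as JSON", async () => {
  const response = await request(buildApp(fakeUsersRepository)).get("/users/42");

  expect(response.status).toBe(200);
  expect(response.body).toEqual({ id: "42", emailAddress: "jane@example.com" });
});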
You can find much more detail in this life-changing (really!) conference talk that forever changed the way I practice TDD.

Second loop: Deployment. Time frame: hours / one day

I’ve now done a few hours of repeating the implementation loop. I should have some functionality that is somewhat useful to a user of the software.
At this point I’d like to put it in front of a customer, and verify that it actually achieves something useful.
In case the change is not that useful yet (for example – it’s implementing one step out of a multi-step process), I’d still like to test and deploy the code, behind a feature gate.

Before deploying, I’d get feedback on the quality of my work.
I’d ask any interested colleagues to review the code I wrote (in case I wasn’t pairing with them this whole time).
Pull / merge requests are standard in the industry, and are a convenient way to showcase changes. But an asynchronous review process is too slow – I’d like to get my changes reviewed and merged faster.
I’d want my teammates to provide feedback in a matter of minutes, rather than hours. And I’ll follow up with a synchronous, face to face conversation, if there’s any discussion to be had.
(In return, I will review my colleagues’ work as quickly as possible as well :))

If the changes are significant, sensitive, or change code that is used in many places, I may ask a teammate to manually verify them as well, or double-check for regressions in other areas.
I may ask a customer-minded colleague, such as a product person, or a designer, to have a look as well.

Once I’ve got my thumbs-up (hopefully in no more than an hour or two) I’ll merge my changes to the mainline branch.
The continuous delivery pipeline will pick that up automatically, package up the code, and run acceptance / smoke tests. After 30-60 minutes, this new version of the software will be in front of customers.

Personal experience: Working in this way meant that I could finish a small piece of work, put it out of my mind, and concentrate on the next one. That’s been immensely helpful in keeping me focussed, and reducing my cognitive load.
Additionally, it’s very helpful in case anything does go wrong in production. I know that the bug is likely related to the very small change I made recently.

Once I’ve finished a discrete piece of work, I need to figure out what to do next.
Getting feedback on our team’s work is the most important thing, so I’ll prioritise the tasks that are closest to achieving that.
Meaning – any task on the team that is closest to being shipped (and so, to getting feedback), is the most important task right now.
So I’ll focus on getting the most “advanced” task over the line. It may be by reviewing a colleague’s work, by helping them get unblocked, or simply by collaborating with them to make their development process faster.
Only if there isn’t a task in progress that I can move forward will I pick up the next most important task for the team, from our prioritised backlog.

Personal experience: The experience of a team working in this way was the same as the individual experience I described above.
As a team, we were able to finish a small piece of work, put it out of our minds, and concentrate on the next one.
We avoided ineffective ways of working, such as starting multiple things at once while waiting for reviews, or long-running development efforts that are harder to test and to review. We always had something working to show for our work, rather than multiple half-finished things.
Working in this way also helped the team collaborate more closely, focussing on the team’s goals.

Third loop: Development Iteration. Time frame: 1-2 weeks

We’ve now done a few days of repeating the deployment loop. We should have a feature or improvement that is rather useful to a user of the software.
The team would speak to users of the software, and hear their feedback on it. Preferably in person.
Even if the feature is not “generally available” yet, “demo-ing” the changes to customers is still valuable.

The feedback from customers, as well as our team’s plans, company goals, industry trends etc. will inform our plans and priorities for the next iteration. The team (collaboratively, not just “managers” or “product owners”) will create its prioritised backlog based on those.

This point in time is also a good opportunity for the team to reflect and improve.
Are we happy with the value we delivered during this iteration? Was it the right thing for the customer? Are we satisfied with its quality? The speed at which we delivered? It’s a good point to discuss how we can deliver more value, at higher quality, faster, in the future.
What’s stopping us from improving, and how can we remove those impediments?
We can use metrics, such as the DORA “4 key metrics” to inform that conversation.

We plan and prioritise actions to realise those improvements.
(Some examples of such actions: improvements to the speed and reliability of the CI / CD pipeline; improvements to the time it takes to execute tests locally; simplifying code that we found hard to work with; exploring different ways to engage with customers and get their input; improvement to our monitoring tools to enable speedier detection and mitigation of production errors.)

We can also create, and reflect on, time-bound “experiments” to the way we work, and see if they move the needle on the speed / quality of our delivery (examples of such experiments: pair on all development tasks; institute a weekly “refinement” meeting with the whole team; have a daily call with customers…).

Personal experience: I only have “anti-experiences” here, I’m afraid. I’ve worked in many “agile” flavours, including many forms of scrum and kanban. I haven’t found any one system to be inherently “better” than the others.
I did find common problems with all of them.

The issue with agile that I observed in ~100% of the teams I’ve been on, is this:
we use some process blindly, without understanding why, or what its value or intended outcome is. We’re not being agile – we’re just following some arbitrary process that doesn’t help us improve.

My ideal process would involve a team that understands what it is we’re trying to improve (e.g. speed / quality / product-market fit).
We understand how our current process is meant to serve that. We make changes that are designed to improve our outcomes.
In that case, it doesn’t matter if we meet every day to talk about our tasks, or if we play poker every 2 weeks, or whatever.

So, what do you think?

This list is incomplete; I can go on forever about larger feedback loops (e.g. a quarterly feedback loop), or go into more details on the specifics of the principles and processes. It’ll never end. I hope I’ve been able to capture the essence of what’s important (to me) in a software development process.


What’s your opinion? Are these the right things to be aspiring to? Are these achievable? What have I missed?
Let me know in the comments.

Stop assuming your future self is an idiot (an alternative to YAGNI)

I have been aware of, and even talking about, YAGNI (“You ain’t going to need it”), and the dangers of “future-proofing” for a long while. But not until recently have I actually applied this principle in earnest.

Trying to understand what took me so long, I took note of what makes other developers hesitant to apply this principle.

When I try to get others to practise YAGNI, I find the same reluctance that I myself have shown.
When I say to a team member (or to my younger self) “you ain’t going to need it”, the answer is always “yes, you’re probably right. But what if…??”.

And there’s no good way to answer that. I can’t prove that, in every possible future universe, we will never need this code.
Thus YAGNI fails to convince, and the redundant code stays.

I think I’ve been able to find a better argument, though.

My solution has been to play along with this thought experiment.

“OK, so what if we don’t put that ‘future-proof’ code there right now?
And suppose we do find out, in the future, that we do need to make that change?
Would that be such a disaster?
Or, if that happens, we can then make the change that you’re proposing now, right?
And even better – at that point, we’ll have more information and ability to make the right sort of change.”

I’ve had much better success with this line of argument. We realise that future us are better equipped to deal with this change than present us.
We “just” need to believe in ourselves.

..About that “just”

So why don’t we, by default, believe in future us?
Why don’t we believe that future us can make that change just as well as present us?

I actually already alluded to this in a previous post, about being scared of changing code:
Loss of context and lack of confidence are the main issues here.

Context

We know that at this point in time, when we’re well-versed with this part of the code, we can see a good (future) solution.
However, we’re not confident that we’ll see it in the future, when we maybe don’t remember everything about this area of the codebase.

It’s easy to see where this sentiment comes from. Many times when we read past code, we’re not confident that we 100% understand it. So why should the code we’re writing now be any different?

It’s taken me many years to have enough self confidence to counter that.
No, I am competent enough that when I do come back to this in the future, I will understand it well enough.

And I’ll make sure of that by leaving some clues for myself – clear names, easy to understand design, descriptive tests, etc.

(An important side point here is about continuity of knowledge.
The person, or team, that authored the original code, would only need a reminder of what it does, and how.
But a different person / team will have much much lower confidence in understanding the code.
The amount of clues – good names, comments, tests etc. would have to be even higher for them.)

Confidence

By this I don’t mean self-confidence, but the confidence in our changes. That we won’t be breaking anything.

For that, a good test suite, good monitoring and remediation tools are required.
But especially needed is a high degree of psychological safety. The confidence that, if we do end up breaking something, we won’t be punished for it.

Conclusion

Saying “YAGNI” is often not convincing enough. Many people’s response to that is that “Well, we can’t know that for certain. And when we will ‘need it’, then it’ll be too late!”.

I propose a more convincing argument – “We Can Always Change It Later”, or “WCACIL” (pronounced… er… however you want to pronounce it).

This argument needs to be supported by a framework that makes future change less scary:
Tests, documentation, simple design, and a safe environment.
And also, maybe a prod from more experienced team members who’ve done it before.

The best tool for the job is the tool you know how to use

A recurring cliche at tech companies is that they use the “right tool for the job”. This is meant to show that the company is pragmatic, and not dogmatic, about the technology they use. It’s supposed to be a “good thing”.

For example – “We’re predominantly a nodeJS shop, but we also have a few microservices in golang”. Or (worse), “We let each team decide the best technology for them”.

I don’t agree with that approach. There are benefits realised by using, say, golang in the right context. But they are dwarfed by some not-so-obvious problems.

A “bad tool” used well is better than a “good tool” used badly

In most cases, an organization has a deep understanding, experience, and tools in a specific technology.

Suppose a use case arises where that specific technology isn’t the best fit. There’s a better tool for that class of problems.

I contend that the existing tech stack would still perform better than an “optimal”, but less known, technology.

There are two sides to this argument –

1. The “bad” tool isn’t all that bad

Most tech stacks these days are extremely versatile.

You could write embedded systems in javascript, websites with rust, even IoT in ruby..

It wouldn’t work as well as the tools that are uniquely qualified for that context. But it can take you 80% of the way there. And, in 80% of cases – that’s good enough.

2. The “good” tool isn’t all that good

I mean – the tool probably is good. Your understanding of it is not.
How to use it, best practices, common pitfalls, tooling, ecosystem, and a million and one other things that are only learned through experience.

You would not realise the same value from using that tool as someone who’s proficient in it.

Even worse – you’ll likely make some beginner mistakes.

And you’ll make them when they have the most impact – right at the beginning, when the system’s architecture is being established.

After a few months, you’ll gain enough experience to realise the mistakes you’ve made. But by then it’ll be much harder, or even infeasible, to fix them.

There are some other issues with using a different technology than your main one:

Splitting the platform

Your organization has probably built (or bought) tooling around your main tech stack. They help your teams deliver faster, better, safer.

These tools will not be available for a new tech stack.

New tools, or ports of existing tools, will be required for the new tech stack.

The choice would be to either:

  • Invest the time and resources in (re)building (and maintaining) ports of the existing tools for the new technology, or
  • Let the team using the new technology figure it out on their own.

In either case, this will result in a ton of extra work. Either for the platform / devX team (to build those tools), or for the product team (to solve boilerplate problems that have already been solved for the main tech stack).

Splitting the people

There’s a huge advantage to having a workforce that are all focused on a single tech stack. They can share knowledge, and even code, very easily. They can support each other. Onboarding one team member into a different team is much easier.

That means that there’s a lot of flexibility whereby people are able to move around teams. Maybe even on a temporary basis, if one team is in need of extra support.
This is made much more difficult when there are different technologies involved.

Hiring may become more difficult also, if different teams have vastly different requirements.

What can I do if my main tech stack really is unsuitable for this one particular use case?

A former colleague of mine, in a ruby shop, had a need to develop a system that renders pixel-perfect PDFs.
They found that ruby lacked the tools and libraries to do that.
On the other hand – java has plenty of solid libraries for PDF rendering.

So they did something simple (but genius) – they ran their ruby system on the JVM.
This allowed them to use java libraries from within ruby code.
Literally the best of all worlds.

This is not unique to my colleague’s case, though.
You can run many languages on the JVM, and benefit from the rich java ecosystem.
You can call performant C or Rust code from ruby, python, .NET, etc.

It’s possible to use the right tool at just the right place where it’s needed, without going ‘all-in’.

What can I do if I can’t get away with using my familiar tools?

Your existing tools probably cover 80% of all cases. But there will always be those 20% where you simply have to use “the right tool”. So let’s think about how to mitigate the above drawbacks of using an unfamiliar tool.

The most obvious option is to buy that familiarity: Bring in someone from the outside who’s already proficient with this tool. This can be in the form of permanent employees, or limited-time consultancy / contractors.

There’s a problem with any purchased capability, though.
They may be an expert in using the tool, but they are complete novices in using it in your specific context.
While they won’t make the beginner mistakes with the tool, as mentioned above, they’ll likely make beginner mistakes regarding your specific domain and context.

For this reason, I’d try and avoid using the consultancy model here. Firstly – they won’t have enough time to learn your domain. Secondly – your team won’t have enough time to learn the tool, to see where it doesn’t fit well with your domain.

Even hiring in full-time experts should be done with caution. They, too, will have no knowledge of your specific business context to begin with.
It may seem like a good idea to hire a whole team of experts, that can get up and running quickly. But consider pairing them with existing engineers with good understanding of your product and domain. The outside experts can level-up the existing engineers on the technology. The existing engineers can help the experts with context and domain knowledge.

It may seem slower to begin with, but can help avoid costly mistakes. And has the benefit of spreading the knowledge of the new tech stack, raising its bus factor.

Exceptions to the “rule”

Like any made-up-internet-advice, the position I outlined above is not a hard and fast rule.
There are cases where it would make complete sense to develop expertise in a technology that is not your core competency.

The most obvious example would be a new delivery method for your services: If you want to start serving your customers via a mobile app, for example. Then building the knowledge and tools around mobile development makes perfect sense.
Or creating an API / developer experience capability, if you want to start exposing your service via a developer API / SDK.

Or, if you’re a huge organization with thousands of developers. You’ll naturally have so many employees that have prior experience with different technologies. In that case you may find many of the issues outlined here do not apply.

In summary

Going all-in on a technical capability can have many benefits.
Richness of tools, flexibility of developers being able to move around different codebases, knowledge sharing, and more.
It makes sense to try and preserve that depth of expertise, and not to dilute it by bringing in more technologies and tools into the mix. Today, with every technology being so broad and multi-purpose, it’s easier to do than ever.

And remember – “select isn’t broken”: Many times I thought that some technology “cannot do” some task. Only to find out that it can, actually, do that. It was just that I couldn’t.

Stop lying to yourself – you will never “fix it later”

Recently I approved a pull request from a colleague, that had the following description: “That’s a hacky way of doing this, but I don’t have time today to come up with a better implementation”.
It got me thinking about when this “hack” might be fixed.
I could recall many times when I, or my colleagues, shipped code that we were not completely happy with (from a maintainability / quality / cleanliness aspect, sub-par functionality, inferior user experience etc.).
On the other hand, I could recall far far fewer times where we went back and fixed those things.

I’ve read somewhere (unfortunately I can’t find the source) that “The longer something remains unchanged, the less likely it is to change in the future”.
Meaning – from the moment we shipped this “hack”, it then becomes less and less likely to be fixed as time goes on.
If we don’t fix it today, then tomorrow it’ll be less likely to be fixed. And even less so the day after, the week after, the month after. I observed this rule to be true, and I think there are a few reasons for it.

Surprisingly, it isn’t because we’re bad at our jobs, unprofessional, or simply uncaring.
It’s not even because of evil product managers who “force” us to move on to the next feature, not “allowing” us to fix things.

There are a few, interconnected, reasons:

Loss of context and confidence

The further removed you are from the point where the code was written, the less you understand it. You remember less about what it does, what it’s supposed to do, how it does it, where it’s used, etc.
If you don’t understand all its intended use cases, then you’re not confident that you can test all of them.
Which means you’re worried that any change you make might break some use case you were unaware of. (Yes, good tests help, but how many of us trust our test suites even when we’re not very familiar with the code?)

This type of thinking leads to fear, which inhibits change.
The risk of breaking something isn’t “worth” the benefit of improving it.

Normalization

The more you’ve lived with something, the more used you are to it.
It feels like less and less of a problem with time.

For example – I recently moved house. In the first few days of unpacking, we didn’t have time or energy to re-assemble our bed frame.
It wasn’t a priority – we can sleep just as well on a mattress on the floor. There are more important things to sort out.
We eventually did get round to assembling it. SIX MONTHS after we moved in.
For the first few days, it was weird walking past the different bits and pieces of bed; laying on the floor.
But we got used to it. And eventually, barely thought about it.

Higher priority

This is a result of the previous two reasons.
On the one hand, we have something that we’re used to living with, which we are afraid to change.
We perceive it as high risk, low reward.
On the other hand, we have some new thing that we want to build / improve. There’s always a new thing.
The choice seems obvious. Every single time.

You’re now relying on that bad code

Even though we know that this code is “bad”, we need to build other features on top of it.
And we need to do it quickly, before there’s a chance to fix this “bad” code.
So now we have code that depends on the “bad” code, and will probably break if we change the bad code.

For example, we wrote our data validation at the UI layer. But we know that data validation should happen at the domain layer. So we intend to move that code “later”.
But after a while, we wrote some domain-level code, assuming that data received from the UI is already valid.
So moving the validation out of the UI will break that new code.
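A sketch of that kind of coupling (all the names here are invented):

// The "temporary" UI-layer validation:
function handleSignupForm(input: { email: string }) {
  if (!input.email.includes("@")) {
    throw new Error("invalid email"); // validation lives in the UI layer "for now"
  }
  registerUser(input.email);
}

// Later, domain-level code quietly starts relying on it:
function registerUser(email: string) {
  // Assumes email has already been validated - no re-validation here.
  const mailDomain = email.split("@")[1]; // would misbehave if given an unvalidated value
  console.log(`provisioning account for the ${mailDomain} domain`);
}

Move the validation out of handleSignupForm without first adding it to the domain layer, and registerUser is one unvalidated input away from breaking.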

A more serious, and “architectural” example:
“We’re starting out with a schema-less database (such as mongoDB), because we don’t know what our data will look like. We want to be able to change its shape quickly and easily. We can re-evaluate it when our data model is more stable”.
I’ve worked at 3 different companies that used this exact same thinking. What I found common to all of them is:
1. They’re still using mongoDB, years later. 2. They’re very unhappy with mongoDB.
But they can’t replace it, because they built so much functionality on top of it!

What’s the point, then?

So, we realise that if we don’t fix something right away, we’re likely to never fix it. So what? Why is that important?

Because it allows us to make informed decisions. Up until now, we thought that our choice was “either fix it now, or defer it to some point in the future”.
Now we can state our options more truthfully – “either fix it now, or be OK with it never being fixed”. That’s a whole new conversation. One which is much more realistic.

We can use this knowledge to inform our priorities:
If we know that it’s “now or never”, we may be able to prioritise that important fix, rather than throwing it into the black hole that is the bottom 80% of our backlog.

We can even use this to inform our work agreements and processes.
One process that worked pretty well for my team in the past, was to allocate a “cleanup” period immediately after each project.
The team doesn’t move on to the next thing right away when a feature is shipped. But rather, it has time to improve all those things that are important, but will otherwise never be fixed.

If we keep believing that a deferred task will one day be done, we’ll never fix anything.
If we acknowledge that we only have a small window of opportunity to act, we can make realistic, actionable plans.

(Case in point: I thought about writing this post while in the shower yesterday. When I got out, I told my wife about it. She said “Go and write this post right now. Otherwise you’ll never do it”.
And she was right: Compare this written and published post to the list full of “blog post ideas” that never materialized. I never “wrote it later”.
I’m going to take my own advice now, and delete that list entirely.)

Business analysts hate him: how to get the business to give you the correct requirements right from the start

In a recent episode of the advice show “soft skills engineering”, a listener described some of their reasons for hating their job:

developers who need me to Google for them, business people who don’t understand how to provide requirements

I’d like to focus on “business people who don’t understand how to provide requirements”.
That’s an issue that used to frustrate me to no end earlier in my career.
I would receive a work ticket describing functionality to be implemented, complete with a UI design mockup.
I would go into my hole, and emerge a few days / weeks later with everything working as described.
Then, to no-one’s surprise, I would be assigned a new task. “move that button from the bottom of the screen to the top”. Or “limit this set of functionalities to admin users only”. Or “for premium users, the calculation should be different”.
And then, also to no-one’s surprise, I’d get annoyed.

“But you signed off on the mockup that had the button at the bottom!” or “Now I have to completely re-write the permissions mechanism for this!” “What’s changed between when you wrote this spec, and now??! Couldn’t you have thought a bit harder and given me the correct requirements the first time?”

It seems that the question asker and I are kindred spirits.

The first thing podcast hosts Dave and Jamison said is this: If the question asker were to change jobs, they would find that this is still happening at their new job as well.

Too right, I say! Aren’t those business people stupid, hurr durr…

They also said –

No matter how clearly they document requirements, there’s always some ambiguity.

It’s unreasonable to expect business people to come to you with even somewhat fleshed-out correct requirements

Developers underestimate how extremely challenging it is to translate problems that people observe in the real-world into software requirements that can be implemented

You need to talk to them [people who provide requirements] more. If they give you requirements that you are concerned about, you just talk to them about it and ask them:
Why do they think this is the right thing? How certain are they that the requirements aren’t going to change? What if it takes longer than they expect?

These are all great observations. But I’d like to expand on them a little bit, since this is an important issue that makes life miserable for many developers.


So now, I’m ready to reveal the secret. Something that I’ve learned after many, painful experiences as a software developer. Are you ready?

How to get business people to figure out what they actually want

Before we dive into the answer, let’s try a few thought experiments. These are designed to get us into the head of the business people, and try and think like they think (don’t worry! it’ll only be temporary. You’ll be back to thinking rationally in a minute).

Experiment #1: Have you ever changed your mind about something?

Has it ever happened that you thought that one thing was the best option, but, upon seeing what it actually looks like, you realised it wasn’t? For example –

  • You thought that the big armchair would look best next to the chaise-longue, in the drawing room. But then you realised that it was the most comfortable chair in the house, so it should go in front of the TV.
  • Or, you thought that there is no way that sweet pineapples belong on savory pizza. That’s insanity! But then, when you tasted it this one time, it totally worked!
  • Or, you thought it would be a killer photo if you and your boyfriend pretended to hold the leaning tower of Pisa up between you. But when you posted it on instagram, it got no likes at all.

Do you see where I’m going with this? It’s impossible to know whether something would work for you before actually trying it.
If you’re still not convinced, try this one –

Experiment #2: Can you provide the right requirements upfront?

Imagine a software development process consisting of these stages:

  1. Write requirements
  2. Create software design (e.g. classes and modules) based on those requirements
  3. Implement the design

Now imagine that you’re responsible for step 2: Given a set of requirements, you provide a set of classes and functions to satisfy those requirements.
Do you think the resulting implementation in step 3 is going to 100% reflect your design?
Or will you find edge cases, unforeseen complications, wrong assumptions, which would mean that the implementation needs to deviate from the design?

The simple truth is that, like Dave and Jamison hinted, there’s no such thing as “the correct requirements upfront”.

So what can we do?

This is a solved problem

20 years ago, a bunch of smart dudes got together, and wrote a manifesto for better software development. Here it is, in (almost) its entirety:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

How does that help us to get the actual right requirements out of those finicky business people?
Let’s see:

  1. Customer collaboration over contract negotiation: don’t waste your time writing specifications that you know will be inaccurate. Instead, find out who has the answers, and work together with them to find the right solution.
    How?
  2. Individuals and interactions over processes and tools: don’t invest time and effort into creating a process for generating and validating requirements.
    Instead – talk to the actual subject matter experts. Get them to explain their problem to you.
    Understand their pain points, and points of view. Raise any questions immediately to them. Get their feedback early and often.
    Get their feedback on –
  3. Working software over comprehensive documentation. Remember – you can’t tell what that Hawaiian pizza tastes like just by reading the recipe. You need to actually try some.
    If you do all that, then you’ll be
  4. Responding to change over following a plan: change is inevitable. Even if you think you like that sofa on that side of the room, you might realise it looks much better on the other side.
    Change will happen. You can choose, like me, to be grumpy and bitter about it. Or, you can accept it, and make sure that it causes you the least amount of disruption and pain.

Some practical tips

Here are some ways I’ve had success putting these principles into practice –

  1. Cut out the middlemen – find out who the actual users / subject matter experts are.
    Hint: They are not your product manager / business analyst.
    Talk directly with the people who personally experience the business problem. Understand what you’re actually trying to solve.
  2. Get the business people engaged – in the first step, you’ve found the people who care most about getting this business problem solved. You’ve listened to them in earnest. Chances are that they now trust that you care about their problems as well.
    It then hopefully shouldn’t be difficult to get them to commit to engage with you.
    Ask them to be available for reviews, questions, etc. Explain that, this way, you’d be able to build what they need faster and better.
  3. Feedback – get their feedback early and often.
    Invite them to your story refinement / estimation meetings. Have regular demos of (some) working software. Change your upcoming tasks based on what they tell you.
  4. Embrace change – changes will happen. When they do, be thankful that you caught it as early as you did.
    If you have a good understanding of the business problem at hand, you should also understand why that change happened. That should make it a bit easier to stomach.

* for cases where you don’t know your users (e.g. a public-facing product), this process will have to change a bit. You’d still want someone who’s a subject matter expert (this may very well be your product manager). Together, find ways to get user feedback without being able to talk to all the users.
For example – talking to a sample set of users, or using A/B tests.

The nice thing about these steps is that anyone can take them. You don’t have to be a manager / lead: All you’re doing is talking to some of your colleagues in other departments.
Unless you work for a very broken organization, nobody would tell you off for simply talking to other people (if they do – consider changing jobs…)

Closing thought: But I’m a developer, why are you asking me to do a PM / BA’s job??

That’s a common misgiving developers have about the approach I presented above.
So, what do you think is a developer’s job? If it’s only about converting requirements into software, then our job description would be something like:

Write code to fulfill given requirements

Maybe that’s what you think your job is. And, for junior developers, it probably is.
I propose, though, that for senior developers, this job description is more accurate:

Provide business value by creating software

How is this different from the previous definition? The replacement of “requirements” with “business value” is subtle, yet profound.


Imagine that you 100% accurately implement the given requirements.
But then, no user is buying that functionality. Or, the internal users still use Excel to perform this task, because your solution doesn’t cover all cases.
You have fulfilled all requirements. But have you done your job?
I’d say no. You’ve created software, but that software hasn’t provided much value.

As we progress in our careers, we’re expected to have a larger impact on our company, to justify that larger paycheck we’ve come to expect.
And it’s not only on the ‘how’: less buggy, and more maintainable, software. It’s also on the ‘what’: making sure that the software we create serves the needs of our employer.

Testing anti-pattern: The soviet police-station


There are many good reasons to write automated tests:

They can help drive, or provide feedback on, the implementation.
They act as living documentation / proof of what the program does.
They prevent us from putting bugs in front of users.

But, to me, the most valuable purpose of automated tests is the confidence that they inspire.
When I have a suite of good automated tests, I’m confident it will tell me about any bugs I introduce before my users do.
It gives me the confidence to change my code without worrying about breaking it.
In other words – the tests enable confident refactoring (defined as changing the code’s internals, to make it more readable / extensible etc., without changing its behaviour).

That’s why I was quite surprised when my colleague, Nitish, pointed out how the tests I was writing were hindering, not helping, refactoring.
Here’s a short, completely made-up, example:

As part of our e-commerce application, we have a shopping cart, where we place items. We can then calculate the total amount to be paid for all the items in the cart:

class Cart
  def initialize(items)
    @items = items
  end

  def total
    cost = @items.map(&:price).sum
    shipping = Shipment.new(@items).cost
    return cost + shipping
  end
end

Fairly straightforward: the total to pay is the sum of all items’ prices plus shipping costs.
The calculation of shipment method and cost is delegated to the Shipment class.

class Shipment
  def initialize(_items)
  end
  def cost 
    15 # TODO: some complex logic that takes into account number, and size of, items
  end
end

Now, as is inevitable, some evil “business” person decides to ruin our beautiful code.
They tell us that premium users will get a discount if they purchase more than 5 items.

OK, so technically, this means that the code to handle Shipment now has to know about the user that owns the cart. Let’s make that change.
Staying true to Kent Beck’s “make the change easy, then make the easy change”, and working in small, safe steps, we make a small change:

class Cart
  def initialize(items, user = nil)
    @items = items
    @user = user
  end

  def total
    cost = @items.map(&:price).sum
    shipping = Shipment.new(@items, @user).cost # provide user object to shipment
    return cost + shipping
  end
end

class Shipment
  def initialize(_items, _user)
  end
  def cost 
    15
  end
end

This is a textbook refactoring: the behaviour of the code hasn’t changed. Shipment isn’t even reading the new user parameter yet. The code returns the same result as it did before.

However, a test now fails:

#<Shipment (class)> received :new with unexpected arguments

Let’s have a look at that failing test:

RSpec.describe Cart do
  describe '#total' do
    before do
      # `Shipment` is a complex class; 
      # we don't want the tests for `Cart` to fail whenever `Shipment` changes
      # It's enough to just validate that the correct interaction happens
      fake_shipment = instance_double(Shipment, cost: shipment_cost)
      allow(Shipment).to receive(:new).with(items).and_return(fake_shipment)
    end

    let(:items) { [Item.new(price: 5), Item.new(price: 10)] }
    let(:shipment_cost) { 20 }

    it 'is the sum of all prices plus shipping costs' do
      expect(Cart.new(items).total).to eq(5+10+20)
    end
  end
end

Can you spot the problem here?

The fake Shipment object that is set up in the test doesn’t yet account for the new user argument.
No problem, just add some more mocking and we’ll be off on our merry way:

allow(Shipment).to receive(:new).with(items, user).and_return(fake_shipment)

That’s what I’d done for years – often with much more complex class interactions, requiring many more changes to the tests. Until Nitish pointed out that this is a waste of time.

My tests were essentially a copy-paste of my production code.
I only sprinkled some doubles, allows, and expects on top.
My tests were not going to find bugs, because every interaction was faked and would always return a constant value.
And, as we just saw, they were getting in the way of, rather than helping with, refactoring.
He called it the “Soviet police-station” style of testing.
The process to work with these tests is like so:

  • Refactor production code.
  • Observe some tests go red.
  • Take those tests to the back room.
  • Beat the sh*t out of them to make them look like the production code.
  • Tests finally confess to being green.

In other words – when a test fails, the “solution” is to copy over the implementation into the test, rather than fix an actual bug.

What purpose do those tests even serve?

They may have helped us drive the implementation at one point.
But they’re documenting the implementation, rather than what the application actually does.
They will certainly never find any bugs, because they’re just faking everything.

They will either stay green forever, or, if they do go red – only make us beat them up again rather than fix anything in our application.

Test behaviour, not functions

What’s the alternative to the soviet police-station type of useless testing?
We can verify the desired behaviour of the system, rather than individual functions and methods.
For example, instead of the test

Cart 
   #total 
      is the sum of all prices plus shipping costs

we can have something like

Shopping 
   inserting an item into the cart 
      updates the total to be the sum of all prices plus shipping costs

What’s the difference?

We’re not interested in testing a specific method (total) on a specific class (Cart). This is an internal detail of our implementation. It’s not what the users of our software care about.
What they do care about is, when they put an item into the cart, then they can see an updated total amount.
So long as this holds true, then the implementation details are irrelevant.
Our test should remain green, regardless of the implementation we choose.
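
For illustration, here’s a rough sketch of what that behaviour-level test could look like. It uses the real `Cart` and `Shipment` classes from above (and the same `Item` value object as the earlier test), with no doubles at all – so it keeps passing across refactorings like the one that broke the mocked test:

RSpec.describe 'Shopping' do
  describe 'inserting an item into the cart' do
    it 'updates the total to be the sum of all prices plus shipping costs' do
      # Real collaborators throughout – nothing is stubbed, so changing how
      # Cart and Shipment talk to each other doesn't break this test.
      cart = Cart.new([Item.new(price: 5), Item.new(price: 10)])

      # 15 is the flat shipping cost currently hard-coded in Shipment.
      expect(cart.total).to eq(5 + 10 + 15)
    end
  end
end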

(There’s much more to be said about this, and Ian Cooper said it much better in this life-changing conference talk.)

How does this work in practice?

What would be a good implementation of a test that verifies the application’s behaviour?
Well, that depends on our architecture.
If we use something like ports and adapters / hexagonal architecture, then our application’s control flow would look something like:

user request (e.g. click on UI, API request) --> adapter (e.g. web controller) --> port (e.g. domain service) --> core application code

In this case, a natural place to test the logic of our application will be the port.
The port exposes an API to accomplish a certain task (such as “add item”, “remove item” or “get cart total”). This is exactly what the user is trying to do.
It’s also decoupled from any fast-changing presentation / transport concerns.
Testing at the port level allows us to test the entire flow of a user request, which is just what we want:
We want to test the behaviour of the application, as seen by the user.
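
To make that a bit more concrete, a hypothetical port for the shopping example might look something like the sketch below. None of these names come from a real codebase – the cart repository, and the `add_item` method on `Cart`, are assumptions made for illustration:

class ShoppingService
  def initialize(cart_repository)
    @cart_repository = cart_repository
  end

  # Each public method corresponds to a task the user is trying to accomplish;
  # behaviour tests target these methods directly.
  def add_item(cart_id, item)
    cart = @cart_repository.find(cart_id)
    cart.add_item(item)
    @cart_repository.save(cart)
  end

  def cart_total(cart_id)
    @cart_repository.find(cart_id).total
  end
end

A web controller (the adapter) would do little more than translate HTTP parameters into calls to this class, so tests written against ShoppingService exercise the whole user-visible flow without touching HTTP, HTML or JSON.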

What if I’m using a different architecture?

Then you’re wrong, and you should re-architect your application.
Just (mostly) kidding!

For other architectures, we’ll need to figure out where our “business logic” begins.
This is the point in the code that we want to verify.
It’s quite possible that this beginning is intermingled with presentation concerns (for example, controllers in Rails’s implementation of MVC).

In this case, one option is to bite the bullet and test everything – the business logic, intermingled with other logic.
This is my default technique when using Rails – I use the built-in methods for testing Rails controllers.
It works reasonably well, but it can break down in a couple of instances:

  1. When the ‘non-business’ (e.g. presentation) logic keeps changing.
    For example – the calculation doesn’t change, but the HTML generated by the controller does.
    In this case, we’re forced back to our old role as soviet cops, beating up the tests until they pass.
  2. Performance issues.
    Operations such as serializing JSON or rendering HTML may be time-consuming. If these are included as part of the executed code in tests, then the tests may become slow.

To solve those issues, it’s possible to introduce a “port” (for example a “domain service” or “action” class). This leaves behind the presentation (or whatever) logic, and encapsulates all the ‘business’-y logic in a more testable class.
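
As a sketch (assuming a typical Rails controller; all the class, method and helper names here are made up for illustration), that extraction might look like:

# The 'port': a plain Ruby object holding the business logic.
# It knows nothing about params, rendering or HTTP.
class AddItemToCart
  def call(cart, item)
    cart.add_item(item)   # assumes Cart exposes an add_item method
    cart.total
  end
end

class CartsController < ApplicationController
  def add_item
    cart = current_cart                            # hypothetical lookup helper
    item = Item.new(price: params[:price].to_i)
    # The controller only parses input, delegates, and renders:
    render json: { total: AddItemToCart.new.call(cart, item) }
  end
end

Our tests can now target AddItemToCart directly, which sidesteps both problems above: they don’t care how the controller renders its response, and they don’t pay the cost of rendering it.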

P.S. we can still write isolated unit tests

The fact that we prefer behavioural tests doesn’t mean that we can’t also test in isolation, where it makes sense:

  1. Complex pieces of logic may be easier and clearer to test separately. (Think about the logic to calculate shipment costs, above: it could depend on a huge number of variables, such as the number of items, their size, who the user is, the delivery address…)
  2. If we want to use mocks and stubs to drive our design (aka the “London” school of TDD).
    That’s a useful technique for discovering the implementation, and how the class design and interactions may look.
    Personally, I’d consider deleting these tests afterwards. They’ve already served their purpose of guiding our design. From now on, as we’ve seen, they will only serve to hinder changes to that design. They’re certainly not going to find any bugs.

Closing words

Tests should be an enabler for well-maintained code.
Tests should give us confidence that, if they are passing, then the application works as intended.
Additionally, they shouldn’t get in the way of changing the code.

Tests that verify implementation hinder us from changing our code, as well as being less confidence-inspiring.
Tests that verify behaviour enable us to change and evolve our code, as the application’s functionality, and our understanding of the domain, evolve.

Mock-driven development

There are lots of different ways to fake behaviour in tests -
you've got your mocks, your spies, your stubs, your doubles etc. 
They are all (subtly) different. 
So let me say, right off the bat - I can never remember which is which. 
I'm going to use these terms interchangeably, and in some cases - wrongly. 
If you think this stuff is important, please let me know in the comments.


A colleague of mine invited Tim Mackinnon to an informal drink at our office one day. He introduced Tim as “the co-discoverer of mock objects”.
I felt that it was only common courtesy that I read the man’s work before meeting him.
Reading how Tim and his colleagues used mock objects to make the tests guide the design of their application was eye-opening.

How it works (as I understand it)

In short – the process, termed “needs-driven development” (but I call it “mock-driven development” because, why not?), is this:

  1. Tests for a particular functionality are written first, before implementation. So far this is good ol’ TDD.
  2. Unlike ‘classic’ TDD, we don’t make the test pass by implementing the functionality. Instead, we discover what the code under test needs from the ‘outside world’ to make the test pass.
    Meaning – what are the dependencies that the code under test needs to have in order to do its job.
  3. We then use test fakes to simulate those dependencies, without implementing any of them.

An example from the above article

The authors set out to implement a cache class that falls back to loading objects from the database, if they are not cached.

The first test is: Given that the cache is empty, when key A is requested, then object A is returned.
However, rather than being satisfied with black-box testing such as expect(cache.get('keyA')).to eq(objA), the authors stopped to think not only about the interface (tested above), but also about the design.
They realized that the cache object would need to fetch the objects from storage (as they’re not cached). They identified “fetching from storage” as a distinct responsibility. This responsibility can (and should) be implemented by a separate object.
They named this object “object loader”. And they’ve coded this into their test: expect(object_loader).to receive(:fetch).with('keyA').and_return(objA)
That meant that they were able to get this test to pass without worrying about how to actually fetch objects from storage. All they had to do was find a way to get a fake object loader into their cache class.

The complete test case looks something like this –

it 'loads an object from storage when the cache is empty' do
  objA = double('object A')   # the object we expect to get back
  object_loader = double('object loader')
  expect(object_loader).to receive(:fetch).with('keyA').and_return(objA)

  cache = Cache.new(object_loader)
  expect(cache.get('keyA')).to eq(objA)
end
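
For illustration, a minimal Cache that satisfies this test might look something like the sketch below (my guess, not code from the article):

class Cache
  def initialize(object_loader)
    @object_loader = object_loader
    @store = {}
  end

  def get(key)
    # Serve from the in-memory store when we can; otherwise fall back to the
    # object loader and remember the result for next time.
    @store[key] ||= @object_loader.fetch(key)
  end
end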

Then, they were able to continue focusing on further functionality of the cache (like caching, eviction policies etc.), without getting sidetracked.
I recommend reading the full article – it’s not nearly as academic and scary as it looks.

A new(ish) way to look at things

I had never used test fakes like that – I’d always used them after the fact:
either when writing tests after I’d written the implementation (insert wrist-slap here), or as a way to speed up slow tests by faking out the slow bits.


When writing code, I’m always trying to think as ‘dumbly’ as possible. As in – when writing class A that uses class B, I pretend that I have no idea how class B works. Looking only at B’s interface helps me decouple A from B, and helps me detect any leaky abstractions that B may expose.
That’s why the ‘mock-driven development’ approach suggested by Tim et al. appealed to me. The notion of “I know I need an X here, but I don’t want to actually think about X for now” is exactly how I like to think.

A simple example

I’ll describe how I used this process to implement a repository that uses AWS S3 as its backing storage.
The required behaviour was: take an object as input, and write it to an S3 bucket.
I started out by defining the steps in the process: serialize the object to string, connect to an S3 bucket, write string to bucket.
By identifying the different steps, the design became apparent.
Using fakes, I was able to write tests that verified these steps:

it 'uses a serializer to serialize the given object' do
  input = MyObject.new
  serializer = double('serializer')
  expect(serializer).to receive(:serialize).with(input)

  described_class.new(serializer).save(input)
end

(Note: I’m using dependency injection to enable me to stub-out the serializer object.)
The only thing I know after writing this test is that the input is serialized using something called `serializer`, that has a `serialize` method. That’s the first step, and I can write some code to implement that.
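
For example, the first slice of production code that this test drives out might look something like this (a sketch – only the finished class appears further down):

class Repository
  def initialize(serializer)
    @serializer = serializer
  end

  def save(obj)
    # For now, serializing the input is all the test demands.
    @serializer.serialize(obj)
  end
end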

Now to implement the next step: connect to an S3 bucket

it 'connects to an S3 bucket' do
  bucket_factory = double('bucket factory')
  expect(bucket_factory).to receive(:get_bucket)
  input = MyObject.new
  serializer = double('serializer')
  allow(serializer).to receive(:serialize)
  described_class.new(serializer, bucket_factory).save(input)
end

Here I realized that connecting to an S3 bucket is its own responsibility, and opted for a factory design pattern. Again, I don’t care at this point how this factory object works.
This was not my original design, though – using a factory was not my first thought.
I actually started by writing a test that verified that the repository connected to S3, by stubbing some of the AWS SDK classes.
However, I found the test setup too verbose and complex. That made me realize that the implementation would be too.
A verbose and complex implementation suggests that there are extra responsibilities to tease out.
(Also – they say “don’t mock what you don’t own”. So, in any case, it would’ve been better to wrap the AWS-specific stuff in a wrapper class and mock that wrapper class)

I realized all that without writing a single line of implementation code. All it took to change my design was changing a few lines of test code; it cost me nothing.
By abstracting-away the interaction with AWS I made my life much simpler, and I was able to stay focused on the original steps defined above.
Later, when I went to implement the factory class, my life was pretty simple, again. I could ignore everything – serialization, coordination etc., and concentrate only on connecting to S3.
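
For illustration, here’s a hypothetical sketch of what that factory (and the wrapper it returns) could look like – none of this is code from my actual project, and the key scheme is made up. Aws::S3::Resource comes from the aws-sdk-s3 gem:

require 'aws-sdk-s3'
require 'securerandom'

# The factory hides the AWS SDK behind the tiny interface the repository needs.
class S3BucketFactory
  def initialize(bucket_name)
    @bucket_name = bucket_name
  end

  def get_bucket
    BucketWrapper.new(Aws::S3::Resource.new.bucket(@bucket_name))
  end
end

class BucketWrapper
  def initialize(s3_bucket)
    @s3_bucket = s3_bucket
  end

  # `put` is the only operation the repository uses; the random key is purely
  # illustrative.
  def put(serialized)
    @s3_bucket.put_object(key: SecureRandom.uuid, body: serialized)
  end
end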

The final step is to use the bucket returned from the factory to write the string returned by the serializer:

it 'saves the serialized object to an S3 bucket' do
  bucket_factory = double('bucket factory')
  bucket = double('S3 bucket')
  allow(bucket_factory).to receive(:get_bucket).and_return(bucket)
  input = MyObject.new
  serializer = double('serializer')
  allow(serializer).to receive(:serialize).and_return('serialized object')

  expect(bucket).to receive(:put).with('serialized object')
  described_class.new(serializer, bucket_factory).save(input)
end


And that’s the repository tested!
Now, take a second to try and imagine what the code that satisfies these tests looks like. Don’t peek below!

Is this what you had in mind?

class Repository
  def initialize(serializer, bucket_factory)
    @serializer = serializer
    @bucket_factory = bucket_factory
  end

  def save(obj)
    serialized = @serializer.serialize(obj)
    bucket = @bucket_factory.get_bucket
    bucket.put(serialized)
  end
end

Analysis

The resulting code looks laughably simple.
Why did I go to all that trouble with faking, testing etc. just for this, frankly trivial piece of code?
The reason is – those tests aren’t the artefacts, or the result, of my coding process. They are the coding process.
If I hadn’t used them to guide my coding, my code would’ve looked completely different. Here are some of the advantages I saw with this process, and its result:

  • It encouraged me to separate responsibilities to other objects.
    With black-box testing it could be tempting to implement everything inside the same class. (Note: the red-green-refactor cycle of TDD should help me get a similar result, as I would’ve refactored my implementation once it was working.
    However, using mocks removed a lot of the friction from refactoring, as I only had to change little bits of test code, and no production code.)
  • I defined the APIs of the collaborating objects before even implementing them. That means that, by definition, those APIs don’t expose any implementation details. The result is very loosely-coupled code.
    (In fact, we went through several different types of serializers, but our tests never changed)
  • The tests are super fast: Finished in 0.00524 seconds (files took 0.05876 seconds to load)
  • The code and tests are isolated; the only dependency required to run the above code and tests is `RSpec` (testing library)

There is, of course, one very glaring problem:
The tests for my repository class are all passing, but it doesn’t actually persist any objects!
I still need integration tests that verify that the class is doing what it’s meant to do.
However, with extensive unit tests, the slower, more brittle, integration tests can be limited to a narrow set of scenarios.

Conclusion

For a while now, I’ve found that writing tests first has helped me define my code’s interfaces more clearly.
Having tests as the first users of the code allowed me to look at it from the outside. That meant that the needs of the code’s users, rather than its implementation details, determined how the code was used.

Using mocks to drive the implementation takes this process one step further – I can now define other objects’ interfaces, driven by the user of those objects (which is my code under test).
In the example above, I defined the interfaces of the `Serializer` and `BucketFactory` classes while thinking of the `Repository` class.

The next thing I’d like to think about, is the long-term value of these unit tests:
Now that I have my nice, loosely-coupled design, have those tests served their purpose? Do they provide value anymore?