barbarianmeetscoding

hackerz edition

Thoughts on Unit Testing and TDD: Test Behavior, Not Implementation

| Comments

Ladies and gentlemen: The blog post you are about to read is based on a true story/e-mail. Only the names of classes and methods have been changed to protect the privacy and innocence of the source code. The facts remain unchanged and the contents unadultered…

Disclaimer: I love automated testing and I have been a practicioner of TDD since 2010.

When I refer to unit tests in this article I use the most strict definition of unit tests, that is, test a unit of code – a public method of a class – in complete isolation – by stubbing or mocking all its dependencies. When I refer to integration tests I mean testing classes that collaborate together. When I refer to acceptance tests I mean end-to-end tests.

Last Friday I sent a code review (what a wonderful invention the code review) to a friend and colleague, let’s call him D… hmm… Dumbledore! Yes! That will do. I sent a code review to Dumbledore and continued coding away. After a brief period of time Dumbledore dilligently went through my code and sent me the review back. I impatiently opened it and read:

It looks like you are testing an implementation detail.

Dumbledore - True Story

Which started a great conversation about programming in general (what a wonderful invention the code review) and about testing behavior vs testing implementation in particular. I kept thinking about the testing behavior not implementation conundrum after work which lead me to write an email the day after and this blog post today. I think that some times I need time to think before I can articulate a good answer to stuff. Particularly with some techniques and practices that I have been doing for a while and I have internalized so much that I almost don’t question any more.

Here is what I concluded after some thought. The email read like this (foreground slowly fades into the past, the screen now in fuzzy shades of gray, Gmail):

I think that indeed the most ideal objective when testing is to test behavior, that’s what you really want to focus on when writing automated tests because it is the only real/final type of test that is going to tell you whether or not your application behaves in the way you want. But that means that, to really test behavior in complex systems you are going to need to write acceptance/integration tests (black-box tests), that is, expensive tests (expensive to setup and expensive to run). For instance, in the particular case that we were talking about in the code review, in order to write a real test of behavior I would have needed to write a black box test that would have tested the service, the inbox manager, any other inbox manager collaborators and probably a repository, NHibernate and the database. This is a huge investment and that’s why I usually like to write unit tests (test unit of code in isolation) instead when I am developing, particularly if I feel that time is running out XD (although in the best case scenario I would write an acceptance test and then TDD my way into a working implementation).

It is quite difficult to test behavior when writing a unit test for a complex class that has many collaborators, that in turn have more collaborators, that have in turn yet more collaborators. For instance,

  1. If you have a calculator with an add method that takes a couple of integers, then no problem, you can easily test behavior.
  2. If you have a console app that takes string arguments and adds them (string calculator kata?), you may have a class that does the arithmetic operations and an argument parser. You can test both individually for behavior and the whole console app with black box testing. Still no problem.
  3. If you have a huge application with tons of classes that depend on each other then testing gets harder. Black box testing becomes hard to setup and slow to run, so you lose one of the most useful/important characteristics of automated testing: short feedback loops. Something that quickly tells you that your code is [not] working as expected, and does it fast.

In case number 3, the case we find ourselves at. You have classes with tons of nested collaborators. You can, and should write acceptance tests, but the feedback loop for these tests is soooooooo slow. You write them, build, start backend, “ups, I made a stupid error”, fix, build, start backend, dammit, another error, and so on and so forth… that’s why, testing with acceptace tests is painful, and that’s where the value of unit tests lie. I can write a test, red, write code, green, refactor, green, write test, red, write code, green, refactor, green and so on super fast so that when I am ready to run the acceptance test I am in a position where I am pretty sure that it will pass.

And this brings us to the problem, How can I write unit tests for complex systems so I can test behavior instead of an implementation?. And that is a tricky question that requires consideration. In the case we are discussing about, I very distinctly tested an implementation:

ShouldGetTheFoldersViaTheInboxManager_WhenCalled (or something like that) where I asserted that a collaborator’s method had been called.

I could make it more behavior-y (and that’s an improvement indeed) by changing its name and body:

ShouldGetTheFoldersIntheUsersInbox_WhenCalled where I would setup the inbox manager stub and test that the service returns the aforementioned folders.

This is an improvement indeed but it still has a problem. I am testing an implementation disguised as a behavior. Why? Because I have explicitly set up the stub. (This test is the equivalent to the mock assertion, I just changed the side of the coin, particularly in this case where the only thing the method does is delegate)

In order to really test behavior in this particular case I think we would need to do a couple of things:

  1. Answer the question: What part of my code is just an implementation? How many collaborators and how deep are just part of this implementation detail? That would probably end up in us dividing the application in additional submodules with a more strict separation (which would be a good thing). We usually achieve this through projects/assemblies/layers, but it would be probably interesting to make use of more private classes and start thinking more deeply about when we use DI (dependency inversion) and IoC (inversion of control) to make an explicit separation between implementation details and the contract/service a given module fulfills/provides.
  2. Since we are using IoC heavily, we would need to improve our testing development environment to be able to inject all the dependencies that are part of the implementation, so that you won’t need to do that manually, and thus save time and pain.

Those are the things I can think about when reflecting about this problem, but there is a lot of people that knows more about this than I do that have much more to say.

Another different question is, is there value in unit tests that test an implementation? or is there value in testing units of code in isolation (because I think that most cases they are the same thing)?

The pros that I see in writing unit tests are the ones that I have always seen:

  • test ensures the code behaves as I want it to behave
  • fast to write, fast to run, short feedback loop, good value for price
  • find bugs with super-high granularity (if a unit test shows a bug, you pretty much know which class it is in, you don’t need to go throw all classes nor debug it, that’s priceless)
  • it forces you to write very simple classes and increase modularity and separation of concerns (because writing unit tests for fat, complex classes with tons of collaborators is horrible. We have a natural tendency to avoid pain)
  • increases developer confidence to refactor
  • documentation for my future self and other developers
  • if something changes that affects the unit under test, you will know it (regardless if it is a bug or a new implementation xD)

The cons are the ones that I have learnt to live with and that have made me reflect on improving the way I test:

  • they can be brittle in the sense that changes in the implementation may break your tests
  • they are not as trustworthy as higher level tests. You can make assumptions when stubbing collaborators that do not match reality and end up testing… something different

For instance, in the case of the calculator console app, you could have a calculator class that would call an argument parser to parse the arguments and then call the arithmetic unit to perform the arithmetics on those parameters. You could write a test to test that the calculator behaves in that way. GivenTwoNumbers_ShouldReturnItsSum and you would setup two stubs to do the parsing and the arithmetic calculation. The calculator class itself doesn’t do much, it justs delegates argument parsing to an argument parser class and arithmetic calculations to the other class. If you write a test to verify that the calculator behaves in that way, you are pretty much testing that algorithm: 1) parse 2)arithmetics, but more explicitely, there’s some IArgumentParser that is going to parse the arguments and some IArithmeticsUnit that is going to add them. What do you gain by writing this test? this particular test is of marginal value if you compare it to the unit tests for the arithmetic and the parser classes but you still gain something: 1) you know that the class works in the way you want, without the test, how would you know it works at all? 2) If you run the app and there’s a bug, you are going to be pretty sure that the bug is not in the calculator but in other dependencies.

And that’s pretty much it… I have to think about other specific examples within our application MagicFlow …

hmm… this could be a blog post xDDD

(foreground distorts into reality, welcome back to the present) And that was the email. Hope you found it interesting and feel free to comment and share your thoughs about unit testing and TDD!

I have been thinking for a while about writing more articles regarding my experience and journey with unit testing and TDD, so expect more of them coming. And particularly at this point where I am starting to question the definition of unit tests in isolation I learnt from the art of unit testing. All hope is not lost, I was just a liiiiitle bit slower than Roy :)

I used to feel that a ‘unit’ was the smallest possible part of a code base (a method, really). But in the past couple of years I’ve changed my mind. Here’s how I define a unit test, as of October 2011:

A unit test is an automated piece of code that invokes a unit of work in the system and then checks a single assumption about the behavior of that unit of work.

A unit of work is a single logical functional use case in the system that can be invoked by some public interface (in most cases). A unit of work can span a single method, a whole class or multiple classes working together to achieve one single logical purpose that can be verified.

Roy Osherove Unit Testing Definition

Some Interesting references

Comments