Unit Testing Best Practices
Unit tests are an important scaffold for large-scale software development; they enable us to design, write, and deploy production code with confidence by validating that software will behave as expected. Even though they may not execute in live systems, their development and maintenance requires the same care as general production code. Sometimes developers do not realize this which leads to testing code with more code smells than production code. Engineers may not give enough attention to test code changes in the code review process. However, most of the time the test code reflects the health of the production code. If the test code has some code smells, this can be a sign that the production code can be improved. In this post, I’m going to mention some of the best practices to keep unit test code clean and maximize the benefits they provide.
The best practices for unit testing are debated topics in the industry. In practice, however, projects and teams should align on key concepts in order to foster code consistency and ease-of-maintenance. I’m going to mention the meaning of unit testing in the Object-Oriented design world, characteristics of a unit test, naming conventions for unit tests, and when we should or should not use mocking. We can have or find many different answers/approaches for these concepts, and the relevance of different trade-offs may vary depending on the situation.
Unit Testing
To define unit testing, we should first define the unit. Once the unit has been defined, then we can define unit testing as testing the behaviors of a unit. Let’s try it for Object-Oriented software development methodology. Classes are the main building block of software designed with the Object-Oriented design paradigm. Then we can think that the class concept is the unit of Object-Oriented software, and unit testing involves independent testing of behaviors of a class by the developer who implements these behaviors.
A behavior and method relation of a class may not be 1:1. Sometimes a class can have more than one method to implement a behavior that is unit tested as a whole. Sometimes more than one class can be used to implement a behavior that’s unit tested. However, sometimes it can be a sign of code smell, like temporal coupling, if you use more than one public method or class to implement a unit tested behavior.
Characteristics of a Unit Test
When developing unit tests, some key considerations include: How fast should a unit test be? How often should a unit test be run? Which kind of object methods are valid or not valid for unit testing? How should we structure a unit test? What kind of assertions should we make in a unit test? Let’s think about the answers to these questions.
Some expected characteristics of unit tests include: fast execution times in order to provide immediate feedback about implementation correctness, readability in order to clearly express the behavior that’s tested, consistency and predictability of results through the use of deterministic evaluations, and robustness to structural changes (i.e., refactoring) in the implementation.
Speed
Developers expect unit tests to run quickly because they are generally executed frequently during the development process. We typically run unit tests whenever we make a change to our code in order to get immediate feedback about whether something is broken or not. Speed is a relative concept, but as Martin Fowler said in his article, “But the real point is that your test suites should run fast enough that you’re not discouraged from running them frequently enough”.
Having fast unit tests requires continuous care along the life cycle of our codebase, but we can also have some rules that help us to create fast unit tests when we name/tag a test as a unit test. Michael Feather mentioned some of this kind of rules in his article:
A test is not a unit test if:
- It talks to the database
- It communicates across the network
- It touches the file system
- It can’t run at the same time as any of your other unit tests
- You have to do special things to your environment (such as editing config files) to run it.
Tests that do these things aren’t bad. Often they are worth writing, and they can be written in a unit test harness. However, it is important to be able to separate them from true unit tests so that we can keep a set of tests that we can run fast whenever we make our changes.
Behavioral Testing vs Structural Testing
In OOP, the structure of an object refers to the specific order and manner in which the production code implementing that object uses its dependent methods or classes. Since the structure of an object is related to the way that the production code is written, it can generally be considered an implementation detail. Structural testing involves testing these implementation details.
Let’s see an example of structural testing vs behavioral testing: we have an Order class and orders can be cancelled, there are some rules to check if an order is cancellable or not and these rules are executed by OrderSpecification;
And we have a test class to test this cancellation scenario;
In the above test class we both check that the order is cancelled and that the OrderSpecification method is called exactly once. However, the specific way in which OrderSpecification is used is an internal implementation detail of the order cancel code, so the above test would be considered an example of structural testing.
Let’s see the behavioral test code for this scenario;
In the above test class we only care about the behavior of order cancellation, not its internal implementation details.
We expect two things from production code: one is “doing the right thing”, the other one is “doing the thing right”. Unit tests should focus on the former, i.e., the behavior produced by the production code, which is one level of abstraction above the implementation details. So, as Kent Beck says in his article, “Programmer tests should be sensitive to behavior changes and insensitive to structure changes. If the program’s behavior is stable from an observer’s perspective, no tests should change.”
Why? When we think of the benefit, cost, and maintenance dimensions of unit testing, it’s not hard to see that structure-sensitive tests create more friction rather than they provide safety. Agile development teams change the structure of code continuously as they refactor, and fixing many brittle tests that are not related to any behavior after a refactoring is a very tiring and discouraging process.
For example, let’s say we change the method signature for the isCancellable method within the OrderSpecification class from using the Order class as an argument to using the OrderStatus class:
In such a situations, the expected behavior of our code has not really changed, but the following test will start to fail because of our verification that depends on the method’s signature:
Unfortunately, mocking libraries make this kind of structure testing very easy to write, so we should use their structure verification functions with caution. Of course, there can be some exceptional cases where we have to rely on some structural testing instead of behavioral testing to achieve some level of confidence with our system. For example, if a real implementation is too slow to use, or too complex to build, then we may use structural testing as verifying invocation of some functions with mocking. Another case can be related to orders of the function call, like checking cache hit/miss, in some cases a cache miss may have financial costs, let’s say we call a paid API in case of a cache miss, and we may use structural testing to verify cache methods are called or not. But these should be exceptional, not our default choice.
Should we write unit tests for all classes?
No, because classes have different kinds of behaviors. Some classes have behaviors directly for business logic related to requirements of our domain, while other classes have behaviors that are related to application/system-level requirements, like transaction, security, observability, etc.
We separate classes that have different kinds of behaviors using stereotypes, i.e. different categories of responsibilities. We use Domain-Driven Design (DDD) concepts in some of our projects, and DDD tactical design has some stereotypes for classes, such as aggregate root, entity, value objects, domain service, application service, repository, etc.
Let’s examine the application service case; application services are like gateways to our domain model from the outside world, as we see in the diagram below:
Application services handle application-level requirements (e.g., security, transactions, etc.) while routing requests from the outside world (which can be anything that’s not directly related to our domain model, like the web layer, RPC layer, a storage access layer, etc.) to our domain model. There is no business logic in application services, and their code mostly consists of direct calls to our domain model. If we try to write unit tests for these classes, there is nothing to verify from a behavior perspective; we can only verify interactions between them and the domain model. But we mentioned this is structural testing, and we don’t prefer these kinds of tests generally. So, we don’t prefer writing unit tests for DDD application services.
Then, how can we test these application services? There are different kinds of testing styles other than unit testing, and we think that integration tests that use these application services in the test flow are better alternatives for the application services of DDD.
Structure of a Unit Test
Generally, a unit test has three parts; setting pre-condition, taking action on the object, and making the verification. These are the Arrange/Act/Assert (or alternatively, Given/When/Then as used in Behavior Driven Development (BDD) testing). Applying this kind of structural style to our unit tests increases their readability.
Sometimes we can ignore the Arrange
part if we don’t need to set anything before the Act
part, but we should always have the Act
and Assert
parts when writing a unit test.
We can see an example of these Arrange/Act/Assert parts below;
Using a Naming Convention
The name of a unit test is important because it directly affects code readability. Unit tests should be readable because we should easily understand what is broken in our system when a unit test fails. We should also understand the behavior of our system when reading unit tests because people come and go in a project.
Some programming languages allow us to use plain language as method names, for example with Kotlin we can write below test method;
Some testing frameworks, like JUnit, provide a tag(@DisplayName) for this purpose if we can’t use the method names.
There are different naming conventions to name unit tests. Teams can align on a standard naming convention that members find most readable; alternatively, other teams may allow team members to use the most appropriate names for their tests instead of using a standard naming convention. In our last Kotlin project, we used the convention “Should ExpectedBehavior When StateUnderTest”.
Mocking
We use mock objects to replace real implementations that our production code depends on in a test with the help of libraries like Mockito, MockK, Python unittest.mock, etc. Using mock objects makes it easy to write more focused and cheap test code when our production code has a non-deterministic outside dependency.
For example, we mock a repository class that finds orders of a customer by status in the test code below:
Using mocks is not a silver bullet, and overusing mocking can cause some problems. For example, when using mocking, writing the stub code needed to program the behavior the mock can expose implementation details or structure of the underlying system being tested. As we mentioned before, this makes our tests brittle when the structure is changed. Test code with mocking is harder to understand when compared to test code without mocking, because of additional code required. Mocking can also cause false-positive tests because the behavior of real implementations can change, but our mock implementations may be out of date.
Mocking can be an appropriate choice for dependencies involving external systems. For example, mocking a repository class that communicates with a database, mocking a service class that calls another service/application over network, and mocking a service class that writes/reads some files to/from disk, can make sense. If we can use a real implementation then we should use it instead of a mock.
If we can’t use a real implementation, mocking is not our only option. We can also use fake objects, these are much simpler and lighter weight, only test-purpose implementations of the functionality provided by the production code. For example, implementing a test scenario that has complex conditions on its given part can be simpler with using fake objects instead of mock objects.
Conclusion
Unit tests should be considered a first-class citizen when writing production code in order to maximize their benefits. We should let our unit tests drive our production code’s design and readability by applying some best practices that we mentioned:
- Align about the meaning of unit testing concepts at least within the team/project.
- Keep your unit tests fast.
- Make behavioral tests instead of structural tests.
- Decide to write unit tests for a class according to responsibilities of the class.
- Align about the structure of a unit test.
- Align about a naming convention for unit tests or use a free naming convention as depending on the code review process.
- Use mocks with caution, don’t prefer to make structural testing with them.
Acknowledgments
Thanks to my colleagues who reviewed this post and provided invaluable feedback.
Notes
1: This post is copy of my medium post that’s under Udemy Tech Blog.