Testing code

24 October 2016

It's always a bit embarrassing to talk about your code tests. I think most developers know that they don't have enough tests, or that the tests they do have are not good enough.

There is never enough time to write, or to run, tests that fully cover every possibility, so, like all programming, testing becomes a compromise in which you try to make the best use of the limited resources available.

I don't think there is a single right answer for all cases: the correct approach depends on the nature of the code and what it is supposed to do. Given that the available resources are limited, you also need to set priorities about which kinds of errors are most important to look for.

Evolution of a testing strategy

For the last 14 years we have been developing the SAFE, a web application that we use for the administration of our HPC systems. Throughout this period, finding ways to improve code testing has been a major concern. This is a fairly mainstream type of code development, so one might assume that testing this kind of application is a known quantity. In practice, code and tests are developed together: design decisions in the code have a large impact on how testing can best be carried out, and we often make design decisions based on how we intend to test the code. So I think it is interesting to share how our testing strategy has evolved over time.

The SAFE is written in Java, so very early in the process we made the decision to use JUnit to construct our tests. On top of this we have added our own extensions and helper classes to make it easier and quicker to write tests.

As the SAFE holds its state in a database, we needed some mechanism to reset the database to a known starting state before each test, a "database fixture" in the terminology used by JUnit. To start with we had no solution for this problem. Our first attempt stored the database fixtures in another database. This worked reasonably well for a number of years, but it was quite a cumbersome process to set up and maintain the database of fixtures. We eventually replaced this system with one that stores the fixtures as XML files, which are easier to read and edit. More importantly, it allowed consistent version control of tests and fixtures.

This system is not limited to fixtures; it also allows us to verify changes to the database generated by the tests. At any point in a test we can record the current state of the database and then, at any subsequent point, record the state again and generate an XML document representing the changes. Another advantage of using an XML representation is that we can use XSLT transforms to remove features, like timestamps, that vary from one test run to another. Using any kind of database fixture means that unit tests can take some time to run; it takes about an hour and a half to run our nightly integration tests.
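
To give a flavour of the timestamp problem, here is a minimal sketch of normalising an XML snapshot of database state before comparing it with an expected document. It uses a small XSLT stylesheet applied through the standard javax.xml.transform API; the element names and the shape of the snapshot are invented for the example and are not the format our own helpers use.

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    import java.io.StringReader;
    import java.io.StringWriter;

    public class SnapshotNormaliser {

        // Identity transform that copies everything except "timestamp"
        // elements, so snapshots taken on different runs compare equal.
        // The element name is invented for this example.
        private static final String STRIP_TIMESTAMPS =
              "<xsl:stylesheet version='1.0' "
            + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "  <xsl:template match='@*|node()'>"
            + "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
            + "  </xsl:template>"
            + "  <xsl:template match='timestamp'/>"
            + "</xsl:stylesheet>";

        public static String normalise(String snapshotXml) throws Exception {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STRIP_TIMESTAMPS)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(snapshotXml)),
                        new StreamResult(out));
            return out.toString();
        }

        public static void main(String[] args) throws Exception {
            String snapshot = "<table name='accounts'><row>"
                            + "<name>jbloggs</name>"
                            + "<timestamp>2016-10-24T09:00:00</timestamp>"
                            + "</row></table>";
            // Prints the snapshot with the timestamp element removed.
            System.out.println(normalise(snapshot));
        }
    }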

We use a similar XML system to verify dynamically generated HTML content. To make these tests easier we chose to generate the dynamic content in standard Java classes rather than in JSP pages. This was a deliberate design decision, primarily driven by the need to write tests.
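
As a rough sketch of what this makes possible, the test below exercises an invented content class directly, with no servlet container involved. In our real tests the expected markup is held in a version-controlled file rather than an inline string, but the shape of the test is the same.

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class AccountSummaryTest {

        // Stand-in for a content class: the HTML is built in plain Java,
        // so it can be generated and checked without a servlet container.
        // The class name and markup are invented for this example.
        static class AccountSummary {
            private final String username;

            AccountSummary(String username) {
                this.username = username;
            }

            String toHtml() {
                return "<div class=\"account\"><h2>Account</h2>"
                     + "<span class=\"user\">" + username + "</span></div>";
            }
        }

        @Test
        public void generatesExpectedMarkup() {
            String expected = "<div class=\"account\"><h2>Account</h2>"
                            + "<span class=\"user\">jbloggs</span></div>";
            assertEquals(expected, new AccountSummary("jbloggs").toHtml());
        }
    }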

Testing high level work-flows

The problem that took us longest to solve was finding a good way of testing high level work-flows. We made some attempts to do this using standard browser automation tools, where the testing framework remote-controls a real web browser, but we found this approach fragile and it did not integrate well with our JUnit tests. We now use standard JUnit tests built on the "mock objects" technique. A standard Java servlet container like Tomcat interacts with the servlets that make up an application via objects that implement the HttpServletRequest and HttpServletResponse interfaces. We "mock up" the servlet container by creating our own test classes that implement the same interfaces, and invoke the servlets from JUnit tests. We also mock the email system. These mock objects are surprisingly simple to write, especially as we are not trying to produce a general purpose testing framework, just one that works very well with our own application framework.

The end result is remarkably powerful. A great deal of the code needed to set up a request and check the validity of the result is common to multiple tests and can be extracted into methods in super-classes. As a result, the test that simulates a user applying for a new account is only about 50 lines long, even though it simulates a chain of three form submissions.
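
Our mock classes are tied to our own framework, so the sketch below only illustrates the general idea: JDK dynamic proxies stand in for the request and response, implementing just the methods a (completely invented) servlet happens to call, and the servlet is driven directly from a JUnit test.

    import static org.junit.Assert.assertTrue;

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.io.StringWriter;
    import java.lang.reflect.Proxy;
    import java.util.HashMap;
    import java.util.Map;

    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    import org.junit.Test;

    public class MockRequestTest {

        // Invented servlet standing in for the real code under test.
        static class ApplyForAccountServlet extends HttpServlet {
            @Override
            protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                    throws IOException {
                resp.getWriter().println(
                    "Application received for " + req.getParameter("username"));
            }
        }

        // Minimal HttpServletRequest backed by a parameter map. Only the
        // methods the servlet actually calls return anything useful.
        private HttpServletRequest mockRequest(Map<String, String> params) {
            return (HttpServletRequest) Proxy.newProxyInstance(
                getClass().getClassLoader(),
                new Class<?>[] { HttpServletRequest.class },
                (proxy, method, args) -> {
                    if ("getParameter".equals(method.getName())) {
                        return params.get(args[0]);
                    }
                    if ("getMethod".equals(method.getName())) {
                        return "POST";
                    }
                    return null;
                });
        }

        // Minimal HttpServletResponse whose output is captured in a StringWriter.
        private HttpServletResponse mockResponse(StringWriter captured) {
            PrintWriter writer = new PrintWriter(captured);
            return (HttpServletResponse) Proxy.newProxyInstance(
                getClass().getClassLoader(),
                new Class<?>[] { HttpServletResponse.class },
                (proxy, method, args) -> {
                    if ("getWriter".equals(method.getName())) {
                        return writer;
                    }
                    return null;
                });
        }

        @Test
        public void formSubmissionProducesConfirmation() throws Exception {
            Map<String, String> params = new HashMap<>();
            params.put("username", "jbloggs");

            StringWriter captured = new StringWriter();
            new ApplyForAccountServlet().service(mockRequest(params), mockResponse(captured));

            assertTrue(captured.toString().contains("Application received"));
        }
    }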


The files that contain the expected HTML content and database changes are obviously much longer, but they are usually produced by running the test and manually validating the output, so generating these files does not add significantly to the time needed to write the test. The database change documents from one test can also be used as database fixtures for follow-on tests.

This approach is used for high level tests corresponding to user-stories from the requirements, and their purpose is to detect any change in the behaviour of those user-stories. A failing test does not necessarily mean that the code is broken, just that its behaviour has changed and the change should be reviewed. These tests are very valuable in the ongoing development of production code, as they are directly related to the user experience and give a large degree of code coverage. As well as being relatively quick to write, they are well targeted at the most important parts of the code. On the other hand, they don't fully explore all possible edge cases, so these high level tests complement rather than replace low level unit tests.

Designing a test strategy (and designing code to be easy to test) is an important part of code development. Just as there always seem to be more tests you could write, there always seem to be ways you could improve your testing procedure.