Software design in scientific computing

Author: Adrian Jackson
Posted: 10 May 2016 | 00:07

Useful software design

Prompted by a recent discussion of a blog post about applying commercial development techniques to academic software development, I've been trying to formalise the software design process I'd recommend to academic software developers.

Just the term "software design" puts a lot of people off. It sounds like a long, elaborate process, full of requirements capture and storyboards, but it really doesn't have to be. I think anyone who writes programs is already doing some form of software design, even if that design is just following the process they've always used; they are simply not formalising it. However, formalising your software design could bring important benefits.

Design process

For me, minimal software design should involve writing a very short document that answers the following:

  1. What is the purpose of the program?
  2. What data will it consume and produce?
  3. How can you validate correctness?
  4. Who do you expect the users to be?
  5. What parts of the program could you imagine replacing in the future (i.e. for performance or functionality reasons; for instance, moving from 2D to 3D simulations, using a different solver, etc.)?
  6. Are you going to use existing code or libraries to implement all or part of the program (what are your dependencies)?

It could just be a sentence for each of the above, especially if you are writing a very small program, but you could also expand on these and write a page or two, with some pictures to help out if needed.
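If you prefer to keep the answers next to the code rather than in a separate document, one option is a small structured record. This is only a sketch of the idea: the class name DesignNote, its field names, and the example answers are all made up for illustration, not part of any tool.

```python
from dataclasses import dataclass, fields


@dataclass
class DesignNote:
    """Hypothetical record holding one short answer per design question."""
    purpose: str
    data_in_out: str
    validation: str
    users: str
    replaceable_parts: str
    dependencies: str

    def missing(self):
        """Return the names of the questions that have no answer yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self):
        """Format the note as plain text, one question per line."""
        return "\n".join(f"{f.name}: {getattr(self, f.name)}" for f in fields(self))


# Example answers (invented for illustration).
note = DesignNote(
    purpose="Simulate 2D heat diffusion on a regular grid",
    data_in_out="Reads a text parameter file; writes temperature fields",
    validation="Compare against the analytic solution for a uniform rod",
    users="Members of my research group",
    replaceable_parts="Solver; the 2D grid may become 3D",
    dependencies="",  # not decided yet
)

print(note.render())
print("Still to answer:", note.missing())
```

A plain text file would do the same job; the only advantage of a record like this is that a review can check mechanically which questions are still unanswered.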

As with any tool to improve software development, the key to using something like this is tailoring: using the parts that are useful, adding to it as required, and ignoring the bits that don't apply. The purpose of the questions above is simply to encourage thought before coding starts, hopefully prompting sensible decisions on how the program is written based on the answers. They can also be used to evaluate the program later on, in code reviews or when releases happen; you'll have a document to reference that outlines key information about the program.

Benefits

One of the key design areas for software, which is often not properly considered and can have a big impact on applications later in the development process, is the changeability of a program. If there are areas of the program that you are likely to want to change in the future, for performance or functionality reasons (think data layout or solvers), then cleanly encapsulating them now will make those changes much easier later.
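To make the encapsulation point concrete, here is a minimal sketch in which the solver is the part expected to change. The function names (euler_step, midpoint_step, integrate, decay) and the test problem are assumptions chosen for illustration; the point is only that the rest of the program never needs to know which solver is in use.

```python
import math


def euler_step(f, t, y, dt):
    """Advance y by one step of the explicit Euler method."""
    return y + dt * f(t, y)


def midpoint_step(f, t, y, dt):
    """Advance y by one step of the explicit midpoint (second-order) method."""
    y_half = y + 0.5 * dt * f(t, y)
    return y + dt * f(t + 0.5 * dt, y_half)


def integrate(f, y0, dt, n_steps, step=euler_step):
    """Integrate dy/dt = f(t, y) from t = 0; the solver is passed in as `step`."""
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, dt)
        t += dt
    return y


def decay(t, y):
    """Right-hand side of dy/dt = -y, whose exact solution is exp(-t)."""
    return -y


# Swapping the solver is a one-argument change; nothing else is touched.
y_euler = integrate(decay, 1.0, 0.1, 10)
y_midpoint = integrate(decay, 1.0, 0.1, 10, step=midpoint_step)
print(y_euler, y_midpoint, math.exp(-1.0))
```

Because the solver sits behind a single function signature, replacing it for performance or accuracy reasons does not ripple through the rest of the code.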

Likewise, understanding the likely user base, and I/O, of the program will let you take sensible decisions on how to structure your program and the level of testing needed (will you need GUIs, do you need multiple data formats for input and output, are you going to need documentation and error checking of inputs, etc...).

Finally, understanding and documenting your validation cases/information (how you can confirm correctness once you have a working program) will help anyone else developing the program in the future. Thinking about what parts of the program can be implemented using code you've developed before, or that's already available, will also give you a chance to look around for code you can re-use, potentially saving time in your development.
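A documented validation case can be as small as one function that compares the program's output with a known answer. The sketch below is an assumed example, not taken from any particular code: it checks a simple trapezoidal integrator against the analytic result that the integral of sin(x) from 0 to pi is exactly 2. The names trapezoid and validate_against_analytic and the tolerance are hypothetical.

```python
import math


def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] with n equal trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def validate_against_analytic(n=1000, tol=1e-5):
    """Validation case: the integral of sin(x) over [0, pi] is exactly 2.

    Returns the absolute error and whether it is within the tolerance.
    """
    error = abs(trapezoid(math.sin, 0.0, math.pi, n) - 2.0)
    return error, error < tol


error, ok = validate_against_analytic()
print(f"absolute error = {error:.2e}, within tolerance: {ok}")
```

Keeping a check like this with the code means the next developer can confirm correctness after a change without having to rediscover what "correct" means.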

Design patterns

Really there's also a seventh question along the lines of "What design pattern will this program follow?". But I've not included it as I know this would be a software engineering step too far for a lot of developers. I instinctively don't like design patterns as it always feels like trying to shoehorn a program into a specific model rather than allowing programming freedom. However, in reality we all follow design patterns, even if it's just the same pattern you always use (functional decomposition, for example). 

The only reason to include such a question would be to get people thinking about how they structure programs, and whether this program will be similar to programs they have written before or will require a new approach (and therefore probably more testing, comments, documentation, etc.). If you use this type of design approach for all the programs you write then it will not take long to build up your own set of design patterns which you can utilise in future designs.

Evaluation

Lastly, the real strength of adding a small amount of software design to your development process is not just that it helps ensure the program is sensibly structured from the start, allowing easy modification and maintenance where most required, but also that it allows evaluation of your programs once development has progressed. The software design gives an initial set of specifications that the program can be evaluated against. It can be used in code reviews and other assessment processes to work out whether the approach taken to writing the program worked and was (relatively) efficient.

It allows evaluation of the program over time, so that maintenance and continued development costs can be assessed against the development model or design pattern used, and it lets you learn from the good and bad points of the program for future development work.

I can imagine that, again, this is a step too far for many developers who aren't even undertaking regular code reviews, let alone evaluating software against a design over time. However, if you are to learn from your mistakes and improve your development skills over time some form of evaluation of the code you write is essential, and if you have an initial design this can be very useful in the evaluation.

I'm not claiming that the process I've outlined is perfect and I've probably left some important stuff out, but it's a process I try to follow when developing. If you have any suggestions for improvements/changes do get in touch because I'd be keen for some feedback. Would this work for you? Would you use it? Or can you not see the benefits?

You can also read this post on our Medium account.

Author

Adrian Jackson, EPCC
You can often find Adrian on Twitter.

Comments

Hi Adrian, this is concrete in terms of what you'd suggest doing, but I have a hard time fitting it into the types of software I work on. What kind of user base, developer audience, and software maturity level are you thinking of here?

In my field, I see three types of software.

A) scripts for munging one data type into another;
B) exploratory research software development to explore a problem space, propose solutions, and harden those solutions into something potentially useful;
C) infrastructure-y software development designed to meet a clear need & evolve over time;

I mostly work with A and B. An example of A would be scripts meant to develop data summaries and produce the data underlying plots for a paper; an example of B would be the khmer software, which is a library that lets us explore concepts and ideas in graph-based sequence analysis. Looking back on a decade of software development, from your list above I would only have been able to usefully do #2, for sequence formats taken in / output. With everything else, I would have been almost completely wrong in terms of predicting how the software actually evolved. From my experience in two decades of scientific software development, I suspect this to be fairly common.

So, I'm curious to what classes of research software and types of research problems you would apply this. All of them? Places where there's good theory and you're following through with a useful computational implementation? Problems where there's an existing user base with known problems that you're trying to solve? etc.

Hi Titus,

Thanks, great question. As usual, I have my own unconscious biases towards the types of work I do, so I don't always think properly about how this stuff applies to other people.

For me A) isn't what I would class as software, more the infrastructure around the software you're using. This doesn't mean it doesn't need some software support (revision control, tests, etc...) but I would expect it to be so small that it does not require design. If the things you are developing in A) get bigger than a few hundred lines then I'd suggest they shouldn't be scripts any more and need to be full programs, and then the design does apply (after all, something like that could easily be re-used and re-purposed down the line by many other people). However, if you were doing design on this type of software I would expect the process to be extremely minimal (i.e. only really having to give a single line answer for each of the questions).

In terms of B), how you'd use software design would depend on how you do your development. Ideally, if you can be disciplined enough to throw away code, then I would say that you don't need design for the exploratory software, but when you come to harden those ideas into solutions you would apply the design approach. However, this really is dependent on you throwing away exploratory software (or at least shutting it down and keeping it for reference, not using it for anything); if you don't do that then I think it's probably even more important to have some design up front to allow the exploratory software to be developed in a way that allows it to change over time without requiring full re-writes of every part of it.

I take your point that it's very hard to predict how software will develop over time. However, that may be the best motivator for design: it would give you some way of evaluating, after the fact, how the software evolved compared to what you expected, and maybe patterns would emerge that help you understand how your software development generally progresses. (I know there are lots of maybes in that sentence; it could be that the evolution of the software is random and you learn nothing from the design process, but I'd hope not.)

So, to summarise, in general I'd be applying design to B) and C), although if you can throw away and re-do research software it'd only be to the final stages of B).

OK, I can sort of see that. I'm still skeptical of the value for B - I don't know how much software makes it from "useful to me" to "useful to others", and while we've been more successful than many at that transition, we've also invested a lot of time in stuff that may have neither short-term nor long-term utility. But I see that it might, hypothetically, be useful ;).

It's an important point. I think the majority of software does not go from useful to me to used by others; however, a lot of the problems I see arise because that transition has happened and the code was never designed for it. Unfortunately, I guess it means the only way to solve that issue is to do design work on more software than will ultimately need it. Where the balance lies is difficult to work out.

However, what I was trying to get across is that it shouldn't be a time-consuming or arduous process; it should be as stripped down as possible. If it is easy and intuitive then it doesn't matter if it's done to more codes than strictly need it.

If what I've suggested isn't easy and intuitive then it'd be good to get some idea of what process would be easy and low enough overhead to do to convince people it's an acceptable cost...

Thanks again for the blog post!

Another quick question for you -- is there any evidence that any of this works in research specifically, and that the tradeoffs in time and energy are worthwhile?

thanks,
--titus

Great question. The challenge here is that, while there is a lot of evidence that parts of software design have big benefits in large-scale software, I'm not aware of any studies focussing on small-scale academic development projects.

I'd quite like to conduct a study like the one published here: http://arxiv.org/pdf/1506.05272v1.pdf but on software design instead. I'll have to add it to the list of things to do :)