Software contribution and development guide

Introduction

This software contribution guide aims to be a core collection of concepts and guidelines for developing and contributing to scientific software in a sustainable manner that upholds research ethics.

We employ a series of core concepts to ensure the longevity, stability and validity of the developed software. These can be summarized as:

Version control and branching strategies
Design and development guidelines tailored for science software
Code testing and documentation
Modern development and maintenance operations

Motivation

A major motivation for such an extensive contribution guide is to ensure that code is sustainable and that proper research ethics are upheld.

Consider laboratory work: one of the first things we get comfortable with is the lab protocol, which records what knob was turned to what setting, or how much of one compound was mixed with another, during an experiment. This ensures that the experiment is reproducible, a cornerstone of the scientific method. We should, of course, hold software to the same standard of rigor when we perform numerical experiments and advanced calculations. Imagine reading a mathematical proof where the author suddenly states: “Well, the next part of the proof is in my desk. But it's mine and not nicely typed out, so I'd rather not share it; just take my word for it.” You would reject that proof as completely untrustworthy! When the code is not described accurately enough in a paper, is not publicly available, or is so obscure that an independent research group cannot use it, independent reproduction and verification become impossible (see Replication crisis). This applies to both data science and simulations.

While code testing might be self-explanatory, the other topics warrant some introduction. We cover many aspects of software development and design because simply publishing the code is not always enough. For example, version control is essential for reproducibility. Imagine you are reading a peer-reviewed paper and see a very strange result, and you notice that the code that produced it is available on a personal website. Curious about this exciting result, you download the software, reproduce the settings in the paper exactly and press enter. Confusion ensues when your result is nowhere near the one in the publication. What you don't know is that the author has since updated the software, and the old result is now gone forever. There is no way of knowing why it occurred, or whether it was real, without a deep dive into the code, which could take months.

This brings us to style guides and documentation: used properly, they reduce the time needed for such a deep dive by orders of magnitude, not only for external readers but also for the original author months or years later. Finally, development and maintenance operations are introduced to avoid spending unnecessary time and to speed up the dissemination of the software within the community, thereby preventing others from needlessly reinventing the wheel.

It is also worth noting that, when it comes to physics modeling, a paper should always contain all the information another researcher needs to write their own equivalent software. Code should not be relied upon as a form of scientific communication, although it can be a valuable tool in that communication. An equation implemented in code without an explanation, citation, or accompanying paper describing its derivation and meaning does not help anyone understand what the code does, only how it does it. If the software does not warrant a peer-reviewed paper in, e.g., a space physics journal, remember that one can always write a submission for the Journal of Open Source Software.

The guidelines presented here apply both to “packages”, such as scipy or sorts, and to “project-specific code”, such as one-off calculations for a paper.