In December, I came across a guide to reproducible code in ecology and evolution published by the British Ecological Society. While my background is in physics research, I think the guide is incredibly useful for scientists of all disciplines. Reading through this guide also got me thinking about reproducibility in general and my own journey towards creating reproducible research.
The British Ecological Society’s guide comes at a time when researchers across the scientific disciplines are discussing the importance of reproducibility. This conversation has been prominent enough to garner coverage in the media. Roger Peng wrote an excellent paper in 2015 about the reproducibility crisis in the sciences that delves into what has happened and issues a call to action to improve the global robustness of scientific data analysis.
What is actually in the guide? The guide covers a wide swath of reproducible code topics, ranging from simple reproducible workflows to suggestions for programming standards to an introduction to versioning. Each section breaks down a particular topic and gives some examples of how to implement its suggestions in R. For instance, the section on Reproducible Reports describes what reproducible reports are and why they are important, and then provides a brief guide to using R Markdown to create them.
One of the statements in the guide that I appreciated was the following reminder at the end of the introduction:
True reproducibility is really hard. But do not let this put you off. We would not expect anyone to follow all of the advice in this booklet at once. Instead, challenge yourself to add one more aspect to each of your projects. Remember, partially reproducible research is much better than completely non-reproducible research.
This got me thinking about my experience developing reproducible code in my own research. I didn’t initially set out to write reproducible code. At the beginning of graduate school my code was a mess. I had scripts thrown into different folders, files with names like calculations_final_actual.R, and no clear sense of how to keep track of things. My first piece of reproducible coding came about due to a problem that I needed to solve. I wanted to calculate the Stark map of rubidium, which is a fairly straightforward calculation with many smaller sub-problems to solve. The first practice to enter my workflow was using Git together with RStudio projects. Versioning allowed me to keep better track of my changes and easily return to earlier versions of the code if I came across problems down the line.
Reproducible reports were the next piece to enter my workflow. I first truly started to understand the value of this report format, particularly with R Markdown, when I was testing changes and permutations for a calculation. It allowed me to write out the narrative of what I was changing in the code and why. I later started using reproducible reports for all of my analyses, as they let me easily reproduce figures and distribute the analysis to my colleagues.
There are still places where I can work on making my code more reproducible, and I think the guide from the British Ecological Society does a great job of highlighting simple methods to encourage the production of reproducible code. I highly encourage scientists of all disciplines to think about reproducible code. If you already create reproducible code, try getting your colleagues on board. I agree with Roger Peng that reproducible coding is just a single step toward solving the larger problems of reproducibility and replicability within the sciences, but I think it is an important step. For many researchers, reproducible code may be the first step they take to create a more rigorous research framework.