The source code published on GitHub alongside the original study combines three programming languages: Python, R and Julia. Although the code is generally clearly written and includes some explanatory comments, several adjustments to the code are required before it can run on a different machine. This chapter focuses on the configuration (operating system, statistical software and specific libraries) required to run the program.
3.1 Procedure
Guidelines issued by scientific journals to promote the reproducibility of published articles recommend that authors publish their code together with a file named README that specifies the configuration of the computing environment and the procedure to follow to obtain the same results (for guidance on code publications, see here for social science journals or here for Nature journals). This source code only includes an empty README file, so we made several attempts to guess a working configuration. We tested the code on the following platforms:
A Windows personal computer with 32 GB of RAM and an 8-core processor: the file 001 - dl_GEE.py ran successfully, but the subsequent code files failed, because they call Linux commands (e.g. wget) or rely on libraries that seem to be Unix-native.
A Linux (Ubuntu 22.04) computer with 8 GB of RAM and a 4-core processor: the code failed to run due to a lack of memory.
A Linux pod (Ubuntu 22.04) on a Kubernetes server (Onyxia/SSP Cloud) running a Docker image with R and RStudio, on which we install a Python distribution (miniconda) with the R package {reticulate} and a Julia 1.8.5 distribution with the R package {JuliaCall}. The script files 002 - dl_IUCN.R, 003 - species_richness.py and 004 - join_rasters.R run successfully (after some trial and error and some corrections to the code, documented in the following chapters). However, the script 005 - covariate_matching.jl fails to start, apparently because Julia does not identify dependencies required by the ArchGDAL package (CURL_4), even though these dependencies are present on the system.
A Linux pod (Ubuntu 22.04) on a Kubernetes server (Onyxia/SSP Cloud) running a Docker image with Python and Julia, where we install R and spatial dependencies with the Linux apt package manager (following a procedure documented by ThinkR on RTasks). The script files 002 - dl_IUCN.R, 003 - species_richness.py and 004 - join_rasters.R run successfully and the script 005 - covariate_matching.jl starts running, but fails further along in its execution. Because the failure occurs during parallel processing, the error messages are not meaningful, and we need to contact the authors to help us identify the cause of the error.
Working with Kubernetes pods makes it possible to mobilize large memory and processing resources and to adapt the configuration flexibly. However, pods must be deleted after use and re-created for each new use, which is time-consuming. With the help of the Onyxia administrators, we are preparing a Docker image that includes R, Python and Julia with all the required packages. This should speed up the re-creation process.
3.2 Technical environment and prerequisites
We document the execution process by including all package installations, code modifications and script executions in a literate programming format. Literate programming platforms such as Jupyter, R Markdown, or its successor Quarto, make it possible to combine different programming languages in a single document. We decided to use Quarto because of its versatility and our familiarity with this tool. Quarto can be obtained at www.quarto.org.
The code also requires Linux or a Windows machine running Windows Subsystem for Linux (WSL); otherwise, some scripts need to be substantially rewritten. We run it on a Windows personal computer. We have not tried it, but it might be possible to run it on macOS, which is similar to Linux in several regards (both are Unix-like systems).
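The platform requirement above can be checked before launching any script. The following is a minimal sketch, not part of the original code: the function name `check_platform` and the returned keys are our own assumptions, and the check only tests whether the `wsl` and `wget` executables can be found on the PATH.

```python
import platform
import shutil


def check_platform():
    """Report whether the Unix tools the scripts rely on look reachable.

    Hypothetical helper: it only looks for executables on the PATH and does
    not verify that they actually work.
    """
    system = platform.system()  # "Linux", "Windows" or "Darwin" (macOS)
    has_wsl = shutil.which("wsl") is not None
    has_wget = shutil.which("wget") is not None

    if system == "Windows":
        # On Windows, the scripts can only reach wget through WSL.
        usable = has_wsl
    else:
        # On Linux (and possibly macOS), wget must be directly on the PATH.
        usable = has_wget

    return {"system": system, "wsl": has_wsl, "wget": has_wget, "usable": usable}
```

Calling `check_platform()["usable"]` at the top of a setup chunk gives an early warning instead of a failure halfway through a download script.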
The first script, 001 - dl_GEE.py, requires a Gmail account to access Google services, and the user must register on Google Earth Engine.
3.3 Set up R
The following script installs the R dependencies called in the code. We are not certain that all of these dependencies are actually used in the scripts.
Code
# Install from GitHub ----------------------------------------------------------
# Some packages need to be installed from developer sources because they are
# not available from official sources, or the official versions have issues.

# The remotes package is needed for the GitHub installations below
if (system.file(package = "remotes") == "") {
  install.packages("remotes", repos = "https://cran.irsn.fr/")
}

# Installing version 1.3.5 which is the last working version apparently
if (system.file(package = "doMC") == "") {
  remotes::install_github("https://github.com/cran/doMC/tree/fbea362b96cc4469deb6065ff9fbd5d4794ccac1")
}
if (system.file(package = "gdalUtils") == "") {
  remotes::install_github("https://github.com/cran/gdalUtils", upgrade = FALSE)
}
if (system.file(package = "ggeasy") == "") {
  remotes::install_github("jonocarroll/ggeasy")
}
if (system.file(package = "velox") == "") {
  remotes::install_github("https://github.com/hunzikp/velox", upgrade = FALSE)
}
if (system.file(package = "rnaturalearth") == "") {
  remotes::install_github("https://github.com/ropensci/rnaturalearth")
}
# Only reached if gdalUtils is still missing after the attempt above
if (system.file(package = "gdalUtils") == "") {
  remotes::install_github("gearslaboratory/gdalUtils")
}
if (system.file(package = "geoarrow") == "") {
  remotes::install_github("paleolimbot/geoarrow")
}

# Install from CRAN ------------------------------------------------------------
# These packages are available from the usual R source
required_packages <- c( # List all required R packages
  "reticulate", # To interact with python (normally installed with Quarto)
  "JuliaCall",  # To interact with Julia
  "tidyverse",  # To facilitate data manipulation
  "aws.s3",     # To interact with S3
  # All packages below are used in Wolf et al. code files:
  "countrycode", "cowplot", "data.table", "dtplyr", "fasterize", "foreach",
  "foreign", "ggforce", "ggplot2", "ggrepel", "GpGp", "grid", "jsonlite",
  "landscapetools", "lme4", "MCMC.OTU", "ncdf4", "parallel", "pbapply",
  "plyr", "raster", "rasterVis", "rbounds", "RColorBrewer", "RCurl", "readr",
  "reshape2", "rgdal", "rjson", "rnaturalearth", "scales", "sf", "smoothr",
  "spaMM", "spgwr", "spmoran", "spNNGP", "stars", "stringr", "tidyverse",
  # "unix",
  "velox", "viridis", "wbstats", "wdpar"
)

# Compare against the names of the installed packages
missing <- !(required_packages %in% rownames(installed.packages()))

# Install if any package is missing
if (any(missing)) {
  install.packages(required_packages[missing], repos = "https://cran.irsn.fr/")
}
3.4 Set up python
The following scripts install a Python distribution (miniconda) with the R package reticulate. They also install several Python packages. Note that several of these packages are not actually called within the R code.
Code
library(reticulate) # to run python from R

# Variables to modify
my_envname <- "replication-wolf"
scripts_to_run <- c("003") # or c("001", "003") or "001"

# Install python if not already present
if (!dir.exists(miniconda_path())) {
  install_miniconda()
}

# conda_remove(my_envname) # for debugging purposes

# Create environment if not already present
if (!my_envname %in% conda_list()$name) {
  conda_create(my_envname)
}

# Packages needed for each script
requirements <- list(
  "001" = c("earthengine-api", "rasterio", "pandas", "pydrive"),
  "003" = c("fiona", "rasterio", "ray[default]", "dbfread", "pandas")
)

# Combined depending on the variable defined at beginning of code chunk
required <- unique(unlist(requirements[scripts_to_run]))

# Identify which ones are missing
py_installed_packages <- py_list_packages(my_envname)$package
missing_packages <- required[!required %in% py_installed_packages]

# Install those
if (length(missing_packages) > 0) {
  conda_install(envname = my_envname, packages = missing_packages, pip = TRUE)
}

# Adding a dependency required on Linux platforms
if ((Sys.info()["sysname"] == "Linux") &&
    (!"libstdcxx-ng" %in% py_installed_packages)) {
  conda_install(envname = my_envname, packages = "libstdcxx-ng")
}

# Activate the corresponding environment
use_condaenv(my_envname)
Code
library(reticulate) # to run python from R

# Variables to modify
my_envname <- "replication-wolf2"

# py_list_packages(my_envname)
# system("strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX")
# system('conda install -c "conda-forge/label/gcc7" libstdcxx-ng')
# # conda_install()

# Install python if not already present
if (!dir.exists(miniconda_path())) {
  install_miniconda()
}

# Re-create the environment from scratch: remove it first if it already exists
if (my_envname %in% conda_list()$name) {
  conda_remove(my_envname)
}
conda_create(my_envname)

requirements_pip <- c("fiona", "ray[default]", "rasterio", "ray", "dbfread", "pandas")
requirements_conda <- c("libstdcxx-ng")

# Identify which ones are missing
py_installed_packages <- py_list_packages(my_envname)$package
missing_pip <- requirements_pip[!requirements_pip %in% py_installed_packages]
missing_conda <- requirements_conda[!requirements_conda %in% py_installed_packages]

# Install those
if (length(missing_pip) > 0) {
  conda_install(envname = my_envname, packages = missing_pip, pip = TRUE)
}
if (length(missing_conda) > 0) {
  conda_install(envname = my_envname, packages = missing_conda)
}

# Activate the corresponding environment
use_condaenv(my_envname)
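The setup chunks above decide what to install by comparing requirement strings with the names of the installed packages. One caveat of a plain string comparison is that a requirement written with extras, such as ray[default], never equals the installed name ray, so it is treated as missing on every run. The sketch below illustrates, in Python, one way to compare on the base package name instead. The function names `base_name` and `missing_requirements` are hypothetical and are not part of reticulate or of the original code.

```python
import re


def base_name(requirement):
    """Strip extras and version pins, e.g. 'ray[default]>=2.0' -> 'ray'."""
    return re.split(r"[\[<>=!~ ]", requirement.strip(), maxsplit=1)[0].lower()


def missing_requirements(required, installed):
    """Return the requirement strings whose base package is not installed.

    `required` holds pip-style requirement strings, `installed` holds plain
    package names (for example the `package` column of a package listing).
    """
    installed_names = {name.lower() for name in installed}
    missing = []
    seen = set()
    for requirement in required:
        name = base_name(requirement)
        # Skip packages already installed or already queued for installation
        if name not in installed_names and name not in seen:
            missing.append(requirement)
            seen.add(name)
    return missing
```

With the requirement list of the second chunk and an environment that already contains ray and pandas, only fiona, rasterio and dbfread would be reported as missing.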
Create an authorization
Code
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()

# Try to load saved client credentials
gauth.LoadCredentialsFile("mycreds.txt")

if gauth.credentials is None:
    # Authenticate if they're not there
    gauth.LocalWebserverAuth()
elif gauth.access_token_expired:
    # Refresh them if expired
    gauth.Refresh()
else:
    # Initialize the saved creds
    gauth.Authorize()

# Save the current credentials to a file
gauth.SaveCredentialsFile("mycreds.txt")
3.5 Set up Julia
First, we install or set up Julia from R if needed.
The script 002 - dl_IUCN.R includes a system call to wget, a download utility commonly available on Unix platforms (macOS and Linux).
On Linux, if wget is not available, the user must run the command sudo apt-get update followed by the command sudo apt-get install parallel.
Code
if (Sys.info()["sysname"] == "Linux") {
  system("sudo apt update")
  system("sudo apt install -y parallel")
}
It is, however, possible to run it on Windows, if and only if the Windows system includes Windows Subsystem for Linux (WSL). In that case, if wget is not already available, the user must first install parallel, which includes wget, by running the command wsl sudo apt-get update followed by the command wsl sudo apt-get install parallel.
Code
# Replace all wget calls by wsl wget
if (Sys.info()["sysname"] == "Windows") {
  replace_all("002 - dl_IUCN.R", "wget", "wsl wget")
}
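The helper replace_all used above is not defined in this chapter. As an illustration only, the following Python sketch shows what such a helper could do. It assumes a plain literal replacement in a UTF-8 text file and is not the implementation used in the replication. Because the replacement is literal, it would also rewrite the string wget inside comments or longer words.

```python
from pathlib import Path


def replace_all(path, old, new):
    """Replace every literal occurrence of `old` with `new` in a text file.

    Hypothetical counterpart of the R helper called above. Returns the number
    of occurrences of `new` in the rewritten file.
    """
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    # If `new` contains `old` (as in "wget" -> "wsl wget"), first undo any
    # earlier replacement so that a second run does not yield "wsl wsl wget".
    if old in new:
        text = text.replace(new, old)
    updated = text.replace(old, new)
    file.write_text(updated, encoding="utf-8")
    return updated.count(new)
```

For example, `replace_all("002 - dl_IUCN.R", "wget", "wsl wget")` would rewrite the script in place, and running it a second time leaves the file unchanged.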