TROUBLESHOOTING R
Isolating Issues and Asking Questions
Erik Kusch
erik.kusch@au.dk
Section for Ecoinformatics & Biodiversity
Center for Biodiversity and Dynamics in a Changing World (BIOCHANGE)
Aarhus University
17/02/2021
Aarhus University Biostatistics - Why? What? How? 1 / 19
1 Dealing with Problems
Problem-Sources
Approaching Problems
2 Reproducibility
What & Why
How
3 Reporting Issues and Asking for Help
Minimal Working Examples
Minimality
Completeness
Reproducibility
Dealing with Problems
Problem-Sources
So your analysis is misbehaving...
Start by investigating common error sources:
Code Errors
- Capitalisation
- Mismatched...
  - Object types
  - Value modes
  - Object dimensions
- Directory structures
- Non-self-contained scripts
Mis-Specified Statistics
- Assumptions not met
- Erroneous outliers
- Misinterpretation of outputs
- Incorrect data handling
Unexpected Findings
"[...] in an experimental framework, surprising results are significant results, rather than signs of failure."
Curtin & Parker, 2014
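Several of the code errors listed above can be spotted directly in the console. A minimal sketch, using hypothetical objects (counts, mat, my_data) invented for illustration, of how class(), dim() and names() expose mismatched types, dimensions and capitalisation:

```r
# Hypothetical objects invented for illustration only

# Mismatched object types / value modes: numbers stored as text
counts <- c("3", "5", "8")
class(counts)                                     # "character"
sum_attempt <- tryCatch(sum(counts), error = function(e) conditionMessage(e))
sum_attempt                                       # an error message instead of a sum
counts_num <- as.numeric(counts)                  # coerce to the intended mode
sum(counts_num)                                   # 16

# Mismatched object dimensions: a 2 x 3 matrix needs a vector of length 3
mat <- matrix(1:6, nrow = 2)
dim(mat)                                          # 2 3
prod_attempt <- tryCatch(mat %*% 1:2, error = function(e) conditionMessage(e))
prod_attempt                                      # "non-conformable arguments"
mat %*% c(1, 1, 1)                                # works: one row sum per row

# Capitalisation: R is case-sensitive, and `$` with a misspelt column name
# silently returns NULL instead of an error
my_data <- data.frame(Site = c("A", "B"), value = c(1, 2))
is.null(my_data$site)                             # TRUE: there is no column "site"
names(my_data)                                    # "Site" "value"
```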
How do we resolve these?
Approaching Problems
Errors and Warnings
Error
R: “I can’t and won’t. You won’t force me. I quit. I am done.”
Warning
R: “I can, but you might not like the output. Better look at that, my guy.”
http://rex-analytics.com/decoding-error-messages-r/
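A minimal sketch of how the two behave in R, using only base functions; log() applied to text and to a negative number are stand-in examples chosen for illustration:

```r
# An error stops evaluation: no result is returned
err_msg <- tryCatch(
  log("a"),
  error = function(e) conditionMessage(e)
)
err_msg   # "non-numeric argument to mathematical function"

# A warning lets evaluation finish, but the result deserves a closer look
warn_msg <- NULL
val <- withCallingHandlers(
  log(-1),
  warning = function(w) {
    warn_msg <<- conditionMessage(w)   # record the warning text
    invokeRestart("muffleWarning")     # then continue, as R would by default
  }
)
val       # NaN
warn_msg  # "NaNs produced"

# While debugging, options(warn = 2) turns every warning into an error,
# so a script stops at the first suspicious step
```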
Breaking problems down
Investigate objects and their contents
Read function help files (especially the Arguments and Value sections)
Simulate data if in doubt
Sousa et al., 2007
Your analyses rarely fail at every step: isolate the step that does.
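If a method recovers parameters you chose yourself, the problem likely lies elsewhere (for example, in the real data). A minimal sketch, with a simple linear model standing in for your analysis; the true values 2 and 0.5 and the sample size are arbitrary choices:

```r
# Simulate data with a known answer (intercept 2, slope 0.5) to test the method
set.seed(42)
n <- 200
x <- runif(n, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 1)
sim <- data.frame(x = x, y = y)

# Investigate objects and their contents before modelling
str(sim)        # column types and dimensions
summary(sim)    # value ranges, plus NA counts if any NAs are present

# Fit the model and compare the estimates with the truth
fit <- lm(y ~ x, data = sim)
coef(fit)       # should lie close to 2 and 0.5
# ?lm opens the help file; see its "Arguments" and "Value" sections
```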
Reproducibility
What & Why
What it is and why you need to care
Reproducibility ensures:
Honest and true reporting
Results can be scrutinised
Problems can be traced and identified
Co-authors and reviewers get less annoyed
Our Goal:
An analysis is reproducible when different researchers can execute it across operating systems with a single button press and it returns the same results for everyone.
How
Coding Styles
Code
Consistency
The same result on every execution.
The same naming scheme throughout.
Self-contained - Code runs in an empty R environment to completion.
Documentation
Commenting - Informative comments for:
What each line is doing/supposed to do.
Why each line is doing what it does/is supposed to do.
Header - Information on what the script does and who authored it.
File-Versioning - Version control (e.g., Git with GitHub) or an equivalent to trace when a script broke.
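A minimal sketch of a script that follows these conventions: a header, one naming scheme (snake_case here), a seed for consistency, and comments stating what and why. The file name, author fields and simulated data are hypothetical placeholders:

```r
# ---------------------------------------------------------------------------
# Script:  soil_moisture_summary.R   (hypothetical example)
# Purpose: Summarise simulated soil moisture per plot
# Author:  <your name>, <your e-mail>
# Date:    <date of last change>
# ---------------------------------------------------------------------------

set.seed(42)  # WHY: makes the simulated values identical on every execution

# WHAT: simulate 3 hypothetical plots with 4 measurements each
moisture_df <- data.frame(
  plot_id  = rep(c("P1", "P2", "P3"), each = 4),
  moisture = runif(12, min = 0.1, max = 0.4)
)

# WHAT: mean moisture per plot
# WHY:  plots are the unit of replication, so measurements are averaged first
moisture_means <- aggregate(moisture ~ plot_id, data = moisture_df, FUN = mean)
moisture_means
```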
Packages
Never include forced package installation in a script.
Instead, do something like this:
install.load.package <- function(x) {
  # install the package only if it cannot be loaded yet
  if (!require(x, character.only = TRUE)) {
    install.packages(x, repos = "https://cloud.r-project.org")
  }
  # load (attach) the package
  require(x, character.only = TRUE)
}
package_vec <- c(
  "rethinking", # for quadratic approximation of posteriors (not on CRAN: install from GitHub, rmcelreath/rethinking)
  "reshape2"    # for data reformatting
)
sapply(package_vec, install.load.package)
Working Directories
Never hard-code directory paths.
Make an Omelette out of Easter Eggs
1 Find the egg(s)/data
2 Open the egg(s)/data
3 Make the egg(s) into an omelette/do your analysis
... but where are the eggs/the data?
But what if you want to hunt for Easter Eggs/data somewhere other than the
garden at Achmore/your specific project folder?
Soft-code your working directories from a base/project directory:
Dir.Base <- getwd() # current working directory, i.e. the project folder if R was started there
Dir.Data <- file.path(Dir.Base, "Data") # index the data folder
Now we have our directories indexed without any hardcoding:
Dir.Base
## [1] "D:/Documents/Teaching/Excursions-into-Biostatistics/Troubleshooting - Isolating Issues and Asking Questions"
Dir.Data
## [1] "D:/Documents/Teaching/Excursions-into-Biostatistics/Troubleshooting - Isolating Issues and Asking Questions/Data"
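Building on Dir.Base and Dir.Data, a minimal sketch of how the rest of a script can find, write and read data without any hard-coded path. It uses tempdir() in place of getwd() so it can run anywhere; the file eggs.csv and its contents are hypothetical:

```r
# In a project: Dir.Base <- getwd(); a temporary folder keeps this sketch harmless
Dir.Base <- tempdir()
Dir.Data <- file.path(Dir.Base, "Data")            # index the data folder
if (!dir.exists(Dir.Data)) dir.create(Dir.Data)    # create it if it is missing

# Hypothetical data, written and read back via soft-coded paths only
eggs <- data.frame(egg = 1:3, colour = c("red", "blue", "gold"))
write.csv(eggs, file.path(Dir.Data, "eggs.csv"), row.names = FALSE)
eggs_in <- read.csv(file.path(Dir.Data, "eggs.csv"))
list.files(Dir.Data)                               # includes "eggs.csv"
```

Because every path is built with file.path() from Dir.Base, moving the project folder (or the "garden") only changes where Dir.Base points.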
Randomness
Always make random processes reproducible.
Imagine you survey the elevation of a random surface with paratroopers:
- Clouds obscure the landscape
- All flights are on the same route
- Troopers report the elevation of their landing site
- There is no drift of paratroopers
- Troopers jump at fixed intervals, starting at a certain time after take-off
How can we get the same elevations from the troopers of two different flights?
We let all flights commence their jump at the same time after take-off.
rnorm(5, 1, 0.2)
## [1] 0.8796 1.1960 1.2139 0.9883 0.9878
rnorm(5, 1, 0.2)
## [1] 1.1414 0.7078 1.3137 1.1270 1.0204
set.seed(42); rnorm(5, 1, 0.2)
## [1] 1.2742 0.8871 1.0726 1.1266 1.0809
set.seed(42); rnorm(5, 1, 0.2)
## [1] 1.2742 0.8871 1.0726 1.1266 1.0809
A seed needs to be set before every random process: set.seed() restarts the random number generator from the same point each time it is called.
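In the analogy, set.seed() plays the role of the fixed jump start. A minimal sketch confirming this with identical(), for rnorm() as above and for sample(); the object names are illustrative:

```r
# Resetting the seed restarts the random number stream at the same point
set.seed(42)
jump_a <- rnorm(5, mean = 1, sd = 0.2)
set.seed(42)
jump_b <- rnorm(5, mean = 1, sd = 0.2)
identical(jump_a, jump_b)     # TRUE

# The same applies to other random processes, e.g. sampling plots
set.seed(42)
plots_a <- sample(1:10, size = 3)
set.seed(42)
plots_b <- sample(1:10, size = 3)
identical(plots_a, plots_b)   # TRUE

# Caveat: R 3.6.0 changed the default algorithm behind sample(), so the same
# seed can give different samples in older R versions;
# RNGkind(sample.kind = "Rounding") reproduces the pre-3.6.0 behaviour
```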
Reporting Issues and Asking for Help
Minimal Working Examples
What is a Minimal Working Example (MWE)?
A reproducible, simplified version of code that produces the outcome in question (for example, the error you need help with).
Minimal. Information and code are reduced as much as possible. This involves reducing:
- Lines of R code
- Number of R packages needed
- Reliance on directory structures
- Description of work, result, aim, and potential error
Complete. All information, data, and code necessary to execute the MWE are provided.
Reproducible. The MWE returns the same output on all operating systems when executed by any person. Ideally, this happens at the push of a single button.
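A minimal sketch of what an MWE script can look like. Since the data of the mclust example below are not available here, it swaps in a hypothetical base-R problem (lm() on data containing an infinite value) that produces a similar NA/NaN/Inf error; all names and data are illustrative:

```r
# MWE: lm() stops with an error when a predictor contains an infinite value
# Packages needed: none (base R only)
# Data: simulated below, so no files or directories are required

set.seed(42)                                    # reproducible data
df <- data.frame(x = rnorm(10), y = rnorm(10))
df$x[3] <- Inf                                  # the single ingredient that triggers the problem

# The one call that fails; tryCatch() captures the message for the report
result <- tryCatch(lm(y ~ x, data = df), error = function(e) conditionMessage(e))
result                                          # NA/NaN/Inf in 'x'
```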
Minimality
Minimal Description
The Goal: I want to use a model-based clustering approach to identify biome classes across Alaska during the year 1982 using GIMMS NDVI3g annual mean and seasonality values.
Method(s) & Material(s):
- Data:
  - GIMMS NDVI3g (9x9km index of vegetation performance)
  - Natural Vector Shapefiles (shapefile collection of states and provinces)
- Method:
  - Model-based (Gaussian mixture) clustering implemented via the mclust package
The Problem: Call to mclustBIC() produces the following error:
Error in mvnXII(data = data, prior = prior) : NA/NaN/Inf in foreign function call (arg 1)
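The error message names NA/NaN/Inf values as the trigger, so it can be worth checking for them directly before building the MWE. A minimal sketch, assuming the raster values were extracted into a two-column matrix; ndvi_vals and its values are hypothetical, and whether dropping rows is appropriate depends on why the values are missing:

```r
# Hypothetical stand-in for the extracted raster values: one row per grid cell
ndvi_vals <- cbind(
  mean        = c(0.41, 0.38, NA, 0.52, NaN),
  seasonality = c(0.12, 0.15, 0.20, Inf, 0.09)
)

# TRUE for rows in which every value is finite (not NA, NaN, Inf or -Inf)
ok_rows <- apply(is.finite(ndvi_vals), 1, all)
which(!ok_rows)        # rows 3, 4 and 5 hold non-finite values

# Keep only finite rows before handing the data to a clustering function
ndvi_clean <- ndvi_vals[ok_rows, , drop = FALSE]
nrow(ndvi_clean)       # 2
```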
Minimal Data - Before Reducing to MWE
The Data on our end before handing it off:
Data needs to be reduced to the important parts (All1982_ras, in this case).
We can export an R environment with the save.image() function.
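A minimal sketch of handing over only the object that matters instead of the whole workspace, using base R's save(), load() and dput(). The objects key_obj and other_obj are hypothetical stand-ins (for All1982_ras and everything else in the workspace); note that a file-backed raster object may store a reference to a file on disk rather than its values, so check that the values are in memory before saving:

```r
# Hypothetical stand-ins for a busy workspace: only key_obj is needed for the MWE
key_obj   <- data.frame(id = 1:3, value = c(0.2, 0.5, 0.9))
other_obj <- matrix(runif(1e4), nrow = 100)     # not needed to reproduce the issue

# save.image() would store every object in the workspace;
# save() with `list` stores only the named ones
mwe_file <- file.path(tempdir(), "MWE_data.RData")
save(list = "key_obj", file = mwe_file)

# The recipient loads just that object into a clean environment
recv_env <- new.env()
load(mwe_file, envir = recv_env)
ls(recv_env)                                    # "key_obj"

# For very small data, dput() prints R code that rebuilds the object,
# which can be pasted directly into a question
dput(key_obj)
```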
The Data for the MWE that we hand over:
Minimal Code
Original Script: MWE Script:
We removed code for:
Download of data
Cropping and masking of data
Calculation of compound metrics
Completeness
Complete Reporting
We also supply:
Archive of our raw data directory
Script containing all the code we removed from the full script when creating the MWE script
We may also want to provide:
- sessionInfo()
- rstudioapi::versionInfo()
- Metadata if needed
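A minimal sketch for the first item, using base R only: it writes the sessionInfo() output to a text file that can be attached to a report. rstudioapi::versionInfo() is left out because it only works inside a running RStudio session; the file location is an assumption:

```r
# Write R version, platform and package versions to a text file
info_file <- file.path(tempdir(), "sessionInfo.txt")   # use your MWE folder instead
writeLines(capture.output(sessionInfo()), info_file)

# Quick check of what was recorded
head(readLines(info_file), 3)
```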
MWE Script:
Reproducibility
Reproducible Issues
We test our MWE script:
- In a new, empty directory, by sourcing the script
- For completeness, by running it in an empty R environment and by adding rm(list = ls()) to the start of the script (a fresh R session is the stricter test, since rm() does not detach loaded packages)
Our script still produces the same error?
We ship it off to someone who, we hope, can help us.
https://knowyourmeme.com/photos/1297214-impossible-perhaps-the-archives-are-incomplete
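A minimal sketch of the first test in R, using a hypothetical MWE (lm() on data containing Inf) written to a new temporary directory; the directory and file names are assumptions:

```r
# Write the MWE to a file in a new, empty directory (hypothetical file name)
test_dir <- file.path(tempdir(), "mwe_test")
dir.create(test_dir, showWarnings = FALSE)
mwe_path <- file.path(test_dir, "MWE.R")
writeLines(c(
  "set.seed(42)",
  "df <- data.frame(x = rnorm(10), y = rnorm(10))",
  "df$x[3] <- Inf",
  "fit <- lm(y ~ x, data = df)"
), mwe_path)

# Source it from that directory into a fresh environment and capture any error
old_wd <- setwd(test_dir)
outcome <- tryCatch({
  source("MWE.R", local = new.env())
  "ran without error"
}, error = function(e) conditionMessage(e))
setwd(old_wd)      # restore the original working directory
outcome            # same error as before? Then the MWE is ready to share
```

Running Rscript MWE.R from a terminal inside that directory is a stricter check still, as it starts a fresh R session without previously loaded packages.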