08/07/2021
Erik Kusch
1
Erik Kusch (erik.kusch@bio.au.dk), PhD
Student
Department of Biology
Section for Ecoinformatics & Biodiversity
Center for Biodiversity Dynamics in a
Changing World (BIOCHANGE)
Aarhus University
CODING PRACTICES
Life with R
08/07/2021 SalGo-Team - Coding Practices
Code Structure
Code Folding
Braced Regions (“{…}”)
Code Sections
Title followed by at least four “#”, “=”, or “-” characters
Section hierarchy determined by the number of preceding “#”
Requires RStudio version ≥ 1.4
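A minimal sketch of these folding conventions, assuming made-up section titles and object names, could look like this:

```r
# PREAMBLE =====================================================================
## Packages --------------------------------------------------------------------
# no additional packages are needed for this sketch

# DATA =========================================================================
## Loading ---------------------------------------------------------------------
Heights_vec <- c(1.62, 1.75, 1.81) # made-up values, purely illustrative

# ANALYSIS =====================================================================
{ # braced region: RStudio can fold everything between the braces
  Heights_mean <- mean(Heights_vec)
  Heights_sd <- sd(Heights_vec)
}
```

Lines starting with a single “#” and ending in four or more “=” act as top-level sections here, while “##” lines nest beneath them in the RStudio outline.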
Header
Title
Contents
Author
Preamble
rm(list=ls())
Directories
Packages
Sourcing
Data
Download
Loading
Manipulation
Analysis
Statistical Tests/Models
Export
Results
Plots
Manipulated Data
Header
Information on:
Project Membership
Contents
Dependencies
Authorship
Can be used to track edit dates, but file versioning is better for this
Preamble
rm(list=ls())
Clears the workspace (all objects in the global environment)
Ensures your code is self-contained
Directories
Avoid hard-coding directories
Use getwd() and file.path() for soft-coded directories
Packages
Do not rely on library() alone to load packages
Automatically install packages that are needed but not installed
Sourcing/Functionality
Source other code documents
Create small helper functions
Data
File Checks
Use file.exists(…) to check if data files are present
Download
httr::GET(…) for data download given a link
utils::unzip(…) for extraction of .zip archives
Loading
Do not load data via dialogue boxes!
Use code to load data
Manipulation
Do not manipulate data outside of R
Make all data manipulation traceable through R-Code
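The file check, download, and extraction steps could be combined roughly as follows. This is only a sketch: the helper name `Fetch_Data`, its arguments, and the default archive name are hypothetical, and the httr package is only needed when a download is actually triggered.

```r
# Hypothetical helper: download and unpack an archive only if it is missing.
# `URL` and `Dir` are placeholders supplied by the caller.
Fetch_Data <- function(URL, Dir, File_Name = "Data.zip") {
  Zip_Path <- file.path(Dir, File_Name)
  if (!file.exists(Zip_Path)) {                 # file check
    Response <- httr::GET(URL, httr::write_disk(Zip_Path, overwrite = TRUE))
    httr::stop_for_status(Response)             # abort on HTTP errors
    utils::unzip(Zip_Path, exdir = Dir)         # extract the archive
  }
  Zip_Path                                      # return the archive location
}
```

Calling the helper a second time skips the download because the archive is already present, which keeps repeated runs of the script fast.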
Analysis
Data
No more manipulation of data
Subsetting is permissible
Analyses
Variable Selection
Analyses
Model Comparison
Model Evaluation
Naming
Do not overwrite model objects
Use unique names for model objects/outputs
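As an illustration of the naming advice, the sketch below fits two models to the built-in `mtcars` data. The formulas and object names are made up for this example and do not come from the slides.

```r
# Built-in `mtcars` data; the formulas are illustrative only
Cars_df <- mtcars[mtcars$cyl != 8, ]   # subsetting is fine at this stage

# Each model gets its own name, so nothing is overwritten
MPG_Weight_lm <- lm(mpg ~ wt, data = Cars_df)
MPG_WeightHP_lm <- lm(mpg ~ wt + hp, data = Cars_df)

# Both models remain available for model comparison
Model_Comparison <- AIC(MPG_Weight_lm, MPG_WeightHP_lm)
```

Had both fits been stored under one name, the first model would be lost and the comparison impossible without refitting.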
Export
Results
Export model objects as .Rdata with save(…)
Export model summaries as .txt with sink(…) … sink()
Plotting
Export ggplot2 figures with ggsave(…)
Export base plot figures with png(…) … dev.off()
Manipulated Data
Do not overwrite original data files!
Append suffix to data name (e.g. “_clean”)
Do not save as Excel-readable files unless strictly necessary
Use saveRDS(…) to store .rds files
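The export calls named above might be strung together as in this sketch. The export folder, file names, and model are placeholders, and ggsave(…) is left out so the example does not depend on ggplot2.

```r
Dir_Exports <- file.path(tempdir(), "Exports")   # placeholder export folder
dir.create(Dir_Exports, showWarnings = FALSE)

Weight_lm <- lm(mpg ~ wt, data = mtcars)         # illustrative model

# Model object as .RData
save(Weight_lm, file = file.path(Dir_Exports, "Weight_lm.RData"))

# Model summary as .txt: everything printed between the sink() calls is written
sink(file.path(Dir_Exports, "Weight_lm_Summary.txt"))
print(summary(Weight_lm))
sink()

# Base plot as .png: the device is opened, drawn on, then closed
png(file.path(Dir_Exports, "Weight_Plot.png"))
plot(mpg ~ wt, data = mtcars)
dev.off()

# Manipulated data as .rds, with a suffix so the original stays untouched
Cars_clean_df <- na.omit(mtcars)
saveRDS(Cars_clean_df, file.path(Dir_Exports, "mtcars_clean.rds"))
```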
Naming Schemes
Specificity
What does the object contain?
Spaces
“_” for objects and “.” for functions
Capitalisation
Choose your style, but be consistent
Classes
Suffix/Prefix to indicate object class
Iterators
Avoid single-letter iterators
Logical Operators
Do not use “T” or “F”
Criterion        Object         Function
Bad Name         mydata         fun
Specificity      NDVI1981       Additionfunction
Spaces           NDVI_1981      Addition.function
Capitalisation   NDVI_1981      Addition.Function
Classes          NDVI1981_ras   FUN.Addition
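Why “T” and “F” are risky can be shown in a few lines, alongside a descriptive iterator name. The objects below are invented for illustration.

```r
# `T` is an ordinary object that can be reassigned; `TRUE` is a reserved word
T <- 0                                            # legal, but changes behaviour
Mean_With_T <- mean(c(1, NA, 3), na.rm = T)       # na.rm is now 0, i.e. FALSE
Mean_With_TRUE <- mean(c(1, NA, 3), na.rm = TRUE) # always behaves as intended
rm(T)                                             # restore the default `T`

# Descriptive iterator instead of a single letter
Years_vec <- 1981:1983
for (Year_Iter in Years_vec) {
  message("Processing NDVI for ", Year_Iter)
}
```

The first mean returns NA because the reassigned `T` silently switches off the removal of missing values, while the second returns 2.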
Comments
What?
Comments
Start with “#”
Are not executed as code
Why?
Convey goal of the code lines
Revisit code
How?
Good comments:
Specify (what the code is doing)
Justify (why it is being done)
[Screenshots: one BAD and one GOOD commenting example]
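A bad and a good comment for the same line of code might look as follows. The readings and the use of -999 as a missing-value flag are assumptions made for this sketch.

```r
Temperatures_vec <- c(12.1, 14.3, -999, 13.8) # made-up readings

# BAD: merely restates the code
# replace -999

# GOOD: specifies what is done and justifies why
# Recode the missing-value flag (-999) as NA so it cannot distort the mean
Temperatures_vec[Temperatures_vec == -999] <- NA

Temperature_mean <- mean(Temperatures_vec, na.rm = TRUE)
```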
Quality of Life with R
Object Assignment
Use “<-” instead of “=”
Readability
Spaces in function specification
Line breaks between individual chunks of code
Soft-wrap in RStudio to prevent scrolling sideways
Consistency
Develop a style and be consistent in using it
Reproducibility
Why Care?
Reproducible research is the gold standard of science
Journal Requirements
Code sharing
Data sharing
How?
STOP. HARDCODING.
Test your scripts on different machines.
Reproducibility Cardinal Sins
Packages
What People Do
Use library(…) to load packages
Not everyone has the same packages installed
Force installation via install.packages(…)
May disrupt local package versions
How To Do Better
Identify packages that need installing and only install those prior to loading
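One way to install only what is missing before loading is sketched below. The helper name `Install_Load`, the CRAN mirror, and the package list are assumptions. The two packages used here ship with R, so nothing is actually installed.

```r
# Hypothetical helper: install a package only if it is missing, then load it
Install_Load <- function(Package) {
  if (!requireNamespace(Package, quietly = TRUE)) {
    install.packages(Package, repos = "https://cloud.r-project.org")
  }
  library(Package, character.only = TRUE)
}

Packages_vec <- c("stats", "utils")   # placeholder list; both ship with R
invisible(lapply(Packages_vec, Install_Load))
```

Because already installed packages are never reinstalled, local package versions stay as they are.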
Directories
What People Do
Hardcode working directories with setwd(…)
Others do not have access to your hard drive
How To Do Better
Organise projects in separate folders
Softcode working directories:
getwd() to index project folder
file.path(…) to index sub-directories
Project Folder
  Data
    Raw
    Manipulated
  Exports
    Plots
    Models
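Soft-coded directories for such a project folder could be set up as in this sketch. The nesting of the sub-folders and the object names are assumptions, and the folders are created relative to wherever the script is run.

```r
Dir_Base <- getwd()                                   # project folder
Dir_Data <- file.path(Dir_Base, "Data")
Dir_Raw <- file.path(Dir_Data, "Raw")
Dir_Manipulated <- file.path(Dir_Data, "Manipulated")
Dir_Exports <- file.path(Dir_Base, "Exports")
Dir_Plots <- file.path(Dir_Exports, "Plots")
Dir_Models <- file.path(Dir_Exports, "Models")

# Create any folder that does not exist yet
Dirs_vec <- c(Dir_Raw, Dir_Manipulated, Dir_Plots, Dir_Models)
for (Dir_Iter in Dirs_vec) {
  if (!dir.exists(Dir_Iter)) dir.create(Dir_Iter, recursive = TRUE)
}
```

No path in this snippet names a specific drive or user, so the same script runs unchanged on another machine.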
File Checks
What People Do
Overwrite existing data files
Can possibly break reproducibility and cement errors
Carry out analyses whose output is already present
Waste of processing time and power
How To Do Better
Check whether files are present with file.exists(…)
If the file is present, load it; otherwise manipulate, write, or analyse to create it
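The load-or-analyse pattern could be written as below. The file location and the model are placeholders chosen for this sketch.

```r
Model_Path <- file.path(tempdir(), "Weight_lm.rds")   # placeholder location

if (file.exists(Model_Path)) {
  Weight_lm <- readRDS(Model_Path)          # output present: load it
} else {
  Weight_lm <- lm(mpg ~ wt, data = mtcars)  # output absent: run the analysis
  saveRDS(Weight_lm, Model_Path)            # and write it for the next run
}
```

On the first run the model is fitted and saved; on every later run it is simply loaded, which saves processing time and leaves the stored result untouched.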
Random Processes
What People Do
Sample data sets
Partition data into test and training data sets
May induce severe bias
How To Do Better
Set a seed to make random processes reproducible
set.seed(…)
Sample multiple times and average out to remove bias
Bootstrapping
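Seeding and bootstrapping might be combined as in this sketch. The seed value, the number of resamples, and the use of `mtcars$mpg` are arbitrary choices for illustration.

```r
set.seed(42)   # arbitrary seed; any fixed value makes the draws repeatable

# Bootstrapping: resample with replacement many times and average
N_Boot <- 1000
Boot_Means_vec <- numeric(N_Boot)
for (Boot_Iter in seq_len(N_Boot)) {
  Resample_vec <- sample(mtcars$mpg, replace = TRUE)
  Boot_Means_vec[Boot_Iter] <- mean(Resample_vec)
}
Boot_Estimate <- mean(Boot_Means_vec)

# Resetting the same seed reproduces a random draw exactly
set.seed(42)
First_Draw_A <- sample(mtcars$mpg, replace = TRUE)
set.seed(42)
First_Draw_B <- sample(mtcars$mpg, replace = TRUE)
```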
Reproducibility - Reporting
Code is as essential as your written report.
Reporting Code
Using R Markdown for your research comes with a multitude of advantages:
1. Entire workflow in one program (RStudio)
2. Research and reports reproducible at the click of one button
3. Combines R functionality and LaTeX formatting (if desired)
4. Consistent formatting
5. Clear presentation of code
6. Dynamic documents (you can generate various output document types)
7. Applicable for almost all document types you may desire as an output
More Quality of Life Improvements
Everything hereafter is nice-to-have or nice-to-do, but not essential.
Functions & Sourcing
Big projects lend themselves well to a multi-document workflow:
Functions
Fun <- function(…){} to create a custom function
Useful for:
Soft-coding analyses
Repeating code for different input parameters
Should be:
internally consistent
well-documented
easy to understand
Sourcing
source(…) command to load/execute R scripts
Useful for:
Keeping code structured and concise
Storing extra functionality
Should:
Use sensible file names
Sourced functions need to be called
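The interplay of a custom function and source(…) can be sketched as follows. The script name and the function are hypothetical, and the helper script is written to a temporary folder only so that the example is self-contained.

```r
# Write a small helper script (in a real project this file already exists)
Script_Path <- file.path(tempdir(), "Helper_Functions.R")   # placeholder name
writeLines(
  c(
    "FUN.Addition <- function(Number_A, Number_B) {",
    "  Number_A + Number_B # returns the sum of both inputs",
    "}"
  ),
  Script_Path
)

source(Script_Path)                # loads the function definition
Sum_Result <- FUN.Addition(2, 3)   # the sourced function still has to be called
```

Sourcing only defines `FUN.Addition`; no addition happens until the function is called in the last line.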
Progress Bar & Estimators
Progress Bar
Updates on how code is progressing
Especially useful when your code involves loops
Estimators
When to expect code to finish processing
Most useful in loop-based approaches as they need a baseline for the estimation
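A progress bar and a simple time estimator for a loop could look like this. The sketch uses the text progress bar from the utils package; the estimator logic and all object names are assumptions, and `Sys.sleep()` stands in for real work.

```r
N_Iterations <- 20
Progress_Bar <- txtProgressBar(min = 0, max = N_Iterations, style = 3)
Start_Time <- Sys.time()

for (Iteration in seq_len(N_Iterations)) {
  Sys.sleep(0.01)                               # stand-in for real work
  setTxtProgressBar(Progress_Bar, Iteration)    # update the bar

  # Estimator: average time per finished iteration times iterations left
  Elapsed_Secs <- as.numeric(difftime(Sys.time(), Start_Time, units = "secs"))
  Remaining_Secs <- Elapsed_Secs / Iteration * (N_Iterations - Iteration)
}
close(Progress_Bar)
```

The estimate relies on the iterations already completed as its baseline, which is why it becomes more reliable as the loop progresses.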
Errors, Warnings & User-Inputs
Errors
Stop code execution
Report message to console
Warnings
Do not stop code execution
Report message to console
User-Input
Pause code execution
Wait for user-input in console
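The three behaviours can be tried out with base R as sketched below. The helper `Check_Value` and its messages are invented for illustration, and `tryCatch()` is used only so that the example runs to completion.

```r
# Hypothetical helper illustrating an error and a warning
Check_Value <- function(Value) {
  if (!is.numeric(Value)) {
    stop("Value must be numeric.")     # error: halts execution
  }
  if (Value < 0) {
    warning("Value is negative.")      # warning: execution continues
  }
  sqrt(abs(Value))
}

# tryCatch() captures the messages instead of interrupting the script
Error_Message <- tryCatch(
  Check_Value("a"),
  error = function(e) conditionMessage(e)
)
Warning_Message <- tryCatch(
  Check_Value(-4),
  warning = function(w) conditionMessage(w)
)

# User input: readline() waits for console input in interactive sessions only
if (interactive()) {
  User_Answer <- readline("Continue? (y/n): ")
}
```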
Parallel Processing
By default, R only uses one core
Inefficient use of computational power
Use parallel processing to make use of remaining cores
Particularly powerful when paired with eval(parse(text="…"))
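A minimal parallel run with the parallel package that ships with R is sketched below. The number of workers and the toy task are assumptions; `parallel::detectCores()` can be used to check how many cores a machine offers.

```r
library(parallel)                      # ships with R

N_Cores <- 2                           # assumption: two worker processes
Cluster <- makeCluster(N_Cores)

# Each worker squares the elements it is handed
Squares_ls <- parLapply(Cluster, 1:4, function(Number) Number^2)

stopCluster(Cluster)                   # always release the workers afterwards
```

The toy task here is far too small to benefit from parallelisation; the gain shows up when each element takes seconds or minutes to process.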
Versioning
Use GitHub to:
Track code and development
You may revert to previous versions of code if you break something
Share code with others
You can make R functionality and packages available on GitHub
Collaborate with others
Pull requests and the issue feature allow for collaborative code development