Reproducible Research with R and RStudio

Chapter 4 - Getting Started with File Management

Emile Latour

August 24, 2016

Introduction

Well organized files enable you to

Using tools such as R, knitr/rmarkdown, and markup languages like LaTeX requires detailed knowledge of where files are stored on your computer.

Introduction

To enable reproducibility, file management may require command line tools to access and organize files.

By typing commands, you are documenting all the steps that you take, which has a big advantage over clicking and dragging with a cursor.

Introduction

In this chapter, we will:

  1. Discuss how a reproducible research project may be organized.
  2. Cover file path naming conventions.
  3. Organize files with RStudio Projects.
  4. Cover some basic R and Unix-like shell commands (this talk covers R only).
  5. See how to navigate files in RStudio in the Files pane.

Discussion in this chapter focuses on files stored locally on your computer, as opposed to files stored remotely in the cloud.

File Trees

Project File Tree

Root directories

A root directory is the first level in a disk such as a hard drive.


Windows notation

C:\

Unix-like systems (Macs and Linux computers) notation

/

Subdirectories & parent directories

Directories inside other directories are also referred to as child directories of a parent directory.

Windows file path in R

Windows computers separate sub-directories with the backslash (\). For example:

C:\ExampleProject\Data

When you type a Windows file path in R, you need to use two backslashes rather than one:

C:\\ExampleProject\\Data

Another option for writing Windows file paths in R is to use a single forward slash (/) as the separator.

C:/ExampleProject/Data
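
As a small sketch of the spellings above, the snippet below writes the same example path both ways and also builds it with file.path(). The directory is only the example path from this slide and does not need to exist.

```r
# Two ways to write the same Windows path in an R string.
escaped <- "C:\\ExampleProject\\Data"   # each \\ is one literal backslash
forward <- "C:/ExampleProject/Data"

# file.path() joins components with a forward slash on every platform.
built <- file.path("C:", "ExampleProject", "Data")

# Convert the backslash form to the forward-slash form.
converted <- gsub("\\", "/", escaped, fixed = TRUE)

print(built)
print(identical(converted, forward))
```

Building paths with file.path() avoids typing separators by hand, which can help when a project moves between Windows and Unix-like systems.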

Unix-like systems file path in R

Unix-like systems, including Mac computers, separate sub-directories with a forward slash (/). For example:

/ExampleProject/Data

Note: in Unix-like systems, the forward slash (/) with nothing before it indicates the root directory.

/ExampleProject/Data # subdirectory of the root

ExampleProject/Data # subdirectory of the working directory
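
The difference between the two paths above can be checked in code. The helper below is a hypothetical sketch (is_absolute_path is not a base R function) that treats a leading slash or a Windows drive letter as the sign of a path that starts at a root.

```r
# Hypothetical helper (not part of base R): TRUE when a path starts at a root.
is_absolute_path <- function(path) {
  # Matches a leading "/" (Unix-like) or a drive letter such as "C:/" or "C:\".
  grepl("^(/|[A-Za-z]:[/\\\\])", path)
}

print(is_absolute_path("/ExampleProject/Data"))    # starts at the root
print(is_absolute_path("ExampleProject/Data"))     # relative to the working directory
print(is_absolute_path("C:/ExampleProject/Data"))  # Windows drive root
```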

Working directories

More on this later.

Absolute vs. relative paths

Spaces in directory & file names

Naming conventions

# Examples of CamelBack 
/ExampleProject/Data
fitModels.R
# Underscore examples
/example_project/Data
fit_models.R
# Dash examples
/example-project/Data
fit-models.R

Organizing your research project

Things to note in the example file tree

README

sessionInfo()

It is good practice to include the system information for the R session you used to create the project. This can be done by writing your README file in R Markdown and including the command sessionInfo() in a knitr code chunk.

sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] magrittr_1.5    formatR_1.4     tools_3.3.1     htmltools_0.3.5
##  [5] yaml_2.1.13     Rcpp_0.12.6     stringi_1.1.1   rmarkdown_1.0  
##  [9] knitr_1.14      stringr_1.1.0   digest_0.6.10   evaluate_0.9
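
If the README is not written in R Markdown, one possible alternative is to save the same information to a plain-text file. This is a minimal sketch; the file name session-info.txt and the use of tempdir() are assumptions made for illustration.

```r
# Capture the printed session information as a character vector.
session_lines <- capture.output(sessionInfo())

# Write it to a plain-text file; the file name is a hypothetical choice.
info_file <- file.path(tempdir(), "session-info.txt")
writeLines(session_lines, info_file)

# Show the first line of the saved file.
print(readLines(info_file, n = 1))
```

In a real project the file would normally live next to the README rather than in a temporary directory.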

Setting directories as RStudio Projects

If using RStudio, you may want to organize files as a Project. The author gives steps to turn an existing directory into a project as follows:

  1. Click on File in the RStudio menu bar.
  2. Select New Project and a new window will pop up.
  3. Select the option Existing Directory.
  4. Click on the Browse button to find the directory that you want to turn into an RStudio Project.
  5. Select Create Project.

Notice that RStudio has put a file with the extension .Rproj in the directory.
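
Whether a directory has been set up as a Project can also be checked from R by looking for that file. The helper name has_rproj and the temporary ExampleProject directory below are hypothetical, used only to make the sketch runnable.

```r
# Hypothetical helper: does a directory contain an RStudio project file?
has_rproj <- function(dir = ".") {
  length(list.files(dir, pattern = "\\.Rproj$")) > 0
}

# Demonstration in a temporary directory with a stand-in .Rproj file.
demo_dir <- file.path(tempdir(), "ExampleProject")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "ExampleProject.Rproj"))

print(has_rproj(demo_dir))
```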

RStudio Projects

R file manipulation commands

Commands for handling and navigating through files.

getwd

Find your current working directory

# Show the current working directory used when knitting this presentation.
getwd()
## [1] "H:/RRR-JC/RRRR_Chapter4_Latour"

list.files

See all of the files and sub-directories in the current working directory.

# See files in current working directory
list.files()
## [1] "images"                          "RRRR-Chapter-4-Outline.docx"    
## [3] "RRRR-Chapter-4-Outline.html"     "RRRR-Chapter-4-Outline.pdf"     
## [5] "RRRR-Chapter-4-Outline.Rmd"      "RRRR-Chapter4-Presentation.html"
## [7] "RRRR-Chapter4-Presentation.Rmd"  "RRRR_Chapter4_8-24-16.html"     
## [9] "Thumbs.db"

You can also list files in other directories by specifying the path.

# See files in other directories
directoryPath <- "S:/BSR_Project"
list.files(directoryPath)
##  [1] "BreeMitchell"     "Jansen_Lynn"      "JeffTyner"       
##  [4] "Lastname"         "Leachman_Sancy"   "Loayza"          
##  [7] "Luai Zarour"      "Nesmith_Meghan"   "Nima_Nabavizadeh"
## [10] "Oncology_surveys" "Ryan_Christopher" "Thomas_George"   
## [13] "Tsikitis"
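
list.files() also accepts arguments that narrow or widen the listing. The sketch below uses pattern and recursive on a throwaway directory tree; the directory and file names are made up for the example.

```r
# Build a small throwaway tree so the example runs anywhere.
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(file.path(demo_dir, "Data"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, c("fitModels.R", "README.md")))
file.create(file.path(demo_dir, "Data", "clean.R"))

# Only R scripts in the top level.
top_level <- list.files(demo_dir, pattern = "\\.R$")

# R scripts in all sub-directories, with paths relative to demo_dir.
all_scripts <- list.files(demo_dir, pattern = "\\.R$", recursive = TRUE)

print(top_level)
print(all_scripts)
```

Adding full.names = TRUE returns each entry with the directory path prepended, which is convenient when the results are passed on to functions that read the files.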

setwd

Set the current working directory.

setwd("S:/BSR_Project/lastName_firstName")

Note: When a document is knit, knitr restores the original working directory after each code chunk, so setwd() affects only the chunk in which it is called.
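
In an ordinary R session, setwd() invisibly returns the previous working directory, which makes it easy to switch back afterwards. A minimal sketch, using tempdir() as a stand-in for a project directory:

```r
# setwd() returns the previous working directory invisibly.
target <- tempdir()          # stand-in for a project directory
old_wd <- setwd(target)

# ... work with files relative to `target` here ...
inside <- getwd()

# Switch back to where we started.
setwd(old_wd)
print(identical(getwd(), old_wd))
```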

root.dir

Sets the root (or working) directory for all code chunks. It is a knitr package option rather than a function, so it is set with opts_knit$set(), typically in the first code chunk:

knitr::opts_knit$set(root.dir = "S:/BSR_Project/lastName_firstName")

Calling root.dir() directly fails because no function of that name exists, which is why R cannot find it or show documentation for it.

The text says that nested file structures are preferable, rather than using this option.

dir.create

Creates a directory.

dir.create("S:/BSR_Project/lastName_firstName")
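
By default dir.create() creates only the last directory in the path. The sketch below, run in a temporary location with made-up directory names, uses recursive = TRUE to create missing parent directories as well and checks first so that an existing directory does not trigger a warning.

```r
project_dir <- file.path(tempdir(), "ExampleProject", "Data", "Raw")

# recursive = TRUE also creates the missing parent directories.
if (!dir.exists(project_dir)) {
  dir.create(project_dir, recursive = TRUE)
}

print(dir.exists(project_dir))
```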

file.create

Creates a new blank file.

file.create("S:/BSR_Project/lastName_firstName/SourceCode.R")
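
file.create() empties a file that already exists, so a guard can be useful. This is a minimal sketch in a temporary directory; the file name is a stand-in.

```r
new_file <- file.path(tempdir(), "file_create_demo.R")

# file.create() truncates an existing file, so check before creating.
if (!file.exists(new_file)) {
  created <- file.create(new_file)
} else {
  created <- FALSE
}

print(file.exists(new_file))
```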

cat

Creates a new file and puts text into it.

cat("Reproducible Research", 
    file = "S:/BSR_Project/lastName_firstName/SourceCode.R")

Warning: The cat command will overwrite existing files with new content. To add text to existing files use the append = TRUE argument.
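
The difference between overwriting and appending can be sketched with a temporary file (the file name is made up for the example):

```r
notes_file <- file.path(tempdir(), "cat_demo.txt")

# First call creates (or overwrites) the file.
cat("Reproducible Research\n", file = notes_file)

# append = TRUE adds to the end instead of overwriting.
cat("Chapter 4\n", file = notes_file, append = TRUE)

print(readLines(notes_file))
```

The file ends up with two lines, one from each call.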

file.rename

Renames a file.

file.rename(from = "S:/BSR_Project/lastName_firstName/SourceCode.R", 
            to = "S:/BSR_Project/lastName_firstName/DataManagement.R")

It can also be used to move a file from one directory to another. It will not create new directories.

file.rename(from = "S:/BSR_Project/lastName_firstName/SourceCode.R", 
            to = "S:/Share/OCC_DATA/lastName_firstName/SourceCode.R")
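
Because file.rename() will not create the destination directory, one approach is to create it first with dir.create(). The sketch below uses stand-in files under tempdir(); the rename_demo and Archive directory names are assumptions for illustration.

```r
# Stand-in source file and a destination directory that does not exist yet.
from_file <- file.path(tempdir(), "rename_demo", "SourceCode.R")
to_file   <- file.path(tempdir(), "rename_demo", "Archive", "SourceCode.R")
dir.create(dirname(from_file), recursive = TRUE, showWarnings = FALSE)
file.create(from_file)

# Create the destination directory first, then move the file.
dir.create(dirname(to_file), recursive = TRUE, showWarnings = FALSE)
moved <- file.rename(from = from_file, to = to_file)

print(moved)
print(file.exists(from_file))
```

file.rename() returns TRUE on success, so the result can be checked before later steps rely on the new location.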

file.copy

Copies the file to another directory.

file.copy(from = "S:/BSR_Project/lastName_firstName/SourceCode.R", 
          to = "S:/Share/OCC_DATA/lastName_firstName/SourceCode.R")
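
Unlike cat, file.copy() does not overwrite an existing destination unless asked to. A minimal sketch with temporary stand-in files:

```r
src  <- file.path(tempdir(), "copy_demo_source.R")
dest <- file.path(tempdir(), "copy_demo_backup.R")
cat("# analysis script\n", file = src)
unlink(dest)  # start from a clean state so the example is repeatable

# By default file.copy() will not overwrite an existing destination.
first_copy  <- file.copy(from = src, to = dest)                    # dest is new
second_copy <- file.copy(from = src, to = dest)                    # dest exists
third_copy  <- file.copy(from = src, to = dest, overwrite = TRUE)  # replaces dest

print(c(first_copy, second_copy, third_copy))
```

The second call returns FALSE rather than raising an error, so checking the return value is the only way to notice that nothing was copied.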

Unix-like shell commands for file management

Not covered in this talk. See pages 74–78 if interested in reading into this topic further.

File navigation in RStudio

In RStudio, the Files pane lets you navigate the file tree and do some basic file manipulations.

The Files pane is a GUI, so actions taken there are not as easily reproducible as the commands covered earlier.