Emile Latour
August 24, 2016
Well organized files enable you to
Using tools such as R, knitr/rmarkdown, and markup languages like LaTeX requires detailed knowledge of where files are stored on your computer.
To enable reproducibility, file management may require command line tools to access and organize files.
By typing commands, you are documenting all the steps that you take, which has a big advantage over clicking and dragging with a cursor.
In this chapter, we will:
Discussion in this chapter focuses on files stored locally on your computer, as opposed to files stored remotely in the cloud.
Operating systems covered in the text organize files in hierarchical directories, aka file trees.
Directories can be thought of as the folders that you see in Windows or Mac. Gandrud uses the terms “directory” and “folder” interchangeably.
Directories are hierarchical because they are located inside other directories.
A root directory is the first level in a disk such as a hard drive.
Windows notation
C:\
Unix-like systems (Macs and Linux computers) notation
/
Directories inside other directories are also referred to as child directories of a parent directory.
Windows computers separate sub-directories with the backslash (\). For example:
C:\ExampleProject\Data
When you type a Windows file path in R, you need to use two backslashes rather than one:
C:\\ExampleProject\\Data
Another option for writing Windows file names in R is to use one forward slash (/).
C:/ExampleProject/Data
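As a small illustration of the two notations above, the sketch below writes the same Windows path in both styles and compares them. The ExampleProject and Data names come from the example; the variable names are only illustrative. It also uses base R's file.path, which joins path components with a forward slash.

```r
# Two ways of writing the same Windows path in an R string.
# In R source code each backslash must be escaped, so "\\" is ONE character.
path_backslash <- "C:\\ExampleProject\\Data"
path_forward   <- "C:/ExampleProject/Data"

# file.path() joins components with "/" regardless of operating system.
path_built <- file.path("C:", "ExampleProject", "Data")

# Convert the backslash version to forward slashes for comparison.
path_converted <- gsub("\\", "/", path_backslash, fixed = TRUE)

print(path_built)      # "C:/ExampleProject/Data"
print(path_converted)  # "C:/ExampleProject/Data"
```

Building paths with file.path avoids typing either kind of slash by hand.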
Unix-like systems, including Mac computers, separate sub-directories with a forward slash (/). For example:
/ExampleProject/Data
Note: in Unix-like systems, a forward slash (/) with nothing before it indicates the root directory.
/ExampleProject/Data # subdirectory of the root
ExampleProject/Data # subdirectory of the working directory
It is important to know what your current working directory is.
The working directory is the directory where the program automatically looks for the files and other directories, unless indicated otherwise.
It is a good idea to use relative file paths rather than absolute ones, so that your code depends less on the file structure of any particular computer.
An absolute file path specifies the child directory all the way back to the root directory:
/ExampleProject/Data
A relative file path specifies the child directory relative to the working directory. If the current working directory is ExampleProject, the relative path is:
Data/
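The difference between the two kinds of path can be sketched in R as follows. To keep the example safe to run, it builds a throwaway ExampleProject/Data tree inside R's temporary directory; that location and the variable names are assumptions made for illustration only.

```r
# Build a throwaway ExampleProject/Data tree inside R's temporary directory.
project_dir <- file.path(tempdir(), "ExampleProject")
dir.create(file.path(project_dir, "Data"), recursive = TRUE, showWarnings = FALSE)

# Make ExampleProject the working directory, remembering the old one.
old_wd <- setwd(project_dir)

# Relative path: resolved against the current working directory.
relative_found <- dir.exists("Data")

# Absolute path: the same directory, spelled out from the root.
absolute_path  <- normalizePath("Data", winslash = "/")
absolute_found <- dir.exists(absolute_path)

# Restore the original working directory.
setwd(old_wd)

print(relative_found)
print(absolute_path)
```

The relative path "Data" only works while ExampleProject is the working directory, whereas the absolute path works from anywhere on this one computer.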
It is good practice to avoid spaces in file and directory names. Spaces can sometimes create problems for computer programs trying to read the file path.
A convention should be adopted that makes multi-word names easily readable without using spaces.
# Examples of CamelBack
/ExampleProject/Data
fitModels.R
# Underscore examples
/example_project/Data
fit_models.R
# Dash examples
/example-project/Data
fit-models.R
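As a rough sketch of how such a convention could be checked in R, the code below flags names that contain spaces and rewrites them with underscores. The file names are hypothetical, and the underscore style is just one of the conventions listed above.

```r
# Hypothetical file names; the first two contain spaces.
file_names <- c("fit models.R", "Example Project", "fit_models.R")

# Flag names that contain at least one space.
has_space <- grepl(" ", file_names, fixed = TRUE)

# Replace each space with an underscore.
clean_names <- gsub(" ", "_", file_names, fixed = TRUE)

print(has_space)    # TRUE TRUE FALSE
print(clean_names)  # "fit_models.R" "Example_Project" "fit_models.R"
```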
The sub-directory (Data) contains the data and the R files that perform the data management.
The nested file structure allows for use of relative file paths.
It is good practice to include the system information for the R session you used to create the project. This can be done by writing your README file in R Markdown and including the command sessionInfo() in a knitr code chunk.
sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5 formatR_1.4 tools_3.3.1 htmltools_0.3.5
## [5] yaml_2.1.13 Rcpp_0.12.6 stringi_1.1.1 rmarkdown_1.0
## [9] knitr_1.14 stringr_1.1.0 digest_0.6.10 evaluate_0.9
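If the README is not written in R Markdown, one possible alternative (an assumption here, not something the text prescribes) is to save the same information to a plain text file. The sketch below uses a temporary file as a stand-in for a file kept in the project.

```r
# Capture the printed session information as a character vector.
session_lines <- capture.output(sessionInfo())

# Write it to a text file; the temporary file is a placeholder for
# something like a session-info file stored alongside the README.
session_file <- tempfile(fileext = ".txt")
writeLines(session_lines, session_file)

# Read the file back to confirm the information was saved.
saved_lines <- readLines(session_file)
print(head(saved_lines, 3))
```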
If using RStudio, you may want to organize files as a Project. The author gives steps to turn an existing directory into a project as follows:
1. Click File in the RStudio menu bar.
2. Select New Project and a new window will pop up.
3. Select Existing Directory.
4. Click the Browse button to find the directory that you want to turn into an RStudio Project.
5. Select Create Project.
Notice that RStudio has put a file with the extension .Rproj in the directory.
Commands for handling and navigating through files.
getwd
Find your current working directory
# Show the current working directory used when knitting this presentation.
getwd()
## [1] "H:/RRR-JC/RRRR_Chapter4_Latour"
list.files
See all of the files and sub-directories in the current working directory.
# See files in current working directory
list.files()
## [1] "images" "RRRR-Chapter-4-Outline.docx"
## [3] "RRRR-Chapter-4-Outline.html" "RRRR-Chapter-4-Outline.pdf"
## [5] "RRRR-Chapter-4-Outline.Rmd" "RRRR-Chapter4-Presentation.html"
## [7] "RRRR-Chapter4-Presentation.Rmd" "RRRR_Chapter4_8-24-16.html"
## [9] "Thumbs.db"
You can also list files in other directories by specifying the path.
# See files in other directories
directoryPath <- "S:/BSR_Project"
list.files(directoryPath)
## [1] "BreeMitchell" "Jansen_Lynn" "JeffTyner"
## [4] "Lastname" "Leachman_Sancy" "Loayza"
## [7] "Luai Zarour" "Nesmith_Meghan" "Nima_Nabavizadeh"
## [10] "Oncology_surveys" "Ryan_Christopher" "Thomas_George"
## [13] "Tsikitis"
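list.files also accepts arguments that narrow or widen the listing. The sketch below shows three of them (pattern, recursive, and full.names) on a small temporary directory tree; the directory and file names are made up for the example.

```r
# Create a small temporary directory tree to list (hypothetical names).
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(file.path(demo_dir, "Data"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, c("analysis.R", "notes.txt")))
file.create(file.path(demo_dir, "Data", "clean.R"))

# Only files whose names end in .R, top level only.
r_top <- list.files(demo_dir, pattern = "\\.R$")

# The same pattern, searching sub-directories too.
r_all <- list.files(demo_dir, pattern = "\\.R$", recursive = TRUE)

# full.names = TRUE prepends the directory path to each result.
r_full <- list.files(demo_dir, pattern = "\\.R$", full.names = TRUE)

print(r_top)  # "analysis.R"
print(r_all)
print(r_full)
```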
setwd
Set the current working directory.
setwd("S:/BSR_Project/lastName_firstName")
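Note: In a knitr document, setwd in a code chunk changes the working directory for that chunk only; knitr restores the original working directory before the next chunk. In a regular R session or script, the change stays in effect until the working directory is set again.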
root.dir
A knitr package option that resets the root (or working) directory for all code chunks.
knitr::opts_knit$set(root.dir = "S:/BSR_Project/lastName_firstName")
Note that root.dir is not a function, so calling root.dir() produces a "could not find function" error. It is set through knitr's opts_knit object, typically in the first code chunk of the document.
The text says that nested file structures are preferable to using this option.
dir.create
Creates a directory.
dir.create("S:/BSR_Project/lastName_firstName")
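By default dir.create makes only the last directory in the path, so it fails when a parent directory is missing. A minimal sketch of creating a nested path in one step follows, using a hypothetical location inside R's temporary directory.

```r
# A nested path inside the temporary directory (hypothetical names).
nested_dir <- file.path(tempdir(), "dir_create_demo", "Data", "Raw")

# recursive = TRUE creates any missing parent directories as well.
# Checking dir.exists() first avoids a warning if the code is re-run.
if (!dir.exists(nested_dir)) {
  dir.create(nested_dir, recursive = TRUE)
}

print(dir.exists(nested_dir))  # TRUE
```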
file.create
Creates a new blank file.
file.create("S:/BSR_Project/lastName_firstName/SourceCode.R")
cat
Creates a new file and puts text into it.
cat("Reproducible Research",
file = "S:/BSR_Project/lastName_firstName/SourceCode.R")
Warning: The cat command will overwrite existing files with new content. To add text to existing files use the append = TRUE argument.
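The difference between overwriting and appending can be seen in a short sketch. It writes to a temporary file (a placeholder path chosen for the example) so that no real file is touched.

```r
notes_file <- tempfile(fileext = ".txt")  # placeholder path for the sketch

# The first call creates the file.
cat("Reproducible Research\n", file = notes_file)

# Without append = TRUE the earlier content is replaced.
cat("Overwritten\n", file = notes_file)

# With append = TRUE the new text is added to the end.
cat("Appended line\n", file = notes_file, append = TRUE)

print(readLines(notes_file))  # "Overwritten" "Appended line"
```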
unlink
Deletes files and directories. The command permanently deletes files, so be very careful.
unlink("S:/BSR_Project/lastName_firstName/SourceCode.R")
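One detail worth knowing is that unlink removes a directory only when recursive = TRUE is supplied. The sketch below demonstrates this on a scratch directory inside R's temporary directory; all names are hypothetical.

```r
# Build a temporary directory containing one file (hypothetical names).
scratch_dir <- file.path(tempdir(), "unlink_demo")
dir.create(scratch_dir, showWarnings = FALSE)
scratch_file <- file.path(scratch_dir, "SourceCode.R")
file.create(scratch_file)

# Deleting a single file needs no extra arguments.
unlink(scratch_file)
file_gone <- !file.exists(scratch_file)

# Without recursive = TRUE a directory is left in place, even if empty.
unlink(scratch_dir)
dir_still_there <- dir.exists(scratch_dir)

# With recursive = TRUE the directory and its contents are deleted.
unlink(scratch_dir, recursive = TRUE)
dir_gone <- !dir.exists(scratch_dir)

print(c(file_gone, dir_still_there, dir_gone))  # TRUE TRUE TRUE
```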
file.rename
Renames a file.
file.rename(from = "S:/BSR_Project/lastName_firstName/SourceCode.R",
to = "S:/BSR_Project/lastName_firstName/DataManagement.R")
It can also be used to move a file from one directory to another. It will not create new directories.
file.rename(from = "S:/BSR_Project/lastName_firstName/SourceCode.R",
to = "S:/Share/OCC_DATA/lastName_firstName/SourceCode.R")
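Because file.rename will not create a missing destination directory, a move is usually paired with a check that the destination exists. The following is a minimal sketch of that pattern, using hypothetical directories inside R's temporary directory.

```r
# Set up a source file and a destination directory that does not exist yet.
src_dir  <- file.path(tempdir(), "rename_demo_from")
dest_dir <- file.path(tempdir(), "rename_demo_to")
unlink(c(src_dir, dest_dir), recursive = TRUE)  # start clean if re-run
dir.create(src_dir)
src_file  <- file.path(src_dir, "SourceCode.R")
dest_file <- file.path(dest_dir, "SourceCode.R")
file.create(src_file)

# Create the destination directory first, because file.rename() will not.
if (!dir.exists(dest_dir)) {
  dir.create(dest_dir, recursive = TRUE)
}

# file.rename() returns TRUE when the move succeeds.
moved <- file.rename(from = src_file, to = dest_file)

print(moved)
print(file.exists(src_file))   # FALSE after a successful move
print(file.exists(dest_file))  # TRUE after a successful move
```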
file.copy
Copies the file to another directory.
file.copy(from = "S:/BSR_Project/lastName_firstName/SourceCode.R",
to = "S:/Share/OCC_DATA/lastName_firstName/SourceCode.R")
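Unlike cat, file.copy does not replace an existing destination file unless asked to. The sketch below illustrates this with two temporary files, which stand in for real project files.

```r
original <- tempfile(fileext = ".R")  # placeholder source file
copy     <- tempfile(fileext = ".R")  # placeholder destination
cat("# version 1\n", file = original)

# The destination does not exist yet, so the copy succeeds.
first_copy <- file.copy(from = original, to = copy)

# Change the original, then try to copy again.
cat("# version 2\n", file = original)

# By default an existing destination is left alone and FALSE is returned.
second_copy <- file.copy(from = original, to = copy)

# overwrite = TRUE replaces the existing destination.
third_copy <- file.copy(from = original, to = copy, overwrite = TRUE)

print(c(first_copy, second_copy, third_copy))  # TRUE FALSE TRUE
print(readLines(copy))                         # "# version 2"
```

Checking the logical return value is a simple way to notice a copy that silently did nothing.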
Not covered in this talk. See pages 74–78 if interested in reading into this topic further.
In RStudio, the Files pane lets you navigate the file tree and do some basic file manipulations.
The Files pane is a GUI, so the actions are not as easily reproducible as the commands covered earlier.