RRRR 8. Statistical Modeling and Knitr

Jeong Lim

August 31, 2016

Introduction

Focus on how to make your analysis really reproducible

Code Chunks

```{r}

```

Code chunk with options

```{r NAME, OPTIONS HERE}

```

Global options

```{r,  include=FALSE }
knitr::opts_chunk$set(OPTIONS HERE)
```
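
As a minimal sketch, the body of such a setup chunk might set a few defaults for every later chunk. The particular options chosen here are only an illustration, not a recommendation from the slides:

```r
# Defaults applied to every chunk that follows;
# an individual chunk can still override them in its own header.
knitr::opts_chunk$set(
  echo    = FALSE,  # hide source code
  warning = FALSE,  # hide warnings
  message = FALSE   # hide messages
)

# Inspect the current default for one option
knitr::opts_chunk$get("echo")
```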

Chunk options

echo=FALSE        Don't show the code
results="hide"    Don't show the text output
include=FALSE     Run the code, but show neither code nor output
eval=FALSE        Don't evaluate the code at all

collapse=TRUE     Collapse all the source and output blocks into a single block
warning=FALSE     Don't show R warnings
message=FALSE     Don't show R messages
error=FALSE       Stop knitting when the code throws an error, instead of printing the error in the output
cache=TRUE        Cache the results of the code chunk
tidy=TRUE         Reformat the displayed code in a tidy way (uses the formatR package)
comment=NA        Remove the ## prefix from output lines

Chunk options example

Code chunk without options

```{r withoutoption}
cars[1:4,]
sum(cars$speed)
sum(cars$dist)
mean(cars)
plot(cars$speed, cars$dist)
``` 

The rendered result looks like this:

cars[1:4,]
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
sum(cars$speed)
## [1] 770
sum(cars$dist)
## [1] 2149
mean(cars)
## Warning in mean.default(cars): argument is not numeric or logical:
## returning NA
## [1] NA
plot(cars$speed, cars$dist)

Code chunk with options

```{r withoption, echo=2:3,  warning=FALSE, collapse=TRUE, comment=NA, fig.align='center', fig.width=4} 
cars[1:4,]
sum(cars$speed)
sum(cars$dist)
mean(cars)
plot(cars$speed, cars$dist)
```

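The rendered result looks like this (echo=2:3 shows only the second and third expressions; the warning and the ## prefix are gone):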

  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
sum(cars$speed)
[1] 770
sum(cars$dist)
[1] 2149
[1] NA

Chunk options example: tidy

library(formatR)
source("C:/Users/limje/Desktop/Reproducible/Jeong/UglyScript.R")
tidy_source("UglyScript.R", file="BeautifulScript.R", arrow=getOption("formatR.arrow", TRUE))
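
tidy_source() can also be tried on a string instead of a file, which makes its effect easy to see. A small sketch, assuming formatR is installed; the messy input lines are made up for illustration:

```r
library(formatR)

# Hypothetical untidy code, passed as text rather than read from a file
ugly <- c("x=c(1,2,3)", "y=mean(x)+1")

# arrow = TRUE replaces `=` assignments with `<-`;
# output = FALSE suppresses the automatic printing
res <- tidy_source(text = ugly, arrow = TRUE, output = FALSE)

# The reformatted code is returned in the text.tidy component
cat(res$text.tidy, sep = "\n")
```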

Showing code & result inline

Example of inline code that is displayed but not run: MeanRiver <- mean(rivers)

Example of an inline result: The mean length of 141 major rivers in North America is 591 miles
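
In the R Markdown source, a result is placed in a sentence with an inline expression such as `r round(mean(rivers), digits = 0)`, written between single backticks. The numbers in the sentence above can be reproduced from the built-in rivers data (lengths in miles); the variable names below are only illustrative:

```r
# Number of rivers in the built-in data set
NRiver <- length(rivers)

# Mean length, rounded to whole miles
MeanRiver <- round(mean(rivers), digits = 0)

# The sentence an inline expression would produce
cat("The mean length of", NRiver,
    "major rivers in North America is", MeanRiver, "miles\n")
```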

Dynamically including Modular Analysis files

Source from a local file

source("C:/Users/limje/Desktop/Reproducible/Jeong/MainAnalysis.R")
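
To see what source() does without depending on a particular machine's folders, here is a small sketch that writes a stand-in analysis file to a temporary location and then runs it. The file contents and the object name RiverSummary are hypothetical:

```r
# Stand-in for a separate analysis file (hypothetical contents)
script <- tempfile(fileext = ".R")
writeLines("RiverSummary <- summary(rivers)", script)

# source() runs the file; objects it creates are available afterwards
source(script)
RiverSummary
```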

Source from a secure URL

library(devtools)
source_url("http://bit.ly/1D5p1w6")
## SHA-1 hash of file is ff75a88b90decfcaefc9903bbc283e1fc4cd2339

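The SHA-1 hash is a fingerprint of the file's contents: if the file changes, its SHA-1 hash changes too. source_url() accepts the expected hash through its sha1 argument and stops if the downloaded file does not match it.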
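
The fingerprint idea can be checked locally. Base R has no SHA-1 function, so this sketch swaps in tools::md5sum(), which computes an MD5 checksum instead; the principle is the same. The file and its contents are made up for the demonstration:

```r
# Write a small script to a temporary file and fingerprint it
f <- tempfile(fileext = ".R")
writeLines("x <- 1", f)
hash_before <- unname(tools::md5sum(f))

# Change the file's contents and fingerprint it again
writeLines("x <- 2", f)
hash_after <- unname(tools::md5sum(f))

# TRUE only if the two checksums match
hash_before == hash_after
```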

Computationally intensive analysis: cache=TRUE

Create an object Sample and save it to a file called Sample.RData

   ```{r gen-data, cache=TRUE} 
   # create data
   Sample<-rnorm(n=1000, mean=5, sd=2)
   # save sample
   save(Sample, file="Sample.RData")
   ```

A later code chunk that creates the histogram

   ```{r histogram, cache=TRUE, dependson='gen-data'}
   # load Sample
   load(file="Sample.RData")
   # create histogram
   hist(Sample)
   ```

Running SAS code using the engine chunk option

  ```{r, engine="sas", engine.path="C:/Program Files/SASHome/SASFoundation/9.4/sas.exe"} 
  proc means data=sashelp.class;
  run;
  ```

SAS code and output

proc means data=sashelp.class;
run;
          Variable     N            Mean         Std Dev         Minimum         Maximum
          ------------------------------------------------------------------------------
          Age         19      13.3157895       1.4926722      11.0000000      16.0000000
          Height      19      62.3368421       5.1270752      51.3000000      72.0000000
          Weight      19     100.0263158      22.7739335      50.5000000     150.0000000
          ------------------------------------------------------------------------------

SAS HTML output

Variable    N           Mean        Std Dev        Minimum        Maximum
Age        19     13.3157895      1.4926722     11.0000000     16.0000000
Height     19     62.3368421      5.1270752     51.3000000     72.0000000
Weight     19    100.0263158     22.7739335     50.5000000    150.0000000

Reproducibly Random: ‘set.seed’

set.seed(123)
Draw1<-rnorm(1000, mean=0, sd=2)
summary(Draw1)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -5.62000 -1.25700  0.01842  0.03226  1.32900  6.48200
hist(Draw1)

set.seed(125)
Draw2<-rnorm(1000, mean=0, sd=2)
summary(Draw2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -7.2110 -1.4070 -0.1040 -0.1215  1.3160  5.6770
hist(Draw2)
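
A quick check of what the seed buys: the same seed reproduces the same draws exactly, while a different seed gives a different sample. The object names are illustrative:

```r
set.seed(123)
DrawA <- rnorm(1000, mean = 0, sd = 2)

# Resetting to the same seed restarts the random number stream
set.seed(123)
DrawB <- rnorm(1000, mean = 0, sd = 2)

# A different seed starts a different stream
set.seed(125)
DrawC <- rnorm(1000, mean = 0, sd = 2)

identical(DrawA, DrawB)  # same seed
identical(DrawA, DrawC)  # different seed
```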