In this problem, we consider a blind study in which pigs are randomly assigned to a diet with or without chips, and we are interested in the effect of chips (encoded by the covariate group) on urine cortisol. As the causal relationships among hormones remain unknown, we only consider urine cortisol as the response variable in our analysis. We first import and clean the data as follows:

# Import data
cortisol_data = read.csv("data/cortisol.csv", row.names=1)

# Create data frame
dat = data.frame(group = cortisol_data$group,
                 gender = cortisol_data$gender,
                 urine_cortisol = cortisol_data$Urine.Cortisol..pg.mg.,
                 serum_cortisol = cortisol_data$Serum.Cortisol..ng.ml.,
                 ACTH = cortisol_data$Serum.ACTH..pg.ml.,
                 CRH = cortisol_data$Serum.CRH..pg.ml.,
                 testosterone = cortisol_data$Testosterone..ng.ml.,
                 LH = cortisol_data$LH..ng.ml.)

head(dat[,1:3])
##   group gender urine_cortisol
## 1     C   male      1509.7096
## 2    NC female       698.5621
## 3    NC female       187.7233
## 4     C   male       736.5923
## 5    NC female       542.4715
## 6     C female      1872.2579
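The unusual column names used above, such as `Urine.Cortisol..pg.mg.`, are what `read.csv()` produces by default: with `check.names = TRUE` it passes each header through `make.names()`, which replaces spaces, parentheses and slashes with dots. The raw CSV header is not shown in this document, so the header strings below are an assumption reconstructed from the mangled names.

```r
# Hypothetical original CSV headers (assumed; the raw file is not shown here)
raw_headers <- c("Urine Cortisol (pg/mg)", "Serum ACTH (pg/ml)", "LH (ng/ml)")

# read.csv() applies make.names() to the header when check.names = TRUE (the default)
syntactic <- make.names(raw_headers)
print(syntactic)
```

Passing `check.names = FALSE` to `read.csv()` would keep the original header text instead, at the cost of having to quote such names with backticks in R code.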

Simple analysis

We start our analysis by visualizing the data to get a sense of what it looks like.

gg_color_hue = function(n, alpha = 1) {
  hues = seq(15, 375, length = n + 1)
  hcl(h = hues, l = 65, c = 100, alpha = alpha)[1:n]
}

cols = gg_color_hue(2)
cols_trans = gg_color_hue(2, alpha = 0.15)
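The helper `gg_color_hue()` is a common recipe for mimicking ggplot2's default discrete palette: `n` hues evenly spaced around the HCL colour wheel at fixed luminance (65) and chroma (100). It asks for `n + 1` points on [15, 375] and drops the last one, because hue 375 wraps around to hue 15 and would repeat the first colour. A quick check of the hue spacing for two groups:

```r
n <- 2
hues <- seq(15, 375, length = n + 1)   # 15, 195, 375
hues_used <- hues[1:n]                 # drop 375, which coincides with 15 on the colour wheel
print(hues_used)

# With two groups the colours sit on opposite sides of the wheel (180 degrees apart)
print(diff(hues_used))
print(hcl(h = hues_used, l = 65, c = 100))
```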

boxplot(urine_cortisol ~ group, data = dat, col = cols, ylab = "Urine Cortisol")

As we can see, the Chips (C) and Non-chips (NC) groups show different means and different variances, with some outliers. Therefore, the data are not suitable for any of the methods we have discussed so far. Let's try a log transformation on the response variable and see if we can get something better.
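A log transformation is expected to help when the variability is multiplicative, i.e. when the standard deviation grows roughly in proportion to the mean. The sketch below illustrates this on simulated log-normal data; the log-scale means are loosely taken from the group means reported below, while the common `sdlog = 0.8` and the sample size are assumptions chosen only for illustration.

```r
set.seed(1)
n <- 5000
# Simulated log-normal responses with the same spread on the log scale (assumed values)
y_c  <- rlnorm(n, meanlog = 7.45, sdlog = 0.8)
y_nc <- rlnorm(n, meanlog = 6.21, sdlog = 0.8)

# On the original scale, the group with the larger mean also has the larger spread...
sd_ratio_raw <- sd(y_c) / sd(y_nc)
# ...whereas on the log scale both groups have nearly the same spread
sd_ratio_log <- sd(log(y_c)) / sd(log(y_nc))
round(c(raw = sd_ratio_raw, log = sd_ratio_log), 2)
```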

boxplot(log(urine_cortisol) ~ group, data = dat, col = cols, ylab = "log(Urine Cortisol)")

After the log transformation, the two groups still show different means, but their variances are relatively similar and the outliers are gone. The transformed data can then be analyzed with a Welch's t-test of the hypotheses \(H_0: \mu_{C} = \mu_{NC}\) and \(H_a: \mu_{C} > \mu_{NC}\), where \(\mu_C\) and \(\mu_{NC}\) here denote the group means of log urine cortisol. The R output is as follows:

# alternative = "greater" tests mean(C) - mean(NC) > 0 (C is the first level of group)
t.test(log(urine_cortisol) ~ group, data = dat, alternative = "greater")
## 
##  Welch Two Sample t-test
## 
## data:  log(urine_cortisol) by group
## t = 6.1625, df = 50.799, p-value = 5.782e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.9003832       Inf
## sample estimates:
##  mean in group C mean in group NC 
##         7.448626         6.212052
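The Welch statistic and its Welch-Satterthwaite degrees of freedom can be reproduced by hand, which makes explicit what `t.test()` does when `var.equal = FALSE` (its default). The sketch uses simulated log-scale data; the group sizes, means and standard deviations are arbitrary assumptions, not the study data.

```r
set.seed(2)
# Simulated log-scale data with unequal spreads (assumed values)
x <- rnorm(30, mean = 7.4, sd = 0.7)   # plays the role of group C
y <- rnorm(31, mean = 6.2, sd = 0.9)   # plays the role of group NC

# Squared standard errors of the two sample means
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_w <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# One-sided p-value for H_a: mean(x) > mean(y)
p_one <- pt(t_stat, df_w, lower.tail = FALSE)

ref <- t.test(x, y, alternative = "greater")
rbind(manual = c(t = t_stat, df = df_w, p = p_one),
      t.test = c(unname(ref$statistic), unname(ref$parameter), ref$p.value))
```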

Since the p-value is smaller than 5%, we can conclude at the 5% significance level that the chips diet significantly increases urine cortisol on the log scale. However, what can we say on the original scale of urine cortisol? Answering this requires a more elaborate analysis based on generalized linear models (GLMs).
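Back-transforming the log-scale results gives only a partial answer. Exponentiating a difference of means of logs yields a ratio of geometric means (and, if the data were log-normal, a ratio of medians), not a ratio of arithmetic means, which is one reason to model the original scale directly. A minimal sketch using the numbers copied from the Welch test output:

```r
# Values copied from the Welch t-test output above
mean_log_c  <- 7.448626
mean_log_nc <- 6.212052
lower_log   <- 0.9003832   # one-sided 95% lower bound for the log-scale difference

ratio_geo   <- exp(mean_log_c - mean_log_nc)   # ratio of geometric means, C / NC
lower_ratio <- exp(lower_log)                  # corresponding lower confidence bound
round(c(ratio = ratio_geo, lower = lower_ratio), 2)
# ratio ~ 3.44, lower ~ 2.46
```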

More complex analysis with GLM

Consider the following model where we assume that the level of urine cortisol (\(Y_i\)) follows a gamma distribution

\[Y_i \sim \text{Gamma}(\mu_i, \sigma),\]

where \(\mu_i\), the conditional mean of \(Y_i\), is modelled by

\[\mu_i = \exp(\mathbf{x}_i^T \boldsymbol{\beta}),\] and \(\sigma\) is related to the conditional variance of \(Y_i\) through \(\text{var}(Y_i) = \mu_i^2 \sigma^2\). More details on this parametrization of the gamma distribution can be found in the documentation of the GA function in the gamlss R package.
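The \((\mu, \sigma)\) parametrization maps onto base R's shape/scale parametrization as shape \(= 1/\sigma^2\) and scale \(= \mu\sigma^2\), since then the mean is shape \(\times\) scale \(= \mu\) and the variance is shape \(\times\) scale\(^2 = \mu^2\sigma^2\). Our reading of the GA documentation is that `dGA(y, mu, sigma)` matches `dgamma(y, shape = 1/sigma^2, scale = mu*sigma^2)`; the sketch below only uses base R and checks the moments, with illustrative (assumed) values of \(\mu\) and \(\sigma\).

```r
mu    <- 2000   # illustrative mean (assumed)
sigma <- 0.7    # illustrative sigma (assumed); it equals the coefficient of variation

shape <- 1 / sigma^2        # sigma alone determines the shape
scale <- mu * sigma^2       # chosen so that shape * scale = mu

# Closed-form moments of a Gamma(shape, scale) distribution
c(mean = shape * scale, var = shape * scale^2, target_var = mu^2 * sigma^2)

# Monte Carlo check of the same moments
set.seed(3)
y <- rgamma(1e5, shape = shape, scale = scale)
c(sample_mean = mean(y), sample_var = var(y))
```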

library(gamlss)

Now let’s try to fit the considered model with the gamlss function.

fit_ga_urine_cortisol = gamlss(urine_cortisol ~ group, data = dat, 
                               sigma.formula =~ 1, family = GA)
## GAMLSS-RS iteration 1: Global Deviance = 956.7006 
## GAMLSS-RS iteration 2: Global Deviance = 956.7006
summary(fit_ga_urine_cortisol)
## ******************************************************************
## Family:  c("GA", "Gamma") 
## 
## Call:  gamlss(formula = urine_cortisol ~ group, sigma.formula = ~1,  
##     family = GA, data = dat) 
## 
## Fitting method: RS() 
## 
## ------------------------------------------------------------------
## Mu link function:  log
## Mu Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.7315     0.1326  58.312  < 2e-16 ***
## groupNC      -1.3016     0.1750  -7.436 5.43e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------------------------------
## Sigma link function:  log
## Sigma Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.39145    0.08454   -4.63 2.11e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------------------------------
## No. of observations in the fit:  61 
## Degrees of Freedom for the fit:  3
##       Residual Deg. of Freedom:  58 
##                       at cycle:  2 
##  
## Global Deviance:     956.7006 
##             AIC:     962.7006 
##             SBC:     969.0333 
## ******************************************************************
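With a log link, the score equations for the mean coefficients are \(\sum_i \mathbf{x}_i (Y_i - \mu_i)/\mu_i = 0\) and do not involve a constant \(\sigma\). With a single two-level factor, this forces each fitted group mean to equal the group's sample average, so base R's `glm()` with `Gamma(link = "log")` should reproduce the same mu coefficients as the gamlss fit (its standard errors differ, because `glm()` estimates the dispersion differently). The sketch checks the group-average property on simulated data; the group sizes, means and \(\sigma\) are assumptions. On the real data, one could compare `exp` of the fitted coefficients with `tapply(dat$urine_cortisol, dat$group, mean)`.

```r
set.seed(4)
# Simulated gamma responses for two groups (assumed values, not the study data)
g     <- factor(rep(c("C", "NC"), times = c(30, 31)))
mu    <- ifelse(g == "C", 2300, 620)
sigma <- 0.68
y     <- rgamma(length(g), shape = 1 / sigma^2, scale = mu * sigma^2)

# Gamma GLM with log link (same mean model as a GA fit with constant sigma)
fit <- glm(y ~ g, family = Gamma(link = "log"))

# exp(intercept) is the C mean; exp(intercept + gNC) is the NC mean
fitted_means <- exp(c(coef(fit)[1], sum(coef(fit))))
sample_means <- tapply(y, g, mean)
rbind(fitted = unname(fitted_means), sample = unname(sample_means))
```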

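Before we draw any conclusion from the fitted model, we should first check whether the model is suitable for the data. A natural starting point is the standardized residual

\[r_i = \frac{Y_i - \widehat{\mu}_i}{\widehat{\mu}_i \widehat{\sigma}},\] where \(\widehat{\mu}_i\) is the predicted (by the estimated model) mean of observation \(i\) and \(\widehat{\mu}_i \widehat{\sigma}\) its predicted standard deviation. Because the gamma distribution is skewed, the plot() method of gamlss instead reports normalized quantile residuals \(r_i = \Phi^{-1}\{F(Y_i; \widehat{\mu}_i, \widehat{\sigma})\}\), where \(F\) is the fitted gamma CDF and \(\Phi\) the standard normal CDF; if the model is adequate, these should behave like a sample from \(N(0, 1)\). We can perform this model check as follows: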

plot(fit_ga_urine_cortisol)

## ******************************************************************
##        Summary of the Quantile Residuals
##                            mean   =  -0.0005430463 
##                        variance   =  1.016182 
##                coef. of skewness  =  0.02265806 
##                coef. of kurtosis  =  2.640994 
## Filliben correlation coefficient  =  0.9919967 
## ******************************************************************
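The summary above compares the quantile residuals with a standard normal sample (mean 0, variance 1, skewness 0, kurtosis 3). The sketch below, on simulated gamma data with assumed "true" parameters plugged in, shows why quantile residuals are preferred here: they are close to \(N(0,1)\), while Pearson-type standardized residuals keep the right skew of the gamma distribution.

```r
set.seed(5)
mu <- 1500; sigma <- 0.68          # assumed parameters, used as if they were the fitted ones
shape <- 1 / sigma^2
y <- rgamma(2000, shape = shape, scale = mu * sigma^2)

# Normalized quantile residuals: fitted CDF first, then the standard normal quantile
r_q <- qnorm(pgamma(y, shape = shape, scale = mu * sigma^2))
# Pearson-type standardized residuals: (y - mu) / sd, with sd = mu * sigma
r_p <- (y - mu) / (mu * sigma)

moments <- function(r) c(mean = mean(r), var = var(r),
                         skew = mean((r - mean(r))^3) / sd(r)^3)
res_q <- moments(r_q)
res_p <- moments(r_p)
round(rbind(quantile = res_q, pearson = res_p), 2)
```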

We can also plot the fitted densities of the two groups and compare them with histograms of the observed data:

# Estimated coefficients (on the log-link scale)
mu_coef = fit_ga_urine_cortisol$mu.coefficients
sigma_coef = fit_ga_urine_cortisol$sigma.coefficients

# Linear predictors log(mu) for the two groups
mu_chips = mu_coef["(Intercept)"]
mu_non_chips = mu_coef["(Intercept)"] + mu_coef["groupNC"]

# Fitted gamma densities on a grid; exp() inverts the log links
xx = seq(from = 0.0001, to = 10^4, length.out = 10^4)
yy_chips = dGA(xx, mu = exp(mu_chips), sigma = exp(sigma_coef))
yy_non_chips = dGA(xx, mu = exp(mu_non_chips), sigma = exp(sigma_coef))

plot(NA, xlim = c(0, 10^4), ylim = c(0, max(na.omit(yy_non_chips))), xlab = "Urine Cortisol", ylab = "PDF")

polygon(c(xx, rev(xx)), c(rep(0, length(xx)), rev(yy_chips)), border = NA, col = cols_trans[1])
lines(xx, yy_chips, col = cols[1], lwd = 1)

polygon(c(xx, rev(xx)), c(rep(0, length(xx)), rev(yy_non_chips)), border = NA, col = cols_trans[2])
lines(xx, yy_non_chips, col = cols[2], lwd = 1)

legend("topright", c("Chips", "Non-chips"), col = cols, lwd = 1)

plot(NA, xlim = c(0, 10^4), ylim = c(0, max(na.omit(yy_non_chips))), xlab = "Urine Cortisol", ylab = "PDF")

hist(dat$urine_cortisol[dat$group == "NC"], probability = TRUE, add = TRUE, col = "grey80", breaks = 3, border = cols[2])

polygon(c(xx, rev(xx)), c(rep(0, length(xx)), rev(yy_chips)), border = NA, col = cols_trans[1])
lines(xx, yy_chips, col = cols[1], lwd = 2)

polygon(c(xx, rev(xx)), c(rep(0, length(xx)), rev(yy_non_chips)), border = NA, col = cols_trans[2])
lines(xx, yy_non_chips, col = cols[2], lwd = 2)

legend("topright", c("Chips", "Non-chips"), col = cols, lwd = 1)

plot(NA, xlim = c(0, 10^4), ylim = c(0, max(na.omit(yy_non_chips))), xlab = "Urine Cortisol", ylab = "PDF")

hist(dat$urine_cortisol[dat$group == "C"], probability = TRUE, add = TRUE, col = "grey80", breaks = 10, border = cols[1])

polygon(c(xx, rev(xx)), c(rep(0, length(xx)), rev(yy_chips)), border = NA, col = cols_trans[1])
lines(xx, yy_chips, col = cols[1], lwd = 2)

polygon(c(xx, rev(xx)), c(rep(0, length(xx)), rev(yy_non_chips)), border = NA, col = cols_trans[2])
lines(xx, yy_non_chips, col = cols[2], lwd = 2)

legend("topright", c("Chips", "Non-chips"), col = cols, lwd = 1)
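Because \(\sigma\) is the same in both groups, the two fitted gamma densities share the same shape and differ only by scale. Every quantile of the Non-chips distribution is therefore \(\exp(\beta_{NC})\) times the corresponding Chips quantile: the model describes a multiplicative shift of the whole distribution. A sketch using the estimates copied from the summary output (the chosen probabilities are arbitrary):

```r
# Estimates copied from summary(fit_ga_urine_cortisol)
b0 <- 7.7315; b_nc <- -1.3016; log_sigma <- -0.39145
sigma <- exp(log_sigma)
shape <- 1 / sigma^2                      # shared by both groups

mu_c  <- exp(b0)
mu_nc <- exp(b0 + b_nc)

p    <- c(0.1, 0.5, 0.9)
q_c  <- qgamma(p, shape = shape, scale = mu_c  * sigma^2)
q_nc <- qgamma(p, shape = shape, scale = mu_nc * sigma^2)
round(rbind(chips = q_c, non_chips = q_nc, ratio = q_nc / q_c), 3)
```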

The diagnostic plots and the comparison of the fitted densities with the histograms suggest that the gamma model describes the data well, so it is reasonable to draw conclusions from this fitted gamma GLM. The hypotheses \(H_0: \mu_{C} = \mu_{NC}\) and \(H_a: \mu_{C} > \mu_{NC}\) correspond to \(H_0: \beta_{NC} = 0\) and \(H_a: \beta_{NC} < 0\), where \(\beta_{NC}\) is the coefficient of groupNC. The two-sided p-value for groupNC is \(5.43 \times 10^{-10}\), and the estimated coefficient, \(-1.3016\), lies in the direction of \(H_a\). The one-sided p-value is therefore \(5.43 \times 10^{-10}/2 \approx 2.72\times 10^{-10}\). So we can conclude, at the 5% significance level, that the chips diet significantly increases the level of urine cortisol on its original scale.
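The coefficient also gives an effect size on the original scale: the ratio of mean urine cortisol, Chips over Non-chips, is \(\exp(1.3016) \approx 3.68\). The sketch computes this ratio with an approximate 95% Wald interval (normal quantile, our choice) and the one-sided p-value. It assumes, based on our reading of the printed "t value" and "Residual Deg. of Freedom", that gamlss uses a t reference distribution with the residual degrees of freedom; because the inputs are the rounded printed values, the results may differ slightly from the summary.

```r
est <- -1.3016; se <- 0.1750; df_res <- 58   # copied from summary(fit_ga_urine_cortisol)

# Ratio of means, Chips / Non-chips, with an approximate 95% Wald interval
ratio    <- exp(-est)
ratio_ci <- exp(-(est + c(1, -1) * qnorm(0.975) * se))

# Two-sided and one-sided (H_a: beta_NC < 0) p-values from a t distribution
t_val <- est / se
p_two <- 2 * pt(-abs(t_val), df_res)
p_one <- pt(t_val, df_res)

round(c(ratio = ratio, lower = ratio_ci[1], upper = ratio_ci[2]), 2)
# ratio ~ 3.68, interval ~ (2.61, 5.18)
signif(c(two_sided = p_two, one_sided = p_one), 3)
```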

Supplementary analysis: Gender

As the pigs were randomly assigned to the diets in a blind study, we do not expect gender to play a significant role. To verify this expectation, let's fit the model again, adding the covariate gender and its interaction with group.

fit_ga_urine_cortisol2 = gamlss(urine_cortisol ~ group*gender, data = dat, sigma.formula =~ 1, family = GA)
## GAMLSS-RS iteration 1: Global Deviance = 956.3047 
## GAMLSS-RS iteration 2: Global Deviance = 956.3047
summary(fit_ga_urine_cortisol2)
## ******************************************************************
## Family:  c("GA", "Gamma") 
## 
## Call:  gamlss(formula = urine_cortisol ~ group * gender, sigma.formula = ~1,  
##     family = GA, data = dat) 
## 
## Fitting method: RS() 
## 
## ------------------------------------------------------------------
## Mu link function:  log
## Mu Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         7.77418    0.18018  43.147  < 2e-16 ***
## groupNC            -1.41874    0.25481  -5.568 7.59e-07 ***
## gendermale         -0.09483    0.26522  -0.358    0.722    
## groupNC:gendermale  0.21614    0.35277   0.613    0.543    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------------------------------
## Sigma link function:  log
## Sigma Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.39428    0.08457  -4.662 1.98e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------------------------------
## No. of observations in the fit:  61 
## Degrees of Freedom for the fit:  5
##       Residual Deg. of Freedom:  56 
##                       at cycle:  2 
##  
## Global Deviance:     956.3047 
##             AIC:     966.3047 
##             SBC:     976.859 
## ******************************************************************
# Residual diagnostics for the model that includes gender
plot(fit_ga_urine_cortisol2)

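The diagnostic plots indicate that this model also fits the data well. The group effect remains highly significant, while neither gender-related coefficient is significant (gendermale: p = 0.722; groupNC:gendermale: p = 0.543), and the AIC increases from 962.7 to 966.3 when gender is added. This verifies our expectation and concludes the analysis.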
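The two gender terms can also be tested jointly with a likelihood ratio test, since the model without gender is nested in the model with gender. The sketch redoes this arithmetic in base R from the global deviances and degrees of freedom printed in the two summaries, and also reproduces the reported AIC (deviance + 2 df) and SBC (deviance + log(n) df).

```r
# Values copied from the two gamlss summaries above
dev1 <- 956.7006; df1 <- 3   # urine_cortisol ~ group
dev2 <- 956.3047; df2 <- 5   # urine_cortisol ~ group * gender
n <- 61

# Likelihood ratio test for the two extra gender terms
lr_stat <- dev1 - dev2
p_lr    <- pchisq(lr_stat, df = df2 - df1, lower.tail = FALSE)

# Information criteria as reported by gamlss
aic <- c(dev1, dev2) + 2 * c(df1, df2)
sbc <- c(dev1, dev2) + log(n) * c(df1, df2)

round(c(LR = lr_stat, p = p_lr), 3)
# LR ~ 0.396, p ~ 0.82
rbind(AIC = aic, SBC = sbc)
```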