Problem 1: Students with an Infectious Disease

Suppose we are interested in the number of high school students diagnosed with an infectious disease as a function of the number of days from the initial outbreak. The data can be loaded as follows:

students = read.csv("data/students.csv")
head(students)
##   day cases
## 1   1     6
## 2   2     8
## 3   3    12
## 4   3     9
## 5   4     3
## 6   4     3

Can you find a suitable model for this dataset?
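
One possible starting point, since the response is a count, is Poisson regression with a log link. The sketch below is illustrative rather than a definitive answer: it uses only the six rows printed by head(students) above as stand-in data (the name students_head is hypothetical), and the quadratic term in day is an assumption whose usefulness should be checked on the full dataset.

```r
# Stand-in data: only the six rows printed by head(students) above.
# In practice, use the full data frame returned by read.csv().
students_head <- data.frame(
  day   = c(1, 2, 3, 3, 4, 4),
  cases = c(6, 8, 12, 9, 3, 3)
)

# Poisson regression with the default log link.
# Assumption: a quadratic term in day lets the expected count rise and then fall.
fit_pois <- glm(cases ~ day + I(day^2), family = poisson, data = students_head)
summary(fit_pois)

# Rough overdispersion check: a ratio well above 1 suggests the Poisson
# variance assumption is too restrictive (consider quasipoisson instead).
dispersion <- deviance(fit_pois) / df.residual(fit_pois)
dispersion
```

On the full data, the same call with data = students applies; comparing the fit with and without I(day^2) (for example with anova(..., test = "Chisq")) is one way to judge whether the extra term is warranted.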

Problem 2: Pima Indians Diabetes

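Consider a dataset on 768 women, each at least 21 years old and of Pima Indian heritage. The dataset includes the following variables:

pregnant: number of times pregnant
glucose: plasma glucose concentration (glucose tolerance test)
pressure: diastolic blood pressure (mm Hg)
triceps: triceps skin fold thickness (mm)
insulin: 2-hour serum insulin (mu U/ml)
mass: body mass index (weight in kg / (height in m)^2)
pedigree: diabetes pedigree function
age: age (years)
diabetes: class variable (test for diabetes), neg or pos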

The dataset is stored in the mlbench R package and you can load it as follows:

library(mlbench)
data(PimaIndiansDiabetes)
dim(PimaIndiansDiabetes)
## [1] 768   9
head(PimaIndiansDiabetes)
##   pregnant glucose pressure triceps insulin mass pedigree age diabetes
## 1        6     148       72      35       0 33.6    0.627  50      pos
## 2        1      85       66      29       0 26.6    0.351  31      neg
## 3        8     183       64       0       0 23.3    0.672  32      pos
## 4        1      89       66      23      94 28.1    0.167  21      neg
## 5        0     137       40      35     168 43.1    2.288  33      pos
## 6        5     116       74       0       0 25.6    0.201  30      neg

Can you find a suitable model to predict the probability that a woman of Pima Indian heritage is diagnosed with diabetes, based on these variables (or a subset of them)?
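
Because the response is binary (neg/pos), logistic regression is a natural first candidate. The following is a minimal sketch, not a definitive solution. It assumes that zeros in glucose, pressure, triceps, insulin and mass are missing-value codes rather than real measurements (the head() output above shows such zeros), and the choice of predictors is arbitrary and only for illustration.

```r
library(mlbench)
data(PimaIndiansDiabetes)

pima <- PimaIndiansDiabetes

# Assumption: a zero in these columns means "not recorded", so recode it as NA.
zero_as_na <- c("glucose", "pressure", "triceps", "insulin", "mass")
pima[zero_as_na] <- lapply(pima[zero_as_na], function(x) replace(x, x == 0, NA))

# Logistic regression. With a factor response, glm() treats the first level
# ("neg") as failure, so the model describes the probability of "pos".
# Rows with NA in any variable used by the formula are dropped by default.
fit_logit <- glm(diabetes ~ pregnant + glucose + mass + pedigree + age,
                 family = binomial, data = pima)
summary(fit_logit)

# Fitted probability of diabetes for each woman used in the fit.
p_hat <- predict(fit_logit, type = "response")
head(p_hat)
```

Variable selection (for example with step()) and an assessment on held-out data would be reasonable next steps; in-sample fit alone tends to overstate predictive performance.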

Problem 3: Breast Cancer

Let’s consider a breast cancer dataset from the mlbench R package. In this example, the objective is to predict whether a cancer is malignant or benign from biopsy details. This dataset includes 699 observations on 11 variables. You can load the data as follows:

library(mlbench)
data(BreastCancer)
dim(BreastCancer)
## [1] 699  11
head(BreastCancer)
##        Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size
## 1 1000025            5         1          1             1            2
## 2 1002945            5         4          4             5            7
## 3 1015425            3         1          1             1            2
## 4 1016277            6         8          8             1            3
## 5 1017023            4         1          1             3            2
## 6 1017122            8        10         10             8            7
##   Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses     Class
## 1           1           3               1       1    benign
## 2          10           3               2       1    benign
## 3           2           3               1       1    benign
## 4           4           3               7       1    benign
## 5           1           3               1       1    benign
## 6          10           9               7       1 malignant
table(BreastCancer$Class)
## 
##    benign malignant 
##       458       241

More information about this dataset can be found by typing ?BreastCancer in R. Can you find a suitable model to classify a tumor as benign or malignant based on these variables (or a subset of them)?
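
As with the diabetes data, the response here has two classes, so a logistic regression is one reasonable baseline. The sketch below makes several assumptions that should be revisited: it drops the Id column, removes rows with missing values, converts the predictors (stored as factors in mlbench) to numeric scores, and uses an arbitrary subset of three predictors with a 0.5 classification cutoff.

```r
library(mlbench)
data(BreastCancer)

# Drop the sample identifier and any rows with missing values.
bc <- na.omit(BreastCancer[, names(BreastCancer) != "Id"])

# Assumption: treat the biopsy scores as numeric, although they are stored as factors.
predictors <- setdiff(names(bc), "Class")
bc[predictors] <- lapply(bc[predictors], function(x) as.numeric(as.character(x)))

# Logistic regression for the probability of the second level, "malignant".
fit_bc <- glm(Class ~ Cl.thickness + Cell.size + Bare.nuclei,
              family = binomial, data = bc)
summary(fit_bc)

# Classify as malignant when the fitted probability exceeds 0.5,
# then cross-tabulate against the observed class (in-sample only).
p_malignant <- predict(fit_bc, type = "response")
pred_class <- ifelse(p_malignant > 0.5, "malignant", "benign")
table(predicted = pred_class, observed = bc$Class)
```

The resulting table is computed on the same data used to fit the model, so it gives an optimistic picture; a train/test split or cross-validation would give a fairer estimate of the misclassification rate.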