Suppose we are interested in the number of high school students diagnosed with an infectious disease as a function of the number of days since the initial outbreak. The data can be loaded as follows:
students = read.csv("data/students.csv")
head(students)
##   day cases
## 1   1     6
## 2   2     8
## 3   3    12
## 4   3     9
## 5   4     3
## 6   4     3
Can you find a suitable model for this dataset?
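The response is a count, so a Poisson regression of `cases` on `day` is one natural starting point. The sketch below is meant to be self-contained, so it fits the model only to the six rows printed by `head(students)`. `students_head` is a hypothetical name for that stand-in, and with the full file loaded you would pass `data = students` instead. Estimates from six rows are purely illustrative and say nothing definitive about the full dataset.

```r
# Stand-in data (hypothetical name): only the six rows printed by head(students).
# With the full file loaded, fit the same model with data = students instead.
students_head <- data.frame(
  day   = c(1, 2, 3, 3, 4, 4),
  cases = c(6, 8, 12, 9, 3, 3)
)

# Poisson regression with the default log link:
#   log(E[cases]) = beta0 + beta1 * day
fit <- glm(cases ~ day, family = poisson, data = students_head)
summary(fit)

# exp(beta1) is the multiplicative change in the expected count per extra day
exp(coef(fit)["day"])

# Rough overdispersion check: values well above 1 suggest that a
# quasi-Poisson or negative binomial model may fit better
deviance(fit) / df.residual(fit)

# Expected number of cases on day 5
predict(fit, newdata = data.frame(day = 5), type = "response")
```

A negative `day` coefficient corresponds to an outbreak that is dying out; if the overdispersion ratio on the full data is large, `family = quasipoisson` is a drop-in alternative worth comparing.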
Consider a dataset on 768 women of Pima Indian heritage, all at least 21 years old. This dataset includes the following variables:
pregnant: Number of times pregnant
glucose: Plasma glucose concentration in an oral glucose tolerance test
pressure: Diastolic blood pressure (mm Hg)
triceps: Triceps skin fold thickness (mm)
insulin: 2-hour serum insulin (μU/ml)
mass: Body mass index (weight in kg/(height in m)^2)
pedigree: Diabetes pedigree function
age: Age of the patients (years)
diabetes: Class variable (test for diabetes)
The dataset is stored in the mlbench R package and you can load it as follows:
library(mlbench)
data(PimaIndiansDiabetes)
dim(PimaIndiansDiabetes)
## [1] 768 9
head(PimaIndiansDiabetes)
##   pregnant glucose pressure triceps insulin mass pedigree age diabetes
## 1        6     148       72      35       0 33.6    0.627  50      pos
## 2        1      85       66      29       0 26.6    0.351  31      neg
## 3        8     183       64       0       0 23.3    0.672  32      pos
## 4        1      89       66      23      94 28.1    0.167  21      neg
## 5        0     137       40      35     168 43.1    2.288  33      pos
## 6        5     116       74       0       0 25.6    0.201  30      neg
Can you find a suitable model to predict the probability that a woman of Pima Indian heritage is diagnosed with diabetes, based on these variables (or a subset of them)?
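Because the response is binary (`neg`/`pos`), logistic regression is one standard starting point. The following is a sketch rather than a recommended final model: backward stepwise selection by AIC with `step()` is only one of several ways to choose a subset, and the 0.5 probability cut-off is an assumption. Note also that the printed rows contain zeros for `triceps` and `insulin`, which are implausible as measurements and likely stand for missing values. mlbench also ships `PimaIndiansDiabetes2`, in which such zeros are coded as `NA`, and it may be worth comparing results on that version.

```r
library(mlbench)
data(PimaIndiansDiabetes)

# Logistic regression on all predictors. The response is a factor with
# levels "neg" and "pos", so glm() models P(diabetes == "pos").
fit_full <- glm(diabetes ~ ., family = binomial, data = PimaIndiansDiabetes)
summary(fit_full)

# One way to choose a subset: backward stepwise selection by AIC
fit_step <- step(fit_full, direction = "backward", trace = 0)
formula(fit_step)

# Estimated probability of a positive test for each woman
p_hat <- predict(fit_step, type = "response")
head(p_hat)

# Training accuracy with a 0.5 cut-off (an assumption). This is optimistic,
# so judge the model on held-out data (e.g. a train/test split or cross-validation)
pred <- ifelse(p_hat > 0.5, "pos", "neg")
mean(pred == as.character(PimaIndiansDiabetes$diabetes))
```

`step()` only drops a term when doing so lowers the AIC, so the selected model never has a higher AIC than the full one; whether it predicts better on new women is a separate question that held-out data should answer.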
Let’s consider a breast cancer dataset from the mlbench R package. In this example, the objective is to predict whether a tumor is malignant or benign from biopsy measurements. This dataset includes 699 observations on 11 variables. You can load the data as follows:
data(BreastCancer)
dim(BreastCancer)
## [1] 699 11
head(BreastCancer)
##        Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size
## 1 1000025            5         1          1             1            2
## 2 1002945            5         4          4             5            7
## 3 1015425            3         1          1             1            2
## 4 1016277            6         8          8             1            3
## 5 1017023            4         1          1             3            2
## 6 1017122            8        10         10             8            7
##   Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses     Class
## 1           1           3               1       1    benign
## 2          10           3               2       1    benign
## 3           2           3               1       1    benign
## 4           4           3               7       1    benign
## 5           1           3               1       1    benign
## 6          10           9               7       1 malignant
table(BreastCancer$Class)
##
## benign malignant
##    458       241
More information about this dataset can be found by typing ?BreastCancer in R. Can you find a suitable model to classify each observation as benign or malignant, based on these variables (or a subset of them)?
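A logistic regression of `Class` on the nine cytology scores is one possible starting point. Two details of how mlbench stores this data matter when fitting it (worth confirming with `str(BreastCancer)`): the scores are stored as factors rather than numbers, and some rows contain missing values (in `Bare.nuclei`, for example). The sketch below drops the `Id` column, converts the scores to numeric, removes incomplete rows with `na.omit()`, and uses a 0.5 cut-off as an assumption. Treating the 1 to 10 scores as numeric is itself a modelling choice; keeping them as factors would add many more parameters.

```r
library(mlbench)
data(BreastCancer)

# Drop the sample ID, which carries no predictive information
bc <- BreastCancer[, -1]

# The nine scores are stored as factors with levels "1".."10"; convert via
# as.character() so that the numeric value is used, not the internal factor code
score_cols <- setdiff(names(bc), "Class")
bc[score_cols] <- lapply(bc[score_cols], function(x) as.numeric(as.character(x)))

# glm() cannot use rows with missing scores, so keep complete cases only
bc <- na.omit(bc)

# Class has levels "benign" and "malignant", so glm() models P(malignant)
fit <- glm(Class ~ ., family = binomial, data = bc)
summary(fit)

# Training-set confusion table and accuracy with a 0.5 cut-off (optimistic)
p_hat <- predict(fit, type = "response")
pred <- ifelse(p_hat > 0.5, "malignant", "benign")
table(predicted = pred, observed = bc$Class)
mean(pred == as.character(bc$Class))
```

Since the classes are unbalanced (458 benign vs 241 malignant), the confusion table is more informative than overall accuracy alone; the rate of missed malignant cases is usually the number to watch.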