This function shows how common each possible missingness pattern is. It emulates misschk in Stata.
It excludes variables that have no missings, so as not to clutter the output. Disable this with omit_complete = FALSE.
It sorts variables by number of missings, so that the usual suspects show up at the front.
It displays the number of missings accounted for by each pattern.
missingness_patterns(df, min_freq = ifelse(relative, 1/nrow(df), 1), long_pattern = FALSE, print_legend = ifelse(long_pattern, FALSE, TRUE), show_culprit = TRUE, relative = FALSE, omit_complete = TRUE)
Argument | Description |
---|---|
df | dataset |
min_freq | Show only patterns that occur at least this often. Defaults to 1 observation (1/nrow(df) if relative is TRUE). |
long_pattern | If FALSE (the default), patterns show only column indices, for space and legibility reasons. |
print_legend | Prints a legend for the column indices. Defaults to TRUE, or to FALSE if long_pattern is TRUE. |
show_culprit | Defaults to TRUE. If a missingness pattern boils down to one variable, that variable is shown in the Culprit column. |
relative | Defaults to FALSE. If TRUE, percentages are shown (relative to the total before excluding patterns below the minimum frequency). |
omit_complete | Defaults to TRUE. Columns that don't have any missings are excluded. |
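
The interplay of min_freq and relative can be illustrated with a small base R sketch. The helper filter_patterns() and its arguments are hypothetical and not part of the package. The sketch assumes that relative frequencies are expressed as proportions of nrow(df), which is what the default min_freq = 1/nrow(df) suggests. The counts are the ones from the ChickWeight example on this page.

```r
# Hypothetical helper (not the package's code): keep patterns that occur
# at least min_freq times, mirroring the default from the usage line.
filter_patterns <- function(freq, n, relative = FALSE,
                            min_freq = ifelse(relative, 1 / n, 1)) {
  # Assumption: with relative = TRUE, counts become proportions of all n rows
  shown <- if (relative) freq / n else freq
  shown[shown >= min_freq]
}

# Pattern counts taken from the ChickWeight example (578 rows)
freq <- c("_____" = 573, "1____" = 3, "__2_3" = 2)

filter_patterns(freq, n = 578)                   # default keeps every observed pattern
filter_patterns(freq, n = 578, min_freq = 3)     # drops the pattern seen only twice
filter_patterns(freq, n = 578, relative = TRUE)  # proportions, default threshold 1/578
```

Under either setting of relative, the default threshold corresponds to a single observation, so no observed pattern is hidden unless min_freq is raised.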
data(ChickWeight)
ChickWeight[1:2,c('weight','Chick')] = NA
ChickWeight[3:5,'Diet'] = NA
names(ChickWeight); nrow(ChickWeight)
#> [1] "weight" "Time" "Chick" "Diet"
#> [1] 578
missingness_patterns(ChickWeight)
#>  index    col missings
#>      1   Diet        3
#>      2 weight        2
#>      3  Chick        2
#>   Pattern Freq Culprit
#> 1   _____  573       _
#> 2   1____    3    Diet
#> 3   __2_3    2
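
For readers who want to see the idea behind such a table, here is a minimal base R sketch of tabulating missingness patterns. It is not the package's implementation: sketch_patterns() is a hypothetical helper, and joining the per-column marks with "_" is an assumption inferred from the pattern strings in the output above. It covers only the default behaviour (complete columns dropped, column indices as marks) and does not compute the Culprit column.

```r
# Hypothetical helper (not the package's code): count how often each
# combination of missing columns occurs.
sketch_patterns <- function(df) {
  miss <- is.na(df)                       # logical matrix, TRUE where a value is missing
  n_miss <- colSums(miss)                 # number of missings per column
  # Keep only columns with at least one missing, most missings first
  keep <- names(sort(n_miss[n_miss > 0], decreasing = TRUE))
  miss <- miss[, keep, drop = FALSE]
  # Mark observed cells with "_" and missing cells with the column index
  marks <- matrix("_", nrow = nrow(miss), ncol = ncol(miss))
  marks[miss] <- col(miss)[miss]
  # One pattern string per row (assumed separator: "_")
  pattern <- apply(marks, 1, paste, collapse = "_")
  freq <- sort(table(pattern), decreasing = TRUE)
  data.frame(Pattern = names(freq), Freq = as.vector(freq))
}

# Small made-up data set: b has two missings, a has one, c is complete
toy <- data.frame(a = c(1, NA, 3, 4), b = c(NA, NA, 3, 4), c = 1:4)
sketch_patterns(toy)
```

Sorting the columns before building the pattern strings is what puts the variable with the most missings at index 1, as in the legend printed by missingness_patterns.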