Builds a contingency table of the n-gram counts versus their class labels.

table_ngrams(seq, ngrams, target)

Arguments

seq

vector or matrix describing sequence(s).

ngrams

vector of n-grams.

target

integer vector with target information (e.g. class labels). Must contain at least two distinct values.

Value

a data frame whose number of columns equals the number of distinct values in the target plus 1. The first column contains the names of the n-grams. Each further column holds the counts of the n-grams for the respective value of the target.
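
The column layout can be sketched in base R without calling table_ngrams itself. This is only an illustration: it assumes the count columns are named by pasting "target" onto each distinct target value, which is what the output in the Examples section shows (target0, target1). The names target_values and expected_columns are hypothetical helpers.

```r
# Target vector taken from the Examples section: 20 positives, 20 negatives.
target <- c(rep(1, 20), rep(0, 20))

# Distinct class labels, sorted (assumption: columns follow this order).
target_values <- sort(unique(target))

# One column for the n-gram names plus one count column per distinct label.
expected_columns <- c("ngram", paste0("target", target_values))
print(expected_columns)
```

With two distinct labels this yields three column names, even though the target vector itself has 40 elements.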

Examples

seqs_pos <- matrix(sample(c("a", "c", "g", "t"), 100, replace = TRUE,
                          prob = c(0.2, 0.4, 0.35, 0.05)),
                   ncol = 5)
seqs_neg <- matrix(sample(c("a", "c", "g", "t"), 100, replace = TRUE),
                   ncol = 5)
tab <- table_ngrams(seq = rbind(seqs_pos, seqs_neg),
                    ngrams = c("1_c.t_0", "1_g.g_0", "2_t.c_0",
                               "2_g.g_0", "3_c.c_0", "3_g.c_0"),
                    target = c(rep(1, 20), rep(0, 20)))
# see the results
print(tab)
#>     ngram target0 target1
#> 1 1_c.t_0       1       3
#> 2 1_g.g_0       1       4
#> 3 2_t.c_0       0       2
#> 4 2_g.g_0       1       1
#> 5 3_c.c_0       1       6
#> 6 3_g.c_0       0       8
# easily plot the results using ggplot2
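
One possible way to draw such a plot is sketched below. It is not necessarily the plotting code the authors had in mind. So that it runs without table_ngrams, the sketch rebuilds tab by hand from the output printed above (the sequences are sampled at random, so a real call will give different counts). The names count_cols and long_tab are hypothetical, and the plot is only drawn if ggplot2 is installed.

```r
# Rebuild the printed table by hand (values copied from the output above).
tab <- data.frame(
  ngram = c("1_c.t_0", "1_g.g_0", "2_t.c_0", "2_g.g_0", "3_c.c_0", "3_g.c_0"),
  target0 = c(1, 1, 0, 1, 1, 0),
  target1 = c(3, 4, 2, 1, 6, 8)
)

# Stack the count columns into long format: one row per (n-gram, target) pair.
count_cols <- setdiff(names(tab), "ngram")
long_tab <- data.frame(
  ngram = rep(tab$ngram, times = length(count_cols)),
  target = rep(count_cols, each = nrow(tab)),
  count = unlist(tab[count_cols], use.names = FALSE)
)

# Dodged bar chart: one bar per n-gram and target value.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long_tab, aes(x = ngram, y = count, fill = target)) +
    geom_col(position = "dodge")
  print(p)
}
```

The long format is used because ggplot2 maps one column to each aesthetic, so the target value has to be a column of its own rather than part of the column names.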