Builds a contingency table of the n-gram counts versus their class labels.
table_ngrams(seq, ngrams, target)
seq | vector or matrix describing sequence(s). |
---|---|
ngrams | vector of n-grams. |
target | vector of class labels, one for each sequence. |
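The n-gram names passed as ngrams in this page's example (for instance "1_c.t_0") consist of three fields separated by underscores. The base R sketch below splits such names into those fields. Reading the fields as position, elements and distances is an assumption, not something this page states, and the column names in `parsed` are made up for illustration.

```r
# Example n-gram names, copied from the example on this page.
ngrams <- c("1_c.t_0", "2_g.g_0", "3_c.c_0")

# Split every name on "_" into its three fields.
parts <- strsplit(ngrams, "_", fixed = TRUE)

# Assumption: field 1 is the position, field 2 the elements (joined by "."),
# field 3 the distances between elements. The column names are illustrative.
parsed <- data.frame(
  ngram     = ngrams,
  position  = as.integer(vapply(parts, `[`, character(1), 1)),
  elements  = vapply(parts, `[`, character(1), 2),
  distances = vapply(parts, `[`, character(1), 3)
)

print(parsed)
```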
a data frame with one more column than the number of distinct values in target. The first column contains the names of the n-grams. Each remaining column holds the counts of the n-grams for one value of target.
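The shape of the returned data frame can be sketched in base R without the package. The sketch below is not the package's implementation. It assumes a hypothetical logical matrix `presence` (rows are sequences, columns are n-grams) with invented values, and it assumes that a count means the number of sequences containing the n-gram, which this page does not specify.

```r
# Hypothetical presence matrix: TRUE means the n-gram occurs in that sequence.
# The values are invented for illustration only.
presence <- matrix(
  c(TRUE,  FALSE,
    TRUE,  TRUE,
    FALSE, TRUE,
    FALSE, FALSE),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("1_c.t_0", "2_g.g_0"))
)
target <- c(1, 1, 0, 0)  # one class label per sequence (row)

# For each distinct label, count the sequences that contain each n-gram.
labels <- sort(unique(target))
counts <- sapply(labels, function(lab) {
  colSums(presence[target == lab, , drop = FALSE])
})
colnames(counts) <- paste0("target", labels)
rownames(counts) <- NULL

# First column: n-gram names; remaining columns: one per distinct label.
sketch_tab <- data.frame(ngram = colnames(presence), counts)
print(sketch_tab)
```

The result has `length(unique(target)) + 1` columns, which matches the layout of the printed table in the example on this page (ngram, target0, target1).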
seqs_pos <- matrix(sample(c("a", "c", "g", "t"), 100, replace = TRUE,
                          prob = c(0.2, 0.4, 0.35, 0.05)), ncol = 5)
seqs_neg <- matrix(sample(c("a", "c", "g", "t"), 100, replace = TRUE), ncol = 5)
tab <- table_ngrams(seq = rbind(seqs_pos, seqs_neg),
                    ngrams = c("1_c.t_0", "1_g.g_0", "2_t.c_0",
                               "2_g.g_0", "3_c.c_0", "3_g.c_0"),
                    target = c(rep(1, 20), rep(0, 20)))
# see the results
print(tab)
#>     ngram target0 target1
#> 1 1_c.t_0       1       3
#> 2 1_g.g_0       1       4
#> 3 2_t.c_0       0       2
#> 4 2_g.g_0       1       1
#> 5 3_c.c_0       1       6
#> 6 3_g.c_0       0       8
# easily plot the results using ggplot2