Builds and selects important n-grams stepwise.
construct_ngrams( target, seq, u, n_max, conf_level = 0.95, gap = TRUE, use_heuristics = TRUE )
target | |
---|---|
seq | a vector or matrix describing sequence(s). |
u | |
n_max | size of constructed n-grams. |
conf_level | confidence level. |
gap | |
use_heuristics | if |
a vector of n-grams.
construct_ngrams starts by extracting unigrams from the sequences, pasting them
together in all combinations, and choosing the significant features among them
(those with p-value below conf_level). The chosen n-grams are then extended to
the size specified by n_max by pasting unigrams at both ends.
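The extension step described above can be sketched in base R. This is only an illustration under stated assumptions: extend_ngrams is a hypothetical helper, not a function of the package, n-grams are represented as plain concatenated strings rather than in the package's own encoding, and the significance test is replaced by a hand-picked set of "chosen" bigrams.

```r
# Hypothetical helper (not part of the package): extend every n-gram by one
# unigram, pasted either at the left end or at the right end.
extend_ngrams <- function(ngrams, unigrams) {
  left  <- as.vector(outer(unigrams, ngrams, paste0))  # unigram + n-gram
  right <- as.vector(outer(ngrams, unigrams, paste0))  # n-gram + unigram
  unique(c(left, right))                               # drop duplicated candidates
}

unigrams <- c("1", "2", "3")

# Assumption: pretend the significance test kept only these two bigrams.
chosen_bigrams <- c("12", "31")

# Candidate trigrams that would be tested in the next step.
candidates <- extend_ngrams(chosen_bigrams, unigrams)
print(candidates)
```

Because only the chosen n-grams are extended, the number of candidates grows with the number of significant n-grams instead of with the number of all possible n-grams of the next size.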
The gap parameter determines whether construct_ngrams performs the feature
selection on exact n-grams (gap equal to FALSE) or on all features within
Hamming distance 1 of the n-gram (gap equal to TRUE).
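To make "Hamming distance 1" concrete, the following base R sketch lists every string that differs from a given n-gram in exactly one position. The helpers hamming and neighbours are hypothetical names introduced for this illustration, and n-grams are again plain strings; how the package itself enumerates such features is not shown here.

```r
# Hypothetical helper: number of positions at which two equal-length
# strings differ.
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical helper: all strings over `alphabet` that differ from `ngram`
# in exactly one position.
neighbours <- function(ngram, alphabet) {
  chars <- strsplit(ngram, "")[[1]]
  out <- character(0)
  for (pos in seq_along(chars)) {
    # try every symbol other than the one already at this position
    for (sym in setdiff(alphabet, chars[pos])) {
      variant <- chars
      variant[pos] <- sym
      out <- c(out, paste(variant, collapse = ""))
    }
  }
  out
}

nb <- neighbours("123", c("1", "2", "3"))
print(nb)
```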
Feature filtering method: test_features.
# to make the example faster, we run construct_ngrams() on the
# subset of data
deg_seqs <- degenerate(human_cleave[c(1L:100, 801L:900), 1L:9],
                       list(`1` = c(1, 6, 8, 10, 11, 18),
                            `2` = c(2, 13, 14, 16, 17),
                            `3` = c(5, 19, 20),
                            `4` = c(7, 9, 12, 15),
                            `5` = c(3, 4)))
bigrams <- construct_ngrams(human_cleave[c(1L:100, 801L:900), "tar"],
                            deg_seqs, 1L:5, 2)