Computes the total number of n-grams that can be extracted from sequences.

count_total(seq, n, d)

Arguments

seq

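a vector or matrix describing sequence(s). In a matrix, each row is one sequence; a sequence shorter than the number of columns is padded with NA.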

n

integer size of n-gram.

d

integer vector of distances between elements of n-gram (0 means consecutive elements). See Details.
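The role of d can be illustrated with a small helper that lists the positions an n-gram occupies. This is a minimal sketch under stated assumptions: ngram_positions is a hypothetical name that is not part of the package, and d is assumed to already hold the n - 1 gaps between neighbouring elements (the exact format of d is described in count_ngrams).

```r
# Hypothetical helper (not part of the package): positions occupied by an
# n-gram that starts at `start`, given the distance vector `d`.
# Assumption: length(d) == n - 1; each entry is the gap between two
# neighbouring elements, so 0 means the next element follows immediately.
ngram_positions <- function(start, d) {
  # each step moves past the gap plus one element
  start + c(0, cumsum(d + 1))
}

ngram_positions(1, c(1, 0))
#> [1] 1 3 4
```

With d = c(1, 0), a 3-gram starting at position 1 uses positions 1, 3 and 4, so it spans n + sum(d) = 4 positions.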

Value

An integer representing the total number of n-grams.

Details

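The maximum number of possible n-grams is limited by the length of each sequence and by the span of the n-gram, which covers n + sum(d) positions. A sequence of length L (not counting NA elements) therefore yields at most L - n - sum(d) + 1 n-grams. For example, for n = 3 and d = c(1, 0), a sequence of 50 elements yields 50 - 3 - 1 + 1 = 47 n-grams.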
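The counting rule can be sketched in a few lines of R. This is an illustration, not the package's implementation: count_total_sketch is a hypothetical name, and the sketch assumes that an n-gram spans n + sum(d) positions (d already of length n - 1), that NA elements do not count toward a sequence's length, and that a plain vector is treated as a single sequence.

```r
# Hypothetical re-implementation of the counting rule (illustration only).
count_total_sketch <- function(seq, n, d) {
  # assumption: a plain vector describes a single sequence
  if (is.null(dim(seq)))
    seq <- matrix(seq, nrow = 1)
  # assumption: d holds the n - 1 gaps, so an n-gram spans n + sum(d) positions
  span <- n + sum(d)
  # length of each sequence = number of non-NA elements in its row
  lens <- rowSums(!is.na(seq))
  # n-grams per sequence; sequences shorter than the span contribute none
  per_seq <- pmax(lens - span + 1, 0)
  as.integer(sum(per_seq))
}

seqs <- matrix(sample(1L:4, 600, replace = TRUE), ncol = 50)
seqs[8L:11, 46L:50] <- NA
seqs[1L, 31L:50] <- NA
count_total_sketch(seqs, 3, c(1, 0))
#> [1] 524
```

The result depends only on where the NA values are, not on the sampled elements, which is why the sketch gives the same value as the Examples section below.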

Note

The format of the d vector is discussed in Details of count_ngrams.

Examples

seqs <- matrix(sample(1L:4, 600, replace = TRUE), ncol = 50)
# make several sequences shorter by replacing them partially with NA
seqs[8L:11, 46L:50] <- NA
seqs[1L, 31L:50] <- NA
count_total(seqs, 3, c(1, 0))
#> [1] 524
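The printed value can be checked by hand, assuming each sequence of length L contributes L - n - sum(d) + 1 n-grams. The 12 rows of seqs keep 30 elements (row 1), 45 elements (rows 8 to 11) and 50 elements (the remaining seven rows); with n = 3 and d = c(1, 0) each n-gram spans 4 positions:

$$
(30 - 4 + 1) + 4\,(45 - 4 + 1) + 7\,(50 - 4 + 1) = 27 + 168 + 329 = 524
$$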