fastmath.stats

Statistics functions.

  • Descriptive statistics for a sequence.
  • Correlation / covariance of two sequences.
  • Outliers

All functions are backed by Apache Commons Math or SMILE libraries. All work with Clojure sequences.

Descriptive statistics

The all-in-one function stats-map returns a map with the following keys:

  • :Size - size of the samples, (count ...)
  • :Min - minimum value
  • :Max - maximum value
  • :Mean - mean/average
  • :Median - median, see also: median-3
  • :Mode - mode, see also: modes
  • :Q1 - first quartile, use: percentile, quantile
  • :Q3 - third quartile, use: percentile, quantile
  • :Total - sum of all samples
  • :SD - population standard deviation, use: population-stddev
  • :MAD - median-absolute-deviation
  • :SEM - standard error of mean
  • :LAV - lower adjacent value, use: adjacent-values
  • :UAV - upper adjacent value, use: adjacent-values
  • :IQR - interquartile range, (- q3 q1)
  • :LOF - lower outer fence, (- q1 (* 3.0 iqr))
  • :UOF - upper outer fence, (+ q3 (* 3.0 iqr))
  • :LIF - lower inner fence, (- q1 (* 1.5 iqr))
  • :UIF - upper inner fence, (+ q3 (* 1.5 iqr))
  • :Outliers - number of outliers, i.e. samples outside the outer fences
  • :Kurtosis - kurtosis
  • :Skewness - skewness
  • :SecMoment - second central moment, use: second-moment
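The :SD and :SEM values in the stats-map example are consistent with :SD being the population standard deviation (n denominator) and :SEM being that value divided by sqrt(n). The sketch below reproduces a few of the simpler keys under that assumption, in plain Clojure. stats-map-sketch is a hypothetical name, not part of fastmath.stats.

```clojure
(defn stats-map-sketch
  "Hypothetical subset of stats-map: :Size, :Total, :Mean, :SD and :SEM.
  Assumes :SD is the population standard deviation (n denominator)."
  [vs]
  (let [xs    (mapv double vs)
        n     (count xs)
        total (reduce + xs)
        mean  (/ total n)
        ;; population variance: mean of the squared deviations
        pvar  (/ (reduce + (map #(let [d (- % mean)] (* d d)) xs)) n)
        sd    (Math/sqrt pvar)]
    {:Size  n
     :Total total
     :Mean  mean
     :SD    sd
     ;; standard error of the mean: SD / sqrt(n)
     :SEM   (/ sd (Math/sqrt (double n)))}))

(stats-map-sketch [1 2 3 -1 -1 2 -1 11 111])
;; :Size 9, :Total 127.0, :SD approximately 34.4333, :SEM approximately 11.4778
```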

Note: percentile and quantile support 10 different estimation (interpolation) strategies. See estimation-strategies-list.

Correlation / Covariance / Divergence

Other

Use standardize to normalize samples to mean = 0 and standard deviation = 1.

Use histogram to count samples in evenly spaced ranges.

adjacent-values

(adjacent-values vs)
(adjacent-values vs estimation-strategy)
(adjacent-values vs q1 q3)

Lower and upper adjacent values (LAV and UAV).

Let Q1 be the 25th percentile, Q3 the 75th percentile, and IQR = (- Q3 Q1).

  • LAV is the smallest value greater than or equal to the LIF = (- Q1 (* 1.5 IQR)).
  • UAV is the largest value less than or equal to the UIF = (+ Q3 (* 1.5 IQR)).

The optional estimation-strategy argument changes how the quantiles are estimated. See estimation-strategies-list.
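The LAV/UAV definition above can be sketched in plain Clojure for the three-argument case, where the caller supplies Q1 and Q3. adjacent-values-sketch is a hypothetical helper written from the definition, not the library's implementation.

```clojure
(defn adjacent-values-sketch
  "Hypothetical [LAV UAV] computed from the inner fences, given Q1 and Q3."
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lif (- q1 (* 1.5 iqr))   ; lower inner fence
        uif (+ q3 (* 1.5 iqr))   ; upper inner fence
        xs  (map double vs)]
    ;; LAV: smallest value not below LIF; UAV: largest value not above UIF
    [(apply min (filter #(>= % lif) xs))
     (apply max (filter #(<= % uif) xs))]))

;; Q1 = -1.0 and Q3 = 7.0 for this data (see the percentile examples)
(adjacent-values-sketch [1 2 3 -1 -1 2 -1 11 111] -1.0 7.0)
;;=> [-1.0 11.0]
```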

Examples

[LAV, UAV]

(adjacent-values [1 2 3 -1 -1 2 -1 11 111])
;;=> [-1.0 11.0]

Gaussian distribution [LAV, UAV]

(adjacent-values (repeatedly 1000000 r/grand))
;;=> [-2.7000857687493234 2.6996610841795916]

correlation

(correlation vs1 vs2)

Correlation of two sequences.

Examples

Correlation of uniform and gaussian distribution samples.

(correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
             (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.00678949591847233

covariance

(covariance vs1 vs2)

Covariance of two sequences.
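A minimal sketch of sample covariance in plain Clojure, assuming the usual bias-corrected estimator that divides by n - 1 (this page does not state which denominator covariance uses). covariance-sketch is a hypothetical name.

```clojure
(defn covariance-sketch
  "Hypothetical sample covariance with an n - 1 denominator (assumption)."
  [vs1 vs2]
  (let [xs (mapv double vs1)
        ys (mapv double vs2)
        n  (count xs)
        mx (/ (reduce + xs) n)
        my (/ (reduce + ys) n)]
    ;; sum of products of paired deviations from the means, divided by n - 1
    (/ (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
       (dec n))))

(covariance-sketch [1 2 3] [2 4 6])
;;=> 2.0
```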

Examples

Covariance of uniform and gaussian distribution samples.

(covariance (repeatedly 100000 (partial r/grand 1.0 10.0))
            (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -0.0019254850635683056

estimation-strategies-list

Examples

List of estimation strategies for percentile

(keys estimation-strategies-list)
;;=> (:r7 :r6 :r8 :r2 :r9 :r3 :r1 :legacy :r4 :r5)

extent

(extent vs)

Return the extent [min, max] of a sequence.

Examples

min/max from gaussian distribution

(extent (repeatedly 100000 r/grand))
;;=> [-4.284832010490651 4.5215900810413405]

histogram

(histogram vs bins)
(histogram vs bins [mn mx])

Calculate histogram.

Returns map with keys:

  • :size - number of bins
  • :step - width of each bin
  • :bins - list of pairs: lower bound of the bin and the number of samples in it
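An equal-width histogram with the same output shape can be sketched in plain Clojure. The sketch assumes that samples outside [mn mx] are dropped (the ranged example below counts roughly 40% of uniform samples for [0.1 0.5], which is consistent with that) and that the maximum value falls into the last bin. histogram-sketch is a hypothetical name, and it does not handle the degenerate case where all samples are equal.

```clojure
(defn histogram-sketch
  "Hypothetical equal-width histogram returning {:size :step :bins},
  where :bins holds [lower-bound count] pairs."
  ([vs bins] (histogram-sketch vs bins [(apply min vs) (apply max vs)]))
  ([vs bins [mn mx]]
   (let [mn       (double mn)
         mx       (double mx)
         step     (/ (- mx mn) bins)
         ;; keep only samples inside the requested range
         in-range (filter #(<= mn % mx) vs)
         ;; bin index; the maximum value is clamped into the last bin
         idx      (fn [x] (min (dec bins) (long (Math/floor (/ (- x mn) step)))))
         counts   (frequencies (map idx in-range))]
     {:size bins
      :step step
      :bins (for [i (range bins)]
              [(+ mn (* i step)) (get counts i 0)])})))

(histogram-sketch [0 1 2 3 4 5 6] 3)
;;=> {:size 3, :step 2.0, :bins ([0.0 2] [2.0 2] [4.0 3])}
```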

Examples

3 bins from uniform distribution.

(histogram (repeatedly 1000 rand) 3)
;;=> {:bins ([8.212692395922483E-4 303]
;;=>         [0.3331197753070451 362]
;;=>         [0.665418281374498 335]),
;;=>  :size 3,
;;=>  :step 0.33229850606745287}

3 bins from uniform distribution for given range.

(histogram (repeatedly 10000 rand) 3 [0.1 0.5])
;;=> {:bins
;;=>  ([0.1 1331] [0.23333333333333334 1301] [0.3666666666666667 1323]),
;;=>  :size 3,
;;=>  :step 0.13333333333333333}

5 bins from normal distribution.

(histogram (repeatedly 10000 r/grand) 5)
;;=> {:bins ([-3.8255442971705595 104]
;;=>         [-2.3194795153248613 1970]
;;=>         [-0.8134147334791635 5407]
;;=>         [0.6926500483665348 2369]
;;=>         [2.1987148302122326 150]),
;;=>  :size 5,
;;=>  :step 1.506064781845698}

jensen-shannon-divergence

(jensen-shannon-divergence vs1 vs2)

Jensen-Shannon divergence of two sequences.

Examples

Jensen-Shannon divergence

(jensen-shannon-divergence (repeatedly 100 (fn* [] (r/irand 100)))
                           (repeatedly 100 (fn* [] (r/irand 100))))
;;=> 498.62287946302877

kendall-correlation

(kendall-correlation vs1 vs2)

Kendall’s correlation of two sequences.

Examples

Kendall’s correlation of uniform and gaussian distribution samples.

(kendall-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                     (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 8.147273472734727E-4

kullback-leibler-divergence

(kullback-leibler-divergence vs1 vs2)

Kullback-Leibler divergence of two sequences.

Examples

Kullback-Leibler divergence.

(kullback-leibler-divergence (repeatedly 100 (fn* [] (r/irand 100)))
                             (repeatedly 100 (fn* [] (r/irand 100))))
;;=> 2025.8557601936284

kurtosis

(kurtosis vs)

Calculate the kurtosis of a sequence.
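The example value below (about 8.7325) is reproduced by the bias-corrected sample excess kurtosis, the formula documented for Apache Commons Math's Kurtosis class: [n(n+1) / ((n-1)(n-2)(n-3))] * sum(((x - mean)/s)^4) - 3(n-1)^2 / ((n-2)(n-3)), with s the sample standard deviation. A plain-Clojure sketch assuming that formula; kurtosis-sketch is hypothetical and needs at least four samples.

```clojure
(defn kurtosis-sketch
  "Hypothetical bias-corrected sample excess kurtosis."
  [vs]
  (let [xs (mapv double vs)
        n  (double (count xs))
        mu (/ (reduce + xs) n)
        ;; sample standard deviation (n - 1 denominator)
        s  (Math/sqrt (/ (reduce + (map #(let [d (- % mu)] (* d d)) xs)) (- n 1.0)))
        ;; sum of fourth powers of the standardized deviations
        z4 (reduce + (map #(let [z (/ (- % mu) s)] (* z z z z)) xs))]
    (- (* (/ (* n (+ n 1.0)) (* (- n 1.0) (- n 2.0) (- n 3.0))) z4)
       (/ (* 3.0 (- n 1.0) (- n 1.0)) (* (- n 2.0) (- n 3.0))))))

(kurtosis-sketch [1 2 3 -1 -1 2 -1 11 111])
;; approximately 8.7325, matching the kurtosis example
```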

Examples

Kurtosis

(kurtosis [1 2 3 -1 -1 2 -1 11 111])
;;=> 8.732515263272099

maximum

(maximum vs)

Maximum value of a sequence.

Examples

Maximum value

(maximum [1 2 3 -1 -1 2 -1 11 111])
;;=> 111.0

mean

(mean vs)

Calculate the mean of vs.

Examples

Mean (average value)

(mean [1 2 3 -1 -1 2 -1 11 111])
;;=> 14.111111111111109

median

(median vs)

Calculate median of vs. See median-3.

Examples

Median (percentile 50%).

(median [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.0

For three values, use the faster median-3.

(median [7 1 4])
;;=> 4.0

median-3

(median-3 a b c)

Median of three values. See median.
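The median of three values needs no sorting: it is the larger of min(a, b) and min(max(a, b), c). A plain-Clojure sketch of that trick; median-3-sketch is a hypothetical name, and the library may compute it differently.

```clojure
(defn median-3-sketch
  "Hypothetical median of three values using only min/max."
  [a b c]
  ;; min(a, b) is a lower candidate; min(max(a, b), c) caps the upper one
  (double (max (min a b) (min (max a b) c))))

(median-3-sketch 7 1 4)
;;=> 4.0
```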

Examples

Median of [7 1 4]

(median-3 7 1 4)
;;=> 4.0

median-absolute-deviation

(median-absolute-deviation vs)

Calculate the MAD (median absolute deviation) of vs.
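MAD is the median of the absolute deviations from the median. The example result 3.0 suggests no scaling constant (such as 1.4826) is applied. A plain-Clojure sketch under that reading; median-of and mad-sketch are hypothetical helpers, and median-of uses the textbook median (middle value, or the mean of the two middle values).

```clojure
(defn- median-of
  "Textbook median: middle element, or mean of the two middle elements."
  [vs]
  (let [s (vec (sort (map double vs)))
        n (count s)
        h (quot n 2)]
    (if (odd? n)
      (s h)
      (/ (+ (s (dec h)) (s h)) 2.0))))

(defn mad-sketch
  "Hypothetical unscaled MAD: median of |x - median|."
  [vs]
  (let [m (median-of vs)]
    (median-of (map #(let [d (- (double %) m)] (if (neg? d) (- d) d)) vs))))

(mad-sketch [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0
```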

Examples

MAD

(median-absolute-deviation [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0

minimum

(minimum vs)

Minimum value of a sequence.

Examples

Minimum value

(minimum [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0

mode

(mode vs)

Find the value that appears most often in a dataset vs.

See also modes.

Examples

Example

(mode [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0

Returns the lowest value when every element appears equally often.

(mode [5 1 2 3 4])
;;=> 1.0

modes

(modes vs)

Find the values that appear most often in a dataset vs.

Returns a sequence of all the most frequent values, in increasing order.

See also mode.
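Both behaviours described for mode and modes follow from counting occurrences and keeping the values with the highest count, sorted. A plain-Clojure sketch built on clojure.core/frequencies; modes-sketch and mode-sketch are hypothetical names.

```clojure
(defn modes-sketch
  "Hypothetical: all most frequent values, in increasing order."
  [vs]
  (let [freqs (frequencies (map double vs))   ; value -> count
        top   (apply max (vals freqs))]       ; highest count
    (sort (for [[v c] freqs :when (= c top)] v))))

(defn mode-sketch
  "Hypothetical: the lowest of the most frequent values."
  [vs]
  (first (modes-sketch vs)))

(modes-sketch [5 5 1 1 2 3 4 4])
;;=> (1.0 4.0 5.0)
(mode-sketch [5 1 2 3 4])
;;=> 1.0
```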

Examples

Example

(modes [1 2 3 -1 -1 2 -1 11 111])
;;=> (-1.0)

Returns all values that share the highest count, in increasing order.

(modes [5 5 1 1 2 3 4 4])
;;=> (1.0 4.0 5.0)

outliers

(outliers vs)
(outliers vs estimation-strategy)
(outliers vs q1 q3)

Find outliers, defined as values outside the outer fences.

Let Q1 be the 25th percentile, Q3 the 75th percentile, and IQR = (- Q3 Q1).

  • LOF (Lower Outer Fence) equals (- Q1 (* 3.0 IQR)).
  • UOF (Upper Outer Fence) equals (+ Q3 (* 3.0 IQR)).

Returns a sequence of the outlying values.

The optional estimation-strategy argument changes how the quantiles are estimated. See estimation-strategies-list.
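For the three-argument form, the outer-fence rule reduces to a filter. A plain-Clojure sketch written from the LOF/UOF definitions above; outliers-sketch is a hypothetical name, not the library code.

```clojure
(defn outliers-sketch
  "Hypothetical: values strictly outside the outer fences, given Q1 and Q3."
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lof (- q1 (* 3.0 iqr))   ; lower outer fence
        uof (+ q3 (* 3.0 iqr))]  ; upper outer fence
    (filter #(or (< % lof) (> % uof)) (map double vs))))

;; Q1 = -1.0, Q3 = 7.0, so LOF = -25.0 and UOF = 31.0
(outliers-sketch [1 2 3 -1 -1 2 -1 11 111] -1.0 7.0)
;;=> (111.0)
```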

Examples

Outliers

(outliers [1 2 3 -1 -1 2 -1 11 111])
;;=> (111.0)

Gaussian distribution outliers

(outliers (repeatedly 3000000 r/grand))
;;=> (-4.864618557049428
;;=>  -4.780633031723624
;;=>  -4.7376202235271165
;;=>  4.728896589466469
;;=>  4.744055356963435
;;=>  4.747383207352102
;;=>  4.769288560936935
;;=>  4.909931823383421
;;=>  5.043073126955698)

pearson-correlation

(pearson-correlation vs1 vs2)

Pearson’s correlation of two sequences.
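Pearson's r is the covariance divided by the product of the two standard deviations; the 1/(n - 1) factors cancel, so sums of deviation products are enough. A plain-Clojure sketch of the textbook formula; pearson-sketch is a hypothetical name.

```clojure
(defn pearson-sketch
  "Hypothetical Pearson's r from sums of deviation products."
  [vs1 vs2]
  (let [xs  (mapv double vs1)
        ys  (mapv double vs2)
        n   (count xs)
        mx  (/ (reduce + xs) n)
        my  (/ (reduce + ys) n)
        dx  (map #(- % mx) xs)
        dy  (map #(- % my) ys)
        sxy (reduce + (map * dx dy))   ; co-deviation
        sxx (reduce + (map * dx dx))   ; squared deviations of vs1
        syy (reduce + (map * dy dy))]  ; squared deviations of vs2
    (/ sxy (Math/sqrt (* sxx syy)))))

(pearson-sketch [1 2 3] [2 4 6])
;;=> 1.0
```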

Examples

Pearson’s correlation of uniform and gaussian distribution samples.

(pearson-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                     (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -0.005570841963401116

percentile

(percentile vs p)
(percentile vs p estimation-strategy)

Calculate the p-th percentile of vs.

p must be in the range 0-100.

Optionally, provide estimation-strategy to change the interpolation method used to select values. The default is :legacy; see estimation-strategies-list for all available strategies.

See also quantile.
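The default :legacy results in the examples below are reproduced by a simple position rule: pos = p(n + 1)/100 over the sorted data, returning the minimum when pos < 1, the maximum when pos >= n, and otherwise interpolating linearly between the neighbouring values. A plain-Clojure sketch assuming that rule; percentile-legacy-sketch is a hypothetical name, and the other strategies (:r1 to :r9) are not covered.

```clojure
(defn percentile-legacy-sketch
  "Hypothetical percentile (p in 0-100) using the (n + 1) position rule."
  [vs p]
  (let [s   (vec (sort (map double vs)))
        n   (count s)
        pos (/ (* p (inc n)) 100.0)]   ; 1-based fractional position
    (cond
      (< pos 1.0) (first s)
      (>= pos n)  (peek s)
      :else (let [fpos  (long (Math/floor pos))
                  d     (- pos fpos)        ; fractional part
                  lower (s (dec fpos))
                  upper (s fpos)]
              (+ lower (* d (- upper lower)))))))

(percentile-legacy-sketch [1 2 3 -1 -1 2 -1 11 111] 85.0)
;;=> 61.0
```

Under this rule, a quantile at q corresponds to the percentile at 100q; for example (percentile-legacy-sketch [1 11 111 1111] 70.0) gives 611.0, matching the quantile :legacy example.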

Examples

Percentile 25%

(percentile [1 2 3 -1 -1 2 -1 11 111] 25.0)
;;=> -1.0

Percentile 50% (median)

(percentile [1 2 3 -1 -1 2 -1 11 111] 50.0)
;;=> 2.0

Percentile 75%

(percentile [1 2 3 -1 -1 2 -1 11 111] 75.0)
;;=> 7.0

Percentile 90%

(percentile [1 2 3 -1 -1 2 -1 11 111] 90.0)
;;=> 111.0

Various estimation strategies.

(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :legacy)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r1)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r2)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r3)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r4)
;;=> 8.199999999999996
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r5)
;;=> 25.999999999999858
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r6)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r7)
;;=> 9.399999999999999
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r8)
;;=> 37.66666666666675
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r9)
;;=> 34.75000000000007

population-stddev

(population-stddev vs)
(population-stddev vs u)

Calculate population standard deviation of vs.

See stddev.

Examples

Population standard deviation.

(population-stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 34.4333315406403

population-variance

(population-variance vs)
(population-variance vs u)

Calculate population variance of vs.

See variance.

Examples

Population variance

(population-variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1185.6543209876543

quantile

(quantile vs p)
(quantile vs p estimation-strategy)

Calculate the quantile of vs at p.

p must be in the range 0.0-1.0.

Optionally, provide estimation-strategy to change the interpolation method used to select values. The default is :legacy; see estimation-strategies-list for all available strategies.

See also percentile.

Examples

Quantile 0.25

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.25)
;;=> -1.0

Quantile 0.5 (median)

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.5)
;;=> 2.0

Quantile 0.75

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.75)
;;=> 7.0

Quantile 0.9

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.9)
;;=> 111.0

Various estimation strategies.

(quantile [1 11 111 1111] 0.7 :legacy)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r1)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r2)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r3)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r4)
;;=> 90.99999999999999
(quantile [1 11 111 1111] 0.7 :r5)
;;=> 410.99999999999983
(quantile [1 11 111 1111] 0.7 :r6)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r7)
;;=> 210.99999999999966
(quantile [1 11 111 1111] 0.7 :r8)
;;=> 477.66666666666623
(quantile [1 11 111 1111] 0.7 :r9)
;;=> 460.99999999999966

second-moment

(second-moment vs)

Calculate the second central moment of a sequence.

It is the sum of squared deviations from the sample mean, not divided by the number of samples.
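The second moment is the building block of the dispersion functions on this page: the example values for population-variance and variance equal it divided by n and n - 1 respectively, and the standard deviations are their square roots. A plain-Clojure sketch of that relationship; dispersion-sketch is a hypothetical name.

```clojure
(defn dispersion-sketch
  "Hypothetical: second central moment and the statistics derived from it."
  [vs]
  (let [xs (mapv double vs)
        n  (count xs)
        mu (/ (reduce + xs) n)
        ;; second central moment: sum of squared deviations from the mean
        m2 (reduce + (map #(let [d (- % mu)] (* d d)) xs))]
    {:second-moment       m2
     :population-variance (/ m2 n)          ; divide by n
     :variance            (/ m2 (dec n))    ; divide by n - 1
     :population-stddev   (Math/sqrt (/ m2 n))
     :stddev              (Math/sqrt (/ m2 (dec n)))}))

(dispersion-sketch [1 2 3 -1 -1 2 -1 11 111])
;; :second-moment about 10670.89, :variance about 1333.86, :stddev about 36.52
```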

Examples

Second Moment

(second-moment [1 2 3 -1 -1 2 -1 11 111])
;;=> 10670.888888888889

skewness

(skewness vs)

Calculate the skewness of a sequence.
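The example value below (about 2.9427) is reproduced by the adjusted sample skewness documented for Apache Commons Math's Skewness class: [n / ((n-1)(n-2))] * sum(((x - mean)/s)^3), with s the sample standard deviation. A plain-Clojure sketch assuming that formula; skewness-sketch is hypothetical and needs at least three samples.

```clojure
(defn skewness-sketch
  "Hypothetical adjusted sample skewness."
  [vs]
  (let [xs (mapv double vs)
        n  (double (count xs))
        mu (/ (reduce + xs) n)
        ;; sample standard deviation (n - 1 denominator)
        s  (Math/sqrt (/ (reduce + (map #(let [d (- % mu)] (* d d)) xs)) (- n 1.0)))
        ;; sum of cubes of the standardized deviations
        z3 (reduce + (map #(let [z (/ (- % mu) s)] (* z z z)) xs))]
    (* (/ n (* (- n 1.0) (- n 2.0))) z3)))

(skewness-sketch [1 2 3 -1 -1 2 -1 11 111])
;; approximately 2.9427, matching the skewness example
```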

Examples

Skewness

(skewness [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.94268445417954

spearman-correlation

(spearman-correlation vs1 vs2)

Spearman’s correlation of two sequences.

Examples

Spearman’s correlation of uniform and gaussian distribution samples.

(spearman-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                      (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -0.002944189590700948

standardize

(standardize vs)

Normalize samples to have mean = 0 and stddev = 1.
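The example output below is consistent with subtracting the mean and dividing by the sample standard deviation (the n - 1 one reported by stddev): for the first value, (1 - 14.111...)/36.522... is about -0.35899. A plain-Clojure sketch under that assumption; standardize-sketch is a hypothetical name.

```clojure
(defn standardize-sketch
  "Hypothetical: (x - mean) / sample standard deviation for each sample."
  [vs]
  (let [xs (mapv double vs)
        n  (count xs)
        mu (/ (reduce + xs) n)
        ;; sample standard deviation (n - 1 denominator)
        sd (Math/sqrt (/ (reduce + (map #(let [d (- % mu)] (* d d)) xs)) (dec n)))]
    (map #(/ (- % mu) sd) xs)))

(standardize-sketch [1 2 3])
;;=> (-1.0 0.0 1.0)
```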

Examples

Standardize

(standardize [1 2 3 -1 -1 2 -1 11 111])
;;=> (-0.3589915220998317
;;=>  -0.33161081278713267
;;=>  -0.30423010347443363
;;=>  -0.4137529407252298
;;=>  -0.4137529407252298
;;=>  -0.33161081278713267
;;=>  -0.4137529407252298
;;=>  -0.08518442897284138
;;=>  2.652886502297062)

stats-map

(stats-map vs)
(stats-map vs estimation-strategy)

Calculate several statistics of vs and return them as a map.

The optional estimation-strategy argument changes how the quantiles are estimated. See estimation-strategies-list.

Examples

Stats

(stats-map [1 2 3 -1 -1 2 -1 11 111])
;;=> {:IQR 8.0,
;;=>  :Kurtosis 8.732515263272099,
;;=>  :LAV -1.0,
;;=>  :LIF -13.0,
;;=>  :LOF -25.0,
;;=>  :MAD 3.0,
;;=>  :Max 111.0,
;;=>  :Mean 14.11111111111111,
;;=>  :Median 2.0,
;;=>  :Min -1.0,
;;=>  :Mode -1.0,
;;=>  :Outliers 1,
;;=>  :Q1 -1.0,
;;=>  :Q3 7.0,
;;=>  :SD 34.4333315406403,
;;=>  :SEM 11.477777180213435,
;;=>  :SecMoment 10670.888888888889,
;;=>  :Size 9,
;;=>  :Skewness 2.94268445417954,
;;=>  :Total 127.0,
;;=>  :UAV 11.0,
;;=>  :UIF 19.0,
;;=>  :UOF 31.0}

stddev

(stddev vs)
(stddev vs u)

Calculate standard deviation of vs.

See population-stddev.

Examples

Standard deviation.

(stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 36.522063346847084

sum

(sum vs)

Sum of all vs values.

Examples

Sum

(sum [1 2 3 -1 -1 2 -1 11 111])
;;=> 127.0

variance

(variance vs)
(variance vs u)

Calculate variance of vs.

See population-variance.

Examples

Variance.

(variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1333.861111111111