fastmath.stats

Statistics functions.

  • Descriptive statistics for sequence.
  • Correlation / covariance of two sequences.
  • Outliers

All functions are backed by Apache Commons Math or SMILE libraries. All work with Clojure sequences.

Descriptive statistics

The all-in-one function stats-map calculates several statistics at once and returns them as a map.

Note: percentile and quantile support 10 different interpolation strategies, listed in estimation-strategies-list.

Correlation / Covariance / Divergence

Other

Normalize samples to have mean=0 and standard deviation = 1 with standardize.

Use histogram to count samples in evenly spaced ranges.

adjacent-values

(adjacent-values vs)
(adjacent-values vs estimation-strategy)
(adjacent-values vs q1 q3)

Lower and upper adjacent values (LAV and UAV).

Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).

  • LAV is the smallest value greater than or equal to the lower inner fence, LIF = (- Q1 (* 1.5 IQR)).
  • UAV is the largest value lower than or equal to the upper inner fence, UIF = (+ Q3 (* 1.5 IQR)).

The optional estimation-strategy argument changes the estimation type used for the quantile calculations. See estimation-strategies-list.
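The definition above can be sketched in plain Clojure. This is a minimal illustration, not the library's implementation: adjacent-values-sketch is a hypothetical name, and the quartiles are passed in explicitly (as in the three-argument arity) rather than estimated.

```clojure
(defn adjacent-values-sketch
  "Returns [LAV UAV] for vs, given precomputed quartiles q1 and q3."
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lif (- q1 (* 1.5 iqr))   ; lower inner fence
        uif (+ q3 (* 1.5 iqr))]  ; upper inner fence
    [(double (apply min (filter #(>= % lif) vs)))    ; smallest value >= LIF
     (double (apply max (filter #(<= % uif) vs)))])) ; largest value <= UIF

;; Q1 = -1.0 and Q3 = 7.0 give LIF = -13.0 and UIF = 19.0
(adjacent-values-sketch [1 2 3 -1 -1 2 -1 11 111] -1.0 7.0)
;;=> [-1.0 11.0]
```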

Examples

[LAV, UAV]

(adjacent-values [1 2 3 -1 -1 2 -1 11 111])
;;=> [-1.0 11.0]

Gaussian distribution [LAV, UAV]

(adjacent-values (repeatedly 1000000 r/grand))
;;=> [-2.7000857687493234 2.6996610841795916]

correlation

(correlation vs1 vs2)

Correlation of two sequences.

Examples

Correlation of uniform and gaussian distribution samples.

(correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
             (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.00678949591847233

covariance

(covariance vs1 vs2)

Covariance of two sequences.

Examples

Covariance of uniform and gaussian distribution samples.

(covariance (repeatedly 100000 (partial r/grand 1.0 10.0))
            (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -0.0019254850635683056

estimate-bins

(estimate-bins vs)
(estimate-bins vs method)

Estimate number of bins for histogram.

Possible methods are: :sqrt :sturges :rice :doane :scott :freedman-diaconis (default).
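For orientation, the three rules that depend only on the sample count can be sketched with their commonly cited textbook formulas. The name bins-sketch is hypothetical, and the rounding choices are assumptions picked to agree with the :sqrt and :sturges results in the example for this entry; the library's own formulas may differ. The :doane, :scott and :freedman-diaconis rules also depend on the data and are left out.

```clojure
(defn bins-sketch
  "Textbook bin-count rules for n samples. Rounding is an assumption."
  [n method]
  (let [n (double n)]
    (long (case method
            ;; square-root rule, truncated
            :sqrt    (Math/sqrt n)
            ;; Sturges: ceil(log2 n) + 1
            :sturges (inc (Math/ceil (/ (Math/log n) (Math/log 2.0))))
            ;; Rice: ceil(2 * cube root of n)
            :rice    (Math/ceil (* 2.0 (Math/cbrt n)))))))

(bins-sketch 1000 :sqrt)
;;=> 31
(bins-sketch 1000 :sturges)
;;=> 11
```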

Examples

Estimate number of bins for various methods. vs contains 1000 random samples from Log-Normal distribution.

(estimate-bins vs :sqrt)
;;=> 31
(estimate-bins vs :sturges)
;;=> 11
(estimate-bins vs :rice)
;;=> 20
(estimate-bins vs :doane)
;;=> 17
(estimate-bins vs :scott)
;;=> 53
(estimate-bins vs :freedman-diaconis)
;;=> 170

estimation-strategies-list

Examples

List of estimation strategies for percentile

(sort (keys estimation-strategies-list))
;;=> (:legacy :r1 :r2 :r3 :r4 :r5 :r6 :r7 :r8 :r9)

extent

(extent vs)

Returns the extent of a sequence as a [min max] pair.

Examples

min/max from gaussian distribution

(extent (repeatedly 100000 r/grand))
;;=> [-4.284832010490651 4.5215900810413405]

histogram

(histogram vs)
(histogram vs bins-or-estimate-method)
(histogram vs bins [mn mx])

Calculate histogram.

Returns map with keys:

  • :size - number of bins
  • :step - distance between bins
  • :bins - list of triples of range lower value, number of hits and ratio of used samples
  • :min - min value
  • :max - max value
  • :samples - number of used samples

For estimation methods check estimate-bins.
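A small sketch of how the returned map might be consumed. The literal map below is copied from the first example in this entry (with :min and :max omitted); bin-ranges is a hypothetical helper, and treating each bin's upper edge as its lower value plus :step is an assumption based on the key descriptions above.

```clojure
(def h
  {:size 3
   :step 0.3323490376487811
   :samples 1000
   :bins [[0.0025107458839882524 342 0.342]
          [0.33485978353276935 333 0.333]
          [0.6672088211815504 325 0.325]]})

(defn bin-ranges
  "Turns each [lower hits ratio] triple into a map with explicit edges."
  [{:keys [step bins]}]
  (for [[lower hits _ratio] bins]
    {:from lower :to (+ lower step) :hits hits}))

(map :hits (bin-ranges h))
;;=> (342 333 325)

;; the hits add up to the number of used samples
(= (:samples h) (reduce + (map :hits (bin-ranges h))))
;;=> true
```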

Examples

3 bins from uniform distribution.

(histogram (repeatedly 1000 rand) 3)
;;=> {:bins ([0.0025107458839882524 342 0.342]
;;=>         [0.33485978353276935 333 0.333]
;;=>         [0.6672088211815504 325 0.325]),
;;=>  :max 0.9995578588303315,
;;=>  :min 0.0025107458839882524,
;;=>  :samples 1000,
;;=>  :size 3,
;;=>  :step 0.3323490376487811}

3 bins from uniform distribution for given range.

(histogram (repeatedly 10000 rand) 3 [0.1 0.5])
;;=> {:bins ([0.1 1358 0.34086345381526106]
;;=>         [0.23333333333333334 1306 0.3278112449799197]
;;=>         [0.3666666666666667 1320 0.3313253012048193]),
;;=>  :max 0.5,
;;=>  :min 0.1,
;;=>  :samples 3984,
;;=>  :size 3,
;;=>  :step 0.13333333333333333}

5 bins from normal distribution.

(histogram (repeatedly 10000 r/grand) 5)
;;=> {:bins ([-3.8255442971705595 103 0.0103]
;;=>         [-2.3194795153248613 1973 0.1973]
;;=>         [-0.8134147334791635 5453 0.5453]
;;=>         [0.6926500483665348 2325 0.2325]
;;=>         [2.1987148302122326 146 0.0146]),
;;=>  :max 3.7047796120579304,
;;=>  :min -3.8255442971705595,
;;=>  :samples 10000,
;;=>  :size 5,
;;=>  :step 1.506064781845698}

Estimate number of bins

(:size (histogram (repeatedly 10000 r/grand)))
;;=> 60

Estimate number of bins, Rice rule

(:size (histogram (repeatedly 10000 r/grand) :rice))
;;=> 44

iqr

(iqr vs)
(iqr vs estimation-strategy)

Interquartile range.

Examples

IQR

(iqr (repeatedly 100000 r/grand))
;;=> 1.3507989631201418

jensen-shannon-divergence

(jensen-shannon-divergence vs1 vs2)

Jensen-Shannon divergence of two sequences.

Examples

Jensen-Shannon divergence

(jensen-shannon-divergence (repeatedly 100 #(r/irand 100))
                           (repeatedly 100 #(r/irand 100)))
;;=> 507.81767699492116

kendall-correlation

(kendall-correlation vs1 vs2)

Kendall’s correlation of two sequences.

Examples

Kendall’s correlation of uniform and gaussian distribution samples.

(kendall-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                     (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -7.405886058860589E-4

kernel-density

(kernel-density vs h)
(kernel-density vs)

Creates kernel density function for given series vs and optional bandwidth h.
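To illustrate what such a function computes, here is a minimal sketch of a kernel density estimate. The Gaussian kernel is an assumption (the entry does not name the kernel), although it is consistent with the rounded values in the usage example for this entry; gaussian-kde-sketch is a hypothetical name, not part of the library.

```clojure
(defn gaussian-kde-sketch
  "Returns a function of x giving the density estimate for samples vs
  with bandwidth h, using a Gaussian kernel."
  [vs h]
  (let [n    (count vs)
        norm (/ 1.0 (* n h (Math/sqrt (* 2.0 Math/PI))))]
    (fn [x]
      (* norm
         (reduce + (map (fn [v]
                          (let [z (/ (- x v) h)]
                            (Math/exp (* -0.5 z z))))
                        vs))))))

(let [kd (gaussian-kde-sketch [0 10 10 10 10 10 10 10 10 0 0 0 0 1 1 1] 1)]
  [(kd 0) (kd 10)])
;; roughly 0.17 and 0.2 after rounding to two decimals
```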

Examples

Usage

(let [kd (kernel-density [0 10 10 10 10 10 10 10 10 0 0 0 0 1 1 1] 1)]
  (map (comp m/approx kd) (range -5 15)))
;;=> (0.0
;;=>  0.0
;;=>  0.0
;;=>  0.02
;;=>  0.09
;;=>  0.17
;;=>  0.15
;;=>  0.06
;;=>  0.01
;;=>  0.0
;;=>  0.0
;;=>  0.0
;;=>  0.0
;;=>  0.03
;;=>  0.12
;;=>  0.2
;;=>  0.12
;;=>  0.03
;;=>  0.0
;;=>  0.0)

kullback-leibler-divergence

(kullback-leibler-divergence vs1 vs2)

Kullback-Leibler divergence of two sequences.

Examples

Kullback-Leibler divergence.

(kullback-leibler-divergence (repeatedly 100 #(r/irand 100))
                             (repeatedly 100 #(r/irand 100)))
;;=> 2199.662874529706

kurtosis

(kurtosis vs)

Calculate kurtosis from sequence.

Examples

Kurtosis

(kurtosis [1 2 3 -1 -1 2 -1 11 111])
;;=> 8.732515263272099

maximum

(maximum vs)

Maximum value from sequence.

Examples

Maximum value

(maximum [1 2 3 -1 -1 2 -1 11 111])
;;=> 111.0

mean

(mean vs)

Calculate mean of vs

Examples

Mean (average value)

(mean [1 2 3 -1 -1 2 -1 11 111])
;;=> 14.111111111111109

median

(median vs)

Calculate median of vs. See median-3.

Examples

Median (percentile 50%).

(median [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.0

For three elements use faster median-3.

(median [7 1 4])
;;=> 4.0

median-3

(median-3 a b c)

Median of three values. See median.
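One common way to find the median of three values without sorting uses two min and two max calls. This is only a sketch under that assumption; the entry does not say how median-3 is implemented, and median-3-sketch is a hypothetical name. Unlike the library function, it returns the input value unchanged rather than a double.

```clojure
(defn median-3-sketch
  "Median of three numbers without sorting."
  [a b c]
  (max (min a b)
       (min (max a b) c)))

(median-3-sketch 7 1 4)
;;=> 4
```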

Examples

Median of [7 1 4]

(median-3 7 1 4)
;;=> 4.0

median-absolute-deviation

(median-absolute-deviation vs)

Calculate the median absolute deviation (MAD) of vs.
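MAD is the median of the absolute deviations from the median. A minimal sketch in plain Clojure, with hypothetical helper names and no claim about how the library computes it:

```clojure
(defn median-sketch
  "Median of a non-empty collection; averages the two middle values
  when the size is even."
  [vs]
  (let [s   (vec (sort vs))
        n   (count s)
        mid (quot n 2)]
    (if (odd? n)
      (double (s mid))
      (/ (+ (s mid) (s (dec mid))) 2.0))))

(defn mad-sketch
  "Median of the absolute deviations from the median."
  [vs]
  (let [m (median-sketch vs)]
    (median-sketch (map #(Math/abs (double (- % m))) vs))))

;; median is 2.0, absolute deviations are (1 0 1 3 3 0 3 9 109)
(mad-sketch [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0
```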

Examples

MAD

(median-absolute-deviation [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0

minimum

(minimum vs)

Minimum value from sequence.

Examples

Minimum value

(minimum [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0

mode

(mode vs)

Find the value that appears most often in a dataset vs.

See also modes.

Examples

Example

(mode [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0

Returns the lowest value when every element appears equally often.

(mode [5 1 2 3 4])
;;=> 1.0

modes

(modes vs)

Find the values that appear most often in a dataset vs.

Returns a sequence of all the most frequent values in increasing order.

See also mode.
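The behaviour described above can be sketched with clojure.core/frequencies. The name modes-sketch is hypothetical and the code is an illustration, not the library's implementation; taking first of the result gives the lowest tied value, which matches what mode is documented to return.

```clojure
(defn modes-sketch
  "All values sharing the highest occurrence count, as doubles in
  increasing order."
  [vs]
  (let [freqs (frequencies vs)
        top   (apply max (vals freqs))]
    (->> freqs
         (filter (fn [[_ cnt]] (= cnt top)))
         (map (comp double key))
         sort)))

(modes-sketch [5 5 1 1 2 3 4 4])
;;=> (1.0 4.0 5.0)
(modes-sketch [1 2 3 -1 -1 2 -1 11 111])
;;=> (-1.0)
(first (modes-sketch [5 1 2 3 4]))
;;=> 1.0
```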

Examples

Example

(modes [1 2 3 -1 -1 2 -1 11 111])
;;=> (-1.0)

Returns all tied values, in increasing order, when several values share the highest count.

(modes [5 5 1 1 2 3 4 4])
;;=> (1.0 4.0 5.0)

outliers

(outliers vs)
(outliers vs estimation-strategy)
(outliers vs q1 q3)

Find outliers, defined as values outside the outer fences.

Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).

  • LOF (Lower Outer Fence) equals (- Q1 (* 3.0 IQR)).
  • UOF (Upper Outer Fence) equals (+ Q3 (* 3.0 IQR)).

Returns a sequence.

The optional estimation-strategy argument changes the estimation type used for the quantile calculations. See estimation-strategies-list.
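The fence rule can be sketched as a filter in plain Clojure. This is an illustration only: outliers-sketch is a hypothetical name, the quartiles are passed in explicitly, and returning the result sorted is an assumption based on the ordered output in the examples for this entry.

```clojure
(defn outliers-sketch
  "Values of vs lying outside the outer fences, as sorted doubles."
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lof (- q1 (* 3.0 iqr))   ; lower outer fence
        uof (+ q3 (* 3.0 iqr))]  ; upper outer fence
    (->> vs
         (filter #(or (< % lof) (> % uof)))
         (map double)
         sort)))

;; Q1 = -1.0 and Q3 = 7.0 give LOF = -25.0 and UOF = 31.0
(outliers-sketch [1 2 3 -1 -1 2 -1 11 111] -1.0 7.0)
;;=> (111.0)
```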

Examples

Outliers

(outliers [1 2 3 -1 -1 2 -1 11 111])
;;=> (111.0)

Gaussian distribution outliers

(outliers (repeatedly 3000000 r/grand))
;;=> (-4.864618557049428
;;=>  -4.780633031723624
;;=>  -4.7376202235271165
;;=>  4.728896589466469
;;=>  4.744055356963435
;;=>  4.747383207352102
;;=>  4.769288560936935
;;=>  4.909931823383421
;;=>  5.043073126955698)

pearson-correlation

(pearson-correlation vs1 vs2)

Pearson’s correlation of two sequences.

Examples

Pearson’s correlation of uniform and gaussian distribution samples.

(pearson-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                     (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.007076444050676723

percentile

(percentile vs p)
(percentile vs p estimation-strategy)

Calculate percentile of a vs.

Percentile p is from range 0-100.

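The available interpolation strategies correspond to the estimation types of the Percentile class in Apache Commons Math.

Optionally, provide estimation-strategy to change the interpolation method used to select values. The default is :legacy. See estimation-strategies-list.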

See also quantile.
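As a worked illustration of the default strategy, the sketch below follows the rule Apache Commons Math describes for its legacy estimation type: compute the position p(n+1)/100 in the sorted data and interpolate linearly between the neighbouring values. It is a reading of that rule, not the library's code, and legacy-percentile-sketch is a hypothetical name. It reproduces the :legacy results shown in the examples for this entry.

```clojure
(defn legacy-percentile-sketch
  "Percentile p (0-100] of vs using position p(n+1)/100 in the sorted
  data and linear interpolation between neighbours."
  [vs p]
  (let [s    (vec (sort vs))
        n    (count s)
        pos  (/ (* p (inc n)) 100.0)
        fpos (Math/floor pos)
        d    (- pos fpos)          ; fractional part of the position
        i    (long fpos)]
    (cond
      (< pos 1.0) (double (first s))   ; below the first element
      (>= pos n)  (double (peek s))    ; at or beyond the last element
      :else (let [lower (s (dec i))    ; positions are 1-based
                  upper (s i)]
              (+ lower (* d (- upper lower)))))))

(legacy-percentile-sketch [1 2 3 -1 -1 2 -1 11 111] 75.0)
;;=> 7.0
(legacy-percentile-sketch [1 2 3 -1 -1 2 -1 11 111] 85.0)
;;=> 61.0
```

With nine samples, the 85th percentile sits at position 8.5, halfway between the 8th and 9th sorted values (11 and 111), which gives 61.0.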

Examples

Percentile 25%

(percentile [1 2 3 -1 -1 2 -1 11 111] 25.0)
;;=> -1.0

Percentile 50% (median)

(percentile [1 2 3 -1 -1 2 -1 11 111] 50.0)
;;=> 2.0

Percentile 75%

(percentile [1 2 3 -1 -1 2 -1 11 111] 75.0)
;;=> 7.0

Percentile 90%

(percentile [1 2 3 -1 -1 2 -1 11 111] 90.0)
;;=> 111.0

Various estimation strategies.

(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :legacy)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r1)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r2)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r3)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r4)
;;=> 8.199999999999996
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r5)
;;=> 25.999999999999858
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r6)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r7)
;;=> 9.399999999999999
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r8)
;;=> 37.66666666666675
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r9)
;;=> 34.75000000000007

population-stddev

(population-stddev vs)
(population-stddev vs u)

Calculate population standard deviation of vs.

See stddev.

Examples

Population standard deviation.

(population-stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 34.4333315406403

population-variance

(population-variance vs)
(population-variance vs u)

Calculate population variance of vs.

See variance.

Examples

Population variance

(population-variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1185.6543209876543

quantile

(quantile vs p)
(quantile vs p estimation-strategy)

Calculate quantile of a vs.

Quantile p is from range 0.0-1.0.

The available interpolation strategies correspond to the estimation types of the Percentile class in Apache Commons Math.

Optionally, provide estimation-strategy to change the interpolation method used to select values. The default is :legacy. See estimation-strategies-list.

See also percentile.

Examples

Quantile 0.25

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.25)
;;=> -1.0

Quantile 0.5 (median)

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.5)
;;=> 2.0

Quantile 0.75

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.75)
;;=> 7.0

Quantile 0.9

(quantile [1 2 3 -1 -1 2 -1 11 111] 0.9)
;;=> 111.0

Various estimation strategies.

(quantile [1 11 111 1111] 0.7 :legacy)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r1)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r2)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r3)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r4)
;;=> 90.99999999999999
(quantile [1 11 111 1111] 0.7 :r5)
;;=> 410.99999999999983
(quantile [1 11 111 1111] 0.7 :r6)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r7)
;;=> 210.99999999999966
(quantile [1 11 111 1111] 0.7 :r8)
;;=> 477.66666666666623
(quantile [1 11 111 1111] 0.7 :r9)
;;=> 460.99999999999966

second-moment

(second-moment vs)

Calculate second moment from sequence.

It is the sum of squared deviations from the sample mean.
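The sketch below spells out that sum and shows how it relates to the two variance entries: dividing by n - 1 gives the sample variance and dividing by n gives the population variance. The name second-moment-sketch is hypothetical, and this is not the library's implementation.

```clojure
(defn second-moment-sketch
  "Sum of squared deviations from the mean of vs."
  [vs]
  (let [n    (count vs)
        mean (/ (reduce + vs) (double n))]
    (reduce + (map #(let [d (- % mean)] (* d d)) vs))))

(let [vs [1 2 3 -1 -1 2 -1 11 111]
      n  (count vs)
      m2 (second-moment-sketch vs)]
  {:second-moment       m2             ; about 10670.89
   :variance            (/ m2 (dec n)) ; about 1333.86
   :population-variance (/ m2 n)})     ; about 1185.65
```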

Examples

Second Moment

(second-moment [1 2 3 -1 -1 2 -1 11 111])
;;=> 10670.888888888889

skewness

(skewness vs)

Calculate skewness from sequence.

Examples

Skewness

(skewness [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.94268445417954

spearman-correlation

(spearman-correlation vs1 vs2)

Spearman’s correlation of two sequences.

Examples

Spearman’s correlation of uniform and gaussian distribution samples.

(spearman-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
                      (repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 6.338276569602102E-4

standardize

(standardize vs)

Normalize samples to have mean = 0 and stddev = 1.
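In other words, each sample is shifted by the mean and divided by the standard deviation. A minimal sketch follows, assuming the sample standard deviation (divisor n - 1), which is consistent with the output in the example for this entry; standardize-sketch is a hypothetical name.

```clojure
(defn standardize-sketch
  "Maps each x in vs to (x - mean) / sd, where sd is the sample
  standard deviation (divisor n - 1)."
  [vs]
  (let [n    (count vs)
        mean (/ (reduce + vs) (double n))
        ss   (reduce + (map #(let [d (- % mean)] (* d d)) vs))
        sd   (Math/sqrt (/ ss (dec n)))]
    (map #(/ (- % mean) sd) vs)))

(standardize-sketch [1 2 3 -1 -1 2 -1 11 111])
;; first value is about -0.359, last value is about 2.653
```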

Examples

Standardize

(standardize [1 2 3 -1 -1 2 -1 11 111])
;;=> (-0.3589915220998317
;;=>  -0.33161081278713267
;;=>  -0.30423010347443363
;;=>  -0.4137529407252298
;;=>  -0.4137529407252298
;;=>  -0.33161081278713267
;;=>  -0.4137529407252298
;;=>  -0.08518442897284138
;;=>  2.652886502297062)

stats-map

(stats-map vs)
(stats-map vs estimation-strategy)

Calculate several statistics of vs and return as map.

The optional estimation-strategy argument changes the estimation type used for the quantile calculations. See estimation-strategies-list.

Examples

Stats

(stats-map [1 2 3 -1 -1 2 -1 11 111])
;;=> {:IQR 8.0,
;;=>  :Kurtosis 8.846742084858873,
;;=>  :LAV 0.0,
;;=>  :LIF -13.0,
;;=>  :LOF -25.0,
;;=>  :MAD 3.0,
;;=>  :Max 111.0,
;;=>  :Mean 14.11111111111111,
;;=>  :Median 2.0,
;;=>  :Min -1.0,
;;=>  :Mode 3.0,
;;=>  :Outliers (109.0),
;;=>  :Q1 -1.0,
;;=>  :Q3 7.0,
;;=>  :SD 34.4333315406403,
;;=>  :SEM 11.477777180213435,
;;=>  :SecMoment 10142.0,
;;=>  :Size 9,
;;=>  :Skewness 2.9666775258488958,
;;=>  :Total 127.0,
;;=>  :UAV 9.0,
;;=>  :UIF 19.0,
;;=>  :UOF 31.0}

stddev

(stddev vs)
(stddev vs u)

Calculate standard deviation of vs.

See population-stddev.

Examples

Standard deviation.

(stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 36.522063346847084

sum

(sum vs)

Sum of all vs values.

Examples

Sum

(sum [1 2 3 -1 -1 2 -1 11 111])
;;=> 127.0

variance

(variance vs)
(variance vs u)

Calculate variance of vs.

See population-variance.

Examples

Variance.

(variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1333.861111111111