fastmath.stats
Statistics functions.
- Descriptive statistics for a sequence.
- Correlation / covariance of two sequences.
- Outliers
All functions are backed by Apache Commons Math or SMILE libraries. All work with Clojure sequences.
Descriptive statistics
The all-in-one function stats-map returns a map with the following keys:
- :Size - size of the samples, (count ...)
- :Min - minimum value
- :Max - maximum value
- :Mean - mean/average
- :Median - median, see also: median-3
- :Mode - mode, see also: modes
- :Q1 - first quartile, use: percentile, quantile
- :Q3 - third quartile, use: percentile, quantile
- :Total - sum of all samples
- :SD - standard deviation of the population, use: population-stddev (for the corrected sample standard deviation see stddev)
- :MAD - median-absolute-deviation
- :SEM - standard error of mean
- :LAV - lower adjacent value, use: adjacent-values
- :UAV - upper adjacent value, use: adjacent-values
- :IQR - interquartile range, (- q3 q1)
- :LOF - lower outer fence, (- q1 (* 3.0 iqr))
- :UOF - upper outer fence, (+ q3 (* 3.0 iqr))
- :LIF - lower inner fence, (- q1 (* 1.5 iqr))
- :UIF - upper inner fence, (+ q3 (* 1.5 iqr))
- :Outliers - list of outliers, samples which are outside outer fences
- :Kurtosis - kurtosis
- :Skewness - skewness
- :SecMoment - second central moment, use: second-moment
Note: percentile and quantile support 10 different interpolation strategies. See docs.
Correlation / Covariance / Divergence
- covariance
- correlation
- pearson-correlation
- spearman-correlation
- kendall-correlation
- kullback-leibler-divergence
- jensen-shannon-divergence
Other
Normalize samples to have mean = 0 and standard deviation = 1 with standardize.
Use histogram to count samples in evenly spaced ranges.
Categories
- Correlation: correlation covariance jensen-shannon-divergence kendall-correlation kullback-leibler-divergence pearson-correlation spearman-correlation
- Descriptive statistics: adjacent-values estimate-bins estimation-strategies-list extent histogram iqr kernel-density kurtosis maximum mean median median-3 median-absolute-deviation minimum mode modes outliers percentile population-stddev population-variance quantile second-moment skewness stats-map stddev sum variance
Other vars: standardize
adjacent-values
(adjacent-values vs)
(adjacent-values vs estimation-strategy)
(adjacent-values vs q1 q3)
Lower and upper adjacent values (LAV and UAV).
Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).
- LAV is the smallest value which is greater than or equal to the LIF = (- Q1 (* 1.5 IQR)).
- UAV is the largest value which is lower than or equal to the UIF = (+ Q3 (* 1.5 IQR)).
The optional estimation-strategy argument changes the estimation type used for quantile calculations. See estimation-strategies-list.
Examples
[LAV, UAV]
(adjacent-values [1 2 3 -1 -1 2 -1 11 111])
;;=> [-1.0 11.0]
Gaussian distribution [LAV, UAV]
(adjacent-values (repeatedly 1000000 r/grand))
;;=> [-2.7000857687493234 2.6996610841795916]
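The fence definitions above can be sketched in plain Clojure. This is a minimal illustration and not the library's implementation: adjacent-values-sketch is a hypothetical name, and the quartiles are passed in explicitly (mirroring the three-argument arity) so that no particular estimation strategy has to be assumed.

```clojure
;; Hypothetical helper, not part of fastmath.stats.
;; q1 and q3 are supplied by the caller.
(defn adjacent-values-sketch
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lif (- q1 (* 1.5 iqr))              ; lower inner fence
        uif (+ q3 (* 1.5 iqr))              ; upper inner fence
        inside (filter #(<= lif % uif) vs)] ; samples within the inner fences
    [(apply min inside) (apply max inside)]))

;; Q1 = -1.0 and Q3 = 7.0 are the 25% and 75% percentiles shown
;; in the percentile examples for the same data.
(adjacent-values-sketch [1 2 3 -1 -1 2 -1 11 111] -1.0 7.0)
;;=> [-1 11]
```

The sketch returns the original elements unchanged, whereas the library output above is coerced to doubles.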
correlation
(correlation vs1 vs2)
Correlation of two sequences.
Examples
Correlation of uniform and gaussian distribution samples.
(correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.00678949591847233
covariance
(covariance vs1 vs2)
Covariance of two sequences.
Examples
Covariance of uniform and gaussian distribution samples.
(covariance (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -0.0019254850635683056
estimate-bins
(estimate-bins vs)
(estimate-bins vs method)
Estimate number of bins for histogram.
Possible methods are: :sqrt, :sturges, :rice, :doane, :scott and :freedman-diaconis (default).
Examples
Estimate number of bins for various methods. vs contains 1000 random samples from Log-Normal distribution.
(estimate-bins vs :sqrt)
;;=> 31
(estimate-bins vs :sturges)
;;=> 11
(estimate-bins vs :rice)
;;=> 20
(estimate-bins vs :doane)
;;=> 17
(estimate-bins vs :scott)
;;=> 53
(estimate-bins vs :freedman-diaconis)
;;=> 170
estimation-strategies-list
Examples
List of estimation strategies for percentile
(sort (keys estimation-strategies-list))
;;=> (:legacy :r1 :r2 :r3 :r4 :r5 :r6 :r7 :r8 :r9)
extent
(extent vs)
Return extent (min, max) values from sequence
Examples
min/max from gaussian distribution
(extent (repeatedly 100000 r/grand))
;;=> [-4.284832010490651 4.5215900810413405]
histogram
(histogram vs)
(histogram vs bins-or-estimate-method)
(histogram vs bins [mn mx])
Calculate histogram.
Returns map with keys:
- :size - number of bins
- :step - distance between bins
- :bins - list of triples of range lower value, number of hits and ratio of used samples
- :min - min value
- :max - max value
- :samples - number of used samples
For estimation methods check estimate-bins.
Examples
3 bins from uniform distribution.
(histogram (repeatedly 1000 rand) 3)
;;=> {:bins ([0.0025107458839882524 342 0.342]
;;=> [0.33485978353276935 333 0.333]
;;=> [0.6672088211815504 325 0.325]),
;;=> :max 0.9995578588303315,
;;=> :min 0.0025107458839882524,
;;=> :samples 1000,
;;=> :size 3,
;;=> :step 0.3323490376487811}
3 bins from uniform distribution for given range.
(histogram (repeatedly 10000 rand) 3 [0.1 0.5])
;;=> {:bins ([0.1 1358 0.34086345381526106]
;;=> [0.23333333333333334 1306 0.3278112449799197]
;;=> [0.3666666666666667 1320 0.3313253012048193]),
;;=> :max 0.5,
;;=> :min 0.1,
;;=> :samples 3984,
;;=> :size 3,
;;=> :step 0.13333333333333333}
5 bins from normal distribution.
(histogram (repeatedly 10000 r/grand) 5)
;;=> {:bins ([-3.8255442971705595 103 0.0103]
;;=> [-2.3194795153248613 1973 0.1973]
;;=> [-0.8134147334791635 5453 0.5453]
;;=> [0.6926500483665348 2325 0.2325]
;;=> [2.1987148302122326 146 0.0146]),
;;=> :max 3.7047796120579304,
;;=> :min -3.8255442971705595,
;;=> :samples 10000,
;;=> :size 5,
;;=> :step 1.506064781845698}
Estimate number of bins
(:size (histogram (repeatedly 10000 r/grand)))
;;=> 60
Estimate number of bins, Rice rule
(:size (histogram (repeatedly 10000 r/grand) :rice))
;;=> 44
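To make the shape of the returned map concrete, here is a rough sketch of evenly spaced binning in plain Clojure. histogram-sketch is a hypothetical name; placing the maximum in the last bin and requiring non-constant input are assumptions of this sketch, and the library may treat edges differently.

```clojure
;; Hypothetical helper, not part of fastmath.stats.
;; Assumes vs is non-empty and not constant (so step is non-zero).
(defn histogram-sketch
  [vs bins]
  (let [mn (apply min vs)
        mx (apply max vs)
        step (/ (- mx mn) (double bins))
        n (count vs)
        ;; index of the bin a value falls into; the maximum is
        ;; clamped into the last bin
        bin-of (fn [v] (min (dec bins) (int (/ (- v mn) step))))
        counts (frequencies (map bin-of vs))]
    {:size bins
     :step step
     :min mn
     :max mx
     :samples n
     ;; triples: lower edge of the bin, hits, ratio of used samples
     :bins (for [i (range bins)
                 :let [c (get counts i 0)]]
             [(+ mn (* i step)) c (/ c (double n))])}))

(:bins (histogram-sketch (range 10) 2))
;;=> ([0.0 5 0.5] [4.5 5 0.5])
```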
iqr
(iqr vs)
(iqr vs estimation-strategy)
Interquartile range.
Examples
IQR
(iqr (repeatedly 100000 r/grand))
;;=> 1.3507989631201418
jensen-shannon-divergence
(jensen-shannon-divergence vs1 vs2)
Jensen-Shannon divergence of two sequences.
Examples
Jensen-Shannon divergence
(jensen-shannon-divergence (repeatedly 100 (fn* [] (r/irand 100)))
(repeatedly 100 (fn* [] (r/irand 100))))
;;=> 507.81767699492116
kendall-correlation
(kendall-correlation vs1 vs2)
Kendall’s correlation of two sequences.
Examples
Kendall’s correlation of uniform and gaussian distribution samples.
(kendall-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> -7.405886058860589E-4
kernel-density
(kernel-density vs h)
(kernel-density vs)
Creates kernel density function for given series vs and optional bandwidth h.
Examples
Usage
(let [kd (kernel-density [0 10 10 10 10 10 10 10 10 0 0 0 0 1 1 1] 1)]
(map (comp m/approx kd) (range -5 15)))
;;=> (0.0
;;=> 0.0
;;=> 0.0
;;=> 0.02
;;=> 0.09
;;=> 0.17
;;=> 0.15
;;=> 0.06
;;=> 0.01
;;=> 0.0
;;=> 0.0
;;=> 0.0
;;=> 0.0
;;=> 0.03
;;=> 0.12
;;=> 0.2
;;=> 0.12
;;=> 0.03
;;=> 0.0
;;=> 0.0)
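For intuition, a kernel density estimate can be sketched as an average of kernels centred on each sample. The Gaussian kernel used here is an assumption, as the text above does not name the kernel; it does, however, reproduce the rounded values 0.17 at x = 0 and 0.2 at x = 10 from the example. kernel-density-sketch is a hypothetical name.

```clojure
;; Hypothetical helper, not part of fastmath.stats.
;; Assumes a Gaussian kernel with bandwidth h.
(defn kernel-density-sketch
  [vs h]
  (let [n (count vs)
        h (double h)
        norm (/ 1.0 (* n h (Math/sqrt (* 2.0 Math/PI))))]
    ;; returns a function estimating the density at x
    (fn [x]
      (* norm
         (reduce + (map (fn [xi]
                          (let [u (/ (- x xi) h)]
                            (Math/exp (* -0.5 u u))))
                        vs))))))

(def kd-sketch
  (kernel-density-sketch [0 10 10 10 10 10 10 10 10 0 0 0 0 1 1 1] 1))

(kd-sketch 10) ; roughly 0.199
(kd-sketch 0)  ; roughly 0.170
```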
kullback-leibler-divergence
(kullback-leibler-divergence vs1 vs2)
Kullback-Leibler divergence of two sequences.
Examples
Kullback-Leibler divergence.
(kullback-leibler-divergence (repeatedly 100 (fn* [] (r/irand 100)))
(repeatedly 100 (fn* [] (r/irand 100))))
;;=> 2199.662874529706
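As an illustration of what the two divergences measure, the textbook definitions can be sketched as below. The function names are hypothetical, and skipping pairs where either entry is zero is an assumption of this sketch; the backing library's handling of zeros and of unnormalized input is not described in the text above.

```clojure
;; Hypothetical helpers, not part of fastmath.stats.
;; KL(p || q) = sum of p_i * ln(p_i / q_i);
;; pairs containing a zero are skipped (an assumption).
(defn kl-divergence-sketch
  [ps qs]
  (reduce + (map (fn [p q]
                   (if (or (zero? p) (zero? q))
                     0.0
                     (* p (Math/log (/ (double p) q)))))
                 ps qs)))

;; JS(p, q) = (KL(p || m) + KL(q || m)) / 2, with m the pointwise mean.
(defn js-divergence-sketch
  [ps qs]
  (let [ms (map #(/ (+ %1 %2) 2.0) ps qs)]
    (/ (+ (kl-divergence-sketch ps ms)
          (kl-divergence-sketch qs ms))
       2.0)))

(kl-divergence-sketch [0.5 0.5] [0.5 0.5])   ;;=> 0.0
(kl-divergence-sketch [0.5 0.5] [0.25 0.75]) ; roughly 0.1438
```

Unlike Kullback-Leibler, the Jensen-Shannon form is symmetric in its two arguments.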
kurtosis
(kurtosis vs)
Calculate kurtosis from sequence.
Examples
Kurtosis
(kurtosis [1 2 3 -1 -1 2 -1 11 111])
;;=> 8.732515263272099
maximum
(maximum vs)
Maximum value from sequence.
Examples
Maximum value
(maximum [1 2 3 -1 -1 2 -1 11 111])
;;=> 111.0
mean
(mean vs)
Calculate mean of vs
Examples
Mean (average value)
(mean [1 2 3 -1 -1 2 -1 11 111])
;;=> 14.111111111111109
median
(median vs)
Calculate median of vs. See median-3.
Examples
Median (percentile 50%).
(median [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.0
For three elements use faster median-3.
(median [7 1 4])
;;=> 4.0
median-3
(median-3 a b c)
Median of three values. See median.
Examples
Median of [7 1 4]
(median-3 7 1 4)
;;=> 4.0
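A median of three values needs no sorting, which is presumably why a dedicated function exists. One way to express it, as a sketch with the hypothetical name median-3-sketch:

```clojure
;; Hypothetical helper, not part of fastmath.stats.
;; The median is the larger of (min a b) and (min (max a b) c).
(defn median-3-sketch
  [a b c]
  (max (min a b) (min (max a b) c)))

(median-3-sketch 7 1 4)
;;=> 4
```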
median-absolute-deviation
(median-absolute-deviation vs)
Calculate the median absolute deviation (MAD).
Examples
MAD
(median-absolute-deviation [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0
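MAD is the median of the absolute deviations from the median. The sketch below uses hypothetical helper names and a simple median that averages the two middle values for even-sized input, which is an assumption and may differ from the library's default estimation strategy.

```clojure
;; Hypothetical helpers, not part of fastmath.stats.
(defn median-sketch
  [vs]
  (let [s (vec (sort vs))
        n (count s)
        h (quot n 2)]
    (if (odd? n)
      (double (s h))
      (/ (+ (s (dec h)) (s h)) 2.0))))

(defn mad-sketch
  [vs]
  (let [m (median-sketch vs)]
    ;; median of |x - median|
    (median-sketch (map #(Math/abs (- % m)) vs))))

(mad-sketch [1 2 3 -1 -1 2 -1 11 111])
;;=> 3.0
```

For this data the median is 2.0 and the sorted absolute deviations are 0, 0, 1, 1, 3, 3, 3, 9, 109, whose middle element is 3.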
minimum
(minimum vs)
Minimum value from sequence.
Examples
Minimum value
(minimum [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0
mode
(mode vs)
Find the value that appears most often in a dataset vs.
See also modes.
Examples
Example
(mode [1 2 3 -1 -1 2 -1 11 111])
;;=> -1.0
Returns lowest value when every element appears equally.
(mode [5 1 2 3 4])
;;=> 1.0
modes
(modes vs)
Find the values that appear most often in a dataset vs.
Returns a sequence of all the most frequent values, in increasing order.
See also mode.
Examples
Example
(modes [1 2 3 -1 -1 2 -1 11 111])
;;=> (-1.0)
Returns every most frequent value, in increasing order, when several values are tied.
(modes [5 5 1 1 2 3 4 4])
;;=> (1.0 4.0 5.0)
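The behaviour shown in the mode and modes examples can be approximated with clojure.core/frequencies. The names below are hypothetical, and the sketch returns the original elements rather than doubles.

```clojure
;; Hypothetical helpers, not part of fastmath.stats.
(defn modes-sketch
  [vs]
  (let [freqs (frequencies vs)
        top (apply max (vals freqs))]
    ;; all values sharing the highest count, in increasing order
    (sort (for [[v c] freqs :when (= c top)] v))))

(defn mode-sketch
  [vs]
  ;; the lowest of the most frequent values
  (first (modes-sketch vs)))

(modes-sketch [5 5 1 1 2 3 4 4])
;;=> (1 4 5)
(mode-sketch [5 1 2 3 4])
;;=> 1
```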
outliers
(outliers vs)
(outliers vs estimation-strategy)
(outliers vs q1 q3)
Find outliers defined as values outside outer fences.
Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).
- LOF (Lower Outer Fence) equals (- Q1 (* 3.0 IQR)).
- UOF (Upper Outer Fence) equals (+ Q3 (* 3.0 IQR)).
Returns a sequence.
The optional estimation-strategy argument changes the estimation type used for quantile calculations. See estimation-strategies-list.
Examples
Outliers
(outliers [1 2 3 -1 -1 2 -1 11 111])
;;=> (111.0)
Gaussian distribution outliers
(outliers (repeatedly 3000000 r/grand))
;;=> (-4.864618557049428
;;=> -4.780633031723624
;;=> -4.7376202235271165
;;=> 4.728896589466469
;;=> 4.744055356963435
;;=> 4.747383207352102
;;=> 4.769288560936935
;;=> 4.909931823383421
;;=> 5.043073126955698)
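The outer-fence rule above amounts to a simple filter. In this sketch outliers-sketch is a hypothetical name, the quartiles are passed in as with the three-argument arity, and sorting the result is an assumption based on the ordered output shown in the Gaussian example.

```clojure
;; Hypothetical helper, not part of fastmath.stats.
(defn outliers-sketch
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lof (- q1 (* 3.0 iqr))   ; lower outer fence
        uof (+ q3 (* 3.0 iqr))]  ; upper outer fence
    (sort (filter #(or (< % lof) (> % uof)) vs))))

;; With Q1 = -1.0 and Q3 = 7.0 the fences are -25.0 and 31.0.
(outliers-sketch [1 2 3 -1 -1 2 -1 11 111] -1.0 7.0)
;;=> (111)
```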
pearson-correlation
(pearson-correlation vs1 vs2)
Pearson’s correlation of two sequences.
Examples
Pearson’s correlation of uniform and gaussian distribution samples.
(pearson-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 0.007076444050676723
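As a reminder of how covariance and Pearson's correlation relate, here is a small sketch in plain Clojure. The names are hypothetical, and dividing by n - 1 (the bias-corrected form) is an assumption about the convention, which the text above does not specify; the correlation itself is unaffected by that choice.

```clojure
;; Hypothetical helpers, not part of fastmath.stats.
(defn mean-sketch
  [vs]
  (/ (reduce + vs) (double (count vs))))

;; Sample covariance; the n - 1 divisor is an assumption.
(defn covariance-sketch
  [xs ys]
  (let [mx (mean-sketch xs)
        my (mean-sketch ys)]
    (/ (reduce + (map #(* (- %1 mx) (- %2 my)) xs ys))
       (dec (count xs)))))

;; Pearson's r: covariance scaled by both standard deviations.
(defn pearson-sketch
  [xs ys]
  (/ (covariance-sketch xs ys)
     (Math/sqrt (* (covariance-sketch xs xs)
                   (covariance-sketch ys ys)))))

(covariance-sketch [1 2 3 4] [2 4 6 8]) ; roughly 3.33
(pearson-sketch [1 2 3 4] [2 4 6 8])    ; roughly 1.0, perfectly linear
```

Independent samples, such as those in the examples above, give values close to zero.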
percentile
(percentile vs p)
(percentile vs p estimation-strategy)
Examples
Percentile 25%
(percentile [1 2 3 -1 -1 2 -1 11 111] 25.0)
;;=> -1.0
Percentile 50% (median)
(percentile [1 2 3 -1 -1 2 -1 11 111] 50.0)
;;=> 2.0
Percentile 75%
(percentile [1 2 3 -1 -1 2 -1 11 111] 75.0)
;;=> 7.0
Percentile 90%
(percentile [1 2 3 -1 -1 2 -1 11 111] 90.0)
;;=> 111.0
Various estimation strategies.
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :legacy)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r1)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r2)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r3)
;;=> 11.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r4)
;;=> 8.199999999999996
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r5)
;;=> 25.999999999999858
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r6)
;;=> 61.0
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r7)
;;=> 9.399999999999999
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r8)
;;=> 37.66666666666675
(percentile [1 2 3 -1 -1 2 -1 11 111] 85.0 :r9)
;;=> 34.75000000000007
population-stddev
(population-stddev vs)
(population-stddev vs u)
Calculate population standard deviation of vs.
See stddev.
Examples
Population standard deviation.
(population-stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 34.4333315406403
population-variance
(population-variance vs)
(population-variance vs u)
Calculate population variance of vs.
See variance.
Examples
Population variance
(population-variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1185.6543209876543
quantile
(quantile vs p)
(quantile vs p estimation-strategy)
Calculate quantile of vs.
The quantile p is in the range 0.0-1.0.
See docs for interpolation strategy.
Optionally you can provide estimation-strategy to change the interpolation method for selecting values. Default is :legacy.
See also percentile.
Examples
Quantile 0.25
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.25)
;;=> -1.0
Quantile 0.5 (median)
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.5)
;;=> 2.0
Quantile 0.75
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.75)
;;=> 7.0
Quantile 0.9
(quantile [1 2 3 -1 -1 2 -1 11 111] 0.9)
;;=> 111.0
Various estimation strategies.
(quantile [1 11 111 1111] 0.7 :legacy)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r1)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r2)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r3)
;;=> 111.0
(quantile [1 11 111 1111] 0.7 :r4)
;;=> 90.99999999999999
(quantile [1 11 111 1111] 0.7 :r5)
;;=> 410.99999999999983
(quantile [1 11 111 1111] 0.7 :r6)
;;=> 611.0
(quantile [1 11 111 1111] 0.7 :r7)
;;=> 210.99999999999966
(quantile [1 11 111 1111] 0.7 :r8)
;;=> 477.66666666666623
(quantile [1 11 111 1111] 0.7 :r9)
;;=> 460.99999999999966
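To show how a quantile in 0.0-1.0 relates to a percentile in 0-100, here is a sketch of the simplest estimation rule, the nearest-rank (inverse empirical CDF) definition. The names are hypothetical; for the inputs above it agrees with the :r1 results, but it is not claimed to be the library's implementation.

```clojure
;; Hypothetical helpers, not part of fastmath.stats.
;; Nearest rank: take the ceil(n * p)-th smallest sample.
(defn quantile-r1-sketch
  [vs p]
  (let [s (vec (sort vs))
        n (count s)
        k (max 1 (long (Math/ceil (* n p))))]
    (s (dec k))))

;; A percentile is the same quantity with p scaled to 0-100.
(defn percentile-r1-sketch
  [vs p]
  (quantile-r1-sketch vs (/ p 100.0)))

(quantile-r1-sketch [1 11 111 1111] 0.7)
;;=> 111
(percentile-r1-sketch [1 2 3 -1 -1 2 -1 11 111] 85.0)
;;=> 11
```

The other strategies differ mainly in how they interpolate between neighbouring order statistics, which is why their results above fall between 11 and 111.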
second-moment
(second-moment vs)
Calculate second moment from sequence.
It’s the sum of squared deviations from the sample mean.
Examples
Second Moment
(second-moment [1 2 3 -1 -1 2 -1 11 111])
;;=> 10670.888888888889
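The definition above translates directly into code. second-moment-sketch is a hypothetical name; for the example data it gives roughly 10670.89, in line with the output shown.

```clojure
;; Hypothetical helper, not part of fastmath.stats.
(defn second-moment-sketch
  [vs]
  (let [mean (/ (reduce + vs) (double (count vs)))]
    ;; sum of squared deviations from the mean
    (reduce + (map #(let [d (- % mean)] (* d d)) vs))))

(second-moment-sketch [1 2 3 -1 -1 2 -1 11 111]) ; roughly 10670.89
```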
skewness
(skewness vs)
Calculate skewness from sequence.
Examples
Skewness
(skewness [1 2 3 -1 -1 2 -1 11 111])
;;=> 2.94268445417954
spearman-correlation
(spearman-correlation vs1 vs2)
Spearman’s correlation of two sequences.
Examples
Spearman’s correlation of uniform and gaussian distribution samples.
(spearman-correlation (repeatedly 100000 (partial r/grand 1.0 10.0))
(repeatedly 100000 (partial r/drand -10.0 -5.0)))
;;=> 6.338276569602102E-4
standardize
(standardize vs)
Normalize samples to have mean = 0 and stddev = 1.
Examples
Standardize
(standardize [1 2 3 -1 -1 2 -1 11 111])
;;=> (-0.3589915220998317
;;=> -0.33161081278713267
;;=> -0.30423010347443363
;;=> -0.4137529407252298
;;=> -0.4137529407252298
;;=> -0.33161081278713267
;;=> -0.4137529407252298
;;=> -0.08518442897284138
;;=> 2.652886502297062)
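Standardizing subtracts the mean and divides by the standard deviation. The output above is consistent with the sample (n - 1) standard deviation, so the sketch assumes that form; standardize-sketch is a hypothetical name.

```clojure
;; Hypothetical helper, not part of fastmath.stats.
;; Assumes the sample standard deviation (n - 1 divisor).
(defn standardize-sketch
  [vs]
  (let [n (count vs)
        mean (/ (reduce + vs) (double n))
        sd (Math/sqrt (/ (reduce + (map #(let [d (- % mean)] (* d d)) vs))
                         (dec n)))]
    (map #(/ (- % mean) sd) vs)))

(first (standardize-sketch [1 2 3 -1 -1 2 -1 11 111])) ; roughly -0.359
```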
stats-map
(stats-map vs)
(stats-map vs estimation-strategy)
Calculate several statistics of vs and return them as a map.
The optional estimation-strategy argument changes the estimation type used for quantile calculations. See estimation-strategies-list.
Examples
Stats
(stats-map [1 2 3 -1 -1 2 -1 11 111])
;;=> {:IQR 8.0,
;;=> :Kurtosis 8.846742084858873,
;;=> :LAV 0.0,
;;=> :LIF -13.0,
;;=> :LOF -25.0,
;;=> :MAD 3.0,
;;=> :Max 111.0,
;;=> :Mean 14.11111111111111,
;;=> :Median 2.0,
;;=> :Min -1.0,
;;=> :Mode 3.0,
;;=> :Outliers (109.0),
;;=> :Q1 -1.0,
;;=> :Q3 7.0,
;;=> :SD 34.4333315406403,
;;=> :SEM 11.477777180213435,
;;=> :SecMoment 10142.0,
;;=> :Size 9,
;;=> :Skewness 2.9666775258488958,
;;=> :Total 127.0,
;;=> :UAV 9.0,
;;=> :UIF 19.0,
;;=> :UOF 31.0}
stddev
(stddev vs)
(stddev vs u)
Calculate standard deviation of vs.
See population-stddev.
Examples
Standard deviation.
(stddev [1 2 3 -1 -1 2 -1 11 111])
;;=> 36.522063346847084
variance
(variance vs)
(variance vs u)
Calculate variance of vs.
See population-variance.
Examples
Variance.
(variance [1 2 3 -1 -1 2 -1 11 111])
;;=> 1333.861111111111
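The difference between variance and population-variance (and likewise stddev and population-stddev) comes down to the divisor. The sketch below uses hypothetical names and the usual n - 1 versus n convention, which matches the outputs shown for the example data.

```clojure
;; Hypothetical helpers, not part of fastmath.stats.
(defn sum-of-squares-sketch
  [vs]
  (let [mean (/ (reduce + vs) (double (count vs)))]
    (reduce + (map #(let [d (- % mean)] (* d d)) vs))))

;; Sample variance: divide by n - 1.
(defn variance-sketch
  [vs]
  (/ (sum-of-squares-sketch vs) (dec (count vs))))

;; Population variance: divide by n.
(defn population-variance-sketch
  [vs]
  (/ (sum-of-squares-sketch vs) (count vs)))

(def data [1 2 3 -1 -1 2 -1 11 111])

(variance-sketch data)                        ; roughly 1333.86
(population-variance-sketch data)             ; roughly 1185.65
(Math/sqrt (variance-sketch data))            ; roughly 36.52
(Math/sqrt (population-variance-sketch data)) ; roughly 34.43
```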