Overview

Dataset statistics

Number of variables: 6
Number of observations: 865
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 190.2 KiB
Average record size in memory: 225.2 B

Variable types

CAT: 3
NUM: 3

Reproduction

Analysis started: 2020-02-13 23:57:46.007107
Analysis finished: 2020-02-13 23:57:49.053188
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

Code has a high cardinality: 865 distinct values
Name has a high cardinality: 865 distinct values
Hex has a high cardinality: 765 distinct values
R has 81 (9.4%) zeros
G has 58 (6.7%) zeros
B has 80 (9.2%) zeros

Variables

Code
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE
Distinct count: 865
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 KiB

Common values

Value                Count  Frequency (%)
wild_watermelon      1      0.1%
ube                  1      0.1%
fuchsia              1      0.1%
camouflage_green     1      0.1%
violet_ryb           1      0.1%
fluorescent_yellow   1      0.1%
air_force_blue_usaf  1      0.1%
medium_aquamarine    1      0.1%
pink_pearl           1      0.1%
electric_cyan        1      0.1%
Other values (855)   855    98.8%

Length

Max length: 39
Mean length: 11.37572254
Min length: 3

Unicode category       Count  Frequency (%)
Lowercase_Letter       26     83.9%
Decimal_Number         4      12.9%
Connector_Punctuation  1      3.2%

Unicode script  Count  Frequency (%)
Latin           26     83.9%
Common          5      16.1%

Unicode block  Count  Frequency (%)
ASCII          31     100.0%

Name
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE
Distinct count: 865
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 KiB

Common values

Value               Count  Frequency (%)
Light Yellow        1      0.1%
Light Red Ochre     1      0.1%
Pastel Pink         1      0.1%
Resolution Blue     1      0.1%
Office Green        1      0.1%
Electric Lavender   1      0.1%
Dark Slate Blue     1      0.1%
Copper Rose         1      0.1%
School Bus Yellow   1      0.1%
Flavescent          1      0.1%
Other values (855)  855    98.8%

Length

Max length: 41
Mean length: 11.59190751
Min length: 3

Unicode category   Count  Frequency (%)
Lowercase_Letter   29     42.0%
Uppercase_Letter   26     37.7%
Other_Punctuation  5      7.2%
Decimal_Number     4      5.8%
Open_Punctuation   1      1.4%
Close_Punctuation  1      1.4%
Final_Punctuation  1      1.4%
Space_Separator    1      1.4%
Dash_Punctuation   1      1.4%

Unicode script  Count  Frequency (%)
Latin           55     79.7%
Common          14     20.3%

Unicode block  Count  Frequency (%)
ASCII          65     98.5%
Punctuation    1      1.5%

Hex
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 765
Unique (%): 88.4%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 KiB

Common values

Value               Count  Frequency (%)
#c19a6b             5      0.6%
#fada5e             4      0.5%
#967117             4      0.5%
#fad6a5             3      0.3%
#d2691e             3      0.3%
#a52a2a             3      0.3%
#008000             3      0.3%
#0ff                3      0.3%
#808080             3      0.3%
#0f0                3      0.3%
Other values (755)  831    96.1%

Length

Max length: 7
Mean length: 6.798843931
Min length: 4

Unicode category   Count  Frequency (%)
Decimal_Number     10     58.8%
Lowercase_Letter   6      35.3%
Other_Punctuation  1      5.9%

Unicode script  Count  Frequency (%)
Common          11     64.7%
Latin           6      35.3%

Unicode block  Count  Frequency (%)
ASCII          17     100.0%

R
Real number (ℝ≥0)

ZEROS
Distinct count: 221
Unique (%): 25.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 158.59884393063584
Minimum: 0
Maximum: 255
Zeros: 81
Zeros (%): 9.4%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 101
Median: 178
Q3: 236
95-th percentile: 255
Maximum: 255
Range: 255
Interquartile range (IQR): 135

Descriptive statistics

Standard deviation: 85.33843164
Coefficient of variation (CV): 0.5380772617
Kurtosis: -0.9264508707
Mean: 158.5988439
Median Absolute Deviation (MAD): 72.69125464
Skewness: -0.5936792074
Sum: 137188
Variance: 7282.647915
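
Several of the figures above are tied together by simple identities: the mean is the sum over the number of observations, the variance is the squared standard deviation, the CV is the standard deviation over the mean, and the range and IQR are differences of quantiles. A minimal sketch that recomputes them from the values reported for R; the variable names are illustrative and not part of the report:

```python
import math

# Values copied from the report for column R.
n_observations = 865
total = 137188
mean = 158.5988439
std = 85.33843164
q1, q3 = 101, 236
minimum, maximum = 0, 255

mean_from_sum = total / n_observations   # mean = sum / n
variance = std ** 2                      # variance = std^2
cv = std / mean                          # coefficient of variation
iqr = q3 - q1                            # interquartile range
value_range = maximum - minimum          # range

print(mean_from_sum, variance, cv, iqr, value_range)
```

The recomputed values should agree with the reported ones up to rounding of the inputs.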
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 100.5 203.5 205.5 214.5 249.5 254.5 255. ], "bayesian blocks" binning strategy used)

Common values

Value               Count  Frequency (%)
255                 110    12.7%
0                   81     9.4%
250                 15     1.7%
204                 13     1.5%
128                 11     1.3%
150                 11     1.3%
227                 10     1.2%
153                 10     1.2%
244                 10     1.2%
240                 9      1.0%
Other values (211)  585    67.6%

Minimum 5 values

Value  Count  Frequency (%)
0      81     9.4%
1      4      0.5%
2      1      0.1%
3      2      0.2%
5      1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
255    110    12.7%
254    7      0.8%
253    8      0.9%
252    6      0.7%
251    9      1.0%

G
Real number (ℝ≥0)

ZEROS
Distinct count: 234
Unique (%): 27.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 124.68323699421966
Minimum: 0
Maximum: 255
Zeros: 58
Zeros (%): 6.7%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 64
Median: 123
Q3: 190
95-th percentile: 250
Maximum: 255
Range: 255
Interquartile range (IQR): 126

Descriptive statistics

Standard deviation: 76.27022506
Coefficient of variation (CV): 0.6117119422
Kurtosis: -1.097846721
Mean: 124.683237
Median Absolute Deviation (MAD): 64.8274944
Skewness: 0.0522334723
Sum: 107851
Variance: 5817.14723
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 27.5 126.5 132.5 254.5 255. ], "bayesian blocks" binning strategy used)

Common values

Value               Count  Frequency (%)
0                   58     6.7%
255                 35     4.0%
128                 13     1.5%
105                 12     1.4%
51                  11     1.3%
204                 11     1.3%
66                  9      1.0%
102                 9      1.0%
218                 9      1.0%
160                 9      1.0%
Other values (224)  689    79.7%

Minimum 5 values

Value  Count  Frequency (%)
0      58     6.7%
1      2      0.2%
2      2      0.2%
3      2      0.2%
6      2      0.2%

Maximum 5 values

Value  Count  Frequency (%)
255    35     4.0%
254    3      0.3%
253    2      0.2%
252    2      0.2%
251    1      0.1%

B
Real number (ℝ≥0)

ZEROS
Distinct count: 230
Unique (%): 26.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 119.0878612716763
Minimum: 0
Maximum: 255
Zeros: 80
Zeros (%): 9.2%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 53
Median: 119
Q3: 186
95-th percentile: 253.6
Maximum: 255
Range: 255
Interquartile range (IQR): 133

Descriptive statistics

Standard deviation: 78.34386249
Coefficient of variation (CV): 0.6578660634
Kurtosis: -1.13796004
Mean: 119.0878613
Median Absolute Deviation (MAD): 67.11706773
Skewness: 0.1072876893
Sum: 103011
Variance: 6137.76079
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 1. 29.5 106.5 107.5 126.5 128.5 240.5 254.5 255. ], "bayesian blocks" binning strategy used)

Common values

Value               Count  Frequency (%)
0                   80     9.2%
255                 41     4.7%
107                 15     1.7%
128                 14     1.6%
204                 10     1.2%
120                 9      1.0%
94                  9      1.0%
51                  8      0.9%
33                  8      0.9%
59                  8      0.9%
Other values (220)  663    76.6%

Minimum 5 values

Value  Count  Frequency (%)
0      80     9.2%
2      3      0.3%
3      1      0.1%
5      2      0.2%
7      2      0.2%

Maximum 5 values

Value  Count  Frequency (%)
255    41     4.7%
254    3      0.3%
252    1      0.1%
251    1      0.1%
250    7      0.8%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the direction of the slope determines its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
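
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report generator uses; `pearson_r` is a hypothetical helper name. The 1/n factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance and variances; the common 1/n factor cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# A perfectly linear relationship gives r = 1 whatever the slope or offset.
print(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]))
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))
```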

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
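
A minimal sketch of that procedure: convert each variable to ranks, then apply the Pearson formula to the ranks. The helper names are hypothetical, and assigning tied values the average of their ranks is an assumption of this sketch (it is a common convention, but the report does not state which one it uses).

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# y = x**2 is nonlinear but monotonic on positive x, so rho is 1.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```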

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
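
The pair-counting definition above can be sketched directly. This is the variant usually called tau-a; `kendall_tau_a` is a hypothetical helper name, and tied pairs count as neither concordant nor discordant. Statistical libraries often apply a tie correction (tau-b), so their results can differ from this sketch when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    pairs = list(combinations(zip(xs, ys), 2))
    if not pairs:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)

# Two concordant pairs and one discordant pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The quadratic loop over all pairs is fine for a dataset of this size (865 rows); faster algorithms exist for larger inputs.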

Missing values

Sample

First rows

    Code                  Name                   Hex      R    G    B
0   air_force_blue_raf    Air Force Blue (Raf)   #5d8aa8  93   138  168
1   air_force_blue_usaf   Air Force Blue (Usaf)  #00308f  0    48   143
2   air_superiority_blue  Air Superiority Blue   #72a0c1  114  160  193
3   alabama_crimson       Alabama Crimson        #a32638  163  38   56
4   alice_blue            Alice Blue             #f0f8ff  240  248  255
5   alizarin_crimson      Alizarin Crimson       #e32636  227  38   54
6   alloy_orange          Alloy Orange           #c46210  196  98   16
7   almond                Almond                 #efdecd  239  222  205
8   amaranth              Amaranth               #e52b50  229  43   80
9   amber                 Amber                  #ffbf00  255  191  0

Last rows

     Code               Name               Hex      R    G    B
855  yale_blue          Yale Blue          #0f4d92  15   77   146
856  yellow             Yellow             #ff0     255  255  0
857  yellow_green       Yellow-Green       #9acd32  154  205  50
858  yellow_munsell     Yellow (Munsell)   #efcc00  239  204  0
859  yellow_ncs         Yellow (Ncs)       #ffd300  255  211  0
860  yellow_orange      Yellow Orange      #ffae42  255  174  66
861  yellow_process     Yellow (Process)   #ffef00  255  239  0
862  yellow_ryb         Yellow (Ryb)       #fefe33  254  254  51
863  zaffre             Zaffre             #0014a8  0    20   168
864  zinnwaldite_brown  Zinnwaldite Brown  #2c1608  44   22   8
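
In the sample rows, the R, G and B columns match the decimal values of the Hex column, including three-digit shorthand such as #ff0. A minimal sketch of that conversion, assuming the usual convention that each shorthand digit is doubled; `hex_to_rgb` is a hypothetical helper, not part of the dataset or the report:

```python
def hex_to_rgb(hex_code):
    """Convert '#rrggbb' or shorthand '#rgb' to an (R, G, B) tuple of ints."""
    digits = hex_code.lstrip("#")
    if len(digits) == 3:
        # Shorthand: '#ff0' expands to '#ffff00'.
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError(f"unsupported hex colour: {hex_code!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#5d8aa8"))  # air_force_blue_raf
print(hex_to_rgb("#ff0"))     # yellow
```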