Overview

Dataset statistics

Number of variables: 12
Number of observations: 74
Missing cells: 5
Missing cells (%): 0.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.6 KiB
Average record size in memory: 187.9 B
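
The missing-cell percentage can be cross-checked by hand from the counts in this report. The sketch below assumes the percentage is taken over all cells (variables times observations); the variable names are illustrative and not part of pandas-profiling.

```python
# Counts copied from the dataset statistics and the rep78 section of this report.
n_variables = 12
n_observations = 74
missing_cells = 5

# Assumption: "Missing cells (%)" is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_cells_pct = 100 * missing_cells / total_cells

# All five missing cells sit in rep78, so its per-column share is
# relative to the number of observations instead.
rep78_missing_pct = 100 * missing_cells / n_observations

print(f"{total_cells} cells, {missing_cells_pct:.1f}% missing overall")
print(f"rep78: {rep78_missing_pct:.1f}% missing")
```

Rounded to one decimal place, the two results agree with the 0.6% and 6.8% shown in the report.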

Variable types

NUM: 9
CAT: 3

Reproduction

Analysis started: 2020-02-13 13:04:46.750360
Analysis finished: 2020-02-13 13:04:59.921940
Version: pandas-profiling v2.4.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

make has a high cardinality: 74 distinct values (High cardinality)
length is highly correlated with weight (High correlation)
weight is highly correlated with length (High correlation)
rep78 has 5 (6.8%) missing values (Missing)

Variables

make
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE
Distinct count: 74
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB

Value              Count  Frequency (%)
Olds Toronado      1      1.4%
Volvo 260          1      1.4%
Buick Skylark      1      1.4%
Merc. Marquis      1      1.4%
VW Rabbit          1      1.4%
Linc. Continental  1      1.4%
Merc. Monarch      1      1.4%
Ford Fiesta        1      1.4%
Olds Omega         1      1.4%
Toyota Celica      1      1.4%
Other values (64)  64     86.5%

Length

Max length: 17
Mean length: 11.77027027
Min length: 6

Value              Count  Frequency (%)
Lowercase_Letter   25     42.4%
Uppercase_Letter   21     35.6%
Decimal_Number     10     16.9%
Other_Punctuation  1      1.7%
Dash_Punctuation   1      1.7%
Space_Separator    1      1.7%

Value   Count  Frequency (%)
Latin   46     78.0%
Common  13     22.0%

Value  Count  Frequency (%)
ASCII  59     100.0%

price
Real number (ℝ≥0)

UNIQUE
Distinct count: 74
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6165.256756756757
Minimum: 3291
Maximum: 15906
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 3291
5-th percentile: 3780.5
Q1: 4220.25
Median: 5006.5
Q3: 6332.25
95-th percentile: 13156.6
Maximum: 15906
Range: 12615
Interquartile range (IQR): 2112

Descriptive statistics

Standard deviation: 2949.495885
Coefficient of variation (CV): 0.4784060099
Kurtosis: 2.034047676
Mean: 6165.256757
Median Absolute Deviation (MAD): 2169.739226
Skewness: 1.687840988
Sum: 456229
Variance: 8699525.974
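
Several of the price statistics are simple functions of the others, which makes them easy to sanity-check. The following is a minimal sketch that recomputes them from the reported values; it assumes the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared), and the variable names are illustrative only.

```python
# Values copied from the price section of this report.
n_observations = 74
total = 456229                     # Sum
minimum, maximum = 3291, 15906
q1, q3 = 4220.25, 6332.25
std = 2949.495885                  # Standard deviation

mean = total / n_observations      # mean = sum / number of observations
value_range = maximum - minimum    # Range
iqr = q3 - q1                      # Interquartile range
cv = std / mean                    # Coefficient of variation
variance = std ** 2                # Variance

print(f"mean={mean:.6f}")
print(f"range={value_range}, IQR={iqr}")
print(f"CV={cv:.6f}, variance={variance:.1f}")
```

Each recomputed figure agrees with the corresponding reported value up to rounding of the inputs.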
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 3291. 6414. 15906.], "bayesian blocks" binning strategy used)
Value              Count  Frequency (%)
5886               1      1.4%
6303               1      1.4%
4296               1      1.4%
15906              1      1.4%
14500              1      1.4%
4389               1      1.4%
5798               1      1.4%
4647               1      1.4%
4010               1      1.4%
5397               1      1.4%
Other values (64)  64     86.5%

Value  Count  Frequency (%)
3291   1      1.4%
3299   1      1.4%
3667   1      1.4%
3748   1      1.4%
3798   1      1.4%

Value  Count  Frequency (%)
15906  1      1.4%
14500  1      1.4%
13594  1      1.4%
13466  1      1.4%
12990  1      1.4%

mpg
Real number (ℝ≥0)

Distinct count: 21
Unique (%): 28.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21.2972972972973
Minimum: 12
Maximum: 41
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 12
5-th percentile: 14
Q1: 18
Median: 20
Q3: 24.75
95-th percentile: 32.05
Maximum: 41
Range: 29
Interquartile range (IQR): 6.75

Descriptive statistics

Standard deviation: 5.78550321
Coefficient of variation (CV): 0.2716543385
Kurtosis: 1.129919829
Mean: 21.2972973
Median Absolute Deviation (MAD): 4.507669832
Skewness: 0.9684601369
Sum: 1576
Variance: 33.47204739
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[12. 25.5 41. ], "bayesian blocks" binning strategy used)
Value              Count  Frequency (%)
18                 9      12.2%
19                 8      10.8%
14                 6      8.1%
25                 5      6.8%
22                 5      6.8%
21                 5      6.8%
16                 4      5.4%
17                 4      5.4%
24                 4      5.4%
23                 3      4.1%
Other values (11)  21     28.4%

Value  Count  Frequency (%)
12     2      2.7%
14     6      8.1%
15     2      2.7%
16     4      5.4%
17     4      5.4%

Value  Count  Frequency (%)
41     1      1.4%
35     2      2.7%
34     1      1.4%
31     1      1.4%
30     2      2.7%

rep78
Categorical

MISSING
Distinct count: 5
Unique (%): 6.8%
Missing: 5
Missing (%): 6.8%
Memory size: 3.3 KiB

Value      Count  Frequency (%)
Average    30     40.5%
Good       18     24.3%
Excellent  11     14.9%
Fair       8      10.8%
Poor       2      2.7%
(Missing)  5      6.8%

Length

Max length: 9
Mean length: 5.891891892
Min length: 3

Value             Count  Frequency (%)
Lowercase_Letter  13     72.2%
Uppercase_Letter  5      27.8%

Value  Count  Frequency (%)
Latin  18     100.0%

Value  Count  Frequency (%)
ASCII  18     100.0%

headroom
Real number (ℝ≥0)

Distinct count: 8
Unique (%): 10.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.9932432
Minimum: 1.5
Maximum: 5.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 1.5
5-th percentile: 1.825
Q1: 2.5
Median: 3
Q3: 3.5
95-th percentile: 4.5
Maximum: 5
Range: 3.5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8459947705
Coefficient of variation (CV): 0.2826348245
Kurtosis: -0.7620739341
Mean: 2.993243217
Median Absolute Deviation (MAD): 0.6970416903
Skewness: 0.1437965482
Sum: 221.5
Variance: 0.7157071233
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.5 4.25 5. ], "bayesian blocks" binning strategy used)
Value  Count  Frequency (%)
3.5    15     20.3%
2.5    14     18.9%
2      13     17.6%
3      13     17.6%
4      10     13.5%
1.5    4      5.4%
4.5    4      5.4%
5      1      1.4%

Value  Count  Frequency (%)
1.5    4      5.4%
2      13     17.6%
2.5    14     18.9%
3      13     17.6%
3.5    15     20.3%

Value  Count  Frequency (%)
5      1      1.4%
4.5    4      5.4%
4      10     13.5%
3.5    15     20.3%
3      13     17.6%

trunk
Real number (ℝ≥0)

Distinct count: 18
Unique (%): 24.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.756756756756756
Minimum: 5
Maximum: 23
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 5
5-th percentile: 7
Q1: 10.25
Median: 14
Q3: 16.75
95-th percentile: 20.35
Maximum: 23
Range: 18
Interquartile range (IQR): 6.5

Descriptive statistics

Standard deviation: 4.277404189
Coefficient of variation (CV): 0.3109311493
Kurtosis: -0.7796393143
Mean: 13.75675676
Median Absolute Deviation (MAD): 3.61431702
Skewness: 0.02981113321
Sum: 1018
Variance: 18.2961866
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 5. 15.5 17.5 23. ], "bayesian blocks" binning strategy used)
Value             Count  Frequency (%)
16                12     16.2%
17                8      10.8%
11                8      10.8%
20                6      8.1%
8                 5      6.8%
15                5      6.8%
10                5      6.8%
13                4      5.4%
14                4      5.4%
9                 4      5.4%
Other values (8)  13     17.6%

Value  Count  Frequency (%)
5      1      1.4%
6      1      1.4%
7      3      4.1%
8      5      6.8%
9      4      5.4%

Value  Count  Frequency (%)
23     1      1.4%
22     1      1.4%
21     2      2.7%
20     6      8.1%
18     1      1.4%

weight
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 64
Unique (%): 86.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3019.4594594594596
Minimum: 1760
Maximum: 4840
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 1760
5-th percentile: 1895
Q1: 2250
Median: 3190
Q3: 3600
95-th percentile: 4186
Maximum: 4840
Range: 3080
Interquartile range (IQR): 1350

Descriptive statistics

Standard deviation: 777.1935671
Coefficient of variation (CV): 0.2573949336
Kurtosis: -0.8585177502
Mean: 3019.459459
Median Absolute Deviation (MAD): 668.6778671
Skewness: 0.1511986317
Sum: 223440
Variance: 604029.8408
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1760. 4105. 4840.], "bayesian blocks" binning strategy used)
Value              Count  Frequency (%)
4060               2      2.7%
3420               2      2.7%
3690               2      2.7%
1800               2      2.7%
2830               2      2.7%
3600               2      2.7%
3370               2      2.7%
2750               2      2.7%
2650               2      2.7%
2200               2      2.7%
Other values (54)  54     73.0%

Value  Count  Frequency (%)
1760   1      1.4%
1800   2      2.7%
1830   1      1.4%
1930   1      1.4%
1980   1      1.4%

Value  Count  Frequency (%)
4840   1      1.4%
4720   1      1.4%
4330   1      1.4%
4290   1      1.4%
4130   1      1.4%

length
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 47
Unique (%): 63.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 187.93243243243242
Minimum: 142
Maximum: 233
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 142
5-th percentile: 154.65
Q1: 170
Median: 192.5
Q3: 203.75
95-th percentile: 221
Maximum: 233
Range: 91
Interquartile range (IQR): 33.75

Descriptive statistics

Standard deviation: 22.2663399
Coefficient of variation (CV): 0.1184805603
Kurtosis: -0.9408177208
Mean: 187.9324324
Median Absolute Deviation (MAD): 19.28743608
Skewness: -0.0418272235
Sum: 13907
Variance: 495.7898926
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[142. 233.], "bayesian blocks" binning strategy used)
Value              Count  Frequency (%)
200                4      5.4%
170                4      5.4%
198                4      5.4%
179                3      4.1%
165                3      4.1%
201                3      4.1%
206                3      4.1%
204                2      2.7%
172                2      2.7%
193                2      2.7%
Other values (37)  44     59.5%

Value  Count  Frequency (%)
142    1      1.4%
147    1      1.4%
149    1      1.4%
154    1      1.4%
155    2      2.7%

Value  Count  Frequency (%)
233    1      1.4%
230    1      1.4%
222    1      1.4%
221    2      2.7%
220    2      2.7%

turn
Real number (ℝ≥0)

Distinct count: 18
Unique (%): 24.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.648648648648646
Minimum: 31
Maximum: 51
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 31
5-th percentile: 33.65
Q1: 36
Median: 40
Q3: 43
95-th percentile: 46
Maximum: 51
Range: 20
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.399353727
Coefficient of variation (CV): 0.1109584785
Kurtosis: -0.7395773616
Mean: 39.64864865
Median Absolute Deviation (MAD): 3.794740687
Skewness: 0.1264026823
Sum: 2934
Variance: 19.35431322
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[31. 45.5 51. ], "bayesian blocks" binning strategy used)
Value             Count  Frequency (%)
43                12     16.2%
36                9      12.2%
42                7      9.5%
40                6      8.1%
35                6      8.1%
34                6      8.1%
41                4      5.4%
37                4      5.4%
46                3      4.1%
45                3      4.1%
Other values (8)  14     18.9%

Value  Count  Frequency (%)
31     1      1.4%
32     1      1.4%
33     2      2.7%
34     6      8.1%
35     6      8.1%

Value  Count  Frequency (%)
51     1      1.4%
48     2      2.7%
46     3      4.1%
45     3      4.1%
44     3      4.1%

displacement
Real number (ℝ≥0)

Distinct count: 31
Unique (%): 41.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 197.2972972972973
Minimum: 79
Maximum: 425
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 79
5-th percentile: 87.95
Q1: 119
Median: 196
Q3: 245.25
95-th percentile: 350
Maximum: 425
Range: 346
Interquartile range (IQR): 126.25

Descriptive statistics

Standard deviation: 91.83721896
Coefficient of variation (CV): 0.4654763153
Kurtosis: -0.5830817597
Mean: 197.2972973
Median Absolute Deviation (MAD): 77.87289993
Skewness: 0.6039687276
Sum: 14600
Variance: 8434.074787
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 79. 153.5 228. 240.5 425. ], "bayesian blocks" binning strategy used)
Value              Count  Frequency (%)
231                13     17.6%
350                5      6.8%
97                 5      6.8%
302                4      5.4%
250                3      4.1%
151                3      4.1%
121                3      4.1%
140                3      4.1%
119                3      4.1%
225                2      2.7%
Other values (21)  30     40.5%

Value  Count  Frequency (%)
79     1      1.4%
85     1      1.4%
86     2      2.7%
89     1      1.4%
90     1      1.4%

Value  Count  Frequency (%)
425    1      1.4%
400    2      2.7%
350    5      6.8%
318    2      2.7%
304    1      1.4%

gear_ratio
Real number (ℝ≥0)

Distinct count: 36
Unique (%): 48.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.0148647
Minimum: 2.19
Maximum: 3.89
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 2.190000057
5-th percentile: 2.364500046
Q1: 2.730000019
Median: 2.955000043
Q3: 3.352499902
95-th percentile: 3.779999971
Maximum: 3.890000105
Range: 1.700000048
Interquartile range (IQR): 0.6224998832

Descriptive statistics

Standard deviation: 0.4562871158
Coefficient of variation (CV): 0.1513458043
Kurtosis: -0.8762872815
Mean: 3.014864683
Median Absolute Deviation (MAD): 0.3697223663
Skewness: 0.2237261981
Sum: 223.0999908
Variance: 0.2081979215
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[2.19000006 3.8900001 ], "bayesian blocks" binning strategy used)
Value              Count  Frequency (%)
2.730000019        9      12.2%
2.930000067        8      10.8%
3.079999924        7      9.5%
2.470000029        5      6.8%
3.539999962        3      4.1%
2.410000086        3      4.1%
3.049999952        3      4.1%
3.779999971        3      4.1%
3.369999886        2      2.7%
2.559999943        2      2.7%
Other values (26)  29     39.2%

Value        Count  Frequency (%)
2.190000057  1      1.4%
2.24000001   1      1.4%
2.25999999   1      1.4%
2.279999971  1      1.4%
2.410000086  3      4.1%

Value        Count  Frequency (%)
3.890000105  1      1.4%
3.809999943  1      1.4%
3.779999971  3      4.1%
3.74000001   1      1.4%
3.730000019  1      1.4%

foreign
Categorical

Distinct count: 2
Unique (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Value     Count  Frequency (%)
Domestic  52     70.3%
Foreign   22     29.7%

Length

Max length: 8
Mean length: 7.702702703
Min length: 7

Value             Count  Frequency (%)
Lowercase_Letter  10     83.3%
Uppercase_Letter  2      16.7%

Value  Count  Frequency (%)
Latin  12     100.0%

Value  Count  Frequency (%)
ASCII  12     100.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r; only the sign of the slope determines the sign of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
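
As a minimal sketch of this calculation, the snippet below computes r in plain Python for weight and length, using only the ten rows listed under "First rows" in this report. The helper name `pearson_r` is illustrative, and because only ten of the 74 observations are used, the result will differ from the full-data correlation behind the High correlation warning.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (n - 1 denominator);
    # the denominators cancel in the ratio, so the choice does not change r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

# weight and length for rows 0-9 of the "First rows" sample.
weight = [2930, 3350, 2640, 3250, 4080, 3670, 2230, 3280, 3880, 3400]
length = [186, 173, 168, 196, 222, 218, 170, 200, 207, 200]

r = pearson_r(weight, length)
print(f"r(weight, length) on the ten sample rows: {r:.3f}")

# Invariance under location and positive scale changes: converting
# weight to another unit and adding an offset leaves r unchanged.
r_rescaled = pearson_r([0.4536 * w + 10 for w in weight], length)
print(f"after rescaling weight: {r_rescaled:.3f}")
```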

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
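
The rank-based calculation can be sketched in the same way: replace each value by its rank, then apply the Pearson formula to the ranks. The snippet below assumes tied values receive the average of their ranks, which is a common convention but is not stated in this report. The helper names are illustrative, and the data are mpg and weight from the ten "First rows" only.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# mpg and weight for rows 0-9 of the "First rows" sample.
mpg = [22, 17, 22, 20, 15, 18, 26, 20, 16, 19]
weight = [2930, 3350, 2640, 3250, 4080, 3670, 2230, 3280, 3880, 3400]

rho = spearman_rho(mpg, weight)
print(f"rho(mpg, weight) on the ten sample rows: {rho:.3f}")

# A nonlinear but strictly increasing transform keeps the ranks, so rho is 1.
rho_cubed = spearman_rho(weight, [w ** 3 for w in weight])
print(f"rho(weight, weight**3): {rho_cubed:.3f}")
```

The second call illustrates why ρ catches monotonic relationships that are not linear: cubing the weights changes their spacing but not their ordering.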

Missing values

Sample

First rows

   make           price  mpg  rep78    headroom  trunk  weight  length  turn  displacement  gear_ratio  foreign
0  AMC Concord    4099   22   Average  2.5       11     2930    186     40    121           3.58        Domestic
1  AMC Pacer      4749   17   Average  3.0       11     3350    173     40    258           2.53        Domestic
2  AMC Spirit     3799   22   NaN      3.0       12     2640    168     35    121           3.08        Domestic
3  Buick Century  4816   20   Average  4.5       16     3250    196     40    196           2.93        Domestic
4  Buick Electra  7827   15   Good     4.0       20     4080    222     43    350           2.41        Domestic
5  Buick LeSabre  5788   18   Average  4.0       21     3670    218     43    231           2.73        Domestic
6  Buick Opel     4453   26   NaN      3.0       10     2230    170     34    304           2.87        Domestic
7  Buick Regal    5189   20   Average  2.0       16     3280    200     42    196           2.93        Domestic
8  Buick Riviera  10372  16   Average  3.5       17     3880    207     43    231           2.93        Domestic
9  Buick Skylark  4082   19   Average  3.5       13     3400    200     42    231           3.08        Domestic

Last rows

    make            price  mpg  rep78      headroom  trunk  weight  length  turn  displacement  gear_ratio  foreign
64  Renault Le Car  3895   26   Average    3.0       10     1830    142     34    79            3.72        Foreign
65  Subaru          3798   35   Excellent  2.5       11     2050    164     36    97            3.81        Foreign
66  Toyota Celica   5899   18   Excellent  2.5       14     2410    174     36    134           3.06        Foreign
67  Toyota Corolla  3748   31   Excellent  3.0       9      2200    165     35    97            3.21        Foreign
68  Toyota Corona   5719   18   Excellent  2.0       11     2670    175     36    134           3.05        Foreign
69  VW Dasher       7140   23   Good       2.5       12     2160    172     36    97            3.74        Foreign
70  VW Diesel       5397   41   Excellent  3.0       15     2040    155     35    90            3.78        Foreign
71  VW Rabbit       4697   25   Good       3.0       15     1930    155     35    89            3.78        Foreign
72  VW Scirocco     6850   25   Good       2.0       16     1990    156     36    97            3.78        Foreign
73  Volvo 260       11995  17   Excellent  2.5       14     3170    193     37    163           2.98        Foreign