Overview

Dataset statistics

Number of variables: 14
Number of observations: 45726
Missing cells: 29703
Missing cells (%): 4.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 21.8 MiB
Average record size in memory: 498.8 B
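The missing-cell figures can be cross-checked against the per-variable missing counts reported in the Variables section. The following is a minimal sketch of that arithmetic; the dictionary is transcribed by hand from this report, and the variable names in the code are illustrative only.

```python
# Per-variable missing counts as listed in the Variables section of this report.
missing_per_variable = {
    "mass (g)": 131,
    "year": 312,
    "reclat": 7315,
    "reclong": 7315,
    "GeoLocation": 7315,
    "reclat_city": 7315,
}

n_variables = 14
n_observations = 45726

# Every cell is one (observation, variable) pair.
total_cells = n_variables * n_observations
missing_cells = sum(missing_per_variable.values())
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, f"{missing_pct:.1f}%")
```

The sum of the six per-variable counts matches the reported number of missing cells, and dividing by 14 × 45726 cells gives the reported percentage after rounding to one decimal place.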

Variable types

CAT: 7
NUM: 5
BOOL: 1
DATE: 1

Reproduction

Analysis started: 2020-02-13 23:57:54.892937
Analysis finished: 2020-02-13 23:58:48.269928
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

name has a high cardinality: 45726 distinct values [High cardinality]
recclass has a high cardinality: 466 distinct values [High cardinality]
GeoLocation has a high cardinality: 17100 distinct values [High cardinality]
reclat_city is highly correlated with reclat [High correlation]
reclat is highly correlated with reclat_city [High correlation]
reclat has 7315 (16.0%) missing values [Missing]
reclong has 7315 (16.0%) missing values [Missing]
GeoLocation has 7315 (16.0%) missing values [Missing]
reclat_city has 7315 (16.0%) missing values [Missing]
mass (g) is highly skewed (γ1 = 76.91847245) [Skewed]
reclat has 6438 (14.1%) zeros [Zeros]
reclong has 6214 (13.6%) zeros [Zeros]
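
The γ1 value in the skewness warning is a sample skewness coefficient. The sketch below shows one common way to compute such a coefficient, assuming the bias-adjusted Fisher-Pearson estimator; the report does not state which estimator was used, so the function `sample_skewness` is illustrative rather than a reproduction of the tool's code.

```python
from math import sqrt

def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness (an assumed estimator, see text)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    mean = sum(values) / n
    # Second and third central moments (population form, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment factor.
    return sqrt(n * (n - 1)) / (n - 2) * g1

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 50]   # one very large value, like a few huge masses
print(sample_skewness(symmetric), sample_skewness(right_tailed))
```

A symmetric sample gives a value of zero, while a handful of extreme large values (as with the heaviest meteorites in mass (g)) pushes the coefficient strongly positive.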

Variables

name
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE
Distinct count: 45726
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 357.4 KiB

Value Count Frequency (%)
Northwest Africa 042 1 < 0.1%
Miller Range 090994 1 < 0.1%
Gao-Guenie (b) 1 < 0.1%
Hammadah al Hamra 294 1 < 0.1%
Graves Nunataks 06155 1 < 0.1%
Österplana 034 1 < 0.1%
Miller Range 090442 1 < 0.1%
Asuka 880637 1 < 0.1%
Allan Hills A77119 1 < 0.1%
Miller Range 07112 1 < 0.1%
Other values (45716) 45716 > 99.9%

Length

Max length: 28
Mean length: 17.78358046
Min length: 2

Value Count Frequency (%)
Lowercase_Letter 49 51.0%
Uppercase_Letter 31 32.3%
Decimal_Number 10 10.4%
Other_Punctuation 2 2.1%
Close_Punctuation 1 1.0%
Space_Separator 1 1.0%
Dash_Punctuation 1 1.0%
Open_Punctuation 1 1.0%

Value Count Frequency (%)
Latin 80 83.3%
Common 16 16.7%

Value Count Frequency (%)
ASCII 68 100.0%

id
Real number (ℝ≥0)

Distinct count: 45716
Unique (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26883.906202160695
Minimum: 1
Maximum: 57458
Zeros: 0
Zeros (%): 0.0%
Memory size: 357.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2388.75
Q1: 12681.25
median: 24256.5
Q3: 40653.5
95-th percentile: 54890.75
Maximum: 57458
Range: 57457
Interquartile range (IQR): 27972.25

Descriptive statistics

Standard deviation: 16863.44557
Coefficient of variation (CV): 0.6272691713
Kurtosis: -1.160130804
Mean: 26883.9062
Median Absolute Deviation (MAD): 14489.93531
Skewness: 0.2665300704
Sum: 1229293495
Variance: 284375796.4
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.00000e+00 1.16250e+03 1.24350e+03 2.35650e+03 2.43150e+03 ... 5.49015e+04 5.49255e+04 5.72245e+04 5.72885e+04 5.74580e+04], "bayesian blocks" binning strategy used)
Common values

Value Count Frequency (%)
417 2 < 0.1%
398 2 < 0.1%
1 2 < 0.1%
6 2 < 0.1%
392 2 < 0.1%
370 2 < 0.1%
379 2 < 0.1%
2 2 < 0.1%
390 2 < 0.1%
10 2 < 0.1%
Other values (45706) 45706 > 99.9%

Minimum 5 values

Value Count Frequency (%)
1 2 < 0.1%
2 2 < 0.1%
4 1 < 0.1%
5 1 < 0.1%
6 2 < 0.1%

Maximum 5 values

Value Count Frequency (%)
57458 1 < 0.1%
57457 1 < 0.1%
57456 1 < 0.1%
57455 1 < 0.1%
57454 1 < 0.1%

nametype
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 357.4 KiB

Value Count Frequency (%)
Valid 45651 99.8%
Relict 75 0.2%

Length

Max length: 6
Mean length: 5.001640205
Min length: 5

Value Count Frequency (%)
Lowercase_Letter 7 77.8%
Uppercase_Letter 2 22.2%

Value Count Frequency (%)
Latin 9 100.0%

Value Count Frequency (%)
ASCII 9 100.0%

recclass
Categorical

HIGH CARDINALITY
Distinct count: 466
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 357.4 KiB

Value Count Frequency (%)
L6 8287 18.1%
H5 7143 15.6%
L5 4797 10.5%
H6 4529 9.9%
H4 4211 9.2%
LL5 2766 6.0%
LL6 2043 4.5%
L4 1253 2.7%
H4/5 428 0.9%
CM2 416 0.9%
Other values (456) 9853 21.5%

Length

Max length: 26
Mean length: 3.052530289
Min length: 1

Value Count Frequency (%)
Lowercase_Letter 22 35.5%
Uppercase_Letter 20 32.3%
Decimal_Number 10 16.1%
Other_Punctuation 4 6.5%
Math_Symbol 2 3.2%
Close_Punctuation 1 1.6%
Space_Separator 1 1.6%
Dash_Punctuation 1 1.6%
Open_Punctuation 1 1.6%

Value Count Frequency (%)
Latin 42 67.7%
Common 20 32.3%

Value Count Frequency (%)
ASCII 62 100.0%

mass (g)
Real number (ℝ≥0)

SKEWED
Distinct count: 12576
Unique (%): 27.6%
Missing: 131
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 13278.426464261429
Minimum: 0.0
Maximum: 60000000.0
Zeros: 19
Zeros (%): < 0.1%
Memory size: 357.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1.1
Q1: 7.2
median: 32.61
Q3: 202.9
95-th percentile: 4000
Maximum: 60000000
Range: 60000000
Interquartile range (IQR): 195.7

Descriptive statistics

Standard deviation: 574926.0121
Coefficient of variation (CV): 43.2977517
Kurtosis: 6798.398388
Mean: 13278.42646
Median Absolute Deviation (MAD): 25112.89201
Skewness: 76.91847245
Sum: 605429854.6
Variance: 3.305399193e+11
Histogram with fixed size bins (bins=10)
Common values

Value Count Frequency (%)
1.3 171 0.4%
1.2 140 0.3%
1.4 138 0.3%
2.1 130 0.3%
2.4 126 0.3%
1.6 120 0.3%
0.5 119 0.3%
1.1 116 0.3%
3.8 114 0.2%
0.7 111 0.2%
Other values (12566) 44310 96.9%
(Missing) 131 0.3%

Minimum 5 values

Value Count Frequency (%)
0 19 < 0.1%
0.01 2 < 0.1%
0.013 1 < 0.1%
0.02 1 < 0.1%
0.03 1 < 0.1%

Maximum 5 values

Value Count Frequency (%)
60000000 1 < 0.1%
58200000 1 < 0.1%
50000000 1 < 0.1%
30000000 1 < 0.1%
28000000 1 < 0.1%

fall
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 357.4 KiB

Value Count Frequency (%)
Found 44609 97.6%
Fell 1117 2.4%

Length

Max length: 5
Mean length: 4.975571885
Min length: 4

Value Count Frequency (%)
Lowercase_Letter 6 85.7%
Uppercase_Letter 1 14.3%

Value Count Frequency (%)
Latin 7 100.0%

Value Count Frequency (%)
ASCII 7 100.0%

year
Date

Distinct count: 245
Unique (%): 0.5%
Missing: 312
Missing (%): 0.7%
Memory size: 357.4 KiB
Minimum: 1688-01-01 00:00:00
Maximum: 2101-01-01 00:00:00
Histogram

reclat
Real number (ℝ)

HIGH CORRELATION
MISSING
ZEROS
Distinct count: 12738
Unique (%): 33.2%
Missing: 7315
Missing (%): 16.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -39.107095143292284
Minimum: -87.36667
Maximum: 81.16667
Zeros: 6438
Zeros (%): 14.1%
Memory size: 357.4 KiB

Quantile statistics

Minimum: -87.36667
5-th percentile: -84.35476
Q1: -76.71377
median: -71.5
Q3: 0
95-th percentile: 34.494325
Maximum: 81.16667
Range: 168.53334
Interquartile range (IQR): 76.71377

Descriptive statistics

Standard deviation: 46.38601095
Coefficient of variation (CV): -1.186127755
Kurtosis: -1.476865084
Mean: -39.10709514
Median Absolute Deviation (MAD): 43.93747025
Skewness: 0.4913157316
Sum: -1502142.632
Variance: 2151.662012
Histogram with fixed size bins (bins=10)
Common values

Value Count Frequency (%)
0 6438 14.1%
-71.5 4761 10.4%
-84 3040 6.6%
-72 1506 3.3%
-79.68333 1130 2.5%
-76.71667 680 1.5%
-76.18333 539 1.2%
-84.21667 263 0.6%
-86.36667 226 0.5%
-86.71667 217 0.5%
Other values (12728) 19611 42.9%
(Missing) 7315 16.0%

Minimum 5 values

Value Count Frequency (%)
-87.36667 4 < 0.1%
-87.03333 3 < 0.1%
-86.93333 3 < 0.1%
-86.71667 217 0.5%
-86.56667 17 < 0.1%

Maximum 5 values

Value Count Frequency (%)
81.16667 1 < 0.1%
76.53333 1 < 0.1%
76.13333 1 < 0.1%
72.88333 1 < 0.1%
72.68333 1 < 0.1%

reclong
Real number (ℝ)

MISSING
ZEROS
Distinct count: 14640
Unique (%): 38.1%
Missing: 7315
Missing (%): 16.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61.05259359027361
Minimum: -165.43333
Maximum: 354.47333
Zeros: 6214
Zeros (%): 13.6%
Memory size: 357.4 KiB

Quantile statistics

Minimum: -165.43333
5-th percentile: -90.427
Q1: 0
median: 35.66667
Q3: 157.16667
95-th percentile: 168
Maximum: 354.47333
Range: 519.90666
Interquartile range (IQR): 157.16667

Descriptive statistics

Standard deviation: 80.65525774
Coefficient of variation (CV): 1.321078319
Kurtosis: -0.7313935567
Mean: 61.05259359
Median Absolute Deviation (MAD): 67.60562132
Skewness: -0.1743813291
Sum: 2345091.172
Variance: 6505.2706
Histogram with fixed size bins (bins=10)
Common values

Value Count Frequency (%)
0 6214 13.6%
35.66667 4985 10.9%
168 3040 6.6%
26 1506 3.3%
159.75 657 1.4%
159.66667 637 1.4%
157.16667 542 1.2%
155.75 473 1.0%
160.5 263 0.6%
-70 228 0.5%
Other values (14630) 19866 43.4%
(Missing) 7315 16.0%

Minimum 5 values

Value Count Frequency (%)
-165.43333 9 < 0.1%
-165.11667 17 < 0.1%
-163.16667 1 < 0.1%
-162.55 1 < 0.1%
-157.86667 1 < 0.1%

Maximum 5 values

Value Count Frequency (%)
354.47333 1 < 0.1%
178.2 1 < 0.1%
178.08333 1 < 0.1%
175.73028 1 < 0.1%
175.13333 1 < 0.1%

GeoLocation
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 17100
Unique (%): 44.5%
Missing: 7315
Missing (%): 16.0%
Memory size: 357.4 KiB

Value Count Frequency (%)
(0.0, 0.0) 6214 13.6%
(-71.5, 35.66667) 4761 10.4%
(-84.0, 168.0) 3040 6.6%
(-72.0, 26.0) 1505 3.3%
(-79.68333, 159.75) 657 1.4%
(-76.71667, 159.66667) 637 1.4%
(-76.18333, 157.16667) 539 1.2%
(-79.68333, 155.75) 473 1.0%
(-84.21667, 160.5) 263 0.6%
(-86.36667, -70.0) 226 0.5%
Other values (17090) 20096 43.9%
(Missing) 7315 16.0%

Length

Max length: 24
Mean length: 15.01640205
Min length: 3

Value Count Frequency (%)
Decimal_Number 10 55.6%
Lowercase_Letter 2 11.1%
Other_Punctuation 2 11.1%
Open_Punctuation 1 5.6%
Close_Punctuation 1 5.6%
Dash_Punctuation 1 5.6%
Space_Separator 1 5.6%

Value Count Frequency (%)
Common 16 88.9%
Latin 2 11.1%

Value Count Frequency (%)
ASCII 18 100.0%

source
Categorical

CONSTANT
REJECTED
Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 357.4 KiB

Value Count Frequency (%)
NASA 45726 100.0%

Length

Max length: 4
Mean length: 4
Min length: 4

Value Count Frequency (%)
Uppercase_Letter 3 100.0%

Value Count Frequency (%)
Latin 3 100.0%

Value Count Frequency (%)
ASCII 3 100.0%

boolean
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.8 KiB

Value Count Frequency (%)
True 22901 50.1%
False 22825 49.9%

mixed
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 357.4 KiB

Value Count Frequency (%)
A 22977 50.2%
1 22749 49.8%

Length

Max length: 1
Mean length: 1
Min length: 1

Value Count Frequency (%)
Decimal_Number 1 50.0%
Uppercase_Letter 1 50.0%

Value Count Frequency (%)
Common 1 50.0%
Latin 1 50.0%

Value Count Frequency (%)
ASCII 2 100.0%

reclat_city
Real number (ℝ)

HIGH CORRELATION
MISSING
Distinct count: 38401
Unique (%): > 99.9%
Missing: 7315
Missing (%): 16.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -39.11494442027192
Minimum: -101.86835412810719
Maximum: 83.02049375472033
Zeros: 0
Zeros (%): 0.0%
Memory size: 357.4 KiB

Quantile statistics

Minimum: -101.8683541
5-th percentile: -87.82498897
Q1: -78.25072439
median: -68.96321154
Q3: 4.888493136
95-th percentile: 35.6598553
Maximum: 83.02049375
Range: 184.8888479
Interquartile range (IQR): 83.13921753

Descriptive statistics

Standard deviation: 46.67600798
Coefficient of variation (CV): -1.193303702
Kurtosis: -1.443297303
Mean: -39.11494442
Median Absolute Deviation (MAD): 43.96653332
Skewness: 0.4833374582
Sum: -1502444.13
Variance: 2178.649721
Histogram with fixed size bins (bins=10)
Common values

Value Count Frequency (%)
18.81201244 2 < 0.1%
-45.81130518 2 < 0.1%
42.71826242 2 < 0.1%
42.10570186 2 < 0.1%
33.94875012 2 < 0.1%
51.85779161 2 < 0.1%
-30.16178123 2 < 0.1%
62.76737248 2 < 0.1%
-37.36226675 2 < 0.1%
50.07114414 2 < 0.1%
Other values (38391) 38391 84.0%
(Missing) 7315 16.0%

Minimum 5 values

Value Count Frequency (%)
-101.8683541 1 < 0.1%
-101.423367 1 < 0.1%
-101.0704242 1 < 0.1%
-100.5818084 1 < 0.1%
-100.5102236 1 < 0.1%

Maximum 5 values

Value Count Frequency (%)
83.02049375 1 < 0.1%
81.11338369 1 < 0.1%
79.99646477 1 < 0.1%
78.2480926 1 < 0.1%
77.78812128 1 < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that, for a linear relationship, the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
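
As a minimal sketch of that calculation in plain Python (the function name `pearson_r` and the sample data are illustrative, not part of the report):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors in the covariance and in the two standard deviations
    # cancel, so plain sums of products and squares are sufficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / sqrt(ss_x * ss_y)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
r = pearson_r(xs, ys)
# Shifting and rescaling either variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])
print(r, r_rescaled)
```

The second call illustrates the invariance under changes of location and scale described above.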

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
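
A minimal sketch of this rank-then-correlate procedure in plain Python follows. The helper names are illustrative, and tied values are assumed to receive the average of the ranks they span, which is a common convention but is not stated in the report.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4]
ys = [1, 4, 9, 16]           # monotonic but not linear in xs
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For the quadratic example, ρ reaches its maximum while Pearson's r stays below it, which is the difference between monotonic and linear correlation described above.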

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, all divided by the total number of pairs.
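
The pair-counting definition can be sketched directly in plain Python. This is the simple form described in the text, in which tied pairs count as neither concordant nor discordant; statistical libraries often apply a tie correction, so their values may differ when ties are present. The function name `kendall_tau` is illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:        # both variables move in the same direction
            concordant += 1
        elif direction < 0:      # the variables move in opposite directions
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

In the printed example, two of the three pairs are concordant and one is discordant, so τ is one third.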

Missing values

Sample

First rows

index | name | id | nametype | recclass | mass (g) | fall | year | reclat | reclong | GeoLocation | source | boolean | mixed | reclat_city
0 | Aachen | 1 | Valid | L5 | 21.0 | Fell | 1880-01-01 | 50.77500 | 6.08333 | (50.775, 6.08333) | NASA | True | 1 | 50.071144
1 | Aarhus | 2 | Valid | H6 | 720.0 | Fell | 1951-01-01 | 56.18333 | 10.23333 | (56.18333, 10.23333) | NASA | True | 1 | 62.767372
2 | Abee | 6 | Valid | EH4 | 107000.0 | Fell | 1952-01-01 | 54.21667 | -113.00000 | (54.21667, -113.0) | NASA | False | 1 | 51.857792
3 | Acapulco | 10 | Valid | Acapulcoite | 1914.0 | Fell | 1976-01-01 | 16.88333 | -99.90000 | (16.88333, -99.9) | NASA | True | A | 18.812012
4 | Achiras | 370 | Valid | L6 | 780.0 | Fell | 1902-01-01 | -33.16667 | -64.95000 | (-33.16667, -64.95) | NASA | True | 1 | -45.811305
5 | Adhi Kot | 379 | Valid | EH4 | 4239.0 | Fell | 1919-01-01 | 32.10000 | 71.80000 | (32.1, 71.8) | NASA | True | A | 33.948750
6 | Adzhi-Bogdo (stone) | 390 | Valid | LL3-6 | 910.0 | Fell | 1949-01-01 | 44.83333 | 95.16667 | (44.83333, 95.16667) | NASA | True | A | 42.105702
7 | Agen | 392 | Valid | H5 | 30000.0 | Fell | 1814-01-01 | 44.21667 | 0.61667 | (44.21667, 0.61667) | NASA | False | A | 42.718262
8 | Aguada | 398 | Valid | L6 | 1620.0 | Fell | 1930-01-01 | -31.60000 | -65.23333 | (-31.6, -65.23333) | NASA | True | 1 | -37.362267
9 | Aguila Blanca | 417 | Valid | L | 1440.0 | Fell | 1920-01-01 | -30.86667 | -64.55000 | (-30.86667, -64.55) | NASA | False | 1 | -30.161781

Last rows

index | name | id | nametype | recclass | mass (g) | fall | year | reclat | reclong | GeoLocation | source | boolean | mixed | reclat_city
45716 | Aachen copy | 1 | Valid | L5 | 21.0 | Fell | 1880-01-01 | 50.77500 | 6.08333 | (50.775, 6.08333) | NASA | True | 1 | 50.071144
45717 | Aarhus copy | 2 | Valid | H6 | 720.0 | Fell | 1951-01-01 | 56.18333 | 10.23333 | (56.18333, 10.23333) | NASA | True | 1 | 62.767372
45718 | Abee copy | 6 | Valid | EH4 | 107000.0 | Fell | 1952-01-01 | 54.21667 | -113.00000 | (54.21667, -113.0) | NASA | False | 1 | 51.857792
45719 | Acapulco copy | 10 | Valid | Acapulcoite | 1914.0 | Fell | 1976-01-01 | 16.88333 | -99.90000 | (16.88333, -99.9) | NASA | True | A | 18.812012
45720 | Achiras copy | 370 | Valid | L6 | 780.0 | Fell | 1902-01-01 | -33.16667 | -64.95000 | (-33.16667, -64.95) | NASA | True | 1 | -45.811305
45721 | Adhi Kot copy | 379 | Valid | EH4 | 4239.0 | Fell | 1919-01-01 | 32.10000 | 71.80000 | (32.1, 71.8) | NASA | True | A | 33.948750
45722 | Adzhi-Bogdo (stone) copy | 390 | Valid | LL3-6 | 910.0 | Fell | 1949-01-01 | 44.83333 | 95.16667 | (44.83333, 95.16667) | NASA | True | A | 42.105702
45723 | Agen copy | 392 | Valid | H5 | 30000.0 | Fell | 1814-01-01 | 44.21667 | 0.61667 | (44.21667, 0.61667) | NASA | False | A | 42.718262
45724 | Aguada copy | 398 | Valid | L6 | 1620.0 | Fell | 1930-01-01 | -31.60000 | -65.23333 | (-31.6, -65.23333) | NASA | True | 1 | -37.362267
45725 | Aguila Blanca copy | 417 | Valid | L | 1440.0 | Fell | 1920-01-01 | -30.86667 | -64.55000 | (-30.86667, -64.55) | NASA | False | 1 | -30.161781