Overview

Dataset statistics

Number of variables: 8
Number of observations: 33586
Missing cells: 59224
Missing cells (%): 22.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 11.6 MiB
Average record size in memory: 363.4 B

Variable types

CAT: 6
NUM: 2

Reproduction

Analysis started: 2020-02-13 23:57:25.183377
Analysis finished: 2020-02-13 23:57:40.418888
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
Name has a high cardinality: 33264 distinct values (High cardinality)
Job Titles has a high cardinality: 1111 distinct values (High cardinality)
Typical Hours is highly correlated with Full or Part-Time (High correlation)
Full or Part-Time is highly correlated with Typical Hours (High correlation)
Typical Hours has 25638 (76.3%) missing values (Missing)
Annual Salary has 7948 (23.7%) missing values (Missing)
Hourly Rate has 25638 (76.3%) missing values (Missing)
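
The missing-value figures in the overview can be cross-checked by hand from the per-variable counts. The sketch below only re-does that arithmetic with the numbers printed in this report; the variable names are illustrative and are not part of pandas-profiling. The counts also line up with the Salary or Hourly split (25638 Salary rows, 7948 Hourly rows), which suggests, although the report does not state it, that salaried rows lack Typical Hours and Hourly Rate while hourly rows lack Annual Salary.

```python
# Figures copied from the report above; names are illustrative only.
n_variables = 8
n_observations = 33586
missing_per_variable = {
    "Typical Hours": 25638,
    "Annual Salary": 7948,
    "Hourly Rate": 25638,
}

# Every cell is one (row, column) position in the table.
total_cells = n_variables * n_observations

# Total missing cells is the sum over the columns that have any missing values.
missing_cells = sum(missing_per_variable.values())
missing_pct = 100 * missing_cells / total_cells

# Per-column percentages are relative to the number of rows, not cells.
per_column_pct = {
    name: round(100 * count / n_observations, 1)
    for name, count in missing_per_variable.items()
}

print(missing_cells, round(missing_pct, 1), per_column_pct)
```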

Variables

Name
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 33264
Unique (%): 99.0%
Missing: 0
Missing (%): 0.0%
Memory size: 262.5 KiB

Value | Count | Frequency (%)
PEREZ, JOSE A | 4 | < 0.1%
RIVERA, RICARDO | 4 | < 0.1%
HERNANDEZ, JUAN C | 4 | < 0.1%
LOPEZ, ROBERT | 4 | < 0.1%
ROMERO, MIGUEL A | 4 | < 0.1%
JONES, DANIEL | 3 | < 0.1%
HERNANDEZ, RUBEN | 3 | < 0.1%
RODRIGUEZ, RICHARD | 3 | < 0.1%
JOHNSON, NICHOLAS | 3 | < 0.1%
RODRIGUEZ, JOSE | 3 | < 0.1%
Other values (33254) | 33551 | 99.9%

Length

Max length: 31
Mean length: 17.20067885
Min length: 8

Value | Count | Frequency (%)
Uppercase_Letter | 26 | 66.7%
Lowercase_Letter | 6 | 15.4%
Other_Punctuation | 3 | 7.7%
Close_Punctuation | 1 | 2.6%
Space_Separator | 1 | 2.6%
Dash_Punctuation | 1 | 2.6%
Open_Punctuation | 1 | 2.6%

Value | Count | Frequency (%)
Latin | 32 | 82.1%
Common | 7 | 17.9%

Value | Count | Frequency (%)
ASCII | 39 | 100.0%

Job Titles
Categorical

HIGH CARDINALITY
Distinct count: 1111
Unique (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 262.5 KiB

Value | Count | Frequency (%)
POLICE OFFICER | 9796 | 29.2%
FIREFIGHTER-EMT | 1313 | 3.9%
SERGEANT | 1306 | 3.9%
POLICE OFFICER (ASSIGNED AS DETECTIVE) | 1081 | 3.2%
MOTOR TRUCK DRIVER | 1057 | 3.1%
SANITATION LABORER | 557 | 1.7%
FIREFIGHTER-EMT (RECRUIT) | 529 | 1.6%
CONSTRUCTION LABORER | 457 | 1.4%
TRAFFIC CONTROL AIDE-HOURLY | 428 | 1.3%
LIEUTENANT-EMT | 408 | 1.2%
Other values (1101) | 16654 | 49.6%

Length

Max length: 50
Mean length: 18.50726493
Min length: 5

Value | Count | Frequency (%)
Uppercase_Letter | 26 | 68.4%
Other_Punctuation | 4 | 10.5%
Decimal_Number | 3 | 7.9%
Close_Punctuation | 1 | 2.6%
Dash_Punctuation | 1 | 2.6%
Space_Separator | 1 | 2.6%
Lowercase_Letter | 1 | 2.6%
Open_Punctuation | 1 | 2.6%

Value | Count | Frequency (%)
Latin | 27 | 71.1%
Common | 11 | 28.9%

Value | Count | Frequency (%)
ASCII | 38 | 100.0%

Department
Categorical

Distinct count: 36
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 262.5 KiB

Value | Count | Frequency (%)
POLICE | 13929 | 41.5%
FIRE | 4775 | 14.2%
STREETS & SAN | 2075 | 6.2%
OEMC | 2012 | 6.0%
WATER MGMNT | 1925 | 5.7%
AVIATION | 1433 | 4.3%
TRANSPORTN | 1214 | 3.6%
PUBLIC LIBRARY | 1080 | 3.2%
GENERAL SERVICES | 953 | 2.8%
FAMILY & SUPPORT | 622 | 1.9%
Other values (26) | 3568 | 10.6%

Length

Max length: 21
Mean length: 7.645000893
Min length: 3

Value | Count | Frequency (%)
Uppercase_Letter | 22 | 84.6%
Other_Punctuation | 2 | 7.7%
Space_Separator | 1 | 3.8%
Lowercase_Letter | 1 | 3.8%

Value | Count | Frequency (%)
Latin | 23 | 88.5%
Common | 3 | 11.5%

Value | Count | Frequency (%)
ASCII | 26 | 100.0%

Full or Part-Time
Categorical

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 262.5 KiB

Value | Count | Frequency (%)
F | 31603 | 94.1%
P | 1983 | 5.9%

Length

Max length: 1
Mean length: 1
Min length: 1

Value | Count | Frequency (%)
Uppercase_Letter | 2 | 100.0%

Value | Count | Frequency (%)
Latin | 2 | 100.0%

Value | Count | Frequency (%)
ASCII | 2 | 100.0%

Salary or Hourly
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 262.5 KiB

Value | Count | Frequency (%)
Salary | 25638 | 76.3%
Hourly | 7948 | 23.7%

Length

Max length: 6
Mean length: 6
Min length: 6

Value | Count | Frequency (%)
Lowercase_Letter | 6 | 75.0%
Uppercase_Letter | 2 | 25.0%

Value | Count | Frequency (%)
Latin | 8 | 100.0%

Value | Count | Frequency (%)
ASCII | 8 | 100.0%

Typical Hours
Categorical

HIGH CORRELATION
MISSING
Distinct count: 4
Unique (%): 0.1%
Missing: 25638
Missing (%): 76.3%
Memory size: 262.5 KiB

Value | Count | Frequency (%)
40 | 5807 | 17.3%
20 | 1971 | 5.9%
35 | 167 | 0.5%
10 | 3 | < 0.1%
(Missing) | 25638 | 76.3%

Length

Max length: 4
Mean length: 3.236646222
Min length: 3

Value | Count | Frequency (%)
Decimal_Number | 6 | 66.7%
Lowercase_Letter | 2 | 22.2%
Other_Punctuation | 1 | 11.1%

Value | Count | Frequency (%)
Common | 7 | 77.8%
Latin | 2 | 22.2%

Value | Count | Frequency (%)
ASCII | 9 | 100.0%

Annual Salary
Real number (ℝ≥0)

MISSING
Distinct count: 914
Unique (%): 3.6%
Missing: 7948
Missing (%): 23.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 87845.37224432483
Minimum: 8400.0
Maximum: 275004.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 262.5 KiB

Quantile statistics

Minimum: 8400
5-th percentile: 48078
Q1: 75408
median: 90024
Q3: 97440
95-th percentile: 121818
Maximum: 275004
Range: 266604
Interquartile range (IQR): 22032

Descriptive statistics

Standard deviation: 20827.71411
Coefficient of variation (CV): 0.2370951773
Kurtosis: 1.847945506
Mean: 87845.37224
Median Absolute Deviation (MAD): 15322.51063
Skewness: 0.3214555987
Sum: 2252179654
Variance: 433793674.9

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
90024 | 1772 | 5.3%
93354 | 1426 | 4.2%
87006 | 1379 | 4.1%
84054 | 1343 | 4.0%
72510 | 1149 | 3.4%
76266 | 856 | 2.5%
48078 | 743 | 2.2%
96060 | 721 | 2.1%
68616 | 692 | 2.1%
92274 | 492 | 1.5%
Other values (904) | 15065 | 44.9%
(Missing) | 7948 | 23.7%

Value | Count | Frequency (%)
8400 | 1 | < 0.1%
12840 | 1 | < 0.1%
14400 | 1 | < 0.1%
20400 | 1 | < 0.1%
20568 | 1 | < 0.1%

Value | Count | Frequency (%)
275004 | 1 | < 0.1%
260004 | 1 | < 0.1%
216210 | 1 | < 0.1%
202728 | 1 | < 0.1%
197736 | 1 | < 0.1%

Hourly Rate
Real number (ℝ≥0)

MISSING
Distinct count: 177
Unique (%): 2.2%
Missing: 25638
Missing (%): 76.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.11270130850528
Minimum: 2.65
Maximum: 128.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 262.5 KiB

Quantile statistics

Minimum: 2.65
5-th percentile: 13.11
Q1: 21.73
median: 36.45
Q3: 44.085
95-th percentile: 51.85
Maximum: 128
Range: 125.35
Interquartile range (IQR): 22.355

Descriptive statistics

Standard deviation: 13.64080214
Coefficient of variation (CV): 0.3998745808
Kurtosis: -0.6225255281
Mean: 34.11270131
Median Absolute Deviation (MAD): 11.5956499
Skewness: -0.3779717266
Sum: 271127.75
Variance: 186.0714829

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
36.45 | 1334 | 4.0%
43.72 | 816 | 2.4%
37.76 | 542 | 1.6%
19.86 | 428 | 1.3%
49.35 | 309 | 0.9%
2.65 | 216 | 0.6%
51.1 | 209 | 0.6%
48.93 | 201 | 0.6%
51.85 | 198 | 0.6%
49.26 | 158 | 0.5%
Other values (167) | 3537 | 10.5%
(Missing) | 25638 | 76.3%

Value | Count | Frequency (%)
2.65 | 216 | 0.6%
8.25 | 42 | 0.1%
9.27 | 1 | < 0.1%
9.46 | 2 | < 0.1%
9.5 | 1 | < 0.1%

Value | Count | Frequency (%)
128 | 1 | < 0.1%
60.44 | 1 | < 0.1%
58.4 | 13 | < 0.1%
57.04 | 69 | 0.2%
55.1 | 12 | < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that, for a linear relationship, the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
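
As a minimal sketch of the definition above, the following pure-Python function divides the covariance by the product of the standard deviations. The function name `pearson_r` and the sample data are illustrative assumptions and are not taken from pandas-profiling, which computes its correlation matrices through its own code paths. The example also shows the invariance under changes of location and scale.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations.

    The 1/n factors of the covariance and the two variances cancel, so plain
    sums of deviations are used. Raises ZeroDivisionError if either input is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative data: a perfect line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]

r_line = pearson_r(xs, ys)                             # perfect positive line
r_rescaled = pearson_r([10 * x - 3 for x in xs], ys)   # shift and rescale x
r_flipped = pearson_r(xs, [-y for y in ys])            # negate y

print(r_line, r_rescaled, r_flipped)
```

Shifting and rescaling x by a positive factor leaves r unchanged, while negating one variable flips its sign.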

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
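
The rank-based recipe above can be sketched as follows: convert each variable to ranks, then apply the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the data are illustrative assumptions; assigning tied values the average of their ranks is a common convention that this report does not spell out.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations (1/n cancels)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Illustrative data: y = x^3 is monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

rho_cubic = spearman_rho(xs, ys)
r_cubic = pearson_r(xs, ys)
tied_ranks = average_ranks([10, 20, 20, 30])

print(rho_cubic, r_cubic, tied_ranks)
```

On this monotonic but nonlinear data ρ reaches 1 while Pearson's r stays below 1, which is the difference the paragraph above describes.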

Kendall's τ

Similarly to Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
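
The pair-counting description above corresponds to the variant usually called tau-a. A minimal sketch, with a hypothetical function name and made-up data, is shown below. Pairs tied in either variable count as neither concordant nor discordant here; library implementations commonly use the tau-b variant, which adjusts the denominator for ties, so results can differ on tied data.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, as described above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in xs or ys; it counts for neither side.
    return (concordant - discordant) / len(pairs)

# Illustrative data: one swapped pair out of six.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

tau = kendall_tau_a(xs, ys)                 # 5 concordant, 1 discordant, 6 pairs
tau_reversed = kendall_tau_a(xs, [4, 3, 2, 1])

print(tau, tau_reversed)
```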

Missing values

Sample

First rows

(index) | Name | Job Titles | Department | Full or Part-Time | Salary or Hourly | Typical Hours | Annual Salary | Hourly Rate
0 | AARON, JEFFERY M | SERGEANT | POLICE | F | Salary | NaN | 101442.0 | NaN
1 | AARON, KARINA | POLICE OFFICER (ASSIGNED AS DETECTIVE) | POLICE | F | Salary | NaN | 94122.0 | NaN
2 | AARON, KIMBERLEI R | CHIEF CONTRACT EXPEDITER | GENERAL SERVICES | F | Salary | NaN | 111024.0 | NaN
3 | ABAD JR, VICENTE M | CIVIL ENGINEER IV | WATER MGMNT | F | Salary | NaN | 114780.0 | NaN
4 | ABARCA, EMMANUEL | CONCRETE LABORER | TRANSPORTN | F | Hourly | 40.0 | NaN | 43.72
5 | ABARCA, FRANCES J | POLICE OFFICER | POLICE | F | Salary | NaN | 48078.0 | NaN
6 | ABASCAL, REECE E | TRAFFIC CONTROL AIDE-HOURLY | OEMC | P | Hourly | 20.0 | NaN | 19.86
7 | ABBATACOLA, ROBERT J | ELECTRICAL MECHANIC | AVIATION | F | Hourly | 40.0 | NaN | 49.35
8 | ABBATEMARCO, JAMES J | FIRE ENGINEER-EMT | FIRE | F | Salary | NaN | 103350.0 | NaN
9 | ABBATE, TERRY M | POLICE OFFICER | POLICE | F | Salary | NaN | 93354.0 | NaN

Last rows

(index) | Name | Job Titles | Department | Full or Part-Time | Salary or Hourly | Typical Hours | Annual Salary | Hourly Rate
33576 | ZYDEK, BRYAN | POLICE OFFICER | POLICE | F | Salary | NaN | 87006.0 | NaN
33577 | ZYGADLO, MICHAEL J | FRM OF MACHINISTS - AUTOMOTIVE | GENERAL SERVICES | F | Hourly | 40.0 | NaN | 51.43
33578 | ZYGMUNT, ARTUR | POLICE OFFICER | POLICE | F | Salary | NaN | 72510.0 | NaN
33579 | ZYGMUNT, DAWID | POLICE OFFICER | POLICE | F | Salary | NaN | 76266.0 | NaN
33580 | ZYLINSKA, KATARZYNA | POLICE OFFICER | POLICE | F | Salary | NaN | 76266.0 | NaN
33581 | ZYLINSKA, KLAUDIA | POLICE OFFICER | POLICE | F | Salary | NaN | 68616.0 | NaN
33582 | ZYMANTAS, LAURA C | POLICE OFFICER | POLICE | F | Salary | NaN | 72510.0 | NaN
33583 | ZYMANTAS, MARK E | POLICE OFFICER | POLICE | F | Salary | NaN | 90024.0 | NaN
33584 | ZYRKOWSKI, CARLO E | POLICE OFFICER | POLICE | F | Salary | NaN | 93354.0 | NaN
33585 | ZYSKOWSKI, DARIUSZ | CHIEF DATA BASE ANALYST | DoIT | F | Salary | NaN | 119412.0 | NaN