Dataset statistics
Number of variables | 8 |
---|---|
Number of observations | 33586 |
Missing cells | 59224 |
Missing cells (%) | 22.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 11.6 MiB |
Average record size in memory | 363.4 B |
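The missing-cell percentage can be cross-checked from the other figures in this report. The sketch below assumes that the total cell count is variables times observations, and that the three per-variable missing counts listed under the warnings account for every missing cell.

```python
# Figures copied from this report; nothing here is recomputed from the raw data.
n_variables = 8
n_observations = 33586
missing_cells = 59224

# Per-variable missing counts as listed in the report's warnings.
missing_by_variable = {
    "Typical Hours": 25638,
    "Annual Salary": 7948,
    "Hourly Rate": 25638,
}

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing: {missing_pct:.1f}%")
print("per-variable counts add up:", sum(missing_by_variable.values()) == missing_cells)
```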
Variable types
CAT | 6 |
---|---|
NUM | 2 |
Reproduction
Analysis started | 2020-02-13 23:57:25.183377 |
---|---|
Analysis finished | 2020-02-13 23:57:40.418888 |
Version | pandas-profiling v2.5.0 |
Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Warnings
Name has a high cardinality: 33264 distinct values | High cardinality |
Job Titles has a high cardinality: 1111 distinct values | High cardinality |
Typical Hours is highly correlated with Full or Part-Time | High Correlation |
Full or Part-Time is highly correlated with Typical Hours | High Correlation |
Typical Hours has 25638 (76.3%) missing values | Missing |
Annual Salary has 7948 (23.7%) missing values | Missing |
Hourly Rate has 25638 (76.3%) missing values | Missing |
Name
Categorical
Distinct count | 33264 |
---|---|
Unique (%) | 99.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.5 KiB |
PEREZ, JOSE A | 4 |
---|---|
RIVERA, RICARDO | 4 |
HERNANDEZ, JUAN C | 4 |
LOPEZ, ROBERT | 4 |
ROMERO, MIGUEL A | 4 |
Other values (33259) | 33566 |
Value | Count | Frequency (%) | |
PEREZ, JOSE A | 4 | < 0.1% | |
RIVERA, RICARDO | 4 | < 0.1% | |
HERNANDEZ, JUAN C | 4 | < 0.1% | |
LOPEZ, ROBERT | 4 | < 0.1% | |
ROMERO, MIGUEL A | 4 | < 0.1% | |
JONES, DANIEL | 3 | < 0.1% | |
HERNANDEZ, RUBEN | 3 | < 0.1% | |
RODRIGUEZ, RICHARD | 3 | < 0.1% | |
JOHNSON, NICHOLAS | 3 | < 0.1% | |
RODRIGUEZ, JOSE | 3 | < 0.1% | |
Other values (33254) | 33551 | 99.9% |
Length
Max length | 31 |
---|---|
Mean length | 17.20067885 |
Min length | 8 |
Value | Count | Frequency (%) | |
Uppercase_Letter | 26 | 66.7% | |
Lowercase_Letter | 6 | 15.4% | |
Other_Punctuation | 3 | 7.7% | |
Close_Punctuation | 1 | 2.6% | |
Space_Separator | 1 | 2.6% | |
Dash_Punctuation | 1 | 2.6% | |
Open_Punctuation | 1 | 2.6% |
Value | Count | Frequency (%) | |
Latin | 32 | 82.1% | |
Common | 7 | 17.9% |
Value | Count | Frequency (%) | |
ASCII | 39 | 100.0% |
Job Titles
Categorical
Distinct count | 1111 |
---|---|
Unique (%) | 3.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.5 KiB |
POLICE OFFICER | 9796 |
---|---|
FIREFIGHTER-EMT | 1313 |
SERGEANT | 1306 |
POLICE OFFICER (ASSIGNED AS DETECTIVE) | 1081 |
MOTOR TRUCK DRIVER | 1057 |
Other values (1106) | 19033 |
Value | Count | Frequency (%) | |
POLICE OFFICER | 9796 | 29.2% | |
FIREFIGHTER-EMT | 1313 | 3.9% | |
SERGEANT | 1306 | 3.9% | |
POLICE OFFICER (ASSIGNED AS DETECTIVE) | 1081 | 3.2% | |
MOTOR TRUCK DRIVER | 1057 | 3.1% | |
SANITATION LABORER | 557 | 1.7% | |
FIREFIGHTER-EMT (RECRUIT) | 529 | 1.6% | |
CONSTRUCTION LABORER | 457 | 1.4% | |
TRAFFIC CONTROL AIDE-HOURLY | 428 | 1.3% | |
LIEUTENANT-EMT | 408 | 1.2% | |
Other values (1101) | 16654 | 49.6% |
Length
Max length | 50 |
---|---|
Mean length | 18.50726493 |
Min length | 5 |
Value | Count | Frequency (%) | |
Uppercase_Letter | 26 | 68.4% | |
Other_Punctuation | 4 | 10.5% | |
Decimal_Number | 3 | 7.9% | |
Close_Punctuation | 1 | 2.6% | |
Dash_Punctuation | 1 | 2.6% | |
Space_Separator | 1 | 2.6% | |
Lowercase_Letter | 1 | 2.6% | |
Open_Punctuation | 1 | 2.6% |
Value | Count | Frequency (%) | |
Latin | 27 | 71.1% | |
Common | 11 | 28.9% |
Value | Count | Frequency (%) | |
ASCII | 38 | 100.0% |
Department
Categorical
Distinct count | 36 |
---|---|
Unique (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.5 KiB |
POLICE | 13929 |
---|---|
FIRE | 4775 |
STREETS & SAN | 2075 |
OEMC | 2012 |
WATER MGMNT | 1925 |
Other values (31) | 8870 |
Value | Count | Frequency (%) | |
POLICE | 13929 | 41.5% | |
FIRE | 4775 | 14.2% | |
STREETS & SAN | 2075 | 6.2% | |
OEMC | 2012 | 6.0% | |
WATER MGMNT | 1925 | 5.7% | |
AVIATION | 1433 | 4.3% | |
TRANSPORTN | 1214 | 3.6% | |
PUBLIC LIBRARY | 1080 | 3.2% | |
GENERAL SERVICES | 953 | 2.8% | |
FAMILY & SUPPORT | 622 | 1.9% | |
Other values (26) | 3568 | 10.6% |
Length
Max length | 21 |
---|---|
Mean length | 7.645000893 |
Min length | 3 |
Value | Count | Frequency (%) | |
Uppercase_Letter | 22 | 84.6% | |
Other_Punctuation | 2 | 7.7% | |
Space_Separator | 1 | 3.8% | |
Lowercase_Letter | 1 | 3.8% |
Value | Count | Frequency (%) | |
Latin | 23 | 88.5% | |
Common | 3 | 11.5% |
Value | Count | Frequency (%) | |
ASCII | 26 | 100.0% |
Full or Part-Time
Categorical
Distinct count | 2 |
---|---|
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.5 KiB |
F | 31603 |
---|---|
P | 1983 |
Value | Count | Frequency (%) | |
F | 31603 | 94.1% | |
P | 1983 | 5.9% |
Length
Max length | 1 |
---|---|
Mean length | 1 |
Min length | 1 |
Value | Count | Frequency (%) | |
Uppercase_Letter | 2 | 100.0% |
Value | Count | Frequency (%) | |
Latin | 2 | 100.0% |
Value | Count | Frequency (%) | |
ASCII | 2 | 100.0% |
Salary or Hourly
Categorical
Distinct count | 2 |
---|---|
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 262.5 KiB |
Salary | 25638 |
---|---|
Hourly | 7948 |
Value | Count | Frequency (%) | |
Salary | 25638 | 76.3% | |
Hourly | 7948 | 23.7% |
Length
Max length | 6 |
---|---|
Mean length | 6 |
Min length | 6 |
Value | Count | Frequency (%) | |
Lowercase_Letter | 6 | 75.0% | |
Uppercase_Letter | 2 | 25.0% |
Value | Count | Frequency (%) | |
Latin | 8 | 100.0% |
Value | Count | Frequency (%) | |
ASCII | 8 | 100.0% |
Typical Hours
Categorical
Distinct count | 4 |
---|---|
Unique (%) | 0.1% |
Missing | 25638 |
Missing (%) | 76.3% |
Memory size | 262.5 KiB |
40 | 5807 |
---|---|
20 | 1971 |
35 | 167 |
10 | 3 |
Value | Count | Frequency (%) | |
40 | 5807 | 17.3% | |
20 | 1971 | 5.9% | |
35 | 167 | 0.5% | |
10 | 3 | < 0.1% | |
(Missing) | 25638 | 76.3% |
Length
Max length | 4 |
---|---|
Mean length | 3.236646222 |
Min length | 3 |
Value | Count | Frequency (%) | |
Decimal_Number | 6 | 66.7% | |
Lowercase_Letter | 2 | 22.2% | |
Other_Punctuation | 1 | 11.1% |
Value | Count | Frequency (%) | |
Common | 7 | 77.8% | |
Latin | 2 | 22.2% |
Value | Count | Frequency (%) | |
ASCII | 9 | 100.0% |
Annual Salary
Numeric
Distinct count | 914 |
---|---|
Unique (%) | 3.6% |
Missing | 7948 |
Missing (%) | 23.7% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 87845.37224432483 |
---|---|
Minimum | 8400.0 |
Maximum | 275004.0 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 262.5 KiB |
Quantile statistics
Minimum | 8400 |
---|---|
5-th percentile | 48078 |
Q1 | 75408 |
median | 90024 |
Q3 | 97440 |
95-th percentile | 121818 |
Maximum | 275004 |
Range | 266604 |
Interquartile range (IQR) | 22032 |
Descriptive statistics
Standard deviation | 20827.71411 |
---|---|
Coefficient of variation (CV) | 0.2370951773 |
Kurtosis | 1.847945506 |
Mean | 87845.37224 |
Median Absolute Deviation (MAD) | 15322.51063 |
Skewness | 0.3214555987 |
Sum | 2252179654 |
Variance | 433793674.9 |
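The derived statistics above follow from the primary ones by simple arithmetic. The short check below uses the usual definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared), which are assumed to match what the report computes, with figures copied from the Annual Salary tables.

```python
import math

# Figures copied from the Annual Salary tables in this report.
mean = 87845.37224
std = 20827.71411
variance = 433793674.9
q1, q3 = 75408, 97440
minimum, maximum = 8400, 275004

cv = std / mean                    # coefficient of variation
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range

print(f"CV    = {cv:.10f}")
print(f"IQR   = {iqr}")
print(f"Range = {value_range}")
print("std**2 matches variance:", math.isclose(std ** 2, variance, rel_tol=1e-6))
```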
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%) | |
90024 | 1772 | 5.3% | |
93354 | 1426 | 4.2% | |
87006 | 1379 | 4.1% | |
84054 | 1343 | 4.0% | |
72510 | 1149 | 3.4% | |
76266 | 856 | 2.5% | |
48078 | 743 | 2.2% | |
96060 | 721 | 2.1% | |
68616 | 692 | 2.1% | |
92274 | 492 | 1.5% | |
Other values (904) | 15065 | 44.9% | |
(Missing) | 7948 | 23.7% |
Minimum 5 values
Value | Count | Frequency (%) | |
8400 | 1 | < 0.1% | |
12840 | 1 | < 0.1% | |
14400 | 1 | < 0.1% | |
20400 | 1 | < 0.1% | |
20568 | 1 | < 0.1% |
Maximum 5 values
Value | Count | Frequency (%) | |
275004 | 1 | < 0.1% | |
260004 | 1 | < 0.1% | |
216210 | 1 | < 0.1% | |
202728 | 1 | < 0.1% | |
197736 | 1 | < 0.1% |
Hourly Rate
Numeric
Distinct count | 177 |
---|---|
Unique (%) | 2.2% |
Missing | 25638 |
Missing (%) | 76.3% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 34.11270130850528 |
---|---|
Minimum | 2.65 |
Maximum | 128.0 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 262.5 KiB |
Quantile statistics
Minimum | 2.65 |
---|---|
5-th percentile | 13.11 |
Q1 | 21.73 |
median | 36.45 |
Q3 | 44.085 |
95-th percentile | 51.85 |
Maximum | 128 |
Range | 125.35 |
Interquartile range (IQR) | 22.355 |
Descriptive statistics
Standard deviation | 13.64080214 |
---|---|
Coefficient of variation (CV) | 0.3998745808 |
Kurtosis | -0.6225255281 |
Mean | 34.11270131 |
Median Absolute Deviation (MAD) | 11.5956499 |
Skewness | -0.3779717266 |
Sum | 271127.75 |
Variance | 186.0714829 |
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%) | |
36.45 | 1334 | 4.0% | |
43.72 | 816 | 2.4% | |
37.76 | 542 | 1.6% | |
19.86 | 428 | 1.3% | |
49.35 | 309 | 0.9% | |
2.65 | 216 | 0.6% | |
51.1 | 209 | 0.6% | |
48.93 | 201 | 0.6% | |
51.85 | 198 | 0.6% | |
49.26 | 158 | 0.5% | |
Other values (167) | 3537 | 10.5% | |
(Missing) | 25638 | 76.3% |
Minimum 5 values
Value | Count | Frequency (%) | |
2.65 | 216 | 0.6% | |
8.25 | 42 | 0.1% | |
9.27 | 1 | < 0.1% | |
9.46 | 2 | < 0.1% | |
9.5 | 1 | < 0.1% |
Maximum 5 values
Value | Count | Frequency (%) | |
128 | 1 | < 0.1% | |
60.44 | 1 | < 0.1% | |
58.4 | 13 | < 0.1% | |
57.04 | 69 | 0.2% | |
55.1 | 12 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not change the magnitude of r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
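The calculation described above can be sketched in plain Python. This is an illustrative implementation of the textbook formula, not the code the report itself runs; the function name `pearson_r` and the sample lists are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)

# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([3 * x + 1 for x in xs], [0.5 * y - 2 for y in ys])

print(r, r_rescaled)
```

Both printed values agree up to floating-point rounding, which illustrates the location and scale invariance mentioned above.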
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
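As a minimal sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. Ties are given the average of the positions they occupy, which is a common convention but an assumption here; the helper names are illustrative only.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]          # monotonic but not linear

print(spearman_rho(xs, ys))        # ranks agree exactly
print(pearson_r(xs, ys))           # below 1 because the relation is not linear
```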
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
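The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition in O(n²) time; tied pairs count as neither concordant nor discordant. Statistical libraries often report the tie-adjusted tau-b instead, so values may differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    pairs = list(combinations(zip(xs, ys), 2))
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it contributes to neither count
    return (concordant - discordant) / len(pairs)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
tau = kendall_tau_a(xs, ys)
print(tau)   # 8 concordant and 2 discordant pairs out of 10
```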
First rows
| Name | Job Titles | Department | Full or Part-Time | Salary or Hourly | Typical Hours | Annual Salary | Hourly Rate |
---|---|---|---|---|---|---|---|---|
0 | AARON, JEFFERY M | SERGEANT | POLICE | F | Salary | NaN | 101442.0 | NaN |
1 | AARON, KARINA | POLICE OFFICER (ASSIGNED AS DETECTIVE) | POLICE | F | Salary | NaN | 94122.0 | NaN |
2 | AARON, KIMBERLEI R | CHIEF CONTRACT EXPEDITER | GENERAL SERVICES | F | Salary | NaN | 111024.0 | NaN |
3 | ABAD JR, VICENTE M | CIVIL ENGINEER IV | WATER MGMNT | F | Salary | NaN | 114780.0 | NaN |
4 | ABARCA, EMMANUEL | CONCRETE LABORER | TRANSPORTN | F | Hourly | 40.0 | NaN | 43.72 |
5 | ABARCA, FRANCES J | POLICE OFFICER | POLICE | F | Salary | NaN | 48078.0 | NaN |
6 | ABASCAL, REECE E | TRAFFIC CONTROL AIDE-HOURLY | OEMC | P | Hourly | 20.0 | NaN | 19.86 |
7 | ABBATACOLA, ROBERT J | ELECTRICAL MECHANIC | AVIATION | F | Hourly | 40.0 | NaN | 49.35 |
8 | ABBATEMARCO, JAMES J | FIRE ENGINEER-EMT | FIRE | F | Salary | NaN | 103350.0 | NaN |
9 | ABBATE, TERRY M | POLICE OFFICER | POLICE | F | Salary | NaN | 93354.0 | NaN |
Last rows
| Name | Job Titles | Department | Full or Part-Time | Salary or Hourly | Typical Hours | Annual Salary | Hourly Rate |
---|---|---|---|---|---|---|---|---|
33576 | ZYDEK, BRYAN | POLICE OFFICER | POLICE | F | Salary | NaN | 87006.0 | NaN |
33577 | ZYGADLO, MICHAEL J | FRM OF MACHINISTS - AUTOMOTIVE | GENERAL SERVICES | F | Hourly | 40.0 | NaN | 51.43 |
33578 | ZYGMUNT, ARTUR | POLICE OFFICER | POLICE | F | Salary | NaN | 72510.0 | NaN |
33579 | ZYGMUNT, DAWID | POLICE OFFICER | POLICE | F | Salary | NaN | 76266.0 | NaN |
33580 | ZYLINSKA, KATARZYNA | POLICE OFFICER | POLICE | F | Salary | NaN | 76266.0 | NaN |
33581 | ZYLINSKA, KLAUDIA | POLICE OFFICER | POLICE | F | Salary | NaN | 68616.0 | NaN |
33582 | ZYMANTAS, LAURA C | POLICE OFFICER | POLICE | F | Salary | NaN | 72510.0 | NaN |
33583 | ZYMANTAS, MARK E | POLICE OFFICER | POLICE | F | Salary | NaN | 90024.0 | NaN |
33584 | ZYRKOWSKI, CARLO E | POLICE OFFICER | POLICE | F | Salary | NaN | 93354.0 | NaN |
33585 | ZYSKOWSKI, DARIUSZ | CHIEF DATA BASE ANALYST | DoIT | F | Salary | NaN | 119412.0 | NaN |