Overview

Dataset info

Number of variables: 14
Number of observations: 32561
Missing cells: 4262 (0.9%)
Duplicate rows: 25 (0.1%)
Total size in memory: 3.5 MiB
Average record size in memory: 112.0 B
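As a rough illustration of where figures like these come from, the sketch below computes comparable dataset-level counts with pandas. The helper name `dataset_overview` and the toy frame are hypothetical, and the exact definitions the report generator used (for example, how it counts duplicate rows or measures memory) are assumptions here.

```python
import pandas as pd


def dataset_overview(df: pd.DataFrame) -> dict:
    """Return dataset-level counts similar to the 'Dataset info' block."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing_cells = int(df.isna().sum().sum())
    # duplicated() marks each repeat of an earlier row; the first copy is not counted.
    duplicate_rows = int(df.duplicated().sum())
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing_cells,
        "missing_cells_pct": 100.0 * missing_cells / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicate_rows,
        "duplicate_rows_pct": 100.0 * duplicate_rows / n_rows if n_rows else 0.0,
        # deep=True also measures the Python strings held in object columns.
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


# Toy data (not the profiled dataset): one missing cell, one repeated row.
toy = pd.DataFrame(
    {
        "age": [39, 50, 39, 28],
        "workclass": ["State-gov", None, "State-gov", "Private"],
    }
)
overview = dataset_overview(toy)
print(overview)
```

For the totals reported above, 4262 missing cells out of 14 x 32561 = 455854 cells is about 0.9%, which matches the listed figure.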

Variable types

Numeric: 6
Categorical: 8
Boolean: 0
Date: 0
URL: 0
Text (Unique): 0
Rejected: 0
Unsupported: 0

Warnings

Dataset has 25 (0.1%) duplicate rows [Warning]
capital-gain has 29849 (91.7%) zeros [Zeros]
capital-loss has 31042 (95.3%) zeros [Zeros]
native-country has 583 (1.8%) missing values [Missing]
occupation has 1843 (5.7%) missing values [Missing]
workclass has 1836 (5.6%) missing values [Missing]
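The percentages in these warnings follow directly from the counts and the 32561 observations. A minimal check, using only numbers quoted above (the dictionary layout and names are illustrative):

```python
# Counts quoted in the warnings, divided by the number of observations.
N_OBSERVATIONS = 32561

warning_counts = {
    "duplicate rows": 25,
    "capital-gain zeros": 29849,
    "capital-loss zeros": 31042,
    "native-country missing": 583,
    "occupation missing": 1843,
    "workclass missing": 1836,
}

# Percentage of observations affected, rounded to one decimal as in the report.
warning_pct = {
    name: round(100.0 * count / N_OBSERVATIONS, 1)
    for name, count in warning_counts.items()
}

for name, pct in warning_pct.items():
    print(f"{name}: {pct}%")
```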

Variables

age
Numeric

Distinct count: 73
Unique (%): 0.2%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 38.58164676
Minimum: 17
Maximum: 90
Zeros (%): 0.0%

Quantile statistics

Minimum: 17
5th percentile: 19
Q1: 28
Median: 37
Q3: 48
95th percentile: 63
Maximum: 90
Range: 73
Interquartile range: 20
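Range and interquartile range are simple differences of the quantiles listed above (90 - 17 = 73 and 48 - 28 = 20). The sketch below shows one common way to compute such percentiles, linear interpolation between order statistics. Whether the report used exactly this interpolation rule is an assumption, and `percentile` is an illustrative helper run on toy data.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted data.
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


toy_values = [10, 20, 30, 40]  # toy data, not the age column
q1 = percentile(toy_values, 25)
median = percentile(toy_values, 50)
q3 = percentile(toy_values, 75)
iqr = q3 - q1
value_range = max(toy_values) - min(toy_values)
print(q1, median, q3, iqr, value_range)
```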

Descriptive statistics

Standard deviation: 13.64043255
Coefficient of variation: 0.3535471837
Kurtosis: -0.1661274596
Mean: 38.58164676
MAD: 11.18918162
Skewness: 0.5587433694
Sum: 1256257
Variance: 186.0614002
Memory size: 254.5 KiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[17. 17.5 18.5 22.5 41.5 ... 76.5 81.5 84.5 89. 90. ], "bayesian blocks" binning strategy used)
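Several of the descriptive statistics for age are tied together by simple identities, which allows a quick consistency check. The snippet below recomputes variance, coefficient of variation and sum from the reported mean, standard deviation and the 32561 observations. The variable names are illustrative and the tolerance is an arbitrary choice.

```python
import math

# Values copied from the age statistics in the report.
n_observations = 32561
mean = 38.58164676
std_dev = 13.64043255
reported_variance = 186.0614002
reported_cv = 0.3535471837
reported_sum = 1256257

variance = std_dev ** 2              # variance is the squared standard deviation
coef_of_variation = std_dev / mean   # spread relative to the mean
total = mean * n_observations        # sum = mean * number of observations

variance_ok = math.isclose(variance, reported_variance, rel_tol=1e-6)
cv_ok = math.isclose(coef_of_variation, reported_cv, rel_tol=1e-6)
sum_ok = math.isclose(total, reported_sum, rel_tol=1e-6)
print(variance_ok, cv_ok, sum_ok)
```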
Value | Count | Frequency (%)
36 | 898 | 2.8%
31 | 888 | 2.7%
34 | 886 | 2.7%
23 | 877 | 2.7%
35 | 876 | 2.7%
33 | 875 | 2.7%
28 | 867 | 2.7%
30 | 861 | 2.6%
37 | 858 | 2.6%
25 | 841 | 2.6%
Other values (63) | 23834 | 73.2%

Minimum 5 values

Value | Count | Frequency (%)
17 | 395 | 1.2%
18 | 550 | 1.7%
19 | 712 | 2.2%
20 | 753 | 2.3%
21 | 720 | 2.2%

Maximum 5 values

Value | Count | Frequency (%)
90 | 43 | 0.1%
88 | 3 | < 0.1%
87 | 1 | < 0.1%
86 | 1 | < 0.1%
85 | 3 | < 0.1%

capital-gain
Numeric

Distinct count: 119
Unique (%): 0.4%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 1077.648844
Minimum: 0
Maximum: 99999
Zeros (%): 91.7%

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 5013
Maximum: 99999
Range: 99999
Interquartile range: 0

Descriptive statistics

Standard deviation: 7385.292085
Coefficient of variation: 6.853152702
Kurtosis: 154.7994379
Mean: 1077.648844
MAD: 1977.373437
Skewness: 11.95384769
Sum: 35089324
Variance: 54542539.18
Memory size: 254.5 KiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[0.00000e+00 5.70000e+01 4.97500e+02 7.54000e+02 1.02300e+03 ... 2.51800e+04 3.09615e+04 3.77025e+04 7.06545e+04 9.99990e+04], "bayesian blocks" binning strategy used)
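The very large skewness and kurtosis reported for capital-gain are what one would expect from a column that is mostly zeros with a few large values. The sketch below shows how mean absolute deviation, skewness and excess kurtosis can be computed from raw values. It uses plain population moments, which is an assumption: the report may use bias-corrected estimators, so small numerical differences are possible. Reading MAD as the mean absolute deviation around the mean is likewise an assumption. The function name and toy data are illustrative.

```python
def moment_summary(values):
    """Mean absolute deviation, skewness and excess kurtosis from population moments."""
    n = len(values)
    if n == 0:
        raise ValueError("moment_summary() needs at least one value")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    mad = sum(abs(d) for d in deviations) / n
    m2 = sum(d ** 2 for d in deviations) / n  # population variance
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    return {
        "mad": mad,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


symmetric = moment_summary([1, 2, 3, 4, 5])
# Mostly zeros plus one large value, loosely mimicking a zero-inflated column.
zero_heavy = moment_summary([0] * 9 + [100])
print(symmetric)
print(zero_heavy)
```

On the symmetric toy list the skewness is zero, while the zero-heavy list gives a clearly positive skewness and kurtosis, the same qualitative pattern as the capital-gain figures.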
Value | Count | Frequency (%)
0 | 29849 | 91.7%
15024 | 347 | 1.1%
7688 | 284 | 0.9%
7298 | 246 | 0.8%
99999 | 159 | 0.5%
5178 | 97 | 0.3%
3103 | 97 | 0.3%
4386 | 70 | 0.2%
5013 | 69 | 0.2%
8614 | 55 | 0.2%
Other values (109) | 1288 | 4.0%

Minimum 5 values

Value | Count | Frequency (%)
0 | 29849 | 91.7%
114 | 6 | < 0.1%
401 | 2 | < 0.1%
594 | 34 | 0.1%
914 | 8 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
99999 | 159 | 0.5%
41310 | 2 | < 0.1%
34095 | 5 | < 0.1%
27828 | 34 | 0.1%
25236 | 11 | < 0.1%

capital-loss
Numeric

Distinct count: 92
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 87.30382973
Minimum: 0
Maximum: 4356
Zeros (%): 95.3%

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 4356
Range: 4356
Interquartile range: 0

Descriptive statistics

Standard deviation: 402.9602186
Coefficient of variation: 4.615607584
Kurtosis: 20.37680171
Mean: 87.30382973
MAD: 166.4620548
Skewness: 4.594629122
Sum: 2842700
Variance: 162376.9378
Memory size: 254.5 KiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 0. 77.5 1299. 1394. 1409.5 ... 2462. 2553. 2581. 2914. 4356. ], "bayesian blocks" binning strategy used)

Value | Count | Frequency (%)
0 | 31042 | 95.3%
1902 | 202 | 0.6%
1977 | 168 | 0.5%
1887 | 159 | 0.5%
1848 | 51 | 0.2%
1485 | 51 | 0.2%
2415 | 49 | 0.2%
1602 | 47 | 0.1%
1740 | 42 | 0.1%
1590 | 40 | 0.1%
Other values (82) | 710 | 2.2%

Minimum 5 values

Value | Count | Frequency (%)
0 | 31042 | 95.3%
155 | 1 | < 0.1%
213 | 4 | < 0.1%
323 | 3 | < 0.1%
419 | 3 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
4356 | 3 | < 0.1%
3900 | 2 | < 0.1%
3770 | 2 | < 0.1%
3683 | 2 | < 0.1%
3004 | 2 | < 0.1%

education
Categorical

Distinct count: 16
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Top categories: HS-grad 10501; Some-college 7291; Bachelors 5355; Other values (13) 9414
Value | Count | Frequency (%)
HS-grad | 10501 | 32.3%
Some-college | 7291 | 22.4%
Bachelors | 5355 | 16.4%
Masters | 1723 | 5.3%
Assoc-voc | 1382 | 4.2%
11th | 1175 | 3.6%
Assoc-acdm | 1067 | 3.3%
10th | 933 | 2.9%
7th-8th | 646 | 2.0%
Prof-school | 576 | 1.8%
Other values (6) | 1912 | 5.9%
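A frequency table of this shape, the most common values followed by a single "Other values" row, can be built with a counter. A minimal sketch in plain Python follows; `frequency_table` is an illustrative helper, not the report generator's code, and it does not treat missing values specially.

```python
from collections import Counter


def frequency_table(values, top_n=10):
    """Return (label, count, percent) rows for the top_n values plus an 'Other values' row."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    shown = 0
    for value, count in counts.most_common(top_n):
        rows.append((str(value), count, 100.0 * count / total))
        shown += count
    # Number of distinct values that did not make it into the top_n rows.
    n_other = len(counts) - min(top_n, len(counts))
    if n_other > 0:
        other_count = total - shown
        rows.append((f"Other values ({n_other})", other_count, 100.0 * other_count / total))
    return rows


# Toy data (not the real column): four distinct values, keep the top two.
toy = ["HS-grad"] * 5 + ["Bachelors"] * 3 + ["Masters"] + ["11th"]
table = frequency_table(toy, top_n=2)
for label, count, pct in table:
    print(f"{label} | {count} | {pct:.1f}%")
```

In the education table, the ten listed counts sum to 30649, which leaves 32561 - 30649 = 1912 observations for the six remaining categories.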
 
Max length: 13
Mean length: 9.433709038
Min length: 4
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True
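The length and character flags can be derived with simple string operations. The sketch below is illustrative only: the helper name is hypothetical and the exact rule behind each flag is an assumption (in particular, "non-words" is taken here to mean any character that is not alphanumeric or an underscore). The reported maximum length of 13 is one more than the 12 characters of Some-college, which, together with Contains spaces being True, suggests the raw values carry a leading space; that reading is an inference, not something the report states.

```python
import re


def string_summary(values):
    """Length statistics and character-class flags for a list of strings."""
    if not values:
        raise ValueError("string_summary() needs at least one value")
    lengths = [len(v) for v in values]
    return {
        "max_length": max(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "min_length": min(lengths),
        "contains_chars": any(re.search(r"[A-Za-z]", v) for v in values),
        "contains_digits": any(re.search(r"\d", v) for v in values),
        "contains_spaces": any(re.search(r"\s", v) for v in values),
        # Assumed definition: anything outside [A-Za-z0-9_], e.g. '-' or ' '.
        "contains_non_words": any(re.search(r"\W", v) for v in values),
    }


# Toy values with a leading space (an assumption about the raw data).
summary = string_summary([" HS-grad", " 9th", " Some-college"])
print(summary)
```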

education-num
Numeric

Distinct count: 16
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 10.08067934
Minimum: 1
Maximum: 16
Zeros (%): 0.0%

Quantile statistics

Minimum: 1
5th percentile: 5
Q1: 9
Median: 10
Q3: 12
95th percentile: 14
Maximum: 16
Range: 15
Interquartile range: 3

Descriptive statistics

Standard deviation: 2.572720332
Coefficient of variation: 0.2552129916
Kurtosis: 0.6234440748
Mean: 10.08067934
MAD: 1.90304819
Skewness: -0.3116758679
Sum: 328237
Variance: 6.618889907
Memory size: 254.5 KiB

Histogram with fixed size bins (bins=16)

Value | Count | Frequency (%)
9 | 10501 | 32.3%
10 | 7291 | 22.4%
13 | 5355 | 16.4%
14 | 1723 | 5.3%
11 | 1382 | 4.2%
7 | 1175 | 3.6%
12 | 1067 | 3.3%
6 | 933 | 2.9%
4 | 646 | 2.0%
15 | 576 | 1.8%
Other values (6) | 1912 | 5.9%

Minimum 5 values

Value | Count | Frequency (%)
1 | 51 | 0.2%
2 | 168 | 0.5%
3 | 333 | 1.0%
4 | 646 | 2.0%
5 | 514 | 1.6%

Maximum 5 values

Value | Count | Frequency (%)
16 | 413 | 1.3%
15 | 576 | 1.8%
14 | 1723 | 5.3%
13 | 5355 | 16.4%
12 | 1067 | 3.3%

fnlwgt
Numeric

Distinct count: 21648
Unique (%): 66.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 189778.3665
Minimum: 12285
Maximum: 1484705
Zeros (%): 0.0%

Quantile statistics

Minimum: 12285
5th percentile: 39460
Q1: 117827
Median: 178356
Q3: 237051
95th percentile: 379682
Maximum: 1484705
Range: 1472420
Interquartile range: 119224

Descriptive statistics

Standard deviation: 105549.9777
Coefficient of variation: 0.5561749721
Kurtosis: 6.218810978
Mean: 189778.3665
MAD: 77608.21854
Skewness: 1.446980095
Sum: 6179373392
Variance: 1.114079779e+10
Memory size: 254.5 KiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 12285. 19258. 22154.5 26644.5 29808.5 ... 456939. 511885.5 610482. 766759. 1484705. ], "bayesian blocks" binning strategy used)

Value | Count | Frequency (%)
164190 | 13 | < 0.1%
203488 | 13 | < 0.1%
123011 | 13 | < 0.1%
113364 | 12 | < 0.1%
121124 | 12 | < 0.1%
126675 | 12 | < 0.1%
148995 | 12 | < 0.1%
123983 | 11 | < 0.1%
190290 | 11 | < 0.1%
126569 | 11 | < 0.1%
Other values (21638) | 32441 | 99.6%

Minimum 5 values

Value | Count | Frequency (%)
12285 | 1 | < 0.1%
13769 | 1 | < 0.1%
14878 | 1 | < 0.1%
18827 | 1 | < 0.1%
19214 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1484705 | 1 | < 0.1%
1455435 | 1 | < 0.1%
1366120 | 1 | < 0.1%
1268339 | 1 | < 0.1%
1226583 | 1 | < 0.1%

hours-per-week
Numeric

Distinct count: 94
Unique (%): 0.3%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 40.43745585
Minimum: 1
Maximum: 99
Zeros (%): 0.0%

Quantile statistics

Minimum: 1
5th percentile: 18
Q1: 40
Median: 40
Q3: 45
95th percentile: 60
Maximum: 99
Range: 98
Interquartile range: 5

Descriptive statistics

Standard deviation: 12.34742868
Coefficient of variation: 0.3053463286
Kurtosis: 2.916686796
Mean: 40.43745585
MAD: 7.58322751
Skewness: 0.2276425368
Sum: 1316684
Variance: 152.4589951
Memory size: 254.5 KiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 1. 3.5 6.5 7.5 8.5 ... 89.5 90.5 97.5 98.5 99. ], "bayesian blocks" binning strategy used)

Value | Count | Frequency (%)
40 | 15217 | 46.7%
50 | 2819 | 8.7%
45 | 1824 | 5.6%
60 | 1475 | 4.5%
35 | 1297 | 4.0%
20 | 1224 | 3.8%
30 | 1149 | 3.5%
55 | 694 | 2.1%
25 | 674 | 2.1%
48 | 517 | 1.6%
Other values (84) | 5671 | 17.4%

Minimum 5 values

Value | Count | Frequency (%)
1 | 20 | 0.1%
2 | 32 | 0.1%
3 | 39 | 0.1%
4 | 54 | 0.2%
5 | 60 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
99 | 85 | 0.3%
98 | 11 | < 0.1%
97 | 2 | < 0.1%
96 | 5 | < 0.1%
95 | 2 | < 0.1%

marital-status
Categorical

Distinct count: 7
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Top categories: Married-civ-spouse 14976; Never-married 10683; Divorced 4443; Other values (4) 2459
Value | Count | Frequency (%)
Married-civ-spouse | 14976 | 46.0%
Never-married | 10683 | 32.8%
Divorced | 4443 | 13.6%
Separated | 1025 | 3.1%
Widowed | 993 | 3.0%
Married-spouse-absent | 418 | 1.3%
Married-AF-spouse | 23 | 0.1%

Max length: 22
Mean length: 15.41405362
Min length: 8
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

native-country
Categorical

Distinct count: 42
Unique (%): 0.1%
Missing (%): 1.8%
Missing (n): 583
Top categories: United-States 29170; Mexico 643; Philippines 198; Other values (38) 1967; (Missing) 583
Value | Count | Frequency (%)
United-States | 29170 | 89.6%
Mexico | 643 | 2.0%
Philippines | 198 | 0.6%
Germany | 137 | 0.4%
Canada | 121 | 0.4%
Puerto-Rico | 114 | 0.4%
El-Salvador | 106 | 0.3%
India | 100 | 0.3%
Cuba | 95 | 0.3%
England | 90 | 0.3%
Other values (31) | 1204 | 3.7%
(Missing) | 583 | 1.8%

Max length: 27
Mean length: 13.31175332
Min length: 3
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

occupation
Categorical

Distinct count: 15
Unique (%): < 0.1%
Missing (%): 5.7%
Missing (n): 1843
Top categories: Prof-specialty 4140; Craft-repair 4099; Exec-managerial 4066; Other values (11) 18413
Value | Count | Frequency (%)
Prof-specialty | 4140 | 12.7%
Craft-repair | 4099 | 12.6%
Exec-managerial | 4066 | 12.5%
Adm-clerical | 3770 | 11.6%
Sales | 3650 | 11.2%
Other-service | 3295 | 10.1%
Machine-op-inspct | 2002 | 6.1%
Transport-moving | 1597 | 4.9%
Handlers-cleaners | 1370 | 4.2%
Farming-fishing | 994 | 3.1%
Other values (4) | 1735 | 5.3%
(Missing) | 1843 | 5.7%

Max length: 18
Mean length: 13.25849943
Min length: 3
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

race
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Top categories: White 27816; Black 3124; Asian-Pac-Islander 1039; Other values (2) 582
Value | Count | Frequency (%)
White | 27816 | 85.4%
Black | 3124 | 9.6%
Asian-Pac-Islander | 1039 | 3.2%
Amer-Indian-Eskimo | 311 | 1.0%
Other | 271 | 0.8%

Max length: 19
Mean length: 6.53898836
Min length: 6
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

relationship
Categorical

Distinct count: 6
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Top categories: Husband 13193; Not-in-family 8305; Own-child 5068; Other values (3) 5995
Value | Count | Frequency (%)
Husband | 13193 | 40.5%
Not-in-family | 8305 | 25.5%
Own-child | 5068 | 15.6%
Unmarried | 3446 | 10.6%
Wife | 1568 | 4.8%
Other-relative | 981 | 3.0%

Max length: 15
Mean length: 10.11974448
Min length: 5
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

sex
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Top categories: Male 21790; Female 10771
Value | Count | Frequency (%)
Male | 21790 | 66.9%
Female | 10771 | 33.1%

Max length: 7
Mean length: 5.661589018
Min length: 5
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

workclass
Categorical

Distinct count: 9
Unique (%): < 0.1%
Missing (%): 5.6%
Missing (n): 1836
Top categories: Private 22696; Self-emp-not-inc 2541; Local-gov 2093; Other values (5) 3395; (Missing) 1836
Value | Count | Frequency (%)
Private | 22696 | 69.7%
Self-emp-not-inc | 2541 | 7.8%
Local-gov | 2093 | 6.4%
State-gov | 1298 | 4.0%
Self-emp-inc | 1116 | 3.4%
Federal-gov | 960 | 2.9%
Without-pay | 14 | < 0.1%
Never-worked | 7 | < 0.1%
(Missing) | 1836 | 5.6%

Max length: 17
Mean length: 8.920794816
Min length: 3
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

Correlations

Missing values

Sample

First rows

index | age | capital-gain | capital-loss | education | education-num | fnlwgt | hours-per-week | marital-status | native-country | occupation | race | relationship | sex | workclass
0 | 39 | 2174 | 0 | Bachelors | 13 | 77516 | 40 | Never-married | United-States | Adm-clerical | White | Not-in-family | Male | State-gov
1 | 50 | 0 | 0 | Bachelors | 13 | 83311 | 13 | Married-civ-spouse | United-States | Exec-managerial | White | Husband | Male | Self-emp-not-inc
2 | 38 | 0 | 0 | HS-grad | 9 | 215646 | 40 | Divorced | United-States | Handlers-cleaners | White | Not-in-family | Male | Private
3 | 53 | 0 | 0 | 11th | 7 | 234721 | 40 | Married-civ-spouse | United-States | Handlers-cleaners | Black | Husband | Male | Private
4 | 28 | 0 | 0 | Bachelors | 13 | 338409 | 40 | Married-civ-spouse | Cuba | Prof-specialty | Black | Wife | Female | Private
5 | 37 | 0 | 0 | Masters | 14 | 284582 | 40 | Married-civ-spouse | United-States | Exec-managerial | White | Wife | Female | Private
6 | 49 | 0 | 0 | 9th | 5 | 160187 | 16 | Married-spouse-absent | Jamaica | Other-service | Black | Not-in-family | Female | Private
7 | 52 | 0 | 0 | HS-grad | 9 | 209642 | 45 | Married-civ-spouse | United-States | Exec-managerial | White | Husband | Male | Self-emp-not-inc
8 | 31 | 14084 | 0 | Masters | 14 | 45781 | 50 | Never-married | United-States | Prof-specialty | White | Not-in-family | Female | Private
9 | 42 | 5178 | 0 | Bachelors | 13 | 159449 | 40 | Married-civ-spouse | United-States | Exec-managerial | White | Husband | Male | Private

Last rows

index | age | capital-gain | capital-loss | education | education-num | fnlwgt | hours-per-week | marital-status | native-country | occupation | race | relationship | sex | workclass
32551 | 32 | 0 | 0 | 10th | 6 | 34066 | 40 | Married-civ-spouse | United-States | Handlers-cleaners | Amer-Indian-Eskimo | Husband | Male | Private
32552 | 43 | 0 | 0 | Assoc-voc | 11 | 84661 | 45 | Married-civ-spouse | United-States | Sales | White | Husband | Male | Private
32553 | 32 | 0 | 0 | Masters | 14 | 116138 | 11 | Never-married | Taiwan | Tech-support | Asian-Pac-Islander | Not-in-family | Male | Private
32554 | 53 | 0 | 0 | Masters | 14 | 321865 | 40 | Married-civ-spouse | United-States | Exec-managerial | White | Husband | Male | Private
32555 | 22 | 0 | 0 | Some-college | 10 | 310152 | 40 | Never-married | United-States | Protective-serv | White | Not-in-family | Male | Private
32556 | 27 | 0 | 0 | Assoc-acdm | 12 | 257302 | 38 | Married-civ-spouse | United-States | Tech-support | White | Wife | Female | Private
32557 | 40 | 0 | 0 | HS-grad | 9 | 154374 | 40 | Married-civ-spouse | United-States | Machine-op-inspct | White | Husband | Male | Private
32558 | 58 | 0 | 0 | HS-grad | 9 | 151910 | 40 | Widowed | United-States | Adm-clerical | White | Unmarried | Female | Private
32559 | 22 | 0 | 0 | HS-grad | 9 | 201490 | 20 | Never-married | United-States | Adm-clerical | White | Own-child | Male | Private
32560 | 52 | 15024 | 0 | HS-grad | 9 | 287927 | 40 | Married-civ-spouse | United-States | Exec-managerial | White | Wife | Female | Self-emp-inc