Overview

Dataset statistics

Number of variables: 3
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 235.7 KiB
Average record size in memory: 241.4 B
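The overview counts above can be reproduced with a few lines of standard-library Python. This is a sketch, not pandas-profiling's implementation. It assumes the data arrives as CSV text and, as a simplification, treats an empty field as a missing cell. The last line checks that the reported average record size is consistent with total size divided by the number of observations.

```python
import csv
import io


def dataset_statistics(rows: list[list[str]]) -> dict:
    """Overview counts for a table given as a header row plus data rows.

    Assumption: an empty string counts as a missing cell (csv.reader yields ""
    for an empty field); pandas would instead count NaN values.
    """
    header, data = rows[0], rows[1:]
    n_vars, n_obs = len(header), len(data)
    n_cells = n_vars * n_obs
    missing = sum(1 for row in data for cell in row if cell == "")

    # A row is a duplicate if an identical row appeared earlier in the table.
    seen = set()
    duplicates = 0
    for row in data:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


sample = 'russian,english,part of speech\nи,"and, though",conjunction\nв,"in, at",preposition\n'
stats = dataset_statistics(list(csv.reader(io.StringIO(sample))))
print(stats)

# Average record size: total size (KiB -> bytes) divided by the observations.
print(round(235.7 * 1024 / 1000, 1))  # 241.4
```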

Variable types

Categorical: 3

Reproduction

Analysis started: 2020-02-14 00:01:11.687667
Analysis finished: 2020-02-14 00:01:12.883221
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

russian has a high cardinality: 995 distinct values
english has a high cardinality: 961 distinct values
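A high-cardinality warning of this kind amounts to comparing a column's distinct count with a threshold. A minimal sketch of that check follows. The threshold of 50 and the helper name are assumptions made for illustration; the value the report actually used is whatever config.yaml sets.

```python
# HYPOTHETICAL threshold: 50 is an assumed value, not read from config.yaml.
CARDINALITY_THRESHOLD = 50


def high_cardinality_warnings(columns: dict[str, list[str]],
                              threshold: int = CARDINALITY_THRESHOLD) -> list[str]:
    """Flag every column whose number of distinct values exceeds the threshold."""
    warnings = []
    for name, values in columns.items():
        distinct = len(set(values))
        if distinct > threshold:
            warnings.append(f"{name} has a high cardinality: {distinct} distinct values")
    return warnings


# Synthetic columns: 1000 values with 995 distinct, and 1000 values with 3 distinct.
columns = {
    "russian": [f"word{i}" for i in range(995)] + [f"word{i}" for i in range(5)],
    "part of speech": ["noun", "verb", "adjective"] * 333 + ["noun"],
}
print(high_cardinality_warnings(columns))
# ['russian has a high cardinality: 995 distinct values']
```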

Variables

russian
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 995
Unique (%): 99.5%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
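The UNIFORM flag says the value counts look evenly spread; for russian, 990 values occur once and 5 occur twice. One way to test for that is a chi-squared goodness-of-fit test against equal expected counts. We assume, without having checked the v2.5.0 source, that the flag rests on a test of this kind with a p-value cut-off. The sketch approximates the chi-squared tail with the Wilson-Hilferty transform rather than an exact distribution, and both the function name and the 0.999 cut-off are hypothetical.

```python
import math


def uniform_pvalue(counts: list[int]) -> float:
    """Approximate p-value of a chi-squared goodness-of-fit test of the observed
    category counts against equal expected counts.

    The chi-squared tail is approximated with the Wilson-Hilferty transform
    instead of an exact CDF, so only the standard library is needed.
    """
    k = len(counts)
    total = sum(counts)
    if k < 2 or total == 0:
        return 1.0
    expected = total / k
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    dof = k - 1
    # Wilson-Hilferty: (X / dof) ** (1/3) is roughly normal with
    # mean 1 - 2/(9*dof) and variance 2/(9*dof).
    mean = 1 - 2 / (9 * dof)
    sd = math.sqrt(2 / (9 * dof))
    z = ((statistic / dof) ** (1 / 3) - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


UNIFORM_THRESHOLD = 0.999  # assumed cut-off, not taken from config.yaml

russian_counts = [2] * 5 + [1] * 990  # shape of the russian frequency table
print(uniform_pvalue(russian_counts) > UNIFORM_THRESHOLD)  # True
```

A heavily skewed column such as part of speech gives a p-value near zero under the same test, which is consistent with it carrying no UNIFORM flag.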
Value               Count  Frequency (%)
знать                   2           0.2%
мало                    2           0.2%
пора                    2           0.2%
много                   2           0.2%
что                     2           0.2%
тяжёлый                 1           0.1%
улица                   1           0.1%
готовый                 1           0.1%
заниматься              1           0.1%
зато                    1           0.1%
Other values (985)    985          98.5%
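Each frequency table lists the ten most common values and folds everything else into one "Other values (k)" row, where k is the number of remaining distinct values. A sketch of that aggregation with collections.Counter, run on a synthetic column with the same shape as russian (the function name is hypothetical):

```python
from collections import Counter


def frequency_table(values: list[str], top_n: int = 10) -> list[tuple[str, int, str]]:
    """Top-n value counts, then one row aggregating every remaining value.

    Percentages are relative to all observations and rounded to one decimal.
    Ties keep first-seen order (Counter.most_common behaviour), which may not
    match the order pandas-profiling uses.
    """
    counts = Counter(values)
    total = len(values)
    top = counts.most_common(top_n)
    rows = [(value, n, f"{100 * n / total:.1f}%") for value, n in top]
    remaining_distinct = len(counts) - len(top)
    if remaining_distinct > 0:
        remaining = total - sum(n for _, n in top)
        rows.append((f"Other values ({remaining_distinct})", remaining,
                     f"{100 * remaining / total:.1f}%"))
    return rows


# Synthetic column shaped like "russian": 5 values twice, 990 values once.
values = [f"w{i}" for i in range(995)] + [f"w{i}" for i in range(5)]
for row in frequency_table(values):
    print(row)
```

The last row printed is ('Other values (985)', 985, '98.5%'), matching the table above.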
 

Length

Max length: 19
Mean length: 6.117
Min length: 1
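The length statistics are the maximum, mean and minimum string length over all 1000 values. The sketch below assumes length means Python's len(), that is, a count of code points; under that assumption a letter followed by a combining accent counts as two characters. The helper name is hypothetical.

```python
def length_stats(values: list[str]) -> dict[str, float]:
    """Max, mean and min string length, measured in Unicode code points."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "mean": round(sum(lengths) / len(lengths), 3),
        "min": min(lengths),
    }


print(length_stats(["и", "знать", "использовать"]))
# {'max': 12, 'mean': 6.0, 'min': 1}
```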
Distinct characters by Unicode general category
Category            Count  Frequency (%)
Lowercase_Letter       34          79.1%
Uppercase_Letter        3           7.0%
Decimal_Number          2           4.7%
Close_Punctuation       1           2.3%
Space_Separator         1           2.3%
Open_Punctuation        1           2.3%
Other_Punctuation       1           2.3%
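The counts in this table sum to 43, far fewer than the roughly 6,117 characters implied by a mean length of 6.117 over 1000 values, so they count distinct characters. The sketch below builds that kind of table with unicodedata.category, which returns two-letter codes such as Ll. The mapping to long names like Lowercase_Letter is written out by hand, covers only the categories that appear in this report, and the helper name is hypothetical.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter codes returned by unicodedata.category.
# Only the categories seen in this report are listed; any other code is
# reported as-is.
CATEGORY_NAMES = {
    "Lu": "Uppercase_Letter", "Ll": "Lowercase_Letter", "Mn": "Nonspacing_Mark",
    "Nd": "Decimal_Number", "Pd": "Dash_Punctuation", "Ps": "Open_Punctuation",
    "Pe": "Close_Punctuation", "Po": "Other_Punctuation", "Zs": "Space_Separator",
}


def category_table(values: list[str]) -> list[tuple[str, int, str]]:
    """Count the distinct characters of a column per Unicode general category."""
    distinct_chars = set("".join(values))
    counts = Counter(
        CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
        for ch in distinct_chars
    )
    total = len(distinct_chars)
    return [(name, n, f"{100 * n / total:.1f}%") for name, n in counts.most_common()]


# "Ёж" and "ёж (1)", written with escapes so the code points are unambiguous.
for row in category_table(["\u0401\u0436", "\u0451\u0436 (1)"]):
    print(row)
```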
 
Distinct characters by Unicode script
Script              Count  Frequency (%)
Cyrillic               35          81.4%
Common                  6          14.0%
Latin                   2           4.7%
 
Distinct characters by Unicode block
Block               Count  Frequency (%)
Cyrillic               35          81.4%
ASCII                   8          18.6%
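Unicode blocks are fixed code-point ranges, so classifying a character by block is a range lookup. Python's standard library has no block table, so the sketch below hard-codes a handful of ranges as an assumption; it is not how pandas-profiling resolves blocks, and the helper names are hypothetical.

```python
from collections import Counter

# A hand-picked subset of Unicode blocks, as inclusive code-point ranges.
# U+0000..U+007F (the Basic Latin block) is labelled "ASCII" as in the report;
# the other labels are standard block names and may not match the report's
# shorter labels.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x2000, 0x206F, "General Punctuation"),
]


def block_of(ch: str) -> str:
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def block_counts(values: list[str]) -> Counter:
    """Count the distinct characters of a column per Unicode block."""
    return Counter(block_of(ch) for ch in set("".join(values)))


print(block_counts(["пора", "it's time"]))
# Counter({'ASCII': 7, 'Cyrillic': 4})
```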
 

english
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 961
Unique (%): 96.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
Value                  Count  Frequency (%)
to fit, fall; have to      3           0.3%
to ask                     3           0.3%
Russian                    2           0.2%
before, in front of        2           0.2%
to put, place, set         2           0.2%
night                      2           0.2%
Moscow                     2           0.2%
glass                      2           0.2%
or                         2           0.2%
to send                    2           0.2%
Other values (951)       978          97.8%
 

Length

Max length: 130
Mean length: 13.169
Min length: 1
Distinct characters by Unicode general category
Category            Count  Frequency (%)
Lowercase_Letter       44          58.7%
Decimal_Number         10          13.3%
Other_Punctuation      10          13.3%
Uppercase_Letter        6           8.0%
Close_Punctuation       1           1.3%
Space_Separator         1           1.3%
Nonspacing_Mark         1           1.3%
Open_Punctuation        1           1.3%
Dash_Punctuation        1           1.3%
 
Distinct characters by Unicode script
Script              Count  Frequency (%)
Latin                  32          42.7%
Common                 24          32.0%
Cyrillic               18          24.0%
Inherited               1           1.3%
 
Distinct characters by Unicode block
Block               Count  Frequency (%)
ASCII                  55          73.3%
Cyrillic               18          24.0%
Punctuation             1           1.3%
Diacriticals            1           1.3%
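The english column contains a single nonspacing mark (counted under Nonspacing_Mark, the Inherited script and the Diacriticals block). A combining mark attaches to the character before it, so one way to locate such values and see their composed form is to check unicodedata.category for "Mn" and apply NFC normalization. The report does not say which value carries the mark, and the helper name is hypothetical.

```python
import unicodedata


def values_with_combining_marks(values: list[str]) -> list[tuple[str, str]]:
    """Return (value, NFC-normalized value) for every value containing a
    nonspacing mark (general category Mn)."""
    hits = []
    for value in values:
        if any(unicodedata.category(ch) == "Mn" for ch in value):
            hits.append((value, unicodedata.normalize("NFC", value)))
    return hits


# "cafe" + U+0301 COMBINING ACUTE ACCENT composes to "café" under NFC.
print(values_with_combining_marks(["cafe\u0301", "tea"]))
```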
 

part of speech
Categorical

Distinct count: 37
Unique (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
Value               Count  Frequency (%)
noun                  374          37.4%
verb                  232          23.2%
adjective             127          12.7%
adverb                112          11.2%
preposition            37           3.7%
pronoun                36           3.6%
conjunction            12           1.2%
misc                   12           1.2%
cardinal number        11           1.1%
particle                7           0.7%
Other values (27)      40           4.0%
 

Length

Max length: 26
Mean length: 5.885
Min length: 3
Distinct characters by Unicode general category
Category            Count  Frequency (%)
Lowercase_Letter       20          83.3%
Open_Punctuation        1           4.2%
Close_Punctuation       1           4.2%
Other_Punctuation       1           4.2%
Space_Separator         1           4.2%
 
Distinct characters by Unicode script
Script              Count  Frequency (%)
Latin                  19          79.2%
Common                  4          16.7%
Cyrillic                1           4.2%
 
Distinct characters by Unicode block
Block               Count  Frequency (%)
ASCII                  23          95.8%
Cyrillic                1           4.2%
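The part of speech values are English words, yet the script and block tables each count one Cyrillic character. In the first rows of the sample, the value for что appears to begin with Cyrillic с (U+0441) instead of Latin c, which would explain it. A rough way to find such mixed-script values is to take the first word of each letter's Unicode name. This heuristic and the helper names are assumptions, since Python's standard library exposes no Script property.

```python
import unicodedata


def letter_scripts(value: str) -> set[str]:
    """Rough script of each letter, taken from the first word of its Unicode name.

    Heuristic: 'LATIN SMALL LETTER C' -> 'LATIN',
    'CYRILLIC SMALL LETTER ES' -> 'CYRILLIC'.
    """
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in value if ch.isalpha()}


def mixed_script_values(values: list[str]) -> list[str]:
    """Values whose letters come from more than one script."""
    return [v for v in values if len(letter_scripts(v)) > 1]


# The second value starts with U+0441 CYRILLIC SMALL LETTER ES.
print(mixed_script_values(["conjunction", "\u0441onjunction, pronoun", "noun"]))
```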
 

Missing values

Sample

First rows

     russian       english              part of speech
  0  и             and, though          conjunction
  1  в             in, at               preposition
  2  не            not                  particle
  3  он            he                   pronoun
  4  на            on, it, at, to       preposition
  5  я             I                    pronoun
  6  что           what, that, why      сonjunction, pronoun
  7  тот           that                 adjective, pronoun
  8  быть          to be                verb
  9  с             with, and, from, of  preposition

Last rows

     russian       english                                     part of speech
990  художник      painter, artist                             noun
991  знак          sign                                        noun
992  завод         factory                                     noun
993  кулак         fist                                        noun
994  использовать  to use, utilize, make use of                verb
995  стакан        glass                                       noun
996  пахнуть       to smell                                    verb
997  отсюда        from here                                   adverb
998  рот           mouth                                       noun
999  пора          it's time;at times, now and then(See #279)  misc