Summaries example
The following example summarizes several columns of visions_string
types and reports the Unicode scripts, categories, and blocks found in each.
summaries_example.py
import pandas as pd
from visions.application.summaries import CompleteSummary
from visions.functional import detect_type
from visions.typesets import CompleteSet
# Create a DataFrame with various string columns
df = pd.DataFrame(
    {
        "latin": ["orange", "apple", "pear"],
        "cyrillic": ["Кириллица", "гласность", "демократија"],
        "mixed": ["Кириллица", "soep", "демократија"],
        "burmese": ["ရေကြီးခြင်း", "စက်သင်ယူမှု", "ဉာဏ်ရည်တု"],
        "digits": ["01234", "121223", "123123"],
        "specials": ["$", "%^&*(", "!!!~``"],
        "whitespace": ["\t", "\n", " "],
        "jiddisch": ["רעכט צו לינקס", "שאָסיי 61", "פּיצאַ איז אָנגענעם"],
        "arabic": ["بوب ديلان", "باتي فالنتين", "السيد الدف الرجل"],
        "playing_cards": ["🂶", "🃁", "🂻"],
    }
)
# Initialize the typeset
typeset = CompleteSet()
# Infer the column type
types = detect_type(df, typeset)
# Generate a summary
summarizer = CompleteSummary()
summary = summarizer.summarize(df, types)
# Print the table header
print(
    "| {h1: <15}| {h2: <17}| {h3: <84}| {h4: <25}|".format(
        h1="Column", h2="Scripts", h3="Categories", h4="Blocks"
    )
)
print("{e:-<17}+{e:-<18}+{e:-<85}+{e:-<26}+".format(e=""))
# Print one row per column. set() removes duplicates but has no fixed order,
# so the order of values within a cell may differ between runs.
for column, variable_summary in summary["series"].items():
    scripts = ", ".join(set(variable_summary["script_values"].values()))
    categories = ", ".join(set(variable_summary["category_alias_values"].values()))
    blocks = ", ".join(set(variable_summary["block_values"].values()))
    print(
        "| {column: <15}| {scripts: <17}| {categories: <84}| {blocks: <25}|".format(
            column=column, scripts=scripts, categories=categories, blocks=blocks
        )
    )
Which prints:
| Column         | Scripts          | Categories                                                                          | Blocks                   |
-----------------+------------------+-------------------------------------------------------------------------------------+--------------------------+
| latin          | Latin            | Lowercase_Letter                                                                    | Basic Latin              |
| cyrillic       | Cyrillic         | Lowercase_Letter, Uppercase_Letter                                                  | Cyrillic                 |
| mixed          | Latin, Cyrillic  | Lowercase_Letter, Uppercase_Letter                                                  | Basic Latin, Cyrillic    |
| burmese        | Myanmar          | Nonspacing_Mark, Spacing_Mark, Other_Letter                                         | Myanmar                  |
| digits         | Common           | Decimal_Number                                                                      | Basic Latin              |
| specials       | Common           | Modifier_Symbol, Currency_Symbol, Math_Symbol, Other_Punctuation, Open_Punctuation  | Basic Latin              |
| whitespace     | Common           | Space_Separator, Control                                                            | Basic Latin              |
| jiddisch       | Hebrew, Common   | Space_Separator, Nonspacing_Mark, Other_Letter, Decimal_Number                      | Basic Latin, Hebrew      |
| arabic         | Arabic, Common   | Space_Separator, Other_Letter                                                       | Basic Latin, Arabic      |
| playing_cards  | Common           | Other_Symbol                                                                        | Playing Cards            |
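The Categories column lists Unicode general categories. As a rough illustration of where such values come from, the sketch below derives them with Python's standard unicodedata module instead of visions. The helper category_aliases and the CATEGORY_ALIASES mapping are hypothetical names introduced here, and the mapping covers only the categories that appear in the table above. Scripts and blocks are not exposed by unicodedata, so they are left out.

```python
import unicodedata

# Hypothetical mapping (not part of visions): two-letter general category
# codes to the long aliases shown in the table above.
CATEGORY_ALIASES = {
    "Ll": "Lowercase_Letter",
    "Lu": "Uppercase_Letter",
    "Lo": "Other_Letter",
    "Mn": "Nonspacing_Mark",
    "Mc": "Spacing_Mark",
    "Nd": "Decimal_Number",
    "Zs": "Space_Separator",
    "Cc": "Control",
    "Sc": "Currency_Symbol",
    "Sk": "Modifier_Symbol",
    "Sm": "Math_Symbol",
    "So": "Other_Symbol",
    "Po": "Other_Punctuation",
    "Ps": "Open_Punctuation",
}


def category_aliases(values):
    """Return the sorted general-category aliases of all characters in values."""
    codes = {unicodedata.category(char) for value in values for char in value}
    # Fall back to the raw two-letter code for categories missing from the map.
    return sorted(CATEGORY_ALIASES.get(code, code) for code in codes)


print(category_aliases(["orange", "apple", "pear"]))   # ['Lowercase_Letter']
print(category_aliases(["01234", "121223", "123123"]))  # ['Decimal_Number']
print(category_aliases(["\t", "\n", " "]))              # ['Control', 'Space_Separator']
```

Sorting the result keeps the output stable between runs, unlike joining a bare set as in the example above.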