Module lingua.writer

Functions

def check_input_file_path(input_file_path: pathlib.Path)
def check_output_directory_path(output_directory_path: pathlib.Path)
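
The two helper functions are listed without a description. Judging only from the Raises sections of the writer methods in this module, they presumably enforce the path requirements stated there. The following is a minimal sketch of such checks under that assumption. The `sketch_` names are hypothetical and this is not lingua's actual implementation.

```python
from pathlib import Path


def sketch_check_input_file_path(input_file_path: Path) -> None:
    """Hypothetical stand-in: reject anything but an absolute path to an existing txt file."""
    if not input_file_path.is_absolute():
        raise Exception(f"input file path '{input_file_path}' is not absolute")
    if not input_file_path.is_file():
        raise Exception(f"input file '{input_file_path}' does not exist")
    if input_file_path.suffix != ".txt":
        raise Exception(f"input file '{input_file_path}' is not a txt file")


def sketch_check_output_directory_path(output_directory_path: Path) -> None:
    """Hypothetical stand-in: reject anything but an absolute path to an existing directory."""
    if not output_directory_path.is_absolute():
        raise Exception(f"output directory path '{output_directory_path}' is not absolute")
    if not output_directory_path.is_dir():
        raise Exception(f"output directory '{output_directory_path}' does not exist")
```

Both functions return None when the path is acceptable and raise otherwise, which mirrors the plain Exception documented for the writer methods.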

Classes

class LanguageModelFilesWriter

This class creates language model files and writes them to a directory.

Static methods

def create_and_write_language_model_files(input_file_path: pathlib.Path, output_directory_path: pathlib.Path, language: Language, char_class: str)

Create language model files and write them to a directory.

Args

input_file_path
The path to a txt file used for language model creation. The assumed encoding of the txt file is UTF-8.
output_directory_path
The path to an existing directory where the language model files are to be written.
language
The language for which to create language models.
char_class
A regex character class such as \p{L} to restrict the set of characters that the language models are built from.

Raises

Exception
if the input file path is not absolute or does not point to an existing txt file; if the input file's encoding is not UTF-8; if the output directory path is not absolute or does not point to an existing directory; if the character class cannot be compiled to a valid regular expression
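
A possible way to call this method is sketched below. The import paths (`lingua.writer` for the writer, `lingua` for `Language`), the `Language.ENGLISH` member, and the helper names `prepare_paths` and `write_english_models` are assumptions made for illustration. Only the method name and its four parameters come from the signature above.

```python
from pathlib import Path
from typing import Tuple


def prepare_paths(corpus_text: str, workdir: Path) -> Tuple[Path, Path]:
    """Write a UTF-8 txt corpus and create an output directory; return both as absolute paths."""
    base = workdir.resolve()  # the writer requires absolute paths
    input_file = base / "corpus.txt"
    input_file.write_text(corpus_text, encoding="utf-8")
    output_dir = base / "models"
    output_dir.mkdir(exist_ok=True)  # the writer requires an existing directory
    return input_file, output_dir


def write_english_models(input_file: Path, output_dir: Path) -> None:
    """Build language model files for English from the given corpus."""
    # Imported lazily so prepare_paths stays usable without lingua installed.
    # Both import paths are assumptions; adjust them to the installed version.
    from lingua import Language
    from lingua.writer import LanguageModelFilesWriter

    LanguageModelFilesWriter.create_and_write_language_model_files(
        input_file_path=input_file,
        output_directory_path=output_dir,
        language=Language.ENGLISH,
        char_class="\\p{L}",  # the character class used as an example in Args
    )
```

Passing the arguments by keyword keeps the call readable and guards against mixing up the two path parameters.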

class TestDataFilesWriter

This class creates test data files for accuracy report generation and writes them to a directory.

Static methods

def create_and_write_test_data_files(input_file_path: pathlib.Path, output_directory_path: pathlib.Path, char_class: str, maximum_lines: int)

Create test data files for accuracy report generation and write them to a directory.

Args

input_file_path
The path to a txt file used for test data creation. The assumed encoding of the txt file is UTF-8.
output_directory_path
The path to an existing directory where the test data files are to be written.
char_class
A regex character class such as \p{L} to restrict the set of characters that the test data are built from.
maximum_lines
The maximum number of lines each test data file should have.

Raises

Exception
if the input file path is not absolute or does not point to an existing txt file; if the input file's encoding is not UTF-8; if the output directory path is not absolute or does not point to an existing directory; if the character class cannot be compiled to a valid regular expression
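
A call to this method might look like the sketch below. The import path `lingua.writer`, the default of 1000 lines, and the helper names `build_test_data_arguments` and `write_test_data` are assumptions for illustration. The keyword names are taken from the signature above.

```python
from pathlib import Path
from typing import Any, Dict


def build_test_data_arguments(
    input_file: Path, output_dir: Path, maximum_lines: int
) -> Dict[str, Any]:
    """Collect the keyword arguments, converting both paths to absolute ones."""
    return {
        "input_file_path": input_file.absolute(),
        "output_directory_path": output_dir.absolute(),
        "char_class": "\\p{L}",  # the character class used as an example in Args
        "maximum_lines": maximum_lines,
    }


def write_test_data(input_file: Path, output_dir: Path, maximum_lines: int = 1000) -> bool:
    """Return True if the files were written, False if the writer raised."""
    # Import path is an assumption; adjust it to the installed version.
    from lingua.writer import TestDataFilesWriter

    arguments = build_test_data_arguments(input_file, output_dir, maximum_lines)
    try:
        TestDataFilesWriter.create_and_write_test_data_files(**arguments)
    except Exception as error:
        # The method documents a plain Exception for every failure case,
        # so the message is the only way to tell the causes apart.
        print(f"test data files were not written: {error}")
        return False
    return True
```

Because every documented failure surfaces as a plain Exception, the sketch reports the message instead of branching on an exception type.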