Module lingua.builder

Classes

class LanguageDetectorBuilder (languages: frozenset)

This class configures and creates an instance of LanguageDetector.

Static methods

def from_all_languages() ‑> LanguageDetectorBuilder

Create and return an instance of LanguageDetectorBuilder with all built-in languages.

def from_all_languages_with_arabic_script() ‑> LanguageDetectorBuilder

Create and return an instance of LanguageDetectorBuilder with all built-in languages supporting the Arabic script.

def from_all_languages_with_cyrillic_script() ‑> LanguageDetectorBuilder

Create and return an instance of LanguageDetectorBuilder with all built-in languages supporting the Cyrillic script.

def from_all_languages_with_devanagari_script() ‑> LanguageDetectorBuilder

Create and return an instance of LanguageDetectorBuilder with all built-in languages supporting the Devanagari script.

def from_all_languages_with_latin_script() ‑> LanguageDetectorBuilder

Create and return an instance of LanguageDetectorBuilder with all built-in languages supporting the Latin script.

def from_all_languages_without(*languages: Language) ‑> LanguageDetectorBuilder

Create and return an instance of LanguageDetectorBuilder with all built-in languages except those passed to this method.

def from_all_spoken_languages() ‑> LanguageDetectorBuilder

Create and return an instance of LanguageDetectorBuilder with all built-in spoken languages.

def from_iso_codes_639_1(*iso_codes: IsoCode639_1) ‑> LanguageDetectorBuilder

Create and return an instance of LanguageDetectorBuilder with the languages specified by the ISO 639-1 codes passed to this method.

Raises

ValueError
if fewer than two ISO codes are specified

def from_iso_codes_639_3(*iso_codes: IsoCode639_3) ‑> LanguageDetectorBuilder

Create and return an instance of LanguageDetectorBuilder with the languages specified by the ISO 639-3 codes passed to this method.

Raises

ValueError
if fewer than two ISO codes are specified

def from_languages(*languages: Language) ‑> LanguageDetectorBuilder

Create and return an instance of LanguageDetectorBuilder with the languages passed to this method.
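The static methods above differ only in how the language set is chosen. The following is a minimal sketch of three of them side by side. It assumes the lingua package is installed and that Language and IsoCode639_1 can be imported from its top-level package; the function name example_builders and the dictionary keys are hypothetical and exist only for illustration.

```python
def example_builders():
    """Return three builders that differ only in how the language set is chosen."""
    # Imported lazily so this snippet still loads where lingua is not installed.
    from lingua import IsoCode639_1, Language, LanguageDetectorBuilder

    return {
        # An explicit allow-list of languages.
        "explicit": LanguageDetectorBuilder.from_languages(
            Language.ENGLISH, Language.GERMAN
        ),
        # Every built-in language except the ones listed.
        "all_but_spanish": LanguageDetectorBuilder.from_all_languages_without(
            Language.SPANISH
        ),
        # The same allow-list as "explicit", expressed as ISO 639-1 codes.
        "by_iso_code": LanguageDetectorBuilder.from_iso_codes_639_1(
            IsoCode639_1.EN, IsoCode639_1.DE
        ),
    }
```

Each value is a LanguageDetectorBuilder, so further configuration methods can be chained onto it before build() is called.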

Methods

def build(self) ‑> LanguageDetector

Create and return the configured LanguageDetector instance.
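As a rough usage sketch, a builder is typically configured once and then turned into a detector with build(). The example below assumes the lingua package is installed; build_detector is a hypothetical helper name, and detect_language_of is a LanguageDetector method that is documented outside this module.

```python
def build_detector():
    """Build a detector that only chooses among four languages."""
    # Imported lazily so this snippet still loads where lingua is not installed.
    from lingua import Language, LanguageDetectorBuilder

    builder = LanguageDetectorBuilder.from_languages(
        Language.ENGLISH, Language.FRENCH, Language.GERMAN, Language.SPANISH
    )
    # build() finalizes the configuration and returns a LanguageDetector.
    return builder.build()
```

A call such as build_detector().detect_language_of("languages are awesome") then returns either a Language member or None. Building the detector once and reusing it is usually preferable to rebuilding it for every input.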

def with_minimum_relative_distance(self, distance: float) ‑> LanguageDetectorBuilder

Set the desired value for the minimum relative distance measure.

By default, Lingua returns the most likely language for a given input text. However, some words are spelled the same in more than one language. The word 'prologue', for instance, is valid in both English and French. Lingua would output either English or French, which might be wrong in the given context. For cases like that, it is possible to specify a minimum relative distance that the logarithmized and summed-up probabilities for each possible language have to satisfy.

Be aware that the distance between the language probabilities depends on the length of the input text. The longer the input text, the larger the distance between the languages. So if you want to classify very short text phrases, do not set the minimum relative distance too high. Otherwise, most results will be returned as None, which is the return value for cases where language detection is not reliably possible.

Raises

ValueError
if distance is smaller than 0.0 or greater than 0.99
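The effect of the threshold can be illustrated with a toy decision rule. This is not Lingua's actual computation: the function pick_language, the softmax normalization, and the sample scores are all assumptions made for illustration. The sketch only shows why a small lead between two candidates leads to None and why longer inputs tend to produce larger gaps.

```python
import math
from typing import Mapping, Optional


def pick_language(log_probs: Mapping[str, float], min_distance: float = 0.0) -> Optional[str]:
    """Toy rule: return the top language only if it leads by at least min_distance."""
    if not 0.0 <= min_distance <= 0.99:
        raise ValueError("min_distance must lie between 0.0 and 0.99")
    if len(log_probs) < 2:
        raise ValueError("at least two candidate languages are required")
    # Turn summed log-probabilities into confidences that add up to 1 (softmax).
    peak = max(log_probs.values())
    weights = {lang: math.exp(lp - peak) for lang, lp in log_probs.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, lang) for lang, w in weights.items()), reverse=True)
    (best, best_lang), (second, _) = ranked[0], ranked[1]
    # A tie, or a lead below the threshold, counts as "not reliably detectable".
    if best == second or best - second < min_distance:
        return None
    return best_lang


# Hypothetical scores: a short text yields a small gap, a longer text a larger one.
short_text = {"ENGLISH": -9.8, "FRENCH": -10.0}
long_text = {"ENGLISH": -98.0, "FRENCH": -100.0}
print(pick_language(short_text, 0.25))  # None
print(pick_language(long_text, 0.25))   # ENGLISH
```

With the library itself, the threshold is set by calling with_minimum_relative_distance on the builder before build(), for example with a value such as 0.25.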

def with_preloaded_language_models(self) ‑> LanguageDetectorBuilder

Preload all language models when creating the LanguageDetector instance.

By default, Lingua uses lazy loading: only those language models that the rule-based filter engine considers relevant are loaded, and they are loaded on demand. For web services, for instance, it is beneficial to preload all language models into memory to avoid unexpected latency while waiting for the service response. This method switches the builder from lazy loading to preloading.
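The trade-off between the two loading modes can be sketched with a toy cache. The class ModelStore and its loader callback are hypothetical and do not mirror Lingua's internals; they only show where the loading cost is paid in each mode.

```python
from typing import Callable, Dict, Iterable, List


class ModelStore:
    """Toy model cache that loads either up front (preload) or on first use (lazy)."""

    def __init__(
        self,
        names: Iterable[str],
        loader: Callable[[str], object],
        preload: bool = False,
    ) -> None:
        self._names = list(names)
        self._loader = loader
        self._cache: Dict[str, object] = {}
        if preload:
            # Pay the whole loading cost at construction time.
            for name in self._names:
                self._cache[name] = loader(name)

    def get(self, name: str) -> object:
        # Lazy path: load a model the first time it is requested, then reuse it.
        if name not in self._cache:
            self._cache[name] = self._loader(name)
        return self._cache[name]

    def loaded(self) -> List[str]:
        """Return the names of the models currently held in memory."""
        return sorted(self._cache)
```

In the lazy mode the first request for a model is slower than later ones; in the preload mode construction is slower but every request afterwards hits the cache. With the library itself, preloading is requested by calling with_preloaded_language_models on the builder before build().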