Module lingua.builder
Classes
class LanguageDetectorBuilder (languages: frozenset)
-
This class configures and creates an instance of LanguageDetector.
Static methods
def from_all_languages() ‑> LanguageDetectorBuilder
-
Create and return an instance of LanguageDetectorBuilder with all built-in languages.
def from_all_languages_with_arabic_script() ‑> LanguageDetectorBuilder
-
Create and return an instance of LanguageDetectorBuilder with all built-in languages supporting the Arabic script.
def from_all_languages_with_cyrillic_script() ‑> LanguageDetectorBuilder
-
Create and return an instance of LanguageDetectorBuilder with all built-in languages supporting the Cyrillic script.
def from_all_languages_with_devanagari_script() ‑> LanguageDetectorBuilder
-
Create and return an instance of LanguageDetectorBuilder with all built-in languages supporting the Devanagari script.
def from_all_languages_with_latin_script() ‑> LanguageDetectorBuilder
-
Create and return an instance of LanguageDetectorBuilder with all built-in languages supporting the Latin script.
def from_all_languages_without(*languages: Language) ‑> LanguageDetectorBuilder
-
Create and return an instance of LanguageDetectorBuilder with all built-in languages except those passed to this method.
def from_all_spoken_languages() ‑> LanguageDetectorBuilder
-
Create and return an instance of LanguageDetectorBuilder with all built-in spoken languages.
def from_iso_codes_639_1(*iso_codes: IsoCode639_1) ‑> LanguageDetectorBuilder
-
Create and return an instance of LanguageDetectorBuilder with the languages specified by the ISO 639-1 codes passed to this method.
Raises
ValueError
- if fewer than two ISO codes are specified
def from_iso_codes_639_3(*iso_codes: IsoCode639_3) ‑> LanguageDetectorBuilder
-
Create and return an instance of LanguageDetectorBuilder with the languages specified by the ISO 639-3 codes passed to this method.
Raises
ValueError
- if fewer than two ISO codes are specified
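As a rough sketch of how the ISO-code constructors and their ValueError might be handled, the helper below looks up ISO 639-1 members by name and returns None when Lingua rejects the selection. The helper name builder_for_codes is hypothetical, and the getattr lookup assumes the enum members are named with upper-case codes such as EN and DE.

```python
def builder_for_codes(*codes):
    """Return a builder for two-letter ISO 639-1 codes, or None if rejected.

    Hypothetical helper for illustration; not part of Lingua itself.
    """
    # Imported here so the helper can be defined without Lingua installed.
    from lingua import IsoCode639_1, LanguageDetectorBuilder

    # Assumption: members are named EN, DE, FR, ... (upper-case codes).
    # An unknown code raises AttributeError from getattr.
    iso_codes = [getattr(IsoCode639_1, code.upper()) for code in codes]
    try:
        return LanguageDetectorBuilder.from_iso_codes_639_1(*iso_codes)
    except ValueError:
        # Raised when fewer than two ISO codes are specified.
        return None
```

Calling builder_for_codes("en", "de") should yield a builder, while builder_for_codes("en") should yield None because a single code is not enough.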
def from_languages(*languages: Language) ‑> LanguageDetectorBuilder
-
Create and return an instance of LanguageDetectorBuilder with the languages passed to this method.
Methods
def build(self) ‑> LanguageDetector
-
Create and return the configured LanguageDetector instance.
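A minimal sketch of the typical flow: pick a language set with one of the static methods, then call build(). The function names build_detector and detect are hypothetical, and detect_language_of is the LanguageDetector method assumed here for running the detection.

```python
def build_detector():
    """Build a detector restricted to three languages (illustrative choice)."""
    # Imported here so the sketch can be defined without Lingua installed.
    from lingua import Language, LanguageDetectorBuilder

    return (
        LanguageDetectorBuilder
        .from_languages(Language.ENGLISH, Language.FRENCH, Language.GERMAN)
        .build()
    )


def detect(text):
    """Return the detected Language for text, or None if detection is unreliable."""
    return build_detector().detect_language_of(text)
```

In real code the detector would normally be built once and reused, since building a new one for every call repeats the setup work.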
def with_minimum_relative_distance(self, distance: float) ‑> LanguageDetectorBuilder
-
Set the desired value for the minimum relative distance measure.
By default, Lingua returns the most likely language for a given input text. However, some words are spelled the same in more than one language. The word 'prologue', for instance, is valid in both English and French. Lingua would output either English or French, which might be wrong in the given context. For cases like that, it is possible to specify a minimum relative distance that the summed log probabilities of the possible languages have to satisfy.
Be aware that the distance between the language probabilities depends on the length of the input text: the longer the input text, the larger the distance between the languages. So if you want to classify very short text phrases, do not set the minimum relative distance too high. Otherwise, most results will be returned as None, which is the return value for cases where language detection is not reliably possible.
Raises
ValueError
- if distance is smaller than 0.0 or greater than 0.99
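The effect of text length on the distance can be illustrated with a toy model. This is not Lingua's actual scoring code: the per-unit probabilities, the softmax-style normalization, and all function names below are assumptions made purely for illustration.

```python
import math


def summed_log_probability(unit_probabilities):
    """Sum the logarithms of the per-unit probabilities for one language."""
    return sum(math.log(p) for p in unit_probabilities)


def confidences(scores):
    """Turn summed log probabilities into values that add up to 1.

    Assumption: a softmax-style normalization; Lingua may normalize differently.
    """
    top = max(scores.values())  # subtract the maximum to avoid underflow
    weights = {lang: math.exp(score - top) for lang, score in scores.items()}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


def pick_language(scores, minimum_relative_distance=0.0):
    """Return the top language, or None if the top two are too close."""
    ranked = sorted(confidences(scores).items(), key=lambda item: item[1], reverse=True)
    (best, best_conf), (_, second_conf) = ranked[0], ranked[1]
    if best_conf - second_conf < minimum_relative_distance:
        return None
    return best


def toy_scores(unit_count):
    """Made-up scores: English is slightly more likely per unit than French."""
    return {
        "english": summed_log_probability([0.012] * unit_count),
        "french": summed_log_probability([0.010] * unit_count),
    }


short_text = toy_scores(2)   # e.g. a single ambiguous word
long_text = toy_scores(40)   # e.g. a full sentence
```

With a threshold of 0.5, pick_language(short_text, 0.5) gives None, while pick_language(long_text, 0.5) gives "english": the same small per-unit advantage accumulates into a large gap as the text grows.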
def with_preloaded_language_models(self) ‑> LanguageDetectorBuilder
-
Preload all language models when creating the LanguageDetector instance.
By default, Lingua uses lazy loading: it loads on demand only those language models that the rule-based filter engine considers relevant. For web services, for instance, it is beneficial to preload all language models into memory to avoid unexpected latency while waiting for the service response. This method allows switching between these two loading modes.
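A minimal sketch of enabling preloading, for example at service startup. The function name build_preloaded_detector and the choice of languages are hypothetical.

```python
def build_preloaded_detector():
    """Build a detector whose language models are loaded up front."""
    # Imported here so the sketch can be defined without Lingua installed.
    from lingua import Language, LanguageDetectorBuilder

    return (
        LanguageDetectorBuilder
        .from_languages(Language.ENGLISH, Language.GERMAN)
        .with_preloaded_language_models()
        .build()
    )
```

Building the detector takes longer and uses more memory this way, in exchange for avoiding model loading during the first requests.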