Module lingua.detector
Classes
class LanguageDetector (_languages: frozenset, _minimum_relative_distance: float, _languages_with_unique_characters: frozenset, _one_language_alphabets: dict, _unigram_language_models: dict, _bigram_language_models: dict, _trigram_language_models: dict, _quadrigram_language_models: dict, _fivegram_language_models: dict)
This class detects the language of text.
Methods
def compute_language_confidence_values(self, text: str) ‑> List[Tuple[Language, float]]
Compute confidence values for each language considered possible for the given text.
The returned list contains every language considered possible, sorted by confidence value in descending order. The values form a relative confidence metric, not an absolute one. Each value lies between 0.0 and 1.0, and the most likely language always receives 1.0. Every other language receives a value below 1.0 that indicates how much less likely it is than the most likely language.
The returned list does not necessarily contain every language this LanguageDetector instance was built from. If the rule-based engine determines that a language is impossible, that language is omitted. Likewise, if no n-gram probabilities can be found within the detector's languages for the given text, the returned list is empty. Any language missing from the returned list is assumed to have a confidence value of 0.0.
Args
text : str - The text for which to compute confidence values.
Returns
A list of 2-element tuples. Each tuple contains a language and the associated confidence value.
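The rule that a missing language counts as 0.0 can be sketched as a small lookup helper. This is a minimal sketch, not part of lingua: the Lang enum is a hypothetical stand-in for lingua's Language, confidence_of is an illustrative name, and the sample values are made up. In real use the list would come from compute_language_confidence_values.

```python
from enum import Enum, auto
from typing import List, Tuple


class Lang(Enum):
    """Hypothetical stand-in for lingua's Language enum."""
    ENGLISH = auto()
    FRENCH = auto()
    GERMAN = auto()


def confidence_of(results: List[Tuple[Lang, float]], language: Lang) -> float:
    """Return the confidence for `language`, or 0.0 if it is not in the list."""
    for candidate, value in results:
        if candidate == language:
            return value
    # Languages absent from the list are assumed to have confidence 0.0.
    return 0.0


# Made-up sample shaped like the documented return value:
# sorted in descending order, with the top entry at 1.0.
sample = [(Lang.ENGLISH, 1.0), (Lang.FRENCH, 0.42)]

english = confidence_of(sample, Lang.ENGLISH)
german = confidence_of(sample, Lang.GERMAN)  # language excluded from the list
nothing = confidence_of([], Lang.ENGLISH)    # empty result list
```

The helper also covers the empty-list case, so callers do not need a separate check before looking up a language.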
def detect_language_of(self, text: str) ‑> Optional[Language]
Detect the language of the given text.
Args
text : str - The text whose language should be identified.
Returns
The identified language. If the language cannot be reliably detected, None is returned.
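Because the return type is Optional, callers need to handle None explicitly. The following is a minimal sketch of that pattern, assuming only the documented signature: StubDetector is a hypothetical stand-in for a real LanguageDetector, and describe and its fallback parameter are illustrative names, not part of lingua.

```python
from typing import Optional


class StubDetector:
    """Hypothetical stand-in: 'detects' a language only for non-blank text."""

    def detect_language_of(self, text: str) -> Optional[str]:
        return "ENGLISH" if text.strip() else None


def describe(detector, text: str, fallback: str = "undetermined") -> str:
    """Return the detected language as a string, or `fallback` on None."""
    language = detector.detect_language_of(text)
    if language is None:
        # Detection was not reliable enough to name a language.
        return fallback
    return str(language)


detected = describe(StubDetector(), "hello world")
missing = describe(StubDetector(), "   ")
```

Any object with a detect_language_of method that takes a string can be passed to describe, so the same helper should work with a real detector instance.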