Class Tokenizer
Implements
System.IDisposable
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Keras.PreProcessing.text
Assembly: Keras.dll
Syntax
public class Tokenizer : Base, IDisposable
Constructors
Tokenizer(Nullable<Int32>, String, Boolean, String, Boolean, Nullable<Int32>, Int32)
Initializes a new instance of the Tokenizer class.
Declaration
public Tokenizer(int? num_words = default(int?), string filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", bool lower = true, string split = " ", bool char_level = false, int? oov_token = default(int?), int document_count = 0)
Parameters
Type | Name | Description |
---|---|---|
System.Nullable<System.Int32> | num_words | The maximum number of words to keep, based on word frequency. Only the most common num_words-1 words will be kept. |
System.String | filters | A string in which each character will be filtered from the texts. The default is all punctuation, plus tabs and line breaks, minus the ' character. |
System.Boolean | lower | Whether to convert the texts to lowercase. |
System.String | split | Separator for word splitting. |
System.Boolean | char_level | If true, every character will be treated as a token. |
System.Nullable<System.Int32> | oov_token | If given, it will be added to word_index and used to replace out-of-vocabulary words when texts are converted to sequences. |
System.Int32 | document_count |
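
The interaction of lower, filters, split, and num_words described in the table can be sketched in plain C#. This is only an illustration of the documented behavior, not the library's implementation: the class TokenizerSketch and its helpers Preprocess and BuildIndex are hypothetical names, and the alphabetical tie-break between equally frequent words is an assumption made so that the result is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration class; it does not call into Keras.dll.
public static class TokenizerSketch
{
    // Default filter set copied from the constructor declaration.
    public const string DefaultFilters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n";

    // Applies lower, filters, and split to a single text and returns its words.
    public static List<string> Preprocess(
        string text, string filters = DefaultFilters, bool lower = true, string split = " ")
    {
        if (lower) text = text.ToLowerInvariant();

        // Replace every filtered character with the separator so that
        // neighbouring words are not glued together.
        foreach (char c in filters) text = text.Replace(c.ToString(), split);

        return text.Split(new[] { split }, StringSplitOptions.RemoveEmptyEntries).ToList();
    }

    // Ranks words by frequency (index 1 = most frequent) and keeps only
    // indices below numWords, i.e. the most common numWords-1 words.
    public static Dictionary<string, int> BuildIndex(IEnumerable<string> texts, int? numWords = null)
    {
        var counts = new Dictionary<string, int>();
        foreach (var text in texts)
            foreach (var word in Preprocess(text))
                counts[word] = counts.TryGetValue(word, out var n) ? n + 1 : 1;

        return counts
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal) // assumed tie-break
            .Select((kv, i) => (Word: kv.Key, Index: i + 1))
            .Where(p => numWords == null || p.Index < numWords.Value)
            .ToDictionary(p => p.Word, p => p.Index);
    }
}
```

Under these assumptions, calling BuildIndex on the texts "The cat sat." and "The cat ran!" with numWords set to 3 keeps two words, cat at index 1 and the at index 2, which matches the num_words-1 rule in the table. The apostrophe is absent from the default filters, so a word such as don't passes through Preprocess unchanged.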