Due Dec 21, 2:59 PM +07
What is the name of the object used to tokenize sentences?
What is the name of the method used to tokenize a list of sentences?
Once you have the corpus tokenized, what’s the method used to encode a list of sentences into sequences of those tokens?
When initializing the tokenizer, how do you specify a token to use for unknown words?
If you don’t use a token for out-of-vocabulary words, what happens during encoding?
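The questions above cover `Tokenizer`, its `fit_on_texts` and `texts_to_sequences` methods, and the `oov_token` argument. As a study aid, here is a minimal pure-Python sketch that mimics those behaviors without requiring TensorFlow; `MiniTokenizer` is a hypothetical stand-in (note that the real Keras `Tokenizer` orders its vocabulary by word frequency, while this sketch uses first-seen order):

```python
# A minimal sketch of Keras Tokenizer behavior (not the real API).
# In Keras: Tokenizer(oov_token="<OOV>"), fit_on_texts(), texts_to_sequences().
class MiniTokenizer:
    def __init__(self, oov_token=None):
        # oov_token, if given, is reserved at index 1 for unknown words.
        self.oov_token = oov_token
        self.word_index = {}

    def fit_on_texts(self, sentences):
        # Build the vocabulary from a list of sentences.
        words = []
        for sentence in sentences:
            for word in sentence.lower().split():
                if word not in words:
                    words.append(word)
        if self.oov_token is not None:
            words = [self.oov_token] + words
        self.word_index = {w: i + 1 for i, w in enumerate(words)}

    def texts_to_sequences(self, sentences):
        # Encode sentences as lists of token indices.
        # Without an OOV token, unknown words are silently dropped;
        # with one, they are mapped to the OOV index instead.
        sequences = []
        for sentence in sentences:
            seq = []
            for word in sentence.lower().split():
                if word in self.word_index:
                    seq.append(self.word_index[word])
                elif self.oov_token is not None:
                    seq.append(self.word_index[self.oov_token])
            sequences.append(seq)
        return sequences


tok = MiniTokenizer(oov_token="<OOV>")
tok.fit_on_texts(["I love my dog", "I love my cat"])
print(tok.texts_to_sequences(["I love my hamster"]))  # "hamster" maps to <OOV>
```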
If you have a number of sequences of different lengths, how do you ensure that they are understood when fed into a neural network?
If you have a number of sequences of different lengths and call pad_sequences on them, what’s the default result?
When padding sequences, if you want the padding to be at the end of the sequence, how do you do it?
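These last questions cover `pad_sequences`, whose Keras default pads with zeros at the front (`padding='pre'`) up to the length of the longest sequence, with `padding='post'` putting the zeros at the end instead. Here is a minimal pure-Python sketch of that behavior; `mini_pad_sequences` is a hypothetical stand-in, not the real Keras function:

```python
# A minimal sketch of Keras pad_sequences behavior (not the real API).
def mini_pad_sequences(sequences, maxlen=None, padding="pre", value=0):
    # Default target length is that of the longest sequence.
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]  # mimic Keras default truncating='pre'
        pad = [value] * (maxlen - len(seq))
        # padding='pre' (default) puts zeros in front; 'post' puts them after.
        padded.append(pad + seq if padding == "pre" else seq + pad)
    return padded


print(mini_pad_sequences([[1, 2], [3, 4, 5]]))                  # zeros in front
print(mini_pad_sequences([[1, 2], [3, 4, 5]], padding="post"))  # zeros at the end
```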