How to feed data for ngram model?
02-11-2019
Question
I want to train an ngram language model.
Let's say I have the following corpus:
The sliding cat is not able to dance
He is only able to slide
Because obviously he is the sliding cat
I am planning to use tf.data.Dataset to feed my model, which is fine.
But I don't know whether it is better to use a sliding window to iterate through my corpus or simply to feed my corpus n words at a time.
Using a sliding window, my model (assuming a bigram) will see:
The sliding
sliding cat
cat is
is not
...
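For illustration, here is a minimal pure-Python sketch of the sliding window applied to the first sentence of the corpus. The helper name `sliding_ngrams` is hypothetical, and splitting on whitespace is an assumption about tokenization, not something the question specifies.

```python
def sliding_ngrams(tokens, n):
    """Return every run of n consecutive tokens, moving one token at a time."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumption: tokens are obtained by simple whitespace splitting.
tokens = "The sliding cat is not able to dance".split()
bigrams = sliding_ngrams(tokens, 2)

for bigram in bigrams:
    print(" ".join(bigram))
```

A sentence of 8 tokens yields 7 overlapping bigrams this way, so every adjacent word pair is seen once.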
Going n words at a time:
The sliding
cat is
not able
...
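The two schemes differ only in how far the window moves at each step, which the sketch below makes explicit. The helper `windows` is hypothetical and whitespace tokenization is assumed; the parameter names `size` and `shift` are borrowed from `tf.data.Dataset.window`, where `shift` defaults to `size` (that is, non-overlapping windows).

```python
def windows(tokens, size, shift):
    """Return windows of `size` tokens, one starting every `shift` tokens.

    Incomplete windows at the end of the sequence are dropped.
    """
    return [
        tuple(tokens[i:i + size])
        for i in range(0, len(tokens) - size + 1, shift)
    ]


# Assumption: tokens are obtained by simple whitespace splitting.
tokens = "The sliding cat is not able to dance".split()

sliding = windows(tokens, size=2, shift=1)  # overlapping windows
chunked = windows(tokens, size=2, shift=2)  # n words at a time

# Bigrams that the sliding window produces but the chunked scheme never does.
missed = [w for w in sliding if w not in chunked]

print(len(sliding), len(chunked))
print(missed)
```

On this sentence, going two words at a time never produces `sliding cat`, `is not` or `able to`, so a model fed that way would not observe those word pairs at all.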
I'd appreciate any recommendation, thanks.
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange