How to format the input
This is an excerpt from the bag-of-words article on wikipedia.org.
Here are two simple text documents:
John likes to watch movies. Mary likes too.
John also likes to watch football games.
Based on these two text documents, a dictionary is constructed as:
{
"John": 1,
"likes": 2,
"to": 3,
"watch": 4,
"movies": 5,
"also": 6,
"football": 7,
"games": 8,
"Mary": 9,
"too": 10
}
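which has 10 distinct words. Using the indexes of the dictionary, each document is represented by a 10-entry vector in which entry i counts how many times the word with index i occurs in that document (for example, "likes" appears twice in the first document, so its second entry is 2):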
[1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
[1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
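If it helps, here is a minimal Python sketch of the same counting, not a definitive implementation. It reuses the dictionary from above as-is; the names `VOCABULARY`, `tokenize` and `vectorize` are my own, and the regex-based, case-sensitive tokenization is an assumption (the Wikipedia example does not say how the text is split into words).

```python
import re

# Dictionary copied from the example above: word -> 1-based index.
VOCABULARY = {
    "John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5,
    "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10,
}

def tokenize(document):
    """Split a document into words (assumption: runs of word characters, case-sensitive)."""
    return re.findall(r"\w+", document)

def vectorize(document, vocabulary):
    """Return a count vector with one entry per dictionary word."""
    vector = [0] * len(vocabulary)
    for word in tokenize(document):
        index = vocabulary.get(word)
        if index is not None:  # assumption: words outside the dictionary are ignored
            vector[index - 1] += 1  # dictionary indexes are 1-based
    return vector

doc1 = "John likes to watch movies. Mary likes too."
doc2 = "John also likes to watch football games."

print(vectorize(doc1, VOCABULARY))  # [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
print(vectorize(doc2, VOCABULARY))  # [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
```

Ignoring unknown words is just one possible choice; it keeps the vector length tied to the dictionary rather than to the document.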
The vector always has one entry per dictionary word, so your input stays the same size regardless of the length of your document. I hope this helps.