I'm going to be honest with you: there are not that many words in the English language. If you have a very large input vector (say, tens of thousands of elements), the network will probably perform quite well. In fact, you may even be able to get by with no hidden layers.
For image recognition and similar tasks, an input vector with thousands of points is not uncommon.
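To make the "no hidden layers" idea concrete, here is a minimal sketch in plain Python. It assumes the input vector has one element per vocabulary word (a bag-of-words encoding), which the answer above implies but does not spell out. The documents, labels, and function names are made up for illustration, and the perceptron update rule is just one simple way to train such a network.

```python
# Toy sketch: a bag-of-words input wired directly to one output unit.
# The documents and labels below are invented purely for illustration.

def build_vocab(docs):
    """Assign each distinct word an index in the input vector."""
    vocab = {}
    for doc in docs:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def vectorize(doc, vocab):
    """One input per vocabulary word: 1.0 if the word occurs, else 0.0."""
    x = [0.0] * len(vocab)
    for word in doc.lower().split():
        if word in vocab:
            x[vocab[word]] = 1.0
    return x

def predict(weights, bias, x):
    """Inputs feed the output unit directly; there is no hidden layer."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0

def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron rule: adjust weights only on misclassified samples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias

docs = ["great fun movie", "boring slow movie", "great acting", "slow boring plot"]
labels = [1, 0, 1, 0]  # hypothetical labels: 1 = positive, 0 = negative

vocab = build_vocab(docs)
samples = [vectorize(d, vocab) for d in docs]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
```

With a real vocabulary the input vector would have tens of thousands of elements instead of seven, but the structure stays the same: one weight per word and a single output unit. Whether that is enough depends on the task, so treat this as a starting point and not a guarantee.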