Features in Document clustering/classification?

Question

What you are describing is just one of the possibilities, in fact the most trivial way of defining features for documents. In machine learning terminology, a feature is any mapping from the input space (in this particular example, the space of documents) into some abstract space that suits a particular machine learning model. Most ML models (neural networks, support vector machines, etc.) work on numerical vectors, so features have to be mappings from documents to (constant-size) vectors of numbers. This is one reason for choosing a bag-of-words representation, where a vector of word counts represents the document. This limitation can be overcome by using specific models, for example Naive Bayes (or a custom kernel for an SVM, which enables it to work with non-numeric data), which can work on any objects as long as we can define the particular conditional probabilities - here, the most basic approach is to treat whether a document contains a particular word as a "feature". In general this is not the only possibility: there are dozens of methods that use statistical features, semantic features (based on ontologies such as WordNet), etc.
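
As a rough illustration of the two simplest mappings mentioned above (word counts, and word present or not), here is a minimal sketch in plain Python. The helper names (`tokenize`, `build_vocabulary`, `count_vector`, `presence_vector`) and the two example documents are made up for this illustration, and the whitespace tokenizer is deliberately naive; a real pipeline would usually rely on a library vectorizer instead.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def build_vocabulary(documents):
    # Sorted list of distinct words; a word's position is its feature index.
    return sorted({word for doc in documents for word in tokenize(doc)})


def count_vector(document, vocabulary):
    # Bag of words: one word count per vocabulary entry.
    counts = Counter(tokenize(document))
    return [counts[word] for word in vocabulary]


def presence_vector(document, vocabulary):
    # Binary feature per word: 1 if the document contains it, else 0.
    words = set(tokenize(document))
    return [1 if word in words else 0 for word in vocabulary]


# Example documents, invented for this sketch.
docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)
count_vectors = [count_vector(d, vocab) for d in docs]
presence_vectors = [presence_vector(d, vocab) for d in docs]

print(vocab)
print(count_vectors)
print(presence_vectors)
```

Every document ends up as a vector of the same length (the vocabulary size), which is what makes it usable by models that expect constant-size numerical input.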

To sum up - this is only one representation of a document for a machine learning model, and the simplest one. It is good to start with and good for understanding the basics, but far from being a "feature definition".

EDIT

no_of_features is the size of the vector you use for your documents' representation, so if you use tf-idf, then the size of the resulting vector is no_of_features.
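
To make this concrete, here is a small plain-Python sketch of tf-idf, assuming one common textbook variant (term frequency times log(N / document frequency)); libraries often use smoothed or normalized variants, so exact values will differ. The function name `tfidf_vectors` and the example documents are invented for this illustration. The point is only that each document's vector has one entry per vocabulary word, so its length is no_of_features.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    # Naive whitespace tokenization (an assumption made for this sketch).
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({w for tokens in tokenized for w in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word occurs.
    df = Counter(w for tokens in tokenized for w in set(tokens))
    # Unsmoothed idf; a word present in every document gets weight 0.
    idf = {w: math.log(n_docs / df[w]) for w in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # guard against empty documents
        vectors.append([counts[w] / length * idf[w] for w in vocabulary])
    return vocabulary, vectors


# Example documents, invented for this sketch.
docs = ["the cat sat on the mat", "the dog sat"]
vocabulary, vectors = tfidf_vectors(docs)

# One entry per vocabulary word, for every document.
no_of_features = len(vocabulary)
print(no_of_features, [len(v) for v in vectors])
```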