Please be more specific. Anyway:
If the text in the file corresponds to a single document (that is, a single instance), all you need to do is replace every newline with the escape code \n so that the full text sits on a single line, then manually format it as an ARFF file with a single text attribute and a single instance.
If the text corresponds to several instances (e.g. documents), I suggest writing a script to break it into several files and then applying TextDirectoryLoader. If there is any specific formatting (e.g. instances are enclosed in XML tags), you can either do the same (taking advantage of the XML format) or write a custom Loader class in WEKA that recognizes your format and builds an Instances object.
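
As a rough illustration of the first two options, here is a minimal sketch in plain Java (no WEKA dependency). Everything named here is an assumption made for the example, not something from your data: the class and method names, the attribute name "text", the "unlabeled" folder, and above all the rule that documents are separated by blank lines. TextDirectoryLoader reads one subdirectory per class label, so the sketch writes all files into a single placeholder subdirectory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TextToWekaSketch {

    // Escape a raw document so it fits on one line inside a single-quoted ARFF string.
    static String escapeForArff(String text) {
        return text.replace("\\", "\\\\")   // backslashes first, so later escapes are not doubled
                   .replace("'", "\\'")     // the value is wrapped in single quotes below
                   .replace("\r", "\\r")
                   .replace("\n", "\\n")
                   .replace("\t", "\\t");
    }

    // Case 1: the whole text is one instance with one string attribute.
    static String toSingleInstanceArff(String relation, String text) {
        return "@relation " + relation + "\n\n"
             + "@attribute text string\n\n"
             + "@data\n"
             + "'" + escapeForArff(text) + "'\n";
    }

    // Case 2: one file per document under <root>/<label>/.
    // ASSUMPTION: documents are separated by blank lines; adapt the split to your format.
    static int splitIntoFiles(String text, Path root, String label) throws IOException {
        Path dir = Files.createDirectories(root.resolve(label));
        int count = 0;
        for (String doc : text.split("\\R\\s*\\R")) {
            String trimmed = doc.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty chunks, e.g. from leading blank lines
            }
            Files.write(dir.resolve("doc" + count + ".txt"),
                        trimmed.getBytes(StandardCharsets.UTF_8));
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String sample = "First document,\nsecond line.\n\nSecond document.";

        // Case 1: print a single-instance ARFF file to standard output.
        System.out.print(toSingleInstanceArff("sample", sample));

        // Case 2: write one file per document into a temporary directory.
        Path root = Files.createTempDirectory("weka-text");
        int written = splitIntoFiles(sample, root, "unlabeled");
        System.out.println(written + " files written under " + root);
    }
}
```

The directory produced by the second case is what you would then point TextDirectoryLoader at. If your documents carry real class labels, write each one into a subdirectory named after its label instead of the placeholder.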
If you post an example, it would be easier to get a more precise suggestion.