I would use the Groovy console for this (load the "Groovy" plugin, then start the console from the Tools menu).
The following code assumes that:
- you have opened the datastore in GATE Developer
- you have loaded the source corpus, and its name is "fullCorpus"
- you have created three (or however many you need) other empty corpora and saved them (empty) to the same datastore; these will receive the partitions
- you have no other corpora open in GATE Developer apart from these four
- you have no documents open
Then you can run the following in the Groovy console:
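    // Random source used to pick a target corpus for each document
    def rnd = new Random()
    // The source corpus, and every other open corpus as a partition
    def fullCorpus = corpora.find { it.name == 'fullCorpus' }
    def parts = corpora.findAll { it.name != 'fullCorpus' }
    fullCorpus.each { doc ->
        // Add this document to a randomly chosen partition
        def targetCorpus = parts[rnd.nextInt(parts.size())]
        targetCorpus.add(doc)
        // Unload the document again so they don't all stay loaded at once
        targetCorpus.unloadDocument(doc)
    }
    // Return null so the console has no result value to display
    return null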
The script iterates over the documents in the source corpus and, for each one, picks one of the target corpora at random to add it to. Because each choice is made independently, the target sub-corpora should end up roughly (but not necessarily exactly) the same size.
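If you need the partitions to differ in size by at most one document, one alternative is to shuffle the documents first and then deal them out in turn. The following is only a minimal sketch of that idea on plain Groovy lists, not on GATE corpora: the names `docNames` and `numParts` and the document names themselves are hypothetical stand-ins, and the adaptation to the GATE script above is untested.

```groovy
// Hypothetical document names standing in for the documents in fullCorpus
def docNames = (1..10).collect { "doc${it}".toString() }
// Hypothetical number of target partitions
def numParts = 3

// Shuffle a copy so the assignment is still random
def shuffled = new ArrayList(docNames)
Collections.shuffle(shuffled, new Random())

// Deal the shuffled documents out in turn: index 0 -> part 0, 1 -> part 1, ...
def parts = (0..<numParts).collect { [] }
shuffled.eachWithIndex { name, i ->
    parts[i % numParts] << name
}

// Report how many documents each partition received
parts.eachWithIndex { part, i ->
    println "part ${i}: ${part.size()} documents"
}
```

With 10 documents and 3 partitions this gives sizes 4, 3 and 3, whereas the independent random choice in the script above can produce a more uneven split.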
The script does not save the final sub-corpora, so if it messes up you can just close them and re-open them (empty) from the original datastore, then fix and re-run the script. Once you're happy with the final result, right-click on each sub-corpus in turn in the left-hand tree and choose "save to its datastore" to write it all to disk.