Well, there's no problem having smaller/greater elements in any of the files. Here is an example of the external sort process:
Your initial data:
data = [2, 5, 3, 7, 1, 6, 4, 8, 9]
Considering you have only 3 units of memory, you'd have the following shards, and results of sorting:
d1 = [2, 5, 3] -> sorting -> d1 = [2, 3, 5]
d2 = [7, 1, 6] -> sorting -> d2 = [1, 6, 7]
d3 = [4, 8, 9] -> sorting -> d3 = [4, 8, 9]
As you have three available units, you can read from the three shards at the same time, so, you'd have:
d = [], d1 = [2, 3, 5], d2 = [1, 6, 7], d3 = [4, 8, 9] -> min(d1, d2, d3) = 1
d = [1], d1 = [2, 3, 5], d2 = [6, 7], d3 = [4, 8, 9] -> min(d1, d2, d3) = 2
d = [1, 2], d1 = [3, 5], d2 = [6, 7], d3 = [4, 8, 9] -> min(d1, d2, d3) = 3
d = [1, 2, 3], d1 = [5], d2 = [6, 7], d3 = [4, 8, 9] -> min(d1, d2, d3) = 4
d = [1, 2, 3, 4], d1 = [5], d2 = [6, 7], d3 = [8, 9] -> min(d1, d2, d3) = 5
d = [1, 2, 3, 4, 5], d1 = [], d2 = [6, 7], d3 = [8, 9] -> min(d1, d2, d3) = 6
d = [1, 2, 3, 4, 5, 6], d1 = [], d2 = [7], d3 = [8, 9] -> min(d1, d2, d3) = 7
d = [1, 2, 3, 4, 5, 6, 7], d1 = [], d2 = [], d3 = [8, 9] -> min(d1, d2, d3) = 8
d = [1, 2, 3, 4, 5, 6, 7, 8], d1 = [], d2 = [], d3 = [9] -> min(d1, d2, d3) = 9
d = [1, 2, 3, 4, 5, 6, 7, 8, 9], d1 = [], d2 = [], d3 = [] -> []
What you may be concerned about is when you have limitation enough to allow you not to read at least one element from each file, or even if the decision was simply to read more elements from a given file, leaving behind another file to be read.
This is the same as the process above, with the only difference that, after reading, say, two files, and merging the data between them, you'd have to read from the third file and from the last file generated, which is the merge of files 1 and 2.
Since both the third file and the last file generated are surely sorted, you are able to sequentially scan the data from both files, merging the entries into an unique result.