C++ writing a list of names to a file in a specific order without loading them all in memory

StackOverflow https://stackoverflow.com/questions/23434760

  •  14-07-2023
  •  | 
  •  

Question

I have a school task to load a list of names from one text file to another while ordering them, yet I am not allowed to keep them all in the memory (array for example) at the same time. What would be the best way to do this. I have to do a binary search on them afterwards.

My first thought was to generate a hash key for each of them, and then write them in a location that is relative to their key, but the fact that I have to do a binary search afterwards makes me think that this is redundant. The problem is not knowing them all beforehand (that means I have to somehow push some names in the middle).

Was it helpful?

Solution

This is probably the easiest way

1) read the file line by line and find the first name in your sorting method

e.g.
-read name_1.
-read next name_2.
If name_1 < name_2 then name_2 = name_1 and repeat.
2) read the file line by line again and find the second name. i.e. The lowest name that is still higher than the first name.
3) write the first name into a file.
4) now read line by line for the third name
5) add the second name into a file etc...

This will not be fast, but it will have virtual no memory overhead. You will never have more than 3 names stored in memory.

OTHER TIPS

Some ways:

1) You could split the data into multiple temporary files; sort each file separately; merge the files.

2) Call the operating system to sort the file, something like

 system ("sort input>output")

Ok, I don't know if I used the term 'lexical tree' right in my comment, but I would make a tree, like a binary, but not with only two possible nodes, but whole alphabet possible. I believe this is called a 'Trie'.

In the nodes you keep a counter how many entries ended on that particular node. You create the nodes dynamically as you need them, so the space consumption is kept low.

Then you can traverse the whole tree and retrieve all elements in an order. That would be non-trivial sort, that would work very well for entries with common prefixes. It would be fast, as all inserts are linear, travesal is also linear. So it would take O(2*N), where N is number of characters in whole set to sort. And memory consumption would be good if the data set would have common prefixes.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top