Question

I am consuming an unseekable character stream that contains strings separated by |, e.g. abc|def|ghijkd. While parsing a token, I keep appending characters from the stream to a string until I find a pipe, and then I start a new string.

I don't want to copy around the string under construction each time I add a new character.

What is the standard practice for building a string whose length is not known in advance?

What I am doing now is allocating a "block" of a given size (say 256 chars) and copying characters from the stream into it. When the block is full, I increase the block size by the initial block size, allocate a new block, copy the old block into it, free the old block, and continue appending at the end of the new block; rinse and repeat. This seems a bit sledgehammer to me.


Solution

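What you are doing is essentially what realloc does, except that realloc copies only if it cannot enlarge the block in place. Standard practice is to double the size of your block each time it fills up, rather than growing it by a fixed step. With exponential growth, a string of n characters needs only about log2(n) reallocations and the total amount of copying stays proportional to n, whereas growing in fixed steps needs a number of reallocations proportional to n.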
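As a rough illustration of the doubling approach, here is a minimal sketch in C. The names strbuf, strbuf_push and read_token are made up for this example, as is the initial capacity of 16 bytes; it assumes the stream is a FILE * read with getc.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; all names here are illustrative. */
typedef struct {
    char  *data;  /* heap block, NULL until the first append */
    size_t len;   /* characters stored, not counting the terminator */
    size_t cap;   /* bytes currently allocated */
} strbuf;

/* Append one character, doubling the capacity when the block is full.
 * Returns 0 on success, -1 if the allocation fails. */
static int strbuf_push(strbuf *sb, char c)
{
    if (sb->len + 1 >= sb->cap) {           /* keep one byte for '\0' */
        size_t new_cap;
        char *p;

        if (sb->cap > SIZE_MAX / 2)
            return -1;                      /* doubling would overflow */
        new_cap = sb->cap ? sb->cap * 2 : 16;   /* 16 is an arbitrary start */
        p = realloc(sb->data, new_cap);     /* copies only if it must move */
        if (p == NULL)
            return -1;                      /* the old block is still valid */
        sb->data = p;
        sb->cap = new_cap;
    }
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';
    return 0;
}

/* Read one token from `in`, stopping at `sep` or at end of input.
 * Returns a heap-allocated string that the caller must free, or NULL
 * when the input is exhausted or an allocation fails. */
static char *read_token(FILE *in, int sep)
{
    strbuf sb = { NULL, 0, 0 };
    int got_any = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        got_any = 1;
        if (c == sep)
            break;
        if (strbuf_push(&sb, (char)c) != 0) {
            free(sb.data);
            return NULL;
        }
    }
    if (!got_any)
        return NULL;                        /* nothing left to read */
    if (sb.data == NULL)
        sb.data = calloc(1, 1);             /* empty token: "" */
    return sb.data;
}
```

Calling read_token(stdin, '|') in a loop until it returns NULL would yield abc, def and ghijkd for the input above. Note that this sketch does not report an empty token after a trailing separator, and it cannot distinguish end of input from an allocation failure; both are simplifications.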

Licensed under: CC-BY-SA with attribution