In the comments, Harry Johnston writes:
The generic solution to this class of problem is to encode the file paths before joining them. For example, if you're dealing with single-byte strings, you could convert them to hex strings; so "hello" becomes "68656c6c6f". (Obviously that isn't the most efficient solution!)
That is absolutely correct. Please don't try to do anything "tricky" with filenames and reserved characters, because it will eventually break in some weird corner case and your successor will have a heck of a time trying to repair the damage.
In fact, if you're trying to be portable, I strongly recommend that you never attempt to create any filenames including any characters other than [a-z0-9_]. (Consider that common filesystems on both Windows and OS X can operate in case-insensitive mode, where FooBar.txt and FOOBAR.TXT are the same identifier.)
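A decently compact encoding scheme for practical use would be to make a "whitelisted set" such as [a-z0-9], and encode any character ch outside your "whitelisted set" (including _ itself, which serves as the escape character) as printf("_%02x", ch). The zero padding matters: every escape must be exactly two hex digits, or characters below 0x10 would decode ambiguously. So hello.txt becomes hello_2etxt, and hello_world.txt becomes hello_5fworld_2etxt.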
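Here's a minimal Python sketch of that scheme, in case it helps to see it spelled out. The function names encode_name and decode_name are my own for illustration, and I'm assuming that anything outside ASCII gets escaped byte by byte in its UTF-8 form, which the description above doesn't specify.

```python
# Characters that pass through unchanged. '_' is deliberately absent:
# it is the escape character, so it must itself be escaped.
SAFE = set("abcdefghijklmnopqrstuvwxyz0123456789")


def encode_name(name: str) -> str:
    """Escape every byte outside SAFE as '_' plus two lowercase hex digits."""
    out = []
    for byte in name.encode("utf-8"):  # assumption: UTF-8 for non-ASCII input
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        else:
            out.append("_%02x" % byte)  # zero-padded, always two digits
    return "".join(out)


def decode_name(encoded: str) -> str:
    """Invert encode_name. Assumes the input was produced by encode_name."""
    raw = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == "_":
            raw.append(int(encoded[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(encoded[i]))
            i += 1
    return raw.decode("utf-8")


print(encode_name("hello.txt"))        # hello_2etxt
print(encode_name("hello_world.txt"))  # hello_5fworld_2etxt
```

Uppercase letters get escaped too, so FooBar.txt and foobar.txt end up with different encoded names even on a case-insensitive filesystem.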
Since every _ is escaped, you can use double-_ as a separator: the encoded string hello_2etxt__goodbye___2e_2e uniquely identifies the list of filenames ['hello.txt', 'goodbye', '..'].
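To make the separator trick concrete, here's a small self-contained Python sketch. The names join_names and split_names are hypothetical, and the UTF-8 handling of non-ASCII characters is my assumption. The decoder reads left to right: a _ followed by another _ is a separator, and a _ followed by two hex digits is an escaped byte.

```python
SAFE = set("abcdefghijklmnopqrstuvwxyz0123456789")  # '_' is the escape char


def encode_name(name: str) -> str:
    """Escape every byte outside SAFE as '_' plus two hex digits."""
    return "".join(
        chr(b) if chr(b) in SAFE else "_%02x" % b
        for b in name.encode("utf-8")  # assumption: UTF-8 for non-ASCII
    )


def join_names(names: list[str]) -> str:
    """Encode each name, then join with the double-underscore separator."""
    return "__".join(encode_name(n) for n in names)


def split_names(joined: str) -> list[str]:
    """Invert join_names. Assumes the input was produced by join_names."""
    names = []
    current = bytearray()
    i = 0
    while i < len(joined):
        if joined.startswith("__", i):   # separator between names
            names.append(current.decode("utf-8"))
            current = bytearray()
            i += 2
        elif joined[i] == "_":           # escaped byte: '_' + two hex digits
            current.append(int(joined[i + 1:i + 3], 16))
            i += 3
        else:                            # safe character, copied as-is
            current.append(ord(joined[i]))
            i += 1
    names.append(current.decode("utf-8"))
    return names


print(join_names(["hello.txt", "goodbye", ".."]))
# hello_2etxt__goodbye___2e_2e
print(split_names("hello_2etxt__goodbye___2e_2e"))
# ['hello.txt', 'goodbye', '..']
```

One wrinkle with this sketch: an empty list and a list containing a single empty name both encode to the empty string. Real filenames are never empty, so that shouldn't matter in practice, but it's worth knowing about.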