Question

Do you know of a library or a way in Java to generate a tar archive with file names in the proper Windows national code page (for example, cp1250)?

I tried with Java tar, example code:

final TarEntry entry = new TarEntry( files[i] );
String filename = files[i].getPath().replaceAll( baseDir, "" );
entry.setName( new String( filename.getBytes(), "Cp1250" ) );
out.putNextEntry( entry );
...

It doesn't work: national characters are broken when I extract the tar on Windows. I've also found something strange: under Linux, Polish national characters are shown correctly only when I use ISO-8859-1:

entry.setName( new String( filename.getBytes(), "ISO-8859-1" ) );

This is despite the fact that the proper Polish code page is ISO-8859-2, which doesn't work either. I've also tried Cp852 for Windows, with no effect.

I know the limitations of tar format, but changing it is not an option.

Thanks for suggestions,


Solution

Officially, TAR doesn't support non-ASCII in headers. However, I was able to use UTF-8 encoded filenames on Linux.

Try this:

String filename = files[i].getName();
byte[] bytes = filename.getBytes("Cp1250");
entry.setName(new String(bytes, "ISO-8859-1"));
out.putNextEntry( entry );

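This at least preserves the Cp1250 bytes in the TAR headers. ISO-8859-1 maps every byte value to the character with the same code, so, assuming the library writes each character of the entry name as a single byte, the Cp1250 bytes come out unchanged.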
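To see why the round trip through ISO-8859-1 is lossless, here is a minimal, self-contained sketch that uses only the JDK and no tar library. The class `TarNameEncoding` and the helper `toHeaderName` are hypothetical names made up for this illustration. The sketch assumes the tar library writes each character of the entry name as one byte, which may not hold for every library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TarNameEncoding {
    // Hypothetical helper: encode the name in the target code page, then wrap
    // each resulting byte in the char with the same value (0-255).
    static String toHeaderName(String name, Charset target) {
        byte[] raw = name.getBytes(target);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        Charset cp1250 = Charset.forName("windows-1250");
        // Sample Polish file name with diacritics, written as Unicode escapes.
        String original = "za\u017C\u00F3\u0142\u0107.txt";

        String headerName = toHeaderName(original, cp1250);

        // Assumption: this mimics a library that writes one byte per char.
        byte[] written = headerName.getBytes(StandardCharsets.ISO_8859_1);

        // The bytes that would land in the header are exactly the Cp1250 bytes.
        System.out.println(Arrays.equals(written, original.getBytes(cp1250)));
        // A reader that decodes them as Cp1250 recovers the original name.
        System.out.println(new String(written, cp1250).equals(original));
    }
}
```

Characters that have no mapping in the target code page are replaced by `?` during `getBytes`, so this only works for names that Cp1250 can represent.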

OTHER TIPS

tar doesn't allow for non-ASCII values in its headers. If you try a different encoding, the result is probably up to what the target platform decides to do with those byte values. It kind of sounds like your target platform's tar program is interpreting the bytes as ISO-8859-1, which is why that 'works'.
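The dependence on the reader can be illustrated with a small JDK-only sketch: the same header byte turns into different characters depending on which charset the extracting program assumes. The class `HeaderByteDecoding` and the helper `decodeAs` are hypothetical names used only for this illustration, and the sketch does not model any particular tar program.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HeaderByteDecoding {
    // Hypothetical helper: decode raw header bytes the way a reader that
    // assumes the given charset would.
    static String decodeAs(byte[] headerBytes, Charset assumed) {
        return new String(headerBytes, assumed);
    }

    public static void main(String[] args) {
        Charset cp1250 = Charset.forName("windows-1250");
        // Polish "l with stroke" (U+0142) is the single byte 0xB3 in Cp1250.
        byte[] raw = "\u0142".getBytes(cp1250);

        // A reader assuming Cp1250 sees the intended letter.
        System.out.println(decodeAs(raw, cp1250).equals("\u0142"));
        // A reader assuming ISO-8859-1 sees U+00B3 (superscript three) instead.
        System.out.println(decodeAs(raw, StandardCharsets.ISO_8859_1).equals("\u00B3"));
    }
}
```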

Have a look at the extended attributes of the pax interchange format, described in tar(5)? http://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5&manpath=FreeBSD+8-current

I am no expert here but this seems to be the only official way to put any non-ASCII values in a tar file header.

Licensed under: CC-BY-SA with attribution