Question

In a Windows C++ console application I would like to read a password from command-line input. The password is used for encryption (and later for decryption, maybe elsewhere in the world on a Windows PC with a different locale), so I worry that differences in locale and encoding will give the passphrase a different numerical representation. On the same computer, or on a computer with the same locale, this obviously is not a problem.

Therefore I would like to encode it in a fixed way (and normalize it?) and store it as UTF-8, as recommended here: http://www.jasypt.org/howtoencryptuserpasswords.html (point 4).

There are many issues relating to encoding/Unicode/UTF-8/code pages that I don't fully (or fully don't) grasp. I fiddled with boost::locale and boost::nowide, but either couldn't figure them out or they don't work under Windows (I don't know which). Some links that clarify the (Windows) issues involved:

http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/

http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/

But these links address the opposite problem: how to make text look the same no matter what the underlying representation is. I need the same underlying (bit-wise) representation, no matter how it looks!

So the question is: how do I make sure (and do I have to?) that the locale/encoding has no effect on the basic data that gets encrypted, i.e. the array of 8-bit integers? I don't necessarily care about UTF-8 or Unicode; I just need to be able to recover the data, no matter what the locale/encoding is. The first link is helpful in explaining the issue.

Some thoughts: C is not Unicode-aware, so would linking in some C code help, or does C++ change that again? Or will limiting input to "ASCII" characters (I know that doesn't exist as such on Windows) ALWAYS work, as in 'on any Windows computer'?

Accepted solution:

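#include <locale>
#include <string>
#include <boost/filesystem.hpp>
#include <boost/locale.hpp>
#include <boost/nowide/args.hpp>

// Defined elsewhere: encrypts the file names under p using the password pw.
void EncryptFileNames ( const boost::filesystem::path& p, const std::string& pw );

int main ( int argc, char **argv ) // No checking
{
    // Call with encrypt.exe c:\tmp pässwörd

    boost::nowide::args a ( argc, argv ); // Fix arguments - make them UTF-8

    // normalize() needs a locale generated by Boost.Locale (a plain std::locale()
    // lacks its facets and throws std::bad_cast), and it must be a UTF-8 locale
    // because argv now holds UTF-8. Boost.Filesystem is told to use it as well.
    boost::locale::generator gen;
    std::locale::global ( gen ( "en_US.UTF-8" ) );
    boost::filesystem::path::imbue ( std::locale ( ) );

    boost::filesystem::path p ( argv [ 1 ] );

    EncryptFileNames ( p, boost::locale::normalize ( argv [ 2 ], boost::locale::norm_nfc, std::locale ( ) ) );

    return 0;
}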

Thanks to all contributors.

PS: For encryption I use Crypto++ with VS2008 SP1 and Boost (without the ICU backend).


Solution

Firstly, UTF-8 is a red herring. To be international you must use an international character set; there is only one worth considering, and it's called Unicode. How you represent Unicode within your program (i.e. how you encode it) is up to you: as long as the encoding can represent all of Unicode and you always use the same one, there is no problem. You could pick UTF-8, but since you are working on Windows it seems reasonable to pick the encoding that Windows uses internally, which is UTF-16. As bmargulies says, you can use MultiByteToWideChar to get from the local representation (i.e. the local code page) to UTF-16. I don't see the need for the extra step of generating UTF-8 from UTF-16, but if you wanted to do that you could use WideCharToMultiByte.

OTHER TIPS

If your application is compiled with _UNICODE, just call WideCharToMultiByte with the UTF-8 code page (CP_UTF8) to get UTF-8. If it is not compiled with _UNICODE, call MultiByteToWideChar to get UTF-16 from your ACP bytes (bytes in the system's ANSI code page, CP_ACP), and then call WideCharToMultiByte to get UTF-8.

Since the code you added uses std::string, the data is presumably in the system's ACP, so the recipe here will work. There are also plenty of convenience APIs for this purpose, such as mbstowcs. Don't be distracted by 'MB': that's just Windows-speak for 'not UTF-16'.

Licensed under: CC-BY-SA with attribution