Question

The code points of some Unicode characters (like 𤭢) need more than 2 bytes. How do I use Win32 API functions like CreateFile() with these characters?

WinBase.h

WINBASEAPI
__out
HANDLE
WINAPI
CreateFileA(
    __in     LPCSTR lpFileName,
    __in     DWORD dwDesiredAccess,
    __in     DWORD dwShareMode,
    __in_opt LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    __in     DWORD dwCreationDisposition,
    __in     DWORD dwFlagsAndAttributes,
    __in_opt HANDLE hTemplateFile
    );
WINBASEAPI
__out
HANDLE
WINAPI
CreateFileW(
    __in     LPCWSTR lpFileName,
    __in     DWORD dwDesiredAccess,
    __in     DWORD dwShareMode,
    __in_opt LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    __in     DWORD dwCreationDisposition,
    __in     DWORD dwFlagsAndAttributes,
    __in_opt HANDLE hTemplateFile
    );
#ifdef UNICODE
#define CreateFile  CreateFileW
#else
#define CreateFile  CreateFileA
#endif // !UNICODE

LPCSTR and LPCWSTR are defined in WinNT.h as:

typedef __nullterminated CONST CHAR *LPCSTR, *PCSTR;
typedef __nullterminated CONST WCHAR *LPCWSTR, *PCWSTR;

CHAR and WCHAR are defined in WinNT.h as:

typedef char CHAR;
#ifndef _MAC
typedef wchar_t WCHAR;    // wc,   16-bit UNICODE character
#else
// some Macintosh compilers don't define wchar_t in a convenient location, or define it as a char
typedef unsigned short WCHAR;    // wc,   16-bit UNICODE character
#endif

CreateFileA() takes the file name as an LPCSTR, i.e. a null-terminated array of 8-bit chars.
CreateFileW() takes the file name as an LPCWSTR, i.e. a null-terminated array of 16-bit WCHARs (wchar_t).

I have created a file at C:\𤭢.txt. It looks as though this file cannot be opened with CreateFile(), because its name contains the character 𤭢, whose Unicode code point is U+24B62 (0x24B62). That value does not fit even in a single 16-bit WCHAR.

But the file exists on my hard disk and Windows handles it normally. How can I open it with a Win32 API function, the way Windows does internally?


Solution

Such characters are represented in UTF-16 by surrogate pairs: it takes two wide-character (WCHAR) elements to represent one such code point. For 𤭢 (U+24B62) the pair is 0xD852 0xDF62. So you just pass the file name, surrogate pair included, to CreateFile. Naturally, you need the wide variant, CreateFileW (or CreateFile with UNICODE defined).
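A minimal sketch in C of the surrogate-pair arithmetic, plus (Windows only) a CreateFileW call on the file name from the question. The helper names utf16_encode, utf16_decode_pair and open_example_file are hypothetical, chosen for this illustration, and the Windows part assumes C:\𤭢.txt exists. The encoding rule is the standard UTF-16 one: subtract 0x10000, then split the remaining 20 bits into a high surrogate (0xD800 + top 10 bits) and a low surrogate (0xDC00 + bottom 10 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written,
   or 0 if cp is not a valid scalar value (above U+10FFFF or a surrogate). */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                          /* BMP: a single code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}

/* Hypothetical helper: recombine a surrogate pair into its code point. */
uint32_t utf16_decode_pair(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}

#ifdef _WIN32
#include <windows.h>

/* Open C:\<U+24B62>.txt for reading. U+24B62 becomes the pair
   0xD852 0xDF62, so the name occupies 9 WCHARs plus the terminating NUL.
   Returns INVALID_HANDLE_VALUE on failure; close the handle with CloseHandle. */
HANDLE open_example_file(void)
{
    const WCHAR path[] = { L'C', L':', L'\\', 0xD852, 0xDF62,
                           L'.', L't', L'x', L't', L'\0' };
    return CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
}
#endif
```

For example, utf16_encode(0x24B62, u) returns 2 and leaves u[0] == 0xD852 and u[1] == 0xDF62. The path array is spelled out element by element only to make the two code units visible; CreateFileW itself needs nothing special beyond receiving them in order.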

Presumably you won't be hard-coding such a file name in your code; you'll be getting it from a file dialog, FindFirstFile, etc. Those APIs give you a buffer that is already correctly UTF-16 encoded, surrogate pairs included, and you can pass it straight to CreateFileW.
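A sketch of that second point, assuming you enumerate C:\ with FindFirstFileW/FindNextFileW: the cFileName buffer those functions fill in is already UTF-16, so building a full path and handing it to CreateFileW needs no surrogate handling at all. The names join_path and open_txt_files_in_root are hypothetical, and the directory and *.txt pattern are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: write dir followed by name into out, which holds
   cap elements including the terminating NUL. Returns 0 on success or -1
   if out is too small. A surrogate pair is simply copied as two ordinary
   code units, so it needs no special case. */
int join_path(wchar_t *out, size_t cap, const wchar_t *dir, const wchar_t *name)
{
    size_t dlen, nlen;
    assert(out != NULL && dir != NULL && name != NULL);
    dlen = wcslen(dir);
    nlen = wcslen(name);
    if (dlen + nlen + 1 > cap)
        return -1;
    wmemcpy(out, dir, dlen);
    wmemcpy(out + dlen, name, nlen + 1);  /* + 1 also copies the NUL */
    return 0;
}

#ifdef _WIN32
#include <windows.h>

/* Hypothetical example: open every *.txt file directly under C:\ for reading,
   passing the UTF-16 names returned by FindFirstFileW/FindNextFileW unchanged. */
void open_txt_files_in_root(void)
{
    WIN32_FIND_DATAW fd;
    HANDLE find = FindFirstFileW(L"C:\\*.txt", &fd);
    if (find == INVALID_HANDLE_VALUE)
        return;
    do {
        wchar_t path[MAX_PATH];
        HANDLE h;
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            continue;                     /* skip directories named *.txt */
        if (join_path(path, MAX_PATH, L"C:\\", fd.cFileName) != 0)
            continue;                     /* too long for this fixed buffer */
        h = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h != INVALID_HANDLE_VALUE) {
            /* ... read from h here ... */
            CloseHandle(h);
        }
    } while (FindNextFileW(find, &fd));
    FindClose(find);
}
#endif
```

With this approach a file such as C:\𤭢.txt is opened like any other: its name arrives in cFileName as the code units 0xD852 0xDF62 followed by ".txt", and join_path copies them through unchanged.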

Licensed under: CC-BY-SA with attribution