Question

I am trying to create a simple datastructure that will make it easy to convert back and forth between ASCII strings and Unicode strings. My issue is that the length returned by the function mbstowcs is correct but the length returned by the function wcslen, on the newly created wchar_t string, is not. Am I missing something here?

typedef struct{

    wchar_t *string;
    long length; // I have also tried int, and size_t
} String;

void setCString(String *obj, char *str){

    obj->length = strlen(str);

    free(obj->string); // Free original string
    obj->string = (wchar_t *)malloc((obj->length + 1) * sizeof(wchar_t)); //Allocate space for new string to be copied to

    //memset(obj->string,'\0',(obj->length + 1)); NOTE: I tried this but it doesn't make any difference

    size_t length = 0;

    length = mbstowcs(obj->string, (const char *)str, obj->length);

    printf("Length = %d\n",(int)length); // Prints correct length
    printf("!C string %s converted to wchar string %ls\n",str,obj->string); //obj->string is of a wcslen size larger than Length above...

    if(length != wcslen(obj->string))
            printf("Length failure!\n");

    if(length == -1)
    {
        //Conversion failed, set string to NULL terminated character
        free(obj->string);
        obj->string = (wchar_t *)malloc(sizeof(wchar_t));
        obj->string = L'\0';
    }
    else
    {
        //Conversion worked! but wcslen (and printf("%ls)) show the string is actually larger than length
        //do stuff
    }
}
Was it helpful?

Solution

The length you need to pass to mbstowcs() includes the L'\0' terminator character, but your calculated length in obj->length() does not include it - you need to add 1 to the value passed to mbstowcs().

In addition, instead of using strlen(str) to determine the length of the converted string, you should be using mbstowcs(0, src, 0) + 1. You should also change the type of str to const char *, and elide the cast. realloc() can be used in place of a free() / malloc() pair. Overall, it should look like:

typedef struct {
    wchar_t *string;
    size_t length;
} String;

void setCString(String *obj, const char *str)
{
    obj->length = mbstowcs(0, src, 0);
    obj->string = realloc(obj->string, (obj->length + 1) * sizeof(wchar_t)); 

    size_t length = mbstowcs(obj->string, str, obj->length + 1);

    printf("Length = %zu\n", length);
    printf("!C string %s converted to wchar string %ls\n", str, obj->string);

    if (length != wcslen(obj->string))
            printf("Length failure!\n");

    if (length == (size_t)-1)
    {
        //Conversion failed, set string to NULL terminated character
        obj->string = realloc(obj->string, sizeof(wchar_t));
        obj->string = L'\0';
    }
    else
    {
        //Conversion worked!
        //do stuff
    }
}

Mark Benningfield points out that mbstowcs(0, src, 0) is a POSIX / XSI extension to the C standard - to obtain the required length under only standard C, you must instead use:

    const char *src_copy = src;
    obj->length = mbstowcs(NULL, &src_copy, 0, NULL);

OTHER TIPS

The code seems to work fine for me. Can you provide more context, such as the content of strings you're passing to it, and what locale you're using?

A few other bugs/style issues I noticed:

  • obj->length is left as the allocated length, rather than updated to match the length in (wide) characters. Is that your intention?
  • The cast to const char * is useless and bad style.

Edit: Upon discussion, it looks like you may be using a nonconformant Windows version of the mbstowcs function. If so, your question should be updated to reflect as such.

Edit 2: The code only happened to work for me because malloc returned a fresh, zero-filled buffer. Since you are passing obj->length to mbstowcs as the maximum number of wchar_t values to write to the destination, it will run out of space and not be able to write the null terminator unless there's a proper multibyte character (one which requires more than a single byte) in the source string. Change this to obj->length+1 and it should work fine.

I am running this on Ubuntu linux with UTF-8 as locale.

Here is the additional info as requested:

I am calling this function with a fully allocated structure and passing in a hard coded "string" (not a L"string"). so I call the function with what is essentially setCString(*obj, "Hello!").

Length = 6

!C string Hello! converted to wchar string Hello!xxxxxxxxxxxxxxxxxxxx

(where x = random data)

Length failure!

for reference printf("wcslen = %d\n",(int)wcslen(obj->string)); prints out as wcslen = 11

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top