How are wctype.h functions supposed to be used correctly?
06-06-2021
Question
The various is... functions (e.g. isalpha, isdigit) in ctype.h aren't entirely predictable. They take int arguments but expect character values in the unsigned char range, so on a platform where char is signed, passing a char value directly could lead to undesirable sign extension. I believe that the typical approach to handling this is to explicitly cast to an unsigned char first.
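For reference, the cast described above might look like the following minimal sketch. The helper name count_alpha is hypothetical and only serves as an illustration; isalpha is the standard ctype.h function.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the alphabetic characters in a narrow string. */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        /* Cast to unsigned char so that a negative char value (possible
           when char is signed) is never passed to isalpha(). */
        if (isalpha((unsigned char)*s))
            ++n;
    }
    return n;
}
```

In the default "C" locale, count_alpha("abc123") returns 3. Without the cast, a byte such as '\xE9' could reach isalpha() as a negative int on signed-char platforms.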
Okay, but what is the proper, portable way to deal with the various isw... functions in wctype.h? wchar_t, like char, also may be signed or unsigned, but because wchar_t is itself a typedef, a typename of unsigned wchar_t is illegal.
Solution 2
Upon re-reading the ISO C99 specification, I found that it states the following about wctype.h:
For all functions described in this subclause that accept an argument of type wint_t, the value shall be representable as a wchar_t or shall equal the value of the macro WEOF. If this argument has any other value, the behavior is undefined. (§7.25.1/5)
Contrast this with the corresponding note for ctype.h:
In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined. (§7.4/1)
(emphasis mine)
I think that it's also worth understanding the motivation for why the ctype.h functions require unsigned char representations. The standard requires that EOF be a negative int (§7.19.1/3), so the ctype.h functions use unsigned char representations to (try to) avoid potential ambiguity.
In contrast, that motivation doesn't exist for wctype.h functions. The standard makes no such requirement of WEOF, as footnote 270 elaborates:
The value of the macro WEOF may differ from that of EOF and need not be negative.
No such requirement is needed because WEOF is already guaranteed to not conflict with any character represented by wchar_t (§7.24.1/3).
Therefore the wctype.h functions don't have or need any of the unsigned nonsense, and wchar_t values can be passed to them directly.
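As a minimal sketch of that conclusion, the loop below passes each wchar_t straight to iswalpha with no cast. The helper name count_walpha is hypothetical, and the sketch assumes the string holds only valid wide characters.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: counts the alphabetic characters in a wide string. */
size_t count_walpha(const wchar_t *ws)
{
    size_t n = 0;

    for (; *ws != L'\0'; ++ws) {
        /* No cast: a valid wide character is already an acceptable
           argument for the isw... functions. */
        if (iswalpha(*ws))
            ++n;
    }
    return n;
}
```

In the default "C" locale, count_walpha(L"abc123") returns 3.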
OTHER TIPS
Isn't that what wint_t is for? The iswXxxxx() functions take a wint_t type:
ISO 9899:1999 covers this in various sections, working backwards:
§7.25 Wide character classification and mapping utilities <wctype.h>
§7.25.2.1.1 The iswalnum function
Synopsis
#include <wctype.h>
int iswalnum(wint_t wc);
Description
The iswalnum function tests for any wide character for which iswalpha or iswdigit is true.
§7.24 Extended multibyte and wide character utilities <wchar.h>
§7.24.1 Introduction:
wint_t which is an integer type unchanged by default argument promotions that can hold any value corresponding to members of the extended character set, as well as at least one value that does not correspond to any member of the extended character set (see WEOF below);
Footnote 269: wchar_t and wint_t can be the same integer type.
The 'unchanged by default argument promotions' should mean that it has to be as big as an int, though it could be a short or unsigned short if sizeof(short) == sizeof(int) (which is seldom the case these days, though it was true for some 16-bit systems).
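One practical consequence of the 'unchanged by default argument promotions' wording, as I understand it, is that a wint_t can travel through a variadic argument list and be read back with va_arg(ap, wint_t). The sketch below illustrates this; the function count_wdigits is hypothetical and not part of any standard header.

```c
#include <stdarg.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical variadic helper: `count` is the number of wint_t
   arguments that follow; returns how many of them are digits. */
int count_wdigits(int count, ...)
{
    va_list ap;
    int n = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i) {
        /* wint_t is not altered by the default argument promotions,
           so it can be named directly as the va_arg type. */
        wint_t wc = va_arg(ap, wint_t);
        if (iswdigit(wc))
            ++n;
    }
    va_end(ap);
    return n;
}
```

Callers should pass each argument as a wint_t, for example count_wdigits(3, (wint_t)L'1', (wint_t)L'a', (wint_t)L'9'), since wchar_t and wint_t need not be the same type.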
§7.17 Common definitions <stddef.h>
wchar_t which is an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales; the null character shall have the code value zero and each member of the basic character set shall have a code value equal to its value when used as the lone character in an integer character constant.
As long as the value passed to iswalnum() or its kin is a valid wchar_t or WEOF, the function will work correctly. If you manufactured the value out of thin air and manage to get the value wrong, you get undefined behaviour.
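A common way to obtain values that are guaranteed to be either a valid wide character or WEOF is to read them with fgetwc, which returns a wint_t. The following is a minimal sketch under that assumption; the helper name count_walnum_stream is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: counts the alphanumeric wide characters read
   from `fp` until fgetwc() reports WEOF. */
size_t count_walnum_stream(FILE *fp)
{
    size_t n = 0;
    wint_t wc; /* wint_t rather than wchar_t, so WEOF stays distinguishable */

    while ((wc = fgetwc(fp)) != WEOF) {
        /* wc is a valid wide character here, so no cast is needed. */
        if (iswalnum(wc))
            ++n;
    }
    return n;
}
```

Storing the result in a wint_t rather than a wchar_t keeps the WEOF comparison reliable on platforms where the two types differ.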