Question

I have a wide-character string (std::wstring) in my code, and I need to search for a wide character in it.

I use the find() function for this:

    wcin >> str;
    wcout << ((str.find(L'ф') != wstring::npos)? L"EXIST":L"NONE");

L'ф' is a Cyrillic letter.

But find() always returns npos in this call. With Latin letters, find() works correctly.

Is this a problem with the function, or am I doing something wrong?

UPD

I use MinGW and save the source in UTF-8. I also set the locale with setlocale(LC_ALL, "");. Code like wcout << L'ф'; works correctly. But this

wchar_t w;
wcin >> w;
wcout << w;

works incorrectly.

This is strange. Earlier I had no problems with the encoding when using setlocale().


Solution

The encoding of your source file and the execution environment's encoding may be wildly different. C++ makes no guarantees about any of this. You can check this by outputting the hexadecimal value of your string literal:

std::wcout << std::hex << static_cast<unsigned>(L'ф'); // 444 if the compiler decoded the source correctly
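To inspect a whole string rather than a single character, something like the following sketch may help. The helper name hex_code_units is made up for illustration; it formats every wchar_t code unit as hex so you can compare what the compiler stored with what wcin delivered.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not a standard function): returns the code units of
// a wide string as space-separated, zero-padded hex numbers.
std::string hex_code_units(const std::wstring& s)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (i != 0)
            out << ' ';
        // Cast to an integer so the numeric value is printed, not the glyph.
        out << std::setw(4) << static_cast<unsigned long>(s[i]);
    }
    return out.str();
}
```

For a correctly decoded L"ф" this yields 0444. Any other result for the literal, or for the string read from wcin, points to the side where the encoding goes wrong.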

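Before C++11, you could use non-ASCII characters in source code by writing their hex values as escapes. Split the literal right after the escape so that it does not absorb a following hex digit (here the f of "five"):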

"\x05" "five"

C++ also lets you specify the Unicode code point with a universal character name (\u or \U). For ф, which is U+0444, that would be

L"\u0444"

If you're going full C++11 (and your environment ensures these are encoded in UTF-*), you can use any of char, char16_t, or char32_t, and do:

const char* ef_utf8 = "\u0444";
const char16_t* ef_utf16 = u"\u0444";
const char32_t* ef_utf32 = U"\u0444";
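As a rough illustration of how the three character types differ, the sketch below counts code units for the Cyrillic letter ф (U+0444). The helper code_unit_count and the constant names are invented for this example, and the UTF-8 bytes are written out explicitly so the result does not depend on the compiler's execution character set.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of code units before the terminating zero.
template <typename CharT>
std::size_t code_unit_count(const CharT* s)
{
    return std::char_traits<CharT>::length(s);
}

// U+0444 (Cyrillic small letter ef) in each encoding.
const char*     kEfUtf8  = "\xD1\x84"; // two UTF-8 bytes
const char16_t* kEfUtf16 = u"\u0444";  // one UTF-16 code unit
const char32_t* kEfUtf32 = U"\u0444";  // one UTF-32 code unit
```

The same letter takes two code units as UTF-8 and one as UTF-16 or UTF-32, so a search for a single character only makes sense when the string and the needle use the same encoding.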

OTHER TIPS

You must set the encoding of the console.

This works:

#include <iostream>
#include <string>
#include <io.h>      // _setmode
#include <fcntl.h>   // _O_U16TEXT
#include <stdio.h>   // _fileno
#include <cstdlib>   // system

using namespace std;

int main()
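{
    // Put stdout and stdin into UTF-16 text mode so that wide-character
    // I/O is exchanged with the Windows console as UTF-16.
    _setmode(_fileno(stdout), _O_U16TEXT);
    _setmode(_fileno(stdin), _O_U16TEXT);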
    wstring str;
    wcin >> str;
    wcout << ((str.find(L'ф') != wstring::npos)? L"EXIST":L"NONE");
    system("pause");
    return 0;
}

std::wstring::find() works fine. But you have to read the input string correctly.
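To convince yourself that the search itself is not at fault, you can take console input out of the picture. This is only a minimal sketch with a made-up helper name, contains_ef; it spells the character as a universal character name so the source file encoding does not matter either.

```cpp
#include <string>

// Hypothetical helper: true if `text` contains Cyrillic small letter ef (U+0444).
bool contains_ef(const std::wstring& text)
{
    const wchar_t kEf = L'\u0444';
    return text.find(kEf) != std::wstring::npos;
}
```

contains_ef(L"abc\u0444abc") is true and contains_ef(L"abc") is false. If the same check fails on a string read from wcin, the string most likely does not contain the code unit 0x0444 at all because the input was decoded with the wrong encoding.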

The following code runs fine on Windows console (the input Unicode string is read using ReadConsoleW() Win32 API):

#include <exception>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <windows.h>
using namespace std;

class Win32Error : public runtime_error
{
public:
    Win32Error(const char* message, DWORD error)
        : runtime_error(message)
        , m_error(error)
    {}

    DWORD Error() const
    {
        return m_error;
    }

private:
    DWORD m_error;
};

void ThrowLastWin32(const char* message)
{
    const DWORD error = GetLastError();
    throw Win32Error(message, error);
}

void Test()
{
    const HANDLE hStdIn = GetStdHandle(STD_INPUT_HANDLE);
    if (hStdIn == INVALID_HANDLE_VALUE)
        ThrowLastWin32("GetStdHandle failed.");

    static const int kBufferLen = 200;
    wchar_t buffer[kBufferLen];
    DWORD numRead = 0;

    if (! ReadConsoleW(hStdIn, buffer, kBufferLen, &numRead, nullptr))
        ThrowLastWin32("ReadConsoleW failed.");

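    // ReadConsoleW includes the trailing "\r\n"; strip it only if present
    // so that short or empty reads cannot underflow the length.
    DWORD len = numRead;
    while (len > 0 && (buffer[len - 1] == L'\r' || buffer[len - 1] == L'\n'))
        --len;
    const wstring str(buffer, len);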

    static const wchar_t kEf = 0x0444;
    wcout << ((str.find(kEf) != wstring::npos) ? L"EXIST" : L"NONE");
}

int main()
{
    static const int kExitOk = 0;
    static const int kExitError = 1;

    try
    {
        Test();
        return kExitOk;
    }    
    catch(const Win32Error& e)
    {
        cerr << "\n*** ERROR: " << e.what() << '\n';
        cerr << "    (GetLastError returned " << e.Error() << ")\n";
        return kExitError;
    }
    catch(const exception& e)
    {
        cerr << "\n*** ERROR: " << e.what() << '\n';
        return kExitError;
    }        
}

Output:

C:\TEMP>test.exe
abc
NONE
C:\TEMP>test.exe
abcфabc
EXIST

That's probably an encoding issue. wcin works with an encoding different from your compiler's/source code's. Try entering the ф in the console/wcin -- it will work. Try printing the ф via wcout -- it will show a different character or no character at all.

There is no platform-independent way to circumvent this, but if you are on Windows, you can change the console encoding manually, either with the chcp command-line command or programmatically with SetConsoleCP() (input) and SetConsoleOutputCP() (output).
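A minimal sketch of the programmatic route might look like this. The wrapper set_console_code_pages is a hypothetical name, and the code page you pass is your choice (for example 1251 for Windows Cyrillic or 65001 for UTF-8). Whether wcin then decodes the input as expected still depends on the C runtime and the locale in use, so treat this as a starting point rather than a guaranteed fix.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: sets the console input and output code pages.
// Returns true on success. On non-Windows platforms it does nothing.
bool set_console_code_pages(unsigned codePage)
{
#ifdef _WIN32
    // Both Win32 calls return nonzero on success.
    return SetConsoleCP(codePage) != 0 && SetConsoleOutputCP(codePage) != 0;
#else
    (void)codePage; // no console code pages outside Windows
    return true;
#endif
}
```

Call it once at the start of main(), before the first read from wcin, and check the return value. On failure, GetLastError() gives the reason.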

You could also change your source file's/compiler's encoding. How this is done depends on your editor/compiler. If you are using MSVC, this answer might help you: https://stackoverflow.com/a/1660901/2128694

Licensed under: CC-BY-SA with attribution