How to get the number of characters in a std::string?

https://stackoverflow.com/questions/905355

05-09-2019

Question

How should I get the number of characters in a string in C++?

Solution

If you're using a std::string, call length():

std::string str = "hello";
std::cout << str << ":" << str.length();
// Outputs "hello:5"

If you're using a C-string, call strlen().

const char *str = "hello";
std::cout << str << ":" << strlen(str);
// Outputs "hello:5"

Or, if you happen to like using Pascal-style strings (or f***** strings, as Joel Spolsky likes to call them when they have a trailing NULL), just dereference the first character.

const char *str = "\005hello";
std::cout << str + 1 << ":" << static_cast<int>(*str); // cast so the length byte prints as a number
// Outputs "hello:5"
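
The length prefix of a Pascal-style string is a single char, and streaming a char prints a character rather than a number, so it usually needs converting to an integer first. A minimal sketch, where pascal_length is a hypothetical helper name and not part of any library:

```cpp
#include <cstddef>

// Hypothetical helper: reads the one-byte length prefix of a Pascal-style
// string. The cast to unsigned char matters because plain char may be
// signed, which would turn lengths above 127 into negative values.
inline std::size_t pascal_length(const char* s)
{
    return static_cast<unsigned char>(*s);
}
```

Because the prefix is one byte, this layout can only describe strings of up to 255 characters.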

Other tips

When dealing with C++ strings (std::string), you're looking for length() or size(). Both should give you the same value. When dealing with C-style strings, however, you would use strlen().

#include <iostream>
#include <string>
#include <string.h>

int main(int argc, char **argv)
{
   std::string str = "Hello!";
   const char *otherstr = "Hello!"; // C-Style string
   std::cout << str.size() << std::endl;
   std::cout << str.length() << std::endl;
   std::cout << strlen(otherstr) << std::endl; // C way for string length
   std::cout << strlen(str.c_str()) << std::endl; // convert C++ string to C-string then call strlen
   return 0;
}

Output:

6
6
6
6

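It depends on what string type you're talking about. There are many types of strings:

1. const char* - a C-style multibyte string
2. const wchar_t* - a C-style wide string
3. std::string - a "standard" multibyte string
4. std::wstring - a "standard" wide string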

For 3 and 4, you can use the .size() or .length() methods.

For 1, you can use strlen(), but you must make sure that the string variable is not NULL (i.e. not 0).

For 2, you can use wcslen(), but you must make sure that the string variable is not NULL (i.e. not 0).
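
The null check for cases 1 and 2 can be folded into small wrappers. This is only a sketch: safe_strlen and safe_wcslen are hypothetical names, and treating a null pointer as an empty string is an assumption that may not suit every code base.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: length of a C-style multibyte string in bytes,
// treating a null pointer as an empty string instead of crashing.
inline std::size_t safe_strlen(const char* s)
{
    return s ? std::strlen(s) : 0;
}

// Hypothetical helper: the same idea for wide strings. The result is a
// count of wchar_t units, not bytes.
inline std::size_t safe_wcslen(const wchar_t* s)
{
    return s ? std::wcslen(s) : 0;
}
```

Calling strlen() or wcslen() directly on a null pointer is undefined behavior, which is why the check has to come first.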

There are other string types in non-standard C++ libraries, such as MFC's CString, ATL's CComBSTR, ACE's ACE_CString, and so on, with methods such as .GetLength(). I can't remember the specifics of them all off the top of my head.

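The STLSoft libraries have abstracted all of this with what they call string access shims, which can be used to get the string length (and other aspects) from any type. So for all of the above (including the non-standard library ones) you can use the same function, stlsoft::c_str_len(). This article describes how it all works, as it's not entirely obvious or easy.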

If you're using old C-style strings instead of the newer STL-style strings, there's the strlen function in the C runtime library:

const char* p = "Hello";
size_t n = strlen(p);

If you're using std::string, there are two common methods for that:

std::string Str("Some String");
size_t Size = 0;
Size = Str.size();
Size = Str.length();

If you're using a C-style string (char * or const char *), then you can use:

const char *pStr = "Some String";
size_t Size = strlen(pStr);

string foo;
... foo.length() ...

.length and .size are synonymous. I just think that "length" is a slightly clearer word.

std::string str("a string");
std::cout << str.size() << std::endl;

For an actual string object:

yourstring.length();

or

yourstring.size();

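In C++, the length() and size() methods of std::string give you the number of bytes, and not necessarily the number of characters! The same is true of C-style strlen() and of the sizeof operator applied to a char array (which also counts the terminating null).

For most printable 7-bit ASCII characters this is the same value, but for characters outside 7-bit ASCII it definitely is not. See the following example for real results (64-bit Linux).

There is no simple C/C++ function that can really count the number of characters. By the way, all of this is implementation dependent and may differ in other environments (compiler, 16/32, Linux, embedded, ...).

See the following example: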

#include <string>
#include <iostream>
#include <stdio.h>
#include <string.h>
using namespace std;

int main()
{
/* c-Style char Array */
const char * Test1 = "ABCD";
const char * Test2 = "ÄÖÜ€";
const char * Test3 = "αβγ𝄞";

/* c++ string object */
string sTest1 = "ABCD";
string sTest2 = "ÄÖÜ€";
string sTest3 = "αβγ𝄞";

printf("\r\nC Style Results:\r\n");
printf("Test1: %s, strlen(): %d\r\n",Test1, (int) strlen(Test1));
printf("Test2: %s, strlen(): %d\r\n",Test2, (int) strlen(Test2));
printf("Test3: %s, strlen(): %d\r\n",Test3, (int) strlen(Test3));

printf("\r\nC++ Style Results:\r\n");
cout << "Test1: " << sTest1 << ", sTest1.size():  " << sTest1.size() << "  sTest1.length(): " << sTest1.length() << endl;
cout << "Test2: " << sTest2 << ", sTest2.size():  " << sTest2.size() << "  sTest2.length(): " << sTest2.length() << endl;
cout << "Test3: " << sTest3 << ", sTest3.size(): " << sTest3.size() << "  sTest3.length(): " << sTest3.length() << endl;
return 0;
}

The output of the example is this:

C Style Results:
Test1: ABCD, strlen(): 4    
Test2: ÄÖÜ€, strlen(): 9
Test3: αβγ𝄞, strlen(): 10

C++ Style Results:
Test1: ABCD, sTest1.size():  4  sTest1.length(): 4
Test2: ÄÖÜ€, sTest2.size():  9  sTest2.length(): 9
Test3: αβγ𝄞, sTest3.size(): 10  sTest3.length(): 10
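
When the bytes are known to be UTF-8, the number of code points (still not necessarily the number of visible characters) can be recovered by skipping continuation bytes. A minimal sketch, assuming well-formed UTF-8 input; utf8_codepoints is a hypothetical helper name, not a standard function:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a string that is
// assumed to hold well-formed UTF-8. Continuation bytes have the bit
// pattern 10xxxxxx, so every other byte starts a new code point.
inline std::size_t utf8_codepoints(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s)
    {
        if ((byte & 0xC0) != 0x80)
        {
            ++count;
        }
    }
    return count;
}
```

For the three UTF-8 test strings above (4, 9 and 10 bytes) this returns 4 in each case. It does no validation, so malformed input gives a meaningless count.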

The simplest way to get the length of a string, without bothering about the std namespace, is as follows:

String with or without spaces

#include <iostream>
#include <string>
using namespace std;
int main(){
    string str;
    getline(cin,str);
    cout<<"Length of given string is "<<str.length();
    return 0;
}

String without spaces

#include <iostream>
#include <string>
using namespace std;
int main(){
    string str;
    cin>>str;
    cout<<"Length of given string is "<<str.length();
    return 0;
}
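
The difference between the two programs is only in how much input ends up in the string: operator>> stops at the first whitespace, while getline reads up to the end of the line. A small sketch that reproduces both behaviors on an in-memory stream instead of cin; the helper names are hypothetical:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: length of the first whitespace-delimited word,
// which is what `cin >> str` followed by str.length() measures.
inline std::size_t first_word_length(const std::string& input)
{
    std::istringstream in(input);
    std::string word;
    in >> word;
    return word.length();
}

// Hypothetical helper: length of the first line including its spaces,
// which is what getline(cin, str) followed by str.length() measures.
inline std::size_t first_line_length(const std::string& input)
{
    std::istringstream in(input);
    std::string line;
    std::getline(in, line);
    return line.length();
}
```

For the input "hello world" the first helper returns 5 and the second returns 11.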

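For Unicode

Several answers here have pointed out that .length() gives the wrong results with multibyte characters, but there are 11 answers and none of them has provided a solution.

The case of Z͉̳̺ͥͬ̾a̴͕̲̒̒͌̋ͪl̨͎̰̘͉̟ͤ̀̈̚͜g͕͔̤͖̟̒͝ͅo̵̡̡̼͚̐ͯ̅ͪ̆ͣ̚

First of all, it's important to know what you mean by "length". For a motivating example, consider the string "Z͉̳̺ͥͬ̾a̴͕̲̒̒͌̋ͪl̨͎̰̘͉̟ͤ̀̈̚͜g͕͔̤͖̟̒͝ͅo̵̡̡̼͚̐ͯ̅ͪ̆ͣ̚" (note that some languages, notably Thai, actually use combining diacritical marks, so this isn't just useful for 15-year-old memes, but obviously that's the most important use case). Assume it is encoded in UTF-8. There are 3 ways we can talk about the length of this string: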

95 bytes

00000000: 5acd a5cd accc becd 89cc b3cc ba61 cc92  Z............a..
00000010: cc92 cd8c cc8b cdaa ccb4 cd95 ccb2 6ccd  ..............l.
00000020: a4cc 80cc 9acc 88cd 9ccc a8cd 8ecc b0cc  ................
00000030: 98cd 89cc 9f67 cc92 cd9d cd85 cd95 cd94  .....g..........
00000040: cca4 cd96 cc9f 6fcc 90cd afcc 9acc 85cd  ......o.........
00000050: aacc 86cd a3cc a1cc b5cc a1cc bccd 9a    ...............

50 codepoints

LATIN CAPITAL LETTER Z
COMBINING LEFT ANGLE BELOW
COMBINING DOUBLE LOW LINE
COMBINING INVERTED BRIDGE BELOW
COMBINING LATIN SMALL LETTER I
COMBINING LATIN SMALL LETTER R
COMBINING VERTICAL TILDE
LATIN SMALL LETTER A
COMBINING TILDE OVERLAY
COMBINING RIGHT ARROWHEAD BELOW
COMBINING LOW LINE
COMBINING TURNED COMMA ABOVE
COMBINING TURNED COMMA ABOVE
COMBINING ALMOST EQUAL TO ABOVE
COMBINING DOUBLE ACUTE ACCENT
COMBINING LATIN SMALL LETTER H
LATIN SMALL LETTER L
COMBINING OGONEK
COMBINING UPWARDS ARROW BELOW
COMBINING TILDE BELOW
COMBINING LEFT TACK BELOW
COMBINING LEFT ANGLE BELOW
COMBINING PLUS SIGN BELOW
COMBINING LATIN SMALL LETTER E
COMBINING GRAVE ACCENT
COMBINING DIAERESIS
COMBINING LEFT ANGLE ABOVE
COMBINING DOUBLE BREVE BELOW
LATIN SMALL LETTER G
COMBINING RIGHT ARROWHEAD BELOW
COMBINING LEFT ARROWHEAD BELOW
COMBINING DIAERESIS BELOW
COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW
COMBINING PLUS SIGN BELOW
COMBINING TURNED COMMA ABOVE
COMBINING DOUBLE BREVE
COMBINING GREEK YPOGEGRAMMENI
LATIN SMALL LETTER O
COMBINING SHORT STROKE OVERLAY
COMBINING PALATALIZED HOOK BELOW
COMBINING PALATALIZED HOOK BELOW
COMBINING SEAGULL BELOW
COMBINING DOUBLE RING BELOW
COMBINING CANDRABINDU
COMBINING LATIN SMALL LETTER X
COMBINING OVERLINE
COMBINING LATIN SMALL LETTER H
COMBINING BREVE
COMBINING LATIN SMALL LETTER A
COMBINING LEFT ANGLE ABOVE

5 graphemes

Z with some s**t
a with some s**t
l with some s**t
g with some s**t
o with some s**t

Finding the lengths using ICU

There are C++ classes for ICU, but they require converting to UTF-16. You can use the C types and macros directly to get some UTF-8 support:

#include <memory>
#include <iostream>
#include <stdexcept>
#include <string>
#include <unicode/utypes.h>
#include <unicode/ubrk.h>
#include <unicode/utext.h>

//
// C++ helpers so we can use RAII
//
// Note that ICU internally provides some C++ wrappers (such as BreakIterator), however these only seem to work
// for UTF-16 strings, and require transforming UTF-8 to UTF-16 before use.
// If you already have UTF-16 strings or can take the performance hit, you should probably use those instead of
// the C functions. See: http://icu-project.org/apiref/icu4c/
//
struct UTextDeleter { void operator()(UText* ptr) { utext_close(ptr); } };
struct UBreakIteratorDeleter { void operator()(UBreakIterator* ptr) { ubrk_close(ptr); } };
using PUText = std::unique_ptr<UText, UTextDeleter>;
using PUBreakIterator = std::unique_ptr<UBreakIterator, UBreakIteratorDeleter>;

void checkStatus(const UErrorCode status)
{
    if(U_FAILURE(status))
    {
        throw std::runtime_error(u_errorName(status));
    }
}

size_t countGraphemes(UText* text)
{
    // source for most of this: http://userguide.icu-project.org/strings/utext
    UErrorCode status = U_ZERO_ERROR;
    PUBreakIterator it(ubrk_open(UBRK_CHARACTER, "en_us", nullptr, 0, &status));
    checkStatus(status);
    ubrk_setUText(it.get(), text, &status);
    checkStatus(status);
    size_t charCount = 0;
    while(ubrk_next(it.get()) != UBRK_DONE)
    {
        ++charCount;
    }
    return charCount;
}

size_t countCodepoints(UText* text)
{
    size_t codepointCount = 0;
    while(UTEXT_NEXT32(text) != U_SENTINEL)
    {
        ++codepointCount;
    }
    // reset the index so we can use the structure again
    UTEXT_SETNATIVEINDEX(text, 0);
    return codepointCount;
}

void printStringInfo(const std::string& utf8)
{
    UErrorCode status = U_ZERO_ERROR;
    PUText text(utext_openUTF8(nullptr, utf8.data(), utf8.length(), &status));
    checkStatus(status);

    std::cout << "UTF-8 string (might look wrong if your console locale is different): " << utf8 << std::endl;
    std::cout << "Length (UTF-8 bytes): " << utf8.length() << std::endl;
    std::cout << "Length (UTF-8 codepoints): " << countCodepoints(text.get()) << std::endl;
    std::cout << "Length (graphemes): " << countGraphemes(text.get()) << std::endl;
    std::cout << std::endl;
}

int main(int argc, char** argv)
{
    printStringInfo(u8"Hello, world!");
    printStringInfo(u8"หวัดดีชาวโลก");
    printStringInfo(u8"\xF0\x9F\x90\xBF");
    printStringInfo(u8"Z͉̳̺ͥͬ̾a̴͕̲̒̒͌̋ͪl̨͎̰̘͉̟ͤ̀̈̚͜g͕͔̤͖̟̒͝ͅo̵̡̡̼͚̐ͯ̅ͪ̆ͣ̚");
}

This prints:

UTF-8 string (might look wrong if your console locale is different): Hello, world!
Length (UTF-8 bytes): 13
Length (UTF-8 codepoints): 13
Length (graphemes): 13

UTF-8 string (might look wrong if your console locale is different): หวัดดีชาวโลก
Length (UTF-8 bytes): 36
Length (UTF-8 codepoints): 12
Length (graphemes): 10

UTF-8 string (might look wrong if your console locale is different): 🐿
Length (UTF-8 bytes): 4
Length (UTF-8 codepoints): 1
Length (graphemes): 1

UTF-8 string (might look wrong if your console locale is different): Z͉̳̺ͥͬ̾a̴͕̲̒̒͌̋ͪl̨͎̰̘͉̟ͤ̀̈̚͜g͕͔̤͖̟̒͝ͅo̵̡̡̼͚̐ͯ̅ͪ̆ͣ̚
Length (UTF-8 bytes): 95
Length (UTF-8 codepoints): 50
Length (graphemes): 5

Boost.Locale wraps ICU and might provide a nicer interface. However, it still requires conversion to and from UTF-16.

This might be the easiest way to input a string and find its length.

// Finding length of a string in C++ 
#include<iostream>
#include<string>
using namespace std;

int count(string);

int main()
{
string str;
cout << "Enter a string: ";
getline(cin,str);
cout << "\nString: " << str << endl;
cout << count(str) << endl;

return 0;

}

// Returns the length of s. length() already yields 0 for an empty
// string and 1 for a single character, so no special cases are needed.
int count(string s){
    return static_cast<int>(s.length());
}

License: CC-BY-SA with attribution

Not affiliated with StackOverflow