C++11 speed comparison/cost: std::hash<std::string> equality versus direct std::string equality on two large strings

https://stackoverflow.com/questions/17887146

04-06-2022

Problem

Hi, I have a question about std::hash. If I have two large strings to compare, and I am willing to accept that equal hashes mean equal strings in most cases, is it faster to do something like the following instead of a direct string comparison? Keep in mind that this will run inside a loop that reads a file, so it will be executed many times, which is the concern for large files.

#include <functional>  // std::hash
#include <string>

void compareLines()
{
    std::string largeString1;  // large but not huge: a line of text of up to, say, 500 chars
    std::string largeString2;

    // Is this better than the next block in terms of performance, and if so, by how much?
    // std::hash<std::string> is a function object: construct it ({}) and then call it.
    if ( std::hash<std::string>{}(largeString1) == std::hash<std::string>{}(largeString2) )
    {
        // true logic
    }

    // Is this a lot slower than the previous block?
    if ( largeString1 == largeString2 )
    {
        // true logic
    }
}

Solution

std::hash<std::string>{}(largeString1) == std::hash<std::string>{}(largeString2)

will be far slower than

largeString1 == largeString2

Hashing a string involves iterating over its entire length. So the hash comparison requires the code to walk the full length of both strings, one after the other, and feed every character through the hash function. The direct equality check walks both strings at the same time and quits the instant it finds a difference. Trust the library: if == could be done faster, the implementers would have done it faster.
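To make the difference in work concrete, here is a minimal sketch that counts how many characters each approach reads. It assumes a 64-bit FNV-1a hash as a stand-in for std::hash<std::string>, whose real algorithm is implementation-defined; the helpers fnv1a and countingEquals are hypothetical. countingEquals models what common library implementations of operator== do: check the lengths, then scan both strings together and stop at the first mismatch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for std::hash<std::string>: 64-bit FNV-1a.
// The real std::hash algorithm is implementation-defined, but any string hash
// has to read every character of its input.
std::uint64_t fnv1a(const std::string& s, std::size_t& charsRead)
{
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV-1a 64-bit prime
        ++charsRead;
    }
    return h;
}

// Hypothetical model of a direct equality check: compare the lengths first,
// then walk both strings together and stop at the first mismatch.
bool countingEquals(const std::string& a, const std::string& b, std::size_t& charsRead)
{
    if (a.size() != b.size()) return false;     // different lengths: no characters read
    for (std::size_t i = 0; i < a.size(); ++i) {
        charsRead += 2;                          // one character from each string
        if (a[i] != b[i]) return false;
    }
    return true;
}
```

For two 500-character lines that differ only in their first character, the two fnv1a calls read 1000 characters in total, while countingEquals returns after reading 2. The hash only catches up when the strings are identical, and then it is still slower because every character also goes through the hash arithmetic.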

If you are going to compare each string many times, hashing each one once ahead of time and comparing just the hashes may be faster, but you still have to confirm every match with a full comparison, because equal hashes can be false positives. Precomputed hashes only make the "do not match" case faster.
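The hash-once-then-confirm approach can be sketched as a small wrapper that computes std::hash<std::string> once per line and compares the stored hashes before falling back to a full string comparison. This is a minimal sketch, not a library facility; HashedLine and sameLine are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical wrapper: stores a line together with its hash, computed once.
struct HashedLine {
    std::string text;   // declared first, so it is initialized before hash
    std::size_t hash;

    explicit HashedLine(std::string s)
        : text(std::move(s)), hash(std::hash<std::string>{}(text)) {}
};

// Differing hashes prove the strings differ, so most mismatches are rejected
// without touching the characters. Equal hashes may be a collision (false
// positive), so a match must still be confirmed with a full comparison.
bool sameLine(const HashedLine& a, const HashedLine& b)
{
    if (a.hash != b.hash) return false;  // fast path: definitely different
    return a.text == b.text;             // confirm: rule out a hash collision
}
```

This only pays off when each line is compared against several others, because every HashedLine still pays for one full hash up front; for a single comparison of two fresh strings, operator== alone does less work.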

License: CC-BY-SA with attribution

Not affiliated with StackOverflow