Question

I am trying to improve my understanding of C++, pointer arithmetic especially. I use atoi pretty often, but I have rarely given thought as to how it works. Looking up how it is done, I understand it mostly, but there is one thing that I am confused about.

Here is an example of a solution I have found online:

int atoi( char* pStr ) 
{
  int iRetVal = 0; 

  if ( pStr )
  {
    while ( *pStr && *pStr <= '9' && *pStr >= '0' ) 
    {
      iRetVal = (iRetVal * 10) + (*pStr - '0');
      pStr++;
    }
  } 
  return iRetVal; 
} 

I think the main reason I have had a hard time grasping how atoi has been done in the past is the way characters are compared. The "while" statement is saying: while the character exists, and the character is less-than-or-equal-to 9, and it is greater-than-or-equal-to 0, then do stuff. This statement says two things to me:

  1. Characters can be compared to other characters logically (but what is the returned value?).

Before I looked into this I suppose I knew it subconsciously but I never actually thought about it, but a '5' character is "smaller" than a '6' character in the same way that 5 is less than 6, so you can compare the characters as integers, essentially (for this intent).

  2. Somehow while (*sPtr) and *sPtr != 0 are different. This seems obvious to me, but I find that I cannot put it into words, which means I know this is true but I do not understand why.

Edit: I have no idea what the *pStr - '0' part would do.

Any help making sense of these observations would be very... helpful! Thanks!


Solution 3

A character in C is stored as a small integer, which on almost every system is its ASCII code. The digits are consecutive in ASCII ('0' is 0x30 and '9' is 0x39, with all the other digits in between), and the C and C++ standards guarantee that the digit characters are consecutive in any execution character set. So you can determine whether a character is a digit with a simple range check, and you can get the digit's value by subtracting '0'.
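A minimal sketch of the range check and the subtraction described above. The helper names is_digit_char and digit_value are made up for this illustration; they are not part of any library.

```cpp
#include <cassert>

// Range check: relies on '0'..'9' having consecutive character codes.
constexpr bool is_digit_char(char c) {
    return c >= '0' && c <= '9';
}

// Numeric value of a digit character.
// Only meaningful when is_digit_char(c) is true.
constexpr int digit_value(char c) {
    return c - '0';
}

// Checked at compile time.
static_assert('0' + 9 == '9', "digit characters are contiguous");
static_assert(digit_value('7') == 7, "'7' - '0' is 7");
```

Calling is_digit_char('a') gives false, and digit_value('0') gives 0.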

Other tips

while the character exists

No, not really. It says "while the character is not 0 (or '\0')". Basically, the ASCII character '\0' indicates the end of a "C" string. Since you don't want to go past the end of a character array (and the exact length is not known), every character is tested for '\0'.

Characters can be compared to other characters logically

That's right. A character is nothing but a number, at least in ASCII encoding. In ASCII, for instance, '0' corresponds to a decimal value of 48, '1' is 49, and 'Z' is 90 (any ASCII table lists them all). So yeah, you can compare characters just like you compare integers.

Somehow while (*sPtr) and *sPtr != 0 are different.

Not different at all. A decimal 0 is a special ASCII symbol (nul) that is used to indicate the end of "C" string, as I mentioned in the beginning. You cannot see or print (nul), but it's there.

The *pStr - '0' converts the character to its numeric value: '1' - '0' = 1. The while loop checks that we are not at the end of the string and that we have a valid digit.

Note that the posted implementation of atoi is not complete. The real atoi also skips leading whitespace and accepts an optional '+' or '-' sign, so it can process negative values.
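For comparison, here is a rough sketch of a fuller version that skips leading whitespace and handles an optional sign. The name simple_atoi is hypothetical, returning 0 for a null pointer is this sketch's own choice, and overflow is deliberately not handled.

```cpp
#include <cassert>
#include <cctype>

// Sketch only: no overflow handling.
int simple_atoi(const char* s) {
    if (!s) return 0;  // this sketch's choice for a null pointer

    // Skip leading whitespace. The cast avoids passing a negative
    // value to std::isspace.
    while (std::isspace(static_cast<unsigned char>(*s))) ++s;

    // Optional sign.
    bool negative = false;
    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        ++s;
    }

    // Same digit loop as the posted code. The NUL terminator fails
    // the range check, so no separate *s test is needed.
    int value = 0;
    while (*s >= '0' && *s <= '9') {
        value = value * 10 + (*s - '0');
        ++s;
    }
    return negative ? -value : value;
}
```

With this sketch, simple_atoi("  -17abc") gives -17 and simple_atoi("abc") gives 0.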

Somehow while (*sPtr) and *sPtr != 0 are different.

These two expressions are the same when used as a condition: *sPtr is considered true when the value stored at address sPtr is not zero, and *sPtr != 0 is true when the value stored at address sPtr is not zero. The difference only shows when they are used somewhere else: the second expression evaluates to true or false, while the first one evaluates to the stored value.
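A small sketch to make the equivalence concrete: two loops that differ only in how the condition is spelled. The function names length_implicit and length_explicit are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Condition written as a bare dereference.
std::size_t length_implicit(const char* p) {
    std::size_t n = 0;
    while (*p) {        // true while the stored char is non-zero
        ++n;
        ++p;
    }
    return n;
}

// Condition written as an explicit comparison.
std::size_t length_explicit(const char* p) {
    std::size_t n = 0;
    while (*p != 0) {   // same test, spelled out
        ++n;
        ++p;
    }
    return n;
}
```

Both functions return 5 for "hello". Outside a condition the two expressions do differ: for a string starting with '7', *p is the char '7', while *p != 0 is the bool true.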

C-style strings are null-terminated.

Therefore:

while ( *pStr && *pStr <= '9' && *pStr >= '0' ) 

This tests:

  • *pStr: that we have not yet reached the end of the string. This is equivalent to writing *pStr != 0 (note: no single quotes, so this is ASCII value 0, or NUL).
  • *pStr >= '0' && *pStr <= '9' (perhaps the more logical order): that the character at pStr is in the range '0' (ASCII value 48) to '9' (ASCII value 57), that is, a digit.

The representation of '0' in memory is 0x30 and the representation of '9' is 0x39. This is what the computer sees, and when it compares them with relational operators, it uses these values. The nul-termination character is represented as 0x00 (aka zero). The key here is that chars are just small integers to the machine.

Therefore, the while statement is saying:

While the char we are examining is valid (aka NOT zero and therefore NOT a nul-terminator), and its value (as the machine sees it) is less than or equal to 0x39 and greater than or equal to 0x30, proceed.

The body of the while loop then calculates the appropriate value to add to the accumulator based on the integer's position in the string. It then increments the pointer and goes again. Once it's done, it returns the accumulated value.

This chunk of code uses ASCII values to accumulate the integer value of its text equivalent.

In regards to your first numbered bullet, the result of any comparison is a boolean (bool in C++, an int that is 0 or 1 in C). Although I feel like you were trying to ask if the compiler actually understands "characters". To my understanding, this comparison is done using the ASCII values of the characters, i.e. 'a' < 'b' is interpreted as (97 < 98). (Note that it is also easy to see that ASCII values are used when you compare 'a' and 'A', as 'A' is less than 'a'.)
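The comparisons mentioned here can be checked at compile time. This is a small sketch that assumes an ASCII-based system; code_of is a made-up helper that just exposes the integer code of a character.

```cpp
#include <cassert>

// The integer code stored for a character.
constexpr int code_of(char c) {
    return static_cast<int>(c);
}

// These hold on ASCII-based systems (assumed here).
static_assert(code_of('a') == 97 && code_of('b') == 98, "ASCII codes");
static_assert('a' < 'b', "compares 97 < 98");
static_assert('A' < 'a', "compares 65 < 97");
```

In C++ each of these comparisons has type bool, so '5' < '6' is simply true.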

Concerning your second bullet, the while loop is checking that the current character is not NUL (the character with ASCII value 0, which is distinct from the NULL pointer). The && operator short-circuits: it yields false as soon as one operand is false, so the range comparisons are skipped for the NUL character. As for the rest of the while loop, it is doing ASCII comparison as I mentioned about bullet 1. It is just checking whether or not the given character has an ASCII value that corresponds to a digit, i.e. between '0' and '9' (or in ASCII: between 48 and 57).

LASTLY the (*pStr - '0') is the most interesting part in my opinion. This expression yields an integer between 0 and 9 inclusive. If you take a look at an ASCII chart you will notice the characters '0' through '9' are beside each other. So imagine '3' - '0', which is 51 - 48 and produces 3! :D So in simpler terms, it is doing ASCII subtraction and yielding the corresponding integer value. :D

Cheers, and I hope this explains a bit

Let's break it down:

if ( pStr )

If you pass atoi a null pointer, pStr will be 0x00 - and this will be false. Otherwise, we have something to parse.

while ( *pStr && *pStr <= '9' && *pStr >= '0' )

Ok, there's a bunch of things going on here. *pStr means we check whether the value pStr is pointing to is 0x00 or not. If you look at an ASCII table, code 0x00 is the NUL character, and in C/C++ the convention is that strings are null terminated (as opposed to Pascal and Java style strings, which tell you their length then have that many characters). So, when *pStr evaluates to false, our string has come to an end and we should stop.

*pStr <= '9' && *pStr >= '0' works because the values for the ASCII characters '0' '1' '2' '3' '4' '5' '6' '7' '8' '9' are all contiguous - '0' is 0x30 and '9' is 0x39, for example. So, if pStr's pointed to value is outside this range, then we're not parsing an integer and we should stop.

iRetVal = (iRetVal * 10) + (*pStr - '0');

Because the ASCII numerals have contiguous character codes, if we know we have a numeral, *pStr - '0' evaluates to its numerical value: 0 for '0' (0x30 - 0x30), 1 for '1' (0x31 - 0x30)... 9 for '9'. Multiplying iRetVal by 10 shifts the digits parsed so far up one decimal place, and adding the new value slides the new digit into the ones place.
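To watch the accumulation happen, here is a sketch that records the running value after each digit. The function accumulate_steps is hypothetical and exists only to expose the intermediate values.

```cpp
#include <cassert>
#include <vector>

// Returns the accumulator value after each digit is consumed.
std::vector<int> accumulate_steps(const char* s) {
    std::vector<int> steps;
    int value = 0;
    while (*s >= '0' && *s <= '9') {
        value = value * 10 + (*s - '0');  // shift up, add new digit
        steps.push_back(value);
        ++s;
    }
    return steps;
}
```

For the input "123" the recorded values are 1, 12, 123: 0 * 10 + 1 = 1, then 1 * 10 + 2 = 12, then 12 * 10 + 3 = 123.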

pStr++;

By adding one to the pointer, the pointer points to the next address in memory - the next character in the string we are converting to an integer.

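Note that this function will give wrong results if the string is not null terminated (it reads past the end of the buffer), if it has any non-numerals (such as '-', at which point parsing simply stops), or if the character set does not keep the digits contiguous the way ASCII does. It's not magic, it just relies on these things being true.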
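A quick way to see those limits is to run the posted function on a few awkward inputs. The sketch below copies it under the made-up name digits_only_atoi (to avoid clashing with the standard atoi) and takes a const char* so string literals can be passed.

```cpp
#include <cassert>

// The posted function, renamed; logic unchanged.
int digits_only_atoi(const char* pStr) {
    int iRetVal = 0;
    if (pStr) {
        while (*pStr && *pStr <= '9' && *pStr >= '0') {
            iRetVal = (iRetVal * 10) + (*pStr - '0');
            pStr++;
        }
    }
    return iRetVal;
}
```

With this copy, "123" gives 123, but "-5" gives 0 (the loop stops at '-' before reading any digit), "12ab" gives 12, and " 7" gives 0 because the leading space is not skipped.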

License: CC-BY-SA with attribution
Not affiliated with StackOverflow