gettoken function - unclear procedure (K&R)
26-06-2021
Question
I have several questions, so I've put numbers in comments to make the lines in question easier to find.
[1] How can char *p be assigned the token variable, which doesn't seem to exist?
[2] Why don't we append '\0' here, as is done in every other branch?
[3] Why do we copy only () into the token string? Why don't we do the same for [] and alphanumeric characters?
[4] These return statements look odd to me. First, why isn't it simply return PARENS? Second, when it returns tokentype = '(', that is a char, so why is gettoken declared as returning int?
[5] A walkthrough, marked SUPPOSING in the code: let the input be ( a b c ). Then ( makes the function return tokentype '('; a, b and c each enter the isalpha(c) branch; and the final ) ends that branch, causing ungetch. Does it then go through the final else branch? Is my walkthrough correct?
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXTOKEN 100
enum { NAME, PARENS, BRACKETS };
void dcl(void);
void dirdcl(void);
int gettoken(void);
int tokentype; /*type of last token ALSO [4] !!! */
char token[MAXTOKEN]; /*last token string */
char name[MAXTOKEN]; /*identifier name */
char datatype[MAXTOKEN]; /*data type = char, int, etc. */
char out[1000];
main() /* convert declaration to words */
{
    while (gettoken() != EOF) { /* 1st token on line */
        strcpy(datatype, token); /* is the datatype */
        out[0] = '\0';
        dcl(); /* parse rest of line */
        if (tokentype != '\n')
            printf("syntax error\n");
        printf("%s: %s %s\n", name, out, datatype);
    }
    return 0;
}
int gettoken(void) /* return next token */
{
    int c, getch(void);
    void ungetch(int);
    char *p = token; /* [1] */

    while ((c = getch()) == ' ' || c == '\t')
        ;
    if (c == '(') {
        if ((c = getch()) == ')') {
            strcpy(token, "()"); /* [2][3] */
            return tokentype = PARENS; /* [4] */
        } else {
            ungetch(c);
            return tokentype = '(';
        }
    } else if (c == '[') {
        for (*p++ = c; (*p++ = getch()) != ']'; )
            ;
        *p = '\0';
        return tokentype = BRACKETS;
    } else if (isalpha(c)) {
        for (*p++ = c; isalnum(c = getch()); ) /* SUPPOSING [5] */
            *p++ = c;
        *p = '\0';
        ungetch(c);
        return tokentype = NAME;
    } else
        return tokentype = c;
}
/* dcl: parse a declarator */
void dcl(void)
{
    int ns;

    for (ns = 0; gettoken() == '*'; ) /* count *'s */
        ns++;
    dirdcl();
    while (ns-- > 0)
        strcat(out, " pointer to");
}
/* dirdcl: parse a direct declarator */
void dirdcl(void)
{
    int type;

    if (tokentype == '(') {
        dcl();
        if (tokentype != ')')
            printf("error: missing )\n");
    } else if (tokentype == NAME) /* variable name */
        strcpy(name, token);
    else
        printf("error: expected name or (dcl)\n");
    while ((type=gettoken()) == PARENS || type == BRACKETS)
        if (type == PARENS)
            strcat(out, " function returning");
        else {
            strcat(out, " array");
            strcat(out, token);
            strcat(out, " of");
        }
}
THANKS IN ADVANCE!
Solution
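1) token does exist: it is a global variable, defined as char token[MAXTOKEN];. In char *p = token; the array name converts to a pointer to its first element, so p starts out pointing at token[0].
2) strcpy() copies the terminating 0 byte from the source, so we don't need to add it manually.
3) That appears to be special-case handling for the literal string (), that is, parentheses with nothing in between, as opposed to the case where we have ( some stuff ). The [] and name branches fill token as well, but character by character through p rather than with strcpy().
4) In accordance with (3), PARENS looks like the token type for an empty pair of parentheses, while ( and ) are returned separately, each as its own tokentype, when there is something between them. return tokentype = PARENS; stores the value in the global tokentype and returns that same value in one statement; main() and dirdcl() read tokentype later. As for the return type: a character constant such as '(' already has type int in C, and the function also has to be able to return EOF and the enum values, so int is the natural choice.
5) Not sure I follow what you're asking, but since there doesn't appear to be a special case for the closing parenthesis, it appears to take the final else branch, returning tokentype = c.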
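
One way to check the walkthrough in [5] and the points above is to run gettoken() against a fixed string and record what it returns. The following is only a sketch: gettoken() is copied from the question, but getch() and ungetch() here are simplified stand-ins that read from a string (K&R's versions read from stdin through a pushback buffer), and set_input() and trace_tokens() are hypothetical helpers added purely for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAXTOKEN 100

enum { NAME, PARENS, BRACKETS };

static int tokentype;          /* type of last token */
static char token[MAXTOKEN];   /* last token string */

/* Assumption: input comes from a string instead of stdin. */
static const char *input = "";
static size_t pos;

static void set_input(const char *s) { input = s; pos = 0; }

/* Stand-in for K&R's getch: next character, or EOF at end of string. */
static int getch(void)
{
    return input[pos] != '\0' ? (unsigned char)input[pos++] : EOF;
}

/* Stand-in for K&R's ungetch: step back over the character just read. */
static void ungetch(int c)
{
    if (c != EOF && pos > 0)
        pos--;
}

/* gettoken as in the question, unchanged apart from layout. */
static int gettoken(void)
{
    int c;
    char *p = token;   /* p points at token[0] */

    while ((c = getch()) == ' ' || c == '\t')
        ;
    if (c == '(') {
        if ((c = getch()) == ')') {
            strcpy(token, "()");         /* strcpy also copies the '\0' */
            return tokentype = PARENS;   /* store and return in one step */
        } else {
            ungetch(c);
            return tokentype = '(';
        }
    } else if (c == '[') {
        for (*p++ = c; (*p++ = getch()) != ']'; )
            ;
        *p = '\0';
        return tokentype = BRACKETS;
    } else if (isalpha(c)) {
        for (*p++ = c; isalnum(c = getch()); )
            *p++ = c;
        *p = '\0';
        ungetch(c);
        return tokentype = NAME;
    } else
        return tokentype = c;
}

/* Hypothetical helper: tokenize s and write one character per token
   into trace: N = NAME, P = PARENS, B = BRACKETS, otherwise the
   returned character itself. */
static void trace_tokens(const char *s, char *trace, size_t size)
{
    size_t n = 0;
    int t;

    assert(size > 0);
    set_input(s);
    while ((t = gettoken()) != EOF && n + 1 < size)
        trace[n++] = t == NAME ? 'N' : t == PARENS ? 'P'
                   : t == BRACKETS ? 'B' : (char)t;
    trace[n] = '\0';
}
```

Calling trace_tokens("( a b c )", buf, sizeof buf) leaves "(NNN)" in buf: the opening ( is followed by a space rather than ), so it is returned as '('; each name ends at the space after it, which is pushed back with ungetch; and the closing ) falls through to the final else. For "f()" the trace is "NP", and token holds "()" afterwards.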