Question

I am trying to make this shell parse its input. How do I implement the parsing so that text in quotes is delimited by the opening and closing quotes and treated as one token? I think I need to put some sort of if statement in the second while loop, where I print out the tokens, but I am not sure. Any feedback or suggestions are greatly appreciated.

#include <stdio.h>               //printf
#include <unistd.h>              //isatty
#include <string.h>              //strlen,sizeof,strtok

int main(int argc, char **argv[]){

    int MaxLength = 1024;         //size of buffer
    int inloop = 1;               //loop runs forever while 1
    char buffer[MaxLength];       //buffer
    bzero(buffer,sizeof(buffer)); //zeros out the buffer
    char *command;                //character pointer of strings
    char *token;                  //tokens
    const char s[] = "-,+,|, ";

    /* part 1 isatty */
    if (isatty(0))
    {

        while(inloop ==1)                // check if the standard input is from terminal
        {
            printf("$");
            command = fgets(buffer,sizeof(buffer),stdin);  //fgets(string of char pointer,size of,input from where
            token =  strtok(command,s);

            while (token !=NULL){

                printf( " %s\n",token);

                token = strtok(NULL, s);       //checks for elements       
            }


            if(strcmp(command,"exit\n")==0)
                inloop =0;

        }      

    }
    else 
        printf("the standard input is NOT from a terminal\n");

    return 0;
}

Solution

For an arbitrary command-line syntax, strtok is not the best function. It works for simple cases, where the words are delimited by special characters or white space, but there will come a time when you want to split something like ls>out into three tokens (ls, > and out). strtok can't handle this: it terminates each token by overwriting the delimiter that follows it with a zero byte, so the > itself is destroyed and never returned as a token.
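
To make the limitation concrete, here is a minimal sketch that splits a line with strtok and collects the token pointers. The helper name split_with_strtok is hypothetical and not part of the original code.

```c
#include <assert.h>
#include <string.h>

/* Splits `line` in place with strtok() and stores up to `max` token
 * pointers in `out`. Returns the number of tokens stored.
 * (split_with_strtok is a hypothetical helper name.) */
static int split_with_strtok(char *line, const char *delims,
                             char **out, int max)
{
    int n = 0;
    assert(line != NULL && delims != NULL && out != NULL);

    for (char *tok = strtok(line, delims);
         tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        out[n++] = tok;
    }
    return n;
}
```

Called on a writable copy of ls>out with ">" as the delimiter string, this should yield only the two tokens ls and out; the byte that held > has been replaced by a terminating zero.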

Here's a quick and dirty custom command-line parser:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>

int error(const char *msg)
{
    printf("Error: %s\n", msg);
    return -1;
}

int token(const char *begin, const char *end)
{
    printf("'%.*s'\n", (int)(end - begin), begin);
    return 1;
}

int parse(const char *cmd)
{
    const char *p = cmd;
    int count = 0;

    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0') break;

        if (*p == '"' || *p == '\'') {
            int quote = *p++;
            const char *begin = p;

            while (*p && *p != quote) p++;
            if (*p == '\0') return error("Unmatched quote");
            count += token(begin, p);
            p++;
            continue;
        }

        if (strchr("<>()|", *p)) {
            count += token(p, p + 1);
            p++;
            continue;
        }

        if (isalnum((unsigned char)*p)) {
            const char *begin = p;

            while (isalnum((unsigned char)*p)) p++;
            count += token(begin, p);
            continue;
        }

        return error("Illegal character");
    }

    return count;
}

This code understands words separated by white space, strings enclosed in single or double quotation marks, and single-character operators. It doesn't understand escaped quotation marks inside quotes, and it rejects non-alphanumeric characters in words, such as the dot in foo.h, as illegal.

The code is not hard to understand and you can extend it easily to understand double-char operators such as >> or comments.
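
One possible way to add a two-character operator such as >> is to ask a small helper how long the operator at the current position is. This is only a sketch; operator_length is a hypothetical name, and the operator set is the one used in parse plus >>.

```c
#include <assert.h>
#include <string.h>

/* Returns the length of the operator starting at p: 2 for ">>",
 * 1 for one of "<>()|", and 0 if p does not start an operator.
 * (operator_length is a hypothetical helper, not in the original.) */
static int operator_length(const char *p)
{
    assert(p != NULL);
    if (*p == '\0') return 0;   /* strchr would also match the terminator */
    if (p[0] == '>' && p[1] == '>') return 2;
    if (strchr("<>()|", *p)) return 1;
    return 0;
}

/* In parse(), the single-character operator branch could then become:
 *
 *     int len = operator_length(p);
 *     if (len > 0) {
 *         count += token(p, p + len);
 *         p += len;
 *         continue;
 *     }
 */
```

Checking the longer operator first matters: otherwise >> would be reported as two separate > tokens.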

If you want to escape quotation marks, you'll have to recognise the escape character in parse and unescape it, and possibly other escape sequences, in token.
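
As a rough sketch of that idea, assuming the backslash is chosen as the escape character: one helper finds the closing quote while skipping escaped characters, and another copies the token without the escaping backslashes. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns a pointer to the closing quote, or to the terminating '\0'
 * if the quote is unmatched. A backslash hides the character after it. */
static const char *find_closing_quote(const char *p, int quote)
{
    assert(p != NULL);
    while (*p && *p != quote) {
        if (*p == '\\' && p[1] != '\0') p++;   /* skip the escaped char */
        p++;
    }
    return p;
}

/* Copies [begin, end) to out, dropping each escaping backslash.
 * Writes at most outsize - 1 characters plus '\0'; returns the length. */
static size_t unescape(const char *begin, const char *end,
                       char *out, size_t outsize)
{
    size_t n = 0;
    assert(begin != NULL && end != NULL && out != NULL && outsize > 0);
    while (begin < end && n + 1 < outsize) {
        if (*begin == '\\' && begin + 1 < end) begin++;
        out[n++] = *begin++;
    }
    out[n] = '\0';
    return n;
}
```

In parse, find_closing_quote would replace the inner while loop of the quote branch, and token would call unescape before printing. Escape sequences such as \n would need extra handling that this sketch leaves out.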

OTHER TIPS

First, you've declared argv to be an array of pointers to... pointers. In fact, it is an array of pointers to chars. So:

int main(int argc, char **argv){

You seem inclined to reach for [], which led to the incorrect declaration here; the more common idiom in C/C++ is pointer syntax, e.g.:

const char* s = "-+| ";

FWIW. Also, note that fgets() returns NULL when it hits end of file (e.g., the user types CTRL-D on *nix or CTRL-Z on DOS/Windows). Your code then passes that NULL on to strtok() and strcmp(), and you probably don't want a segmentation fault when that happens.
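
A minimal sketch of guarding against that NULL, using a hypothetical read_command helper that also strips the trailing newline:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line into buf and strips the trailing newline.
 * Returns 1 on success, 0 on end of file or read error.
 * (read_command is a hypothetical helper name.) */
static int read_command(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, in) == NULL)
        return 0;                      /* EOF (e.g. CTRL-D) or error */
    buf[strcspn(buf, "\n")] = '\0';    /* drop the newline, if any */
    return 1;
}
```

The shell loop could then be written as while (read_command(stdin, buffer, sizeof(buffer))), and the exit check would compare against "exit" rather than "exit\n".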

Also, bzero() is a nonportable function (you probably don't care in this context) and the C compiler will happily initialize an array to zeroes for you if you ask it to (possibly worth caring about; syntax demonstrated below).

Next, as soon as you allow quoted strings, the next language question that immediately arises is: "how do I quote a quote?". Then, you are immediately out of the territory that can be handled cleanly with strtok(). I'm not 100% sure how you want to break your string into tokens. Using strtok() in the way you do, I think the string "a|b" would produce two tokens, "a" and "b", making you overlook the "|". You're treating "|" and "-" and "+" like whitespace, to be ignored, which is not generally what a shell does. For example, given this command-line:

echo 'This isn''t so hard' | cp -n foo.h .. >foo.out

I would probably want to get the following list of tokens:

echo
'This isn''t so hard'
|
cp
-n
foo.h
..
>
foo.out

Usually, characters like '+' and '-' are not special for most shells' tokenizing process (unlike '|' and '&' and '<', etc. which are instructions to the shell that the spawned command never sees). They get passed onto the application that is then free to decide "'-' indicates this word is an option and not a filename" or whatever.

What follows is a version of your code that produces the output I described (which may or may not be exactly what you want) and allows either double- or single-quoted arguments (trivial to extend to handle back-ticks too) that can contain quote marks of the same kind by doubling them, as in 'isn''t'.

#include <stdio.h>               //printf
#include <unistd.h>              //isatty
#include <string.h>              //strchr,strcmp

#define MAXLENGTH 1024

int main(int argc, char **argv){

    int inloop = 1;               //loop runs forever while 1
    char buffer[MAXLENGTH] = {'\0'};       //compiler inits entire array to NUL bytes
//    bzero(buffer,sizeof(buffer)); //zeros out the buffer
    char *command;                //character pointer of strings
    char *token;                  //tokens
    char* rover;
    const char* StopChars = "|&<> ";
    size_t toklen;

    /* part 1 isatty */
    if (isatty(0))
    {

        while(inloop ==1)                // check if the standard input is from terminal
        {
            printf("$");
            token = command = fgets(buffer,sizeof(buffer),stdin);  //fgets(string of char pointer,size of,input from where
            if(command)
                while(*token)
                    {
                    // skip leading whitespace
                    while(*token == ' ')
                        ++token;
                    rover   = token;
                    // if possible quoted string
                    if(*rover == '\'' || *rover == '\"')
                        {
                        char Quote = *rover++;
                        while(*rover)
                            if(*rover != Quote)
                                ++rover;
                            else if(rover[1] == Quote)
                                rover += 2;
                            else
                                {
                                ++rover;
                                break;
                                }
                        }
                    // else if special-meaning character token
                    else if(*rover && strchr(StopChars, *rover))
                        ++rover;
                    // else generic token
                    else
                        while(*rover)
                            if(strchr(StopChars, *rover))
                                break;
                            else
                                ++rover;
                    toklen  = (size_t)(rover-token);
                    if(toklen)
                        printf(" %*.*s\n", (int)toklen, (int)toklen, token);
                    token   = rover;
                    }
            if(!command || strcmp(command,"exit\n")==0)
                inloop =0;
        }      

    }
    else 
        printf("the standard input is NOT from a terminal\n");

    return 0;
}

Regarding your specific request: commands that are in quotes will be parsed based on the starting and ending quotes.

You can use strtok() by tokenizing on the " character. Here's how:

char a[]={"\"this is a set\" this is not"};  
char *buf;
buf = strtok(a, "\"");  

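In that code snippet, buf points to the string this is a set (without the quotation marks), because strtok() skips the leading " and overwrites the closing one with a terminating zero.

Note the use of \ to escape the " character inside a C string literal, both in the input string and in the delimiter string.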
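
This approach only returns the quoted text as the first token when the line starts with a quote. A hedged sketch of one way to cope with text before the first quote follows; first_quoted is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

/* Returns the first double-quoted string in `line`, or NULL if the
 * line contains no quote. Modifies `line` in place via strtok().
 * (first_quoted is a hypothetical helper name.) */
static char *first_quoted(char *line)
{
    int starts_quoted;
    char *tok;

    assert(line != NULL);
    starts_quoted = (line[0] == '"');
    tok = strtok(line, "\"");
    if (tok != NULL && !starts_quoted)
        tok = strtok(NULL, "\"");   /* skip the text before the quote */
    return tok;
}
```

The sketch inherits the limits of strtok: it cannot detect an unmatched quote, and it cannot return an empty quoted string, because strtok never produces empty tokens.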

Also, this is not your main issue, but you need to change this:

const char s[] = "-,+,|, ";  //strtok will parse on -,+| and a " " (space)

To:

const char s[] = "-+| ";  //strtok will parse on only -+| and a " " (space)

strtok() treats every character in the delimiter string as a delimiter, including the "," characters, so the original string also splits on commas.

Licensed under: CC-BY-SA with attribution