Question

What would be an efficient way of converting a delimited string into an array of strings in C (not C++)? For example, I might have:

char *input = "valgrind --leak-check=yes --track-origins=yes ./a.out";

The source string will always have only a single space as the delimiter. And I would like a malloc'ed array of malloc'ed strings char *myarray[] such that:

myarray[0]=="valgrind"
myarray[1]=="--leak-check=yes"
...

Edit: I have to assume that there are an arbitrary number of tokens in the input string, so I can't just limit it to 10 or something.

I've attempted a messy solution with strtok and a linked list I've implemented, but valgrind complained so much that I gave up.

(If you're wondering, this is for a basic Unix shell I'm trying to write.)

Solution

What about something like:

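#define MAX_ARGS 64   /* assumed limit; pick one that suits your shell */

/* strtok modifies its argument, so use a writable array, not a string literal */
char string[] = "valgrind --leak-check=yes --track-origins=yes ./a.out";
/* calloc zero-fills, so the array stays NULL-terminated */
char** args = calloc(MAX_ARGS, sizeof(char*));

char* curToken = strtok(string, " \t");

/* stop one short of MAX_ARGS to keep the trailing NULL */
for (int i = 0; curToken != NULL && i < MAX_ARGS - 1; ++i)
{
  args[i] = strdup(curToken);
  curToken = strtok(NULL, " \t");
}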
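
Since the question asks for an arbitrary number of tokens, the fixed MAX_ARGS limit above can be replaced by an array that grows on demand. The following is only a sketch under stated assumptions: split_args, free_args and copy_token are hypothetical names that do not appear in the original answer, and the array is doubled whenever it fills up.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a string into freshly malloc'ed memory
   (same job as strdup, which is POSIX rather than ISO C before C23). */
static char *copy_token(const char *tok)
{
    size_t n = strlen(tok) + 1;          /* +1 for the terminating '\0' */
    char *p = malloc(n);
    if (p) memcpy(p, tok, n);
    return p;
}

/* Hypothetical helper: free a NULL-terminated array made by split_args. */
void free_args(char **args)
{
    if (!args) return;
    for (size_t i = 0; args[i] != NULL; ++i) free(args[i]);
    free(args);
}

/* Split input on spaces and tabs.
   Returns a NULL-terminated malloc'ed array, or NULL on allocation failure. */
char **split_args(const char *input, size_t *count)
{
    size_t cap = 4, n = 0;
    char *work = copy_token(input);      /* strtok needs a writable copy */
    char **args = malloc(cap * sizeof *args);
    if (!work || !args) { free(work); free(args); return NULL; }
    args[0] = NULL;                      /* keep the array terminated at all times */

    for (char *tok = strtok(work, " \t"); tok; tok = strtok(NULL, " \t")) {
        if (n + 1 >= cap) {              /* reserve one slot for the NULL terminator */
            char **tmp = realloc(args, 2 * cap * sizeof *args);
            if (!tmp) goto fail;
            args = tmp;
            cap *= 2;
        }
        args[n] = copy_token(tok);
        if (!args[n]) goto fail;         /* args[n] is NULL here, so the array is still terminated */
        args[++n] = NULL;
    }
    free(work);
    if (count) *count = n;
    return args;

fail:
    free(work);
    free_args(args);
    return NULL;
}
```

A caller would write something like char **args = split_args(input, &n); and release everything with free_args(args) afterwards. Because the array is NULL-terminated, it could also be handed to the exec family of functions in a shell.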

OTHER TIPS

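If you have all of the input in input to begin with, then you can never have more tokens than strlen(input)+1. If you don't allow "" as a token, then you can never have more than (strlen(input)+1)/2 tokens, since every token except the last is followed by a delimiter. So unless input is huge, you can safely write:

/* get_token_copy() and skip_token() are helpers you would write yourself */
size_t MaxTokens = (strlen(input) + 1) / 2;
char ** myarray = malloc( MaxTokens * sizeof(char*) );

int NumActualTokens = 0;
char * pToken;
while ((pToken = get_token_copy(input)) != NULL)
{
   myarray[NumActualTokens++] = pToken;
   input = skip_token(input);
}

/* shrink to fit; keep the old pointer if realloc fails */
if (NumActualTokens > 0)
{
   char ** shrunk = realloc(myarray, NumActualTokens * sizeof(char*));
   if (shrunk != NULL) myarray = shrunk;
}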
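
The answer leaves get_token_copy and skip_token undefined. One possible way to write them, assuming a single space as the only delimiter (as the question states), might look like this. The bodies below are a guess at what the author had in mind, not the author's own code.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'ed copy of the token at the start of input,
   or NULL if there is no token left (or malloc fails). */
char *get_token_copy(const char *input)
{
    size_t len = strcspn(input, " ");    /* length up to the next space or '\0' */
    if (len == 0) return NULL;
    char *copy = malloc(len + 1);        /* +1 for the terminating '\0' */
    if (!copy) return NULL;
    memcpy(copy, input, len);
    copy[len] = '\0';
    return copy;
}

/* Return a pointer just past the current token and the single space after it.
   The input itself is not modified. */
char *skip_token(char *input)
{
    input += strcspn(input, " ");
    if (*input == ' ') ++input;
    return input;
}
```

Note that with these definitions the loop cannot tell "no more tokens" apart from "malloc failed", since both return NULL. A real shell would probably want a separate error path.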

As a further optimization, you can keep input around and just replace spaces with \0 and put pointers into the input buffer into myarray[]. No need for a separate malloc for each token unless for some reason you need to free them individually.
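
The in-place idea described above could be sketched as follows. The function name split_in_place is hypothetical, and the sketch assumes the caller owns a writable buffer (not a string literal) that outlives the returned array.

```c
#include <stdlib.h>
#include <string.h>

/* Split buf in place: the space after each token becomes '\0' and
   argv[i] points into buf. Returns a NULL-terminated malloc'ed array
   of pointers; only the array itself needs to be freed. */
char **split_in_place(char *buf, size_t *count)
{
    /* upper bound on the number of non-empty tokens */
    size_t max_tokens = (strlen(buf) + 1) / 2;
    char **argv = malloc((max_tokens + 1) * sizeof *argv);
    size_t n = 0;
    if (!argv) return NULL;

    char *p = buf;
    while (*p != '\0') {
        while (*p == ' ') ++p;           /* tolerate stray extra spaces */
        if (*p == '\0') break;
        argv[n++] = p;                   /* token starts here */
        p += strcspn(p, " ");            /* advance to the next space or the end */
        if (*p != '\0') *p++ = '\0';     /* terminate the token in place */
    }
    argv[n] = NULL;
    if (count) *count = n;
    return argv;
}
```

The trade-off is that the tokens share the lifetime of the buffer: freeing or overwriting the buffer invalidates every pointer in the array at once.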

Were you remembering to malloc an extra byte for the terminating null that marks the end of string?
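
To illustrate the point about the extra byte: a token of len characters needs len + 1 bytes of storage. A minimal sketch, with copy_n_chars as a hypothetical helper name and the assumption that src holds at least len characters:

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate the first len characters of src as a proper C string. */
char *copy_n_chars(const char *src, size_t len)
{
    /* malloc(len) would be one byte short: writing dst[len] below
       would then overflow the buffer, which valgrind reports. */
    char *dst = malloc(len + 1);
    if (!dst) return NULL;
    memcpy(dst, src, len);
    dst[len] = '\0';                     /* the byte the extra 1 pays for */
    return dst;
}
```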

From the strsep(3) manpage on OSX:

   char **ap, *argv[10], *inputstring;

   for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
           if (**ap != '\0')
                   if (++ap >= &argv[10])
                           break;

Edited for arbitrary # of tokens:

char **ap, **argv, *inputstring;

int arglen = 10;
argv = calloc(arglen, sizeof(char*));
for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
    if (**ap != '\0')
        if (++ap >= &argv[arglen])
        {
            arglen += 10;
            /* realloc takes a size in bytes, not an element count */
            argv = realloc(argv, arglen * sizeof(char*));
            ap = &argv[arglen-10];
        }

Or something close to that. The above may not work, but if not it's not far off. Building a linked list would be more efficient than continually calling realloc, but that's really beside the point. The point is how best to make use of strsep.

Looking at the other answers, I think they could look complex to a beginner in C because the code is so compact. So here is a version aimed at beginners: it may be easier to parse the string by hand instead of using strtok. Something like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char **parseInput(const char *str, int *nLen);
void resizeptr(char ***, int nLen);

int main(int argc, char **argv){
    int maxLen = 0;
    int i = 0;
    char **ptr = NULL;
    char *str = "valgrind --leak-check=yes --track-origins=yes ./a.out";
    ptr = parseInput(str, &maxLen);
    if (!ptr) printf("Error!\n");
    else{
        for (i = 0; i < maxLen; i++) printf("%s\n", ptr[i]);
    }
    for (i = 0; i < maxLen; i++) free(ptr[i]);
    free(ptr);
    return 0;
}

char **parseInput(const char *str, int *Index){
    char **pStr = NULL;
    const char *ptr = str;      /* current scan position */
    const char *start = str;    /* start of the current token */
    int indx = 0;
    *Index = 0;
    for (;; ptr++){
        /* a token ends at whitespace or at the terminating '\0' */
        if (*ptr == '\0' || isspace((unsigned char)*ptr)){
            size_t charPos = (size_t)(ptr - start);   /* token length */
            if (charPos > 0){
                resizeptr(&pStr, indx + 1);
                if (!pStr) return NULL;
                pStr[indx] = (char *)malloc(charPos + 1);   /* +1 for '\0' */
                if (!pStr[indx]){
                    /* undo everything allocated so far */
                    while (indx > 0) free(pStr[--indx]);
                    free(pStr);
                    return NULL;
                }
                strncpy(pStr[indx], start, charPos);
                pStr[indx][charPos] = '\0';
                indx++;
            }
            if (*ptr == '\0') break;
            start = ptr + 1;
        }
    }
    *Index = indx;
    return pStr;
}

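void resizeptr(char ***ptr, int nLen){
    /* realloc(NULL, n) behaves like malloc(n), so one call covers both cases;
       the size is in bytes, hence the sizeof(char *) factor */
    char **tmp = (char **)realloc(*ptr, (size_t)nLen * sizeof(char *));
    if (!tmp){
        perror("error!");
        /* release the nLen-1 strings stored so far, then the array itself */
        for (int i = 0; *ptr && i < nLen - 1; i++) free((*ptr)[i]);
        free(*ptr);
    }
    *ptr = tmp;   /* NULL on failure, so the caller can detect it */
}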

I slightly modified the code to make it easier to follow. The only string function I used was strncpy. Sure, it is a bit long-winded, but it grows the array of strings dynamically with realloc instead of using a hard-coded MAX_ARGS, which would reserve memory for the maximum number of pointers even when only 3 or 4 are needed. The parsing itself is simple: the code walks the string with a pointer and uses isspace to detect delimiters. When a space is encountered, it grows the pointer array by one slot and mallocs a buffer to hold the token.

Notice how the triple pointer is used in the resizeptr function: the char ** array is passed by reference so that the function can replace it. I thought this would serve as a good example of a simple C program covering pointers, realloc, malloc, passing by reference, and the basic elements of parsing a string.

Hope this helps, Best regards, Tom.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow