Question

I've been banging my head against the wall for too many hours now and I need your help. In my assignment I am supposed to write a function that splits a string into tokens separated by spaces. These tokens are copied to a dynamically allocated array of strings. The string is passed as one parameter, and the other parameter is a pointer to the variable that holds the string array (char ***argv). I'm having a hard time to understand how to handle this triple dimensional array and how to dynamically allocate it. Below is the relevant code:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{

char **args = NULL;
char cmdline[] = "cmdline -s 20 -r -t parameter -p 20 filename";
int count = parse_cmdline(&args, cmdline);

And this is what I've come up with:

 #include <stdlib.h>
 #include <string.h>

 /* Parses a string into tokens (command line parameters) separated by space
 * Builds a dynamically allocated array of strings. Pointer to the array is
 * stored in variable pointed by argv.
 * 
 * Parameters:
 * argv: pointer to the variable that will store the string array
 * input: the string to be parsed (the original string can be modified, if needed)
 * 
 * Returns:
 * number of space-separated tokens in the string */
int parse_cmdline(char ***argv, char *input)
{
    int i=0;
    char *token=strtok(input," ");
    while (token!=NULL) {
        *argv=realloc(*argv,(i+1)*sizeof(char*));
        *argv[i]=malloc(sizeof(token));
        memcpy(*argv[i],token,sizeof(token));
        i++;
        token=strtok(NULL," ");
    }
    return i;
}

Valgrind gives this output:

==358== Use of uninitialised value of size 8
==358==    at 0x40263B: parse_cmdline (cmdline.c:21)
==358==    by 0x40155E: test_parse_cmdline (test_source.c:19)
==358==    by 0x405670: srunner_run_all (in /tmc/test/test)
==358==    by 0x40221E: tmc_run_tests (tmc-check.c:121)
==358==    by 0x401ED7: main (test_source.c:133)
==358==  Uninitialised value was created by a stack allocation
==358==    at 0x401454: test_parse_cmdline (test_source.c:10)
==358== 
==358== Invalid write of size 8
==358==    at 0x40263B: parse_cmdline (cmdline.c:21)
==358==    by 0x40155E: test_parse_cmdline (test_source.c:19)
==358==    by 0x405670: srunner_run_all (in /tmc/test/test)
==358==    by 0x40221E: tmc_run_tests (tmc-check.c:121)
==358==    by 0x401ED7: main (test_source.c:133)
==358==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==358== 
==358== 
==358== Process terminating with default action of signal 11 (SIGSEGV)
==358==  Access not within mapped region at address 0x0
==358==    at 0x40263B: parse_cmdline (cmdline.c:21)
==358==    by 0x40155E: test_parse_cmdline (test_source.c:19)
==358==    by 0x405670: srunner_run_all (in /tmc/test/test)
==358==    by 0x40221E: tmc_run_tests (tmc-check.c:121)
==358==    by 0x401ED7: main (test_source.c:133)
==358==  If you believe this happened as a result of a stack
==358==  overflow in your program's main thread (unlikely but
==358==  possible), you can try to increase the size of the
==358==  main thread stack using the --main-stacksize= flag.
==358==  The main thread stack size used in this run was 8388608.

I've read over and over about pointers, strings, arrays and multidimensional arrays, but I just can't seem to grasp it. One thing i really don't understand is that why is the pointer passed as (&args), why not just pass it as a pointer of array? I'm not sure I'm using memcpy right either.

Solution

Besides the incorrect use of sizeof instead of strlen or the like, let's first look at a couple of the other issues.


Why &args?


One thing i really don't understand is that why is the pointer passed as (&args), why not just pass it as a pointer of array?

Look at your statement:

char **args = NULL;

What does this tell you?

A.) char **args

1. char   args  =>  args is char.
2. char  *args  =>  args is pointer to char.
3. char **args  =>  args is pointer to pointer to char.

B.) char **args = NULL;

It is initialized, and its value is NULL, so it does not point anywhere yet. In your function you change what it points to: from NULL to a new address returned by realloc.

And here is a very important point: you update what it points to! If you do not pass the address of args, you have no way of telling main where the new location is.

A function always works on a copy of what is passed to it.

If you passed args itself and it already pointed to some address, e.g. from a previous malloc, you could update the data it points to, but you could not realloc, as you would have no way to update the pointer itself. That is: back in main it would still point to the old location.
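
To make the copy semantics concrete, here is a minimal sketch. The names set_by_value and set_by_address are invented for this illustration and are not part of your assignment. The first receives a copy of the pointer, the second receives the address of the caller's pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: receives a COPY of the caller's pointer.
 * Assigning to p changes only the local copy. */
void set_by_value(char **p)
{
    static char *dummy[1];   /* some valid non-NULL address to point at */
    p = dummy;               /* lost when the function returns */
    (void)p;
}

/* Hypothetical helper: receives the ADDRESS of the caller's pointer,
 * so *p is the caller's variable itself. */
void set_by_address(char ***p)
{
    static char *dummy[1];
    assert(p != NULL);
    *p = dummy;              /* visible to the caller */
}
```

After set_by_value(args) a caller's args is still NULL. After set_by_address(&args) it is not.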

Your goal is something like:

  args          *            *
0x123e0a -> [0] 0x123fa00 -> 0x12fbae0 - 0x12fbae8 cmdline
            [1] 0x123fa08 -> 0x12fbae9 - 0x12fbaec -s
            [2] 0x123fa10 -> 0x12fcae0 - 0x12fcae3 20
            [3] 0x123fa18 -> 0x12fbad8 - 0x12fbadb -r
            ...

A simpler variant

If we look at a simpler variant first:

1. char  args              => args is char.
2. char *args              => args is pointer to char.
3. char *args = NULL       => args is pointer to char, pointing to NULL.
4. char *args = malloc(9); => args is pointer to char, pointing to some address
                              returned by malloc().

That is: args itself has an address in all of these steps, as does any variable. You can always say:

printf("Address of args=%p\n", (void*)&args);

The difference between 2., 3. and 4. above is that

  • At point 2. the variable does not point to anything meaningful. It is uninitialized. [1]
  • At point 3. it is initialized, however it does not point to a usable address but to NULL.
  • At point 4. it holds the address of the location returned by malloc.

[1] That is, unless it has static storage duration (static, global, etc.), in which case it is initialized to NULL.


Now we can look at char **args = NULL.

1. char **args;
2. char **args = NULL;
3. char **args = malloc(sizeof(char *));
4. char **args = malloc(sizeof(char *));
   args[0] = malloc(9);
         Address
1. args: 0xabc00 /-> Nowhere /-> Nowhere
2. args: 0xabc00  ->    NULL  /-> Nowhere
3. args: 0xabc00  ->  0xcf324 /-> Nowhere
4. args: 0xabc00  ->  0xcf324  -> 0xedf00 - 0xedf08

Reallocating

Now, if you want some function to manipulate whatever args points to, what would you give to that function? You pass args itself, that is, the address of what args points to.

What if you also want to change where it points, as well as what it points to? Then you need to pass the address of the pointer itself, so that the function can say: "args, point here instead of there."

  • Stage 1: ptr points to A
  • Stage 2: ptr points to B

Again: as you call realloc() on args, you need to update args with the new address to point to.
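
As a sketch of that idea, the hypothetical helper grow_by_one below (the name and the count parameter are assumptions, not part of the assignment) takes the address of the caller's array pointer, reallocates, and stores the possibly moved address back through that pointer. It goes through a temporary so the old block is not lost if realloc fails, a check the code in the question leaves out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: make room for one more char* in *arr.
 * count is the number of slots currently in use.
 * Returns 0 on success, -1 if realloc failed (then *arr is unchanged). */
int grow_by_one(char ***arr, size_t count)
{
    assert(arr != NULL);
    char **tmp = realloc(*arr, (count + 1) * sizeof(char *));
    if (tmp == NULL)
        return -1;
    tmp[count] = NULL;   /* new slot starts out pointing nowhere */
    *arr = tmp;          /* tell the caller where the array lives now */
    return 0;
}
```

Called as grow_by_one(&args, 0) with args == NULL, this behaves like a first malloc, since realloc(NULL, n) is equivalent to malloc(n).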

I'm having a hard time to understand how to handle this triple dimensional array and how to dynamically allocate it.

It is not a "triple dimensional array". It is a pointer to a pointer to a pointer. Or, perhaps simpler in light of the context: it is a pointer to a double pointer, that is, a pointer to the variable that holds your array of strings.

char **args = NULL;
         |
         +-----> 0x123400 -> NULL
     parse_cmdline(&args, cmdline);
                    |
                    +-----> 0x123400 -> NULL

inside parse_cmdline():

 int parse_cmdline(char ***argv, char *input)

 &argv => Address of the argument.
  argv => 0x123400 (Same address as in main)
 *argv => NULL     (Points to NULL)

 /* Inside loop: */

 /* First round: */
 *argv = realloc(*argv, (i + 1) * sizeof(char*));
   |               |
   |               +--------> Initial address NULL
   +------------------------> Updated address 0x1234af

 /* Second round */
 *argv = realloc(*argv, (i + 1) * sizeof(char*));
   |               |
   |               +--------> Old address     0x1234af
   +------------------------> Updated address 0x1234bd

 /* ... */

Back in main, args still has address 0x123400, but it no longer points to NULL. It now holds the address returned by the last realloc(), for example 0x1234bd.


Accessing pointer to pointer-pointer array ...


Now the second issue is how to correctly access the elements of the array, given that you have a pointer to a pointer-pointer variable and not a pointer-pointer variable.

ptr->args vs args

Inside your main function you would access args by:

printf("%s\n", args[0]);
printf("%s\n", args[1]);

etc.

However inside parse_cmdline() there are some quirks. Firstly look at this:

*argv

Here you use * to dereference argv to get to args from main. So far so good. But then you say:

*argv[i]

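This will go bad, and to understand why, you have to look at C's Precedence Table. As you can see, [], array subscript, has a higher precedence than *. As a result, what you are indexing is argv and not what it points to, and then you dereference the indexed element. For i = 0 the two happen to coincide, but argv[1] is whatever lies in memory after args in the caller, which is consistent with Valgrind's complaint about an uninitialised value created by a stack allocation. Something like: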

 foo = argv[i];
*foo = malloc(...);

To solve this, wrap the dereference in parentheses and index the result of that:

(*argv)[i] = malloc(...);
/* and */
memcpy((*argv)[i], token, strlen(token) + 1);

It might be easier to think of it in terms of simple math:

12 + 4 / 2

You want to divide the sum of 12 and 4 by 2. As / has higher precedence than +, you need to add parentheses:

(12 + 4) / 2
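
Back to the pointers: the difference can also be checked on data where both expressions are valid. In this sketch (all names are invented for the illustration), argv points at the first element of an array of two string arrays, so argv[1] is a legitimate object, unlike in your program where argv points at a single variable.

```c
#include <assert.h>
#include <stddef.h>

const char *row_a[] = { "a0", "a1" };
const char *row_b[] = { "b0", "b1" };
/* rows[0] and rows[1] both exist, so indexing argv itself is safe here. */
const char **rows[] = { row_a, row_b };

/* Parsed as *(argv[i]): index argv first, then dereference. */
const char *index_then_deref(const char ***argv, size_t i)
{
    assert(argv != NULL);
    return *argv[i];
}

/* Dereference argv first, then index the array it points to. */
const char *deref_then_index(const char ***argv, size_t i)
{
    assert(argv != NULL);
    return (*argv)[i];
}
```

With this data, index_then_deref(rows, 1) yields "b0" while deref_then_index(rows, 1) yields "a1". For index 0 both yield "a0", which is why the bug only shows up from the second token on.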

sizeof


Finally, as mentioned in the comments, sizeof does not give you the length of the string. In your code it gives you the size of a char pointer, that is, the size of a pointer.

char foo[32] = "ab";
int bar[64];
char *baz;

sizeof foo => 32
sizeof bar => 64 * size of int e.g. 64 * 4 = 256
sizeof baz =>  8 (or 4 or what ever the size of a pointer is in 
                  current environment.)

Use strlen or the like. You could also make your own strlen for educational purposes. When you copy remember to also copy the terminating null byte. That is:

memcpy((*argv)[i], token, strlen(token) + 1);
                                          |
                                          +----- Include 0x00 at end of token.

The strtok() function replaces each delimiter it finds with 0x00, so you do not need to add the terminator manually. That is also why you can use strlen(): without a terminating null byte there would be no way to find the end unless you knew the length beforehand.
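
Putting the two points together, copying one token could look like the sketch below. The function name copy_token is hypothetical. It measures the token with strlen, allocates one extra byte, and copies the terminator along with the characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated copy of token,
 * or NULL if malloc fails. The caller must free() the result. */
char *copy_token(const char *token)
{
    assert(token != NULL);
    size_t len = strlen(token);      /* characters, without the 0x00 */
    char *copy = malloc(len + 1);    /* + 1 for the terminating 0x00 */
    if (copy == NULL)
        return NULL;
    memcpy(copy, token, len + 1);    /* copies the 0x00 as well */
    return copy;
}
```

For the token "parameter", strlen gives 9, so 10 bytes are allocated and copied. With sizeof(token), a system with 8-byte pointers would copy only the first 8 bytes and no terminator.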


Work it


This is a typical topic where you simply need to work and work on it until it sinks in. Yes, there are simpler ways to solve it, e.g. by returning the array instead of updating it through a pointer, but it is educational: it forces you to learn about pointers, arguments etc. on a level where you learn a lot by making it work.

Use printf() to help you. Print a lot. Print length of strings, size of variables etc. And in this case print addresses. For example in main:

printf("args=%p\n", (void*)&args);

If you get stuck, then step back. Try passing args instead of &args and work on that. Also try to have a temporary pointer inside your function:

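#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int parse_cmdline(char ***argv, char *input)
{
        int i = 0;
        char **argx = *argv; /* Temporary local. */
        char *token = strtok(input, " ");

        fprintf(stderr, "argv=%p\n", (void*)argv);

        while (token != NULL) {
                /* %zu is the conversion for size_t values. */
                fprintf(stderr,
                        "%15s, sizeof=%zu strlen=%zu\n",
                        token, sizeof(token), strlen(token)
                );
                /* Now you can refer argx like you would args in main(). 
                 * as in: argx[i] instead of (*argv)[i] etc.
                 * Error checks on realloc/malloc are left out for brevity.
                 */
                argx = realloc(argx, (i + 1) * sizeof(char*));
                argx[i] = malloc(strlen(token) + 1);
                memcpy(argx[i], token, strlen(token) + 1);
                i++;
                token = strtok(NULL, " "); /* Advance, or the loop never ends. */
        }
        /* Update pointer before return. */
        *argv = argx;
        return i;
}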

Other tips

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int parse_cmdline(char ***argv, char *input);

int main(void){
    char **args = NULL;
    char cmdline[] = "cmdline -s 20 -r -t parameter -p 20 filename";
    int count = parse_cmdline(&args, cmdline);
    for(int i = 0;i<count;++i)
        printf("%s\n", args[i]);
    printf("\n");
    return 0;
}

int parse_cmdline(char ***argv, char *input){
    int i=0;
    char *token=strtok(input," ");

    while (token!=NULL) {
        size_t len = strlen(token);
        /* Grow the pointer array by one slot. */
        *argv=realloc(*argv,(i+1)*sizeof(char*));
        /* Room for the token plus its terminating null byte. */
        (*argv)[i]=malloc(len+1);
        strcpy((*argv)[i++],token);
        token=strtok(NULL," ");
    }
    return i;
}
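
One thing the code above does not show is releasing the memory again, which Valgrind would typically report as leaks at exit. A minimal sketch, assuming the array was built as above (one malloc per token plus one realloc'd pointer array). The name free_args is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: free every token, then the pointer array itself.
 * count is the value returned by parse_cmdline(). */
void free_args(char **args, int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; ++i)
        free(args[i]);
    free(args);   /* free(NULL) is a no-op, so an empty result is fine */
}
```

In main this would be called as free_args(args, count) once the tokens have been printed.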
Licensed under: CC-BY-SA with attribution