How do I access an individual character from an array of strings in c?

https://stackoverflow.com/questions/606879

03-07-2019
|

Question

Just trying to understand how to address a single character in an array of strings. Also, this of course will allow me to understand pointers to pointers subscripting in general. If I have char **a and I want to reach the 3rd character of the 2nd string, does this work: **((a+1)+2)? Seems like it should...

Solution

Almost, but not quite. The correct answer is:

*((*(a+1))+2)

because you need to first de-reference to one of the actual string pointers and then you to de-reference that selected string pointer down to the desired character. (Note that I added extra parenthesis for clarity in the order of operations there).

Alternatively, this expression:

a[1][2]

will also work!....and perhaps would be preferred because the intent of what you are trying to do is more self evident and the notation itself is more succinct. This form may not be immediately obvious to people new to the language, but understand that the reason the array notation works is because in C, an array indexing operation is really just shorthand for the equivalent pointer operation. ie: *(a+x) is same as a[x]. So, by extending that logic to the original question, there are two separate pointer de-referencing operations cascaded together whereby the expression a[x][y] is equivalent to the general form of *((*(a+x))+y).

OTHER TIPS

You don't have to use pointers.

int main(int argc, char **argv){

printf("The third character of argv[1] is [%c].\n", argv[1][2]);

}

Then:

$ ./main hello The third character of argv[1] is [l].

That's a one and an l.

You could use pointers if you want...

*(argv[1] +2)

or even

*((*(a+1))+2)

As someone pointed out above.

This is because array names are pointers.

Iirc, a string is actually an array of chars, so this should work:

a[1][2]

Theres a brilliant C programming explanation in the book Hacking the art of exploitation 2nd Edition by Jon Erickson which discusses pointers, strings, worth a mention for the programming explanation section alone https://leaksource.files.wordpress.com/2014/08/hacking-the-art-of-exploitation.pdf.

Although the question has already been answered, someone else wanting to know more may find the following highlights from Ericksons book useful to understand some of the structure behind the question.

Headers

Examples of header files available for variable manipulation you will probably use.

stdio.h - http://www.cplusplus.com/reference/cstdio/

stdlib.h - http://www.cplusplus.com/reference/cstdlib/

string.h - http://www.cplusplus.com/reference/cstring/

limits.h - http://www.cplusplus.com/reference/climits/

Functions

Examples of general purpose functions you will probably use.

malloc() - http://www.cplusplus.com/reference/cstdlib/malloc/

calloc() - http://www.cplusplus.com/reference/cstdlib/calloc/

strcpy() - http://www.cplusplus.com/reference/cstring/strcpy/

Memory

"A compiled program’s memory is divided into five segments: text, data, bss, heap, and stack. Each segment represents a special portion of memory that is set aside for a certain purpose. The text segment is also sometimes called the code segment. This is where the assembled machine language instructions of the program are located".

"The execution of instructions in this segment is nonlinear, thanks to the aforementioned high-level control structures and functions, which compile into branch, jump, and call instructions in assembly language. As a program executes, the EIP is set to the first instruction in the text segment. The processor then follows an execution loop that does the following:"

"1. Reads the instruction that EIP is pointing to"

"2. Adds the byte length of the instruction to EIP"

"3. Executes the instruction that was read in step 1"

"4. Goes back to step 1"

"Sometimes the instruction will be a jump or a call instruction, which changes the EIP to a different address of memory. The processor doesn’t care about the change, because it’s expecting the execution to be nonlinear anyway. If EIP is changed in step 3, the processor will just go back to step 1 and read the instruction found at the address of whatever EIP was changed to".

"Write permission is disabled in the text segment, as it is not used to store variables, only code. This prevents people from actually modifying the program code; any attempt to write to this segment of memory will cause the program to alert the user that something bad happened, and the program will be killed. Another advantage of this segment being read-only is that it can be shared among different copies of the program, allowing multiple executions of the program at the same time without any problems. It should also be noted that this memory segment has a fixed size, since nothing ever changes in it".

"The data and bss segments are used to store global and static program variables. The data segment is filled with the initialized global and static variables, while the bss segment is filled with their uninitialized counterparts. Although these segments are writable, they also have a fixed size. Remember that global variables persist, despite the functional context (like the variable j in the previous examples). Both global and static variables are able to persist because they are stored in their own memory segments".

"The heap segment is a segment of memory a programmer can directly control. Blocks of memory in this segment can be allocated and used for whatever the programmer might need. One notable point about the heap segment is that it isn’t of fixed size, so it can grow larger or smaller as needed".

"All of the memory within the heap is managed by allocator and deallocator algorithms, which respectively reserve a region of memory in the heap for use and remove reservations to allow that portion of memory to be reused for later reservations. The heap will grow and shrink depending on how much memory is reserved for use. This means a programmer using the heap allocation functions can reserve and free memory on the fly. The growth of the heap moves downward toward higher memory addresses".

"The stack segment also has variable size and is used as a temporary scratch pad to store local function variables and context during function calls. This is what GDB’s backtrace command looks at. When a program calls a function, that function will have its own set of passed variables, and the function’s code will be at a different memory location in the text (or code) segment. Since the context and the EIP must change when a function is called, the stack is used to remember all of the passed variables, the location the EIP should return to after the function is finished, and all the local variables used by that function. All of this information is stored together on the stack in what is collectively called a stack frame. The stack contains many stack frames".

"In general computer science terms, a stack is an abstract data structure that is used frequently. It has first-in, last-out (FILO) ordering , which means the first item that is put into a stack is the last item to come out of it. Think of it as putting beads on a piece of string that has a knot on one end—you can’t get the first bead off until you have removed all the other beads. When an item is placed into a stack, it’s known as pushing, and when an item is removed from a stack, it’s called popping".

"As the name implies, the stack segment of memory is, in fact, a stack data structure, which contains stack frames. The ESP register is used to keep track of the address of the end of the stack, which is constantly changing as items are pushed into and popped off of it. Since this is very dynamic behavior, it makes sense that the stack is also not of a fixed size. Opposite to the dynamic growth of the heap, as the stack change s in size, it grows upward in a visual listing of memory, toward lower memory addresses".

"The FILO nature of a stack might seem odd, but since the stack is used to store context, it’s very useful. When a function is called, several things are pushed to the stack together in a stack frame. The EBP register—sometimes called the frame pointer (FP) or local base (LB) pointer —is used to reference local function variables in the current stack frame. Each stack frame contains the parameters to the function, its local variables, and two pointers that are necessary to put things back the way they were: the saved frame pointer (SFP) and the return address. The SFP is used to restore EBP to its previous value, and the return address is used to restore EIP to the next instruction found after the function call. This restores the functional context of the previous stack frame".

Strings

"In C, an array is simply a list of n elements of a specific data type. A 20-character array is simply 20 adjacent characters located in memory. Arrays are also referred to as buffers".

#include <stdio.h>

int main()
{
    char str_a[20];
    str_a[0] = 'H';
    str_a[1] = 'e';
    str_a[2] = 'l';
    str_a[3] = 'l';
    str_a[4] = 'o';
    str_a[5] = ',';
    str_a[6] = ' ';
    str_a[7] = 'w';
    str_a[8] = 'o';
    str_a[9] = 'r';
    str_a[10] = 'l';
    str_a[11] = 'd';
    str_a[12] = '!';
    str_a[13] = '\n';
    str_a[14] = 0;
    printf(str_a);
}

"In the preceding program, a 20-element character array is defined as str_a, and each element of the array is written to, one by one. Notice that the number begins at 0, as opposed to 1. Also notice that the last character is a 0".

"(This is also called a null byte.) The character array was defined, so 20 bytes are allocated for it, but only 12 of these bytes are actually used. The null byte Programming at the end is used as a delimiter character to tell any function that is dealing with the string to stop operations right there. The remaining extra bytes are just garbage and will be ignored. If a null byte is inserted in the fifth element of the character array, only the characters Hello would be printed by the printf() function".

"Since setting each character in a character array is painstaking and strings are used fairly often, a set of standard functions was created for string manipulation. For example, the strcpy() function will copy a string from a source to a destination, iterating through the source string and copying each byte to the destination (and stopping after it copies the null termination byte)".

"The order of the functions arguments is similar to Intel assembly syntax destination first and then source. The char_array.c program can be rewritten using strcpy() to accomplish the same thing using the string library. The next version of the char_array program shown below includes string.h since it uses a string function".

#include <stdio.h>
#include <string.h>

int main() 
{
    char str_a[20];
    strcpy(str_a, "Hello, world!\n");
    printf(str_a);
}

Find more information on C strings

http://www.cs.uic.edu/~jbell/CourseNotes/C_Programming/CharacterStrings.html

http://www.tutorialspoint.com/cprogramming/c_strings.htm

Pointers

"The EIP register is a pointer that “points” to the current instruction during a programs execution by containing its memory address. The idea of pointers is used in C, also. Since the physical memory cannot actually be moved, the information in it must be copied. It can be very computationally expensive to copy large chunks of memory to be used by different functions or in different places. This is also expensive from a memory standpoint, since space for the new destination copy must be saved or allocated before the source can be copied. Pointers are a solution to this problem. Instead of copying a large block of memory, it is much simpler to pass around the address of the beginning of that block of memory".

"Pointers in C can be defined and used like any other variable type. Since memory on the x86 architecture uses 32-bit addressing, pointers are also 32 bits in size (4 bytes). Pointers are defined by prepending an asterisk (*) to the variable name. Instead of defining a variable of that type, a pointer is defined as something that points to data of that type. The pointer.c program is an example of a pointer being used with the char data type, which is only 1byte in size".

#include <stdio.h>
#include <string.h>

int main() 
{
    char str_a[20]; // A 20-element character array
    char *pointer; // A pointer, meant for a character array
    char *pointer2; // And yet another one
    strcpy(str_a, "Hello, world!\n");
    pointer = str_a; // Set the first pointer to the start of the array.
    printf(pointer);
    pointer2 = pointer + 2; // Set the second one 2 bytes further in.
    printf(pointer2); // Print it.
    strcpy(pointer2, "y you guys!\n"); // Copy into that spot.
    printf(pointer); // Print again.
}

"As the comments in the code indicate, the first pointer is set at the beginning of the character array. When the character array is referenced like this, it is actually a pointer itself. This is how this buffer was passed as a pointer to the printf() and strcpy() functions earlier. The second pointer is set to the first pointers address plus two, and then some things are printed (shown in the output below)".

reader@hacking:~/booksrc $ gcc -o pointer pointer.c
reader@hacking:~/booksrc $ ./pointer
Hello, world!
llo, world!
Hey you guys!
reader@hacking:~/booksrc $

"The address-of operator is often used in conjunction with pointers, since pointers contain memory addresses. The addressof.c program demonstrates the address-of operator being used to put the address of an integer variable into a pointer. This line is shown in bold below".

#include <stdio.h>

int main() 
{
    int int_var = 5;
    int *int_ptr;
    int_ptr = &int_var; // put the address of int_var into int_ptr
}

"An additional unary operator called the dereference operator exists for use with pointers. This operator will return the data found in the address the pointer is pointing to, instead of the address itself. It takes the form of an asterisk in front of the variable name, similar to the declaration of a pointer. Once again, the dereference operator exists both in GDB and in C".

"A few additions to the addressof.c code (shown in addressof2.c) will demonstrate all of these concepts. The added printf() functions use format parameters, which I’ll explain in the next section. For now, just focus on the programs output".

#include <stdio.h>

int main() 
{
    int int_var = 5;
    int *int_ptr;
    int_ptr = &int_var; // Put the address of int_var into int_ptr.
    printf("int_ptr = 0x%08x\n", int_ptr);
    printf("&int_ptr = 0x%08x\n", &int_ptr);
    printf("*int_ptr = 0x%08x\n\n", *int_ptr);
    printf("int_var is located at 0x%08x and contains %d\n", &int_var, int_var);
    printf("int_ptr is located at 0x%08x, contains 0x%08x, and points to %d\n\n", &int_ptr, int_ptr, *int_ptr);
}

"When the unary operators are used with pointers, the address-of operator can be thought of as moving backward, while the dereference operator moves forward in the direction the pointer is pointing".

Find out more about Pointers & memory allocation

Professor Dan Hirschberg, Computer Science Department, University of California on computer memory https://www.ics.uci.edu/~dan/class/165/notes/memory.html

http://cslibrary.stanford.edu/106/

http://www.programiz.com/c-programming/c-dynamic-memory-allocation

Arrays

Theres a simple tutorial on multi-dimensional arrays by a chap named Alex Allain available here http://www.cprogramming.com/tutorial/c/lesson8.html

Theres information on arrays by a chap named Todd A Gibson available here http://www.augustcouncil.com/~tgibson/tutorial/arr.html

Iterate an Array

#include <stdio.h>

int main() 
{

    int i;
    char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
    int int_array[5] = {1, 2, 3, 4, 5};
    char *char_pointer;
    int *int_pointer;
    char_pointer = char_array;
    int_pointer = int_array;

    for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
        printf("[integer pointer] points to %p, which contains the integer %d\n", int_pointer, *int_pointer);
        int_pointer = int_pointer + 1;
    }

    for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
        printf("[char pointer] points to %p, which contains the char '%c'\n", char_pointer, *char_pointer);
        char_pointer = char_pointer + 1;
    }

}

Linked Lists vs Arrays

Arrays are not the only option available, information on Linked List.

http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_linklist.aspx

Conclusion

This information was written simply to pass on some of what I have read throughout my research on the topic that might help others.

Quote from the wikipedia article on C pointers -

In C, array indexing is formally defined in terms of pointer arithmetic; that is, the language specification requires that array[i] be equivalent to *(array + i). Thus in C, arrays can be thought of as pointers to consecutive areas of memory (with no gaps), and the syntax for accessing arrays is identical for that which can be used to dereference pointers. For example, an array can be declared and used in the following manner:

int array[5];      /* Declares 5 contiguous (per Plauger Standard C 1992) integers */
int *ptr = array;  /* Arrays can be used as pointers */
ptr[0] = 1;        /* Pointers can be indexed with array syntax */
*(array + 1) = 2;  /* Arrays can be dereferenced with pointer syntax */

So, in response to your question - yes, pointers to pointers can be used as an array without any kind of other declaration at all!

Try a[1][2]. Or *(*(a+1)+2).

Basically, array references are syntactic sugar for pointer dereferencing. a[2] is the same as a+2, and also the same as 2[a] (if you really want unreadable code). An array of strings is the same as a double pointer. So you can extract the second string using either a[1] or *(a+1). You can then find the third character in that string (call it 'b' for now) with either b[2] or *(b + 2). Substituting the original second string for 'b', we end up with either a[1][2] or *(*(a+1)+2).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow