Dev Notes

Software Development Resources by David Egan.

Double Pointers and Linked List in C


C, Pointers
David Egan

This article will (hopefully) demonstrate the usefulness of double pointers (multiple indirection) in manipulating linked list elements efficiently.

Intro to Linked Lists

A linked list is an abstract data structure that is made up of a collection of nodes (or elements). Lists nodes are accessed by means of sequential access - they are accessed in an ordered/predetermined sequence.

List objects can be stored anywhere in memory - they do not need to be next to one another. This makes it easy to insert new elements, but has the disadvantage of requiring sequential access when accessing values.

Linked List Using Double Pointers

For the current example, nodes are dynamically allocated (using malloc()) and each node consists of a data field and a reference (a pointer) to the next node in the list. In the case of the last node in the list, the next field contains NULL - it is set as a null pointer.

Our nodes are defined with a C struct - for the purposes of this exercise, nodes are defined as NODE using typedef. The data field of the node consists of a single char *name member variable:

// list.h
typedef struct node
{
    char *name;
    struct node *next;
} NODE;

After this typedef declaration, NODE *head defines a variable head which is a pointer to a node struct pointer.

For the sake of simplicity this example will consider two actions:

  • prepend(): add a node to the start (head) of the list
  • append(): add a node at the end of the list

Pointers

A pointer is a variable that stores the memory address of another variable. In C, you need to define the type of the object that is being pointed at.

For example: int a = 42; int *pNum = &a;.

The variable pNum now contains the address of a, which is the memory location of an object having the data type integer. Dereferencing the pointer variable *pNum instructs the programme to return the integer value that is held at the memory address returned by pNum.

NOTE: you shouldn’t try to dereference a null pointer in C - this produces undefined behaviour.

Initialise the Linked List

The list is initialised by creating a NODE *head which is set to NULL. The variable head is now a pointer to NULL, but as NODEs are added to the list head will be set to point to the first NODE. In this way, head becomes the access point for sequential access to the list.

Adding Nodes

As the first node is added, the head pointer is amended to point to the new node, and the next pointer of the new node points to NULL. In the case of the first node, append() and prepend() have exactly the same effect.

In each case, a node needs to be created. To achieve this:

// Function to create a node and return a pointer to the node.
NODE *createNode(char *name)
{
    NODE *p = malloc(sizeof(NODE));
    p->name = malloc(strlen(name) + 1);
    strcpy(p->name, name);
    p->next = NULL; // This will be set on insertion
    return p;
}

Why Double Pointers?

Arguments are always passed to functions by value in C. In other words, when C passes control to a function, a copy of each argument is made and this copy is passed to the function - leaving the original variable unchanged.

In order to amend a variable in a calling function from within a called function, you need to pass a pointer to the variable as a function argument. This provides the called function with the memory address of the variable to be amended. Dereferencing this (pointer) variable within the called function provides access to the original value.

Simplified example:

// Function argument is the memory address of input, NOT the actual variable
void doubleInput(int *input)
{
    // Dereference the pointer to provide access:
    // *input refers the the value held by the memory address input.
    // By dereferencing the address, we can directly amend what is stored there
    // from within the function.
    *input = *input + *input;
}

in main()
{
    int myNum = 5;
    printf("%d", myNum); // Outputs 5
    void doubleInput(&myNum); // Pass the address of myNum to doubleInput()
    printf("%d", myNum); // Outputs 10
    return 0;
}

Prepend

In the function definition for prepend() shown below, the first parameter is NODE **head. This specifies that the function receives a variable with the pointer to pointer variable type NODE **. In other words, the function receives an address of a variable which is in turn a pointer to a NODE- in this case, the function argument is the address of the variable NODE *head which is a pointer that points to either the first node in the list or to NULL.

If a new node is added to the head of the list (by means of a prepend() action), the head pointer is amended to point to the new node and the next field of the new node refers to the address of the second (previously first) node.

This is the easier function to understand:

// The caller
NODE *head = malloc(sizeof(NODE));
head = NULL;

// Prepend node with the name "Goldfish"
prependNode(&head, createNode("Goldfish"));

void prependNode(NODE **head, NODE *newNode)
{
    newNode->next = *head;
    *head = newNode;
}

In the case of the new node being the first node in the list, head == NULL so the assignment newNode->next = *head correctly sets the next field of the last node (in this case, also the first node) to NULL.

If the list already contains nodes, the head variable holds a pointer to a pointer to the existing first node. Note that the address of head is passed into the prepend() function from the caller.

Because we want to insert the new node before the existing first node, the next field of the new node should point to the address of the existing first node. The declaration newNode->next = *head correctly makes this assignment.

Setting the new head reference is where double pointers become useful. Because we pass in the address of the head variable rather than the variable itself, we can access (and amend) it’s value from within the prepend() function. When the *head is set to the address of the new node from within prepend() we are saying:

“Dereference **head once to get the memory address that is held by head in the calling function, and set this address to the address of the new node.”

Append Node

If a new node is added to the end of the list (by means of append(), the next field of the previously last node is amended to point to the memory address of the new node. If the appended node is also the first node in the list, then the head pointer must be amended to point to the new node.

In this case, the challenge is to efficiently find the last node, whilst accounting for the possibility that this may be the first node in the list. Note that in any case the next field of the new node points to null.

void append(NODE **headRef, NODE *newNode)
{
    NODE **tracer = headRef;
    while (*tracer) {
        tracer = &(*tracer)->next;
    }
    newNode->next = *tracer;
    *tracer = newNode;
}

Create a tracer and set this initially to hold the memory address of head. Set tracer to the address of the pointer to the next NODE. In the case of an empty list or the last NODE, this value will be NULL.

NOTE: *tracer dereferences tracer once, so it refers to a pointer to a NODE.

The arrow operator can then be used to indirectly access the member variable. The following are equivalent:

  • tracer = &(*tracer)->next;
  • tracer = &(*(*tracer)).next;
  • tracer = &(**tracer).next;

Note that tracer must be a pointer to a pointer to a NODE - it must contain the memory address of a pointer to a NODE (i.e. NODE *).

After the loop, *tracer will be a pointer to NULL even if the list is empty. *tracer now refers to the pointer to the next node of the last node. Currently NULL, the next line refers it to the new NODE.

References


comments powered by Disqus