Dev Notes

Software Development Resources by David Egan.

Pointer Arithmetic & Multiple Dereferencing in C


C
David Egan

It’s useful to really understand pointer arithmetic in C, though in most situations you’d probably be better off using the array subscript operator ([]). The two are interchangeable: the C standard defines a[i] as *(a + i).

The unary * (dereference) operator has higher precedence than the binary + (addition) operator, so *p + 1 means (*p) + 1, not *(p + 1).

For the following examples, we’ll use multiple indirection to create an abstract data structure doc to represent this string:

char str[] = "hello world. abc def.\nxyz";

…as a multidimensional array. The string could be represented as a document doc whereby:

  • doc: contains multiple paragraphs - sequences of characters delimited by newline (‘\n’) characters
  • paragraph: contains multiple sentences - sequences of characters delimited by fullstop (‘.’) characters
  • sentence: contains multiple words - sequences of characters delimited by space (‘ ’) characters
  • word: contains multiple characters, with no spaces and terminated by a NUL character

The underlying data type in this example is a char. This means:

  • Words are pointers to chars (an array of chars): char *
  • Sentences are pointers to words (an array of arrays): char **
  • Paragraphs are pointers to sentences (an array of arrays of arrays): char ***
  • The document is a pointer to paragraphs (an array of arrays of arrays of arrays): char ****

If you’re trying to wrap your head around multiple indirection, I strongly recommend:

  • Create a simple programme with a data structure like the one shown below.
  • Run the programme with GDB, and attempt to dereference the various elements, as shown at the end of this article.

Data Structure

If you’re more used to using the array index operator [], you can think of the data structure like this:

char *doc[2][5][10];   // 2 paragraphs, up to 5 sentences each, up to 10 words per sentence

// alternatively, allocate space dynamically:
// char ****doc = NULL;

// For the char **** version, assume space has been allocated:
doc[0][0][0] = "hello";
doc[0][0][1] = "world";
doc[0][1][0] = "abc";
doc[0][1][1] = "def";
doc[1][0][0] = "xyz";

The resulting data structure, written as an initializer:

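// Outer dimension (2 paragraphs) is inferred from the initializer;
// word slots with no initializer are null pointers.
char *doc[][5][10] =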
// Document
{
	// Paragraph 0
	{
		// Sentence 0
		{"hello", "world"},
		// Sentence 1
		{"abc", "def"}
	},
	// Paragraph 1
	{
		// Sentence 0
		{"xyz"}
	}
};

Dereferencing

Example

*(**doc + 1)
  • doc: A pointer to the first (zero index) paragraph. With char ****doc this is the variable's value; with the array declaration, the array name decays to this pointer.
  • *doc: Dereference doc to get a pointer to the first sentence in the first paragraph.
  • **doc: Dereference this to get a pointer to the first word in the first sentence of the first paragraph. This happens next because * has precedence over +.
  • **doc + 1: Add 1 to this to move the pointer to the second word of the sentence.
  • *(**doc + 1): Dereference this to get a printable string - a pointer to a null terminated array of chars.
printf("%s\n", *(**doc + 1));
// Output: "world"

Example

*(*(*doc + 1) + 1)
  • *doc + 1: Dereference doc to get a pointer to the first sentence in paragraph 0, then add 1 to get a pointer to the second sentence (index 1).
  • *(*doc + 1) + 1: Dereference the sentence pointer to get a pointer to its first word, then add 1 to get a pointer to the second word.
  • *(*(*doc + 1) + 1): Dereference this to get the word itself, a char * that can be printed with %s.
printf("%s\n", *(*(*doc + 1) + 1));
// Output: "def"

General Dereferencing Heuristic

//                          para index 0
//                          |
//                          |    sentence index 1
//                          |    |
//                          |    |    word index 0
//                          |    |    |
//                          |    |    |    character index 1
//                          ↓    ↓    ↓    ↓ 
char letter = *(*(*(*(doc + 0) + 1) + 0) + 1);   // 'b' (second character of "abc")
// equivalent:
char same_letter = doc[0][1][0][1];              // also 'b'

More Examples

// GDB output
>>> p **(*(*doc+1)+1)
$112 = 100 'd'

>>> p *(*(*(*doc+1)+1)+1)
$113 = 101 'e'

>>> p *(*(*(*doc+1)+1)+2)
$114 = 102 'f'
char ****doc;

// doc[0][0][0] -> "hello"
// doc[0][0][1] -> "world"
// doc[0][1][0] -> "abc"
// doc[0][1][1] -> "def"

// GDB output
>>> p *(*(*doc+0)+0)
$117 = 0x602680 "hello"

>>> p *(*(*doc+0)+1)
$118 = 0x6026a0 "world"
  • *doc: Dereference doc once, giving a pointer to the first sentence in the first paragraph.
  • *doc + 1: Add 1 to this value - providing a pointer to the second sentence.
  • *(*doc + 1): Dereference this, giving the second sentence itself: a pointer to its first word.
  • *(*doc + 1) + 0: Adding zero leaves the pointer unchanged, so it still points to the first word of that sentence.
  • *(*(*doc + 1) + 0): Dereference this to get the first word in the second sentence of the first paragraph, a printable string.
// GDB output
>>> p *(*(*doc+1)+0)
$119 = 0x6026c0 "abc"


>>> p *(*(*doc+1)+1)
$120 = 0x6026e0 "def"

