Dev Notes

Software Development Resources by David Egan.

Strings in C


C, Strings
David Egan

In C, strings are arrays of characters terminated by an ASCII NUL character '\0'.

A string is a contiguous sequence of characters terminated by and including the first null character. The term multibyte string is sometimes used instead to emphasize special processing given to multibyte characters contained in the string or to avoid confusion with a wide string. A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.

C11 ISO/IEC 9899:2011 Definition of string (p 180)

String Initialisation

Strings can be declared as an array of characters or as a pointer to a character. In either case, a character string literal can be used to initialise the value.

A string literal is a sequence of characters enclosed in double quotes. The terminating '\0' character is implicit.

String Declared as Character Array

For example:

// Declare an uninitialised array with room for 5 characters plus the terminating '\0'
char name[6];

An array declared with 6 elements in this way can hold a string of at most 5 characters, because one element must be reserved for the terminating null character.

A string can also be declared by initialising with a string literal:

char name[] = "Alice";

This defines an array of char objects, whose elements are initialised with a character string literal (with an implicit null terminating character).

In a similar way, you can also initialise the array with a brace-enclosed, comma-separated list of character constants. Note that you need to add the null character explicitly in this case:

char name[6] = {'A', 'l', 'i', 'c', 'e', '\0'};

Char Array Objects are Mutable

A character array initialised from a string literal (char name[] = "Mary") holds its own copy of the characters. If the array is a local variable, that copy typically lives on the stack, and it can be modified: you can index into the array and amend individual elements.

Change the value stored in name from “Alice” to “Alina”:

// Amending a character array defined string is OK
// -----------------------------------------------
char name[] = "Alice";
name[3] = 'n';
name[4] = 'a';

// Modifying a string defined as a character pointer is NOT OK
// The following will compile but results in undefined behaviour.
// The result will probably be a segmentation fault
// ------------------------------------------------
char *name2 = "David";
name2[3] = 'e'; // DON'T DO THIS IF NAME IS DEFINED AS A CHARACTER POINTER
name2[4] = '\0';

String Declared as Pointer to Char

You can also define a string as a pointer to a char, initialised by a string literal. In this case the pointer refers to the literal itself, which is typically stored in a read-only section of memory and should be treated as constant. For example:

char *name = "Bob";

With gcc on a typical platform, the literal is placed in a read-only section of the binary file, which is why attempts to modify it usually fail at run time. If you compile to an assembly file (use the -S compiler option in gcc), you can see the string literals in the .rodata section. In this context, rodata means “read-only data”.

/* main.s */
.file	"main.c"
.section	.rodata
.LC0:
.string	"Bob"

Though the string literal is read-only, you can modify your char * string by pointing it to a different string literal. Following on from the above example:

// First Definition
char *name = "Bob";

// Reassign the char pointer to a different string literal
name = "Bub";

// This re-assignment is OK - you will see both string literals in your assembly file

You can also declare a pointer to a character without initialising it, leaving the length and location of the string unspecified. For example: char *name;

You can then allocate memory for the string dynamically later and point name at it.

String Declaration: ISO/IEC 9899:201x

The declaration char s[] = "abc", t[3] = "abc"; defines “plain” char array objects s and t whose elements are initialized with character string literals. This declaration is identical to char s[] = { 'a', 'b', 'c', '\0' }, t[] = { 'a', 'b', 'c' }; The contents of the arrays are modifiable.

On the other hand, the declaration char *p = "abc"; defines p with type “pointer to char” and initializes it to point to an object with type “array of char” with length 4 whose elements are initialized with a character string literal.

If an attempt is made to use p to modify the contents of the array, the behavior is undefined.

C11 ISO/IEC 9899:2011 Section 6.7.9
