IC221: Systems Programming (SP15)



Lecture 05: Pointers, Arrays, and Structures

1 Basic Numeric C Types

In the last lesson we reviewed the basic types of C. For reference, they appear below, with their typical sizes on a 64-bit Linux machine:

  • int : integer number : 4-bytes
  • short : integer number : 2-bytes
  • long : integer number : 8-bytes
  • char : character : 1-byte
  • float : floating point number : 4-bytes
  • double : floating point number : 8-bytes
  • void * : pointers : 8-bytes (on 64-bit machines)

These types and the operations over them are sufficient for many programs; however, we will need more for the tasks in this course. In particular, there are three aspects of these types that require further exploration:

  1. Advanced Structured Types: Creating new, structured types by combining basic types.
  2. Pointers: Working with references to data.
  3. Arrays: Organizing data into linear structures.

2 Advanced Types: struct

An incredibly useful tool in programming is to be able to create advanced types built upon basic types. Consider managing a pair of integers. In practice, you could declare two integer variables and manage each separately, like so:

int left;
int right;
left = 1;
right = 2;

But that is cumbersome: you always have to remember that the variable left is paired with the variable right. And what happens when you need two or three pairs? It quickly becomes unmanageable.

Instead, what we can do is declare a new type that is a structure containing two integers.

struct pair{    //declare a new pair type
  int left;     //containing two integers
  int right;
};

struct pair p1;       //declare two variables of that type
struct pair p2;

p1.left = 10;    //assign values to the pair types
p1.right = 20;

p2.left = 0;
p2.right = 5;

The first step is to declare the new structure type using the keyword struct and to specify the basic types that are its members. Next, we can declare variables of that type using the type name, struct pair. With those variables, we can then refer to the member values, left and right, using the . operator.

One question to consider: How is the data for the structure laid out in memory? Another way to ask is: How many bytes does it take to store the structure? In this example, the structure contains two integers, so it is 8 bytes in size. In memory, it would be represented by two integers that are adjacent in memory space.

struct pair
.--------------------.       
|.--------..--------.|
||<- 4B ->||<- 4B ->||     
||  left  ||  right ||
|'________''________'|
'--------------------'
 <----- 8 bytes ----->

Using the . operator with a member name refers to either the first or the second four bytes, that is, the left or the right integer within the pair. When we print the structure's size, 8 is exactly what we get.

printf("%zu\n", sizeof(struct pair));  //sizeof yields a size_t, printed with %zu

While the pair struct is a simple example, throughout the semester we will see many advanced structure types that combine a large amount of information. These structures are used to represent various states of the computer and convey a lot of information in a compact form.

3 Defining new types with typedef

While structured data is ever present in the system, it is often hidden behind new type names. The way to introduce a new type name, or type definition, is with the typedef keyword. Here is an example for the pair structure type we declared above.

typedef struct{    //declare a new structure
  int left;        //containing two integers
  int right;
} pair_t;          //the type name for the structure is pair_t

pair_t p1;       //declare two variables of that type
pair_t p2;

p1.left = 10;    //assign values to the pair types
p1.right = 20;

p2.left = 0;
p2.right = 5;

This time we declare the same type, a pair of two integers, but we give that structure type a distinct name, pair_t. When declaring something of this type, we do not need to specify that it is a structure; instead, we call it what it is, a pair_t. The compiler recognizes the new type and ensures that it has the properties of the structure.

The suffix _t is typically used to indicate that a type is not a basic type but a defined one. This is a convention of C, not a rule, but it can help guide you through the morass of types you will see in this class.

4 Pointers

In C, pointers play a larger role than in C++. Recall that a pointer is a data type whose value is a memory address. A pointer must be declared based on what type it references; for example, int * are pointers to integers and char * are pointers to chars. Here are some basic operations associated with pointers.

  • int * p : pointer declaration
  • *p : pointer dereference, follow the pointer to the value
  • &a : address of the variable a
  • p = &a : pointer assignment, p now references a
  • *p = 20 : assignment via a dereference, follow the pointer and assign 20 to the variable it references

Individually, each of these operations can be difficult to understand. It helps to follow a stack diagram, in which variables and their values are modeled. For the purposes of this class, we will draw stack diagrams like this:

+----------+-------+
| variable | value |
+----------+-------+

If we have a pointer variable, then we'll do this:

+----------+-------+
| pointer  |  .------->
+----------+-------+

This will indicate that the value of the pointer is a memory address that references some other memory.

To make this concept concrete, let's follow a running example using the following program:

int a = 10, b;
int *p =  &a;
a = 20; 
b = *p; 
*p = 30;
p = &b;

Let us walk through it step by step:

(1) Initially, a has the value 10, b has not been assigned to, and p references a.

int a = 10, b;
int *p =  &a; // <-- (1)
a = 20; 
b = *p; 
*p = 30;
p = &b;
+---+----+  
| a | 10 |<-.
+---+----+  |
| b |    |  |   arrow for pointer indicates 
+---+----+  |   a reference
| p |  .----+
+---+----+

(2) Assigning to a changes a's value, and now p also references that value

int a = 10, b;
int *p =  &a; 
a = 20; // <--  (2)
b = *p; 
*p = 30;
p = &b;
+---+----+  
| a | 20 |<-.
+---+----+  |
| b |    |  |
+---+----+  |
| p |  .----+
+---+----+

(3) p is dereferenced with *, and the value that p referenced is assigned to b

int a = 10, b;
int *p =  &a; 
a = 20; 
b = *p; // <-- (3)
*p = 30;
p = &b;
+---+----+  
| a | 20 |<-.
+---+----+  |
| b | 20 |  |   *p means to follow pointer 
+---+----+  |   to get value
| p |  .----+
+---+----+

(4) Assigning to *p stores the value at memory p references, changing a's value

int a = 10, b;
int *p =  &a; 
a = 20; 
b = *p; 
*p = 30; // <-- (4)
p = &b;
+---+----+  
| a | 30 |<-.
+---+----+  |
| b | 20 |  |   assigning *p follows pointer 
+---+----+  |   to store value
| p |  .----+
+---+----+

(5) Assigning to p requires an address, now p references the memory address of b

int a = 10, b;
int *p =  &a; 
a = 20; 
b = *p; 
*p = 30; 
p = &b; // <-- (5)
+---+----+  
| a | 30 |
+---+----+  
| b | 20 |<-.
+---+----+  | 
| p |  .----+
+---+----+

4.1 Pointers to structures

Just like for other types, we can create pointers to structured memory. Consider for example:

typedef struct{
  int left;
  int right;
} pair_t;

pair_t pair;
pair.left = 1;
pair.right = 2;

pair_t * p = &pair;

This should be familiar to you, as we can treat pair_t just like other data types, except we know that it is actually composed of two integers. However, now that p references a pair_t, how do we dereference it so that we get to the member data? Here is one way.

printf("pair: (%d,%d)\n", (*p).left, (*p).right);

Looking closely, you see we first use the * operator to dereference the pointer, and then the . operator to refer to a member of the structure. That is a lot of work, and C requires us to access members of structures via a pointer quite frequently. To alleviate that, C has a shortcut operator, the arrow or ->, which dereferences a pointer to a structure and then refers to a member. Here is how that looks:

printf("pair: (%d,%d)\n", p->left, p->right);

p->right = 2017;
p->left  = 1845;

5 Array Types

The last types are array types, which provide a way for the program to declare any number of values of the same type in contiguous memory. Here is a simple example with an array of integers:

int array[10]; //declare an array of 10 integers
int i;

//assign to the array  
for(i=0;i<10;i++){
  array[i] = 2*i; //index times 2
}

//reference the array
for(i=0;i<10;i++){
  printf("%d:%d\n", i,array[i]); 
}
aviv@saddleback: demo $ ./array-example
0:0
1:2
2:4
3:6
4:8
5:10
6:12
7:14
8:16
9:18

We declare an array using the [ ] following the variable name. We use the term index to refer to the position of an element in an array. Above, the array named array is of size 10, which means that we can use indexes 0 through 9 (computer scientists start counting at 0). To index the array, for both retrieval and assignment, we use the [ ] operator as well.

5.1 Arrays and Pointers

Now, it is time to blow your mind. It turns out that in C arrays and pointers are the same thing. Seriously. Well, not exactly the same, but basically the same.

Let me demonstrate. First, consider what happens when you assign an array to a pointer.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]){

  int array[10];
  int i;
  int * p = array; //p points to array

  //assign to the array  
  for(i=0;i<10;i++){
    array[i] = 2*i; //index times 2
  }

  //dereference p and assign 2017
  *p = 2017;

  //print the array
  for(i=0;i<10;i++){
    printf("%d:%d\n", i,array[i]); 
  }

}
aviv@saddleback: demo $ ./pointer-array 
0:2017
1:2
2:4
3:6
4:8
5:10
6:12
7:14
8:16
9:18

Notice that at index 0 the value is now 2017. Also notice that when we assigned the pointer value, we did not take the address of the array. That means p is really referencing the address of the first item in the array and, for that matter, so is array!

It gets crazier because we can also use the [ ] operator with pointers. Consider this small change to the program:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]){

  int array[10];
  int i;
  int * p = array; //p points to array

  //assign to the array  
  for(i=0;i<10;i++){
    array[i] = 2*i; //index times 2
  }

  //index p at 5  and assign 2017
  p[5] = 2017;   //<---------------!!

  //print the array
  for(i=0;i<10;i++){
    printf("%d:%d\n", i,array[i]); 
  }

}
aviv@saddleback: demo $ ./pointer-array-index 
0:0
1:2
2:4
3:6
4:8
5:2017 //<---------!!!
6:12
7:14
8:16
9:18

In this case we indexed the pointer at 5 and assigned it the value 2017, which resulted in that value appearing in the output. What is the implication of this? We know that p is a pointer, and we know that assigning to the value a pointer references requires a dereference, so the [ ] operator must perform a dereference. And it does. In fact, we can translate the [ ] operation like so:

p[5]  is equivalent to  *(p+5)

What the [ ] operation does is add the index to the pointer and then dereference the result; the pointer itself is not changed. As a stack diagram, we can visualize this like so:

.-------+------.
| array |   .--+--.
|-------+------|  |
|       |    0 |<-'<-.
|-------+------|     |
|       |    2 |     |
|-------+------|     |
|       |    4 |     |
|-------+------|     |
|       |    6 |     |
|-------+------|     |
|       |    8 |     |
|-------+------|     |
|       | 2017 |<----+----- p+5, array+5
|-------+------|     |
|       |   12 |     |
|-------+------|     |
|       |   14 |     |
|-------+------|     |
|       |   16 |     |
|-------+------|     |
|       |   18 |     |
|-------+------|     |
| p     |   .--+-----'
'-------+------'

This is called pointer arithmetic, which is a bit complicated, but we'll return to it later when discussing strings. The important takeaway is that there is a close relationship between pointers and arrays. And now you also know why arrays are indexed starting at 0: it is because of pointer arithmetic. The first item in the array is the same as just dereferencing the pointer to the array, that is, array[0] is *(array+0), and so it occurs at index 0.

Earlier I described arrays and pointers as the same, but they are not exactly the same. Where they differ is that pointers can be reassigned like any variable, but arrays cannot. They are constants. For example, this is not allowed:

int a[10];
int b[10];
int *p;

p = a; // ok

b = p; // not ok!

An array name is constant: we cannot reassign it. One way to see why: the name is our only handle on that block of ten integers, so if we could reassign it, how would we refer to that memory again? The answer is we could not. It would be lost.

In the next lessons, we will continue to look at arrays and pointers, but in the context of strings, which are simply arrays of characters with the property that they are null terminated.