Lecture 05: Pointers, Arrays, and Structures
1 Basic Numeric C Types
In the last lesson we reviewed the basic types of C. For reference, they appear below:
- int : integer number : 4 bytes
- short : integer number : 2 bytes
- long : integer number : 8 bytes
- char : character : 1 byte
- float : floating point number : 4 bytes
- double : floating point number : 8 bytes
- void * : pointer : 8 bytes (on 64-bit machines)
These types and the operations over them are sufficient for most programming; however, we will need more to accomplish the needed tasks. In particular, there are three aspects of these types that require further exploration:
- Advanced Structured Types: Creating new, formatted types by combining basic types.
- Pointers: Working with references to data
- Arrays: Organizing data into linear structures.
2 Advanced Types: struct
An incredibly useful tool in programming is to be able to create advanced types built upon basic types. Consider managing a pair of integers. In practice, you could declare two integer variables and manage each separately, like so:
    int left;
    int right;

    left = 1;
    right = 2;
But that is cumbersome: you always have to remember that the variable left is paired with the variable right. And what happens when you need two pairs, or three? It quickly becomes unmanageable.
Instead, what we can do is declare a new type that is a structure containing two integers.
    struct pair{     //declaring a new pair type
      int left;      //that contains two integers
      int right;
    };

    struct pair p1;  //declare two variables of that type
    struct pair p2;

    p1.left = 10;    //assign values to the pair types
    p1.right = 20;

    p2.left = 0;
    p2.right = 5;
The first part declares the new structure type by using the keyword struct and specifying the basic types that are members of the structure. Next, we can declare variables of that type using the type name, struct pair. With those variables, we can then refer to the member values, left and right, using the . operator.
One question to consider: How is the data for the structure laid out in memory? Another way to ask is: How many bytes does it take to store the structure? In this example, the structure contains two integers, so it is 8 bytes in size. In memory, it would be represented by two integers that are adjacent in memory space.
         struct pair
    .--------------------.
    |.--------..--------.|
    ||<- 4B ->||<- 4B ->||
    ||  left  || right  ||
    |'________''________'|
    '--------------------'
     <----- 8 bytes ----->
Using the . operator with the correct member name refers to either the first or the second four bytes, that is, the left or right integer within the pair. When we print its size, that is exactly what we get.
    printf("%zu\n", sizeof(struct pair)); //sizeof yields a size_t, printed with %zu
The pair struct is a simple example, but throughout the semester we will see many advanced structure types that combine a large amount of information. These structures are used to represent various states of the computer and convey a lot of information in a compact form.
3 Defining new types with typedef
While structured data is ever present in the system, it is often hidden behind new type names. The way to introduce a new type name, or type definition, is the typedef keyword. Here is an example for the pair structure type we declared above.
    typedef struct{  //declaring a new structure
      int left;      //that contains two integers
      int right;
    } pair_t;        //the type name for the structure is pair_t

    pair_t p1;       //declare two variables of that type
    pair_t p2;

    p1.left = 10;    //assign values to the pair types
    p1.right = 20;

    p2.left = 0;
    p2.right = 5;
This time we declare the same type, a pair of two integers, but we give that structure type a distinct name, pair_t. When declaring something of this type, we do not need to specify that it is a structure; instead, we call it what it is, a pair_t. The compiler recognizes the new type and ensures that it has the properties of the structure.
The suffix _t is typically used to indicate that a type is a defined type rather than a basic one. This is a convention of C, not a rule, but it can help guide you through the morass of types you will see in this class.
4 Pointers
In C, pointers play a larger role than in C++. Recall that a pointer is a data type whose value is a memory address. A pointer must be declared based on the type it references; for example, an int * is a pointer to an integer and a char * is a pointer to a char. Here are some basic operations associated with pointers.
- int * p : pointer declaration
- *p : pointer dereference, follow the pointer to the value
- &a : address of the variable a
- p = &a : pointer assignment, p now references a
- *p = 20 : assignment via a dereference, follow the pointer and assign the value to a
Individually, each of these operations can be difficult to understand. It helps to follow a stack diagram, where variables and their values are modeled. For the purposes of this class, we will draw stack diagrams like this:
    +----------+-------+
    | variable | value |
    +----------+-------+
If we have a pointer variable, then we'll do this:
    +----------+-------+
    | pointer  |   .------->
    +----------+-------+
This will indicate that the value of the pointer is a memory address that references some other memory.
To codify this concept further, let's follow a running example of the following program:
    int a = 10, b;
    int *p = &a;
    a = 20;
    b = *p;
    *p = 30;
    p = &b;
Let us walk through it step by step:
(1) Initially, a has the value 10, b has not been assigned to, and p references the value of a.

    int a = 10, b;
    int *p = &a;   // <-- (1)
    a = 20;
    b = *p;
    *p = 30;
    p = &b;

    +---+----+
    | a | 10 |<-.
    +---+----+  |
    | b |    |  |  arrow for pointer indicates
    +---+----+  |  a reference
    | p |  .----+
    +---+----+
(2) Assigning to a changes a's value, and now p also references that value.

    int a = 10, b;
    int *p = &a;
    a = 20;        // <-- (2)
    b = *p;
    *p = 30;
    p = &b;

    +---+----+
    | a | 20 |<-.
    +---+----+  |
    | b |    |  |
    +---+----+  |
    | p |  .----+
    +---+----+
(3) p is dereferenced with *, and the value that p references is assigned to b.

    int a = 10, b;
    int *p = &a;
    a = 20;
    b = *p;        // <-- (3)
    *p = 30;
    p = &b;

    +---+----+
    | a | 20 |<-.
    +---+----+  |
    | b | 20 |  |  *p means to follow pointer
    +---+----+  |  to get value
    | p |  .----+
    +---+----+
(4) Assigning to *p stores the value at the memory p references, changing a's value.

    int a = 10, b;
    int *p = &a;
    a = 20;
    b = *p;
    *p = 30;       // <-- (4)
    p = &b;

    +---+----+
    | a | 30 |<-.
    +---+----+  |
    | b | 20 |  |  assigning *p follows pointer
    +---+----+  |  to store value
    | p |  .----+
    +---+----+
(5) Assigning to p requires an address; now p references the memory address of b.

    int a = 10, b;
    int *p = &a;
    a = 20;
    b = *p;
    *p = 30;
    p = &b;        // <-- (5)

    +---+----+
    | a | 30 |
    +---+----+
    | b | 20 |<-.
    +---+----+  |
    | p |  .----+
    +---+----+
4.1 Pointers to structures
Just like for other types, we can create pointers to structured memory. Consider for example:
    typedef struct{
      int left;
      int right;
    } pair_t;

    pair_t pair;
    pair.left = 1;
    pair.right = 2;

    pair_t * p = &pair;
This should be familiar to you, as we can treat pair_t just like other data types, except we know that it is actually composed of two integers. However, now that p references a pair_t, how do we dereference it such that we get to the member data? Here is one way.
printf("pair: (%d,%d)\n", (*p).left, (*p).right);
Looking closely, you see we first use the * operator to dereference the pointer, and then the . operator to refer to a member of the structure. The parentheses are needed because . binds more tightly than *. That is a lot of work, and C requires us to frequently access members of structures via a pointer reference. To alleviate that, C has a shortcut operation, the arrow or ->, which dereferences and then does the member reference for pointers to structures. Here is how that looks:
printf("pair: (%d,%d)\n", p->left, p->right);
p->right = 2017;
p->left = 1845;
5 Array Types
The last types we consider are array types, which provide a way for the program to declare any number of values of the same type in contiguous memory. Here is a simple example with an array of integers:
    int array[10]; //declare an array of 10 integers
    int i;

    //assign to the array
    for(i=0;i<10;i++){
      array[i] = 2*i; //index times 2
    }

    //reference the array
    for(i=0;i<10;i++){
      printf("%d:%d\n", i,array[i]);
    }
    aviv@saddleback: demo $ ./array-example
    0:0
    1:2
    2:4
    3:6
    4:8
    5:10
    6:12
    7:14
    8:16
    9:18
We declare an array using the [ ] following the variable name. We use the term index to refer to the position of an element in an array. Above, the array named array is of size 10, which means that we can use indexes 0 through 9 (computer scientists start counting at 0). To index the array, for both retrieval and assignment, we use the [ ] operator as well.
5.1 Arrays and Pointers
Now, it is time to blow your mind. It turns out that in C arrays and pointers are the same thing. Seriously. Well, not exactly the same, but basically the same.
Let me demonstrate. First consider how and what happens when you assign a pointer to an array.
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]){
      int array[10];
      int i;
      int * p = array; //p points to array

      //assign to the array
      for(i=0;i<10;i++){
        array[i] = 2*i; //index times 2
      }

      //dereference p and assign 2017
      *p = 2017;

      //print the array
      for(i=0;i<10;i++){
        printf("%d:%d\n", i,array[i]);
      }
    }
    aviv@saddleback: demo $ ./pointer-array
    0:2017
    1:2
    2:4
    3:6
    4:8
    5:10
    6:12
    7:14
    8:16
    9:18
Notice that at index 0 the value is now 2017. Also notice that when we assigned the pointer value, we did not take the address of the array. That means p is really referencing the address of the first item in the array and, for that matter, so is array!
It gets crazier because we can also use the [ ] operator with pointers. Consider this small change to the program:
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]){
      int array[10];
      int i;
      int * p = array; //p points to array

      //assign to the array
      for(i=0;i<10;i++){
        array[i] = 2*i; //index times 2
      }

      //index p at 5 and assign 2017
      p[5] = 2017; //<---------------!!

      //print the array
      for(i=0;i<10;i++){
        printf("%d:%d\n", i,array[i]);
      }
    }
    aviv@saddleback: demo $ ./pointer-array-index
    0:0
    1:2
    2:4
    3:6
    4:8
    5:2017 //<---------!!!
    6:12
    7:14
    8:16
    9:18
In this case we indexed the pointer at 5 and assigned to it the value 2017, which resulted in that value appearing in the output. What is the implication of this? We know that p is a pointer, and we know that assigning to the value referenced by a pointer requires a dereference, so the [ ] must be a dereference operation. And it is. In fact we can translate the [ ] operation like so:
p[5] = *(p+5)
What the [ ] operation does is add the index to the pointer and then dereference the result. The pointer itself is not modified. As a stack diagram, we can visualize this like so:
    .-------+------.
    | array |   .--+--.
    |-------+------|  |
    |       |  0   |<-'<-.
    |-------+------|     |
    |       |  2   |     |
    |-------+------|     |
    |       |  4   |     |
    |-------+------|     |
    |       |  6   |     |
    |-------+------|     |
    |       |  8   |     |
    |-------+------|     |
    |       | 2017 |<----+----- p+5, array+5
    |-------+------|     |
    |       |  12  |     |
    |-------+------|     |
    |       |  14  |     |
    |-------+------|     |
    |       |  16  |     |
    |-------+------|     |
    |       |  18  |     |
    |-------+------|     |
    | p     |   .--+-----'
    '-------+------'
This is called pointer arithmetic, which is a bit complicated, but we'll return to it later when discussing strings. The important takeaway is that there is a close relationship between pointers and arrays. And now you also know why arrays are indexed starting at 0: it is because of pointer arithmetic. The first item in the array is the same as just dereferencing the pointer to the array, thus occurring at index 0.
Earlier I described that relationship as the same, but they are not exactly the same. Where they differ is that pointers can be reassigned like any variable, but arrays cannot. They are constants. For example, this is not allowed:
    int a[10];
    int b[10];
    int *p;

    p = a; // ok

    b = p; // not ok!
Array pointers are constant; we cannot reassign them. One way to see the reason: if we could reassign the array pointer, how would we ever refer to the array's original memory again? We could not. It would be lost.
In the next lessons, we will continue to look at arrays and pointers, but in the context of strings, which are simply arrays of characters with the property that they are null terminated.