IC221: Systems Programming (SP14)



Lecture 05: Intro to C Programming and C strings


1 C Programming and Unix

In this course, all of our programming will be in C because C is a low-level language, much closer to the hardware and operating system than, say, C++ or Java. In fact, almost all modern operating systems are written in C, including Unix, Linux, Mac OS X, and even Windows.

The C programming language, itself, was developed for the purpose of writing the original Unix operating system. It shouldn't be that surprising then, that if you want to learn how to write programs that interact with the Unix system directly, then those programs must be written in C. And, in so doing, the act of learning to program in C will illuminate key parts of the Unix system.

In many ways, you can view C as the lingua franca of programming; it's the language that every competent programmer should be able to program in, even a little bit. The syntax and programming constructs of C are present in all modern programming languages, and the concepts behind managing memory, data structures, and the like underpin modern programming. Humorously, while nearly all programmers know how to program in C, most try to avoid doing so because the same power that C provides as a "low level" language is what makes it finicky and difficult to deal with.

2 C Programming Preliminaries

If you were to look at a C program, you'd probably already understand it to some extent because you've been writing simple C++ programs. There are many things that are the same in C as in C++:

  • Conditionals: Use if and else with the same syntax
  • Loops: Use while and for loops with the same syntax
  • Basic Types: Use int, float, double, char
  • Variable Declaration: Still must declare your variables and types
  • Functions: function declaration is the same
  • Arrays and Pointers: Memory aligned sequences of data and references to that data

There are also a number of key differences:

  • No namespace: C doesn't have a notion of namespace, everything is loaded into the same namespace.
  • No objects or advanced types: C does not have advanced types built in, and this includes string. Instead, strings are null-terminated arrays of chars.
  • No function overloading: Even functions with different type declarations, that is, functions that take different types of input and return different types, cannot share the same name. Defining a second function with the same name is a compile error.
  • All functions are pass-by-value: You cannot declare a function to take a reference, e.g., void foo(int &a). Instead, you must pass a pointer value.
  • Different Structures: Structures in C use a different syntax and are interpreted differently.
  • Variable Scoping: Declarations of variables are tightly scoped to code blocks, and you must declare variables before the block that uses them. For example, for(int i, ....) is not allowed in the original C standard (C89), although C99 and later accept it. Instead, declare i prior to the start of the for loop.

While the two programming languages, C++ and C, are clearly different, they are actually more alike than different. In fact, C is very nearly a subset of C++, which means that almost any program you write in C is also a C++ program. There are often situations when programming in C++ where your best choice for completing the task is to use C libraries. This is particularly relevant whenever you need to accomplish system-related tasks, such as manipulating the file system or creating new processes. However, most programs you write in C++ are not C programs.

2.1 Hello World

When learning any programming language, you always start with the "Hello World" program. The way in which your program "speaks" says a lot about the syntax and structure of programming in that language. Below is the "Hello World" program for C++ and C, for comparison.

#include <iostream>

using namespace std;

// Hello World in C++
int main(int argc, char * argv[]){
  cout << "Hello World" << endl;
}

#include <stdio.h>

// Hello World in C
int main(int argc, char * argv[]){
  printf("Hello World\n");
}

To begin, each of the programs has a #include, which is a compiler directive to include a library with the program. Both of the include statements ask the compiler to include the I/O library. While in C++ this is the iostream library, in C the standard I/O library is stdio.h. The .h refers to a header file, which is also C source code, and is where library declarations or other auxiliary information is generally stored.

2.2 Compiling a C program

The compilation process for C is very similar to that of C++, but we use a C compiler. The standard C compiler on Unix systems is gcc, the GNU C compiler. For example, to compile helloworld.c, we do the following.

#> gcc helloworld.c

This produces an executable file, a.out, which we can run:

#> ./a.out
Hello World

If we want to specify the name of the output file, we use the -o option.

#> gcc helloworld.c -o helloworld
#> ./helloworld
Hello World

There are more advanced compilation techniques that we will cover in lab, such as including multiple files, compiling to object files, and using preprocessor directives.

2.3 Includes

The process of including libraries in your program looks very similar to that of C++, and uses the include statement. Note that all C header files end in .h, unlike C++ headers such as iostream. Here are some common libraries you will probably want to include in your C program:

  • stdlib.h : The C standard library, contains many useful utilities, and is generally included in all programs.
  • stdio.h : The standard I/O library, contains utilities for reading and writing from files and file streams, and is generally included in all programs.
  • unistd.h : The Unix standard library, contains utilities for interacting with the Unix system, such as system calls
  • sys/types.h : System types library, contains the definitions for the base types and structures of the Unix system.
  • string.h : String library, contains utilities for handling C strings.
  • ctype.h : Character library, contains utilities for handling char conversions
  • math.h : Math library, contains basic math utility functions.

When you put a #include <header.h> in your program, the compiler will search for that header in its header search path. The most common location is in /usr/include. However, if you place your filename to include in quotes:

#include "header.h"

The compiler will look in the local directory for the file first, before falling back to the search path. This will become important when we start to develop larger programs.

2.4 Format Input and Output

The way output is performed in C is also quite different from that of C++. In C++ you use << and >> to direct items towards cout or from cin using iostreams. This convention is not available in C; instead, format printing and reading is used. Let's look at another example to further the comparison.

#include <iostream>

using namespace std;

int main(int argc, char * argv[]){
  int num;

  cout << "Enter a number" << endl;
  cin >> num;

  cout << "You entered " << num << endl;
}

#include <stdio.h>

int main(int argc, char * argv[]){
  int num;

  printf("Enter a number\n");
  scanf("%d", &num); //use &num to store 
                     //at the address of num

  printf("You entered %d\n", num);
}

The two programs above both ask the user to provide a number, and then print out that number. In C++, this should be fairly familiar. You use iostreams and direct the prompts to cout and direct input from cin to the integer num. C++ is smart enough to understand that if you are directing an integer to output or from input, then clearly, you are expecting a number. C is not capable of making those assumptions.

In C, we use a concept of format printing and format scanning to do basic I/O. The format tells C what kind of input or output to expect. In the above program enternumber.c, the scanf asks for a %d, which is the special format for an integer number, and similarly, the printf has a %d format to indicate that num should be printed as a number.

There are other format options. For example, you can use %f to request a float.

#include <stdio.h>

int main(int argc, char * argv[]){

  float pi;
  printf("Enter pi:\n");
  scanf("%f", &pi);

  printf("Mmmm, pi: %f\n", pi);
}

And you can use the format to change the number of decimals to print: %0.2f says print a float with only 2 digits after the decimal point. You can also include multiple formats, and the order of the formats matches the order of the additional arguments:

int a=10,b=12;
float f=3.14;
printf("An int:%d a float:%f and another int:%d", a, f, b);
//              |          |                  |   |  |  |
//              |          |                  `---|--|--'
//              |          `----------------------|--'    
//              `---------------------------------'

There are a number of different formats available, and you can read the manual pages for printf and scanf to get more detail.

man 3 printf
man 3 scanf

You have to use the 3 in the manual command because there exist other manual pages with these names, namely for Bash programming, and you need to look in section 3 of the manual for the C standard library functions.

For this class, we will use the following format characters frequently:

  • %d : format integer/number
  • %f : format float/double
  • %x : format hexadecimal
  • %s : format string
  • %c : format a char
  • %% : print a % symbol

2.5 Printing to stdout and stderr

By default, printf() prints to stdout, but you can alternatively write to any file stream. To do so, you use the fprintf() function, which acts just like printf(), except you explicitly state to which file stream you wish to print. Similarly, there is an fscanf() function for format reading from files other than stdin.

printf("Hello World\n"); //prints implicitly to standard out
fprintf(stdout, "Hello World\n"); //print explicitly to standard out
fprintf(stderr, "ERROR: World coming to an endline!\n"); //print to standard error

The standard file streams are available in C via their shorthand names, and you can refer to their file descriptor numbers where appropriate:

  • stdin : 0 : standard input
  • stdout : 1 : standard output
  • stderr : 2 : standard error

2.6 Control Flow

The same control flow you find in C++ is present in C. This includes if/else statements.

if( condition1 ){
  //do something if condition1 is true
}else if (condition2){
  //do something if condition1 is false and condition2 is true
}else{
  //do this if both condition1 and condition2 are false
}

While loops:

while( condition ){
  //run this until the condition is not true
}

And, for loops:

//run init at the start
for ( init; condition; iteration){
  //run until condition is false, performing iteration on each loop.
}

One major gotcha of C-style programming versus C++ is scoping. This is most apparent in for loops: in the original C standard (C89), you must declare your variables outside the initialization.

int i; //declared before loop
for(i=0; i < 10; i++){
  printf("%d\n",i);
}

2.7 True and False

C does not have a built-in boolean type in its original form, that is, a basic type that explicitly defines true and false (C99 later added one through stdbool.h). Instead, true and false are defined for each type, where 0 or NULL is always false and everything else is true. Any basic type can be used as a condition on its own. For example, this is a common form of writing an infinite loop:

while(1){
  //loop forever!
}

3 C Types

All types in C exist in C++; however, not all C++ types exist in C. For example, the C++ string type does not exist in C. Instead, we use arrays of chars to represent strings.

As a C programmer, you also have to change your notion of data type. You've previously thought of a type as a way to declare a variable to reference certain data, like integers, floats, etc. While this is still true in C, you need to additionally think of types as a way of describing how much memory is needed to store that data. This change in reasoning is difficult at first, but will benefit you.

3.1 Numeric Types

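The same basic numeric types exist in C. The sizes listed are typical of a 64-bit Linux machine; the C standard allows them to vary by platform: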

  • int : integer number : 4-bytes
  • short : integer number : 2-bytes
  • long : integer number : 8-bytes
  • char : character : 1-byte
  • float : floating point number : 4-bytes
  • double : floating point number : 8-bytes
  • void * : pointers : 8-bytes (on 64-bit machines)

We can even write a C program to illuminate the types:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]){
  printf("Size of int: %lu bytes\n", (unsigned long) sizeof(int));
  printf("Size of short: %lu bytes\n", (unsigned long) sizeof(short));
  printf("Size of long: %lu bytes\n", (unsigned long) sizeof(long));
  printf("Size of char: %lu bytes\n", (unsigned long) sizeof(char));
  printf("Size of float: %lu bytes\n", (unsigned long) sizeof(float));
  printf("Size of double: %lu bytes\n", (unsigned long) sizeof(double));
  printf("Size of pointer: %lu bytes\n", (unsigned long) sizeof(char *));
}

The sizeof() operator takes a type and returns the size of that type in bytes. The format character %lu is for an unsigned long. By default, int, short, and long are signed, which means they can store negative values. For example, the integer type, when signed, can store values between 2^31 - 1 = 2,147,483,647 and 0 - 2^31 = -2,147,483,648 because one of the 32 bits is used to indicate signedness. However, an unsigned int can store numbers between 0 and 2^32 - 1 = 4,294,967,295, or approximately 4 billion.

3.2 Pointer Types

In C, pointers play a larger role than in C++. Recall that a pointer is a data type whose value is a memory address. A pointer must be declared based on what type it references; for example, int * are pointers to integers and char * are pointers to chars. Here are some basic operations associated with pointers.

  • int * p : pointer declaration
  • *p : pointer dereference, follow the pointer to the value
  • &a : Address of the variable a
  • p = &a : pointer assignment, p now references a
  • *p = 20 : assignment via a dereference, follow the pointer and assign 20 to the variable it references (here, a).

Individually, each of these operations can be difficult to understand. Following a stack diagram, where variables and values are modeled, is generally a much easier way to understand pointers. Below is a series of examples.

(1) Initially, a has the value 10, b has not been assigned to, and p references the value of a.

int a = 10, b;
int *p =  &a; // <-- (1)
a = 20; 
b = *p; 
*p = 30;
p = &b;

+---+----+
| a | 10 |<-.
+---+----+  |
| b |    |  |   arrow for pointer indicates 
+---+----+  |   a reference
| p |  .----+
+---+----+

(2) Assigning to a changes a's value, and now p also references that value

int a = 10, b;
int *p =  &a; 
a = 20; // <--  (2)
b = *p; 
*p = 30;
p = &b;

+---+----+
| a | 20 |<-.
+---+----+  |
| b |    |  |
+---+----+  |
| p |  .----+
+---+----+

(3) p is dereferenced with *, and the value that p referenced is assigned to b

int a = 10, b;
int *p =  &a; 
a = 20; 
b = *p; // <-- (3)
*p = 30;
p = &b;

+---+----+
| a | 20 |<-.
+---+----+  |
| b | 20 |  |   *p means to follow pointer 
+---+----+  |   to get value
| p |  .----+
+---+----+

(4) Assigning to *p stores the value at memory p references, changing a's value

int a = 10, b;
int *p =  &a; 
a = 20; 
b = *p; 
*p = 30; // <-- (4)
p = &b;

+---+----+
| a | 30 |<-.
+---+----+  |
| b | 20 |  |   assigning *p follows pointer 
+---+----+  |   to store value
| p |  .----+
+---+----+

(5) Assigning to p requires an address, now p references the memory address of b

int a = 10, b;
int *p =  &a; 
a = 20; 
b = *p; 
*p = 30; 
p = &b; // <-- (5)

+---+----+
| a | 30 |
+---+----+  
| b | 20 |<-.
+---+----+  | 
| p |  .----+
+---+----+

3.3 Structures

A structure in C is similar to that in C++, and is a way to group data types together into a new type.

struct intpair{
  int a;
  int b;
};

struct intpair pair;
pair.a = 10;
pair.b = 20;

When you declare a struct, you are actually introducing a new type into the program framework. Recall that a type both describes the kind of data that is stored as well as the memory footprint of that data. For example, we can inspect the size of the intpair struct using the sizeof() operator.

printf("%lu\n", (unsigned long) sizeof(struct intpair));

It should be of size 8-bytes because it stores 2 integers, each 4-bytes wide.

Throughout this course, we'll encounter many structure types, but we will not refer to their type as struct name. It can get quite cumbersome to constantly have to write struct when referring to these types. Fortunately, C provides a way to introduce a new type name into the programming environment, via the typedef keyword.

typedef struct{
   int a;
   int b;
} intpair_t;

intpair_t pair;

The above typedef says that the structure containing two integers is given the type name intpair_t. It is convention in C programs to declare new types with a _t suffix to more easily identify them.