IC221: Systems Programming (SP15)


Lecture 04: Intro to C Programming and C strings

1 C Programming and Unix

In this course, all of our programming will be in C because C is a low-level language, much closer to the hardware and operating system than, say, C++ or Java. In fact, almost all modern operating systems are written largely in C, including Unix, Linux, Mac OS X, and even Windows.

The C programming language, itself, was developed for the purpose of writing the original Unix operating system. It shouldn't be that surprising then, that if you want to learn how to write programs that interact with the Unix system directly, then those programs must be written in C. And, in so doing, the act of learning to program in C will illuminate key parts of the Unix system.

In many ways, you can view C as the lingua franca of programming; it's the language that every competent programmer should be able to program in, even a little bit. The syntax and programming constructs of C are present in many modern programming languages, and the concepts behind managing memory, data structures, and the like underpin modern programming. Humorously, while nearly all programmers know how to program in C, most try to avoid doing so because the same power that C provides as a "low level" language is what makes it finicky and difficult to deal with.

2 C Programming Preliminaries

First: YOU ALREADY KNOW C! That's because you've been programming in C in your previous class, since C is very nearly a subset of the C++ language. Not all of the programs you wrote are valid C programs, but the structure and syntax are the same. If you were to look at a C program, you'd probably understand it to some extent. There are a few things that C++ has that C does not; however, most are the same. For example:

  • Conditionals: Use if and else with the same syntax
  • Loops: Use while and for loops with the same syntax
  • Basic Types: Use int, float, double, char
  • Variable Declaration: Still must declare your variables and types
  • Functions: function declaration is the same
  • Arrays and Pointers: Memory aligned sequences of data and references to that data

The big differences between C and C++ are:

  • No namespace: C doesn't have a notion of namespaces; everything is loaded into the same namespace.
  • No objects or advanced types: C does not have advanced types built in, and this includes string. Instead, strings are null-terminated arrays of chars. You can create more advanced data types with structs, but they have slightly different properties than in C++.
  • No function overloading: Even functions with different type declarations, that is, functions that take different types of input and return different types, cannot share the same name. Declaring two functions with the same name but different types is a compile error.
  • All functions are pass-by-value: You cannot declare a function to take a reference, e.g., void foo(int &a). Instead, you must pass a pointer value.
  • Different Structures: Structures in C use a different syntax and are interpreted differently.
  • Variable Scoping: The declaration of variables is tightly scoped to code blocks, and under the original C standard (C89) you must declare variables at the start of a block before using them. For example, for(int i, ....) is not allowed in C89, so you must declare i prior to the start of the for loop. (C99 and later do accept a declaration inside the for statement.)

While the two programming languages, C++ and C, are clearly different, they are actually more alike than different. In fact, C is very nearly a subset of C++, which means that almost any program you write in C is also a C++ program. However, most programs you write in C++ are not C programs. There are often situations when programming with C++ facilities is not your best choice for completing the task, while using C libraries is. This is particularly relevant whenever you need to accomplish system-related tasks, such as manipulating the file system or creating new processes.

2.1 Hello World

When learning any programming language, you always start with the "Hello World" program. The way in which your program "speaks" says a lot about the syntax and structure of programming in that language. Below is the "Hello World" program for C++ and C, for comparison.

#include <iostream>

using namespace std;

// Hello World in C++
int main(int argc, char * argv[]){
  cout << "Hello World" << endl;
}

#include <stdio.h>

// Hello World in C
int main(int argc, char * argv[]){
  printf("Hello World\n");
}

To begin, each of the programs has a #include, which is a preprocessor directive that tells the compiler to include a library's declarations in the program. Both of the include statements ask the compiler to include the I/O library. While in C++ this was the iostream library, in C the standard I/O library is stdio.h. The .h refers to a header file, which is itself a file of C code in which library declarations and other auxiliary information are generally stored.

2.2 Compiling a C program

The compilation process for C is very similar to that of C++, but we use a C compiler. The standard C compiler on Unix systems is gcc, the GNU C compiler. For example, to compile helloworld.c, we do the following:

#> gcc helloworld.c

This produces an executable file a.out, which we can run:

#> ./a.out
Hello World

If you want to specify the name of the output file, use the -o option:

#> gcc helloworld.c -o helloworld
#> ./helloworld
Hello World

There are more advanced compilation techniques that we will cover in lab, such as including multiple files, compiling to object files, and using preprocessor directives.

2.3 Includes

The process of including libraries in your program looks very similar to that of C++, and uses the #include directive. Note that C header files end in .h, unlike the standard C++ headers. Here are some common headers you will probably want to include in your C programs:

  • stdlib.h : The C standard library, contains many useful utilities, and is generally included in all programs.
  • stdio.h : The standard I/O library, contains utilities for reading and writing from files and file streams, and is generally included in all programs.
  • unistd.h : The Unix standard library, contains utilities for interacting with the Unix system, such as system calls
  • sys/types.h : System types library, contains the definitions for the base types and structures of the Unix system.
  • string.h : String library, contains utilities for handling C strings.
  • ctype.h : Character library, contains utilities for handling char conversions
  • math.h : Math library, contains basic math utility functions.

When you put a #include <header.h> in your program, the compiler will search for that header in its header search path. The most common location is in /usr/include. However, if you place your filename to include in quotes:

#include "header.h"

The compiler will look in the local directory for the file first, before falling back to the search path. This will become important when we start to develop larger programs.

2.4 Format Input and Output

The way input and output are performed in C is also quite different than in C++. In C++ you use << and >> to direct items towards cout or from cin using iostreams. This convention is not available in C; instead, formatted printing and reading are used. Let's look at another example to further the comparison.

#include <iostream>

using namespace std;

int main(int argc, char * argv[]){
  int num;

  cout << "Enter a number" << endl;
  cin >> num;

  cout << "You entered " << num << endl;
}

#include <stdio.h>

int main(int argc, char * argv[]){
  int num;

  printf("Enter a number\n");
  scanf("%d", &num); //use &num to store 
                     //at the address of num

  printf("You entered %d\n", num);
}

The two programs above both ask the user to provide a number, and then print out that number. In C++, this should be fairly familiar. You use iostreams and direct the prompts to cout and direct input from cin to the integer num. C++ is smart enough to understand that if you are directing an integer to output or from input, then clearly, you are expecting a number. C is not capable of making those assumptions.

In C, we use formatted printing and formatted scanning to do basic I/O. The format tells C what kind of input or output to expect. In the above program, enternumber.c, the scanf asks for a %d, which is the format for an integer, and similarly, the printf has a %d format to indicate that num should be printed as an integer.

There are other format options. For example, you can use %f to request a float.

#include <stdio.h>

int main(int argc, char * argv[]){

  float pi;
  printf("Enter pi:\n");
  scanf("%f", &pi);

  printf("Mmmm, pi: %f\n", pi);
}

You can also use the format to change the number of decimals printed: %0.2f says to print a float with only 2 digits after the decimal point. You can also include multiple formats, and the order of the formats matches the order of the additional arguments:

int a=10,b=12;
float f=3.14;
printf("An int:%d a float:%f and another int:%d", a, f, b);
//              |          |                  |   |  |  |
//              |          |                  `---|--|--'
//              |          `----------------------|--'    
//              `---------------------------------'

There are a number of different formats available, and you can read the manual pages for printf and scanf to get more detail.

man 3 printf
man 3 scanf

You have to use the 3 in the man command because there exist other forms of these functions, namely for Bash programming, and you need to look in section 3 of the manual for the C standard library pages.

For this class, we will use the following format characters frequently:

  • %d : format integer/number
  • %f : format float/double
  • %x : format hexadecimal
  • %s : format string
  • %c : format a char
  • %% : print a % symbol

2.5 Printing to stdout and stderr

By default, printf() prints to stdout, but you can alternatively write to any file stream. To do so, you use the fprintf() function, which acts just like printf(), except you explicitly state which file stream you wish to print to. Similarly, there is an fscanf() function for formatted reading from files other than stdin.

printf("Hello World\n"); //prints implicitly to standard out
fprintf(stdout, "Hello World\n"); //print explicitly to standard out
fprintf(stderr, "ERROR: World coming to an endline!\n"); //print to standard error

The standard file streams are available in C via their shorthand names, and you can refer to their underlying file descriptor numbers where appropriate:

  • stdin : 0 : standard input
  • stdout : 1 : standard output
  • stderr : 2 : standard error

2.6 Control Flow

The same control flow you find in C++ is present in C. This includes if/else statements.

if( condition1 ){
  //do something if condition1 is true
}else if (condition2){
  //do something if condition1 is false and condition2 is true
}else{
  //do this if both condition1 and condition2 are false
}

While loops:

while( condition ){
  //run this until the condition is not true
}

And, for loops:

//run init at the start
for ( init; condition; iteration){
  //run until condition is false, performing iteration after each pass through the loop
}

One major gotcha for C style programming versus C++ is that there are different scoping rules. This is most apparent in for loops; in C89 you must declare your variables outside the initialization (C99 and later relax this).

int i; //declared before loop
for(i=0; i < 10; i++){
  printf("%d\n",i);
}

2.7 True and False

C does not have a built-in boolean type in the C++ sense, that is, a basic type that explicitly defines true and false (C99 adds bool, true, and false through the stdbool.h header). Instead, true and false are defined for each type, where 0 or NULL is always false and everything else is true. Any basic type can be used as a condition on its own. For example, this is a common form of writing an infinite loop:

while(1){
  //loop forever!
}

3 C Types

All types in C exist in C++; however, not all C++ types exist in C. For example, the C++ string type does not exist in C. Instead, we use arrays to represent strings of char's.

As a C programmer, you also have to change your notion of data type. You've previously thought of a type as a way to declare a variable to reference certain data, like integers, floats, etc. While this is still true in C, you need to additionally think of types as a way of describing how much memory is needed to store that data. This change in reasoning is difficult at first, but it will benefit you.

3.1 Basic Types and sizeof()

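The same basic numeric types exist in C. The sizes below are typical for a 64-bit Unix machine; the C standard only guarantees minimum sizes: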

  • int : integer number : 4-bytes
  • short : integer number : 2-bytes
  • long : integer number : 8-bytes
  • char : character : 1-byte
  • float : floating point number : 4-bytes
  • double : floating point number : 8-bytes
  • void * : pointers : 8-bytes (on 64-bit machines)

We can even write a C program to illuminate the types:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]){

  printf("Size of char: %lu bytes\n", sizeof(char));
  printf("Size of short: %lu bytes\n", sizeof(short));
  printf("Size of int: %lu bytes\n", sizeof(int));
  printf("Size of long: %lu bytes\n", sizeof(long));  

  printf("Size of float: %lu bytes\n", sizeof(float));
  printf("Size of double: %lu bytes\n", sizeof(double));

  printf("Size of pointer: %lu bytes\n", sizeof(char *));


}

aviv@saddleback: demo $ ./sizeoftypes
Size of char: 1 bytes
Size of short: 2 bytes
Size of int: 4 bytes
Size of long: 8 bytes
Size of float: 4 bytes
Size of double: 8 bytes
Size of pointer: 8 bytes

The sizeof() operator takes a type and returns the size of that type in bytes. It will be one of the most important operators in C because, while we like to talk a lot about the type of a variable, in reality that data must be stored in memory. The important thing for C is how much memory is needed to store the data; the type describes how to interpret it. This is a nuanced and significant distinction that will take much practice to internalize, but hopefully you will have done so by the end of this class.

One common mistake you will surely make is to treat sizeof() as a length function. It is not: it will not consistently report the length of a data item, like the length of a string or an array. For that, there are other specialized functions (more on that later in the course). Instead, the purpose of sizeof() is to report the amount of memory needed to hold a data element of a given type.

You can also pass a variable to sizeof(), in which case it reports the amount of memory needed to store that variable's data. For example:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]){


  char c=0;
  short s=0;
  int i=0;
  long l=0;

  float f=0;
  double d=0;

  char *p=0;

  printf("Size of char: %lu bytes\n", sizeof(c));  
  printf("Size of short: %lu bytes\n", sizeof(s));
  printf("Size of int: %lu bytes\n", sizeof(i));
  printf("Size of long: %lu bytes\n", sizeof(l));  

  printf("Size of float: %lu bytes\n", sizeof(f));
  printf("Size of double: %lu bytes\n", sizeof(d));

  printf("Size of pointer: %lu bytes\n", sizeof(p));


}

aviv@saddleback: demo $ ./sizeoftypes-vars
Size of char: 1 bytes
Size of short: 2 bytes
Size of int: 4 bytes
Size of long: 8 bytes
Size of float: 4 bytes
Size of double: 8 bytes
Size of pointer: 8 bytes

3.2 Signedness: Everything has a sign (or not)

In the above examples, we used the format %lu, which refers to an unsigned long. We had to specify an unsigned format because sizeof() reports an unsigned value; by default, all numeric types are signed and can have positive and negative representations.

What is the implication of this and how is signedness represented? Thinking about it, the determination of positive or negative is a binary decision, and thus we only need a single bit to represent the sign of a number. In practice, this means that for signed numeric types, there is one fewer bit available for the counting portion.

So: What is the largest and smallest signed integer?

  • There are 32 bits in an integer (i.e., 4 bytes). One bit is used for the sign, so 31 bits are used for counting. Thus the largest signed integer is 2^31 - 1 and the smallest signed integer is -2^31. We subtract 1 from the largest value to account for zero.

However, there are many times when we do not want signed numerics, and in those instances we want to specify unsigned types. To do that, we use the unsigned designation. For example:

#include <stdio.h>

int main(int argc, char *argv[]){

  unsigned int i = 4294967295; //2^32 - 1 -- largest unsigned integer

}

In this case, I set the unsigned integer i to the largest possible value, 2^32 - 1, which is roughly twice as large as the largest signed integer. However, you have to be careful about signedness because although we have declared the integer as unsigned, it may not always get interpreted that way. That is because of implicit casting.

Consider what happens with these two print statements:

printf("   Largest integer printed with unsigned: %u\n",i);
printf("Largest integer printed without unsigned: %d\n",i);

And the output:

aviv@saddleback: demo $ ./unsigned-types 
   Largest integer printed with unsigned: 4294967295
Largest integer printed without unsigned: -1

For the first, with a %u, the integer was interpreted as an unsigned int and the value was printed as expected. For the second, with a %d, the output is not as expected. We get the value -1. Why?

This comes about because of how negative numbers are represented in binary, using a system called two's complement. You will learn about that in your architecture class, but generally, negative numbers count backwards from the top of the unsigned range. So the bit pattern of the largest unsigned number is also the bit pattern of the largest negative number, -1. The reason for this example is to make sure you are aware of this common pitfall that can occur from implicit casting.

3.3 Casting Between Types

The previous example brings up an important point about types: we can translate between them. For example, sometimes we might want to cast an integer to a float, and vice versa. When casting numeric types, this happens either implicitly on assignment or explicitly with a cast designation.

Here is an example of implicit casting across assignments.

//assignment will cast implicitly 
  int i = 1847;
  float f = i; //integer to float

  printf("int to float: f: %f\n",f);


  float g = 3.1415926;
  int j = g; //float to integer

  printf("float to int: j: %d\n",j);

In both cases, the act of assignment to an integer or to a float automatically converted the data into the target type. We can also make this action explicit, and often this is necessary to control the result of an expression. For example:

float pi = 3.1415926; //pi
float e = 2.7182818; //e
float r2 = 1.4142135; //root 2

int k = pi + e + r2; //casts on assignment
int l = (int) pi + (int) e + (int) r2; //casts before operations

printf("cast on assignment: k: %d\n",k);
printf("cast before operation: l: %d\n",l);

There are two outputs: for k the value is 7 and for l the value is 6. With the explicit cast, e.g., (int), each float is converted to an integer by truncating its fractional part, and then the integers are summed. On assignment, the values are summed as floats and then the sum is converted.

Something interesting to think about with regards to casting in C is how this affects the underlying data representation. Floats and integers are not stored using the same data representation; however, casting between them seems to take care of that. What you will see later in the class is that in some situations where casting does not require changing the underlying data representation, casting only changes the interpretation of the data and not the structure. While this might seem inconsequential, the impact can be dramatic if used properly.

3.4 Pointer Types

The last basic type is the pointer. It also stores a number, but this number is the address of other data. We will discuss pointers in more detail in the next lesson, but the important takeaway is that pointers are perhaps the most important part of C, and also the most confusing and difficult.

The basic concept is that we define a pointer as a reference to some other memory type. For example, int * p declares a pointer p that references an integer, and similarly float * q declares a pointer q that references a float. We use the term reference to indicate that the value of the pointer is a memory reference. A memory reference is the address of data, and so a pointer holds a value that references other memory.

Here are the basic pointer operations:

  • int * p : pointer declaration
  • *p : pointer dereference, follow the pointer to the value
  • &a : Address of the variable a
  • p = &a : pointer assignment, p now references a
  • *p = 20 : assignment via a dereference, follow the pointer and assign the value 20 to a.

We will review this in more detail in the next lesson.