Lecture 04: Intro to C Programming and C strings
1 C Programming and Unix
In this course, all of our programming will be in C, and that's because C is a low-level language, much closer to the hardware and operating system than, say, C++ or Java. In fact, almost all modern operating systems are written largely in C, including Unix, Linux, Mac OS X, and even Windows.
The C programming language, itself, was developed for the purpose of writing the original Unix operating system. It shouldn't be that surprising then, that if you want to learn how to write programs that interact with the Unix system directly, then those programs must be written in C. And, in so doing, the act of learning to program in C will illuminate key parts of the Unix system.
In many ways, you can view C as the lingua franca of programming; it's the language that every competent programmer should be able to program in, even a little bit. The syntax and programming constructs of C are present in all modern programming languages, and the concepts behind managing memory, data structures, and the like underpin modern programming. Humorously, while nearly all programmers know how to program in C, most try to avoid doing so because the same power that C provides as a "low level" language is what makes it finicky and difficult to deal with.
2 C Programming Preliminaries
First: YOU ALREADY KNOW C! That's because you've been programming in C in your previous class, since C is very nearly a subset of the C++ language. Not all of the programs you wrote are valid C programs, but the structure and syntax are the same. If you were to look at a C program, you'd probably understand it to some extent. There are a few things that C++ has that C does not; however, most are the same. For example:
- Conditionals: Use if and else with the same syntax
- Loops: Use while and for loops with the same syntax
- Basic Types: Use int, float, double, char
- Variable Declaration: Still must declare your variables and types
- Functions: function declaration is the same
- Arrays and Pointers: Memory aligned sequences of data and references to that data
The big differences between C and C++ are:
- No namespace: C doesn't have a notion of namespaces; every name is loaded into the same global namespace.
- No objects or advanced types: C does not have advanced types built in, and this includes string. Instead, strings are null-terminated arrays of char's. You can create more advanced data types like structs, but they have slightly different properties.
- No function overloading: Even functions with different type declarations, that is, functions that take different types of input and return different types, cannot share the same name. The compiler rejects a second, conflicting declaration.
- All functions are pass-by-value: You cannot declare a function to take a reference, e.g., void foo(int &a). Instead, you must pass a pointer value.
- Different Structures: Structures in C use a different syntax and are interpreted differently.
- Variable Scoping: The declaration of variables is tightly scoped to code blocks, and in older C (C89) you must declare variables before the statements of the block that uses them. For example, for(int i, ....) is not allowed in C89. Instead, you must declare i prior to the start of the for loop. (C99 and later do accept the declaration inside the for, but declaring i first works with every compiler setting.)
While the two programming languages, C++ and C, are clearly different, they are actually more alike than different. In fact, C is very nearly a subset of C++, which means that almost any program you write in C is also a C++ program. There are often situations when programming in C++ is not your best choice for completing the task, while using C and its libraries is. This is particularly relevant whenever you need to accomplish system-related tasks, such as manipulating the file system or creating new processes. However, most programs you write in C++ are not C programs.
2.1 Hello World
When learning any programming language, you always start with the "Hello World" program. The way in which your program "speaks" says a lot about the syntax and structure of programming in that language. Below is the "Hello World" program for C++ and C, for comparison.
    #include <iostream>
    using namespace std;

    // Hello World in C++
    int main(int argc, char * argv[]){
      cout << "Hello World" << endl;
    }

    #include <stdio.h>

    // Hello World in C
    int main(int argc, char * argv[]){
      printf("Hello World\n");
    }
To begin, each of the programs has a #include, which is a preprocessor directive that pulls a library's declarations into the program. Both of the include statements ask the compiler to include the I/O library. While in C++ this was the iostream library, in C, the standard I/O library is stdio.h. The .h refers to a header file, which is itself a C source file where a library's declarations and other auxiliary information are generally stored.
2.2 Compiling a C program
The compilation process for C is very similar to that of C++, but we use a C compiler. The standard C compiler on Unix systems is gcc, the GNU C compiler. For example, to compile helloworld.c, we do the following.
#> gcc helloworld.c
This will produce an executable file, a.out, which we can run:
    #> ./a.out
    Hello World
If you want to specify the name of the output file, use the -o option.
    #> gcc helloworld.c -o helloworld
    #> ./helloworld
    Hello World
There are more advanced compilation techniques that we will cover in lab, such as including multiple files, compiling to object files, and using preprocessor directives.
2.3 Includes
The process of including libraries in your program looks very similar to that of C++, and uses the #include directive. Note that C header files end in .h, unlike the standard C++ headers. Here are some common libraries you will probably want to include in your C program:
- stdlib.h: The C standard library, contains many useful utilities, and is generally included in all programs.
- stdio.h: The standard I/O library, contains utilities for reading and writing from files and file streams, and is generally included in all programs.
- unistd.h: The Unix standard library, contains utilities for interacting with the Unix system, such as system calls.
- sys/types.h: System types library, contains the definitions for the base types and structures of the Unix system.
- string.h: String library, contains utilities for handling C strings.
- ctype.h: Character library, contains utilities for handling char conversions.
- math.h: Math library, contains basic math utility functions.
When you put a #include <header.h> in your program, the compiler will search for that header in its header search path. The most common location is in /usr/include. However, if you place your filename to include in quotes:
#include "header.h"
The compiler will look in the local directory for the file first, before falling back to the search path. This will become important when we start to develop larger programs.
2.4 Format Input and Output
The way output is performed in C++ is also quite different than that of C. In C++ you use the << and >> operators to direct items from cin or towards cout using iostreams. This convention is not possible in C, and instead, format printing and reading is used. Let's look at another example to further the comparison.
    #include <iostream>
    using namespace std;

    int main(int argc, char * argv[]){
      int num;

      cout << "Enter a number" << endl;
      cin >> num;

      cout << "You entered " << num << endl;
    }

    #include <stdio.h>

    int main(int argc, char * argv[]){
      int num;

      printf("Enter a number\n");
      scanf("%d", &num); //use &num to store
                         //at the address of num

      printf("You entered %d\n", num);
    }
The two programs above both ask the user to provide a number, and then print out that number. In C++, this should be fairly familiar. You use iostreams and direct the prompts to cout and direct input from cin to the integer num. C++ is smart enough to understand that if you are directing an integer to output or from input, then clearly, you are expecting a number. C is not capable of making those assumptions.
In C, we use a concept of format printing and format scanning to do basic I/O. The format tells C what kind of input or output to expect. In the above program, enternumber.c, the scanf asks for a %d, which is the special format for an integer number, and similarly, the printf has a %d format to indicate that num should be printed as a number.
There are other format options. For example, you can use %f to request a float.
    #include <stdio.h>

    int main(int argc, char * argv[]){
      float pi;

      printf("Enter pi:\n");
      scanf("%f", &pi);

      printf("Mmmm, pi: %f\n", pi);
    }
And you can use the format to change the number of decimals to print. %0.2f says print a float with only 2 trailing decimals.
You can also include multiple formats, and the order of the formats matches the order of the additional arguments:
    int a=10,b=12;
    float f=3.14;
    printf("An int:%d a float:%f and another int:%d", a, f, b);
    //              |          |                  |   |  |  |
    //              |          |                  `---|--|--'
    //              |          `----------------------|--'
    //              `---------------------------------'
There are a number of different formats available, and you can read the manual pages for printf and scanf to get more detail.
    man 3 printf
    man 3 scanf
You have to use the 3 in the manual command because there exist other forms of these functions, namely for Bash programming, and you need to look in section 3 of the manual for the C standard library pages.
For this class, we will use the following format characters frequently:
- %d : format integer/number
- %f : format float/double
- %x : format hexadecimal
- %s : format string
- %c : format a char
- %% : print a % symbol
2.5 Printing to stdout and stderr
By default, printf() always prints to stdout, but you can alternatively write to any file stream. To do so, you use the fprintf() function, which acts just like printf(), except you explicitly state to which file stream you wish to print. Similarly, there is an fscanf() function for format reading from files other than stdin.
    printf("Hello World\n"); //prints implicitly to standard out
    fprintf(stdout, "Hello World\n"); //print explicitly to standard out
    fprintf(stderr, "ERROR: World coming to an endline!\n"); //print to standard error
The standard file streams are available in C via their shorthand names, and each one corresponds to a file descriptor number that you can refer to where appropriate:
- stdin : 0 : standard input
- stdout : 1 : standard output
- stderr : 2 : standard error
2.6 Control Flow
The same control flow you find in C++ is present in C. This includes if/else statements.
    if( condition1 ){
      //do something if condition1 is true
    }else if (condition2){
      //do something if condition1 is false and condition2 is true
    }else{
      //do this if both condition1 and condition2 are false
    }
While loops:
    while( condition ){
      //run this until the condition is not true
    }
And, for loops:
    //run init at the start
    for ( init; condition; iteration){
      //run until condition is false, performing iteration on each loop.
    }
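One major gotcha for C style programming versus C++ is that there are different scoping issues. This is most apparent in for loops: under the older C89 standard, you must declare your variables outside the initialization. (C99 and later also accept a declaration inside the for, but the form below compiles everywhere.)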
    int i; //declared before loop
    for(i=0; i < 10; i++){
      printf("%d\n",i);
    }
2.7 True and False
C does not have a built-in boolean type in its original form, that is, a basic type that explicitly defines true and false (C99 later added one through the stdbool.h header). Instead, true and false are defined for each type, where 0 or NULL is always false and everything else is true. Any basic type can be used as a condition on its own. For example, this is a common form of writing an infinite loop:
    while(1){
      //loop forever!
    }
3 C Types
All types in C exist in C++; however, not all C++ types exist in C. For example, the C++ string type does not exist in C. Instead, we use arrays of char's to represent strings.
As a C programmer, you also have to change your notion of data type. You've previously thought of a type as a way to declare a variable to reference certain data, like integers, floats, etc. While this is still true in C, you need to additionally think of types as a way of describing how much memory is needed to store that data. This change in reasoning is difficult at first, but will benefit you.
3.1 Basic Types and sizeof()
The same basic numeric types exist in C (the sizes shown are typical of a 64-bit Unix machine and can differ on other platforms):
- int : integer number : 4-bytes
- short : integer number : 2-bytes
- long : integer number : 8-bytes
- char : character : 1-byte
- float : floating point number : 4-bytes
- double : floating point number : 8-bytes
- void * : pointers : 8-bytes (on 64 bit machines)
We can even write a C program to illuminate the types:
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]){
      printf("Size of char: %lu bytes\n", sizeof(char));
      printf("Size of short: %lu bytes\n", sizeof(short));
      printf("Size of int: %lu bytes\n", sizeof(int));
      printf("Size of long: %lu bytes\n", sizeof(long));
      printf("Size of float: %lu bytes\n", sizeof(float));
      printf("Size of double: %lu bytes\n", sizeof(double));
      printf("Size of pointer: %lu bytes\n", sizeof(char *));
    }
    aviv@saddleback: demo $ ./sizeoftypes
    Size of char: 1 bytes
    Size of short: 2 bytes
    Size of int: 4 bytes
    Size of long: 8 bytes
    Size of float: 4 bytes
    Size of double: 8 bytes
    Size of pointer: 8 bytes
The sizeof() operator takes a type and returns the size of that type in bytes. It will be one of the most important tools in C because, while we like to talk a lot about the type of a variable, in reality that data must be stored in memory. The important thing for C is how much memory is needed to store that data; the type describes how to interpret that data. This is a nuanced and significant distinction that will take much practice to absorb, but hopefully it will be clear by the end of this class.
One common mistake with sizeof() that you will surely make is treating it as a length function. It is not one; that is, it will not consistently report the length of a data item, like the length of a string or an array. For that, there are other specialized functions (more on that later in the course). Instead, the primary purpose of sizeof() is to report the amount of memory needed to hold a data element of that type.
You can also pass a variable to sizeof(), in which case it will report the amount of memory needed to store that variable's data. For example:
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]){
      char c=0;
      short s=0;
      int i=0;
      long l=0;
      float f=0;
      double d=0;
      char *p=0;

      printf("Size of char: %lu bytes\n", sizeof(c));
      printf("Size of short: %lu bytes\n", sizeof(s));
      printf("Size of int: %lu bytes\n", sizeof(i));
      printf("Size of long: %lu bytes\n", sizeof(l));
      printf("Size of float: %lu bytes\n", sizeof(f));
      printf("Size of double: %lu bytes\n", sizeof(d));
      printf("Size of pointer: %lu bytes\n", sizeof(p));
    }
    aviv@saddleback: demo $ ./sizeoftypes-vars
    Size of char: 1 bytes
    Size of short: 2 bytes
    Size of int: 4 bytes
    Size of long: 8 bytes
    Size of float: 4 bytes
    Size of double: 8 bytes
    Size of pointer: 8 bytes
3.2 Signedness: Everything has a sign (or not)
In the above examples, we used a format character %lu, which refers to an unsigned long. We had to specify that the type is unsigned because, unless told otherwise, numeric types are signed and can have positive and negative representations.
What is the implication of this and how is signedness represented? Thinking about it, the determination of positive or negative is a binary decision, and thus we only need a single bit to represent the sign of a number. In practice, this means that for signed numeric types, there is actually one fewer bit used for the counting portion.
So: What is the largest and smallest signed integer?
- There are 32 bits in an integer (i.e., 4 bytes). One bit is used for the sign, so 31 bits are used for counting. Thus the largest signed integer is 2^31 - 1 and the smallest signed integer is 0 - 2^31, that is, -2^31. We subtract 1 from the largest value to account for zero.
However, there are many times when we do not want signed numerics, and in those instances we want to specify unsigned types. To do that, we use the unsigned designation. For example:
    #include <stdio.h>

    int main(int argc, char *argv[]){
      unsigned int i = 4294967295; //2^32 -1 -- largest integer
    }
In this case, I set the unsigned integer i to the largest possible value, 2^32 - 1, which is roughly twice as large as the largest signed integer. However, you have to be careful about signedness because although we have declared the integer with an unsigned designation, it may not always get interpreted that way. That is because of implicit casting.
Consider what happens with these two print statements:
    printf("   Largest integer printed with unsigned: %u\n",i);
    printf("Largest integer printed without unsigned: %d\n",i);
And the output:
    aviv@saddleback: demo $ ./unsigned-types
       Largest integer printed with unsigned: 4294967295
    Largest integer printed without unsigned: -1
For the first, with a %u, the integer was interpreted as an unsigned int and the value was printed as expected. For the second, with a %d, the output is not as expected. We get the value -1. Why?
This comes about because of how negative numbers are represented in binary using a system called two's complement. You will learn about that in your architecture class, but generally, negative numbers count backwards from the top of the range. So the bit pattern of the largest unsigned number is the same as the bit pattern of the largest negative number, -1. The reason for this example is to make sure you are aware of this common pitfall that can occur from implicit casting.
3.3 Casting Between Types
The previous example brings up an important point about types: we can translate between them. For example, sometimes we might want to cast an integer to a float, and vice versa. When casting numeric types, this happens either implicitly on assignment or explicitly with a cast designation.
Here is an example of implicit casting across assignments.
    //assignment will cast implicitly

    int i = 1847;
    float f = i; //integer to float
    printf("int to float: f: %f\n",f);

    float g = 3.1415926;
    int j = g; //float to integer
    printf("float to int: j: %d\n",j);
In both cases, the act of assignment to an integer or to a float automatically converted the data into the target type. We can also make this action explicit, and often this is necessary to determine the output of a function. For example:
    float pi = 3.1415926; //pi
    float e = 2.7182818; //e
    float r2 = 1.4142135; //root 2

    int k = pi + e + r2; //casts on assignment
    int l = (int) pi + (int) e + (int) r2; //casts before operations

    printf("cast on assignment: k: %d\n",k);
    printf("cast before operation: l: %d\n",l);
There are two outputs: for k the value is 7 and for l the value is 6. With the explicit cast, e.g., (int), each float is cast to an integer by dropping its fractional part (the same result as the floor function for positive values) and then the integers are summed. On assignment, the values are summed as floats and then cast.
Something interesting to think about with regards to casting in C is how this affects the underlying data representation. Floats and integers are not stored using the same data representation; however, casting between them seems to take care of that. What you will see later in the class is that in some situations where casting does not require changing the underlying data representation, casting only changes the interpretation of the data and not the structure. While this might seem inconsequential, the impact can be dramatic if used properly.
3.4 Pointer Types
The last basic type is the pointer. It also stores a number, but this number is the address of other data. We will discuss pointers in more detail in the next lesson, but the important take away is that pointer types are perhaps the most important parts of C and the most confusing and difficult.
The basic concept is that we define a pointer as a reference to some other memory type. For example, int * p declares a pointer p that references an integer, and similarly float * q declares a pointer q that references a float. We use the term reference to indicate that the value of the pointer is a memory reference. A memory reference is the address of data, and so a pointer holds a value that references other memory.
Here are the basic pointer operations:
- int * p : pointer declaration
- *p : pointer dereference, follow the pointer to the value
- &a : address of the variable a
- p = &a : pointer assignment, p now references a
- *p = 20 : assignment via a dereference, follow the pointer and assign the value to a.
We will review this in more detail in the next lesson.