Lecture 05: Intro to C Programming and C strings
1 C Programming and Unix
In this course, all of our programming will be in C, and that's because C is a low-level language, much closer to the hardware and operating system than, say, C++ or Java. In fact, almost all modern operating systems are written in C, including Unix, Linux, Mac OS X, and even Windows.
The C programming language, itself, was developed for the purpose of writing the original Unix operating system. It shouldn't be that surprising then, that if you want to learn how to write programs that interact with the Unix system directly, then those programs must be written in C. And, in so doing, the act of learning to program in C will illuminate key parts of the Unix system.
In many ways, you can view C as the lingua franca of programming; it's the language that every competent programmer should be able to program in, even if only a little. The syntax and programming constructs of C are present in all modern programming languages, and the concepts behind managing memory, data structures, and the like underpin modern programming. Humorously, while nearly all programmers know how to program in C, most try to avoid doing so because the same power that C provides as a "low-level" language is what makes it finicky and difficult to deal with.
2 C Programming Preliminaries
If you were to look at a C program, you'd probably already understand it to some extent because you've been writing simple C++ programs. There are many things that are the same in C as in C++:
- Conditionals: Use if and else with the same syntax
- Loops: Use while and for loops with the same syntax
- Basic Types: Use int, float, double, char
- Variable Declaration: Still must declare your variables and types
- Functions: function declaration is the same
- Arrays and Pointers: Memory aligned sequences of data and references to that data
There are also a number of key differences:
- No namespace: C doesn't have a notion of namespaces; everything is loaded into the same namespace.
- No objects or advanced types: C does not have advanced types built in, and this includes string. Instead, strings are null-terminated arrays of char.
- No function overloading: Functions that take different types of input or return different types cannot share the same name. Each function name may have only one definition, and the compiler rejects conflicting declarations.
- All functions are pass-by-value: You cannot declare a function to take a reference, e.g., void foo(int &a). Instead, you must pass a pointer value.
- Different Structures: Structures in C use a different syntax and are interpreted differently.
- Variable Scoping: The declaration of variables is tightly scoped to code blocks, and under the original C standard (C89) you must declare variables at the start of a block before using them. For example, for(int i, ....) is not allowed in C89; instead, you must declare i prior to the start of the for loop. C99 and later standards do allow the declaration inside the for statement, but the examples here declare loop variables beforehand.
While the two programming languages, C++ and C, are clearly different, they are actually more alike than different. In fact, C is very nearly a subset of C++, which means that almost any program you write in C is also a C++ program. There are often situations when programming in C++ where your best choice for completing the task is to use C libraries. This is particularly relevant whenever you need to accomplish system-related tasks, such as manipulating the file system or creating new processes. However, most programs you write in C++ are not C programs.
2.1 Hello World
When learning any programming language, you always start with the "Hello World" program. The way in which your program "speaks" says a lot about the syntax and structure of programming in that language. Below is the "Hello World" program for C++ and C, for comparison.
    #include <iostream>
    using namespace std;

    // Hello World in C++
    int main(int argc, char * argv[]){
      cout << "Hello World" << endl;
    }

    #include <stdio.h>

    // Hello World in C
    int main(int argc, char * argv[]){
      printf("Hello World\n");
    }
To begin, each of the programs has a #include, which is a preprocessor directive that asks the compiler to include a library's declarations in the program. Both of the include statements ask the compiler to include the I/O library. While in C++ this was the iostream library, in C, the standard I/O library is stdio.h. The .h refers to a header file, which is itself a file of C code where a library's declarations and other auxiliary information are generally stored.
2.2 Compiling a C program
The compilation process for C is very similar to that of C++, but we use a C compiler. The standard C compiler on Unix systems is gcc, the GNU C compiler. For example, to compile helloworld.c, we do the following.
    #> gcc helloworld.c

This will produce an executable file a.out, which we can run:

    #> ./a.out
    Hello World
If we want to specify the name of the output file, we use the -o option.

    #> gcc helloworld.c -o helloworld
    #> ./helloworld
    Hello World
There are more advanced compilation techniques that we will cover in lab, such as including multiple files, compiling to object files, and using preprocessor directives.
2.3 Includes
The process of including libraries in your program looks very similar to that of C++, and uses the #include directive. Note that all C header files end in .h, unlike C++ headers. Here are some common libraries you will probably want to include in your C program:
- stdlib.h: The C standard library, contains many useful utilities, and is generally included in all programs.
- stdio.h: The standard I/O library, contains utilities for reading and writing from files and file streams, and is generally included in all programs.
- unistd.h: The Unix standard library, contains utilities for interacting with the Unix system, such as system calls.
- sys/types.h: System types library, contains the definitions for the base types and structures of the Unix system.
- string.h: String library, contains utilities for handling C strings.
- ctype.h: Character library, contains utilities for handling char conversions.
- math.h: Math library, contains basic math utility functions.
When you put a #include <header.h> in your program, the compiler will search for that header in its header search path. The most common location is /usr/include. However, if you place the filename to include in quotes:

    #include "header.h"

the compiler will look in the local directory for the file first, before falling back to the search path. This will become important when we start to develop larger programs.
2.4 Format Input and Output
The way output is performed in C++ is also quite different from that of C. In C++ you use << and >> to direct items from cin or towards cout using iostreams. This convention is not possible in C; instead, format printing and reading are used. Let's look at another example to further the comparison.
    #include <iostream>
    using namespace std;

    int main(int argc, char * argv[]){
      int num;

      cout << "Enter a number" << endl;
      cin >> num;

      cout << "You entered " << num << endl;
    }

    #include <stdio.h>

    int main(int argc, char * argv[]){
      int num;

      printf("Enter a number\n");
      scanf("%d", &num); //use &num to store
                         //at the address of num

      printf("You entered %d\n", num);
    }
The two programs above both ask the user to provide a number, and then print out that number. In C++, this should be fairly familiar. You use iostreams and direct the prompts to cout and direct input from cin to the integer num. C++ is smart enough to understand that if you are directing an integer to output or from input, then clearly, you are expecting a number. C is not capable of making those assumptions.
In C, we use a concept of format printing and format scanning to do basic I/O. The format tells C what kind of input or output to expect. In the above program, enternumber.c, the scanf asks for a %d, which is the special format for an integer number, and similarly, the printf has a %d format to indicate that num should be printed as an integer number.
There are other format options. For example, you can use %f to request a float.

    #include <stdio.h>

    int main(int argc, char * argv[]){
      float pi;

      printf("Enter pi:\n");
      scanf("%f", &pi);

      printf("Mmmm, pi: %f\n", pi);
    }
You can also use the format to change the number of decimals to print: %0.2f says print a float with only 2 digits after the decimal point. You can also include multiple formats, and the order of the formats matches the order of the additional arguments:
    int a=10,b=12;
    float f=3.14;

    printf("An int:%d a float:%f and another int:%d", a, f, b);
    //             |          |                  |    |  |  |
    //             |          |                  `----|--|--'
    //             |          `-----------------------|--'
    //             `----------------------------------'
There are a number of different formats available, and you can read the manual pages for printf and scanf to get more detail.

    man 3 printf
    man 3 scanf

You have to use the 3 in the manual command because other manual entries with these names exist, such as the printf command used in Bash programming, and the C standard library functions are documented in section 3 of the manual.
For this class, we will use the following format characters frequently:
- %d: format integer/number
- %f: format float/double
- %x: format hexadecimal
- %s: format string
- %c: format a char
- %%: print a % symbol
2.5 Printing to stdout and stderr
By default, printf() always prints to stdout, but you can alternatively write to any file stream. To do so, you use the fprintf() function, which acts just like printf(), except you explicitly state to which file stream you wish to print. Similarly, there is an fscanf() function for format reading from files other than stdin.
    printf("Hello World\n"); //prints implicitly to standard out
    fprintf(stdout, "Hello World\n"); //print explicitly to standard out
    fprintf(stderr, "ERROR: World coming to an endline!\n"); //print to standard error
The standard file streams are available in C via their shorthand names, and you can refer to their underlying file descriptor numbers where appropriate:
- stdin: 0: standard input
- stdout: 1: standard output
- stderr: 2: standard error
2.6 Control Flow
The same control flow you find in C++ is present in C. This includes if/else statements.
    if( condition1 ){
      //do something if condition1 is true
    }else if (condition2){
      //do something if condition1 is false and condition2 is true
    }else{
      //do this if both condition1 and condition2 are false
    }
While loops:
    while( condition ){
      //run this until the condition is not true
    }
And, for loops:
    //run init at the start
    for ( init; condition; iteration){
      //run until condition is false, performing iteration on each loop.
    }

One major gotcha for C-style programming versus C++ is that there are different scoping rules. This is most visible in for loops: under the C89 standard, you must declare your variables outside the initialization. (C99 and later allow the declaration inside the for statement, as C++ does.)

    int i; //declared before loop
    for(i=0; i < 10; i++){
      printf("%d\n",i);
    }
2.7 True and False
C was designed without a boolean type, that is, a basic type that explicitly defines true and false (C99 later added one through stdbool.h, but much C code does not use it). Instead, true and false are defined for each type, where 0 or NULL is always false and everything else is true. Any value of a basic type can be used as a condition on its own. For example, this is a common form of writing an infinite loop:
    while(1){
      //loop forever!
    }
3 C Types
All types in C exist in C++; however, not all C++ types exist in C. For example, the C++ string type does not exist in C. Instead, we use arrays of char to represent strings.
As a C programmer, you also have to change your notion of a data type. You've previously thought of a type as a way to declare a variable to reference certain data, like integers, floats, etc. While this is still true in C, you need to additionally think of types as a way of describing how much memory is needed to store that data. This change in reasoning is difficult at first, but it will benefit you.
3.1 Numeric Types
The same basic numeric types exist in C. The sizes listed are typical for a 64-bit Linux machine and can differ on other platforms:
- int: integer number: 4 bytes
- short: integer number: 2 bytes
- long: integer number: 8 bytes
- char: character: 1 byte
- float: floating point number: 4 bytes
- double: floating point number: 8 bytes
- void *: pointers: 8 bytes (on 64-bit machines)
We can even write a C program to illuminate the types:
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]){
      printf("Size of int: %lu bytes\n", sizeof(int));
      printf("Size of short: %lu bytes\n", sizeof(short));
      printf("Size of long: %lu bytes\n", sizeof(long));
      printf("Size of char: %lu bytes\n", sizeof(char));
      printf("Size of float: %lu bytes\n", sizeof(float));
      printf("Size of double: %lu bytes\n", sizeof(double));
      printf("Size of pointer: %lu bytes\n", sizeof(char *));
    }
The sizeof() operator takes a type and returns the size of that type in bytes. The format character %lu is for an unsigned long. By default, integer types such as int, short, and long are signed, which means they can store negative values. For example, the int type, when signed, can store values between -2^31 = -2,147,483,648 and 2^31 - 1 = 2,147,483,647 because one of the 32 bits is used to indicate the sign. However, an unsigned int can store numbers between 0 and 2^32 - 1 = 4,294,967,295, or approximately 4 billion.
3.2 Pointer Types
In C, pointers play a larger role than in C++. Recall that a pointer is a data type whose value is a memory address. A pointer must be declared based on what type it references; for example, an int * is a pointer to an integer and a char * is a pointer to a char. Here are some basic operations associated with pointers.
- int * p: pointer declaration
- *p: pointer dereference, follow the pointer to the value
- &a: address of the variable a
- p = &a: pointer assignment, p now references a
- *p = 20: assignment via a dereference, follow the pointer and assign the value to a
Individually, each of these operations can be difficult to understand. Following a stack diagram, where variables and values are modeled, is generally a much easier way to understand pointers. Below is a series of examples.
(1) Initially, a has the value 10, b has not been assigned to, and p references the value of a.

    int a = 10, b;
    int *p = &a; // <-- (1)
    a = 20;
    b = *p;
    *p = 30;
    p = &b;

    +---+----+
    | a | 10 |<-.
    +---+----+  |
    | b |    |  |    arrow for pointer indicates
    +---+----+  |    a reference
    | p |  .----+
    +---+----+
(2) Assigning to a changes a's value, and now p also references that value.

    int a = 10, b;
    int *p = &a;
    a = 20; // <-- (2)
    b = *p;
    *p = 30;
    p = &b;

    +---+----+
    | a | 20 |<-.
    +---+----+  |
    | b |    |  |
    +---+----+  |
    | p |  .----+
    +---+----+
(3) p is dereferenced with *, and the value that p references is assigned to b.

    int a = 10, b;
    int *p = &a;
    a = 20;
    b = *p; // <-- (3)
    *p = 30;
    p = &b;

    +---+----+
    | a | 20 |<-.
    +---+----+  |
    | b | 20 |  |    *p means to follow pointer
    +---+----+  |    to get value
    | p |  .----+
    +---+----+
(4) Assigning to *p stores the value at the memory p references, changing a's value.

    int a = 10, b;
    int *p = &a;
    a = 20;
    b = *p;
    *p = 30; // <-- (4)
    p = &b;

    +---+----+
    | a | 30 |<-.
    +---+----+  |
    | b | 20 |  |    assigning *p follows pointer
    +---+----+  |    to store value
    | p |  .----+
    +---+----+
(5) Assigning to p requires an address; now p references the memory address of b.

    int a = 10, b;
    int *p = &a;
    a = 20;
    b = *p;
    *p = 30;
    p = &b; // <-- (5)

    +---+----+
    | a | 30 |
    +---+----+
    | b | 20 |<-.
    +---+----+  |
    | p |  .----+
    +---+----+
3.3 Structures
A structure in C is similar to that in C++, and is a way to group data types together into a new type.
    struct intpair{
      int a;
      int b;
    }; //the declaration must end with a semicolon

    struct intpair pair;
    pair.a = 10;
    pair.b = 20;
When you declare a struct, you are actually introducing a new type into the program framework. Recall that a type both describes the kind of data that is stored as well as the memory footprint of that data. For example, we can inspect the size of the intpair struct using the sizeof() operator.
    printf("%lu\n", sizeof(struct intpair));

It should be 8 bytes in size because it stores 2 integers, each 4 bytes wide.
Throughout this course, we'll encounter many structure types, but we will not refer to their type as struct name. It can get quite cumbersome to constantly have to write struct when referring to these types. Fortunately, C provides a way to introduce a new type name into the programming environment, via the typedef keyword.
    typedef struct{
      int a;
      int b;
    } intpair_t;

    intpair_t pair;
The above typedef says that the structure containing two integers is assigned the type name intpair_t. It is convention in C programs to declare new types with a _t suffix to more easily identify them.