IC221: Systems Programming (SP15)



Lec 10: File Streams


1 The FILE * and opening files

The last part of the standard C library we haven't explored is reading from and writing to files. Although you've already done this with the standard files (standard input, output, and error), we have not yet demonstrated how to open, read, write, and close other files that may exist on the file system.

All the file stream functions and types are defined in the header file stdio.h, so you have to include it. In later lessons, we will look into using the system call API to do all of our I/O.

1.1 Opening a file

Open files in the standard C library are referred to as file streams, and have the type:

FILE * stream;

and we open a file using fopen() which has the following prototype:

FILE * fopen(const char *path, const char *mode);

The first argument path is a string storing the file system path of the file to open, and mode describes the settings of the file. For example:

FILE * stream = fopen("gonavy.txt", "w");

will open a file in the current directory called "gonavy.txt" with write mode. We'll discuss the modes more shortly.

File streams, as pointers, are actually dynamically allocated, and they must be deallocated, or closed, when you are done with them. The function that closes a file stream is fclose():

fclose( stream );

1.2 File Modes

The mode of the file describes how to open and use the file. For example, you can open a file for reading, for writing, or in append mode. You can start reading/writing from the start or the end of the file. You can also truncate the file when opening it, which removes all the previous data. From the man page, here are the options:

The argument mode points to a string beginning with one of the following sequences  (possibly  followed
by additional characters, as described below):

r      Open text file for reading.  The stream is positioned at the beginning of the file.

r+     Open for reading and writing.  The stream is positioned at the beginning of the file.

w      Truncate  file  to zero length or create text file for writing.  The stream is positioned at the
       beginning of the file.

w+     Open for reading and writing.  The file is created if it does not exist, otherwise it  is  trun‐
       cated.  The stream is positioned at the beginning of the file.

a      Open  for  appending  (writing  at end of file).  The file is created if it does not exist.  The
       stream is positioned at the end of the file.

a+     Open for reading and appending (writing at end of file).  The file is created  if  it  does  not
       exist.   The  initial  file  position for reading is at the beginning of the file, but output is
       always appended to the end of the file.

One key thing to notice from the modes is that any mode string with a "+" is for both reading and writing, but using "r" vs. "w" has different consequences if the file already exists. With "r", an existing file is never truncated, but "w" always truncates the file if it exists, which means its previous contents are deleted. On the other hand, "r" mode will not create the file if it doesn't exist, while "w" will. Finally, append mode with "a" behaves like "w" except that it doesn't truncate, and all writes occur at the end of the file.

As you can also see in the man page description, the file stream is described as having a "position", which refers to where within the file we read from or write to. When you read a byte, you move the position forward in the file. In later lessons, we will look into how to manipulate the stream more directly.

1.3 File Errors

Looking at the man page for fopen(), we can see how to check for errors when opening files:

RETURN VALUE
       Upon successful completion fopen(), fdopen() and freopen() return a FILE pointer.  Otherwise,  NULL  is
       returned and errno is set to indicate the error.

So we can check for NULL for errors, for example:

if ( (stream = fopen("DOESNOTEXIST.txt", "r")) == NULL){
   fprintf(stderr, "ERROR ... \n");
}

An error could occur, like above, because the file does not exist, but it could also occur because you have insufficient permissions. Consider a file that you do not have read permission on: if you open that file with "r" mode, that is an error condition. Similarly, if you try to open a file for writing without having write permission, that will cause an error.

Additionally, you can get errors when you try and read/write from a file that you opened with the wrong mode. This will cause the read/write function (discussed below) to fail. In those cases, you should check the return values of those functions.

While I do not show error checking below, it is important that your code does error checking.

2 Format input/output from File Streams

Just as you worked with the standard file streams, stdin, stdout, and stderr, we can do format input and output with file streams that you open with fopen().

2.1 Format Output with fprintf()

Let's start where we always start with understanding a new input/output system, hello world!

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  FILE * stream = fopen("helloworld.txt", "w");

  fprintf(stream, "Hello World!\n");

  fclose(stream);
}

aviv@saddleback: demo $ ./hello_fopen 
aviv@saddleback: demo $ cat helloworld.txt
Hello World!
aviv@saddleback: demo $ ./hello_fopen 
aviv@saddleback: demo $ cat helloworld.txt
Hello World!

The program opens a new stream with the write mode at the path "helloworld.txt" and prints "Hello World!\n" to the stream. When we execute the program, the file helloworld.txt is created if it doesn't exist, and if it does exist it is truncated. After printing to it, we can read it with cat, and we see that "Hello World!" is in fact in the file. If we run the program again, we still have "Hello World!" in the file just once. That's because the second time we run the program, the file exists, so it is truncated: the previous "Hello World!" is removed, and then we write "Hello World!" again.

However if we wanted to open the file in a different mode, say append, we get a different result:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  FILE * stream = fopen("helloworld.txt", "a");//<--

  fprintf(stream, "Hello World!\n");

  fclose(stream);
}

aviv@saddleback: demo $ ./hello_append 
aviv@saddleback: demo $ ./hello_append 
aviv@saddleback: demo $ ./hello_append 
aviv@saddleback: demo $ cat helloworld.txt 
Hello World!
Hello World!
Hello World!
Hello World!

The original "Hello World!" remains, and the additional "Hello World!"'s are appended to the end of the file. Printing "Hello World!" does not require a format, but fprintf() can format just like printf(), only to a file stream. For example, consider this simple program that prints information about the command line arguments.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  int i;

  FILE * stream = fopen("cmd.txt", "w");

  for(i=0;i<argc;i++){
    fprintf(stream, "argv[%d] = %s\n", i, argv[i]);
  }

  fclose(stream);
  return 0;
}

aviv@saddleback: demo $ ./command_info 
aviv@saddleback: demo $ cat cmd.txt
argv[0] = ./command_info
aviv@saddleback: demo $ ./command_info a b c d e f
aviv@saddleback: demo $ cat cmd.txt
argv[0] = ./command_info
argv[1] = a
argv[2] = b
argv[3] = c
argv[4] = d
argv[5] = e
argv[6] = f

2.2 Format Input with fscanf()

Just as we can format print to files, we can format read, or scan, from a file. fscanf() is just like scanf(), except it takes a file stream as the first argument. For example, consider a data file with the following entries:

aviv@saddleback: demo $ cat file.dat
Aviv Adam 10 20 50 3.141592 yes
Pepin Joni 15 21 53 2.781 no

We can write a format to read this data in with fscanf():

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  FILE * stream = fopen("file.dat", "r");

  char fname[1024],lname[1024],yesno[4];
  int a,b,c;
  float f;

  while ( fscanf(stream,
                 "%s %s %d %d %d %f %s",
                 fname, lname, &a, &b, &c, &f, yesno) != EOF){

    printf("First Name: %s\n",fname);
    printf( "Last Name: %s\n",lname);
    printf("         a: %d\n",a);
    printf("         b: %d\n",b);
    printf("         c: %d\n",c);
    printf("         f: %f\n",f);
    printf("     yesno: %s\n", yesno);
    printf("\n");
  }

  fclose(stream);
  return 0;
}

And when we run it, we see that we scan one line at a time:

aviv@saddleback: demo $ ./scan_file 
First Name: Aviv
Last Name: Adam
         a: 10
         b: 20
         c: 50
         f: 3.141592
     yesno: yes

First Name: Pepin
Last Name: Joni
         a: 15
         b: 21
         c: 53
         f: 2.781000
     yesno: no

One thing you should notice from the scanning loop is that we compare the return value to EOF, a special value for "End of File." EOF is a negative integer constant defined in stdio.h, and fscanf() returns it when the end of the file is reached before any value could be scanned. Otherwise, fscanf() returns the number of values it successfully scanned. Detecting EOF lets us break out of the loop.

Another item to note is that scanning with fscanf() is the same as that with scanf(), and is white space driven to separate different values to scan. Also, "%s" reads a word, as separated by white space, and does not read the whole line.

3 Input/Output bytes from File Streams

Suppose that instead of reading/writing formatted data, we are interested in reading/writing raw data. To do that, we use a different set of stream access functions, fread() and fwrite():

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

The idea behind fread() and fwrite() is that they can be used to read and write arbitrarily stored data. The first argument, ptr, references the memory to read into (for fread()) or to write from (for fwrite()). Notice that it is of the type void *, since we do not want to put any requirements on the type of the data, so arbitrary data can be referenced.

The next two arguments, size and nmemb, work in tandem to describe how much data to read or write. The size argument is the number of bytes in each element to be read/written, while nmemb is the number of such elements to read/write. The total amount of data to read/write is size*nmemb bytes. Finally, the last argument is the stream to be read from or written to. Both functions return the number of elements, not bytes, that were actually read or written.

3.1 Reading/Writing a single element

The best way to see how this works is to look at a basic example, and to demonstrate how fread() and fwrite() are good for reading and writing arbitrary data, we will read and write pair structures:

#include <stdio.h>
#include <stdlib.h>


typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){

  pair_t pair;

  pair.left = 10;
  pair.right = 20;

  FILE * stream = fopen("pair.dat","w");

  fwrite(&pair, sizeof(pair_t), 1, stream);

  fclose(stream);
}

Here, we've declared a pair on the stack and assigned its left and right values. Then we write the pair to the file pair.dat. Note that in this case we are writing one pair, so while the size of each element is sizeof(pair_t), nmemb is 1.

If we were to look at the file pair.dat, we would see that it is 8 bytes in size (two 4-byte integers) and stores the integers 10 and 20. We can use the command line tool xxd, which prints each byte in hex:

aviv@saddleback: demo $ xxd -c 1 pair.dat 
0000000: 0a  .
0000001: 00  .
0000002: 00  .
0000003: 00  .
0000004: 14  .
0000005: 00  .
0000006: 00  .
0000007: 00  .

The first column is the offset, and the second is the byte at that offset, so we have 8 bytes in the file. Read in file order, the first 4 bytes are 0a 00 00 00 and the next 4 are 14 00 00 00. We know from the previous lesson that the computers we use store the least significant byte first (little endian), so these are the numbers 0x0a and 0x14, or 10 and 20, respectively.

Now if we want to read that data back into a pair structure, we can use a similar strategy as we did before, but in reverse:

#include <stdio.h>
#include <stdlib.h>


typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){

  pair_t pair;

  FILE * stream = fopen("pair.dat","r");

  fread(&pair, sizeof(pair_t), 1, stream);

  fclose(stream);


  printf("pair.left: %d pair.right: %d\n",
         pair.left,
         pair.right);
}

3.2 Reading/Writing multiple elements

This time, we are going to take advantage of the number of elements field to read/write an array of pairs. We'll set up the program just as before, but with an array of pairs initialized.

#include <stdio.h>
#include <stdlib.h>


typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){

  pair_t pairs[10];

  int i;
  for( i=0; i < 10 ; i++){

    pairs[i].left = 10*i;
    pairs[i].right = 20*i;
  }

  FILE * stream = fopen("arraypair.dat","w");

  fwrite(pairs, sizeof(pair_t), 10, stream);

  fclose(stream);
}

Looking closely, you can see that fwrite() is now writing 10 elements, which writes out the whole array: each pair in the array has a fixed size, and there are 10 of them. If we look at the data in hex:

aviv@saddleback: demo $ xxd -c 8 arraypair.dat
0000000: 0000 0000 0000 0000  ........
0000008: 0a00 0000 1400 0000  ........
0000010: 1400 0000 2800 0000  ....(...
0000018: 1e00 0000 3c00 0000  ....<...
0000020: 2800 0000 5000 0000  (...P...
0000028: 3200 0000 6400 0000  2...d...
0000030: 3c00 0000 7800 0000  <...x...
0000038: 4600 0000 8c00 0000  F.......
0000040: 5000 0000 a000 0000  P.......
0000048: 5a00 0000 b400 0000  Z.......

This time, I printed 8 bytes per line, and looking closely at the data, you can see that, yes, the data is well encoded. Then reading the data back in with fread() is the same as writing:

#include <stdio.h>
#include <stdlib.h>

typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){

  pair_t pairs[10];


  FILE * stream = fopen("arraypair.dat","r");

  fread(pairs, sizeof(pair_t), 10, stream); //<--

  int i;
  for( i=0; i < 10 ; i++){

    printf("pair.left: %d pair.right: %d\n",
           pairs[i].left,
           pairs[i].right);
  }

  fclose(stream);
}

3.3 Reading/Writing Raw Bytes

The last thing we might want to do is read and write any kind of binary file. To do this, we will take advantage of the fact that an array of characters can store any kind of data: each char is 1 byte, so an array of chars stores just a bunch of bytes. Generally, we like to think of character arrays as strings, but strings are just a special form of character array. To differentiate character arrays that are strings from general byte containers, we use the term buffer to refer to byte containers, as in "a buffer of bytes."

To see how this relates to reading/writing from file streams, let's look at a program that does something as simple as copying data from one file to another. That is, an implementation of the cp command.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  if(argc < 3){
    fprintf(stderr, "ERROR: %s src dest\n", argv[0]);
    return 1;
  }

  FILE * src, * dest;

  src  = fopen(argv[1], "r");
  dest = fopen(argv[2], "w");

  char data[1024];
  size_t n; /* fread() returns a size_t count of elements read */

  while ( (n = fread(data, 1, 1024, src)) > 0){
    fwrite(data, 1, n, dest);
  }

  fclose(src);
  fclose(dest);

  return 0;
}

Looking closely, you see we define a data buffer, data, which can store 1024 bytes at a time. In the while loop, we ask fread() for 1024 elements of size 1 byte each to fill up the buffer. fread() returns the number of elements read, which here is the number of bytes because each element is 1 byte. Once the end of the file is reached and there is no data left to read, it returns 0 and the loop ends. Finally, we write the data we read with fwrite(), but only as many bytes as were actually read, as stored in n.