Lec 10: File Streams
1 The FILE * and opening files
The last part of the standard C library we haven't explored is reading from and writing to files. Although you've done this already with the standard files (standard input, output, and error), we have not yet demonstrated how to open, read, write, and close other files that may exist on the file system.
All the file stream functions and types are defined in the header file stdio.h, so you have to include that. In later lessons, we will look into using the system call API to do all of our I/O.
1.1 Opening a file
Open files in the standard C library are referred to as file streams, and have the type:
FILE * stream;
and we open a file using fopen(), which has the following prototype:
FILE * fopen(const char *path, const char *mode);
The first argument, path, is a string storing the file system path of the file to open, and mode describes the settings of the file. For example:
FILE * stream = fopen("gonavy.txt", "w");
will open a file in the current directory called "gonavy.txt" with write mode. We'll discuss the modes more shortly.
A file stream is a pointer to data that the library allocates dynamically when the file is opened, so every stream you open must eventually be deallocated, or closed. The function that closes a file stream is fclose():
fclose( stream );
1.2 File Modes
The mode of the file describes how to open and use the file. For example, you can open a file for reading, writing, or appending. You can start reading/writing from the start or the end of the file. You can truncate the file when opening it, which removes all the previous data. From the man page, here are the options:
The argument mode points to a string beginning with one of the following sequences (possibly followed by additional characters, as described below):

r      Open text file for reading. The stream is positioned at the beginning of the file.
r+     Open for reading and writing. The stream is positioned at the beginning of the file.
w      Truncate file to zero length or create text file for writing. The stream is positioned at the beginning of the file.
w+     Open for reading and writing. The file is created if it does not exist, otherwise it is truncated. The stream is positioned at the beginning of the file.
a      Open for appending (writing at end of file). The file is created if it does not exist. The stream is positioned at the end of the file.
a+     Open for reading and appending (writing at end of file). The file is created if it does not exist. The initial file position for reading is at the beginning of the file, but output is always appended to the end of the file.
One key thing to notice from the modes is that any mode string with a "+" is for both reading and writing, but using "r" vs. "w" has different consequences when the file already exists. With "r" an existing file is never truncated, but "w" always truncates an existing file, which means its contents are deleted. However, "r" mode will not create the file if it doesn't exist, while "w" will. Finally, append mode with "a" is a special case of "w" that doesn't truncate, and all writes occur at the end of the file.
As you can also see in the man page description, the file stream is described as having a "position", which refers to where within the file the next read or write takes place. When you read a byte, you move the position forward in the file. In later lessons, we will look into how to manipulate the stream more directly.
1.3 File Errors
Looking at the man page for fopen(), we can see how to check for errors when opening files:
RETURN VALUE Upon successful completion fopen(), fdopen() and freopen() return a FILE pointer. Otherwise, NULL is returned and errno is set to indicate the error.
So we can check for NULL for errors, for example:
if ( (stream = fopen("DOESNOTEXIST.txt", "r")) == NULL){
  fprintf(stderr, "ERROR ... \n");
}
An error could occur, like above, because the file does not exist, but it could also be because you have insufficient permissions. For example, opening a file with "r" mode when you do not have read permission on it is an error condition. Similarly, if you try to open a file for writing without having write permission, that will cause an error.
Additionally, you can get errors when you try and read/write from a file that you opened with the wrong mode. This will cause the read/write function (discussed below) to fail. In those cases, you should check the return values of those functions.
While I do not show error checking below, it is important that your code does error checking.
2 Format input/output from File Streams
Just as you worked with the standard file streams, stdin, stdout, and stderr, we can do formatted input and output with file streams that you open with fopen().
2.1 Format Output with fprintf()
Let's start where we always start with understanding a new input/output system, hello world!
/*hello_fopen.c*/
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  FILE * stream = fopen("helloworld.txt", "w");

  fprintf(stream, "Hello World!\n");

  fclose(stream);
}
aviv@saddleback: demo $ ./hello_fopen
aviv@saddleback: demo $ cat helloworld.txt
Hello World!
aviv@saddleback: demo $ ./hello_fopen
aviv@saddleback: demo $ cat helloworld.txt
Hello World!
The program opens a new stream with the write mode at the path "helloworld.txt" and prints "Hello World!\n" to the stream. When we execute the program, the file helloworld.txt is created if it doesn't exist, and if it does exist it is truncated. After printing to it, we can read it with cat, and we see that "Hello World!" is in fact in the file. If we run the program again, we still have "Hello World!" in the file just once, because the second time we run the program the file exists, so it is truncated. The previous "Hello World!" is removed and we write "Hello World!" again.
However, if we open the file in a different mode, say append, we get a different result:
/*hello_append.c*/
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  FILE * stream = fopen("helloworld.txt", "a");//<--

  fprintf(stream, "Hello World!\n");

  fclose(stream);
}
aviv@saddleback: demo $ ./hello_append
aviv@saddleback: demo $ ./hello_append
aviv@saddleback: demo $ ./hello_append
aviv@saddleback: demo $ cat helloworld.txt
Hello World!
Hello World!
Hello World!
Hello World!
The original "Hello World!" remains, and the additional "Hello World!"s are appended to the end of the file. Printing "Hello World!" does not require a format, but fprintf() can format just like printf(), only to a file stream. For example, consider this simple program that prints information about the command line arguments.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){
  int i;

  FILE * stream = fopen("cmd.txt", "w");

  for(i=0;i<argc;i++){
    fprintf(stream, "argv[%d] = %s\n", i, argv[i]);
  }

  fclose(stream);

  return 0;
}
aviv@saddleback: demo $ ./command_info
aviv@saddleback: demo $ cat cmd.txt
argv[0] = ./command_info
aviv@saddleback: demo $ ./command_info a b c d e f
aviv@saddleback: demo $ cat cmd.txt
argv[0] = ./command_info
argv[1] = a
argv[2] = b
argv[3] = c
argv[4] = d
argv[5] = e
argv[6] = f
2.2 Format Input with fscanf()
Just as we can format print to files, we can format read, or scan, from a file. fscanf() is just like scanf(), except it takes a file stream as the first argument. For example, consider a data file with the following entries:
aviv@saddleback: demo $ cat file.dat
Aviv Adam 10 20 50 3.141592 yes
Pepin Joni 15 21 53 2.781 no
We can write a format to read this data in with fscanf():
#include <stdio.h>

int main(int argc, char * argv[]){

  FILE * stream = fopen("file.dat", "r");

  char fname[1024],lname[1024],yesno[4];
  int a,b,c;
  float f;

  while ( fscanf(stream, "%s %s %d %d %d %f %s",
                 fname, lname, &a, &b, &c, &f, yesno) != EOF){
    printf("First Name: %s\n",fname);
    printf( "Last Name: %s\n",lname);
    printf(" a: %d\n",a);
    printf(" b: %d\n",b);
    printf(" c: %d\n",c);
    printf(" f: %f\n",f);
    printf(" yesno: %s\n", yesno);
    printf("\n");
  }

  fclose(stream);

  return 0;
}
And when we run it, we see that we scan one line at a time:
aviv@saddleback: demo $ ./scan_file
First Name: Aviv
Last Name: Adam
 a: 10
 b: 20
 c: 50
 f: 3.141592
 yesno: yes

First Name: Pepin
Last Name: Joni
 a: 15
 b: 21
 c: 53
 f: 2.781000
 yesno: no
One thing you should notice from the scanning loop is that we compare to EOF, which is a special value for "End of File." EOF is not stored in the file itself. It is a constant defined in stdio.h that the stream functions return as a signal. Normally fscanf() returns the number of values it successfully matched, but when it reaches the end of the file before matching anything, it returns EOF, which can be detected and used to break the loop.
Another item to note is that scanning with fscanf() is the same as scanning with scanf(): it is white space driven, using white space to separate the different values to scan. Also, "%s" reads a word, as separated by white space, and does not read the whole line.
3 Input/Output bytes from File Streams
Suppose that instead of reading/writing formatted data, we are interested in reading/writing raw data. To do that, we use a different set of stream access functions, fread() and fwrite():
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
The idea behind fread() and fwrite() is that they can be used to read and write arbitrarily stored data. The first argument, ptr, references the memory that fread() fills in or that fwrite() takes its data from. Notice that it is of the type void * since we do not want to put any requirements on the type of the data, so arbitrary data can be referenced.
The next two arguments, size and nmemb, work in tandem to describe how much data to read and write. The size argument is how many bytes each element occupies, while nmemb is the number of such elements to read/write. The total amount of data to read/write is size*nmemb. Finally, the last argument is the stream to be read from or written to.
3.1 Reading/Writing a single element
The best way to see how this works is to look at a basic example. To demonstrate how fread() and fwrite() are good for reading and writing arbitrary data, we will read and write pair structures:
/*write_struct.c*/
#include <stdio.h>
#include <stdlib.h>

typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){

  pair_t pair;
  pair.left = 10;
  pair.right = 20;

  FILE * stream = fopen("pair.dat","w");

  fwrite(&pair, sizeof(pair_t), 1, stream);

  fclose(stream); /* fclose(), not close(): stream is a FILE * */
}
Here, we've declared a pair on the stack and assigned its left and right values. We then write the pair to the file pair.dat. Note that in this case we are writing one pair, so while the size of the pair is sizeof(pair_t), the nmemb is 1.
If we were to look at the file pair.dat, we would see that it is 8 bytes in size (two 4-byte integers on this machine) and stores the integers 10 and 20. We can use the command line tool xxd, which prints each byte in hex:
aviv@saddleback: demo $ xxd -c 1 pair.dat
0000000: 0a  .
0000001: 00  .
0000002: 00  .
0000003: 00  .
0000004: 14  .
0000005: 00  .
0000006: 00  .
0000007: 00  .
The first column is the offset, and the second is the byte at that offset. So we have 8 bytes in the file. Read in file order, the first 4 bytes are 0x0a000000 and the second 4 are 0x14000000. We know from the previous lesson that this machine stores the least significant byte first (little-endian), so these are the numbers 0x0a and 0x14, or 10 and 20 respectively.
Now if we want to read that data back into a pair structure, we can use a similar strategy as we did before, but in reverse:
/*read_struct.c*/
#include <stdio.h>
#include <stdlib.h>

typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){

  pair_t pair;

  FILE * stream = fopen("pair.dat","r");

  fread(&pair, sizeof(pair_t), 1, stream);

  fclose(stream); /* fclose(), not close(): stream is a FILE * */

  printf("pair.left: %d pair.right: %d\n", pair.left, pair.right);
}
3.2 Reading/Writing multiple elements
This time, we are going to take advantage of the number of elements field to read/write an array of pairs. We'll set up the program just as before, but with an array of pairs initialized.
/*write_arraystruct.c*/
#include <stdio.h>
#include <stdlib.h>

typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){

  pair_t pairs[10];

  int i;
  for( i=0; i < 10 ; i++){
    pairs[i].left = 10*i;
    pairs[i].right = 20*i;
  }

  FILE * stream = fopen("arraypair.dat","w");

  fwrite(pairs, sizeof(pair_t), 10, stream);

  fclose(stream); /* fclose(), not close(): stream is a FILE * */
}
Looking closely, you can see that the fwrite() is now writing 10 elements, which writes out the whole array. Each pair in the array has a fixed size and there are 10 of them. If we look at the data in hex:
aviv@saddleback: demo $ xxd -c 8 arraypair.dat
0000000: 0000 0000 0000 0000  ........
0000008: 0a00 0000 1400 0000  ........
0000010: 1400 0000 2800 0000  ....(...
0000018: 1e00 0000 3c00 0000  ....<...
0000020: 2800 0000 5000 0000  (...P...
0000028: 3200 0000 6400 0000  2...d...
0000030: 3c00 0000 7800 0000  <...x...
0000038: 4600 0000 8c00 0000  F.......
0000040: 5000 0000 a000 0000  P.......
0000048: 5a00 0000 b400 0000  Z.......
This time, I printed 8 bytes per line, so each line holds exactly one pair, and looking closely at the data, you can see that, yes, the data is well encoded. Then reading the data back in with fread() is the same as writing:
/*read_arraystructs.c*/
#include <stdio.h>
#include <stdlib.h>

typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){

  pair_t pairs[10];

  FILE * stream = fopen("arraypair.dat","r");

  fread(pairs, sizeof(pair_t), 10, stream); //<--

  int i;
  for( i=0; i < 10 ; i++){
    printf("pair.left: %d pair.right: %d\n", pairs[i].left, pairs[i].right);
  }

  fclose(stream); /* fclose(), not close(): stream is a FILE * */
}
3.3 Reading/Writing Raw Bytes
The last thing we might want to do is be able to read and write any kind of binary file. To do this, we will take advantage of the fact that an array of characters can store any kind of data because each char is 1 byte, so an array of chars stores just a bunch of bytes. Generally, we like to think of character arrays as strings, but a string is just a special form of character array. To differentiate character arrays that are strings from general byte containers, we use the term buffer to refer to byte containers, as in "a buffer of bytes."
To see how this relates to reading/writing from file streams, let's look at a program that does something as simple as copy data from one file to another. That is, an implementation of the cp command.
/*mycp.c*/
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  if(argc < 3){
    fprintf(stderr, "ERROR: %s src dest\n", argv[0]);
    return 1;
  }

  FILE * src, * dest;

  src = fopen(argv[1], "r");
  dest = fopen(argv[2], "w");

  char data[1024];
  size_t n; /* fread() returns a size_t count */

  while ( (n = fread(data, 1, 1024, src)) > 0){
    fwrite(data, 1, n, dest);
  }

  fclose(src);
  fclose(dest);

  return 0;
}
Looking closely, you see we define a data buffer, data, which can store 1024 bytes at a time. In the while loop, we read elements of size 1 byte, up to 1024 of them, to fill up the buffer. Note that fread() returns the number of elements read, which here equals the number of bytes read because each element is 1 byte. Once the end of the file is reached and there is no data left to read, it returns 0, which ends the loop. Finally, we write the data we read, but only as much as we actually read, as stored in n.