IC221: Systems Programming (SP16)



Lec 12: File and Device I/O using System Calls


1 System Calls, File Management and Device I/O

In the last lesson, we identified the system resources that the OS is responsible for managing. These include:

  • Device Management : Hardware devices, such as keyboards, monitors, printers, hard drives, etc., are resources managed by the computer. When a programmer wishes to interact with these devices, a unified interface provided by the OS is used.
  • Process Management : The invocation and execution of a program, a process, is managed by the OS, including managing its current state, running or stopped, as well as the loading of code.
  • Memory Management : Access to physical and virtual memory is controlled by the OS, and a program's memory layout and current allocations are carefully managed.
  • File System Management : The OS is also responsible for ensuring that programs can read from and write to the file system, but also that programs don't corrupt the file system or access files and directories that they do not have permission to access.

The major theme of this course is understanding the OS System Call API that the Unix system uses to access these resources. So far, we've been using the system call API via the C standard library, but underneath the covers, system calls were being used. As an example of the difference between a system call and a library function, in the last lesson we identified that the memory management functions, malloc() and calloc(), are actually standard library functions. Real memory allocation occurs via the system call sbrk(), which adjusts the break point to increase the size of the heap.

In today's lesson, we are going to do the same for the Device Management and File System Management resources. In previous lessons, we have interacted with the file system and performed I/O via the file stream interface, FILE *, and used standard I/O library functions like fprintf() and fputc(). The file stream interface is a lot like malloc(): it is a nice library feature that provides a service, and under the file stream interface lie the system calls that manage file opening and closing as well as reading from and writing to files. We are going to go back to first principles: let's do Hello World again!

2 Hello World (again)

From now on, let's assume that we don't have the C standard library, or the C standard I/O library: How do we write our Hello World program? We need to use a lower level function, a system call, to write directly to the standard out device, i.e., the terminal window. The system call that writes to a file or device is write(). Below is the system call version of Hello World:

#include <unistd.h>

int main(int argc, char * argv[]){
  char hello[] = "Hello World\n";
  char *p;

  for(p = hello ; *p ; p++){
     write(1, p, 1); 
  }

}

Much of this program should be familiar to you. We assign the "Hello World" string to the array hello, and we then iterate over that array one character at a time using pointer arithmetic until the end. The output is via the write() system call, but the specifics of that system call, as well as the complementary system call, read(), need further explanation.

2.1 Basics of write() to terminal

Both the read() and write() system calls operate over file descriptors rather than file streams, and read from or write to buffers, not strings. A file descriptor is just an integer, a number, that refers to a currently open file. The OS uses the file descriptor number as an index into the file descriptor table of currently open files to gain access to the actual device on which the I/O operations should be performed, like a file on disk, the terminal, or the network controller. (We discuss file descriptors more in the next section.)

While you might not know how to open new files as file descriptors yet, we still have the standard input/output/error streams to work with. Each of them has a standard descriptor number:

  • Standard Input : 0 : STDIN_FILENO
  • Standard Output : 1 : STDOUT_FILENO
  • Standard Error : 2 : STDERR_FILENO

You should note that the file descriptor numbers are the same as the numbers we used for bash programming and redirects … everything is connected. Also, unistd.h provides three constants to refer to the standard file descriptors, STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO, which can help improve the readability of your code.

We can now start to piece together the write() command from above a bit more:

//    .-- Write to standard out, file descriptor 1
//    v
write(1, p, 1); 
//       ^  ^
//       |  '--- Number of bytes to write, just the char p points to
//       |
//       '-------- char *, points to the byte we want to write

The first argument to write() is the file descriptor we are writing to. In this case, we are programming "Hello World", so we want to print to standard out, or file descriptor 1. The next argument, p, points to the char we want to write, and the last argument is the number of bytes we want to write. Going through the "Hello World" program from top to bottom, we can now see that it just prints each character of the hello string to standard out, one at a time, until the NULL terminator is reached.

Now that we know how to write a string to the terminal using the system call write() without library functions, it's fun to tilt your head a bit and think for a minute about how just this little bit of code, just the write() system call, can be used to program all the file output we've learned so far. How might we program fputc() or printf()? I'm sure you could, but thankfully, we don't have to because someone did it for us in the standard library.

3 read() and write() in Detail

3.1 write()'ing a Buffer of Bytes

The above example worked with one byte at a time, but system call I/O operates on whole buffers of bytes. The write() and read() system calls are not string based I/O, like the format print functions. They will read and write any data type. Let's look a bit more at the actual function prototype from the man page:

  ssize_t write(int fd, const void *buf, size_t count);
//                  ^               ^            ^
//file descriptor---'       buffer--'            '-- num. bytes to write

Note that the second argument is not a char *, but rather a void *, which means that it accepts a pointer to any type. We refer to this as the buffer. A buffer is the general term for an array of bytes. A string is also an array of bytes, stored as chars, but it has the added property of always being NULL terminated. Buffers are more low-level, and can refer to any data type. As we learned in previous lessons, arrays and pointers are closely related, and we can cast between different pointer types. This allows us to treat any data type as a byte array, a buffer, and work with the data byte-by-byte. Consider the example below, where we write a pair_t.

/*write_pair.c*/
#include <unistd.h>

typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){

  pair_t p;
  p.left = 10;
  p.right = 20;

  write(1, &p, sizeof(pair_t));

  return 0;
}

Now, this bit of code probably wouldn't give us terminal output that makes sense to us humans because we are not writing strings. It will not print "10" or "20", and that's because write() writes raw bytes. The data that is the pair is not ASCII text, and its individual bytes will not render like normal ASCII characters. The terminal does not understand how to render arbitrary bytes that are not Unicode or ASCII text, and as a result, nothing readable gets displayed. But the bytes are definitely getting written, and we can see that by read()'ing those bytes.

3.2 read()'ing a Buffer of Bytes

The read() command is exactly the same as the write() command, but in reverse. Data is read from the descriptor and written into the buffer. Here is the function prototype from the man page:

  ssize_t read(int fd,      void *buf, size_t count);
//                  ^               ^            ^
//file descriptor---'       buffer--'            '-- num. bytes to read

Again, the concept of a buffer as just an array of bytes is important. read() will attempt to read up to count number of bytes and store them into the buffer. The total number of bytes read is returned. This is important so that you know how many bytes made it into the buffer. If EOF is reached, read() returns 0.

To demonstrate the connection between read()'ing and write()'ing raw bytes, let's continue the example from above. Suppose we are interested in reading in the raw bytes of a pair_t. We can do the following:

/*read_pair.c*/
#include <unistd.h>
#include <stdio.h> //format print

typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){

  pair_t p;

  read(0, &p, sizeof(pair_t));

  printf("left: %d right: %d\n",p.left, p.right);

  return 0;
}

Note that the read() is reading from file descriptor 0, which is standard input, and the buffer is the address of the p, reading at most the size of a pair_t. The read() command is just reading byte-by-byte the data that is the pair_t and filling up the memory region of p with those bytes. It might be a bit mystifying, but this actually works, and we can test it by aligning the two programs in a pipeline.

#>./write_pair | ./read_pair 
left: 10 right: 20

The write_pair program writes the raw bytes of a pair to standard output which is piped to the standard input of the read_pair program. read_pair then fills the buffer, that is the pair, with those bytes, and finally, we can print them out to the screen. In the parlance of system programming, "we're just shoveling bits around".

4 Opening Files

The last piece of the I/O puzzle is reading from and writing to files. Previously, we've been using the fopen() and fclose() functions, which work with a file stream, that is, a FILE *. They work really well and are very easy to use, but they are C library functions that use system calls underneath. You know this because the OS is responsible for managing device I/O resources, such as reading from and writing to keyboards, disks, etc., and the OS is also responsible for managing the file system, such as keeping track of files, directories, and paths. Both of those resources come into play when opening a file and reading from and writing to that file.

4.1 File descriptors

The system call to open a file is open(), which is well named, and the system call to close a file is close(), also well named. These system calls are low level and do not operate over file streams, as FILE *, but instead work with an integer value which is the file descriptor.

All open files in the operating system are managed via file descriptors, which are indexes into the file descriptor table. The file descriptor table is a kernel data structure which tracks open files for all programs, and we'll discuss the details of this in a later lesson. For the purposes of today, the key concept is that we reference open files via an integer value, the file descriptor.

As previously discussed, each program comes ready made with three open file descriptors, the standard file descriptors. Each has an assigned number: 0, stdin; 1, stdout; and 2, stderr. When you open a file, it will be assigned the lowest file descriptor number that is not currently in use, which might be 3 for the first file, and then 4, and so on.

4.2 open()'ing a File Descriptor

To open a file we use the open() system call, which is declared in the file control header, fcntl.h. The function prototype is as follows:

int   open(const char *path, int oflag, ... /*mode_t mode*/ );

There are either two or three arguments to open(). In the simple case, where we are not creating a file, open() only takes two arguments, but if a file is created, we need to specify the permission mode of that file, such as read/write/exec.

Conceptually, open(), is a lot like fopen() in the simple case when you are opening a file for reading.

int fd = open("path/to/file", O_RDONLY);

The oflag argument is a lot like the mode from fopen(), but instead of using a string we use integers (and bitwise combinations thereof) to indicate the desired open condition. Fortunately, these values are defined as constants for us, so we don't have to remember the integer values ourselves. In the above example, the file at the given path is opened for reading only, with the O_RDONLY flag.

If we wanted to open a file for writing, truncate the file if it exists or create it if it does not exist, then we need to combine some flags and specify a mode. Here is an example:

int fd = open("test.txt", O_WRONLY | O_TRUNC | O_CREAT, 0664);

The second argument, O_WRONLY | O_TRUNC | O_CREAT, is often called an ORing, and refers to a set of options that are combined using the bitwise OR operator. The way this works is that each option sets a bit in a field, in this case, one bit in the integer. The bitwise OR results in the accumulation of all the set bits.

00000000000000000000000000000001      O_WRONLY
00000000000000000000010000000000      O_TRUNC
00000000000000000000001000000000      O_CREAT
--------------------------------- OR
00000000000000000000011000000001      O_WRONLY | O_TRUNC | O_CREAT

Here are the relevant option flags for opening a file:

  • O_RDONLY open for reading only
  • O_WRONLY open for writing only
  • O_RDWR open for reading and writing
  • O_APPEND append on each write
  • O_CREAT create file if it does not exist
  • O_TRUNC truncate size to 0

The mode portion of the arguments, 0664, is an octal number, just like the ones we use for chmod in bash. The leading 0 indicates that the value is in octal, not base 10.

There are also named constants for the individual mode bits, which can be combined in an ORing:

  • S_IRUSR 00400 owner has read permission
  • S_IWUSR 00200 owner has write permission
  • S_IXUSR 00100 owner has execute permission
  • S_IRGRP 00040 group has read permission
  • S_IWGRP 00020 group has write permission
  • S_IXGRP 00010 group has execute permission
  • S_IROTH 00004 others have read permission
  • S_IWOTH 00002 others have write permission
  • S_IXOTH 00001 others have execute permission

So 0664 is equivalent to the ORing:

S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH

Since writing modes in octal is relatively straightforward (and less typing), we will switch between these two notations during the semester.

4.3 User Masks for File Creation

The last aspect of opening and creating files is the user mask, or umask. This is a mechanism to specify which permissions of newly created files should be turned off by default, and functions as a security parameter for the system.

The umask is specified as a mode in octal, just like above, except it is inverted. The bits that are set to one indicate that those permissions should be turned off by default. We can see the current umask of the system using the shell command:

aviv@saddleback: part2 $ umask
0027

The umask of 0027 specifies that newly created files should have read, write, and execute turned off for others and write turned off for the group, but all other permissions are allowed.

This is enforced when open() creates a new file: it sets the permissions to:

mode & ~umask

The & and ~ symbols are bitwise operators for AND (&) and NOT (~). The NOT operator (~) inverts all the bits, so ones become zeros and zeros become ones. The AND operator (&) sets a bit in the result only if that bit is set in both operands.

So for the file creation with permission mode 0664 and mask 0027, we get the final creation permission:

   0664  & ~(0027) 
=  0664  &   0750
=  0640

Following the math, the inverse of 0027 in binary is,

~ 000  -> 111  = 7    = 7-0 
~ 010  -> 101  = 5    = 7-2
~ 111  -> 000  = 0    = 7-7

Because the inverse flips all the bits, it is the same as subtracting each octal digit from 7. With the inverse complete, the bitwise AND of 0664 and 0750 is:

   110 110 100
&  111 101 000
--------------
   110 100 000 = 0640

The mask ensures that we don't create a file with more permissions than we want. In this example, with this umask, we removed write permission from the group and read, write, and execute permission from everyone else.

4.4 close()'ing a File

Finally, to close the file descriptor you use the close() system call, which is defined in unistd.h. It has the following function prototype:

int     close(int fildes);

All open file descriptors should be closed whenever they are no longer needed. Once a program exits, the file descriptors are closed automatically.