Lec 12: File and Device I/O using System Calls
1 System Calls, File Management and Device I/O
In the last lesson, we identified the system resources that the OS is responsible for managing. These include:
- Device Management : Hardware devices, such as keyboards, monitors, printers, hard drives, etc., are resources managed by the OS. When a programmer wishes to interact with these devices, a unified interface provided by the OS is used.
- Process Management : The invocation and execution of a program, a process, is managed by the OS, including managing its current state, running or stopped, as well as the loading of code.
- Memory Management : The access to physical and virtual memory is controlled by the OS, and a program's memory layout and current allocations are carefully managed.
- File System Management : The OS is also responsible for ensuring that programs can read and write from the file system, but also that programs don't corrupt the file system or access files or directories that they do not have permission to.
The major theme of this course is understanding the OS system call
API that the Unix system uses to access these resources. So far,
we've been using the system call API indirectly via the C standard
library, but underneath the covers, system calls were being used. As
an example of the difference between a system call and a library
function, in the last lesson we identified that the memory management
functions malloc() and calloc() are actually standard library
functions. Real memory allocation occurs via the system call sbrk(),
which adjusts the program break to increase the size of the heap.
In today's lesson, we are going to do the same for the Device
Management and File System Management resources. In previous lessons,
we have interacted with the file system and performed I/O via the
file stream interface, FILE *, and used standard I/O library
functions like fprintf() and fputc(). The file stream interface is a
lot like malloc(): it is a nice library feature that provides a
service, and under the file stream interface lie the system calls
that manage file opening and closing as well as reading from and
writing to files. We are going to go back to first principles, so
let's do Hello World again!
2 Hello World (again)
From now on, let's assume that we don't have the C standard library,
or the C standard I/O library: How do we write our Hello World
program? We need to use a lower level function, a system call, to
write directly to the standard out device, i.e., the terminal
window. The system call that writes to a file or device is
write(). Below is the system call version of Hello World:
#include <unistd.h>

int main(int argc, char * argv[]){
  char hello[] = "Hello World\n";
  char *p;

  for(p = hello ; *p ; p++){
    write(1, p, 1);
  }
}
Much of this program should be familiar to you. We assign the "Hello
World" string to the array hello, and we then iterate over that
array one character at a time using pointer arithmetic until the
end. The output is via the write() system call, but the specifics
of that system call, as well as the complementary system call,
read(), need further explanation.
2.1 Basics of write() to terminal
Both the read() and write() system calls operate over file
descriptors rather than file streams, and read from or write to
buffers, not strings. A file descriptor is just an integer, a
number, that refers to a currently open file. The OS uses the file
descriptor number as an index into the file descriptor table of
currently open files to gain access to the actual device on which
the I/O operations should be performed, like a file on disk, the
terminal, or the network controller. (We discuss file descriptors
more in the next section.)
While you might not know yet how to open new files as file descriptors, we still have the standard input, output, and error streams to work with. These have standard file descriptor numbers:
- Standard Input : 0 : STDIN_FILENO
- Standard Output : 1 : STDOUT_FILENO
- Standard Error : 2 : STDERR_FILENO
You should note that the file descriptor numbers are the same as
the numbers we used for bash programming and redirects …
everything is connected. Also, unistd.h provides three constants
to refer to the standard file descriptors, STDIN_FILENO,
STDOUT_FILENO, and STDERR_FILENO, which can help improve the
readability of your code.
We can now start to piece together the write() call from above
a bit more:
//    .-- Write to standard out, file descriptor 1
//    v
write(1, p, 1);
//       ^  ^
//       |  '--- Number of bytes to write, just the char p points to
//       |
//       '-------- char *, points to the byte we want to write
The first argument to write() is the file descriptor of where we
are writing. In this case, we are programming "Hello World", so we
want to print to standard out, or file descriptor 1. The next
argument is a bit more obvious: p points to a char we want to
write, and the last argument is the number of bytes we want to
write. Going through the "Hello World" program from top to bottom, we
can now see that it just prints each character of the hello string
to standard out, one at a time, until the null terminator (the '\0'
byte) is reached.
Now that we know how to write a string to the terminal using the
system call write() without library functions, it's fun to tilt
your head a bit and think for a minute about how just this little
bit of code, just the write() system call, can be used to program
all the file output we've learned so far. How might we program
fputc() or printf()? I'm sure you could, but thankfully, we
don't have to because someone did it for us in the standard library.
3 read() and write() in Detail
3.1 write()'ing a Buffer of Bytes
The above example was working with one byte at a time, but system
call I/O operates on whole buffers of bytes. The write() and read()
system calls are not string based I/O, like the format print
functions. They will read and write any data type. Let's look a bit
more at the actual function prototype from the man page:
ssize_t write(int fd, const void *buf, size_t count);
//                |               |           |
//                |               |           '-- number of bytes to write
//                |               '-- buffer
//                '-- file descriptor
Note that the second argument is not a char *, but rather a void *,
which means that it accepts a pointer to any type. We refer to
this as the buffer. A buffer is the general term for an array of
bytes. A string is also an array of bytes, stored as chars, but
strings have the added property of always being null terminated.
Buffers are more low-level and can refer to any data type. As we
learned in previous lessons, arrays and pointers are closely related,
and we can cast between different pointer types. This allows us to
treat any data type as a byte array, a buffer, and work with the data
byte-by-byte. Consider the example below, where we write a pair_t.
#include <unistd.h>

typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){
  pair_t p;

  p.left = 10;
  p.right = 20;

  write(1, &p, sizeof(pair_t));

  return 0;
}
Now, this bit of code probably wouldn't give us terminal output
that makes sense to us humans because we are not writing strings. It
will not print "10" or "20", and that's because write() writes
raw bytes. The bytes that make up the pair are not printable ASCII
text, and they will not render like normal characters. The terminal
does not know how to render arbitrary bytes that are not printable
Unicode or ASCII, and as a result, nothing readable gets displayed.
But the bytes are definitely getting written, and we can see that by
read()'ing those bytes.
3.2 read()'ing a Buffer of Bytes
The read() system call mirrors the write() system call, but in
reverse. Data is read from the descriptor and written into
the buffer. Here is the function prototype from the man page:
ssize_t read(int fd, void *buf, size_t count);
//               |         |           |
//               |         |           '-- number of bytes to read
//               |         '-- buffer
//               '-- file descriptor
Again, the concept of a buffer as just an array of bytes is
important. read() will attempt to read up to count bytes and store
them into the buffer. The total number of bytes read is returned,
which may be fewer than count. This is important so that you know
how many bytes made it into the buffer. If EOF is reached, read()
returns 0, and on error it returns -1.
To demonstrate the connection between read()'ing and write()'ing
raw bytes, let's continue the example from above. Suppose we are
interested in reading in the raw bytes of a pair_t. We can do the
following:
#include <unistd.h>
#include <stdio.h> //format print

typedef struct{
  int left;
  int right;
} pair_t;

int main(int argc, char * argv[]){
  pair_t p;

  read(0, &p, sizeof(pair_t));

  printf("left: %d right: %d\n",p.left, p.right);

  return 0;
}
Note that the read() is reading from file descriptor 0, which is
standard input, and the buffer is the address of p, reading at
most the size of a pair_t. The read() call is just reading
byte-by-byte the data that is the pair_t and filling up the memory
region of p with those bytes. It might be a bit mystifying, but
this actually works, and we can test it by connecting the two
programs in a pipeline.
#>./write_pair | ./read_pair
left: 10 right: 20
The write_pair program writes the raw bytes of a pair to standard
output which is piped to the standard input of the read_pair
program. read_pair then fills the buffer, that is the pair, with
those bytes, and finally, we can print them out to the screen. In
the parlance of system programming, "we're just shoveling bits
around".
4 Opening Files
The last piece of the I/O puzzle is reading and writing from
files. Previously, we've been using fopen() and fclose(), where
fopen() returns a file stream, that is, a FILE *. It works really
well and is very easy to use, but these are C library functions
which use system calls underneath. You know this because the
OS is responsible for managing device I/O resources, such as reading
and writing from keyboards, disks, etc., and the OS is also
responsible for managing the file system, such as keeping track of
files, directories, and paths. Both of those resources come into
play when opening a file and reading and writing from that file.
4.1 File descriptors
The system call to open a file is open(), which is well named,
and the system call to close a file is close(), also well
named. These system calls are low level and do not operate over file
streams, as FILE *; instead, open() returns an integer value, which
is the file descriptor.
All open files in the operating system are managed via file descriptors, which are indexes into the file descriptor table. The file descriptor table is a kernel data structure which tracks open files for all programs, and we'll discuss the details of this in a later lesson. For the purposes of today, the key concept is that we reference open files via an integer value, the file descriptor.
As previously discussed, each program comes ready made with three open file descriptors, the standard file descriptors. Each has an assigned number: 0, stdin; 1, stdout; and 2, stderr. When you open a file, it is assigned the lowest file descriptor number available, which might be 3 for the first file, and then 4, and so on.
4.2 open()'ing a File Descriptor
To open a file we use the open() system call, which is declared in
the file control header, fcntl.h, and the function prototype is as
follows:
int open(const char *path, int oflag, ... /*mode_t mode*/ );
There are either two or three arguments to open(). In the simple
case, where we are not creating a file, open() only takes two
arguments, but if a file is created, we need to specify the
permission mode of that file, such as read/write/exec.
Conceptually, open() is a lot like fopen() in the simple case
when you are opening a file for reading.
int fd = open("path/to/file", O_RDONLY);
The oflag argument is a lot like the mode from fopen(), but
instead of using a string we use an integer (and bitwise combinations
thereof) to indicate the desired open condition. Fortunately, these
values are defined as constants for us, so we don't have to remember
the integer values ourselves. In the above example, the file at the
given path is opened for reading only, with the O_RDONLY flag.
If we wanted to open a file for writing, truncate the file if it exists or create it if it does not exist, then we need to combine some flags and specify a mode. Here is an example:
int fd = open("test.txt", O_WRONLY | O_TRUNC | O_CREAT, 0664);
The second argument, O_WRONLY | O_TRUNC | O_CREAT, is often called
an ORing, and refers to a set of options that are combined using
the bitwise OR operator. The way this works is that each option
sets a bit in a field, in this case, one bit in the integer. The
bitwise OR accumulates all the set bits.
00000000000000000000000000000001    O_WRONLY
00000000000000000000010000000000    O_TRUNC
00000000000000000000001000000000    O_CREAT
--------------------------------    OR
00000000000000000000011000000001    O_WRONLY | O_TRUNC | O_CREAT
Here are the relevant option flags for opening a file:
- O_RDONLY : open for reading only
- O_WRONLY : open for writing only
- O_RDWR : open for reading and writing
- O_APPEND : append on each write
- O_CREAT : create file if it does not exist
- O_TRUNC : truncate size to 0
The mode portion of the arguments, 0664, is an octal number, just
like we use for chmod in bash. The leading 0 is an indicator that the
following digits are in octal, not base 10.
There are also named constants for the different mode settings, which can be combined in an ORing:
- S_IRUSR : 00400 : owner has read permission
- S_IWUSR : 00200 : owner has write permission
- S_IXUSR : 00100 : owner has execute permission
- S_IRGRP : 00040 : group has read permission
- S_IWGRP : 00020 : group has write permission
- S_IXGRP : 00010 : group has execute permission
- S_IROTH : 00004 : others have read permission
- S_IWOTH : 00002 : others have write permission
- S_IXOTH : 00001 : others have execute permission
So 0664 is equivalent to the ORing:
S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH
Since writing modes in octal is relatively straightforward (and less typing), we will switch between these two notations during the semester.
4.3 User Masks for File Creation
The last aspect of opening and creating files is the user mask, or
umask. This is a mechanism to specify which permissions of newly
created files should be turned off by default, and functions as a
security parameter for the system.
The umask is specified as a mode in octal, just like above,
except it is inverted. The bits that are set to one indicate that
those permissions should be turned off by default. We can see the
current umask of the system using the shell command:
aviv@saddleback: part2 $ umask
0027
The umask of 0027 specifies that newly created files should have read, write, and execute turned off for others and write turned off for the group, while all remaining permissions are allowed.
The way this is enforced is that when open() creates a new file, it
sets the permissions to:
mode & ~umask
The & and ~ symbols are bitwise operators for AND (&) and NOT
(~). The NOT operator (~) inverts all the bits, so ones become
zeros and zeros become ones. The AND operator (&) keeps a bit set
only if that bit is set in both operands.
So for the file creation with permission mode 0664 and mask 0027,
we get the final creation permission:
0664 & ~(0027) = 0664 & 0750 = 0640
Following the math, the inverse of 0027, one octal digit at a time in binary, is:
~ 000 -> 111 = 7 = 7-0
~ 010 -> 101 = 5 = 7-2
~ 111 -> 000 = 0 = 7-7
Because the inverse flips all the bits, it is the same as subtracting each octal digit from 7. With the inverse complete, the bitwise AND of 0750 and 0664 is:
  111 101 000    (0750)
& 110 110 100    (0664)
--------------
  110 100 000  = 0640
The mask ensures that we don't create a file with more permissions than we want. In this example, with this umask, we removed the possibility of writing from the group and of reading, writing, and executing from everyone else.
4.4 close()'ing a File
Finally, to close the file descriptor you use the close() system
call, which is declared in unistd.h. It has the following function
prototype:
int close(int fildes);
All open file descriptors should be closed whenever they are no longer needed. Once a program exits, the file descriptors are closed automatically.