IC221: Systems Programming (SP18)



Unit 1: The Unix System


1 UNIX and You


1.1 The 1000 Foot View of the UNIX system


Why UNIX?

  • UNIX is an important OS in the history of computing
  • Two major OS variants, UNIX-based and Windows-based
  • Used in a lot of back-end systems and personal computing
  • Many UNIX derivatives are open source: they are well known to the community and developed in the open, where we can study and understand them.
  • The skills you learn on UNIX will easily translate to other OS platforms because all UNIX-based systems share standard characteristics.

1.2 Unix History

Unix has a rich history dating back to the development of Multics and then the Unics system, first developed at AT&T Bell Labs around 1970. These early forms have since dispersed and integrated into computing, including the Ubuntu and Debian Linux variants we use here as well as the macOS and BSD variants. Unix even runs on mobile platforms: both iOS and Android are based on Unix-like system architectures.

1.3 The UNIX Components

The UNIX-Computer ecosystem can be divided into three main parts:

  • User Space: Defines the applications, libraries, and standard utilities that are user accessible. When we write a program, it is from this perspective that we operate, without concern for the underlying components. For example, writing a "Hello World" program on any computer is the same from the user-perspective, but might be different when it comes to actually executing the program and writing "Hello World" to the terminal.
  • Kernel Space: This refers to the operations of the OS that manage the interface between user actions and the hardware. It is the central part of the OS, and its primary job is to pair user applications with the underlying hardware and allow multiple programs to share singular hardware components. For example, how does a user input event, such as typing 'a' on the keyboard, get translated into 'a' appearing on the screen? Or, how do two programs both read from disk at the same time or run on the CPU at the same time?
  • Hardware: The underlying physical components of the computer. These include Input/Output devices, like keyboards and monitors, the CPU which does calculations, the memory components, and the network interface.


The OS's primary task is to manage services as an interface between the user and the hardware. Examples include:

  • File System: managing files on behalf of the user
  • Device I/O: managing input from and output to devices
  • Processes: starting, running, and stopping programs, and allowing multiple programs to run at once, i.e., multiprogramming.
  • Memory Management: allocating runtime memory for processes and separating memory between processes and between user space and kernel space

As a user of the OS, you will see these interactions from two perspectives:

  • Shell: You will use the shell to interact with the OS
  • System Call API: You will program in C to interact with the OS

The bulk of this interaction comes through the System Call API, which you will use from C. Why C?

  • C is a low level language
  • The OS is written in C
  • Understanding the OS and C together is a natural process and will make you a better programmer

2 UNIX Design Philosophy

The UNIX Design Philosophy is best exemplified through a quote by Doug McIlroy, a key contributor to early UNIX systems:

This is the UNIX philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

All the UNIX command line tools we've looked at so far meet this philosophy, and the tools we will program in the class will as well.

2.1 Write programs that do one thing and do it well

If we look at the command line tools for processing files, we see that there is a separate tool for each task. For example, we do not have a tool called headortail that can take either the first or last lines of a file. We have separate tools for each of the tasks.

While this might seem like extra work, it actually enables the user to be more precise about what he/she is doing, as well as be more expressive. It also improves the readability and comprehension of commands; the user doesn't have to read lots of command line arguments to figure out what's going on.

2.2 Write programs that work well together

The command line tools we look at also inter-operate really well because they complement each other. For example, you can use cut to get a field from structured data, then you can use grep to isolate some subset of those values, and finally you can use wc to count how many remain.
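As a rough illustration of these three tools working together, the sketch below builds a small colon-separated table, pulls out the second field with cut, keeps only the matching values with grep, and counts them with wc. The file name users.txt and its contents are made up for this example.

```shell
# Keep everything in a scratch directory so the current directory is untouched.
workdir=$(mktemp -d)

# Hypothetical sample data in the form name:group:shell
printf '%s\n' \
  'alice:scs:/bin/bash' \
  'bob:mids:/bin/bash' \
  'carol:scs:/bin/sh' \
  'dave:mids:/bin/bash' > "$workdir/users.txt"

# cut extracts field 2, grep -x keeps lines that are exactly "mids", wc -l counts them.
mids_count=$(cut -d: -f2 "$workdir/users.txt" | grep -x mids | wc -l)

# Some wc implementations pad the number with spaces; $(( )) strips the padding.
echo "users in mids: $((mids_count))"
```

Each tool does one small job, and the pipe symbol is the only glue needed.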

2.3 Write programs to handle text streams

Finally, the ability to handle text streams is the basis of the pipeline that enables small and simple UNIX commands to be "glued" together to form more complex and interesting UNIX operations.

This philosophy leads to the development of well formed UNIX command line tools that have the three properties:

  1. They can take input from the terminal, through a pipe or redirection, or by reading an input file provided as an argument
  2. They write all non-error output back to the terminal such that it can be read as input by another command via a pipe or can be redirected to a file.
  3. They write error information to standard error rather than to standard output, so that it does not interfere with a pipeline.

This process of taking input from the terminal and writing output to the terminal is the notion of handling text streams through the pipeline. In this lecture, we will look at this process in more detail.
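To make the three properties concrete, here is a minimal sketch of a well-formed filter written as a shell function. The name upcase and its error message are hypothetical, chosen only for this illustration.

```shell
# upcase: a hypothetical filter that upper-cases its input.
# Usage: upcase [FILE]   (reads stdin when FILE is omitted)
upcase() {
  if [ "$#" -gt 0 ] && [ ! -r "$1" ]; then
    # Property 3: errors go to stderr (file descriptor 2), not stdout.
    echo "upcase: $1: cannot read file" >&2
    return 1
  fi
  # Property 1: with a file argument cat reads the file; with none it reads stdin.
  # Property 2: the result goes to stdout, so it can be piped or redirected.
  cat "$@" | tr '[:lower:]' '[:upper:]'
}

from_pipe=$(echo "go navy" | upcase)
echo "$from_pipe"
```

Because the error message travels on stderr, a later stage of a pipeline never mistakes it for data.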

3 Standard Input, Standard Output, and Standard Error

Every UNIX program is provided with three standard file streams, or standard file descriptors, to read input from and write output to.

  1. Standard Input (stdin file stream, file descriptor 0): The primary input stream for reading information typed at the terminal
  2. Standard Output (stdout file stream, file descriptor 1): The primary output stream for printing information and program output to the terminal
  3. Standard Error (stderr file stream, file descriptor 2): The primary error stream for printing information to the terminal that resulted from an error in processing or should not be considered part of the program output.
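The sketch below shows file descriptors 1 and 2 behaving as separate streams. The function report and its messages are invented for the example, and it leans on the redirection syntax (2>, >&2, /dev/null) that is explained later in these notes.

```shell
# report: a hypothetical command that writes one line to each output stream.
report() {
  echo "result: 42"             # goes to stdout (file descriptor 1)
  echo "warning: low disk" >&2  # goes to stderr (file descriptor 2)
}

# Command substitution captures stdout only; stderr is discarded with 2>/dev/null.
out_only=$(report 2>/dev/null)

# Now the reverse: point stderr at the capture, then discard the original stdout.
err_only=$(report 2>&1 >/dev/null)

echo "stdout carried: $out_only"
echo "stderr carried: $err_only"
```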

3.1 Pipelines

A pipeline is a sequence of processes, executed in parallel, in which the standard output of one program is connected to the standard input of the next via a pipe. The pipe creates dependencies (e.g., one program waiting on the output of another before proceeding), so the programs effectively complete in series. This fits the UNIX design philosophy well by allowing smaller programs to work together, connecting computation via a text stream.

We denote the piping between programs in a pipeline using the | (pipe) symbol. Consider this simple C++ program that reads a string from the user and prints it in reverse:

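#include <iostream>
#include <string>
using namespace std;

int main(){
  string s;
  cin >> s; //read one whitespace-delimited word from stdin
  //walk the string backwards; the cast keeps the index signed when s is empty
  for(int i=static_cast<int>(s.size())-1;i>=0;i--)
    cout << s[i]; //write to stdout
  cout << endl;
  return 0;
}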

We can run this program directly on the command line like so:

$ ./reverse 
Hello
olleH

But that requires the user to type the input. What if we could use another program to produce that input instead? There is a built-in command called echo that will do just that:

$ echo "Hello"
Hello
$ echo "Hello" | ./reverse 
olleH

This time, we have piped the output of the echo command to the input of the reverse command, producing the same output as before.

Now, what if we wanted to reverse our reversed output? We could also pipe that output again to the next program.

$ echo "Hello" | ./reverse | ./reverse 
Hello

This is the power of a pipeline, and as you might imagine, there is nothing stopping us from building far more complicated chains.

3.2 Sample Commands Using stdin and stdout

As an example of using pipes to control standard input and output, let's look at the head and tail commands, which print the first lines or last lines of a file, respectively.

For example, the file file1 in this example has 1000 lines, each labeled. We could just print the first 10 lines like so:

$ head file1
file1 line 1
file1 line 2
file1 line 3
file1 line 4
file1 line 5
file1 line 6
file1 line 7
file1 line 8
file1 line 9
file1 line 10

A similar command for tail prints the last 10 lines:

$ tail file1
file1 line 991
file1 line 992
file1 line 993
file1 line 994
file1 line 995
file1 line 996
file1 line 997
file1 line 998
file1 line 999
file1 line 1000

We could connect these two commands together to print any consecutive lines of a file. For example, suppose we want to print lines 801-810:

$ head -810 file1 | tail
file1 line 801
file1 line 802
file1 line 803
file1 line 804
file1 line 805
file1 line 806
file1 line 807
file1 line 808
file1 line 809
file1 line 810

The -810 flag tells head to print the first 810 lines; tail then reads from stdin and prints the last 10 of those lines, giving us lines 801 through 810.
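The same idea generalizes to any range. Below is a small sketch of a helper, here called lines (a made-up name), that wraps the head and tail combination. It uses the -n form of the flags, which is the standard spelling of the -810 shorthand, and it generates its own 1000-line stand-in for file1.

```shell
# lines FIRST LAST FILE: print lines FIRST through LAST of FILE (hypothetical helper).
lines() {
  first=$1
  last=$2
  file=$3
  count=$((last - first + 1))
  # head keeps everything up to LAST; tail then keeps the final COUNT of those.
  head -n "$last" "$file" | tail -n "$count"
}

# Build a 1000-line stand-in for file1 in a scratch directory.
workdir=$(mktemp -d)
i=1
while [ "$i" -le 1000 ]; do
  echo "file1 line $i"
  i=$((i + 1))
done > "$workdir/file1"

slice=$(lines 801 810 "$workdir/file1")
echo "$slice"
```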

The cat command, which you should be familiar with already, prints each of the files specified as its arguments to the terminal. For example, consider the following two files:

$ cat BeatArmy.txt 
Beat Army
$ cat GoNavy.txt
Go Navy
$ cat GoNavy.txt BeatArmy.txt 
Go Navy
Beat Army

With no arguments, cat reads from stdin, for example through a pipe. We can also use - by itself to say where in the sequence of files stdin should be read. For example:

$ cat GoNavy.txt | cat BeatArmy.txt - BeatArmy.txt 
Beat Army
Go Navy
Beat Army

So we can now hook up our previous command to surround lines 801-810 of the file with "Go Navy" and "Beat Army":

$ head -810 file1 | tail | cat GoNavy.txt - BeatArmy.txt 
Go Navy
file1 line 801
file1 line 802
file1 line 803
file1 line 804
file1 line 805
file1 line 806
file1 line 807
file1 line 808
file1 line 809
file1 line 810
Beat Army

3.3 Pipes and stderr

One challenge with a pipeline is that all the output of a program gets redirected to the input of the next program. What if there was a problem or error to report?

Given the description of the standard file descriptors, we can better understand a pipeline in terms of them.

Head writes to stdout--. .---the stdout of head is the stdin of cat
                       | |
                       v v

   head -3 BAD_FILENAME | cat GoNavy.txt - BeatArmy.txt  

                       \_/
                        |
                A pipe just connects the stdout of 
                one command to the stdin of another

The pipe (|) is a semantic construct for the shell to connect the standard output of one program to the standard input of another program, thus piping the output to input.

The fact that input is connected to output in a pipeline actually necessitates stderr: if an error were to occur along the pipeline, you would not want that error to propagate as input to the next program in the pipeline, especially when the pipeline can proceed despite the error. There needs to be a mechanism to report the error to the terminal outside the pipeline, and that mechanism is standard error.

As an example, consider the case where head is provided a bad file name.

#> head -3 BAD_FILENAME | cat GoNavy.txt - BeatArmy.txt
head: BAD_FILENAME: No such file or directory          <--- Written to stderr not piped to cat
Go Navy
Beat Army

Here, head encounters an error because BAD_FILENAME doesn't exist, so it prints an error message to stderr and writes nothing to stdout; cat therefore prints only the contents of the two files to stdout. If there were no stderr, head could only report the error on stdout, and it would interfere with the pipeline; the message

head: BAD_FILENAME: No such file or directory        

is not part of the first 3 lines of any file.
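One way to confirm that the error message really travels outside the pipe is to capture each stream separately. In this sketch the stderr of head is sent to a file named errors.log (a name chosen for the example) only so that it can be inspected afterwards; the two text files are recreated in a scratch directory.

```shell
workdir=$(mktemp -d)
echo "Go Navy"   > "$workdir/GoNavy.txt"
echo "Beat Army" > "$workdir/BeatArmy.txt"

# head fails on the missing file. Its complaint goes to errors.log via 2>,
# while its (empty) stdout flows through the pipe into cat as usual.
piped=$(head -n 3 "$workdir/BAD_FILENAME" 2> "$workdir/errors.log" \
          | cat "$workdir/GoNavy.txt" - "$workdir/BeatArmy.txt")

echo "what cat wrote to stdout:"
echo "$piped"
echo "what head wrote to stderr:"
cat "$workdir/errors.log"
```

The captured pipeline output contains only the two file contents; the error text appears only in the log.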

3.4 Redirecting stdin, stdout, and stderr

In addition to piping the standard file streams, you can also redirect them to a file on the filesystem. The redirect symbols are > and <. Consider a dummy command below:

cmd < input_file > output_file 2> error_file

This would mean that cmd (a stand-in for a well formed UNIX command) will read input from the file input_file, all output that would normally go to stdout is now written to output_file, and any error messages will be written to error_file. Note that 2 and the > together (2>) indicate a redirect of file descriptor 2, which maps to stderr (see above).

You can also use redirects interspersed in a pipeline like below.

cmd < input_file | cmd 2> error_file |  cmd > output_file

However, redirecting a stream that is also connected to a pipe does not do what you might expect:

cat input_file > output_file | head 

This command results in nothing being printed to the screen by head; all of cat's output is redirected to output_file. This is because, in bash and other POSIX-style shells, the > and < redirects take precedence over a pipe, and when the same stream is redirected more than once, the last > or < takes effect. For example:

cat input_file > out1 > out2 | head

will write the contents of the input file to out2 and not to out1 (out1 is still created, but it is left empty).
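The precedence rule can be checked directly. The following sketch assumes bash or another POSIX-style sh (zsh with its MULTIOS option enabled behaves differently) and uses a three-line input file invented for the demonstration.

```shell
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/input_file"

# stdout of cat is redirected twice; the last redirect (out2) receives the data,
# and nothing is left to flow through the pipe to head.
seen_by_head=$(cat "$workdir/input_file" > "$workdir/out1" > "$workdir/out2" | head)

out1_size=$(wc -c < "$workdir/out1")
out2_size=$(wc -c < "$workdir/out2")

echo "head printed: '$seen_by_head'"
echo "out1 size: $((out1_size)) bytes"
echo "out2 size: $((out2_size)) bytes"
```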

Output redirects will create the file if it doesn't already exist or will truncate the file if it does exist. By truncate, we mean that the redirection will erase the contents of an existing file before writing to it. There are situations where, instead, you want to append to the end of the file, such as accumulating log files. You can do such output redirects with the >> symbol, a double greater-than sign. For example,

cat input_file > out
cat input_file >> out

will produce two copies of the input file concatenated together in the output file, out.
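A small sketch makes the difference between truncating and appending visible. The file contents here are placeholders for the example.

```shell
workdir=$(mktemp -d)
printf 'line A\nline B\n' > "$workdir/input_file"
echo "stale contents" > "$workdir/out"

cat "$workdir/input_file" >  "$workdir/out"   # truncates: "stale contents" is gone
cat "$workdir/input_file" >> "$workdir/out"   # appends a second copy

cat "$workdir/out"
```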

4 Reading and Writing to /dev/null and other /dev's

There are times when you are building UNIX commands that you want to redirect your output or error information to nowhere … you just want it to disappear. This is a common enough need that UNIX has built in files that you can redirect to and from.

Perhaps the best known is /dev/null. Note that this file exists in the /dev path, which means it is not actually a file, but rather a device or service provided by the UNIX operating system. The null device's sole task in life is to discard whatever is written to it. For example, consider the following pipeline with the BAD_FILENAME from before.

#> head -3 BAD_FILENAME 2> /dev/null | cat GoNavy.txt - BeatArmy.txt
Go Navy
Beat Army

Now, we are redirecting the error from head to /dev/null, and thus it goes nowhere and is lost. If you try and read from /dev/null, you get nothing, since the null device makes things disappear. The below command is similar to using touch to create a new, empty file since head reads nothing and then writes nothing to file, creating an empty file.

head /dev/null > file

You may think that this is a completely useless tool, but there are plenty of times where you need something to disappear – such as input or output or your ic221 homework – that is when you need /dev/null.
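Both directions of /dev/null can be checked with a few lines. This is only a sketch; the variable names are arbitrary.

```shell
# Reading from /dev/null yields end-of-file immediately: zero bytes.
bytes_read=$(wc -c < /dev/null)

# Writing to /dev/null succeeds, but the data is discarded.
echo "this text vanishes" > /dev/null
write_status=$?

# Redirecting the (empty) output of head into a new file leaves an empty file, much like touch.
workdir=$(mktemp -d)
head /dev/null > "$workdir/file"
file_size=$(wc -c < "$workdir/file")

echo "bytes read: $((bytes_read)), write status: $write_status, file size: $((file_size))"
```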

4.1 Other useful redirect dev's

UNIX also provides a number of device files for getting information:

  • /dev/zero : Provides zero bytes. If you read from /dev/zero, every byte you get has the value 0. For example, the following writes 20 zero bytes to a file:
head -c 20 /dev/zero > zero-20-byte-file.dat
  • /dev/urandom : Provides random bytes. If you read from /dev/urandom you get a random byte. For example the following writes a random 20 bytes to a file:
head -c 20 /dev/urandom > random-20-byte-file.dat
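To see what these two commands actually produce, the sketch below writes both files into a scratch directory, checks their sizes, and dumps the bytes with od. The random file will differ on every run, so no particular byte values should be expected from it.

```shell
workdir=$(mktemp -d)

head -c 20 /dev/zero    > "$workdir/zero-20-byte-file.dat"
head -c 20 /dev/urandom > "$workdir/random-20-byte-file.dat"

zero_size=$(wc -c < "$workdir/zero-20-byte-file.dat")
random_size=$(wc -c < "$workdir/random-20-byte-file.dat")

# Delete every NUL byte and count what is left; for the /dev/zero file that is nothing.
non_zero=$(tr -d '\000' < "$workdir/zero-20-byte-file.dat" | wc -c)

echo "sizes: $((zero_size)) and $((random_size)) bytes"
echo "non-zero bytes in the zero file: $((non_zero))"

# od prints the raw bytes in hex so the difference is visible.
od -An -tx1 "$workdir/zero-20-byte-file.dat"
od -An -tx1 "$workdir/random-20-byte-file.dat"
```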

4.2 (Extra) A note on the /dev directory and the OS

The files you find in /dev are not really files, but actually devices provided by the Operating System. A device generally connects to some input or output component of the OS. The three devices above (null, zero, and urandom) are special functions of the OS to provide the user with a null device (null), a consistent zero base (zero), and a source of random entropy (urandom).

If we take a closer look at the /dev directory, we see that there is actually quite a lot going on here.

alarm            hidraw0          network_throughput  ram9      tty13  tty35  tty57      ttyS2       vboxusb/
ashmem           hidraw1          null                random    tty14  tty36  tty58      ttyS20      vcs
autofs           hpet             oldmem              rfkill    tty15  tty37  tty59      ttyS21      vcs1
binder           input/           parport0            rtc@      tty16  tty38  tty6       ttyS22      vcs2
block/           kmsg             port                rtc0      tty17  tty39  tty60      ttyS23      vcs3
bsg/             kvm              ppp                 sda       tty18  tty4   tty61      ttyS24      vcs4
btrfs-control    lirc0            psaux               sda1      tty19  tty40  tty62      ttyS25      vcs5
bus/             log=             ptmx                sda2      tty2   tty41  tty63      ttyS26      vcs6
cdrom@           loop0            pts/                sg0       tty20  tty42  tty7       ttyS27      vcsa
cdrw@            loop1            ram0                sg1       tty21  tty43  tty8       ttyS28      vcsa1
char/            loop2            ram1                shm@      tty22  tty44  tty9       ttyS29      vcsa2
console          loop3            ram10               snapshot  tty23  tty45  ttyprintk  ttyS3       vcsa3
core@            loop4            ram11               snd/      tty24  tty46  ttyS0      ttyS30      vcsa4
cpu/             loop5            ram12               sr0       tty25  tty47  ttyS1      ttyS31      vcsa5
cpu_dma_latency  loop6            ram13               stderr@   tty26  tty48  ttyS10     ttyS4       vcsa6
disk/            loop7            ram14               stdin@    tty27  tty49  ttyS11     ttyS5       vga_arbiter
dri/             loop-control     ram15               stdout@   tty28  tty5   ttyS12     ttyS6       vhost-net
dvd@             lp0              ram2                tpm0      tty29  tty50  ttyS13     ttyS7       watchdog
dvdrw@           mapper/          ram3                tty       tty3   tty51  ttyS14     ttyS8       watchdog0
ecryptfs         mcelog           ram4                tty0      tty30  tty52  ttyS15     ttyS9       zero
fb0              mei              ram5                tty1      tty31  tty53  ttyS16     uinput
fd@              mem              ram6                tty10     tty32  tty54  ttyS17     urandom
full             net/             ram7                tty11     tty33  tty55  ttyS18     vboxdrv
fuse             network_latency  ram8                tty12     tty34  tty56  ttyS19     vboxnetctl

You will learn more about /dev's in your OS class, but for now you should know that this is a way to connect the user-space with the kernel-space through the file system. It is incredibly powerful and useful, beyond just sending stuff to /dev/null.

Each of these files is an interface to some OS service or device. For example, each tty entry is a terminal device. The ram entries are RAM-backed block devices (RAM disks), while mem exposes the computer's physical memory. The dvd and cdrom entries are the files you read from and write to when working with the DVD/CD-ROM drive, and the items under disk are a way to get to the disk drives.

5 File Permissions and Ownership: chmod and chown

Continuing our exploration of the UNIX file system and command line operations, we now turn our attention to file ownership and permissions. One of the most important services that the OS provides is security oriented: ensuring that the right user accesses the right file in the right way.

Let's first remind ourselves of the properties of a file that are returned by running ls -l:

.- Directory?
|    .-------Permissions                   .- Directory Name
| ___|___     .----- Owner                 |
v/       \    V     ,---- Group            V
drwxr-x--x 4 aviv scs 4096 Dec 17 15:14 ic221
-rw------- 1 aviv scs 400  Dec 19  2013 .ssh/id_rsa.pub
                       ^   \__________/    ^   
File Size -------------'       |           '- File Name
  in bytes                     |              
                               |
   Last Modified --------------'

There are two important parts to this discussion: the owner/group and the permissions. The two are directly related to each other. Permissions are assigned based on a user's relationship to the file: being the owner, being a member of the file's group, or being neither.

5.1 File Ownership and Groups

The owner of a file is the user that is directly responsible for the file and has special status with respect to the file permissions. Users can also be grouped together into a group, a collection of users who possess the same permissions. A file also has a group designation to specify which group's permissions should apply.

You all are already aware of your username. You use it all the time, and it should be a part of your command prompt. To have UNIX tell you your username and connection information on this machine, use the command, who am i:

aviv@unix: ~ $ who am i
aviv     pts/24       2014-12-29 10:44 (potbelly.academy.usna.edu)

You can use the whoami (no spaces) command to print only your username:

aviv@unix: ~ $ whoami
aviv

In the who am i output, the first part is the username; for me that is aviv, for you it will be your username. The rest of the information in that output refers to the terminal, the time the terminal was created, and the host you are connected from. We will learn about terminals later in the semester. (And yes, I name my computers after pigs.)

You can determine which groups you are in using the groups command.

aviv@unix: ~ $ groups
scs sudo

On this computer, I am in the scs group, which is for computer science faculty members. I am also in the sudo group, which is for users who have super user access to the machine. Since this machine is my personal work computer, I have sudo access.

5.2 The password and group file

Groupings are defined in two places. The first is a file called /etc/passwd which manages all the users of the system. Here is my /etc/passwd entry:

aviv@unix: ~ $ grep aviv /etc/passwd
aviv:x:35001:10120:Adam Aviv {}:/home/scs/aviv:/bin/bash

The third and fourth fields of that entry are the userid and groupid, which are 35001 and 10120, respectively. These numbers are what the system actually uses to identify the user and the group, but UNIX nicely converts them into names for our convenience. The translation between userid and username is in the password file. The translation between groupid and group name is in the group file, /etc/group. Here is the SCS entry in the group file:

aviv@unix: ~ $ grep scs /etc/group
scs:*:10120:webadmin,www-data,lucas,slack

There you can see that the users webadmin, www-data, lucas and slack are also in the SCS group. While my username is not listed directly, I am still in the scs group as defined by the entry in the password file.
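Since both files are colon-separated text, the tools from earlier sections can take them apart. The sketch below works on the two sample entries shown above, copied in as literal strings so that it does not depend on any particular account existing on the machine where it runs.

```shell
# Sample entries from above, used as literal text.
entry='aviv:x:35001:10120:Adam Aviv {}:/home/scs/aviv:/bin/bash'
group_entry='scs:*:10120:webadmin,www-data,lucas,slack'

username=$(echo "$entry"    | cut -d: -f1)
userid=$(echo "$entry"      | cut -d: -f3)
groupid=$(echo "$entry"     | cut -d: -f4)
login_shell=$(echo "$entry" | cut -d: -f7)

echo "user $username has uid $userid, primary gid $groupid, login shell $login_shell"

# Field 3 of a group entry is the gid; print the group name when it matches.
group_name=$(echo "$group_entry" | awk -F: -v gid="$groupid" '$3 == gid { print $1 }')
echo "gid $groupid is named $group_name"
```

On a real system the same commands can be pointed at /etc/passwd and /etc/group instead of the literal strings.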

Take a moment to explore these files and the commands. See what groups you are in.

5.3 File Permissions

We can now turn our attention to the permission string. A permission is simply a sequence of 9 bits broken into 3 groups of 3 bits each, which these notes call octets. Here an octet means an octal digit, a base 8 number that goes from 0 to 7 (elsewhere "octet" usually means an 8-bit byte), and 3 bits uniquely define one since all the numbers between 0 and 7 can be represented in 3 bits.

Within an octet, there are three permission flags: read, write, and execute. These are often referred to by their shorthand, r, w, and x. Setting a permission to on means that its bit is 1. Thus each possible set of permissions is uniquely defined by an octal number:

rwx -> 1 1 1 -> 7
r-x -> 1 0 1 -> 5
--x -> 0 0 1 -> 1
rw- -> 1 1 0 -> 6

A full file permission consists of the octet set in order of user, group, and global permission.

 ,-Directory Bit
|       
|       ,--- Global Permission
v      / \
-rwxr-xr-x 
 \_/\_/
  |  `--Group Permission
  |   
   `-- Owner Permission

These define what the owner of the file, users in the file's group, and everyone else can do. For a full permission, we can now define it as 3 octal numbers:

-rwxrwxrwx -> 111 111 111 -> 7 7 7 
-rwxrw-rw- -> 111 110 110 -> 7 6 6
-rwxr-xr-x -> 111 101 101 -> 7 5 5
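The conversion in the table above is mechanical enough to script. The following is a sketch of a helper, named perm_to_octal purely for this example, that takes the 9 permission characters (without the leading directory bit) and adds 4, 2, and 1 for r, w, and x in each group of three.

```shell
# perm_to_octal STRING: convert a 9-character permission string such as
# rwxr-xr-x into its 3-digit octal form (hypothetical helper).
perm_to_octal() {
  rest=$1
  result=""
  while [ -n "$rest" ]; do
    # Take the next three characters: one r/w/x triple.
    triple=$(printf '%s' "$rest" | cut -c1-3)
    rest=$(printf '%s' "$rest" | cut -c4-)
    digit=0
    case $triple in r??) digit=$((digit + 4)) ;; esac   # read bit    = 100
    case $triple in ?w?) digit=$((digit + 2)) ;; esac   # write bit   = 010
    case $triple in ??x) digit=$((digit + 1)) ;; esac   # execute bit = 001
    result="$result$digit"
  done
  echo "$result"
}

perm_to_octal rwxrwxrwx   # prints 777
perm_to_octal rwxrw-rw-   # prints 766
perm_to_octal rwxr-xr-x   # prints 755
```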

To change a file permission, you use the chmod command and indicate the new permission through the octal. For example, in the part5 directory, there is an executable file hello_world. Let's try to execute it. To do so, we insert a ./ in the front to tell the shell to execute the local file.

> ./hello_world
-bash: ./hello_world: Permission denied

The shell returns with a permission denied. That's because the execute bit is not set.

#> ls -l hello_world 
-rw------- 1 aviv scs 7856 Dec 23 13:51 hello_world

Let's start by making the file executable by the user only, using the permission 700 (read, write, and execute for the owner, nothing for anyone else). Now we can execute the file:

#> chmod 700 hello_world 
#> ls -l hello_world
-rwx------ 1 aviv scs 7856 Dec 23 13:51 hello_world
#> ./hello_world
Hello World!

This file can only be executed by the user, not by anyone else because the permissions for the group and the world are still 0. To add group and world permission to execute, we use the permission setting 711:

#> chmod 711 hello_world 
#> ls -l hello_world 
-rwx--x--x 1 aviv scs 7856 Dec 23 13:51 hello_world
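The effect of each octal setting can be watched on a throwaway file. This sketch defines a small helper called mode (an invented name) that trims ls -l output down to the permission field; the file is created in a scratch directory rather than part5.

```shell
workdir=$(mktemp -d)
file="$workdir/hello_world"
echo 'echo "Hello World!"' > "$file"

# mode FILE: print just the 10-character type-and-permission field of ls -l.
mode() {
  ls -l "$1" | cut -c1-10
}

chmod 600 "$file"; mode_600=$(mode "$file")
chmod 700 "$file"; mode_700=$(mode "$file")
chmod 711 "$file"; mode_711=$(mode "$file")
chmod 755 "$file"; mode_755=$(mode "$file")

echo "600 -> $mode_600"
echo "700 -> $mode_700"
echo "711 -> $mode_711"
echo "755 -> $mode_755"
```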

At times using octets can be cumbersome, for example, when you want to set all the execute or read bits but don't want to calculate the octet. In those cases you can use shorthands.

  • r, w, x are shorthands for the read, write, and execute permission bits
  • The + indicates to add a permission, as in +x or +w
  • The - indicates to remove a permission, as in -x or -w
  • u, g, o are shorthands for the user, group, and other permission classes
  • a is shorthand for all, applying the permission to user, group, and other

Then we can change the permission:

chmod +x file   <-- set all the execute bits
chmod a+r file  <-- set the file world readable
chmod -r  file  <-- unset all the read bits
chmod gu+w file <-- set the group and user write bits to true
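The four shorthand commands above can be traced step by step on a scratch file. One assumption is worth stating: when no class letter (u, g, o, a) is given, chmod consults the umask, so this sketch pins the umask to 022 to make +x and -r behave as described. The helper mode is an invented name that trims ls -l output to the permission field.

```shell
umask 022   # pinned so that +x and -r without a class letter behave predictably

workdir=$(mktemp -d)
file="$workdir/file"
: > "$file"

# mode FILE: print just the 10-character type-and-permission field of ls -l.
mode() {
  ls -l "$1" | cut -c1-10
}

chmod 600 "$file";  start=$(mode "$file")          # -rw-------
chmod +x "$file";   after_plus_x=$(mode "$file")   # -rwx--x--x
chmod a+r "$file";  after_a_r=$(mode "$file")      # -rwxr-xr-x
chmod -r "$file";   after_minus_r=$(mode "$file")  # --wx--x--x
chmod gu+w "$file"; after_gu_w=$(mode "$file")     # --wx-wx--x

echo "$start $after_plus_x $after_a_r $after_minus_r $after_gu_w"
```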

Depending on the situation, either the octets or the shorthands are preferred.

5.4 Changing File Ownership and Group

The last piece of the puzzle is how to change the ownership and group of a file. There are two commands:

  • chown user file/directory : change owner of the file/directory to the user
  • chgrp group file/directory : change group of the file/directory to the group

Permission to change the owner of a file is reserved for the super user for security reasons. Changing the group of a file, however, is something the file's owner can also do.

aviv@unix: demo $ ls -l
total 16
-rwxr-x--- 1 aviv scs 9133 Dec 29 10:39 helloworld
-rw-r----- 1 aviv scs   99 Dec 29 10:39 helloworld.cpp
aviv@unix: demo $ chgrp mids helloworld
aviv@unix: demo $ ls -l
total 16
-rwxr-x--- 1 aviv mids 9133 Dec 29 10:39 helloworld
-rw-r----- 1 aviv scs    99 Dec 29 10:39 helloworld.cpp

Note now the hello world program is in the mids group. I can still execute it because I am the owner:

aviv@unix: demo $ ./helloworld 
Hello World

However, if I try to change the owner to, say, pepin, I get the following error:

aviv@unix: demo $ chown pepin helloworld
chown: changing ownership of ‘helloworld’: Operation not permitted

Consider why this might be. If any user could change the ownership of a file, then they could upgrade or downgrade the permissions on files, inadvertently or otherwise, violating a security requirement. As such, only the super user, or the administrator, can change ownership settings.