IC221: Systems Programming (SP15)


Lab 01: Unix Familiarization and Command Line Tools

1 Unix File System and Command Line Tools intro

1.1 File System Preliminaries

The file system is a way to organize data into files and directories for ease of access. A file system is hierarchical, which means that everything is arranged in order of rank, like your chain of command. We describe hierarchical structures like this as trees.

fs_tree.png

Consider the tree above. All trees are composed of three kinds of components: the root, sub-roots, and leaves. The root of the tree is the top-ranked item; in the example above, this would be foo. Sub-roots are roots of smaller trees but are not at the top; for example, bar is a sub-root. Leaves exist at the bottom and do not have any sub-ranked items.

We also describe items as having parents and children. What makes an item a root is that it has no parent, like foo. A sub-root can be defined as an item that has both a parent and children, like bar, whose parent is foo and whose children are xyzzy and baz. The leaves are items that have a parent but no children, like garply and baz.

In a file system, we have terms for roots and sub-roots: they are called folders or directories. A folder can contain folders and files. A file is always a leaf in the tree, and a folder is also a leaf if it contains no sub-folders or files.

A path in the file system hierarchy for a given file or folder describes the parents all the way up to the root. For example, the path of baz is /foo/bar/baz. We designate the parent-child relationship using the forward slash.

A path also provides a way to uniquely identify a file. Two files can have the same file name, but may not have the same path. For example, baz exists twice in the file system above, but they have two different paths: /foo/baz and /foo/bar/baz. Thus, they can be distinguished, but if the two files had the same path, that is, existed in the same directory, then they would not be distinguishable and an inconsistency would occur in the file system.
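
The uniqueness rule can be tried out directly. The following is a minimal sketch, assuming a scratch directory created with mktemp; it borrows the names foo, bar, and baz from the figure and uses commands (mkdir, cat, ls) that are introduced later in this lab. The file contents are made up for illustration.

```shell
# Build a small throwaway tree in a temporary directory.
root=$(mktemp -d)
mkdir -p "$root/foo/bar"

# Two files share the name baz but live in different directories.
echo "child of foo" > "$root/foo/baz"
echo "child of bar" > "$root/foo/bar/baz"

# Because the paths differ, each file can be addressed on its own.
cat "$root/foo/baz"        # prints: child of foo
cat "$root/foo/bar/baz"    # prints: child of bar

# Both paths exist side by side.
ls "$root/foo/baz" "$root/foo/bar/baz"
```

Writing to the same path a second time would not create a second file; it would overwrite the first, which is why a path can identify only one file.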

1.2 The UNIX File System

You are likely familiar with the Windows file system structure, which is organized into drives, like the "C" drive, and then items fall out from there. There can be many drives attached to the Windows file system, and you navigate under various drives by letter name.

On Unix, instead of having a "C" drive, everything begins at the root directory, or /. Unlike Windows, Unix uses forward slashes instead of backslashes to separate directories.

unixfs.png

1.2.1 The Key Components of UNIX File System

The base UNIX file system always has the same basic structure:

  • /: The root of the file system. All files and directories fall under this.
  • /usr: Stands for Unix System Resources. Contains system utilities.
  • /sbin: System binaries. Contains essential system administration programs that are generally run by the superuser (on Windows the superuser is called the Administrator).
  • /opt: Optional software. Third party software is usually installed here. It's kind of like the "Program Files" directory in Windows.
  • /etc: System configuration files. This is where things like the password file, and global system configuration files live.
  • /home: Contains the user home directories. Like the "Documents and Settings" folder in Windows.
  • /tmp: Temporary files. When the system reboots, these go away.
  • /kernel: The core operating system. Like the "Windows" folder in Windows.
  • /usr/lib: Contains precompiled libraries for use by everyone on the system. For instance, in this directory, you can find the file libstdc++.so.6, which is needed for C++ programs. (Remember linking from IC210?). This is a little bit like the folder containing all of the .DLL files in Windows.
  • Any directory that ends in bin: Contains binary executable files or links to them.

1.2.2 Home Directories

Each user on a UNIX system has a home directory in the /home folder.

  • If my username is aviv, then my home directory is /home/aviv.
  • Your home directory will be something like /home/m17XXXXX
  • The ~ (tilde) is shorthand for your home directory.
    • For example, if you use a path like ~/Downloads you are referring to the Downloads folder in your home directory. The ~ is replaced by /home/m17XXXX automatically.
  • You can also refer to someone else's home directory via the tilde: ~aviv refers to aviv's home directory, and ~m17xxxx may refer to your home directory.
    • For example, ~aviv/Downloads refers to the Downloads directory in user aviv's home directory and ~m17xxxx/Downloads refers to the Downloads directory in user m17xxxx's home directory.
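
To see the replacement happen, here is a small sketch that can be pasted into a shell. The variable names are chosen only for illustration, and the Downloads folder does not need to exist for the expansion to take place.

```shell
# An unquoted ~ at the start of a word is replaced by your home directory.
home_from_tilde=~
downloads_path=~/Downloads

echo "$home_from_tilde"     # normally the same value as $HOME
echo "$downloads_path"      # the home directory followed by /Downloads

# Inside quotes the ~ is left alone, which is a common source of confusion.
quoted="~/Downloads"
echo "$quoted"              # prints: ~/Downloads
```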

1.2.3 Unix Directory Paths

              +--- Root Directory
              |
              V
              /home/aviv/foo.txt
                 ^   ^    ^
Sub-Directory ---'---'    '-- Target

1.2.4 Parent and Current Directory

Every directory has two special sub-directories:

  • . : ("dot") The current directory
  • .. : ("dot-dot") The parent directory

Another way to interpret the dot and dot-dot is by replacing them with "current" and "parent". Consider the path below:

/home/aviv/../m175678

Reading it from left to right, you might say: "From the root, go to home directory, then to the aviv directory, then to the parent of aviv, then to m175678, the target." Here is another path:

/home/aviv/./././../aviv/./foo.txt

The dot is replaced with the current folder, so stringing dots together has no effect on the path. In the example above, the first three dots all refer to the current directory, aviv. Following the rest of the path, the dot-dot then says to go to the parent directory, home, but that is followed by traversing back to the aviv directory. Then the dot means to stay put again, and then, finally, we reach the target, foo.txt.

path_ex.png
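
The walk-through above can be checked from the shell. This is a sketch under one assumption: rather than using the real /home, it recreates the home/aviv/foo.txt layout inside a scratch directory made with mktemp, so it is safe to run anywhere.

```shell
# Recreate the shape of the example under a temporary directory.
# pwd -P prints the directory with any symbolic links resolved.
base=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$base/home/aviv"
touch "$base/home/aviv/foo.txt"

# Follow a path full of . and .. entries, then ask where we ended up.
cd "$base/home/aviv/./././../aviv/."
resolved=$(pwd -P)
echo "$resolved"            # ends in /home/aviv

# The long path names the same file as home/aviv/foo.txt.
ls "$base/home/aviv/./././../aviv/./foo.txt"
```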

1.3 The Shell Preliminaries

unix-parts.png

The shell or terminal is the primary user interface for interacting with the operating system through text input. You have already used various shells, like the Windows command terminal as well as the Unix shell. When you compile a C++ program by typing a command, the shell tells the OS to execute that command.

A shell is just another program running on the OS, like any other application. The shell you will most often encounter is the program bash, which we will look at in detail later. What makes a shell special is that it is a program designed specifically to enable the user to launch other programs. In many ways, it is the primary user interface of the OS. Additionally, the shell and the OS on Unix provide a simple set of command line tools that enable you to navigate the file system, manipulate the file system by creating or deleting files and folders, read and parse files on the system, and monitor currently running processes and programs.

Today, we'll focus on the command line tools associated with navigating and manipulating the file system. As we work through this class, we will dive into the Unix system by emphasizing shell scripting and interacting with the Unix kernel through the C system call API.

1.3.1 Current Working Directory

The shell has a notion of location, called the current working directory or present working directory, that indicates where in the file system the shell is currently operating. When you first log into a computer, a shell is started for you with your home directory as the current working directory. The shell can change the current working directory to view different parts of the file system.

1.3.2 Navigating the file system

There are three important commands for navigating the file system via the shell:

  • cd path : Change the current directory to the one specified by path, or go to your home directory if path is omitted
  • ls path : List the contents of the directory at that path, or the current directory if path is omitted
  • pwd : Print to the screen your current working directory.

Here is a sample of using these commands to explore the file system:

aviv@zee:~$ pwd
/home/scs/aviv
aviv@zee:~$ ls
aviv-local@  class/      Downloads/  local/     Public/     test.c   VBox-Map/        #VM-notes.txt#
#.bashrc#    Desktop/    ic221/      Music/     Templates/  test.c~  Videos/          VM-notes.txt
bin/         Documents/  id_rsa.pub  Pictures/  test*       tmp/     VirtualBox VMs/  VM-notes.txt~
aviv@zee:~$ cd tmp/
aviv@zee:~/tmp$ ls
aviv@zee:~/tmp$ cd ..
aviv@zee:~$ ls ~blenk
ls: cannot open directory /home/scs/blenk: Permission denied
aviv@zee:~$ ls class/ic221/
hw/     lab/    submit/ 
aviv@zee:~$ ls class/ic221/hw
hw1.pdf

Note that the OS manages who can view which directories. When I tried to read Elise's home directory, I received a permission denied error.

aviv@zee: ~ $ ls ~m171110 
ls: cannot access /home/mids/m171110: No such file or directory

This is an example of another important role of the operating system: providing security services on a shared system. With multiple users and programs running on the same hardware, it is the responsibility of the OS to ensure that users and programs do not access information they should not, and do not interfere with the execution of other programs.

1.3.3 Understanding a Shell Prompt:

All shells have a command prompt (or just "prompt"), which indicates to the user to provide input. The prompt, itself, also provides some useful information about the shell, including things like the current working directory, your user name, and the host you are working on.

Here is an example prompt:

     + User Name
     |        +--Current Working Directory
     |        |  
     V        V
     aviv@zee:~$  
            ^  ^  ^
Hostname-.__|  |  |_____.--- Where you enter commands
               |
 The prompt----+

You can see this command prompt change as you navigate the file system:

aviv@zee:~$ cd tmp/
aviv@zee:~/tmp$ ls
aviv@zee:~/tmp$ cd ..
aviv@zee:~$ 

While your shell may have a slightly different command prompt, all the same information is likely there. You can also change your command prompt as you want, and we will provide examples of doing that in later lessons.

1.4 File System Command Line Tools

Throughout this class, we will make use of a lot of standard Unix command line tools. These tools are common to nearly all Unix platforms. Above, we introduced three tools for navigating the file system; now we will explore some more, as well as their options.

1.4.1 Dissecting a Command Line Argument

Some terminology regarding command line tools in the shell:

   +- Shell prompt, not included in the command
   |
   v                           
aviv@zee:~$ command arg1 arg2 arg3 ...
              ^      ^     ^    ^
              |      |_____|____|_____,-- The command arguments
              |       
              +-- The command, such as mv or cd

Many commands do not require arguments, but arguments are the way to give a command additional information, such as which file to operate on or how to behave.
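
To make the split between the command and its arguments concrete, here is a small sketch. The function show_args is a hypothetical helper written for this illustration, not a standard Unix tool; it reports the arguments the shell hands to it.

```shell
# show_args is a made-up helper: it prints how many arguments it received,
# then prints each argument on its own line.
show_args() {
    echo "number of arguments: $#"
    for arg in "$@"; do
        echo "argument: $arg"
    done
}

# The shell splits the line on spaces: one command, three arguments.
show_args arg1 arg2 arg3

# Quoting keeps words with spaces together as a single argument.
summary=$(show_args "VirtualBox VMs" tmp | head -n 1)
echo "$summary"             # prints: number of arguments: 2
```

The second call matters in practice: a directory name such as VirtualBox VMs from the listings above must be quoted, or the shell will treat it as two separate arguments.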

1.4.2 ls and its arguments

First consider ls, which has a number of options for displaying different listings of a directory.

  • ls path : list contents of directory at path
  • ls -l path : long list the contents of the directory at path, which includes permission, ownership, last edited, and file size.
  • ls -a path : list all contents of directory at path including hidden files that start with a ".", such as .bashrc
  • ls -al path : long list all contents of directory at path including hidden files
aviv@zee:~$ ls
aviv-local@  class/      Downloads/  local/     Public/     test.c   VBox-Map/        #VM-notes.txt#
#.bashrc#    Desktop/    ic221/      Music/     Templates/  test.c~  Videos/          VM-notes.txt
bin/         Documents/  id_rsa.pub  Pictures/  test*       tmp/     VirtualBox VMs/  VM-notes.txt~
aviv@zee:~$ ls -a
./              bin/        .dmrc         .gnome2/          .launchpadlib/  .pulse-cookie  Videos/
../             .cache/     Documents/    .gnome2_private/  .lesshst        .ssh/          VirtualBox VMs/
aviv-local@     class/      Downloads/    .gtk-bookmarks    local/          .tcshrc        #VM-notes.txt#
.bash_history   .compiz/    .emacs        .gvfs/            .local/         Templates/     VM-notes.txt
.bash_profile   .compiz-1/  .emacs~       .hplip/           .mozilla/       test*          VM-notes.txt~
.bash_profile~  .config/    .emacs.d/     ic221/            Music/          test.c         .vmware/
.bashrc         .cshrc      .fontconfig/  .ICEauthority     Pictures/       test.c~        .Xauthority
.bashrc~        .dbus/      .gconf/       id_rsa.pub        Public/         tmp/           .xsession-errors
#.bashrc#       Desktop/    .gksu.lock    .inputrc          .pulse/         VBox-Map/      .xsession-errors.old

Notice all the files that start with . that are now visible with the -a option, as well as . and ..

When using ls -l you get a lot of extra information.

aviv@saddleback: ~ $ ls -l 
total 484
-rw-r----- 1 aviv scs  85816 May 27  2014 302.packages.dat
-rw-r----- 1 aviv scs  14284 May 27  2014 302.pakages.installed.dat
-rw------- 1 aviv scs  13524 Dec 23  2013 accesslog.gz
-rw------- 1 aviv scs 115373 Dec 23  2013 authlog.dat
lrwxrwxrwx 1 aviv scs     17 Nov  5  2013 aviv-local -> /local/aviv-local
drwxr-xr-x 2 aviv scs   4096 Dec 17 15:07 bin
drwxr-x--x 4 aviv scs   4096 Aug 18 11:10 class
drwx--x--x 2 aviv scs   4096 Dec 22  2013 Desktop
drwx--x--x 2 aviv scs   4096 Oct 17  2013 Documents
drwx--x--x 2 aviv scs   4096 Dec 22  2013 Downloads
drwxr-x--- 3 aviv scs   4096 Oct 23 11:42 etector
drwxr-xr-x 2 aviv scs   4096 Apr 22  2014 final-practicum
drwx------ 3 aviv scs   4096 Dec 23  2013 git
drwxr-xr-x 4 aviv scs   4096 Feb 12  2014 GNUstep
-rw-r----- 1 aviv scs    396 Mar 23  2014 guineapig.id_rsa.pub
-rw-r----- 1 aviv scs    366 Feb 26  2014 helloworld.c
-rw-r----- 1 aviv scs    358 Feb 26  2014 helloworld.c~
-rw-r----- 1 aviv scs  32109 Apr 10  2014 lab.html
drwx--x--x 4 aviv scs   4096 Dec 24  2013 local
drwxr-x--- 3 aviv scs   4096 Apr 23  2014 Mail
-rw------- 1 aviv scs    185 Jan 16  2014 most_recent_sub.sh
drwx--x--x 2 aviv scs   4096 Oct 17  2013 Music
drwxr-x--- 4 aviv scs   4096 Dec 17 15:06 old-ic221
-rw-r----- 1 aviv scs      0 Mar 19  2014 output.txt
drwx--x--x 2 aviv scs   4096 Dec 22  2013 Pictures
drwx--x--x 2 aviv scs   4096 Jan  7  2014 Public
drwxr-xr-x 4 aviv scs   4096 Nov 14 11:05 public_html
-rw-r----- 1 aviv scs  85938 May 27  2014 saddleback.packages.dat
-rw-r----- 1 aviv scs  14213 May 27  2014 saddleback.pakages.installed.dat
drwxr-x--- 4 aviv scs   4096 Aug 18 10:51 si221
drwxr-x--- 5 aviv scs   4096 May 27  2014 svn
-rw-r----- 1 aviv scs   4530 Dec 23  2013 syslog.dat
drwx--x--x 2 aviv scs   4096 Oct 17  2013 Templates
drwx--x--x 4 aviv scs   4096 Dec 29 10:38 tmp
drwx--x--x 2 aviv scs   4096 Nov 11  2013 VBox-Map
drwx--x--x 2 aviv scs   4096 Oct 17  2013 Videos
drwx--x--x 2 aviv scs   4096 Nov  5  2013 VirtualBox VMs
-rw-r--r-- 1 aviv scs    465 Nov  8  2013 VM-notes.txt
lrwxrwxrwx 1 aviv scs     12 Jan  9  2014 web -> public_html/

We can interpret this information as:

.- Directory?
|    .-------Permissions                   .- Directory Name
| ___|___     .----- Owner                 |
v/       \    V     ,---- Group            V
drwxr-x--x 4 aviv scs 4096 Dec 17 15:14 ic221
-rw------- 1 aviv scs 400  Dec 19  2013 .ssh/id_rsa.pub
                       ^   \__________/    ^   
File Size -------------'       |           '- File Name
  in bytes                     |              
                               |
   Last Modified --------------' 
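
The same breakdown can be done mechanically. The sketch below takes one line copied from the diagram and splits it into its fields; the variable names are chosen for illustration, and the approach assumes a file name without spaces, so it is a teaching aid rather than a robust way to process ls output.

```shell
# One line copied from the long listing in the diagram above.
line='drwxr-x--x 4 aviv scs 4096 Dec 17 15:14 ic221'

# Split the line on whitespace into the positional parameters $1, $2, ...
set -- $line
mode=$1 links=$2 owner=$3 group=$4 size=$5 modified="$6 $7 $8" name=$9

# The first character of the mode tells us the kind of entry.
case "$mode" in
    d*) kind="directory" ;;
    l*) kind="symbolic link" ;;
    -*) kind="regular file" ;;
    *)  kind="other" ;;
esac

echo "$name is a $kind owned by $owner (group $group), $size bytes"
# prints: ic221 is a directory owned by aviv (group scs), 4096 bytes
echo "permissions: ${mode#?}, last modified: $modified"
# prints: permissions: rwxr-x--x, last modified: Dec 17 15:14
```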

On your own: Try using the following variants of the ls command:

  • ls -h
  • ls -k
  • ls *

You may also notice some other kinds of files there, for example:

lrwxrwxrwx 1 aviv scs     12 Jan  9  2014 web -> public_html/

This is a symbolic link; notice the l in the prefix. We'll discuss these later in the semester. For now, understand that this is a lot like a shortcut: web is a shortcut for the directory public_html/.

1.4.3 File System Manipulation Commands

So far, we've looked at commands for navigating the file system, now we are going to look at commands that can manipulate the file system. The most basic actions for file system manipulation and their corresponding commands are:

  • cp from to : Copy a file/directory from path from to path to
  • mv from to : Move a file/directory from path from to path to, also used to change the name of a file/directory
  • rm path : Remove a file from path
  • mkdir path : Make a directory at path
  • touch path : Create an empty file at path

Let's look at a demo of these commands:

aviv@zee:~/class/ic221/demo$ mkdir NewDir               <--- Create a new directory
aviv@zee:~/class/ic221/demo$ ls                         <--- Show it was created
NewDir/

aviv@zee:~/class/ic221/demo$ cd NewDir/                 <--- Change into that directory
aviv@zee:~/class/ic221/demo/NewDir$ ls                  <--- List contents, it's empty
aviv@zee:~/class/ic221/demo/NewDir$ touch foo.txt       <--- Create an empty file foo.txt
aviv@zee:~/class/ic221/demo/NewDir$ cp foo.txt baz.txt  <--- Copy foo.txt to baz.txt
aviv@zee:~/class/ic221/demo/NewDir$ ls                  <--- List contents of directory with foo.txt and baz.txt
baz.txt  foo.txt                  

aviv@zee:~/class/ic221/demo/NewDir$ mv baz.txt ..       <--- Move baz.txt to parent directory
aviv@zee:~/class/ic221/demo/NewDir$ ls                  <--- List contents of directory, no baz.txt
foo.txt

aviv@zee:~/class/ic221/demo/NewDir$ cd ..               <--- Change to parent directory
aviv@zee:~/class/ic221/demo$ ls                         <--- List contents, show baz.txt and NewDir
baz.txt  NewDir/                

aviv@zee:~/class/ic221/demo$ rm baz.txt                 <--- Remove baz.txt
rm: remove regular empty file `baz.txt'? y              <--- confirm its removal
aviv@zee:~/class/ic221/demo$ rm NewDir/foo.txt          <--- Remove foo.txt by using a path to it
rm: remove regular empty file `NewDir/foo.txt'? y       <--- confirm its removal
aviv@zee:~/class/ic221/demo$ rm NewDir/                 <--- Remove the directory
rm: cannot remove `NewDir/': Is a directory             <--- FAIL!

1.4.4 Handling Directories and Recursive (-r) Option

Note that rm cannot directly remove a directory. Instead, you have to use a special form of remove, the rmdir command.

aviv@zee:~/class/ic221/demo$ rmdir NewDir/
aviv@zee:~/class/ic221/demo$ ls
aviv@zee:~/class/ic221/demo$ 

However, you cannot rmdir a directory that still has contents:

aviv@zee:~/class/ic221/demo$ mkdir NewDir
aviv@zee:~/class/ic221/demo$ touch NewDir/foo.txt
aviv@zee:~/class/ic221/demo$ rmdir NewDir/
rmdir: failed to remove `NewDir/': Directory not empty

The rm command has an option, -r, which stands for recursive, that will recursively remove a directory and its contents.

aviv@zee:~/class/ic221/demo$ rm -r NewDir/
rm: descend into directory `NewDir/'? y
rm: remove regular empty file `NewDir/foo.txt'? y
rm: remove directory `NewDir'? y
aviv@zee:~/class/ic221/demo$ ls
aviv@zee:~/class/ic221/demo$ 

Similar issues occur when you try to copy a directory with cp: you need to specify the recursive option, -r. As you can see from the demo below, this also copies the entire contents of the directory.

aviv@zee:~/class/ic221/demo$ mkdir NewDir
aviv@zee:~/class/ic221/demo$ touch NewDir/foo.txt
aviv@zee:~/class/ic221/demo$ ls
NewDir/
aviv@zee:~/class/ic221/demo$ cp NewDir/ CopyDir
cp: omitting directory `NewDir/'
aviv@zee:~/class/ic221/demo$ ls
NewDir/
aviv@zee:~/class/ic221/demo$ cp -r NewDir/ CopyDir
aviv@zee:~/class/ic221/demo$ ls
CopyDir/  NewDir/
aviv@zee:~/class/ic221/demo$ ls CopyDir/
foo.txt

The move command does not require a recursive option when interacting with directories.

aviv@zee:~/class/ic221/demo$ ls
CopyDir/  NewDir/
aviv@zee:~/class/ic221/demo$ mv NewDir/ CopyDir/
aviv@zee:~/class/ic221/demo$ ls
CopyDir/
aviv@zee:~/class/ic221/demo$ ls CopyDir/
foo.txt  NewDir/
aviv@zee:~/class/ic221/demo$ ls CopyDir/NewDir/
foo.txt
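
The demos in this section can be replayed as a single script. This is a sketch, assuming a scratch directory from mktemp so that nothing in your lab directory is touched; the variable names exist only to record what happened at each step.

```shell
# Work in a throwaway directory.
demo=$(mktemp -d)
cd "$demo"

mkdir NewDir                 # create a directory
touch NewDir/foo.txt         # create an empty file inside it

# rmdir refuses because NewDir is not empty; record which way it went.
if rmdir NewDir 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi
echo "rmdir on a non-empty directory: $rmdir_result"

cp -r NewDir CopyDir         # the recursive copy brings foo.txt along
after_cp=$(ls CopyDir)
echo "CopyDir contains: $after_cp"

mv NewDir CopyDir/           # mv needs no -r; NewDir now sits inside CopyDir
inner=$(ls CopyDir/NewDir)
echo "CopyDir/NewDir contains: $inner"

rm -r CopyDir                # remove the directory and everything beneath it
ls                           # prints nothing: the scratch directory is empty again
```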

1.5 Where Commands "Live"

Now that you are more familiar with navigating and manipulating the Linux file system, let's return to the basic structure of the Linux root file system.

unixfs.png

When you type the command ls or rm these commands are really program binaries that have to exist somewhere in the file system. Since these are binaries, by convention they exist in a directory that ends in bin. The way the shell finds these commands is by searching through a sequence of bin folders until it finds it.

The search path for binaries is called the $PATH or just path. You can display your current path using the echo command, which just prints its arguments to the screen.

aviv@zee:~$ echo $PATH
/home/scs/aviv/bin:/opt/local/bin:/opt/local/sbin:usr/lib/jvm/java-6-sun/bin:/home/scs/aviv/bin:

So when you type a command like ls, the shell looks in each of the folders for a program named ls to run. It happens that ls exists in the base /bin folder, which means you can run it either by its short name or by its full path:

aviv@zee:~$ ls
aviv-local@  class/      Downloads/  local/     Public/     test.c   VBox-Map/        #VM-notes.txt#
#.bashrc#    Desktop/    ic221/      Music/     Templates/  test.c~  Videos/          VM-notes.txt
bin/         Documents/  id_rsa.pub  Pictures/  test*       tmp/     VirtualBox VMs/  VM-notes.txt~
aviv@zee:~$ /bin/ls
aviv-local  class      Downloads   local     Public     test.c   VBox-Map        #VM-notes.txt#
#.bashrc#   Desktop    ic221       Music     Templates  test.c~  Videos          VM-notes.txt
bin         Documents  id_rsa.pub  Pictures  test       tmp      VirtualBox VMs  VM-notes.txt~
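
The search through $PATH described above can be sketched in a few lines of shell. The function find_in_path is a hypothetical helper written for this illustration; it mimics the idea of the search and is not how bash is actually implemented.

```shell
# find_in_path: print the first directory entry in $PATH that holds an
# executable file with the given name. Returns non-zero if none is found.
find_in_path() {
    cmd=$1
    old_ifs=$IFS
    IFS=:                          # $PATH is a colon-separated list
    for dir in $PATH; do
        if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
            IFS=$old_ifs
            echo "$dir/$cmd"       # the first match wins
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

find_in_path ls
find_in_path no-such-command-here || echo "not found"
```

Because the first match wins, the order of the directories in $PATH matters: if two directories both contain a program with the same name, the one listed earlier is the one that runs.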

1.5.1 The which command

Unix provides a command line utility for finding where a command lives, the which command.

aviv@zee:~$ which ls
/bin/ls
aviv@zee:~$ which rm
/bin/rm
aviv@zee:~$ which which
/usr/bin/which 

Most basic commands that are part of the base system are in /bin, but user system resources (usr) command line tools exist in /usr/bin, like which itself. Which other commands that we have looked at are in /usr/bin?



2 Lab Preliminaries

2.0.1 Lab Learning Goals

The goals of this lab are:

  1. To familiarize yourself with the Linux environment
  2. To learn the manual pages
  3. To learn about file permissions
  4. To learn file parsing command line tools
  5. To learn to create text files

2.0.2 Lab Setup:

  • Run the following command in your terminal
~aviv/bin/ic221-up
  • Then change into the following directory
cd ic221/labs/01
  • You will find all the material you need to complete this lab in that directory.
  • During the course of this lab, we will refer to ic221/labs/01 as the lab directory

2.0.3 Lab Worksheet

Where indicated through lab questions and directed work, you may be asked to answer questions on a worksheet or create files. You should do so in the lab directory.

Additionally, we provide directions for editing files using emacs, and directions can be found at the end of the lab (emacs) and on the course resources pages (resources). Through this course, we encourage you to learn the emacs editor. You may, however, use any editor you feel comfortable with, and the only requirement is that you become "good" at the editor. Learn the shortcuts! It will greatly improve your experience as a computer programmer.

2.0.4 Lab Submission

To submit this lab you will place all relevant content into your lab directory:

ic221/labs/01

Then issue the submission script

~aviv/bin/ic221-submit

Select your section number and the option for labs/01, and confirm. Your submission was successful if you see SUCCESS at the end. You may submit multiple times up until the submission deadline. Only your final submission will be considered for grading.


2.1 Part 1: The Man Pages

One of Unix's greatest features is that it is self-documenting through a set of manuals. To access the manuals, you use the man command. Let's start by looking at the manual page for ls:

aviv@mich342csdtestu:~$ man ls

This brings up the manual page, whose header looks like this:

LS(1)                                             User Commands                                            LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List information about the FILEs (the current directory by default).  Sort entries alphabetically if none
       of -cftuvSUX nor --sort is specified.

       Mandatory arguments to long options are mandatory for short options too.

       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..

The manual page provides both a brief description of the command and a list of its arguments and options. For example, we see that the options -a and --all both list all entries in the directory, including . and .. and all hidden files starting with a dot. Conversely, the -A or --almost-all option lists all hidden files while not displaying the . or .. entries.
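
The difference between the two options is easy to confirm. Here is a minimal sketch, assuming a scratch directory from mktemp and two made-up file names, visible.txt and .hidden.

```shell
# Create a scratch directory holding one ordinary file and one hidden file.
demo=$(mktemp -d)
touch "$demo/visible.txt" "$demo/.hidden"

plain=$(ls "$demo")          # the ordinary file only
all=$(ls -a "$demo")         # adds ., .., and .hidden
almost=$(ls -A "$demo")      # adds .hidden but leaves out . and ..

echo "ls:"
echo "$plain"
echo "ls -a:"
echo "$all"
echo "ls -A:"
echo "$almost"
```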

These are just some of the many options for the ls command, and you can scroll through the page using the up and down arrow keys. You can quit the manual page using q. If you get stuck, press C-g (Control-g) until you can hit q and exit.

2.1.1 Lab Questions and Tasks

Perform the following tasks and answer the following questions in the worksheet.txt file found in your lab directory.

  1. For the ls command, what option prints information out in long form, like -l, but does not print any file ownership information? In the worksheet, provide a copy of the output using ls with this option run from the top level of the lab directory.
  2. Change into the part1 directory and type ls. You will see a list of files a b c d e f g.
    1. Note that ls lists the files in alphabetic order. What ls option will list the files in reverse alphabetic order? Provide a copy of your output of your ls with the addition of -l in your worksheet.
    2. What ls options will sort the files by size from smallest to largest? Provide a copy of your output of your ls with the addition of -l in your worksheet.
    3. What ls option will sort the files in reverse size order, from largest to smallest? Provide a copy of your output of your ls with the addition of -l in your worksheet.
  3. Remove the g file using the rm command. Notice that the shell asked you to confirm removing the item. Look at the manual for rm, what option must have been invoked when you issued that command? What option can you use to avoid having to confirm the removal of an item?
  4. (Challenge) Read the manual page for the touch command. One of the uses for touch is to update the last modified timestamp of a file (you can view that last modified using ls -l). Use the touch command to create a file y2k whose last modification time was Dec. 31 1999 at 23:59.59. Include the command you used on your worksheet and a copy of your ls -l output of the y2k file.

2.2 Part 2: cat, less and more

Another really important part of using the command line is being able to read and view the contents of files. There are a number of ways to do this without the command line; for example, you could just open the file in an editor like emacs or gedit. But as Unix users, we know there must be a better way if we are only viewing the contents of the file.

2.2.1 Viewing files with cat

If you can think of the most basic functionality for viewing the file, this would include just printing the contents of the file directly to the terminal screen. The command to do just that is the cat command, which is short for concatenate. Here is the man page synopsis for cat:

NAME
     cat -- concatenate and print files

SYNOPSIS
     cat [-benstuv] [file ...]

DESCRIPTION
     The cat utility reads files sequentially, writing them to the standard output.  The file operands
     are processed in command-line order.  If file is a single dash (`-') or absent, cat reads from the
     standard input.  If file is a UNIX domain socket, cat connects to it and then reads it until EOF.
     This complements the UNIX domain binding capability available in inetd(8).

More simply, the cat command takes a file or sequence of files and writes them to standard output, which is the terminal. Let's do a quick example. Navigate to part2 in the lab directory, and let's cat the contents of the GoNavy.txt file:

#> cat GoNavy.txt
Beat Army!

The output of the cat command, printed to the terminal's standard output, is "Beat Army!". You should verify the contents of the file by opening it in an editor.

cat can also take multiple files as input and print their contents to the terminal one after the other. Put another way, cat will concatenate the contents of the files by printing them to standard output.

Let's use the cat command to view the contents of the files in part2 of the lab directory. Use cd to navigate there now.

#> cat BeatArmy.txt GoNavy.txt
Go Navy!
Beat Army!

As you can see, the contents of BeatArmy.txt is "Go Navy!" and the contents of GoNavy.txt is "Beat Army!". The concatenation of those contents is "Go Navy! Beat Army!" across two lines.

2.2.2 Viewing files with less and more

One drawback of viewing files with cat is that it clutters up your terminal, and any reasonable Unix user hates a cluttered terminal. You could always clear your terminal using clear or C^l (Control-l); go ahead and try that now. But it can get bothersome the more you need to do so.

Instead, Unix provides two ways to view a file within a terminal application: less and more. In the jocular style of Unix design, less is more and more is less. The basic difference between the two file viewers is that less allows you to go forward and backwards in a file while more only allows you to move forward in the file, exiting at the end. Thus, in Unix, less is really more than more.

Let's see an example of why this is useful. Consider two great authors of literature, Charles Dickens and Ernest Hemingway. Dickens was paid by the word and so his stories are very long indeed, while Hemingway was a minimalist, and his stories were quite short. In the part2 directory you have two text files, one named dickens.txt and one named hemingway.txt.

We can easily read hemingway.txt using more: it just moves forward in the file when you press the space bar to page down or the down arrow key. The indicator at the bottom of the more screen, for example

--More--(95%)

describes how far into the file we've progressed. Let's now use more to read dickens.txt. Oh man! This is going to take forever, and there is only one way to go: forward. Clearly, we need a more powerful viewer, so we use less. The less viewer allows you to move forward and back, plus a bunch of other useful navigation tools. Here are some of them:

less Navigation

  • To quit: q
  • To search forward: / then type your search (regex allowed) then use the following
    • Find Next match going forwards: n
    • Find Prev match going backwards: N
  • To search backward: ? then type your search (regex allowed) then use the following
    • Find Prev going backwards: n
    • Find Next going forwards: N
  • Go to line: type the line number, then g
  • Start of File: <
  • End of File: >
  • Panic: C-g mash this if you are in a state you don't understand

2.2.3 Lab Questions and Tasks:

  1. Use cat to place "Beat Army!" at the start of Hemingway's very short story and "Go Navy!" at the end. Include the command you used on your worksheet.
  2. Why is less more?
  3. Use less to open dickens.txt:
    1. Search for the first instance of "Fagin", what is the line of that text?
    2. Find the second to last instance of "Fagin". Describe how you did that and the sentence it appears in.
    3. Go to line 1845, what is the name of that chapter?

2.3 Part 3: Viewing Files with head, tail, sed and grep

When we do want to print the contents of a file to the terminal, we may not want to print the whole thing, as cat does. Instead, sometimes we'd like to print just the first n lines, or the last n lines, or some set of lines in the middle, or only the lines that match a given search string. For that we have a set of very useful commands.

For the following examples, navigate to the part4 directory in the lab directory. There is a sample file sample-db.csv that you will use for this part that contains fake records of people entering information on a web server.

2.3.1 View the first or last n lines with head or tail

The head command-line tool is used to print the head of a file. By default, head prints the first 10 lines:

#> head sample-db.csv 
#first_name,last_name,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Butt,Benton, John B Jr,6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com
Josephine,Darakjy,Chanay, Jeffrey A Esq,4 B Blue Ridge Blvd,Brighton,Livingston,MI,48116,810-292-9388,810-374-9840,josephine_darakjy@darakjy.org,http://www.chanayjeffreyaesq.com
Art,Venere,Chemel, James L Cpa,8 W Cerritos Ave #54,Bridgeport,Gloucester,NJ,08014,856-636-8749,856-264-4130,art@venere.org,http://www.chemeljameslcpa.com
Lenna,Paprocki,Feltz Printing Service,639 Main St,Anchorage,Anchorage,AK,99501,907-385-4412,907-921-2010,lpaprocki@hotmail.com,http://www.feltzprintingservice.com
Donette,Foller,Printing Dimensions,34 Center St,Hamilton,Butler,OH,45011,513-570-1893,513-549-4561,donette.foller@cox.net,http://www.printingdimensions.com
Simona,Morasca,Chapman, Ross E Esq,3 Mcauley Dr,Ashland,Ashland,OH,44805,419-503-2484,419-800-6759,simona@morasca.com,http://www.chapmanrosseesq.com
Mitsue,Tollner,Morlong Associates,7 Eads St,Chicago,Cook,IL,60632,773-573-6914,773-924-8565,mitsue_tollner@yahoo.com,http://www.morlongassociates.com
Leota,Dilliard,Commercial Press,7 W Jackson Blvd,San Jose,Santa Clara,CA,95111,408-752-3500,408-813-1105,leota@hotmail.com,http://www.commercialpress.com
Sage,Wieser,Truhlar And Truhlar Attys,5 Boston Ave #88,Sioux Falls,Minnehaha,SD,57105,605-414-2147,605-794-4895,sage_wieser@cox.net,http://www.truhlarandtruhlarattys.com

Similarly, tail by default will show the last 10 lines:

#> tail sample-db.csv 
Carlee,Boulter,Tippett, Troy M Ii,8284 Hart St,Abilene,Dickinson,KS,67410,785-347-1805,785-253-7049,carlee.boulter@hotmail.com,http://www.tippetttroymii.com
Thaddeus,Ankeny,Atc Contracting,5 Washington St #1,Roseville,Placer,CA,95678,916-920-3571,916-459-2433,tankeny@ankeny.org,http://www.atccontracting.com
Jovita,Oles,Pagano, Philip G Esq,8 S Haven St,Daytona Beach,Volusia,FL,32114,386-248-4118,386-208-6976,joles@gmail.com,http://www.paganophilipgesq.com
Alesia,Hixenbaugh,Kwikprint,9 Front St,Washington,District of Columbia,DC,20001,202-646-7516,202-276-6826,alesia_hixenbaugh@hixenbaugh.org,http://www.kwikprint.com
Lai,Harabedian,Buergi & Madden Scale,1933 Packer Ave #2,Novato,Marin,CA,94945,415-423-3294,415-926-6089,lai@gmail.com,http://www.buergimaddenscale.com
Brittni,Gillaspie,Inner Label,67 Rv Cent,Boise,Ada,ID,83709,208-709-1235,208-206-9848,bgillaspie@gillaspie.com,http://www.innerlabel.com
Raylene,Kampa,Hermar Inc,2 Sw Nyberg Rd,Elkhart,Elkhart,IN,46514,574-499-1454,574-330-1884,rkampa@kampa.org,http://www.hermarinc.com
Flo,Bookamer,Simonton Howe & Schneider Pc,89992 E 15th St,Alliance,Box Butte,NE,69301,308-726-2182,308-250-6987,flo.bookamer@cox.net,http://www.simontonhoweschneiderpc.com
Jani,Biddy,Warehouse Office & Paper Prod,61556 W 20th Ave,Seattle,King,WA,98104,206-711-6498,206-395-6284,jbiddy@yahoo.com,http://www.warehouseofficepaperprod.com
Chauncey,Motley,Affiliated With Travelodge,63 E Aurora Dr,Orlando,Orange,FL,32804,407-413-4842,407-557-8857,chauncey_motley@aol.com,http://www.affiliatedwithtravelodge.com

You can specify how many lines to show in two ways. The first is the -n argument, where n is replaced by the number of lines. For example, to print the first 3 lines:

#> head -3 sample-db.csv 
#first_name,last_name,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Butt,Benton, John B Jr,6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com
Josephine,Darakjy,Chanay, Jeffrey A Esq,4 B Blue Ridge Blvd,Brighton,Livingston,MI,48116,810-292-9388,810-374-9840,josephine_darakjy@darakjy.org,http://www.chanayjeffreyaesq.com

The second is to pass the number of lines after -n, like -n 3:

#> head -n 3 sample-db.csv 
#first_name,last_name,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Butt,Benton, John B Jr,6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com
Josephine,Darakjy,Chanay, Jeffrey A Esq,4 B Blue Ridge Blvd,Brighton,Livingston,MI,48116,810-292-9388,810-374-9840,josephine_darakjy@darakjy.org,http://www.chanayjeffreyaesq.com
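
The -n form works the same way for tail. Here is a minimal sketch on a throwaway file rather than sample-db.csv; the file name lines.txt and the variable names first3 and last2 are made up for illustration:

```shell
# Build a small 10-line test file (hypothetical name lines.txt) in a temp directory.
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$workdir/lines.txt"

# First 3 lines of the file: 1, 2 and 3, one per line.
first3=$(head -n 3 "$workdir/lines.txt")
echo "$first3"

# Last 2 lines of the file: 9 and 10, one per line.
last2=$(tail -n 2 "$workdir/lines.txt")
echo "$last2"
```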

2.3.2 Printing intermediate lines with sed

The sed command is very powerful, and it has many more features than just printing intermediary lines. We will explore some of those features later in the course and today just focus on intermediary line printing.

Here is the format of the sed command:

Line Number Input
      |          ,--File to process
      v          v
 sed -n 3,10p filename
        ^  ^^ 
Start---'  ||
Finish-----''---Print those lines

As an example, suppose we want to print the first 3 lines of the database file without including the index, the line starting with #. That means printing lines 2 through 4:

#> sed -n 2,4p sample-db.csv 
James,Butt,Benton, John B Jr,6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com
Josephine,Darakjy,Chanay, Jeffrey A Esq,4 B Blue Ridge Blvd,Brighton,Livingston,MI,48116,810-292-9388,810-374-9840,josephine_darakjy@darakjy.org,http://www.chanayjeffreyaesq.com
Art,Venere,Chemel, James L Cpa,8 W Cerritos Ave #54,Bridgeport,Gloucester,NJ,08014,856-636-8749,856-264-4130,art@venere.org,http://www.chemeljameslcpa.com
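
To see what the -n option contributes, it can help to run the same range with and without it. The sketch below assumes a made-up six-line file called rows.txt (not part of the lab directory); the variable names are also just for illustration:

```shell
# Build a small test file (hypothetical name rows.txt) in a temp directory.
workdir=$(mktemp -d)
printf '%s\n' header one two three four five > "$workdir/rows.txt"

# With -n, sed prints nothing by default, so only lines 2 through 4 appear.
with_n=$(sed -n '2,4p' "$workdir/rows.txt")
echo "$with_n"

# Without -n, sed prints every line once, and the p command prints
# lines 2 through 4 a second time, so those lines show up twice.
without_n=$(sed '2,4p' "$workdir/rows.txt")
echo "$without_n"
```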

2.3.3 Printing only matching lines with grep

Finally, we need a mechanism to process only the lines that match a condition. The grep command is used for that, and it is so important in the Unix ecosystem that it is used as a verb. For example, "We grep out lines matching the string" is something you'll hear your instructors say throughout this class, and "grep" is loosely defined as "match."

For example, let's try to print just the lines where the person is from the state of New Jersey. To do that, we first need to identify a part of the line that is unique to people from New Jersey, and that is "NJ" in the state field.

#> grep NJ sample-db.csv 
Art,Venere,Chemel, James L Cpa,8 W Cerritos Ave #54,Bridgeport,Gloucester,NJ,08014,856-636-8749,856-264-4130,art@venere.org,http://www.chemeljameslcpa.com
Alisha,Slusarski,Wtlz Power 107 Fm,3273 State St,Middlesex,Middlesex,NJ,08846,732-658-3154,732-635-3453,alisha@slusarski.com,http://www.wtlzpowerfm.com
Ernie,Stenseth,Knwz Newsradio,45 E Liberty St,Ridgefield Park,Bergen,NJ,07660,201-709-6245,201-387-9093,ernie_stenseth@aol.com,http://www.knwznewsradio.com
(...)

Note that in a grep command, the first argument is the search term and the second is the file to be searched. grep is a very powerful command that uses a special search language called regular expressions, and you can search for all sorts of things. This is not the focus of this course, but if you would like to learn more, speak with your instructor.
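
One thing to keep in mind is that the search term matches anywhere on the line, not just in one field. The sketch below uses a few made-up records (name,state,zip, not taken from sample-db.csv) to show how including the surrounding commas in the search term narrows the match to a whole field:

```shell
# Made-up records in the order name,state,zip (hypothetical file people.csv).
workdir=$(mktemp -d)
cat > "$workdir/people.csv" <<'EOF'
Ann,NJ,08014
Bob,TX,73301
NJ Movers,TX,75001
Cy,NJ,07660
EOF

# Matches every line containing the letters NJ, including the
# "NJ Movers" record, which is actually in TX.
loose=$(grep NJ "$workdir/people.csv")
echo "$loose"

# Including the commas matches NJ only when it is an entire field
# between two commas.
strict=$(grep ',NJ,' "$workdir/people.csv")
echo "$strict"
```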

2.3.4 Lab Questions

  1. Read the man pages for head and tail, and produce a command line to print the first kilobyte of the database file. Note, a kilobyte is 2^10 or 1024 bytes. Include the command line in your worksheet.
  2. Use less or grep to find the line number of "Mastella". Produce a sed command to just print the line with "Mastella" and the following 5 lines. Include the command line in your worksheet.
  3. How many people's first name is "Pamella"? Use grep to find that out.
  4. Read the man page for grep. Print out all the lines for people who do not have an address in NJ. Include the command line in your worksheet.

2.4 Part 4: Pipelines with cut, sort, uniq, and wc

The final piece of the puzzle for file processing is to take the output of processing a file and set it as the input to another process. These process parts can be chained together into a pipeline. Consider the simple pipeline below:

#> cat sample-db.csv | head -20

The pipe, or |, takes the output of one command and sets it as the input of another. In the example above, cat prints the whole contents of the file, and that output becomes the input of head, which then prints only the first 20 lines of its input. While this is a contrived example, you should start to see the power of the pipeline. Consider the command below:

#> grep NJ sample-db.csv | wc -l

The first part of the command prints only the lines that contain the pattern "NJ". This output is then set as the input to the wc command, which is a command-line tool to count words, lines, and bytes. The -l option says to print just the line count, and thus the command above prints the number of lines that contain "NJ".

Nearly all Unix command-line tools can read either from a file or from standard input along a pipeline. For example, the above command can be rewritten with cat at the front, as follows:

#> cat sample-db.csv | grep NJ | wc -l

For more options of wc, refer to the man page.
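
As a quick check that the two forms really are equivalent, here is a minimal sketch on a small made-up file (people.csv is a hypothetical name, and the records are not from sample-db.csv). Both pipelines should report the same count:

```shell
# Made-up records in the order name,state,zip.
workdir=$(mktemp -d)
cat > "$workdir/people.csv" <<'EOF'
Ann,NJ,08014
Bob,TX,73301
Cy,NJ,07660
EOF

# grep reads the file itself, and wc -l counts the matching lines.
direct=$(grep ',NJ,' "$workdir/people.csv" | wc -l)

# cat reads the file, and grep reads from standard input instead.
piped=$(cat "$workdir/people.csv" | grep ',NJ,' | wc -l)

# Both counts are 2, since two of the three records are in NJ.
echo "$direct" "$piped"
```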

2.4.1 Parsing just fields with cut

A very useful pipeline tool is cut, which is used to extract fields from a formatted file, like our database file. Here is a basic command line:

          ,--- Delimiter
        __|_           ,-----Input File, or leave off to read from stdin
       /    \          v
#> cut -d "," -f 1 sample-db.csv | head -5
              \__/
                \__.-- Field

The delimiter determines how the file is to be cut. The sample-db.csv file is a comma-separated file, so it is delimited by commas; that is, every item on a line is separated by a comma to distinguish it from the other items on the line. The above command prints the first 5 lines of output, showing the first delimited item on each line:

#> cut -d "," -f 1 sample-db.csv | head -5
#first_name
James
Josephine
Art
Lenna

That is the first_name field. If we wished to not include the index, first_name, we could use a head and tail command combination:

#> cut -d "," -f 1 sample-db.csv | head -6 | tail -5
James
Josephine
Art
Lenna
Donette

As you can see, this quickly builds into a very powerful tool just by adding commands to the pipeline.
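
Two variations are worth knowing about: cut can select more than one field at a time, and tail can start printing from a given line number. The sketch below tries both on a small made-up file (people.csv and its three records are hypothetical, not from sample-db.csv):

```shell
# Made-up file with an index line followed by three records.
workdir=$(mktemp -d)
cat > "$workdir/people.csv" <<'EOF'
#first_name,last_name,state
Ann,Lee,NJ
Bob,Ray,TX
Cy,Fox,NJ
EOF

# -f accepts a comma-separated list of fields; here fields 1 and 3.
name_state=$(cut -d "," -f 1,3 "$workdir/people.csv")
echo "$name_state"

# tail -n +2 starts printing at line 2, which skips the index line
# without needing to know how many lines follow it.
names=$(cut -d "," -f 1 "$workdir/people.csv" | tail -n +2)
echo "$names"
```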

2.4.2 Sorting and removing duplicates with sort and uniq

The last two parsing tools we'll use in this lab are sort and uniq. The former will sort input and the latter will remove any adjacent duplicate lines. Consider how these might be used in conjunction to solve different file parsing problems. We leave their definitions and usage for you to determine by reading the man pages.

2.4.3 Lab Questions and Tasks

You should continue to work in part 4 to complete these lab questions:

  1. Create a pipeline to count the number of unique states represented in the database file. Include the pipeline in your worksheet.
  2. How many first names in the file repeat? How many last names? Include the pipelines used to determine this.
  3. Write a pipeline to print to the terminal all the unique telephone area codes. Then extend your pipeline to sort them numerically. (Hint: read the man page for sort.)