Lab 01: File Permissions/Ownership and Unix Familiarization and Command Line Tools
1 File Permissions and Ownership: chmod and chown
Continuing our exploration of the UNIX file system and command line operations, we now turn our attention to file ownership and permissions. One of the most important services that the OS provides is security oriented: ensuring that the right user accesses the right file in the right way.
Let's first remind ourselves of the properties of a file that are returned by running ls -l:
.- Directory?
|    .-------Permissions                .- Directory Name
| ___|___     .----- Owner              |
v/       \    V    ,---- Group          V
drwxr-x--x 4 aviv scs 4096 Dec 17 15:14 ic221
-rw------- 1 aviv scs  400 Dec 19  2013 .ssh/id_rsa.pub
                        ^  \__________/ ^
File Size --------------'        |      '- File Name
in bytes                         |
                                 |
Last Modified -------------------'
There are two important parts to this discussion: the owner/group and the permissions. The two are directly related to each other. Permissions are usually assigned based on a user's relationship to the file: being the owner, or being a member of a group of users who have certain access to the file.
1.1 File Ownership and Groups
The owner of a file is the user that is directly responsible for the file and has special status with respect to the file's permissions. Users can also be collected into a group, a set of users who possess the same permissions. A file also has a group designation that specifies which users the group permissions apply to.
You all are already aware of your username. You use it all the time, and it should be a part of your command prompt. To have UNIX tell you your username, use the command who am i:
aviv@saddleback: ~ $ who am i
aviv pts/24 2014-12-29 10:44 (potbelly.academy.usna.edu)
The first part of the output is the username; for me that is aviv, and for you it will be your username. The rest of the information in the output refers to the terminal, the time the terminal was created, and the host from which you are connected. We will learn about terminals later in the semester. (And yes, I name my computers after pigs.)
You can determine which groups you are in using the groups command.
aviv@saddleback: ~ $ groups
scs sudo
On this computer, I am in the scs group, which is for computer science faculty members. I am also in the sudo group, which is for users who have super user access to the machine. Since saddleback is my personal work computer, I have sudo access.
1.2 The password and group file
Groupings are defined in two places. The first is a file called /etc/passwd, which manages all the users of the system. Here is my /etc/passwd entry:
aviv@saddleback: ~ $ grep aviv /etc/passwd
aviv:x:35001:10120:Adam Aviv {}:/home/scs/aviv:/bin/bash
The third and fourth fields of that entry are the userid and groupid, which are 35001 and 10120, respectively. These numbers are what the system actually uses to identify the user and the group; Unix nicely converts them into names for our convenience. The translation between userid and username is in the password file. The translation between groupid and group name is in the group file, /etc/group. Here is the scs entry in the group file:
aviv@saddleback: ~ $ grep scs /etc/group
scs:*:10120:webadmin,www-data,lucas,slack
There you can see that the users webadmin, www-data, lucas and slack are also in the scs group. While my username is not listed directly, I am still in the scs group because 10120 is the groupid in my password file entry.
Take a moment to explore these files and the commands. See what groups you are in.
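The field layout described above can be checked from the command line. The following is a minimal sketch, assuming a POSIX shell with cut and id available; the entry variable simply holds a copy of the sample /etc/passwd line from this section, and the variable names are illustrative only.

```shell
# Copy of the sample /etc/passwd entry shown in this section.
entry='aviv:x:35001:10120:Adam Aviv {}:/home/scs/aviv:/bin/bash'

# Fields are separated by ':'. Field 1 is the username,
# field 3 the userid, and field 4 the primary groupid.
entry_user=$(printf '%s\n' "$entry" | cut -d: -f1)
entry_uid=$(printf '%s\n' "$entry" | cut -d: -f3)
entry_gid=$(printf '%s\n' "$entry" | cut -d: -f4)
echo "user=$entry_user uid=$entry_uid gid=$entry_gid"

# The id command reports the same kind of numbers for whoever runs it.
my_uid=$(id -u)
my_gid=$(id -g)
echo "my uid=$my_uid, my primary gid=$my_gid"
```

Running id with no arguments prints the numeric ids together with the names they translate to, which is a quick way to compare against /etc/passwd and /etc/group.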
1.3 File Permissions
We can now turn our attention to the permission string. A permission is simply a sequence of 9 bits broken into 3 groups of 3 bits each. Each group can be written as a single octal (base 8) digit from 0 to 7, since 3 bits represent exactly the numbers 0 through 7. (Strictly speaking, "octet" usually means 8 bits; this lab uses the word loosely for one of these 3-bit octal digits.)
Within an octet, there are three permission flags: read, write and execute. These are often referred to by their shorthand, r, w, and x. Setting a permission to on means that its bit is 1. Thus each possible permission state can be uniquely described by an octal number:
rwx -> 1 1 1 -> 7
r-x -> 1 0 1 -> 5
--x -> 0 0 1 -> 1
rw- -> 1 1 0 -> 6
A full file permission consists of three octets, in the order user, group, and global permission.
,-Directory Bit
|
|       ,--- Global Permission
v      / \
-rwxr-xr-x
 \_/\_/
  |  `--Group Permission
  |
  `-- User Permission
These define what the owner of the file can do, what users in the file's group can do, and what everyone else can do. A full permission can therefore be written as 3 octal numbers:
-rwxrwxrwx -> 111 111 111 -> 7 7 7
-rwxrw-rw- -> 111 110 110 -> 7 6 6
-rwxr-xr-x -> 111 101 101 -> 7 5 5
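The mapping from permission letters to octal digits can be sketched as a small shell function. This is only an illustration, assuming a POSIX shell; perm_to_octal is a hypothetical helper name, and it expects the 9 permission characters without the leading directory bit.

```shell
# Hypothetical helper: convert a 9-character permission string
# such as rwxr-xr-x into its 3-digit octal form.
perm_to_octal() {
    perms=$1
    octal=""
    for start in 1 4 7; do
        # Pull out one 3-character group: user, group, then global.
        triple=$(printf '%s\n' "$perms" | cut -c "$start-$((start + 2))")
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac   # read bit
        case $triple in ?w?) digit=$((digit + 2)) ;; esac   # write bit
        case $triple in ??x) digit=$((digit + 1)) ;; esac   # execute bit
        octal="$octal$digit"
    done
    printf '%s\n' "$octal"
}

perm_to_octal rwxrwxrwx
perm_to_octal rwxrw-rw-
perm_to_octal rwxr-xr-x
```

Each bit contributes its place value (4 for read, 2 for write, 1 for execute), which is why rwx adds up to 7 and r-x to 5.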
To change a file permission, you use the chmod command and indicate the new permission in octal. For example, in the part5 directory, there is an executable file hello_world. Let's try to execute it. To do so, we insert a ./ in front to tell the shell to execute the local file.
#> ./hello_world
-bash: ./hello_world: Permission denied
The shell returns with a permission denied. That's because the execute bit is not set.
#> ls -l hello_world
-rw------- 1 aviv scs 7856 Dec 23 13:51 hello_world
Let's start by making the file executable by the user only, with the permission 700. Now we can execute the file:
#> chmod 700 hello_world
#> ls -l hello_world
-rwx------ 1 aviv scs 7856 Dec 23 13:51 hello_world
#> ./hello_world
Hellow World!
This file can only be executed by the user, not by anyone else, because the permissions for the group and the world are still 0. To add group and world permission to execute, we use the permission setting 711:
#> chmod 711 hello_world
#> ls -l hello_world
-rwx--x--x 1 aviv scs 7856 Dec 23 13:51 hello_world
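The same sequence of octal settings can be replayed on a scratch file. This is a minimal sketch, assuming mktemp is available; the file name hello.sh and the show_mode helper are made up for illustration and are not part of the lab directory.

```shell
# Work in a throwaway directory so nothing in the lab directory changes.
work_dir=$(mktemp -d)
script="$work_dir/hello.sh"
printf '#!/bin/sh\necho "Hello World"\n' > "$script"

# Hypothetical helper: print only the 10-character permission string.
show_mode() { ls -l "$1" | cut -c1-10; }

chmod 600 "$script"; mode_600=$(show_mode "$script")   # read/write for user
chmod 700 "$script"; mode_700=$(show_mode "$script")   # add user execute
chmod 711 "$script"; mode_711=$(show_mode "$script")   # add group/world execute

echo "600 -> $mode_600"
echo "700 -> $mode_700"
echo "711 -> $mode_711"

rm -r "$work_dir"
```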
At times using octets can be cumbersome, for example, when you want to set all the execute or read bits but don't want to calculate the octet. In those cases you can use shorthands.
- r, w, x: shorthands for the permission bits read, write and execute
  - The + indicates to add a permission, as in +x or +w
  - The - indicates to remove a permission, as in -x or -w
- u, g, o, a: shorthands for user, group, others (global), and all three at once
Then we can change the permission:
chmod +x file    <-- set all the execute bits
chmod a+r file   <-- set the file world readable
chmod -r file    <-- unset all the read bits
chmod gu+w file  <-- set the group and user write bits to true
When no u, g, o, or a is given, as in +x or -r, the change applies to all three sets, except that bits masked by your umask are left alone.
Which form is preferable, octets or shorthands, depends on the situation.
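The shorthand form can be tried on a scratch file as well. The sketch below assumes mktemp is available and uses a made-up file name; every chmod names u, g, o, or a explicitly so the result does not depend on the umask of whoever runs it.

```shell
work_dir=$(mktemp -d)
demo_file="$work_dir/demo.txt"
: > "$demo_file"                      # create an empty file

# Hypothetical helper: print only the 10-character permission string.
show_mode() { ls -l "$1" | cut -c1-10; }

chmod 600 "$demo_file"                # known starting point: -rw-------
chmod u+x "$demo_file";   after_ux=$(show_mode "$demo_file")
chmod go+x "$demo_file";  after_gox=$(show_mode "$demo_file")
chmod a+r "$demo_file";   after_ar=$(show_mode "$demo_file")
chmod go-rx "$demo_file"; after_gorx=$(show_mode "$demo_file")

echo "u+x   -> $after_ux"
echo "go+x  -> $after_gox"
echo "a+r   -> $after_ar"
echo "go-rx -> $after_gorx"

rm -r "$work_dir"
```

The shorthand changes only the bits it names and leaves the rest alone, whereas an octal setting always replaces all nine bits.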
1.4 Changing File Ownership and Group
The last piece of the puzzle is how to change the ownership and group of a file. There are two commands:
- chown user file/directory: change owner of the file/directory to the user
- chgrp group file/directory: change group of the file/directory to the group
Permission to change the owner of a file is reserved for the super user for security reasons. Changing the group of a file, however, is allowed for the file's owner (on most systems, only to a group the owner belongs to) as well as for the super user.
aviv@saddleback: demo $ ls -l
total 16
-rwxr-x--- 1 aviv scs 9133 Dec 29 10:39 helloworld
-rw-r----- 1 aviv scs 99 Dec 29 10:39 helloworld.cpp
aviv@saddleback: demo $ chgrp mids helloworld
aviv@saddleback: demo $ ls -l
total 16
-rwxr-x--- 1 aviv mids 9133 Dec 29 10:39 helloworld
-rw-r----- 1 aviv scs 99 Dec 29 10:39 helloworld.cpp
Note that the hello world program is now in the mids group. I can still execute it because I am the owner:
aviv@saddleback: demo $ ./helloworld
Hello World
However, if I try to change the owner to, say, pepin, I get the following error:
aviv@saddleback: demo $ chown pepin helloworld
chown: changing ownership of ‘helloworld’: Operation not permitted
Consider why this might be. If any user can change the ownership of a file, then they could potentially upgrade or downgrade the permissions of files inadvertently, violating a security requirement. As such, only the super user, or the administrator, can change ownership settings.
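A group change that an ordinary user is allowed to make can be sketched as follows. This assumes mktemp and awk are available and uses the numeric ids from id and ls -ln so that it does not depend on any particular group name existing; the file name is made up for illustration.

```shell
work_dir=$(mktemp -d)
demo_file="$work_dir/helloworld.cpp"
: > "$demo_file"                      # create an empty file we own

my_uid=$(id -u)
my_gid=$(id -g)

# An owner may set the file's group to a group they belong to;
# here we use our own primary group, given by its numeric id.
chgrp "$my_gid" "$demo_file"

# ls -ln prints numeric ids: field 3 is the owner, field 4 the group.
file_uid=$(ls -ln "$demo_file" | awk '{print $3}')
file_gid=$(ls -ln "$demo_file" | awk '{print $4}')
echo "owner uid=$file_uid group gid=$file_gid"

rm -r "$work_dir"
```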
2 Lab Preliminaries
2.0.1 Lab Learning Goals
The goals of this lab are:
- To familiarize yourself with the Linux environment
- To learn the manual pages
- To learn about file permissions
- To learn file parsing command line tools
- To learn to create text files
2.0.2 Lab Setup:
- Run the following command in your terminal
~aviv/bin/ic221-up
- Then change into the following directory
cd ic221/labs/01
- You will find all the material you need to complete this lab in that directory.
- During the course of this lab, we will refer to ic221/labs/01 as the lab directory
2.0.3 Lab Worksheet
Where indicated through lab questions and directed work, you may be asked to answer questions on a worksheet or create files. You should do so in the lab directory.
2.0.4 Lab Submission
To submit this lab you will place all relevant content into your lab directory:
ic221/labs/01
Then issue the submission script
~aviv/bin/ic221-submit
Select your section number and the option for labs/01, and confirm. If you see SUCCESS at the end, your submission was received. You may submit multiple times up until the submission deadline. Only your final submission will be considered for grading.
2.1 Part 1: The Man Pages
One of Unix's greatest features is that it is self-documenting through a set of manuals. To access the manuals, you use the man command. Let's start by looking at the manual page for ls:
aviv@mich342csdtestu:~$ man ls
This brings up the manual page, whose header looks like this:
LS(1)                     User Commands                     LS(1)
NAME
       ls - list directory contents
SYNOPSIS
       ls [OPTION]... [FILE]...
DESCRIPTION
       List information about the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.
       Mandatory arguments to long options are mandatory for short options too.
       -a, --all
              do not ignore entries starting with .
       -A, --almost-all
              do not list implied . and ..
The manual page provides both a brief description of the command and its arguments and options. For example, we see that the options -a and --all both list all entries for the directory, including . and .. and all hidden files starting with ., while, conversely, the -A or --almost-all option lists all hidden files while not displaying the . or .. entries.
This is just one of many options for the ls command, and you can scroll through the page using the up and down arrow keys. You can quit the manual page using q, and if you get stuck, press C-g (Control-g) until you can hit q and exit.
2.1.1 Lab Questions and Tasks
Perform the following tasks and answer the following questions in the worksheet.txt file found in your lab directory.
- For the ls command, what option prints information out in long form, like -l, but does not print any file ownership information? In the worksheet, provide a copy of the output using ls with this option run from the top level of the lab directory.
- Change into the part1 directory and type ls. You will see a list of files a b c d e f g.
  - Note that ls lists the files in alphabetic order. What ls option will list the files in reverse alphabetic order? Provide a copy of the output of your ls with the addition of -l in your worksheet.
  - What ls options will sort the files by size from smallest to largest? Provide a copy of the output of your ls with the addition of -l in your worksheet.
  - What ls option will sort the files in reverse size order, from largest to smallest? Provide a copy of the output of your ls with the addition of -l in your worksheet.
- Remove the g file using the rm command. Notice that the shell asked you to confirm removing the item. Look at the manual for rm: what option must have been invoked when you issued that command? What option can you use to avoid having to confirm the removal of an item?
- (Challenge) Read the manual page for the touch command. One of the uses for touch is to update the last modified timestamp of a file (you can view the last modified time using ls -l). Use the touch command to create a file y2k whose last modification time was Dec. 31 1999 at 23:59:59. Include the command you used on your worksheet and a copy of your ls -l output of the y2k file.
2.2 Part 2: cat, less and more
Another really important part of using the command line tools is being able to read/view the contents of files. There are a number of ways to do this without the command line; for example, you could just open the file in an editor like emacs or gedit. But as Unix users, we know there must be a better way if we are only viewing the contents of the file.
2.2.1 Viewing files with cat
The most basic way to view a file is to print its contents directly to the terminal screen. The command to do just that is the cat command, which is short for concatenate. Here is the man page synopsis for cat:
NAME
     cat -- concatenate and print files
SYNOPSIS
     cat [-benstuv] [file ...]
DESCRIPTION
     The cat utility reads files sequentially, writing them to the standard output. The file operands are processed in command-line order. If file is a single dash (`-') or absent, cat reads from the standard input. If file is a UNIX domain socket, cat connects to it and then reads it until EOF. This complements the UNIX domain binding capability available in inetd(8).
More simply, the cat command takes a file or sequence of files and writes them to standard out, which is the terminal. Let's do a quick example. Navigate to part2 in the lab directory, and let's cat the GoNavy.txt file:
#> cat GoNavy.txt
Beat Army!
The output of the cat command, printed to the terminal's standard out, is "Beat Army!". You should verify the contents of the file by opening it in an editor.
cat can also take multiple files as input and print their contents to the terminal one after the other. Put another way, cat will concatenate the contents of the files by printing them to standard output in order.
Let's use the cat command to view the contents of the files in part2 of the lab directory. Use cd to navigate there now.
#> cat BeatArmy.txt GoNavy.txt
Go Navy!
Beat Army!
As you can see, the contents of BeatArmy.txt is "Go Navy!" and the contents of GoNavy.txt is "Beat Army!". The concatenation of those contents is "Go Navy! Beat Army!" across two lines.
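The concatenation behavior can be reproduced with two scratch files. This is a small sketch, assuming mktemp is available; first.txt and second.txt are made-up names standing in for the lab files.

```shell
work_dir=$(mktemp -d)
printf 'Go Navy!\n'   > "$work_dir/first.txt"
printf 'Beat Army!\n' > "$work_dir/second.txt"

# cat writes the files to standard output in command-line order.
combined=$(cat "$work_dir/first.txt" "$work_dir/second.txt")
printf '%s\n' "$combined"

# With no file operand, cat reads standard input instead.
from_stdin=$(printf 'Beat Army!\n' | cat)
printf '%s\n' "$from_stdin"

rm -r "$work_dir"
```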
2.2.2 Viewing files with less and more
One drawback of viewing files with cat is that it clutters up your terminal, and any reasonable Unix user hates a cluttered terminal. You could always clear your terminal using clear or C-l (Control-l), so go ahead and try that now, but it gets bothersome the more often you need to do so.
Instead, Unix provides two ways to view a file within a terminal application: less and more. In the jocular style of Unix design, less is more and more is less. The basic difference between the two file viewers is that less allows you to go forward and backward in a file, while more only allows you to move forward in the file, exiting at the end. Thus, in Unix, less is really more than more.
Let's see an example of why this is useful. Consider two great authors of literature, Charles Dickens and Ernest Hemingway. Dickens was paid by the word and so his stories are very long indeed, while Hemingway was a minimalist, and his stories were quite short. In the part2 directory you have two text files, one named dickens.txt and one named hemingway.txt.
We can easily read hemingway.txt using more; it just moves forward in the file when you press the space bar to page down or the down arrow key. The indicator at the bottom of the more screen
--More--(95%)
describes how far in the file we've progressed. Let's now use more to read dickens.txt … oh man! This is going to take forever, and there is only one way to go: forward. Clearly, we need a more powerful viewer, so we use less. The less viewer allows you to move forward and back, plus a bunch of other useful navigation tools. Here are some of them:
less Navigation
- To quit: q
- To search forward: / then type your search (regex allowed), then use the following
  - Find Next match going forwards: n
  - Find Prev match going backwards: N
- To search backward: ? then type your search (regex allowed), then use the following
  - Find Prev going backwards: n
  - Find Next going forwards: N
- Go To line: : then type the line number (typing the line number followed by g also works)
- Start of File: <
- End of File: >
- Panic: C-g (mash this if you are in a state you don't understand)
2.2.3 Lab Questions and Tasks:
- Use cat to place a "Beat Army!" at the start of Hemingway's very short story and "Go Navy!" at the end. Include the command you used on your worksheet.
- Why is less more?
- Use less to open dickens.txt:
  - Search for the first instance of "Fagin". What is the line of that text?
  - Find the second to last instance of "Fagin". Describe how you did that and the sentence it appears in.
  - Go to line 1845. What is the name of that chapter?
2.3 Part 3: Viewing Files with head, tail, sed and grep
When we do want to print the contents of a file to the terminal, we may not want to print the whole thing, as cat does. Instead, sometimes we'd like to print just the first n lines, or the last n lines, or some set of lines in the middle, or only the lines that match a given search string. For that we have a set of very useful commands.
For the following examples, navigate to the part4 directory in the lab directory. There is a sample file sample-db.csv that you will use for this part. It contains fake records of people entering information on a web server.
2.3.1 View the first or last n lines with head or tail
The head command line tool is used to print the head of the file. By default, head prints the first 10 lines:
#> head sample-db.csv
#first_name,last_name,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Butt,Benton, John B Jr,6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com
Josephine,Darakjy,Chanay, Jeffrey A Esq,4 B Blue Ridge Blvd,Brighton,Livingston,MI,48116,810-292-9388,810-374-9840,josephine_darakjy@darakjy.org,http://www.chanayjeffreyaesq.com
Art,Venere,Chemel, James L Cpa,8 W Cerritos Ave #54,Bridgeport,Gloucester,NJ,08014,856-636-8749,856-264-4130,art@venere.org,http://www.chemeljameslcpa.com
Lenna,Paprocki,Feltz Printing Service,639 Main St,Anchorage,Anchorage,AK,99501,907-385-4412,907-921-2010,lpaprocki@hotmail.com,http://www.feltzprintingservice.com
Donette,Foller,Printing Dimensions,34 Center St,Hamilton,Butler,OH,45011,513-570-1893,513-549-4561,donette.foller@cox.net,http://www.printingdimensions.com
Simona,Morasca,Chapman, Ross E Esq,3 Mcauley Dr,Ashland,Ashland,OH,44805,419-503-2484,419-800-6759,simona@morasca.com,http://www.chapmanrosseesq.com
Mitsue,Tollner,Morlong Associates,7 Eads St,Chicago,Cook,IL,60632,773-573-6914,773-924-8565,mitsue_tollner@yahoo.com,http://www.morlongassociates.com
Leota,Dilliard,Commercial Press,7 W Jackson Blvd,San Jose,Santa Clara,CA,95111,408-752-3500,408-813-1105,leota@hotmail.com,http://www.commercialpress.com
Sage,Wieser,Truhlar And Truhlar Attys,5 Boston Ave #88,Sioux Falls,Minnehaha,SD,57105,605-414-2147,605-794-4895,sage_wieser@cox.net,http://www.truhlarandtruhlarattys.com
Similarly, tail by default will show the last 10 lines:
#> tail sample-db.csv
Carlee,Boulter,Tippett, Troy M Ii,8284 Hart St,Abilene,Dickinson,KS,67410,785-347-1805,785-253-7049,carlee.boulter@hotmail.com,http://www.tippetttroymii.com
Thaddeus,Ankeny,Atc Contracting,5 Washington St #1,Roseville,Placer,CA,95678,916-920-3571,916-459-2433,tankeny@ankeny.org,http://www.atccontracting.com
Jovita,Oles,Pagano, Philip G Esq,8 S Haven St,Daytona Beach,Volusia,FL,32114,386-248-4118,386-208-6976,joles@gmail.com,http://www.paganophilipgesq.com
Alesia,Hixenbaugh,Kwikprint,9 Front St,Washington,District of Columbia,DC,20001,202-646-7516,202-276-6826,alesia_hixenbaugh@hixenbaugh.org,http://www.kwikprint.com
Lai,Harabedian,Buergi & Madden Scale,1933 Packer Ave #2,Novato,Marin,CA,94945,415-423-3294,415-926-6089,lai@gmail.com,http://www.buergimaddenscale.com
Brittni,Gillaspie,Inner Label,67 Rv Cent,Boise,Ada,ID,83709,208-709-1235,208-206-9848,bgillaspie@gillaspie.com,http://www.innerlabel.com
Raylene,Kampa,Hermar Inc,2 Sw Nyberg Rd,Elkhart,Elkhart,IN,46514,574-499-1454,574-330-1884,rkampa@kampa.org,http://www.hermarinc.com
Flo,Bookamer,Simonton Howe & Schneider Pc,89992 E 15th St,Alliance,Box Butte,NE,69301,308-726-2182,308-250-6987,flo.bookamer@cox.net,http://www.simontonhoweschneiderpc.com
Jani,Biddy,Warehouse Office & Paper Prod,61556 W 20th Ave,Seattle,King,WA,98104,206-711-6498,206-395-6284,jbiddy@yahoo.com,http://www.warehouseofficepaperprod.com
Chauncey,Motley,Affiliated With Travelodge,63 E Aurora Dr,Orlando,Orange,FL,32804,407-413-4842,407-557-8857,chauncey_motley@aol.com,http://www.affiliatedwithtravelodge.com
You can specify how many lines you wish to show in two ways. The first is a bare -n argument, where n is replaced by the number of lines. For example, to print the first 3 lines:
#> head -3 sample-db.csv
#first_name,last_name,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Butt,Benton, John B Jr,6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com
Josephine,Darakjy,Chanay, Jeffrey A Esq,4 B Blue Ridge Blvd,Brighton,Livingston,MI,48116,810-292-9388,810-374-9840,josephine_darakjy@darakjy.org,http://www.chanayjeffreyaesq.com
The second is to pass the number of lines after the -n option, like -n 3:
#> head -n 3 sample-db.csv
#first_name,last_name,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Butt,Benton, John B Jr,6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com
Josephine,Darakjy,Chanay, Jeffrey A Esq,4 B Blue Ridge Blvd,Brighton,Livingston,MI,48116,810-292-9388,810-374-9840,josephine_darakjy@darakjy.org,http://www.chanayjeffreyaesq.com
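To see head and tail on a file whose contents are easy to predict, the sketch below generates a 20-line scratch file rather than using sample-db.csv. It assumes mktemp is available; the file name numbered.txt is made up for illustration.

```shell
work_dir=$(mktemp -d)
numbered="$work_dir/numbered.txt"

# Build a file containing "line 1" through "line 20".
i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$numbered"

first_three=$(head -n 3 "$numbered")       # lines 1 to 3
last_two=$(tail -n 2 "$numbered")          # lines 19 and 20
# With no count, head prints 10 lines; wc -l counts them.
default_count=$(head "$numbered" | wc -l | tr -d ' ')

printf '%s\n' "$first_three"
printf '%s\n' "$last_two"
echo "head with no count printed $default_count lines"

rm -r "$work_dir"
```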
2.3.2 Printing intermediate lines with sed
The sed command is very powerful, and it has many more features than just printing intermediate lines. We will explore some of those features later in the course, but today we focus only on intermediate line printing.
Here is the format of the sed command:
        Line Number Input
            |    ,--File to process
            v    v
    sed -n 3,10p filename
           ^  ^^
Start------'  ||
Finish--------''---Print those lines
As an example, suppose we want to print the first 3 records of the database file without including the index, the line starting with #. That means we'd like to print lines 2 through 4.
#> sed -n 2,4p sample-db.csv
James,Butt,Benton, John B Jr,6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com
Josephine,Darakjy,Chanay, Jeffrey A Esq,4 B Blue Ridge Blvd,Brighton,Livingston,MI,48116,810-292-9388,810-374-9840,josephine_darakjy@darakjy.org,http://www.chanayjeffreyaesq.com
Art,Venere,Chemel, James L Cpa,8 W Cerritos Ave #54,Bridgeport,Gloucester,NJ,08014,856-636-8749,856-264-4130,art@venere.org,http://www.chemeljameslcpa.com
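The start,finish form is easiest to check on a file with predictable lines. This sketch, which assumes mktemp is available and uses a made-up file name, prints a range and then a single line.

```shell
work_dir=$(mktemp -d)
numbered="$work_dir/numbered.txt"

# Build a file containing "line 1" through "line 8".
i=1
while [ "$i" -le 8 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$numbered"

# -n suppresses sed's default printing; p prints the selected lines.
middle=$(sed -n '2,4p' "$numbered")        # lines 2 through 4
single=$(sed -n '5p' "$numbered")          # just line 5

printf '%s\n' "$middle"
printf '%s\n' "$single"

rm -r "$work_dir"
```

Without -n, sed would print every line once and the selected lines a second time, which is why the option appears in every example here.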
2.3.3 Printing only matching lines with grep
Finally, we need a mechanism to process only the lines that match a condition. The grep command is used for that, and it is such an important command in the Unix ecosystem that it is used as a verb. For example, "We grep out lines matching the string" is something you'll hear your instructors say throughout this class, and "grep" is loosely defined as "match."
For example, let's consider printing only the lines where the person is from the state of New Jersey. To do that, we need to first identify a unique part of the lines for people from New Jersey, and that is "NJ" in the state field.
#> grep NJ sample-db.csv
Art,Venere,Chemel, James L Cpa,8 W Cerritos Ave #54,Bridgeport,Gloucester,NJ,08014,856-636-8749,856-264-4130,art@venere.org,http://www.chemeljameslcpa.com
Alisha,Slusarski,Wtlz Power 107 Fm,3273 State St,Middlesex,Middlesex,NJ,08846,732-658-3154,732-635-3453,alisha@slusarski.com,http://www.wtlzpowerfm.com
Ernie,Stenseth,Knwz Newsradio,45 E Liberty St,Ridgefield Park,Bergen,NJ,07660,201-709-6245,201-387-9093,ernie_stenseth@aol.com,http://www.knwznewsradio.com
(...)
Note that in a grep command, the first argument is the search term and the second is the file to be searched. grep is a very powerful command that uses a special search language called regular expressions, and you can search for all sorts of things. This is not the focus of this course, but if you would like to learn more, speak with your instructor.
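Because grep matches the search term anywhere on the line, a bare NJ can also match text outside the state field. The sketch below illustrates this on a tiny made-up file; it assumes mktemp is available, mini-db.csv has fewer columns than the real database, and the Benji row is invented purely to trigger a false match.

```shell
work_dir=$(mktemp -d)
sample="$work_dir/mini-db.csv"
cat > "$sample" <<'EOF'
#first_name,last_name,city,state,zip
Art,Venere,Bridgeport,NJ,08014
Lenna,Paprocki,Anchorage,AK,99501
Alisha,Slusarski,Middlesex,NJ,08846
Benji,NJones,Chicago,IL,60632
EOF

# Matches any line containing "NJ", including the invented last name.
loose=$(grep 'NJ' "$sample")

# Including the surrounding commas ties the match to a whole field.
strict=$(grep ',NJ,' "$sample")

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"

rm -r "$work_dir"
```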
2.3.4 Lab Questions
- Read the man pages for head and tail, and produce a command line to print the first kilobyte of the database file. Note, a kilobyte is 2^10 or 1024 bytes. Include the command line in your worksheet.
- Use less or grep to find the line number of "Mastella". Produce a sed command to print just the line with "Mastella" and the following 5 lines. Include the command line in your worksheet.
- How many people's first name is "Pamella"? Use grep to find that out.
- Read the man page for grep. Print out all the lines from people who do not have an address in NJ. Include the command line in your worksheet.
2.4 Part 4: Pipelines with cut, sort, uniq, and wc
The final piece of the puzzle for file processing is to take the output of processing a file and set it as the input to another process. These process parts can be chained together into a pipeline. Consider this simple pipeline below:
#> cat sample-db.csv | head -20
The pipe, or |, takes the output of one command and sets it as the input of another. In the example above, cat prints the whole contents of the file, and that output becomes the input of head, which then only prints the first 20 lines of its input. While this is a contrived example, you should start to see the power of the pipeline. Consider the command below:
#> grep NJ sample-db.csv | wc -l
The first part of the command will print out only the lines that contain the pattern "NJ". This output is then set as the input to the wc command, which is a command line tool to count words, lines, and bytes. The -l option says to print just the line count, and thus the command above prints out the number of lines containing "NJ".
Nearly all Unix command line tools can read either from a file or from standard input along a pipeline. For example, the above command can be rewritten with cat at the front, as follows:
#> cat sample-db.csv | grep NJ | wc -l
For more options of wc, refer to the man page.
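The two equivalent pipelines can be compared on a small made-up file. This is a sketch, assuming mktemp is available; the rows are shortened versions of records from the database, and tr -d ' ' strips the padding that some versions of wc put in front of the count.

```shell
work_dir=$(mktemp -d)
sample="$work_dir/mini-db.csv"
printf '%s\n' \
    'Art,Venere,Bridgeport,NJ,08014' \
    'Lenna,Paprocki,Anchorage,AK,99501' \
    'Alisha,Slusarski,Middlesex,NJ,08846' \
    'Ernie,Stenseth,Ridgefield Park,NJ,07660' > "$sample"

# grep reads the file itself, then hands its matches to wc.
count_from_file=$(grep ',NJ,' "$sample" | wc -l | tr -d ' ')

# Same result when grep reads standard input from cat instead.
count_from_pipe=$(cat "$sample" | grep ',NJ,' | wc -l | tr -d ' ')

# wc can also read standard input directly.
total_lines=$(wc -l < "$sample" | tr -d ' ')

echo "NJ lines: $count_from_file (file), $count_from_pipe (pipe)"
echo "total lines: $total_lines"

rm -r "$work_dir"
```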
2.4.1 Parsing just fields with cut
A very useful pipeline tool is cut, which is used to extract fields from a formatted file, like our database file. Here is a basic command line:
          ,--- Delimiter
        __|_       ,-----Input File, or leave off to read from stdin
       /    \      v
#> cut -d "," -f 1 sample-db.csv | head -5
              \__/
               `-- Field
The delimiter determines how the file is to be cut. The sample-db.csv file is a comma separated file, so it is delimited by commas; that is, every item in the line is separated by a comma to distinguish it from other items on the line. The above command will print the first 5 lines of output for the first delimited item:
#> cut -d "," -f 1 sample-db.csv | head -5
#first_name
James
Josephine
Art
Lenna
That is the first_name field. If we wished to not include the index, first_name, we could use a head and tail command combination:
#> cut -d "," -f 1 sample-db.csv | head -6 | tail -5
James
Josephine
Art
Lenna
Donette
As you can see, this quickly builds into a very powerful tool just by adding commands to the pipeline.
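One thing to watch for with cut is that it counts every comma, including commas inside a value. The sketch below uses a made-up four-column file built from records in the database to show both the normal case and that pitfall; it assumes mktemp is available, and tail -n +2 is used as another way to skip the index line.

```shell
work_dir=$(mktemp -d)
sample="$work_dir/mini-db.csv"
cat > "$sample" <<'EOF'
#first_name,last_name,company_name,address
James,Butt,Benton, John B Jr,6649 N Blue Gum St
Lenna,Paprocki,Feltz Printing Service,639 Main St
Donette,Foller,Printing Dimensions,34 Center St
EOF

# Field 1 of every line, starting output at line 2 to skip the index.
first_names=$(cut -d "," -f 1 "$sample" | tail -n +2)

# Field 4 is the address when the company name has no comma in it ...
address_lenna=$(sed -n 3p "$sample" | cut -d "," -f 4)
# ... but "Benton, John B Jr" contains a comma, so field 4 is shifted.
address_james=$(sed -n 2p "$sample" | cut -d "," -f 4)

printf '%s\n' "$first_names"
echo "Lenna field 4: [$address_lenna]"
echo "James field 4: [$address_james]"

rm -r "$work_dir"
```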
2.4.2 Sorting and removing duplicates with sort and uniq
The last two parsing tools we'll use in this lab are sort and uniq. The former will sort its input and the latter will remove any adjacent duplicate lines. Consider how these might be used in conjunction to solve different file parsing problems. We leave their definitions and usage for you to determine by reading the man pages.
2.4.3 Transposing text with tr
The last bit of command line trickery for you is the tr command, or translate character tool. It is really simple, and takes two arguments:
tr from to
It reads from standard input and, every time it finds a character from from, replaces it with the corresponding character of to. For example, suppose we take the phone numbers we find:
#> cut -d, -f 9 sample-db.csv | head
phone1
907-385-4412
513-570-1893
773-573-6914
408-752-3500
605-414-2147
631-335-3414
310-498-5651
440-780-8425
602-277-4385
And we wish to replace the hyphen with a space. We can do so easily like this:
#> cut -d, -f 9 sample-db.csv | tr "-" " " | head
phone1
907 385 4412
513 570 1893
773 573 6914
408 752 3500
605 414 2147
631 335 3414
310 498 5651
440 780 8425
602 277 4385
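It helps to remember that tr works on individual characters, not on whole words. The short sketch below, assuming a POSIX shell, feeds tr a few made-up strings through printf to show the hyphen replacement along with two character-by-character mappings.

```shell
# Replace every hyphen with a space, as in the phone number example.
spaced=$(printf '907-385-4412\n' | tr '-' ' ')

# Characters are mapped by position: a->x, b->y, c->z.
swapped=$(printf 'abcabc\n' | tr 'abc' 'xyz')

# Character classes map whole sets, here lowercase to uppercase.
upper=$(printf 'go navy\n' | tr '[:lower:]' '[:upper:]')

echo "$spaced"
echo "$swapped"
echo "$upper"
```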
2.4.4 Lab Questions and Tasks
You should continue to work in part 4 to complete these lab questions:
- Create a pipeline to count the number of unique states represented in the database file. Include the pipeline in your worksheet.
- How many first names in the file repeat? How many last names? Include the pipelines used to determine this.
- Write a pipeline to print to the terminal all the unique telephone area codes. This includes both sets of phone numbers. However, if you're unable to do that for both sets, provide a pipeline for at least one set. Once complete, extend your pipeline to sort the area codes numerically. (Hint: read the man page for sort.)