IC221: Systems Programming (SP14)


Lec 19: File Links: Hard and Symbolic

1 Kernel Data Structures Review

In the last lesson, we looked at how the kernel's file data structures explain the way we have been accessing and using open files. Today, we continue that discussion at the level of the v-nodes/i-nodes, the lowest-level structures describing how to access the underlying file. To review, let's take a look at the data structures again.

fs-kernel-datastructures.png

Figure 1: Kernel File System Data Structures

Moving from left to right: every process has an entry in the process table that contains information about its open files, indexed by file descriptor number; the process table references the open file table in the kernel, which holds the current state of each open file, like the offset; and on the far right are the v-nodes/i-nodes, which provide a virtual representation of the file (v-node) and the actual mechanisms needed to access the file/device (i-node).

This lesson will explore the v-node/i-node relationship and how this translates into the file system. In particular, we will explore file linking, which comes in two varieties, hard links and symbolic links.

2 Hard Links

The technical term linking refers to a file system entry linking to an underlying file block as it exists on disk. It is an abstraction that lets us view the file system and the underlying storage of data separately. The v-node/i-node abstraction works the same way as linking: the v-nodes are the file system entries, while the i-nodes represent the underlying storage mechanisms for the data within that file.

A hard link means that a file block can appear in multiple places within the file system. Essentially, there are two v-nodes that reference the same i-node:

fs-hard-links.png

Figure 2: Hard Linking of V-nodes

Another way to think of a hard-link is that the file exists in multiple places within the file system, and can have different names. But, whenever you open that file, it is the same underlying data that is accessed.

fs-hard-link-fs.png

Figure 3: Hard Linking within File System

2.1 Creating Links with ln

The command line tool to create a link is ln, which is short for link, and by default, the ln command will create hard links. Let's look at a simple example. First, let's create a test file, f.

#> touch f
#> ls -l
-rw-r--r--  1 aviv  staff  0 Mar 23 12:51 f

In this format of ls -l output we have the following information:

   links--.                  .--file size
           \                / 
-rw-r--r--  1 aviv  staff  0 Mar 23 12:51 f
\________/    \_________/    \__________/  \_
     |             |              |          |
permissions    user/group       mod time     file name

All the fields except for links should be familiar to you. The links field describes how many hard links exist to the underlying i-node that this file references. Let's see what happens when we create a hard link to f:

#> ln f hl   #Create a hard link to f named hl
#> ls -l
total 0
-rw-r--r--  2 aviv  staff  0 Mar 23 12:51 f
-rw-r--r--  2 aviv  staff  0 Mar 23 12:51 hl

You see the number of links for both f and hl was incremented from 1 to 2, because there are now two names that link to the same i-node. We can explore this further by using the -i option to ls, which shows us the i-node number:

#> ls -li
34996364 -rw-r----- 2 aviv scs 0 Mar 23 18:14 f
34996364 -rw-r----- 2 aviv scs 0 Mar 23 18:14 hl

And, there it is, the same i-node for each hard link.
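The same check can be scripted. The following is a minimal sketch, assuming a POSIX shell with mktemp and awk available; the scratch directory and the variable names are only illustrative. It pulls the i-node number and link count out of the ls output and also shows that data written through one name is visible through the other.

```shell
#!/bin/sh
# Sketch: confirm that two hard-linked names share one i-node.
# The scratch directory and the names f and hl are illustrative.
dir=$(mktemp -d)            # work in a scratch directory
touch "$dir/f"
ln "$dir/f" "$dir/hl"       # hard link: a second name for the same i-node

# ls -i prints the i-node number first; ls -l prints the link count second.
inode_f=$(ls -i "$dir/f" | awk '{print $1}')
inode_hl=$(ls -i "$dir/hl" | awk '{print $1}')
links_f=$(ls -l "$dir/f" | awk '{print $2}')
echo "f: i-node $inode_f, hl: i-node $inode_hl, link count: $links_f"

# Writing through one name is visible through the other.
echo "written via hl" > "$dir/hl"
contents_f=$(cat "$dir/f")
echo "f now contains: $contents_f"

rm -r "$dir"                # clean up the scratch directory
```

The two i-node numbers should match, and the actual values will differ from machine to machine.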

2.2 Hard Links and Directories

You've actually seen hard links before. Let's rerun ls -li with the -a option so we can see all files and directories:

#> ls -lia
34996362 drwxr-x--- 2 aviv scs 4096 Mar 23 18:14 .
34996236 drwxr-x--- 3 aviv scs 4096 Mar 23 18:14 ..
34996364 -rw-r----- 2 aviv scs    0 Mar 23 18:14 f
34996364 -rw-r----- 2 aviv scs    0 Mar 23 18:14 hl

Notice that . (the current directory) has 2 hard links. Where do those come from? Every . and .. entry is actually a hard link that you get for free. The current directory is referenced once by its own . entry and once by its name in the upper directory; that's 2. The .. directory has 3 links: the .. entry in the current directory, the . entry inside the upper-level directory itself, and the name of that directory in its own parent.
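To see that . and .. really are extra names for existing i-nodes, we can compare i-node numbers directly. This is a rough sketch with made-up directory names (parent and child). It prints the link count of parent for reference only, since some file systems report directory link counts differently.

```shell
#!/bin/sh
# Sketch: . and .. are names for i-nodes that also have other names.
# The directory names parent and child are illustrative.
top=$(mktemp -d)
mkdir "$top/parent" "$top/parent/child"

# ls -d lists the directory itself rather than its contents.
inode_parent=$(ls -di "$top/parent" | awk '{print $1}')
inode_dot=$(ls -di "$top/parent/." | awk '{print $1}')
inode_dotdot=$(ls -di "$top/parent/child/.." | awk '{print $1}')

echo "parent:          $inode_parent"
echo "parent/.:        $inode_dot"
echo "parent/child/..: $inode_dotdot"

# Links to parent: its name, its own . entry, and child's .. entry.
links_parent=$(ls -ld "$top/parent" | awk '{print $2}')
echo "parent link count: $links_parent"

rm -r "$top"
```

All three i-node numbers should be identical, since they are three names for the same directory.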

Another thing to note is that a directory and a file in Unix are actually the same thing with respect to v-nodes/i-nodes. The only difference is that a file stores data and a directory stores a list of the files within the directory.

2.3 Un-linking is the same as removal

In the same way we can add links, we can also remove links. The command to do that is unlink. Let's see the result:

#> ls -li
total 0
34996364 -rw-r----- 2 aviv scs 0 Mar 23 18:14 f
34996364 -rw-r----- 2 aviv scs 0 Mar 23 18:14 hl
#> unlink hl
#> ls -l
total 0
-rw-r----- 1 aviv scs 0 Mar 23 18:14 f

Notice that all that happened when we unlinked hl is that we removed it, and that is exactly what removal does: it unlinks a v-node from an i-node. In other terms, the unlink command removes an entry from the file system by removing a link. Unlinking results in the removal of the file itself only when no further links exist. In the above example, f still links to the i-node, so the i-node is not removed. An i-node without any links can be reclaimed for further use, much like freeing memory.
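As a small illustration of this, the sketch below removes the original name with rm and then reads the data through the remaining hard link. The file names and contents are placeholders chosen for the example.

```shell
#!/bin/sh
# Sketch: removing one name leaves the data reachable through the other.
dir=$(mktemp -d)
echo "still here" > "$dir/f"
ln "$dir/f" "$dir/hl"
links_before=$(ls -l "$dir/f" | awk '{print $2}')

rm "$dir/f"                 # removes one link; the i-node still has hl
links_after=$(ls -l "$dir/hl" | awk '{print $2}')
contents=$(cat "$dir/hl")

echo "links before: $links_before, after: $links_after"
echo "hl contains: $contents"

rm -r "$dir"                # last link goes away here, so the i-node can be reclaimed
```

The link count drops from 2 to 1, and the contents survive until the final name is removed.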

2.4 Hard-linking across file systems

Hard-linking two files to the same file block or i-node is specific to the mounted file system. That means you cannot create a hard link across mounted file systems, because the link may span different kinds of devices. If you think about it, this makes sense: an i-node number is only meaningful within its own file system, and the i-node also stores device-specific information for how to read and write from that device, so a link cannot span different mounted file systems.

We can see this for ourselves. Recall the mount table from running mount on a lab machine:

#> mount
/dev/sda1 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
gvfs-fuse-daemon on /var/lib/lightdm/.gvfs type fuse.gvfs-fuse-daemon (rw,nosuid,nodev,user=lightdm)
nuzee.academy.usna.edu:/home/mids on /home/mids type nfs (rw,nfsvers=3,bg,addr=10.1.83.28)
zee.cs.usna.edu:/courses on /courses type nfs (rw,bg,addr=10.1.83.18)
zee.cs.usna.edu:/home/scs on /home/scs type nfs (rw,nfsvers=3,bg,addr=10.1.83.18)

Your home directories are mounted at /home/scs and /home/mids, which come from the network file server zee, but the root file system is a disk on device /dev/sda1. Let's see what happens when we try to create a hard link from our home directory to the /tmp directory on the root file system:

#> ln f /tmp/f
ln: failed to create hard link `/tmp/f' => `f': Invalid cross-device link
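One way to anticipate this error is to ask df which mounted file system each path lives on before trying to link. The helper below, same_fs, is a hypothetical name invented for this sketch. It assumes device names and mount points contain no spaces, and it is a heuristic rather than a guarantee that ln will succeed.

```shell
#!/bin/sh
# Sketch: compare the mounted file system of two paths before hard-linking.
# same_fs is a hypothetical helper, not a standard command.
same_fs() {
    # df -P prints one line per path in a fixed format:
    # field 1 is the device (or server), field 6 is the mount point.
    a=$(df -P "$1" | awk 'NR==2 {print $1, $6}')
    b=$(df -P "$2" | awk 'NR==2 {print $1, $6}')
    [ "$a" = "$b" ]
}

dir=$(mktemp -d)
touch "$dir/f"

if same_fs "$dir/f" "$dir"; then
    same="yes"
    ln "$dir/f" "$dir/hl"   # same file system, so a hard link is possible
else
    same="no"               # a hard link here would fail as a cross-device link
fi
echo "same file system: $same"

rm -r "$dir"
```

On the lab machines, calling same_fs on a file in your home directory and on /tmp would be expected to report a mismatch, which matches the error above.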

3 Symbolic Links

There is a second kind of linking in file systems that is more common for users. Symbolic linking is when one file links to another file, which in turn references the underlying file block.

fs-sym-link.png

Figure 4: Symbolic Links

Unlike a hard link, where both names refer to the same i-node, with a symbolic link the two files have separate i-nodes, with one file referencing the other.

fs-sym-link-fs.png

Figure 5: Symbolic Links

Symbolic links in Unix are a lot like "shortcuts" in Windows. While you can treat a shortcut like the original file by clicking on it and running it, it is not the original file; it is just a link to the original file. It is the same with symbolic links.

3.1 Creating Symbolic Links

You also create symbolic links with ln, but with an -s option:

#> ln -s f sl
#> ls -l
-rw-r----- 1 aviv scs 0 Mar 23 18:14 f
lrwxrwxrwx 1 aviv scs 1 Mar 23 18:44 sl -> f

You can see sl links to f, and this is indicated in the ls output. The permissions shown for sl (lrwxrwxrwx) are not meaningful on their own: access through the link is governed by the permissions of the file it refers to. If f has a particular permission, so does sl.
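We can also confirm that a symbolic link is its own file with its own i-node. The sketch below assumes the readlink command is available (it is on Linux and macOS). It compares the i-node of the link itself, the i-node reached by following the link, and the path text stored in the link.

```shell
#!/bin/sh
# Sketch: a symbolic link has its own i-node and stores a path as text.
dir=$(mktemp -d)
touch "$dir/f"
ln -s f "$dir/sl"           # the text "f" is what gets stored in sl

inode_f=$(ls -di "$dir/f" | awk '{print $1}')
inode_sl=$(ls -di "$dir/sl" | awk '{print $1}')        # the link itself
inode_followed=$(ls -Li "$dir/sl" | awk '{print $1}')  # -L follows the link
target=$(readlink "$dir/sl")                           # the stored path text

echo "f: $inode_f  sl: $inode_sl  sl followed: $inode_followed"
echo "sl stores the path: $target"

rm -r "$dir"
```

The link's own i-node differs from that of f, while following the link lands on the same i-node as f.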

3.2 Dangling Links

Since a symbolic link stores a path rather than referencing an i-node directly, it's OK if the linked-to file does not exist. We can even remove the file that is being linked, and sl just sticks around as before:

#> rm f
rm: remove regular empty file `f'? y
#> ls -l
lrwxrwxrwx 1 aviv scs 1 Mar 23 18:44 sl -> f
#> cat sl
cat: sl: No such file or directory

But if we try and use sl and follow the link, we'll get an error since f does not exist. A symbolic link that links to a nonexistent file or directory is described as a dangling link or broken link.
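A script can detect a dangling link by combining two standard tests: -L checks that the path itself is a symbolic link, while -e follows the link and checks that something exists at the other end. The function name is_dangling is just a name chosen for this sketch.

```shell
#!/bin/sh
# Sketch: detect a dangling symbolic link.
# is_dangling is a hypothetical helper name.
is_dangling() {
    # -L: the path itself is a symbolic link.
    # -e: follows the link, so it fails when the target is missing.
    [ -L "$1" ] && [ ! -e "$1" ]
}

dir=$(mktemp -d)
touch "$dir/f"
ln -s f "$dir/sl"

if is_dangling "$dir/sl"; then before="dangling"; else before="ok"; fi
rm "$dir/f"                 # remove the target; sl stays behind
if is_dangling "$dir/sl"; then after="dangling"; else after="ok"; fi

echo "before rm: $before, after rm: $after"

rm -r "$dir"
```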

3.3 Symbolic Links across File Systems

Unlike a hard link, where the i-node must exist, a symbolic link can dangle. This means we can create symbolic links across file system mounts.

#> ln -s f /tmp/sl
#> ls -l /tmp/sl
lrwxrwxrwx 1 aviv scs 1 Mar 23 18:51 /tmp/sl -> f

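Think about why we can create symbolic links across file systems while we cannot create hard links. A symbolic link only references a file by its path; it has nothing to do with the underlying device and file storage, so symbolic links can cross device boundaries. Further, since a symbolic link can be bad and not link to anything, the symbolic link can exist even if the other file system is not mounted. Note that the link created above stores the relative path f, so it resolves to /tmp/f rather than to the f in our home directory; to refer to the original file from /tmp, the link needs to store the full path to it.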
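To make the path resolution concrete, here is a small sketch that uses two scratch directories as stand-ins for two mount points (they may well be on the same file system; only the paths matter here). The variable names here and there are illustrative.

```shell
#!/bin/sh
# Sketch: a symbolic link placed in another directory.
# Two scratch directories stand in for two mount points.
here=$(cd "$(mktemp -d)" && pwd)    # absolute path of the first directory
there=$(cd "$(mktemp -d)" && pwd)   # absolute path of the second directory
echo "from here" > "$here/f"

ln -s f "$there/rel"            # stored text is "f": resolves to $there/f
ln -s "$here/f" "$there/abs"    # stored text is the full path to the original

if [ -e "$there/rel" ]; then rel_state="resolves"; else rel_state="dangles"; fi
via_abs=$(cat "$there/abs")

echo "relative link: $rel_state"
echo "absolute link reads: $via_abs"

rm -r "$here" "$there"
```

The relative link dangles because there is no f next to it, while the absolute link reaches the original file.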

3.4 Relative vs. Absolute Symbolic Links

One thing to be careful about with symbolic links is whether the links are absolute or relative. An absolute symbolic link refers to a file by its absolute path, while a relative link uses a relative path from the location of the link. Let's look at a quick example. The setup is as follows:

#> echo "Hello World" > a
#> cat a
Hello World
#> mkdir dir1
#> ls -l
total 8
-rw-r----- 1 aviv scs   12 Mar 25 08:59 a
drwxr-x--- 2 aviv scs 4096 Mar 25 09:00 dir1

There is a file a with "Hello World" stored in it and a directory dir1. We want to create a symbolic link from within dir1 to a. Let's start by doing the obvious thing:

#> ln -s a dir1/a
#> ls -l dir1
total 0
lrwxrwxrwx 1 aviv scs 1 Mar 25 09:03 a -> a

This seemed to work. There is a symbolic link in dir1 called a which refers to a, but which a is that? Is it the one in the upper level directory, the one with "Hello World" in it, or is it the one in its own directory? We can see by trying to follow the link:

#> cat dir1/a 
cat: dir1/a: Too many levels of symbolic links

It is a self-referencing symbolic link, and that's because the link is relative. When we see a -> a, the link is relative to the location of the symbolically linking file; that is, dir1/a refers to itself. What we need is for dir1/a to refer to ../a, the a in the upper-level directory with "Hello World."

To create a proper relative link we have to specify the right path in relation to the location of the link.

#> ln -s ../a dir1/a
#> ls -l dir1
total 0
lrwxrwxrwx 1 aviv scs 4 Mar 25 09:07 a -> ../a

Now, we can use that link properly:

#> cat dir1/a
Hello World

An easy trick to get your relative paths right when creating symbolic links is to always change into the directory where you wish the link to exist.

#> cd dir1/
#> ln -s ../a b
#> ls -l
total 0
lrwxrwxrwx 1 aviv scs 4 Mar 25 09:07 a -> ../a
lrwxrwxrwx 1 aviv scs 4 Mar 25 09:12 b -> ../a

Now it's a bit more direct that you are linking ../a to b, rather than having to envision the relative path from elsewhere in the file system.
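The whole experiment from this section can be replayed as a short script. This is only a sketch in a scratch directory; it creates the self-referencing link and the corrected one side by side (here named a and b inside dir1) and records what happens when each is read.

```shell
#!/bin/sh
# Sketch: a relative target is resolved from the directory holding the link.
top=$(mktemp -d)
echo "Hello World" > "$top/a"
mkdir "$top/dir1"

ln -s a "$top/dir1/a"       # resolves to dir1/a itself: a loop
ln -s ../a "$top/dir1/b"    # resolves to the a one level up

# Reading the self-referencing link fails; errors are silenced here.
if cat "$top/dir1/a" 2>/dev/null; then loop="readable"; else loop="unreadable"; fi
via_b=$(cat "$top/dir1/b")

echo "dir1/a is $loop"
echo "dir1/b reads: $via_b"

rm -r "$top"
```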