Lec 22: File Links: Hard and Symbolic
1 Kernel Data Structures Review
In the last lesson, we looked at the ways kernel file data structures help us understand the varied accesses of files and file descriptors. Today, we continue that discussion, this time at the level of the v-nodes/i-nodes, the lowest-level structure describing how to access the underlying file. To review, let's take a look at the data structures again.
Moving from left to right: every process has an entry in the process table that contains information about open files, indexed by file descriptor number; the process table references an open file table in the kernel, which contains information about the current state of the file, like the offset; and then, on the far right, are the v-nodes/i-nodes, which provide a virtual representation of the file (v-node) and the actual mechanisms needed to access the file or device (i-node).
This lesson will explore the v-node/i-node relationship and how this translates into the file system. In particular, we will explore file linking, which comes in two varieties, hard links and symbolic links.
2 Hard Links
The technical term linking refers to a file system entry linking to an underlying file block as it exists on disk. It is an abstraction that enables us to view the file system and the underlying storage of data separately. The v-node/i-node abstraction is the same as linking: the v-nodes are the file system entries, while the i-nodes represent the underlying storage mechanisms for the data within that file.
When we describe a hard link, this means that a file block can exist in multiple places within the file system. Essentially, there are two virtual v-nodes that reference the same i-node:
Another way to think of a hard-link is that the file exists in multiple places within the file system, and can have different names. But, whenever you open that file, it is the same underlying data that is accessed.
2.1 Creating Links with ln
The command line tool to create a link is ln, which is short for link, and by default, the ln command will create hard links. Let's look at a simple example. First, let's create a test file, f.
    #> touch f
    #> ls -l
    -rw-r--r-- 1 aviv staff 0 Mar 23 12:51 f
In this format of ls -l output we have the following information:
      links--.                .--file size
              \              /
    -rw-r--r-- 1 aviv staff 0 Mar 23 12:51 f
    \________/   \________/   \__________/ \_
        |            |             |        |
    permissions  user/group     mod time  file name
All the fields except for links should be familiar to you. The links field describes how many hard links exist to the underlying i-node that this file references. Let's see what happens when we create a hard link to f:
    #> ln f hl    #Create a hard link to f named hl
    #> ls -l
    total 0
    -rw-r--r-- 2 aviv staff 0 Mar 23 12:51 f
    -rw-r--r-- 2 aviv staff 0 Mar 23 12:51 hl
You see the number of links in both f and hl incremented by 1 to 2, because there are now two file names that link to the same i-node. We can explore this further by passing the -i option to ls, which shows us the i-node number:
    #> ls -li
    34996364 -rw-r----- 2 aviv scs 0 Mar 23 18:14 f
    34996364 -rw-r----- 2 aviv scs 0 Mar 23 18:14 hl
And, there it is, the same i-node for each hard link.
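The same check can be scripted rather than read off the listing. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the scratch directory and the variable names (work, ino_f, ino_hl, links) are illustrative and not part of the lecture.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched.
work=$(mktemp -d)
cd "$work" || exit 1

touch f
ln f hl          # a second name for the same i-node

# ls -i prints the i-node number first; ls -l prints the link count second.
ino_f=$(ls -i f | awk '{print $1}')
ino_hl=$(ls -i hl | awk '{print $1}')
links=$(ls -l f | awk '{print $2}')

echo "f: i-node $ino_f, hl: i-node $ino_hl, link count $links"

cd / && rm -rf "$work"
```

The two i-node numbers should match and the link count should read 2, mirroring the ls -li output above.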
2.2 Hard Links and Directories
You've actually seen hard links before. Let's rerun ls -li with the -a option so we can see all files and directories:
    #> ls -lia
    34996362 drwxr-x--- 2 aviv scs 4096 Mar 23 18:14 .
    34996236 drwxr-x--- 3 aviv scs 4096 Mar 23 18:14 ..
    34996364 -rw-r----- 2 aviv scs    0 Mar 23 18:14 f
    34996364 -rw-r----- 2 aviv scs    0 Mar 23 18:14 hl
Notice that ., the current directory, has 2 hard links. Where do those come from? Every . and .. entry is actually a hard link you get for free. The current directory is referenced once by its own . entry and once by its name in the upper directory; that's 2. The .. directory has 3 references: the .. entry in the current directory, its own . entry, and the name of the directory itself in its parent.
Another thing to note is that a directory and a file in Unix are actually the same thing with respect to v-nodes/i-nodes. The only difference is that a file stores data and a directory stores a list of the files within the directory.
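One way to convince yourself that . and .. are ordinary links is to compare i-node numbers. The sketch below is only an illustration: the directory names parent and child are made up, and it compares i-node numbers rather than link counts because some file systems report directory link counts differently.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p parent/child

# -d lists the directory itself rather than its contents; -i adds the i-node.
ino_parent=$(ls -di parent | awk '{print $1}')
ino_child=$(ls -di parent/child | awk '{print $1}')
ino_child_dot=$(ls -di parent/child/. | awk '{print $1}')
ino_child_dotdot=$(ls -di parent/child/.. | awk '{print $1}')

echo "parent = $ino_parent   child/.. = $ino_child_dotdot"
echo "child  = $ino_child   child/.  = $ino_child_dot"

cd / && rm -rf "$work"
```

Each pair should print the same number: child/. is another name for child, and child/.. is another name for parent.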
2.3 Un-linking is the same as removal
In the same way we can add links, we can also remove links. The command to do that is unlink. Let's see the result:
    #> ls -li
    total 0
    34996364 -rw-r----- 2 aviv scs 0 Mar 23 18:14 f
    34996364 -rw-r----- 2 aviv scs 0 Mar 23 18:14 hl
    #> unlink hl
    #> ls -l
    total 0
    -rw-r----- 1 aviv scs 0 Mar 23 18:14 f
Notice that all that happened when we un-linked hl is that we removed it, and that is exactly what remove does: it un-links a v-node from an i-node. In other terms, the unlink command removes an entry from the file system by removing a link. The result of the unlinking can be the full removal (or deletion) of the underlying file if no further links exist to that i-node. In the above example, f still links to the i-node, so the i-node is not removed. An i-node without any links can be reclaimed for further use, like freeing memory.
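To see that the data survives until the last link is gone, here is a small sketch that removes the original name first. It assumes the unlink command is installed (it ships with GNU coreutils and the BSDs); the file contents and variable names are made up for illustration.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

echo "still here" > f
ln f hl
unlink f                     # remove the original name; hl still links to the i-node

links=$(ls -l hl | awk '{print $2}')
content=$(cat hl)
echo "link count of hl: $links, content: $content"

unlink hl                    # last link removed; the i-node can now be reclaimed
if [ -e hl ]; then gone=no; else gone=yes; fi
echo "hl removed: $gone"

cd / && rm -rf "$work"
```

Neither name is the "real" file: the data remains readable through hl after f is unlinked, and only disappears once hl is unlinked as well.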
2.4 Hard-linking across file systems
A hard link to a file block or i-node is specific to the mounted file system that holds it. That means you cannot create a hard link across mounted file systems, because the link may span different kinds of devices. If you think about it, this makes sense: the i-node also stores device-specific information for how to read from and write to that device, and that information differs from one file system to the next, so a link cannot span different mounted file systems.
We can see this for ourselves. Recall the mount table from running mount on a lab machine:
    #> mount
    /dev/sda1 on / type ext4 (rw,errors=remount-ro)
    proc on /proc type proc (rw,noexec,nosuid,nodev)
    sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
    none on /sys/fs/fuse/connections type fusectl (rw)
    none on /sys/kernel/debug type debugfs (rw)
    none on /sys/kernel/security type securityfs (rw)
    udev on /dev type devtmpfs (rw,mode=0755)
    devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
    tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
    none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
    none on /run/shm type tmpfs (rw,nosuid,nodev)
    binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
    rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
    gvfs-fuse-daemon on /var/lib/lightdm/.gvfs type fuse.gvfs-fuse-daemon (rw,nosuid,nodev,user=lightdm)
    mope.academy.usna.edu:/home/mids on /home/mids type nfs (rw,nfsvers=3,bg,addr=10.1.83.28)
    zee.cs.usna.edu:/courses on /courses type nfs (rw,bg,addr=10.1.83.18)
    zee.cs.usna.edu:/home/scs on /home/scs type nfs (rw,nfsvers=3,bg,addr=10.1.83.18)
Your home directories are mounted at /home/scs and /home/mids, which come from the network file servers zee and mope, but the root file system is a disk on device /dev/sda1. Let's see what happens when we try to create a hard link from our home directory to the /tmp directory on the root file system:
    #> ln f /tmp/f
    ln: failed to create hard link `/tmp/f' => `f': Invalid cross-device link
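If you want to predict this error before calling ln, one rough approach is to ask df which mount each path lives on. This is only a sketch: mount_of and can_hard_link are hypothetical helper names, and the parsing assumes mount points contain no spaces.

```shell
#!/bin/sh
# mount_of PATH prints the mount point of the file system holding PATH.
# With -P, df prints one header line and one data line whose last field
# is the mount point (assumption: the mount point contains no spaces).
mount_of() {
    df -P "$1" | awk 'NR == 2 {print $NF}'
}

# can_hard_link SRC DESTDIR succeeds when both paths live on the same mount.
can_hard_link() {
    [ "$(mount_of "$1")" = "$(mount_of "$2")" ]
}

work=$(mktemp -d)
touch "$work/f"
if can_hard_link "$work/f" "$work"; then same=yes; else same=no; fi
echo "f and its own directory share a mount: $same"
rm -rf "$work"
```

On the lab machine above, a check such as can_hard_link f /tmp would be expected to fail from a home directory, since /home/scs and / are different mounts.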
3 Symbolic Links
There is a second kind of linking for file systems that is more common for users. Symbolic linking is when one file links to another file, which in turn references the underlying file block.
With a hard link, both files are essentially the same and have the same i-node but different names. With a symbolic link, the two files have separate i-nodes, with one file referencing the other.
Symbolic links in Unix are a lot like "shortcuts" in Windows. While you can treat a shortcut like the original file by clicking on it and running it, it is not the original file; it is just a link to the original file. It is the same with symbolic links.
3.1 Creating Symbolic Links
You also create symbolic links with ln, but with the -s option:
    #> ln -s f sl
    #> ls -l
    -rw-r----- 1 aviv scs 0 Mar 23 18:14 f
    lrwxrwxrwx 1 aviv scs 1 Mar 23 18:44 sl -> f
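You can see sl links to f, and this is indicated in the ls output. The permissions shown for sl are not meaningful on their own, since access is decided by following the link. If f has a particular permission, so does sl. The size of 1 is the length of the path stored in the link, here the single character f.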
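A short sketch makes the contrast with hard links concrete: the symbolic link has its own i-node and simply stores a path. It assumes the readlink utility is available (it is on Linux and macOS); the variable names are illustrative.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

touch f
ln -s f sl

target=$(readlink sl)                    # the path stored inside the link
kind=$(ls -l sl | cut -c1)               # first character of the mode string
# -d keeps ls from following the link, so we see the link's own i-node.
ino_f=$(ls -di f | awk '{print $1}')
ino_sl=$(ls -di sl | awk '{print $1}')

echo "sl is type '$kind' and points to '$target'"
echo "i-node of f: $ino_f, i-node of sl: $ino_sl"

cd / && rm -rf "$work"
```

Unlike the hard-link case, the two i-node numbers differ, and the type character is l rather than -.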
3.2 Dangling Links
Since symbolic links do not need to reference an i-node, it's ok if the linked-to file does not exist. We can even remove the file that is being linked, and sl just sticks around as before:
    #> rm f
    rm: remove regular empty file `f'? y
    #> ls -l
    lrwxrwxrwx 1 aviv scs 1 Mar 23 18:44 sl -> f
    #> cat sl
    cat: sl: No such file or directory
But if we try to use sl and follow the link, we'll get an error since f does not exist. A symbolic link that links to a nonexistent file or directory is described as a dangling link or broken link.
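Dangling links can be detected from a script by combining two standard file tests. The helper name is_dangling below is hypothetical; the tests themselves (-L and -e) are part of the POSIX test utility.

```shell
#!/bin/sh
# is_dangling PATH succeeds when PATH is a symbolic link whose target is missing.
# -L tests the link itself; -e follows the link and tests the target.
is_dangling() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

work=$(mktemp -d)
cd "$work" || exit 1

touch f
ln -s f sl
if is_dangling sl; then before=dangling; else before=ok; fi

rm -f f                      # the link stays behind, now pointing at nothing
if is_dangling sl; then after=dangling; else after=ok; fi

echo "before rm: $before, after rm: $after"
cd / && rm -rf "$work"
```

The link itself never changes; only the existence of its target decides whether it is dangling.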
3.3 Symbolic Links across File Systems
Unlike a hard link, where the i-node must exist, a symbolic link can dangle. This means we can create symbolic links across file system mounts.
    #> ln -s f /tmp/sl
    #> ls -l /tmp/sl
    lrwxrwxrwx 1 aviv scs 1 Mar 23 18:51 /tmp/sl -> f
Think about why we can create symbolic links across file systems while we cannot create hard links. A symbolic link only references a file by its path; it has nothing to do with the underlying device and file storage, so symbolic links can cross device boundaries. Further, since a symbolic link is allowed to dangle, it can continue to exist even if the file system it points into is not mounted. Note that the link created above stores the relative path f, so following /tmp/sl looks for /tmp/f rather than the f in our home directory; the next section covers this distinction.
3.4 Relative vs. Absolute Symbolic Links
One thing to be careful about with symbolic links is whether the links are absolute or relative. An absolute symbolic link refers to a file by its absolute path, while a relative link uses a relative path from the location of the link. Let's look at a quick example. The setup is as follows:
    #> echo "Hello World" > a
    #> cat a
    Hello World
    #> mkdir dir1
    #> ls -l
    total 8
    -rw-r----- 1 aviv scs   12 Mar 25 08:59 a
    drwxr-x--- 2 aviv scs 4096 Mar 25 09:00 dir1
There is a file a with "Hello World" stored in it and a directory dir1. We want to create a symbolic link from within dir1 to a. Let's start by doing the obvious thing:
    #> ln -s a dir1/a
    #> ls -l dir1
    total 0
    lrwxrwxrwx 1 aviv scs 1 Mar 25 09:03 a -> a
This seemed to work. There is a symbolic link in dir1 called a which refers to a, but which a is that? Is it the one in the upper-level directory, the one with "Hello World" in it, or is it the one in its own directory? We can see by trying to follow the link:
    #> cat dir1/a
    cat: dir1/a: Too many levels of symbolic links
It is a self-referencing symbolic link, and that's because the link is relative. When we see a -> a, the link is relative to the location of the symbolic link itself; that is, dir1/a refers to itself. What we need is for dir1/a to refer to ../a, the a in the upper-level directory with "Hello World."
To create a proper relative link we have to specify the right path in relation to the location of the link.
    #> ln -s ../a dir1/a
    #> ls -l dir1
    total 0
    lrwxrwxrwx 1 aviv scs 4 Mar 25 09:07 a -> ../a
Now, we can use that link properly:
    #> cat dir1/a
    Hello World
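The whole experiment can be replayed as a script. This is a sketch in a scratch directory rather than the lecture's actual working directory; the variables first and second are illustrative.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

echo "Hello World" > a
mkdir dir1

ln -s a dir1/a               # stored path "a" is resolved inside dir1: a loop
if cat dir1/a 2>/dev/null; then first=ok; else first=failed; fi

rm -f dir1/a                 # rm removes the link itself, not its target
ln -s ../a dir1/a            # stored path now climbs out of dir1 to reach a
second=$(cat dir1/a)

echo "first attempt: $first"
echo "second attempt: $second"
cd / && rm -rf "$work"
```

The first cat fails because the kernel keeps resolving a to dir1/a; the second succeeds because ../a is interpreted from dir1, the directory that contains the link.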
An easy trick to get your relative paths right when creating symbolic links is to always change into the directory where you wish the link to exist.
    #> cd dir1/
    #> ln -s ../a b
    #> ls -l
    total 0
    lrwxrwxrwx 1 aviv scs 4 Mar 25 09:07 a -> ../a
    lrwxrwxrwx 1 aviv scs 4 Mar 25 09:12 b -> ../a
Now it's a bit more direct that you are linking ../a to b, rather than having to envision the relative path from within the file system.
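The practical difference between the two kinds of link shows up when a link is moved. The sketch below is illustrative only: the names rel, abs, and dir2/deep are made up, and it assumes mktemp -d returns an absolute path, as it does by default.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

echo "Hello World" > a
mkdir -p dir1 dir2/deep

ln -s ../a dir1/rel              # relative: resolved from the link's own directory
ln -s "$work/a" dir1/abs         # absolute: resolved from the root

mv dir1/rel dir1/abs dir2/deep/  # mv renames the links; it does not follow them

if cat dir2/deep/rel >/dev/null 2>&1; then rel=ok; else rel=broken; fi
if cat dir2/deep/abs >/dev/null 2>&1; then abs=ok; else abs=broken; fi

echo "relative link after move: $rel"
echo "absolute link after move: $abs"
cd / && rm -rf "$work"
```

After the move, ../a is interpreted from dir2/deep and points at dir2/a, which does not exist, while the absolute link still reaches the original file. The trade-off runs the other way if the whole directory tree is relocated: relative links inside it keep working, and absolute ones break.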