29 File – System Interface – II
Mary Anitha Rajam
32.1 Introduction
Systems stores millions of files on disk. Therefore, it is necessary to organize the disk to manage this large number of files. In this module, we learn the structure of a disk and we discuss issues related to file-system design.
32.2 Disk Structure
The disk which can be in the size of terabytes can be subdivided into partitions. A disk or partition can be used raw – without a file system, or can be formatted with a file system. The partitions of the disk are also known as minidisks or slices. Each disk has at least one partition. Partitions can store multiple operating systems. That is, each partition can have a different operating system. Each entity containing a file system is known as a volume. Each volume containing a file system also needs to track that file system’s information. In each volume, this information is maintained in a device directory or volume table of contents. The device directory records information such as name, location, size, type for all files in that volume.
Fig. 32.1 A Typical File-system Organization
Figure 32.1 shows the organization of a typical file-system. Disk1 has two partitions. Partition A has a file system and a corresponding device directory. Partition B has another file system and a corresponding device directory. It is also possible for a partition to cover two disks. Partition C has a file system that is kept in disk2 and disk3.
In the following section, we learn the different ways in which the directory structure can be maintained.
32.3 Directory Structure
A directory structure is a collection of nodes containing information about all files that are kept in the disk. Both the directory structure and the files reside on the disk. The backups of these two structures are kept on tapes. Figure 32.2 shows a few directories and files under the directories.
Fig.32.2 Directories and Files
32.4 Operations Performed on Directory
Similar to how operations can be performed on files, there are operations that can be performed on directories. Some of the operations that can be performed on directories are given below:
• Search for a file – A directory has information about all the files present in the directory. To search for a file, it is necessary to search the directory.
• Create a file – To create a file, an entry is created in the directory. For this, it is necessary to write into the directory.
• Delete a file – To delete a file, it is necessary to remove the name of the file and all other details about the file from the directory. This again needs write permissions in the directory.
• List a directory – For listing the contents of a directory, it is necessary to read from the directory.
• Rename a file – To change the name of the file, it is necessary to read, write and search in the directory. Note that renaming a file may change the position of the file name in the directory.
• Traverse the file system – This also needs reading and searching operations on the directory.
32.5 Organizing the Directory
It is necessary to organize a directory logically so that the following are achieved:
• Efficiency – locating a file quickly. The directory should be organized such that files can be located quickly when searching.
• Naming – convenient to users. The names of the files cannot be arbitrary names and must be easier for the users to remember the names. Two users cannot have the same name for different files. The same file can have several different names.
• Grouping – Based on the properties of files, it is necessary to logically group the files. For example, all Java programs may have to be put under one directory, all games under another directory and so on.
32.6 Logical Structure of directory
In this section, the most common schemes used for defining the logical structure of a directory are explained.
32.6.1 Single-Level Directory
In this scheme, a single directory is kept for all the users. This is the simplest scheme. But there are limitations in this scheme. Since there is only one directory, it becomes difficult to manage the files when the number of files increases. When the system has more than one user, it becomes difficult for different users to assign unique names to files. Even a single-user finds it difficult to remember the names of his/her files because all files are kept in the same directory. Figure 32.3 shows an example of a single-level directory. The files cat, bo, a, …, records are kept in a single directory.
Fig.32.3 Single-level directory
32.6.2 Two-Level Directory
In a two-level directory, there is a master file directory (MFD) and under this directory, a separate directory is assigned for each user (Figure 32.4). This scheme has more advantages compared to the single-level directory. Different users can use the same file name. Creation and deletion of files are confined to the user’s user file directory (UFD). Therefore, naming becomes easier in this scheme. But the limitation is that users can’t cooperate and access one another’s files. To access another user’s file, the path name of that file should be specified. The path name of a file is specified using the name of the MFD (root) followed by the name of UFD and the name of file.
Another limitation in this system is the placing of system files like loaders, assemblers and compilers. These files are needed by all the users. If these files are copied to each UFD, it is waste of space. Therefore, a special user directory can be created to contain the system files.
For accessing files that are not kept in the current directory, a search path can be defined. By default, files are first searched in the current directory. If the files are not available in the current directory, the files are searched in the directories specified in the search path. The system files can also be searched in this manner.
Fig.32.4 Two-level directory
32.6.3 Tree-Structured Directories
Fig.32.5 Tree-Structured directory
The two-level directory was extended to have multiple levels and the tree-structured directory was formed. The tree is the most common directory structure. Figure 32.5 shows an example of a tree-structured directory. The directory is treated as a special kind of file. Under a directory, there can be files as well as other subdirectories. A bit is used to differentiate between a file and a directory present in the same level. The tree has a root directory and beneath the root directory, there are a number of subdirectories. System calls are available for creating and deleting directories. Every file has a unique path name.
Each user has a current directory (working directory). Whenever a file is to be searched, the current directory is first searched. The current directory can be changed from one directory to another using the cd command or using the chdir system call. For example cd /spell/mail/prog changes the current directory to the directory prog. After changing the current directory to prog, the default search for any file will be done in the directory prog. When a parent process creates a child process, the child process inherits the current directory from the parent process.
The introduction of tree-structured directories introduced two types of path names called the absolute path name and the relative path name. If the path name starts from the root, it is called as absolute path name. If the path name starts from the current directory, it is called as relative path name. For example, if root/spell/mail is the current directory, prt/first is the relative path name and absolute path name is root/spell/mail/prt/first.
Whenever a file is newly created, it is created in the current directory. A new subdirectory is also created in the current directory. The command used for creating a subdirectory is mkdir <dir-name> where dir-name is the name of the subdirectory that is created. For example, if in current directory /mail, a subdirectory called count is to be created, then the command mkdir count is used. Figure 32.6 shows that a subdirectory count is created beneath the directory mail.
Fig.32.6 Example of creation of sub-directory
We now see how deletion is handled in a tree-structured directory. To delete a file, the command rm <file-name> is used, where file-name is the name of the file to be deleted. A directory can be deleted if it is empty. If the directory is not empty, what happens to the files and subdirectories beneath this directory? Different operating systems handle this in a different manner. Some operating systems do not delete directories unless the directories are empty. Some operating systems delete the directory and all the files and subdirectories underneath it.
In a tree-structured directory, the files of other users are accessed by specifying path names. A path name may be an absolute pathname or a relative pathname. It is also possible to change the current directory to other user’s directory, provided permissions are given. Users are allowed to define their own search paths. That is, whenever a file is searched, the users can specify the sequence of directories in which search should take place. For example, local directory first, system file directory next and then other user’s directory. The tree-structured directory prohibits the sharing of files and directories among different users.
32.6.4 Acyclic-Graph Directories
To overcome the disadvantage of tree-structured directories, acyclic-graph directories were formed which have shared subdirectories and files. Figure 32.7 shows that file count is shared by the directory spell and the directory dict. Therefore, the file count has two different path names, /dict/count and /spell/count. This sharing of files is helpful when there is more than person working on a project and have to access a common file. This is not the same as having two copies of the same file.
Sharing of files can be implemented in two ways. One way is to duplicate all the information in both the directories. But when information is duplicated, it is necessary to take care of the consistency of data. The other way is using links. Links can be hard links or soft links.
Fig.32.7 Acyclic-graph structured directory
In hard links, both the path names that refer to the same file use the same inode number and the link count of the file is made 2. To access the shared file, the inode is used. In soft links, there is an original file and a link file. The original file is the one that is first created and the link file is the one that is created next. The content of the link file is the pathname of the original file. When the link file is used to access the file, the link file is opened, the contents (the path name of the original file) are taken, the original file is opened and the file contents are accessed.
It is necessary to take special care during the deletion of shared files. When a file is deleted the file is removed.
In the case of symbolic links, if the original file is removed first, it may result in dangling pointers. The link file’s contents has the pathname of the original file but the original file does not exist. But, in symbolic links, if the link file is removed, there is no harm.
In the case of hard links, a link count is maintained. The link count gives the number of links to a file. Whenever a file is deleted, the link count is decremented. The file is preserved until all the references to the file are deleted (that is, till the link count goes down to zero).
Since there are no cycles in an acyclic graph directory, the algorithms required to traverse the graph are simple. But the limitation is that it is necessary to ensure that there are no cycles.
32.5.5 General Graph Directory
The general graph directory is similar to the tree-structured directory except that cycles are allowed in the structure. (A cycle is path of edges and vertices wherein a vertex is reachable from itself. Here, vertices denote directories or files and there is an edge between a file/subdirectory and a directory if the directory is the parent directory of the file/directory). Figure
32.6 Shows an example of a general graph directory.
But if cycles are allowed to exist in directories, there is a possibility of searching a subdirectory twice, that is, it is possible to go from a parent directory to a subdirectory and from the subdirectory to the parent directory and so on. This may result in an infinite loop. To avoid this situation, one way is to limit the number of directories accessed during a search (say, only 16 during a search). The second way is to allow only links to file and not to subdirectories.
Let us now see how deletion of a file is handled in a general graph directory. Since cycles are allowed, if there is a self-referencing directory, there is a possibility to have a non-zero
Fig.32.8 General graph directory
reference count even after deletion. To overcome this, garbage collection is to be done. During garbage collection, the file system is traversed and everything that can be accessed is marked in the first pass. In second pass, everything that is not marked is collected onto a list of free space. But this method is extremely time-consuming for a disk-based file system.
Acyclic graphs are easier to work with. Every time a new link is added, it is better to use a cycle detection algorithm to ensure that there are no cycles.
32.6 Summary
In this module, we learnt the structure of a disk with partitions, volumes, directories and so on. We also learnt the structure of a directory. We understood the design of different logical structures of directories like single-level directory, two-level directory, tree-structured directory, acyclic graph directory and general graph directory.
References
- Abraham Silberschatz, Peter B. Galvin, Greg Gagne, “Operating System Concepts”, Sixth Edition, John Wiley & Sons Inc., 2003.
- Andrew S. Tanenbaum, Herbert Bos, “Modern Operating Systems”, Fourth Edition, Pearson Education, 2014.
- Gary Nutt, “Operating Systems”, Third Edition, Pearson Education, 2009.