28 File – System Interface – I

Mary Anitha Rajam

 

31.1  Introduction

 

The operating system defines a logical storage unit called a file. Information is stored on different nonvolatile storage media such as hard disks and pen drives. On a disk, data are stored in small units called disk blocks; that is, the disk is logically divided into blocks that hold the data. Files are stored in these disk blocks, and the operating system maps files onto the physical devices. The user need not be aware of the disk blocks in which information is stored. It is enough for users to understand information in terms of files.

 

In this module we learn the uses of files and file systems, the different attributes and types of files, and the operations performed on files. We also discuss different file access methods.

 

31.2  File

 

A file is a named collection of related information that is recorded on secondary storage. From the user's point of view, the file is the smallest allotment of logical secondary storage; that is, data cannot be written to secondary storage unless they are within a file. A file may represent programs or data. That is, a file may be a program file or a data file. Program files can be source programs, object programs and so on. Data files can hold numeric data, alphabetic data, alphanumeric data, binary data and so on.

 

A file has a defined structure depending on the type of the file. For example, the text file is a sequence of characters organized into lines. A source file has a source program that has a sequence of subroutines and functions, organized as declarations followed by executable statements. An object file has a sequence of bytes organized into blocks understandable by the linker. An executable file has a series of code sections that the loader can bring into the memory and execute.

 

31.3  File System

 

The file system consists of a collection of files. When there are a number of files kept in the secondary memory, it is better to keep the files organized. For example, similar types of files can be grouped and the group can be given a name. This group is called a directory. Many directories can be grouped under another directory and so on. Thus, the directory structure organizes and provides information about all the files in the system. This forms the file system. To physically or logically separate large collections of directories, partitions are maintained. Each partition can have a different file system.

 

31.4  File Attributes 

 

Each file has a number of attributes that describe it. The most common attributes are the name, type, location, size, protection bits, time, date and user identification.

 

Name is the only information kept in human-readable form. This is the name given to identify the file.

Type is the type of the file and is needed for systems that support different types. For example, the different types of files are text files, image files and so on.

Location points to the location of the file on the disk, that is, the disk blocks where the contents of the file are kept.

Size refers to the current file size.

Protection bits control who can read, write or execute the file. Some users may read, some may write, some may execute, and some may have a combination of these permissions.

Time, date, and user identification

– Information kept for creation, last modification and last use

– Data useful for protection, security, and usage monitoring.

 

Information about the attributes of the files is kept in the directory structure, which is maintained on the disk.
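On a UNIX-like system, most of the attributes listed above can be inspected through the stat system call. The following is a minimal sketch in Python, assuming a POSIX-style system; the helper name describe and the throwaway file are illustrative only. The disk location of the file is not exposed at this level, so it is not shown.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return a dict of common attributes for the file at `path`."""
    st = os.stat(path)  # wraps the stat() system call
    return {
        "name": os.path.basename(path),
        "size": st.st_size,                       # current size in bytes
        "protection": stat.filemode(st.st_mode),  # e.g. '-rw-------'
        "owner_uid": st.st_uid,                   # user identification
        "modified": time.ctime(st.st_mtime),      # time of last modification
        "accessed": time.ctime(st.st_atime),      # time of last use
    }


# Illustrative usage on a throwaway file (hypothetical contents).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    demo_path = f.name

attrs = describe(demo_path)
print(attrs)
os.remove(demo_path)
```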

 

31.5  File Operations

 

The operating system provides a number of system calls to create, write, read, reposition, delete and truncate files. We now see how each of these operations is carried out.

 

31.5.1  Create

 

This operation creates a new file. First, it is necessary to find if there is space for the file in the disk. Then, an entry is made for the new file in the directory. The directory entry stores information about the file like the name of the file, location of the file in disk and so on.

 

31.5.2  Write

 

This operation is used to write contents into a file. The system searches the directory and finds the location of the file. Using this location, contents can be written into the file. The system keeps a write pointer to the location in the file where the next write has to take place. The write pointer is updated after a write occurs.

 

31.5.2  Read

 

The read operation is used to read the contents of a file. The system call used for reading specifies the name of the file and where (in memory) the next block of the file should be placed. The system keeps a read pointer to the location in the file where the next read is to take place, and updates it after each read. For a particular file, the file position pointer need not be the same for all processes that access the file; a separate file position pointer is kept for each process that has the file open.

 

31.5.3  File seek

 

This operation is used to reposition the file pointer within the file. Whenever a read or a write operation is done, the read or write is done on the location which is pointed to by the file pointer. This value of the file pointer or the current-file-position can be modified using the seek operation. The system call used for seek searches the directory for the appropriate entry. The current-file-position maintained in the directory structure is set to the given value. This operation does not access the contents of the file kept in the disk and hence, no I/O is needed. The value of the file pointer alone is changed in the directory structure.
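On UNIX-like systems the seek operation is available as the lseek system call. The sketch below, written in Python under the assumption of a POSIX-style system, repositions the file pointer and then reads from the new position; the scratch file and its contents are made up for illustration.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()        # create and open a scratch file
os.write(fd, b"0123456789")          # the file pointer is now at offset 10

os.lseek(fd, 3, os.SEEK_SET)         # reposition only; no file data is transferred
chunk = os.read(fd, 4)               # the read starts at offset 3
position = os.lseek(fd, 0, os.SEEK_CUR)  # query the current file position

os.close(fd)
os.remove(path)
print(chunk, position)
```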

 

31.5.4   Delete

 

This operation is used to delete a file. The name of the file to be deleted is provided in the system call. The file is searched in the directory. The space allocated for the file in the disk is released. The directory entry created for the file is erased.

 

31.5.5  Truncate

 

This operation erases the contents of a file but keeps the file itself and its attributes. The file is not deleted. The file length is reset to zero so that new contents can be written, and the space allocated to the file in the disk is released.
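As a small illustration, assuming a POSIX-style system and a scratch file created only for this example, truncation can be observed through the ftruncate call: the length drops to zero while the directory entry survives.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"some old contents")
size_before = os.fstat(fd).st_size    # length before truncation

os.ftruncate(fd, 0)                   # reset the file length to zero
size_after = os.fstat(fd).st_size     # length after truncation
still_exists = os.path.exists(path)   # the file itself is not deleted

os.close(fd)
os.remove(path)
print(size_before, size_after, still_exists)
```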

 

31.5.6  Append

 

This operation is used to add new information to the end of the file. The current file position is moved to the end of the file and the contents to be added are written from that position.
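On UNIX-like systems, one way to obtain this behaviour is to open the file with the O_APPEND flag, which positions every write at the current end of the file. The following Python sketch assumes a POSIX-style system; the file contents are illustrative.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"first line\n")
os.close(fd)

# O_APPEND moves the file position to the end before each write.
fd = os.open(path, os.O_WRONLY | os.O_APPEND)
os.write(fd, b"second line\n")
os.close(fd)

with open(path, "rb") as f:
    contents = f.read()
os.remove(path)
print(contents)
```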

 

31.5.7  Rename

 

The name of the file is changed to the new name provided in the system call. For this, the directory entry is modified. The old name is removed and the new name is entered.

 

31.5.8  Copy

 

This operation is used to make a copy of an existing file. The name of the old file and the name of the new file (copy) to be created are provided through the system call. A new file is created, contents of the old file are read and written to the new file.
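The copy operation can be built from create, read and write. The sketch below is one possible way to do this in Python on a POSIX-style system; the function name copy_file, the block size and the file names old.bin and new.bin are assumptions made for the example, not part of any particular operating system interface.

```python
import os
import tempfile


def copy_file(src, dst, block_size=4096):
    """Copy `src` to `dst` block by block; return the number of bytes copied."""
    total = 0
    in_fd = os.open(src, os.O_RDONLY)
    out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while True:
            block = os.read(in_fd, block_size)
            if not block:                 # an empty read means end of file
                break
            view = memoryview(block)
            while view:                   # os.write may write fewer bytes than asked
                written = os.write(out_fd, view)
                view = view[written:]
                total += written
    finally:
        os.close(in_fd)
        os.close(out_fd)
    return total


# Illustrative usage with throwaway files.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "old.bin")
dst = os.path.join(workdir, "new.bin")
with open(src, "wb") as f:
    f.write(b"x" * 10000)

copied = copy_file(src, dst)
with open(dst, "rb") as f:
    same = f.read() == b"x" * 10000

os.remove(src)
os.remove(dst)
os.rmdir(workdir)
print(copied, same)
```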

 

31.5.9  Get and Set attributes

 

There are a number of attributes for a file like the owner of the file, size of the file, time of access and so on. There are system calls to get/set the attributes of a file.
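For instance, on a UNIX-like system the protection bits and timestamps can be set with chmod and utime and read back with stat. This is a minimal Python sketch under that assumption; the permission value 0o640 and the timestamp are arbitrary example values.

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

os.chmod(path, 0o640)                        # set attribute: protection bits
mode = stat.S_IMODE(os.stat(path).st_mode)   # get attribute: protection bits

os.utime(path, (1_000_000_000, 1_000_000_000))  # set access and modification times
mtime = int(os.stat(path).st_mtime)             # get the modification time

os.remove(path)
print(oct(mode), mtime)
```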

 

File operations need searching in the directory. Operations like read and write need searching the directory for the name of the file from which read/write operation has to be done. To reduce searching a directory each and every time, say, for each read and for each write, the following sequence of operations can be performed.

 

The file is first opened before performing any other operation and closed after all the operations are completed. The open system call is used before any other operation to open the file. This creates an entry in the open-file table maintained by the operating system. When a file operation like read or write is requested, the file is accessed using an index into this table. Therefore, searching the directory structure for each and every operation is not needed. It is enough to use the open-file table. When the file is not used any more, the file is closed.

 

A file can be opened using the open system call or it is implicitly opened during the first reference to the file. A file is automatically closed when the process that opened the file terminates or when the close system call is called.

 

The syntax of the open system call is given as open(filename, access-mode), where access-mode can be read-only, read-write, append-only, …

 

The open system call searches the directory for the filename given as argument and copies the directory entry into the open-file table. It returns a pointer to that entry in the open-file table. For further file operations, the returned pointer is used rather than the file name. In multiuser systems, many users can open the same file at the same time. In UNIX, a per-process user file descriptor table and a system-wide file table are used. The per-process user file descriptor table is maintained for each process. The file table is used by all the processes. For each file opened by a process, an entry is created in the user file descriptor table. For each file open in the system, there is an entry in the file table.
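The open, use and close sequence can be seen with the UNIX-style calls exposed by Python's os module. In this sketch, which assumes a POSIX-style system and a scratch file created for the example, the file name is used only once, at open; every later call refers to the file through the returned descriptor.

```python
import os
import tempfile

fd0, path = tempfile.mkstemp()
os.write(fd0, b"abcdef")
os.close(fd0)

fd = os.open(path, os.O_RDONLY)   # the name is looked up once, here
first = os.read(fd, 3)            # later calls name the file by descriptor only
second = os.read(fd, 3)           # continues from the saved file position
os.close(fd)                      # releases the table entry for this open

os.remove(path)
print(fd, first, second)
```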

 

Figure 31.1 shows the tables maintained by the operating system to assist in file operations. There are two processes A and B. Processes A and B have their respective user file descriptor (UFD) tables. The file table is a system-wide table common to all processes in the system. In UNIX, each file has a data structure called an inode. The inode has the details about the attributes of the file.

Fig. 31.1 Operating system tables used for file operations

 

Suppose process A opens a file (Figure 31.2). An entry is made in the user file descriptor table of process A. This points to an entry created in the file table. The file table entry points to an entry in the inode table corresponding to the file opened. The inode also has details about the location of the file in the disk. That is, the inode has the addresses of the disk blocks where the contents of the file are kept in the disk.

Fig.31.2 Process A opens a file

 

Suppose process B also opens the same file. Since process B opens a file, an entry is created in the user file descriptor table of process B (Figure 31.3). A new entry is created in the file table corresponding to this open. The entry in the user file descriptor table of process B points to this file table entry. The file table entry points to the same entry in the inode table. The inode table has only one entry for a particular file. Since process A and process B have opened the same file, both the processes use the same inode table entry.

 

Fig.31.3 Process B opens the same file as that opened by process A

A count called the open count is maintained in the inode table. This count refers to the number of processes that have opened the file. In Figure 31.4, the count in the inode table is 2 because two processes have opened the file corresponding to that inode. The file table maintains details about the mode in which the file was opened (say, read-only or read-write or write-only). The file table entry also has a read/write pointer, that is, the file pointer that locates the position from where the next read/write should be done. We see that each process has its own read/write pointer.

Fig.31.4 Entries in file table and inode table when two processes open the same file
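The effect of separate file table entries can be observed directly. In the sketch below, two independent opens of the same file within one Python process stand in for processes A and B (an assumption made to keep the example self-contained on a POSIX-style system). Each open gets its own read/write pointer, while both refer to the same inode.

```python
import os
import tempfile

fd0, path = tempfile.mkstemp()
os.write(fd0, b"ABCDEFGH")
os.close(fd0)

fd_a = os.open(path, os.O_RDONLY)   # plays the role of process A's open
fd_b = os.open(path, os.O_RDONLY)   # plays the role of process B's open

a1 = os.read(fd_a, 4)   # A's pointer moves from 0 to 4
b1 = os.read(fd_b, 2)   # B's pointer is independent and starts at 0
a2 = os.read(fd_a, 2)   # A continues from offset 4

# Both descriptors lead to the same inode.
same_inode = os.fstat(fd_a).st_ino == os.fstat(fd_b).st_ino

os.close(fd_a)
os.close(fd_b)
os.remove(path)
print(a1, b1, a2, same_inode)
```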

 

Suppose process B closes the file (Figure 31.5). Each close decrements the count in the inode table. We see that the count in the inode table is decremented from 2 to 1. The file table entry is removed. The user file descriptor table entry is also removed.

 

When process A closes the file (Figure 31.6), the count in the inode table is decremented again. Since the count becomes zero, the inode table entry is removed. The file table entry and the user file descriptor table entry are also removed.

Fig.31.5 Process B closes the file

Fig.31.6 Process A also closes the file
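The bookkeeping described for Figures 31.2 to 31.6 can be imitated with a small toy model. The class name ToyKernel, its field names and the inode number 7 are hypothetical and chosen only for illustration; a real kernel keeps far more state in each table.

```python
class ToyKernel:
    """Toy model of the user file descriptor, file and inode tables."""

    def __init__(self):
        self.inode_table = {}   # inode number -> {"count": open count}
        self.file_table = {}    # file-table index -> mode, offset, inode number
        self.ufd = {}           # process name -> {descriptor: file-table index}
        self._next_ft = 0

    def open(self, proc, inode_no, mode="read-only"):
        inode = self.inode_table.setdefault(inode_no, {"count": 0})
        inode["count"] += 1                      # one more open of this file
        ft = self._next_ft
        self._next_ft += 1
        self.file_table[ft] = {"inode": inode_no, "mode": mode, "offset": 0}
        fds = self.ufd.setdefault(proc, {})
        fd = 0
        while fd in fds:                         # lowest free descriptor
            fd += 1
        fds[fd] = ft
        return fd

    def close(self, proc, fd):
        ft = self.ufd[proc].pop(fd)              # remove the descriptor entry
        entry = self.file_table.pop(ft)          # remove the file table entry
        inode = self.inode_table[entry["inode"]]
        inode["count"] -= 1
        if inode["count"] == 0:                  # last close: drop the inode entry
            del self.inode_table[entry["inode"]]


k = ToyKernel()
fd_a = k.open("A", 7)
fd_b = k.open("B", 7)
count_after_opens = k.inode_table[7]["count"]
entries_after_opens = len(k.file_table)

k.close("B", fd_b)
count_after_b_close = k.inode_table[7]["count"]

k.close("A", fd_a)
inode_present = 7 in k.inode_table
print(count_after_opens, entries_after_opens, count_after_b_close, inode_present)
```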

 

31.6  File Types – Name, Extension

 

Different file types are supported by operating systems. Each file type can have different extensions. Figure 31.7 shows examples of different file types and different extensions for each file type. For example a source code file can have the extensions .c or .cc or .java and so on.

Fig.31.7 Examples of file types and extensions (Source: [1]) 

 

31.7  File Access Methods

 

Information stored in files must be accessed. There are different methods to access files. Some systems support only one method. Some systems support multiple methods and the right one is chosen based on the application. The different methods of file access are sequential access, direct access and indexed sequential access. We learn each of these methods in this section one after the other.

 

31.7.1  Sequential Access

 

Sequential access is the simplest and the most common file access method. In this method, the contents of a file are accessed in order, one record after another. The possible operations are:

read next – read the next record

write next – write the next record

reset – reset the file pointer to the initial record

Fig.31.8 Sequential access (Source: [1])

 

Figure 31.8 shows the possible operations using the sequential access method. The current file position is shown. From the current file position, it is possible to read the next record, write the next record, or rewind to the beginning of the file. Using the sequential access method, it is not possible to access an arbitrary disk block at random.
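The three sequential operations can be sketched as a small class. This is an illustrative model only: the class name SequentialFile, the in-memory buffer standing in for the disk and the fixed record size of 4 bytes are all assumptions made for the example.

```python
import io

RECORD_SIZE = 4  # assumed fixed record length in bytes


class SequentialFile:
    """Model of a file that supports only read next, write next and reset."""

    def __init__(self, data=b""):
        self._f = io.BytesIO(data)   # in-memory stand-in for the file

    def read_next(self):
        """Return the next record, or None at end of file."""
        rec = self._f.read(RECORD_SIZE)
        return rec if len(rec) == RECORD_SIZE else None

    def write_next(self, record):
        """Write one record at the current position and advance."""
        if len(record) != RECORD_SIZE:
            raise ValueError("record has the wrong size")
        self._f.write(record)

    def reset(self):
        """Move the file pointer back to the first record."""
        self._f.seek(0)


sf = SequentialFile()
sf.write_next(b"AAAA")
sf.write_next(b"BBBB")
sf.reset()
records = [sf.read_next(), sf.read_next(), sf.read_next()]
print(records)
```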

 

31.7.2  Direct Access 

 

The direct access method is based on the disk model of the file. File contents are stored in disk blocks in the disk. Disks allow access to any random block. Direct access also allows arbitrary blocks to be read and written.

 

With direct access, the following operations are possible:

 

read n – read the nth block

write n – write to the nth block

 

position to n – move the pointer to the nth block

read next – read the next block after moving to the nth block

write next – write to the next block after moving to the nth block

rewrite n – rewrite the nth block

 

where n is the block number relative to the beginning of the file.

 

It is possible to implement sequential access using the operations used for direct access. Figure 31.9 shows the sequential access operations and the corresponding operations in the direct access method. Thus, it is easy to simulate sequential access on a direct-access file. Simulating direct access on a sequential-access file, however, is extremely inefficient and clumsy.

Fig.31.9 Simulation of Sequential Access on a Direct-access File (Source: [1])
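The simulation can be sketched in code by keeping a current position cp on top of the direct operations read n and write n. The class names DirectFile and SequentialOnDirect, the block size of 4 bytes, the in-memory buffer and the choice to number blocks from 0 are assumptions made for this example.

```python
import io

BLOCK_SIZE = 4  # assumed block size in bytes


class DirectFile:
    """Model of a direct-access file: any block can be read or written."""

    def __init__(self, n_blocks):
        self._f = io.BytesIO(bytes(BLOCK_SIZE * n_blocks))  # zero-filled blocks

    def read(self, n):
        self._f.seek(n * BLOCK_SIZE)      # n is relative to the start of the file
        return self._f.read(BLOCK_SIZE)

    def write(self, n, block):
        if len(block) != BLOCK_SIZE:
            raise ValueError("block has the wrong size")
        self._f.seek(n * BLOCK_SIZE)
        self._f.write(block)


class SequentialOnDirect:
    """Sequential operations simulated with a current position cp."""

    def __init__(self, direct):
        self.direct = direct
        self.cp = 0

    def reset(self):
        self.cp = 0                        # reset: cp = 0

    def read_next(self):
        block = self.direct.read(self.cp)  # read next: read cp; cp = cp + 1
        self.cp += 1
        return block

    def write_next(self, block):
        self.direct.write(self.cp, block)  # write next: write cp; cp = cp + 1
        self.cp += 1


d = DirectFile(3)
d.write(2, b"CCCC")                # direct access: jump straight to block 2
seq = SequentialOnDirect(d)
seq.write_next(b"AAAA")
seq.write_next(b"BBBB")
seq.reset()
blocks = [seq.read_next(), seq.read_next(), seq.read_next()]
print(blocks)
```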

 

31.7.3  Other Access Methods

 

In another access method, an index is built which contains pointers to the various blocks of the file. To find a record, the index is searched first, and the pointer found there is used to access the file directly and locate the desired record. Searching a small index is faster than scanning the file itself. When the file becomes large, the index file also becomes large. Therefore, to keep access fast, an index can be built for the index file: the primary index file has pointers to secondary index files, and the secondary index files point to the blocks of the file. Figure 31.10 shows an example of how an index file is used.

Fig.31.10 Example of Index and Relative Files (Source: [1])
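A single-level index lookup can be sketched as follows. The record keys, the block contents and the function name find are invented for illustration, and the index here holds the first key of each block, which is one common arrangement rather than the only one.

```python
import bisect

# Data blocks of the file, each holding records in sorted key order
# (the names are made-up sample data).
blocks = [
    ["Adams", "Arthur", "Asher"],
    ["Baker", "Bond"],
    ["Smith", "Stone", "Young"],
]

# The index holds the first key of every block; position i points to block i.
index_keys = [block[0] for block in blocks]


def find(key):
    """Return (block number, position in block) for `key`, or None."""
    i = bisect.bisect_right(index_keys, key) - 1   # search the index first
    if i < 0:
        return None                                # key sorts before every block
    block = blocks[i]                              # then access a single block
    return (i, block.index(key)) if key in block else None


print(find("Bond"), find("Stone"), find("Carter"))
```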

31.8 Summary

 

A file is the basic unit of storage in the disk. In this module, we learnt different file attributes, file operations, file types and different file access methods.

 

 

References

  1. Abraham Silberschatz, Peter B. Galvin, Greg Gagne, “Operating System Concepts”, Sixth Edition, John Wiley & Sons Inc., 2003.
  2. Andrew S. Tanenbaum, Herbert Bos, “Modern Operating Systems”, Fourth Edition, Pearson Education, 2014.
  3. Gary Nutt, “Operating Systems”, Third Edition, Pearson Education, 2009.
  4. Maurice J. Bach, “The Design of the UNIX Operating System”, Prentice Hall, 1986.