32 File Allocation Methods
Mary Anitha Rajam
35.1 Introduction
File system is the most visible part of the operating system. Files are stored in disks. Whenever files are created or appended, space for the files has to be allocated in disks. For this, the operating system should keep track of which disk blocks are free and which disk blocks are allocated. In this module, we learn how disk blocks are allocated to files (file allocation methods) and how the operating system keeps track of free and allocated disk blocks (free space management).
35.2 Allocation Methods
An allocation method refers to how disk blocks are allocated for a file. The size of disk bocks is fixed. Whenever a new file is created or when the size of a file increases because of writing to the file, disk blocks are allocated for the file. Disk space should be allocated to files such that the disk space is utilized effectively and files can be accessed quickly. There are three different methods of allocation: Contiguous allocation, Linked allocation and Indexed allocation. In this section we look at how each of these methods is used for allocation, their advantages and disadvantages.
35.2.1 Contiguous Allocation
In contiguous allocation, each file occupies a set of contiguous blocks on the disk. If only one job is accessing the disk, accessing block b+1 after block b does not require any head movement. It is enough to move the next sector under the current head position. Therefore, disk seeks are minimal. This method is simple to implement. It is enough to remember only starting location (starting disk block number) and the length of the file (number of blocks in the file).
Fig. 35.1 Contiguous allocation (Source: [1])
To know the location of a file in the disk, it is necessary to look at the entry for the file in the directory. The directory entry has the name of the file and the length of the file. Figure 35.1 shows an example of how files are allocated contiguously. In the example shown, there is a file named count which starts from disk block number 0 and the length of the file is 2 blocks. That is, disk block number 0 and disk block number 1 are allocated for the file.
Similarly, the file named mail starts from disk block number 19 and the length is 6 disk blocks. Therefore, disk blocks 19 to 24 are allocated for the file. Both sequential access and direct access are possible when files are contiguously allocated. In the example shown in Figure 35.1, consider the file mail. To access the file sequentially, the starting disk block number (19) is taken from the directory entry and the subsequent disk blocks are accessed one after the other. For direct access, the first disk block number is taken from the directory entry and the nth disk block is calculated by adding n to the first disk block number. For example, it is easy to calculate that the third disk block of the file is 21.
In contiguous allocation, the dyamic storage-allocation problem has to be sorted out. Suppose there is a list of free holes and it is necessary to satisfy a request of size n from this list of free holes. It is possible to use first fit or best fit allocation strategy. In first fit allocation, the first hole that is large enough to accommodate n blocks is chosen for allocation. In best fit strategy, the smallest hole that is large enough to accommodate n blocks is chosen for allocation.
Another issue with contiguous allocation is external fragmentation. As files are created and deleted, in due course of time, free disk blocks may not be present contiguously. But there may be free disk blocks here and there. There may be n free disk blocks in the disk, but the n free blocks may not be present contiguously. Now, when a file of size n is to be brought in, it cannot be accommodated. To overcome this external fragmentation, compaction is needed.
The next issue is that it is difficult to determine how much space is needed for a file. Initially, when a file is created, the file may occupy only a few bytes. But, it may grow over time. It is not possible to know beforehand the size to which the file will grow. One way to overcome this is to overestimate the space needed and allocate a larger number of blocks for the file. But, this results in wastage of space. Some of the allocated blocks may not be used by the file. The second way is to find a larger hole, copy the file contents to the new space and free the old space whenever the file grows. Even if the file size in known in advance, the file may grow over a long period of time. Till then, the extra space allocated remains free and unused. This results in internal fragmentation.
To overcome internal fragmentation, a modified contiguous allocation is used. Initially, a contiguous chunk of space is allocated for the file. If this is not large enough, another chunk of space (extent) is allocated. The Veritas file system uses this strategy.
35.2.2 Linked Allocation
The linked allocation solves all problems of contiguous allocation. In linked allocation, each file is a linked list of disk blocks. The allocated disk blocks may be scattered anywhere on the disk. The directory entry has a pointer to the first and the last blocks of the file. Figure 35.2 shows an example of linked allocation. There is a file named jeep. The starting disk block number 9 is given in the directory entry. Disk block 9 has the address of the next disk block, 16. Disk block 16 points to disk block 1 and so on. The last disk block of the file, disk block 25 has a pointer value of -1 indicating that it is the last disk block of the file.
Fig. 35.2 Linked allocation (Source: [1])
This allocation method does not suffer from external fragmentation. The size of a file need not be declared when the file is created. The file can continue to grow as long as free blocks are available. Since there is no external fragmentation, compaction is not needed.
The linked allocation also has some limitations. There is no possibility for random access because it is necessary to follow the pointers from one disk block to another to access the disk blocks. Only sequential-access is allowed.
The second limitation is the space required for pointers. Each disk block stores a pointer and this occupies space. To avoid this, blocks are collected into groups called clusters (say, 4 blocks). Rather than having a pointer from each disk block to the next block, there is a pointer from one cluster (group of blocks) to another cluster. This results in having less number of pointers. This also results in fewer disk-head seeks, because, within a cluster there is no need for the movement of the head. But this may lead to increased internal fragmentation. If a cluster is allocated for a file and if all the blocks in the cluster are not used, then that space is wasted and results in internal fragmentation.
The third limitation is reliability. Many pointers are in use. If the pointers are lost or damaged, it results in loss of information. To improve reliability, a doubly linked list can be used rather than using a singly linked list so that even if one of the pointers between two blocks is lost, the other will still be available. Another improvement is to store the file name and the relative block number in each block. The pointers are still there, but after moving to a disk block using a pointer, the file name and the relative block number can be checked to confirm if the block really belongs to the file.
A variation of linked allocation scheme is used in the File-allocation table (FAT). This is the disk-space allocation scheme used by MS-DOS. A section of the disk at the beginning of each volume contains this table (FAT). Figure 35.3 shows an example file allocation table. The directory entry has the address of the first disk block number (217). This 217 is used as an index into the FAT to get the entry where 618 is stored. This means that the next disk block number is 618. While using 618 as an index into the FAT, 339 is got, which means that, 339 is the next disk block. When 339 is used as index, the entry shows that end of file is reached.
Fig. 35.3 File-Allocation Table (Source: [1])
35.2.3 Indexed Allocation
The indexed allocation solves the external fragmentation and size-declaration problem of contiguous allocation. It also solves the direct access problem in linked allocation.
In indexed allocation all pointers are brought together into one block called the index block. Each file has an index block, which is an array of disk-block addresses. The ith entry in the index block points to the ith block of the file.
Fig. 35.4 Example of Indexed Allocation (Source: [1])
Figure 35.4 shows an example of indexed allocation. The directory entry has the address of the index block (disk block 19). The index block has the addresses of the disk blocks (9, 16, 1, 10 and 25) of the file.
Indexed allocation supports sequential as well as direct access. For direct access, to access the ith block, the ith entry in the index block is taken. There is no external fragmentation because any disk block from any position can be used.
The limitation of indexed allocation is wastage of space. When there is a file occupying only two blocks, in linked allocation, space for only two pointers is wasted, in indexed allocation, one whole block is wasted.
Another issue to be considered is the size of the index block. How large the index block should be? The index block should be as small as possible. But, if the index block is too small, the index block may not be able to hold enough pointers for a large file. Consider a file of maximum size of 256K (218) words and block size of 512 (29) words. The file occupies 29 disk blocks (218 / 29). To accommodate the addresses of 29 disk blocks, one index block is needed (because, here, one disk block size is 29 words).
There are different variants of index blocks.
– Linked scheme – In this scheme, the index blocks are linked together. An index block will contain the addresses of disk blocks of the file as well as the address of the next index block. Therefore, there is no limit on the size of the index block and the size of the file.
– Two-level index – In this scheme, each entry in the first-level index block points to a second-level index block and each entry in the second-level index block points to disk blocks having the contents of the file. Figure 35.6 shows diagrammatically this variant of index block. The directory entry points to an outer index block. Each entry in the outer index block points to an inner index block. Each entry in the inner index block points to a disk block with file contents.
If the size of a disk block is 512 bytes, then the maximum file size possible with this scheme is 5123.
Fig. 35.6 Indexed Allocation – Two-Level Index
Figure 35.7 shows the scheme used in UNIX systems. In this scheme, the inode has the index of disk block addresses. There are 10 entries in the inode table which are the addresses of the disk blocks of the file. The 11th entry points to an index block, which in turn has the addresses of disk blocks of the file (single level indirection). The 12th entry uses two-level indirection and the 13th entry uses three-level indirection. When the size of the file is small, only the direct disk blocks are used. Since the size of each disk block is 4K bytes, a file of size upto 40K can be addressed using these direct disk blocks. When the size of the file is more than 40K, the next levels are used.
Fig. 35.7 Combined Scheme: UNIX (4K bytes per block) (Source: [1])
35.3 Free Space Management
In this section, we learn how to keep track of free disk space. It is necessary to reuse the space got from deleted files for the newly created files. A free-space list is maintained. The free-space list records all the free disk blocks in the system. To create a new file, the operating system searches for space in the free list. If free space is available, the OS removes the space from the free-space list. If file is deleted, the space used by the file is added to the free-space list. The following subsections discuss the different ways in which free space is managed.
35.3.1 Bit vector
In this scheme, a bit map or a bit vector (n bits for n blocks) is maintained (Figure 35.8). Initially, all the bits are set to zero. When a disk block is allocated, the corresponding bit is set to one. For example, if the ith disk block is allocated, the ith bit is set to one.
Fig. 35.8 Bit vector implementation
Suppose the disk blocks 2, 5, 8, 22 are free. Then the bit vector will look like 1101101101111111111111011…… That is, disk block 0 is occupied and hence the first bit is 1. Disk block 1 is occupied and hence, the second bit is 1. Disk block 2 is free and hence the 3rd bit is 0 and so on. This scheme is simple to implement and efficient in finding the first free block. It is also easy to find m consecutive free blocks. One can sequentially check each word in the bit map to see whether the value is 0 or not. It is also possible to calculate the block number using the formula (Number of bits per word) x (number of 0-value words) + offset of first 1 bit.
The bit map requires extra space and is kept in main memory. For example, if the disk block size = 212 bytes, size of the disk = 230 bytes (1 gigabyte), the number of disk blocks in the disk = 230/212 = 218. Therefore, the number of bits in the bit map is 218 (or 32K bytes). A 1 TB disk with 4 KB block size needs 256 MB to store its bit map.
35.3.2 Linked List
In this scheme, a linked list of all the free blocks is maintained. The free-space list head keeps a pointer to the first free block on the disk and caches it in memory. The first free block contains a pointer to the next free block and so on. Figure 35.9 shows how free space is maintained in the disk using a linked list.
Fig. 35.9 Linked Free Space List on Disk (Source: [1])
The limitation of this scheme is that it is difficult to get contiguous space easily. But there is no wastage of space because any disk block in the disk can be used. Traversing the list takes time. But, usually, the first free block is allocated and the free-space head is made to move to the next disk block. The File Allocation Table (FAT) incorporates free-block accounting into the file allocation data structure.
35.3.3 Grouping
In this scheme, the addresses of n free blocks are stored in the first free block. The nth free block contains the addresses of the next n free blocks and so on. Using this scheme, addresses of large number of free blocks can be found quickly by just looking into one disk block.
35.3.4 Counting
In this scheme, the address of the first free block and the number n of free contiguous blocks that follow the first block are kept. That is, each entry has a disk address and a count.
This makes the overall list is shorter when compared to the grouping scheme.
35.4 Summary
In this module, we learnt different file allocation methods such as contiguous allocation, linked allocation and indexed allocation. We also learnt different schemes used for managing free space in disks such as bit vector, linked list, grouping and counting.
References
[1] Abraham Silberschatz, Peter B. Galvin, Greg Gagne, “Operating System Concepts”, Ninth Edition, John Wiley & Sons Inc., 2012.
[2] Andrew S. Tanenbaum, Herbert Bos, “Modern Operating Systems”, Fourth Edition, Pearson Education, 2014.
[3] Gary Nutt, “Operating Systems”, Third Edition, Pearson Education, 2009.