Large Files

What happens if the file is larger than a single cluster? There are three possible solutions:

  1. We could insist that files are stored contiguously. If we know the first cluster and the file’s size, we should have no trouble finding the data. However, even when plenty of storage is available in total, we can run into problems if we want to store a file that is larger than any single contiguous area of free storage. Also, we have no way of knowing which areas of the disk are free and which are allocated unless we maintain a separate free space map, which occupies additional disk space and slows down the file system.
  2. We could use a few bytes in each cluster to hold a pointer to the next one. This is known as a linked list. Its main advantage over contiguous allocation is that a file can occupy any free clusters, wherever they are on the disk, but it also has disadvantages. In particular, the I/O system has no idea where it will find the next cluster until it has read the previous one. This can degrade performance in systems that try to speed up file access by automatically reading ahead or buffering data. This scheme also needs a free space map so that we know which clusters are available for allocation to a new or enlarged file.
  3. We could use an index table that has as many entries as there are clusters on the disk. The start cluster number from a file’s directory entry is used as an index into the table. That entry contains the number of the next cluster allocated to the file, which is used as an index into the table again to get the following cluster number, and so on until an end-of-file marker shows that all the file’s clusters have been accounted for. This is the solution adopted by MS-DOS and Windows.
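
The index-table lookup in option 3 can be sketched in a few lines of Python. This is only an illustration: the table is a plain list, and the END_OF_CHAIN and FREE values are hypothetical markers chosen for this sketch, not the values stored on a real disk.

```python
# Toy model of an index table: table[n] holds the number of the cluster
# that follows cluster n in its file.
END_OF_CHAIN = -1  # hypothetical marker: cluster n is the last one in its file
FREE = -2          # hypothetical marker: cluster n is not allocated

def cluster_chain(table, start):
    """Return the clusters of a file in order, given its start cluster."""
    chain = []
    seen = set()
    cluster = start
    while cluster != END_OF_CHAIN:
        if cluster in seen:
            # A damaged table could send us round in circles forever.
            raise ValueError("loop in cluster chain")
        seen.add(cluster)
        chain.append(cluster)
        cluster = table[cluster]  # the entry names the next cluster
    return chain

# An 8-cluster disk holding one file stored in clusters 2 -> 5 -> 3.
table = [FREE, FREE, 5, END_OF_CHAIN, FREE, 3, FREE, FREE]
print(cluster_chain(table, 2))  # [2, 5, 3]
```

Only the table has to be consulted to find every cluster of the file, so none of the data clusters needs to be read just to discover where the next one is.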

The table is known as the File Allocation Table (FAT). It can be used as a compact and fast look-up table and also acts as a free space map. Any cluster that has not been allocated has a special value in its entry in the table, so we can find a free cluster by searching the table from the start until we encounter this value.
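
As a rough sketch of how the table doubles as a free space map, the Python below scans for the first free entry and uses it to extend a file. The FREE and END_OF_CHAIN values and the function names are assumptions made for this example, not the values or routines used by MS-DOS.

```python
FREE = -2          # hypothetical "not allocated" value
END_OF_CHAIN = -1  # hypothetical "last cluster of the file" value

def find_free_cluster(table):
    """Scan the table from the start and return the first free cluster."""
    for cluster, entry in enumerate(table):
        if entry == FREE:
            return cluster
    return None  # no free entry left: the disk is full

def append_cluster(table, last_cluster):
    """Enlarge a file by linking a free cluster after its current last one."""
    new_cluster = find_free_cluster(table)
    if new_cluster is None:
        raise OSError("no free clusters")
    table[new_cluster] = END_OF_CHAIN  # the new cluster now ends the file
    table[last_cluster] = new_cluster  # the old last cluster points to it
    return new_cluster

# One file occupies clusters 2 -> 5 -> 3; every other cluster is free.
table = [FREE, FREE, 5, END_OF_CHAIN, FREE, 3, FREE, FREE]
print(find_free_cluster(table))  # 0
print(append_cluster(table, 3))  # 0, so the file is now 2 -> 5 -> 3 -> 0
```

No separate free space map is needed, because allocating a cluster and recording it in the file's chain are the same update to the table.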

However, if the index table is damaged or destroyed, we will lose all the information we need to determine which cluster belongs to which file. All that would be left is the first cluster number for each file from its directory entry. To combat this problem, MS-DOS and Windows keep two identical copies of the FAT on every disk, and Microsoft provides system utility software to check and repair the FAT.
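
One simple check that two copies make possible is an entry-by-entry comparison. The Python sketch below illustrates that idea only. It is not a description of how Microsoft's utilities work, and the function name is made up for this example.

```python
def fat_mismatches(primary, backup):
    """Return the cluster numbers whose entries differ between two FAT copies."""
    if len(primary) != len(backup):
        raise ValueError("the two copies have different sizes")
    return [
        cluster
        for cluster, (a, b) in enumerate(zip(primary, backup))
        if a != b
    ]

# Toy tables using -2 for "free" and -1 for "end of file" (both hypothetical).
primary = [-2, -2, 5, -1, -2, 3]
backup = [-2, -2, 5, -1, -2, 4]  # the entry for cluster 5 has been corrupted
print(fat_mismatches(primary, backup))  # [5]
```

A comparison like this shows only that the copies disagree, not which copy is correct. Deciding that requires further checks.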
