File Systems

Lecture Notes for CS 140
Winter 2012
John Ousterhout

  • Readings for this topic from Operating System Concepts: Sections 10.1-10.2, Sections 11.1-11.2, Sections 11.4-11.6, Section 12.4.
  • Problems addressed by modern file systems:
    • Disk Management:
      • Fast access to files (minimize seeks)
      • Sharing space between users
      • Efficient use of disk space
    • Naming: how do users select files?
    • Protection: isolation between users, controlled sharing.
    • Reliability: information must last safely for long periods of time.
  • File: a named collection of bytes stored on durable storage such as disk.
  • File access patterns:
    • Sequential: information is processed in order, one byte after another.
    • Random Access: can address any byte in the file directly without reading through its predecessors. E.g. the backing store for demand paging; also databases.
    • Keyed: search for blocks with particular values, e.g. hash table, associative database, dictionary. Usually provided by databases, not operating system.

File Descriptors

  • How should disk sectors be used to represent the bytes of a file?
  • File descriptor: Data structure that describes a file; stored on disk along with file data. Info in file descriptor:
    • Sectors occupied by file
    • File size
    • Access times (last read, last write)
    • Protection information (owner id, group id, etc.)
  • Issues to consider:
    • Most files are small (a few kilobytes or less).
    • Most of the disk space is in large files.
    • Many of the I/O operations are for large files.
    Thus, per-file cost must be low but large files must have good performance.
  • Contiguous allocation (also called "extent-based"): allocate files like segmented memory. Keep a free list of unused areas of the disk. When creating a file, make the user specify its length, allocate all the space at once. Descriptor contains location and size.
    • Advantages:
      • Easy access, both sequential and random
      • Simple
      • Few seeks
    • Drawbacks:
      • Fragmentation will make it hard to use disk space efficiently; large files may be impossible
      • Hard to predict needs at file creation time
    Example: IBM OS/360.
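The contiguous scheme above can be sketched in a few lines. This is an illustrative toy, not OS/360's actual allocator: the free list holds (start, length) extents, allocation is first-fit, and the file descriptor would record just the returned start plus the size.

```python
# Toy contiguous ("extent-based") allocator; all names are illustrative.
# free_list: list of (start, length) extents of unused disk blocks.

def allocate_contiguous(free_list, nblocks):
    """First-fit: return the start block of an nblocks-long extent, or None."""
    for i, (start, length) in enumerate(free_list):
        if length >= nblocks:
            if length == nblocks:
                del free_list[i]                         # hole consumed exactly
            else:
                free_list[i] = (start + nblocks, length - nblocks)
            return start
    return None          # fragmentation: no single hole is big enough

# Example: holes at blocks 0-3, 10-17, 30-31; a 6-block file lands at 10.
free = [(0, 4), (10, 8), (30, 2)]
start = allocate_contiguous(free, 6)
```

Note how a request larger than every hole fails even when the total free space would suffice; that is exactly the fragmentation drawback listed above.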
  • Linked files: keep a linked list of all free blocks. In file descriptor, just keep pointer to first block. Each block of file contains pointer to next block.
    • Advantages?
    • Drawbacks? Examples (more or less): TOPS-10, Xerox Alto.
  • Windows FAT:
    • Like linked allocation, except don't keep the links in the blocks themselves.
    • Keep the links for all files in a single table called the File Allocation Table
      • Each FAT entry is disk sector number of next block in file
      • Special values for "last block in file", "free block"
    • Originally, each FAT entry was 16 bits.
    • FAT32 supports larger disks:
      • Each 32-bit entry uses 28 bits for the cluster number
      • Disk addresses refer to clusters: groups of adjacent sectors.
      • Cluster sizes 2 - 32 KBytes; fixed for any particular disk partition.
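The FAT idea is easy to demonstrate with the table as an array indexed by block number. A minimal sketch (the sentinel constants are stand-ins for the real FAT marker values):

```python
# Sketch of FAT-style linked allocation; constants are illustrative stand-ins
# for the real "last block" / "free block" marker values.
LAST = -1
FREE = -2

def file_blocks(fat, first):
    """Follow the chain from a file's first block to its last block."""
    blocks = []
    b = first
    while b != LAST:
        blocks.append(b)
        b = fat[b]          # fat[b] is the next block of the same file
    return blocks

# A 10-block "disk": one file occupies blocks 2 -> 5 -> 3.
fat = [FREE] * 10
fat[2], fat[5], fat[3] = 5, 3, LAST
```

Random access still requires walking the chain, but because the whole table is separate from the data blocks it is small enough to keep in memory, unlike the per-block links of plain linked files.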
  • Indexed files: keep an array of block pointers for each file.
    • Maximum length must be declared for file when it is created.
    • Allocate array to hold pointers to all the blocks, but don't allocate the blocks.
    • Fill in the pointers dynamically as file is written.
    • Advantages?
    • Drawbacks?
  • Multi-level indexes (4.3 BSD Unix):
    • File descriptor = 14 block pointers, initially 0 ("no block").
    • First 12 point to data blocks.
    • Next entry points to an indirect block (contains 1024 4-byte block pointers).
    • Last entry points to a doubly-indirect block.
    • Maximum file length is fixed, but large.
    • Indirect blocks aren't allocated until needed.
    • Advantages?
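The arithmetic behind "fixed, but large" follows directly from the parameters above (4-KByte blocks, 4-byte pointers, 12 direct pointers, one indirect, one doubly-indirect). A small sketch, with a helper (`pointer_path`, an illustrative name) showing which pointers must be followed to reach a given byte offset:

```python
BLOCK = 4096            # bytes per block
PTRS  = BLOCK // 4      # 4-byte pointers per indirect block = 1024

direct        = 12 * BLOCK            # 48 KBytes via direct pointers
single        = PTRS * BLOCK          # 4 MBytes via the indirect block
double        = PTRS * PTRS * BLOCK   # 4 GBytes via the doubly-indirect block
max_file_size = direct + single + double

def pointer_path(offset):
    """Which pointers lead to the block holding byte `offset`?"""
    b = offset // BLOCK
    if b < 12:
        return ("direct", b)
    b -= 12
    if b < PTRS:
        return ("single-indirect", b)
    b -= PTRS
    return ("double-indirect", b // PTRS, b % PTRS)
```

Small files touch only direct pointers (one disk access per block), while even a multi-gigabyte file needs at most three pointer lookups per block.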

Buffer Cache

  • Use part of main memory to retain recently-accessed disk blocks.
  • Blocks that are referenced frequently (e.g., indirect blocks for large files) are usually in the cache.
  • This solves the problem of slow access to large files.
  • Originally, buffer caches were fixed size.
  • As memories have gotten larger, so have buffer caches.
  • Many systems now unify the buffer cache and the VM page pool: any page can be used for either, based on LRU access.
  • What happens when a block in the cache is modified?
    • Synchronous writes: immediately write through to disk.
      • Safe: data won't be lost if the machine crashes
      • Slow: process can't continue until disk I/O completes
      • May be unnecessary:
        • Many small writes to the same block
        • File deleted soon (e.g., temporary files)
    • Delayed writes: don't immediately write to disk:
      • Wait a while (30 seconds?) in case there are more writes to a block or the block is deleted
      • Fast: writes return immediately
      • Dangerous: may lose data after a system crash
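The write-back behavior above can be sketched with an LRU cache plus a dirty set. This is a toy model, not any real kernel's design; the "disk" is just a dict, and eviction stands in for the delayed write:

```python
from collections import OrderedDict

class BufferCache:
    """Toy write-back buffer cache (a sketch, not a real kernel design)."""
    def __init__(self, disk, capacity):
        self.disk = disk               # dict: block number -> data
        self.capacity = capacity
        self.cache = OrderedDict()     # block number -> data, in LRU order
        self.dirty = set()             # blocks modified since last write-back

    def read(self, b):
        if b not in self.cache:
            self._evict_if_full()
            self.cache[b] = self.disk[b]      # miss: fetch from "disk"
        self.cache.move_to_end(b)             # mark most recently used
        return self.cache[b]

    def write(self, b, data):
        """Delayed write: update only the cache; disk is written later."""
        if b not in self.cache:
            self._evict_if_full()
        self.cache[b] = data
        self.cache.move_to_end(b)
        self.dirty.add(b)

    def sync(self):
        """Flush all dirty blocks (what a periodic flusher would do)."""
        for b in self.dirty:
            self.disk[b] = self.cache[b]
        self.dirty.clear()

    def _evict_if_full(self):
        if len(self.cache) >= self.capacity:
            victim, data = self.cache.popitem(last=False)   # LRU victim
            if victim in self.dirty:                        # write back if dirty
                self.disk[victim] = data
                self.dirty.discard(victim)
```

The danger noted above shows up directly: between a `write` and the next `sync` (or eviction), a crash loses the data, which is why real systems bound the delay to something like 30 seconds.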

Free Space Management

  • Managing disk free space: many early systems just used a linked list of free blocks.
    • At the beginning, free list is sorted, so blocks in a file are allocated contiguously.
    • Free list quickly becomes scrambled, so files are spread all over disk.
  • 4.3 BSD approach to free space: bit map:
    • Keep an array of bits, one per block.
    • 1 means block is free, 0 means block in use
    • During allocation, search bit map for a block that's close to the previous block of the file.
    • If the disk isn't full, this usually works pretty well.
    • If the disk is nearly full, the search becomes very expensive and doesn't produce much locality.
    • Solution: don't let the disk fill up!
      • Pretend the disk has 10% less capacity than it really has
      • If disk is 90% full, tell user it's full and don't allow any more data to be written.
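The "allocate close to the previous block" step can be sketched by searching the bit map outward from a goal block. A minimal sketch (the function name is made up; 4.3BSD's actual search is more elaborate):

```python
# Toy bit-map allocator. bitmap[i] is True if block i is free, False if in use.

def allocate_near(bitmap, goal):
    """Search outward from `goal` for the nearest free block; None if full."""
    n = len(bitmap)
    for dist in range(n):
        for b in (goal - dist, goal + dist):   # try both directions
            if 0 <= b < n and bitmap[b]:
                bitmap[b] = False              # mark allocated
                return b
    return None
```

When the disk is mostly empty the loop terminates after a few iterations and the block lands near its predecessor; when the disk is nearly full it degenerates into a scan of the whole map, which is exactly the problem the 90%-full rule avoids.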

Block Sizes

  • Many early file systems (e.g. Unix) used a block size of 512 bytes (one sector).
    • Inefficient I/O: more distinct transfers, hence more seeks.
    • Bulkier metadata: a 512-byte indirect block holds only 128 4-byte pointers, so large files need more levels of indirection.
  • Increase block size (e.g. 2KB clusters in FAT32)?
  • 4.3BSD solution: multiple block sizes
    • Large blocks are 4 KBytes; most blocks are large
    • Fragments are multiples of 512 bytes, fitting within a single large block
    • The last block in a file can be a fragment.
    • Bit map for free blocks is based on fragments.
    • One large block can hold fragments from multiple files.
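The space accounting for the block/fragment split is simple arithmetic. A sketch, assuming the 4-KByte blocks and 512-byte fragments given above:

```python
BLOCK = 4096    # large block size
FRAG  = 512     # fragment size: 8 fragments per large block

def blocks_and_frags(size):
    """Storage for a `size`-byte file: (full blocks, trailing fragments)."""
    full, rest = divmod(size, BLOCK)
    frags = -(-rest // FRAG)    # ceiling division for the partial tail
    return full, frags
```

A 100-byte file thus consumes one fragment (512 bytes) instead of a full 4-KByte block, while a 10,000-byte file uses two full blocks plus four fragments; small files stay cheap without giving up big-block transfer efficiency.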

Disk Scheduling

  • If there are several disk I/O's waiting to be executed, what is the best order in which to execute them?
    • Goal is to minimize seek time.
  • First come first served (FCFS, FIFO): simple, but does nothing to optimize seeks.
  • Shortest seek time first (SSTF):
    • Choose next request that is as close as possible to the previous one.
    • Good for minimizing seeks, but can result in starvation for some requests.
  • Scan ("elevator algorithm").
    • Same as SSTF except heads keep moving in one direction across disk.
    • Once the edge of the disk has been reached, the heads reverse direction and sweep back (a circular variant, C-SCAN, instead seeks back to the far edge and starts over). Either way, every request is eventually served, avoiding starvation.
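The two policies are easy to compare on a request queue. A sketch (illustrative, not a real driver) computing the service order each policy produces, given the current head position:

```python
# Toy disk schedulers: each returns the order in which requests are served.

def sstf(requests, head):
    """Shortest seek time first: always serve the closest pending request."""
    pending, order = list(requests), []
    while pending:
        nxt = min(pending, key=lambda r: abs(r - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

def scan(requests, head):
    """Elevator: sweep upward from the head, then sweep back down."""
    up   = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    return up + down
```

For requests [98, 183, 37, 122, 14] with the head at cylinder 53, SSTF hops to whichever request is nearest (37, then 14, then back out), while SCAN finishes the upward sweep before turning around; SSTF minimizes each individual seek, but a steady stream of nearby requests could starve 183 indefinitely, which SCAN's fixed sweep prevents.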