File Systems

Lecture Notes for CS 140
Spring 2019
John Ousterhout

  • Readings for this topic from Operating Systems: Principles and Practice: Chapter 11, Section 13.3 (up through page 567).
  • Problems addressed by modern file systems:
    • Disk Management:
      • Fast access to files (minimize seeks)
      • Sharing space between users
      • Efficient use of disk space
    • Naming: how do users select files?
    • Reliability: information must survive OS crashes and hardware failures.
    • Protection: isolation between users, controlled sharing.
  • File: a named collection of bytes stored on durable storage such as disk.
  • File access patterns:
    • Sequential: information is processed in order, one byte after another.
    • Random Access: can address any byte in the file directly without passing through its predecessors. E.g. the data set for demand paging, also databases.
    • Keyed (or indexed): search for blocks with particular contents, e.g. hash table, associative database, dictionary. Usually provided by databases, not by the operating system.
  • Issues to consider:
    • Most files are small (a few kilobytes or less), so per-file overheads must be low.
    • Most of the disk space is in large files.
    • Many of the I/O operations are for large files, so performance must be good for large files.
    • Files may grow unpredictably over time.

Inodes

  • Operating system data structure with information about a particular file
    • Stored on disk along with file data.
    • Kept in memory when file is open.
  • Info in inode:
    • File size
    • Sectors occupied by file
    • Access times (last read, last write)
    • Protection information (owner id, group id, etc.)
  • How should disk sectors be used to represent the bytes of a file?
  • Contiguous allocation (also called "extent-based"):
    • Allocate files like segmented memory (contiguous run of sectors).
    • Inode contains number of first sector, file length in sectors.
    • User must specify length when creating a file.
    • Keep a free list of unused areas of the disk.
    • Advantages:
      • Simple
      • Easy access, both sequential and random
      • Few seeks for I/O
    • Drawbacks:
      • Fragmentation makes it hard to use disk space efficiently; it may be impossible to find a contiguous run of free sectors for a large file
      • Must predict needs at file creation time
      • Can't extend files
    Example: IBM OS/360.
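
With contiguous allocation, locating any byte of a file is a single arithmetic step, which is why both sequential and random access are easy. A minimal sketch, assuming 512-byte sectors; the function name and parameters are hypothetical, not taken from any real system:

```python
SECTOR_SIZE = 512  # assumed sector size in bytes

def contiguous_locate(first_sector, length_sectors, byte_offset):
    """Map a byte offset in a file to (disk sector, offset within that sector).

    first_sector and length_sectors are the two fields the inode holds
    under contiguous allocation.
    """
    index = byte_offset // SECTOR_SIZE
    if byte_offset < 0 or index >= length_sectors:
        raise ValueError("offset lies outside the file's allocated extent")
    return first_sector + index, byte_offset % SECTOR_SIZE

# A file occupying sectors 1000..1007; byte 1300 falls in its third sector.
print(contiguous_locate(first_sector=1000, length_sectors=8, byte_offset=1300))
# (1002, 276)
```

The error branch also shows the drawback: a file cannot grow past the length chosen at creation time.
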
  • Linked files:
    • Divide disk into fixed-sized blocks (4096 bytes?)
    • Keep a linked list of all free blocks.
    • In inode, just keep pointer to first block.
    • Each block of file contains pointer to next block.
    • Advantages?
    • Drawbacks?
    Examples (more or less): TOPS-10, Xerox Alto.
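
To get a feel for the trade-offs of linked files, the sketch below finds the n-th block of a file by following the per-block pointers. The disk is modeled as a Python dict and the `END` marker is a hypothetical convention for this example only; each dictionary lookup stands in for one disk read.

```python
END = -1  # hypothetical "no next block" marker for this sketch

def linked_locate(disk, first_block, n):
    """Return (block number of the file's n-th block, disk reads needed).

    disk maps block number -> (next_block, data). The inode supplies
    first_block; every later block is found by reading its predecessor.
    """
    block = first_block
    reads = 0
    for _ in range(n):
        next_block, _data = disk[block]  # one disk read per link followed
        reads += 1
        if next_block == END:
            raise IndexError("file has fewer blocks than requested")
        block = next_block
    return block, reads

# A three-block file stored in blocks 7 -> 3 -> 9.
disk = {7: (3, b"a"), 3: (9, b"b"), 9: (END, b"c")}
print(linked_locate(disk, first_block=7, n=2))  # (9, 2)
```

Reaching block n costs n reads before the data itself can be fetched, so the number of reads grows with the position in the file.
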
  • Windows FAT:
    • Like linked allocation, except don't keep the links in the blocks themselves.
    • Keep the links for all files in a single table called the File Allocation Table
      • Table is memory resident during normal operation
      • Each FAT entry holds the disk sector number of the next block in the file
      • Special values for "last block in file", "free block"
      • Inode stores number of first block in file, size
    • Originally, each FAT entry was 16 bits.
    • FAT32 supports larger disks:
      • Each entry has 28 bits of sector number
      • Disk addresses refer to clusters: groups of adjacent sectors.
      • Cluster sizes 2 - 32 KBytes; fixed for any particular disk partition.
    • Advantages?
    • Disadvantages?
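
The FAT scheme can be sketched with a plain list standing in for the table. The marker values `FAT_FREE` and `FAT_LAST` are assumptions for this example and do not match the on-disk encodings of any real FAT variant; block 0 is left unused so that 0 can mean "free".

```python
FAT_FREE = 0    # hypothetical marker: block is unallocated
FAT_LAST = -1   # hypothetical marker: last block in a file

def fat_chain(fat, first_block):
    """Follow FAT links from first_block and return the file's blocks in order."""
    chain = []
    block = first_block
    while block != FAT_LAST:
        chain.append(block)
        block = fat[block]  # in-memory table lookup, no disk read
    return chain

def fat_extend(fat, last_block):
    """Append the first free block to the file whose final block is last_block."""
    for candidate in range(1, len(fat)):
        if fat[candidate] == FAT_FREE:
            fat[candidate] = FAT_LAST
            fat[last_block] = candidate
            return candidate
    raise OSError("no free blocks")

fat = [FAT_FREE] * 8
fat[2], fat[5], fat[3] = 5, 3, FAT_LAST   # file occupies blocks 2 -> 5 -> 3
print(fat_chain(fat, 2))                  # [2, 5, 3]
fat_extend(fat, last_block=3)
print(fat_chain(fat, 2))                  # [2, 5, 3, 1]
```

The chain is still a linked list, but because the table is memory resident, walking it costs no disk I/O.
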
  • Multi-level indexes (4.3 BSD Unix):
    • Files divided into blocks of 4 Kbytes.
    • Blocks of each file managed with multi-level arrays of block pointers.
    • Inode = 14 block pointers, initially 0 ("no block").
    • First 12 point to data blocks (direct blocks).
    • Next entry points to an indirect block (contains 1024 4-byte block pointers).
    • Last entry points to a doubly-indirect block.
    • Maximum file length is fixed, but large.
    • Indirect blocks aren't allocated until needed.
    • Advantages?
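
The numbers above (12 direct pointers, 1024 pointers per indirect block, 4-Kbyte blocks) determine both how a given block of a file is reached and the maximum file length. A minimal sketch; the function and constant names are hypothetical:

```python
BLOCK_SIZE = 4096       # bytes per block
PTRS_PER_BLOCK = 1024   # 4096-byte block / 4-byte pointers
NUM_DIRECT = 12         # direct pointers in the inode

def classify_block(n):
    """Describe how the n-th block (0-based) of a file is reached from the inode."""
    if n < NUM_DIRECT:
        return ("direct", n)
    n -= NUM_DIRECT
    if n < PTRS_PER_BLOCK:
        return ("indirect", n)
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        # index into the doubly-indirect block, then into the indirect block it names
        return ("double", n // PTRS_PER_BLOCK, n % PTRS_PER_BLOCK)
    raise ValueError("block number exceeds the maximum file length")

MAX_BLOCKS = NUM_DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2
MAX_BYTES = MAX_BLOCKS * BLOCK_SIZE

print(classify_block(5))      # ('direct', 5)
print(classify_block(12))     # ('indirect', 0)
print(classify_block(1036))   # ('double', 0, 0)
print(MAX_BLOCKS)             # 1049612
```

Under these parameters the maximum file length is a little over 4 GiB, and a file of 12 blocks (48 Kbytes) or less needs no indirect blocks at all.
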

Block Cache

  • Use part of main memory to retain recently-accessed disk blocks.
  • LRU replacement.
  • Blocks that are referenced frequently (e.g., indirect blocks for large files) are usually in the cache.
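  • This largely solves the problem of slow access to large files: the indirect blocks are usually found in the cache, so reaching a data block seldom costs extra disk reads.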
  • Originally, block caches were fixed size.
  • As memories have gotten larger, so have block caches.
  • Many systems now unify the block cache and the VM page pool: any page can be used for either, based on LRU access.
  • What happens when a block in the cache is modified?
    • Synchronous writes: immediately write through to disk.
      • Safe: data won't be lost if the machine crashes
      • Slow: process can't continue until disk I/O completes
      • May be unnecessary:
        • Many small writes to the same block
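
The sketch below combines LRU replacement with synchronous (write-through) writes. The class and its counters are hypothetical, and a Python dict stands in for the disk; it is meant only to make the costs visible, not to mirror any real kernel's cache.

```python
from collections import OrderedDict

class BlockCache:
    """LRU block cache with synchronous write-through (illustrative only)."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict: block number -> bytes
        self.capacity = capacity      # maximum number of cached blocks
        self.blocks = OrderedDict()   # ordered from least to most recently used
        self.disk_reads = 0
        self.disk_writes = 0

    def _touch(self, n, data):
        # Insert or refresh block n as most recently used, evicting if over capacity.
        self.blocks[n] = data
        self.blocks.move_to_end(n)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # drop the least recently used block

    def read(self, n):
        if n in self.blocks:
            data = self.blocks[n]             # hit: no disk I/O
        else:
            data = self.disk[n]               # miss: fetch from disk
            self.disk_reads += 1
        self._touch(n, data)
        return data

    def write(self, n, data):
        self.disk[n] = data                   # write through before returning
        self.disk_writes += 1
        self._touch(n, data)

disk = {0: b"a", 1: b"b", 2: b"c"}
cache = BlockCache(disk, capacity=2)
for n in (0, 1, 0, 2, 1):
    cache.read(n)
print(cache.disk_reads)    # 4: only the second read of block 0 is a hit
for data in (b"x", b"y", b"z"):
    cache.write(0, data)
print(cache.disk_writes)   # 3: every small write to the same block reaches the disk
```
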

Free Space Management

  • Managing disk free space: early Unix systems just used a linked list of free blocks.
    • Each block holds many pointers to free blocks, plus a pointer to the next block of pointers.
    • At the beginning, free list is sorted, so blocks in a file are allocated contiguously.
    • Free list quickly becomes scrambled, so files are spread all over disk.
  • 4.3 BSD approach to free space: a bit map.
    • Keep an array of bits, one per block.
    • 1 means block is free, 0 means block in use
    • During allocation, search bit map for a block that's close to the previous block of the file.
    • If disk isn't full, this usually works pretty well.
    • If disk is nearly full this becomes very expensive and doesn't produce much locality.
    • Solution: don't let the disk fill up!
      • Pretend disk has 10% less capacity than it really has
      • If disk is 90% full, tell users it's full and don't allow any more data to be written.
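
A minimal sketch of the bit-map search and the 10% reserve, using a Python list of 0/1 values as the bit map (1 = free, 0 = in use, as above). The function names and the outward search order are assumptions for illustration; this is not the actual 4.3 BSD allocator.

```python
def allocate_near(bitmap, previous):
    """Allocate the free block closest to `previous` and return its number.

    Searches outward from the file's previous block, trying the forward
    neighbor before the backward one at each distance.
    """
    n = len(bitmap)
    for distance in range(1, n):
        for candidate in (previous + distance, previous - distance):
            if 0 <= candidate < n and bitmap[candidate] == 1:
                bitmap[candidate] = 0   # mark in use
                return candidate
    raise OSError("no free blocks")

def reported_full(bitmap, reserve_percent=10):
    """True once the free blocks have dropped to the reserved fraction of the disk."""
    return sum(bitmap) * 100 <= reserve_percent * len(bitmap)

bitmap = [0, 1, 0, 0, 1, 1]
print(allocate_near(bitmap, previous=2))   # 1 (block 3 is in use, block 1 is free)
print(allocate_near(bitmap, previous=2))   # 4
print(reported_full(bitmap))               # False: 1 of 6 blocks is still free
```

On a nearly full disk the outer loop runs to large distances before finding a free bit, which is the expense and loss of locality that the reserve is meant to avoid.
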

Block Sizes

  • Many early file systems (e.g. Unix) used a block size of 512 bytes (the size of a sector for many years).
    • Inefficient I/O: more distinct transfers, hence more seeks.
    • Bulkier inodes: an indirect block holds only 128 4-byte pointers, so block pointers occupy roughly 1% of disk space (4 bytes of pointer per 512-byte block).
  • Increase block size (e.g. 4 KB)?
  • 4.3BSD solution: multiple block sizes
    • Large blocks are 4 KBytes; most blocks are large
    • Fragments are multiples of 512 bytes, fitting within a single large block
    • The last block in a file can be a fragment.
    • One large block can hold fragments from multiple files.
    • Bit map for free blocks is based on fragments.
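
The space saved by fragments can be worked out directly. The sketch below assumes the sizes given above (4-KByte blocks, 512-byte fragments) and applies the fragment rule to the tail of any file regardless of its size, which is a simplification; the function names are hypothetical.

```python
BLOCK = 4096      # large block size in bytes
FRAGMENT = 512    # fragment size in bytes

def layout(file_size):
    """Return (large blocks, tail fragments) needed for a file of file_size bytes."""
    full_blocks, tail = divmod(file_size, BLOCK)
    tail_fragments = -(-tail // FRAGMENT)        # ceiling division
    if tail_fragments == BLOCK // FRAGMENT:      # a tail of 8 fragments is just a large block
        return full_blocks + 1, 0
    return full_blocks, tail_fragments

def wasted(file_size, unit):
    """Bytes lost to rounding file_size up to a multiple of unit."""
    return -(-file_size // unit) * unit - file_size

size = 10000
print(layout(size))             # (2, 4)
print(wasted(size, BLOCK))      # 2288 bytes wasted with 4-KByte blocks only
print(wasted(size, FRAGMENT))   # 240 bytes wasted when the tail is a fragment
```
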

Disk Scheduling

  • If there are several disk I/O requests waiting to be executed, what is the best order in which to execute them?
    • Goal is to minimize seek time.
  • First come first served (FCFS, FIFO): simple, but does nothing to optimize seeks.
  • Shortest seek time first (SSTF):
    • Choose next request that is as close as possible to the previous one.
    • Good for minimizing seeks, but can result in starvation for some requests.
  • Scan ("elevator algorithm").
    • Like SSTF, except that the heads keep moving in one direction across the disk, servicing the closest pending request in that direction.
    • Once the edge of the disk has been reached, seek to the farthest pending request and start again.
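
The three policies can be compared by total head movement on one request queue. In this sketch requests are track numbers, and the scan variant sweeps toward higher tracks and then jumps back to the lowest pending request, which is one reading of the description above. The function names and the sample queue are assumptions for illustration.

```python
def fcfs(head, requests):
    """Service requests in arrival order."""
    return list(requests)

def sstf(head, requests):
    """Repeatedly service the pending request closest to the current head position."""
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order

def scan(head, requests):
    """Sweep upward from the head, then jump to the lowest pending request and sweep again."""
    ahead = sorted(r for r in requests if r >= head)
    behind = sorted(r for r in requests if r < head)
    return ahead + behind

def total_seek(head, order):
    """Total distance the head travels servicing `order`, starting at `head`."""
    distance = 0
    for r in order:
        distance += abs(r - head)
        head = r
    return distance

head = 53
queue = [98, 183, 37, 122, 14, 124, 65, 67]
for policy in (fcfs, sstf, scan):
    print(policy.__name__, total_seek(head, policy(head, queue)))
# fcfs 640
# sstf 236
# scan 322
```

SSTF moves the head least on this queue, but new requests arriving near the head could postpone the request for track 183 indefinitely; the scan order bounds that wait at the cost of one long return seek.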