File Systems

Lecture Notes for CS 140
Winter 2012
John Ousterhout

  • Readings for this topic from Operating System Concepts: Sections 10.1-10.2, Sections 11.1-11.2, Sections 11.4-11.6, Section 12.4.
  • Problems addressed by modern file systems:
    • Disk Management:
      • Fast access to files (minimize seeks)
      • Sharing space between users
      • Efficient use of disk space
    • Naming: how do users select files?
    • Protection: isolation between users, controlled sharing.
    • Reliability: information must last safely for long periods of time.
  • File: a named collection of bytes stored on durable storage such as disk.
  • File access patterns:
    • Sequential: information is processed in order, one byte after another.
    • Random Access: can address any byte in the file directly without reading through its predecessors. E.g. the backing store for demand paging; also databases.
    • Keyed: search for blocks with particular values, e.g. hash table, associative database, dictionary. Usually provided by databases, not operating system.

File Descriptors

  • How should disk sectors be used to represent the bytes of a file?
  • File descriptor: Data structure that describes a file; stored on disk along with file data. Info in file descriptor:
    • Sectors occupied by file
    • File size
    • Access times (last read, last write)
    • Protection information (owner id, group id, etc.)
  • Issues to consider:
    • Most files are small (a few kilobytes or less).
    • Most of the disk space is in large files.
    • Many of the I/O operations are for large files.
    Thus, per-file cost must be low but large files must have good performance.
  • Contiguous allocation (also called "extent-based"): allocate files like segmented memory. Keep a free list of unused areas of the disk. When creating a file, make the user specify its length, allocate all the space at once. Descriptor contains location and size.
    • Advantages:
      • Easy access, both sequential and random
      • Simple
      • Few seeks
    • Drawbacks:
      • Fragmentation will make it hard to use disk space efficiently; large files may be impossible
      • Hard to predict needs at file creation time
    Example: IBM OS/360.
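The contiguous scheme above can be sketched in a few lines. This is an illustrative toy, not OS/360's actual allocator: the free list holds (start, length) extents, allocation is first-fit, and the file descriptor would record just the returned start plus the size.

```python
# Toy contiguous ("extent-based") allocator; all names are illustrative.
# free_list: list of (start, length) extents of unused disk blocks.

def allocate_contiguous(free_list, nblocks):
    """First-fit: return the start block of an nblocks-long extent, or None."""
    for i, (start, length) in enumerate(free_list):
        if length >= nblocks:
            if length == nblocks:
                del free_list[i]                         # hole consumed exactly
            else:
                free_list[i] = (start + nblocks, length - nblocks)
            return start
    return None          # fragmentation: no single hole is big enough

# Example: holes at blocks 0-3, 10-17, 30-31; a 6-block file lands at 10.
free = [(0, 4), (10, 8), (30, 2)]
start = allocate_contiguous(free, 6)
```

Note how a request larger than every hole fails even when the total free space would suffice; that is exactly the fragmentation drawback listed above.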
  • Linked files: keep a linked list of all free blocks. In file descriptor, just keep pointer to first block. Each block of file contains pointer to next block.
    • Advantages?
    • Drawbacks? Examples (more or less): TOPS-10, Xerox Alto.
  • Windows FAT:
    • Like linked allocation, except don't keep the links in the blocks themselves.
    • Keep the links for all files in a single table called the File Allocation Table
      • Each FAT entry is disk sector number of next block in file
      • Special values for "last block in file", "free block"
    • Originally, each FAT entry was 16 bits.
    • FAT32 supports larger disks:
      • Each 32-bit entry uses 28 bits for the cluster number
      • Disk addresses refer to clusters: groups of adjacent sectors.
      • Cluster sizes 2 - 32 KBytes; fixed for any particular disk partition.
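The FAT idea is easy to demonstrate with the table as an array indexed by block number. A minimal sketch (the sentinel constants are stand-ins for the real FAT marker values):

```python
# Sketch of FAT-style linked allocation; constants are illustrative stand-ins
# for the real "last block" / "free block" marker values.
LAST = -1
FREE = -2

def file_blocks(fat, first):
    """Follow the chain from a file's first block to its last block."""
    blocks = []
    b = first
    while b != LAST:
        blocks.append(b)
        b = fat[b]          # fat[b] is the next block of the same file
    return blocks

# A 10-block "disk": one file occupies blocks 2 -> 5 -> 3.
fat = [FREE] * 10
fat[2], fat[5], fat[3] = 5, 3, LAST
```

Random access still requires walking the chain, but because the whole table is separate from the data blocks it is small enough to keep in memory, unlike the per-block links of plain linked files.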
  • Indexed files: keep an array of block pointers for each file.
    • Maximum length must be declared for file when it is created.
    • Allocate array to hold pointers to all the blocks, but don't allocate the blocks.
    • Fill in the pointers dynamically as file is written.
    • Advantages?
    • Drawbacks?
  • Multi-level indexes (4.3 BSD Unix):
    • File descriptor = 14 block pointers, initially 0 ("no block").
    • First 12 point to data blocks.
    • Next entry points to an indirect block (contains 1024 4-byte block pointers).
    • Last entry points to a doubly-indirect block.
    • Maximum file length is fixed, but large.
    • Indirect blocks aren't allocated until needed.
    • Advantages?
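The arithmetic behind "fixed, but large" follows directly from the parameters above (4-KByte blocks, 4-byte pointers, 12 direct pointers, one indirect, one doubly-indirect). A small sketch, with a helper (`pointer_path`, an illustrative name) showing which pointers must be followed to reach a given byte offset:

```python
BLOCK = 4096            # bytes per block
PTRS  = BLOCK // 4      # 4-byte pointers per indirect block = 1024

direct        = 12 * BLOCK            # 48 KBytes via direct pointers
single        = PTRS * BLOCK          # 4 MBytes via the indirect block
double        = PTRS * PTRS * BLOCK   # 4 GBytes via the doubly-indirect block
max_file_size = direct + single + double

def pointer_path(offset):
    """Which pointers lead to the block holding byte `offset`?"""
    b = offset // BLOCK
    if b < 12:
        return ("direct", b)
    b -= 12
    if b < PTRS:
        return ("single-indirect", b)
    b -= PTRS
    return ("double-indirect", b // PTRS, b % PTRS)
```

Small files touch only direct pointers (one disk access per block), while even a multi-gigabyte file needs at most three pointer lookups per block.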

Buffer Cache

  • Use part of main memory to retain recently-accessed disk blocks.
  • Blocks that are referenced frequently (e.g., indirect blocks for large files) are usually in the cache.
  • This solves the problem of slow access to large files.
  • Originally, buffer caches were fixed size.
  • As memories have gotten larger, so have buffer caches.
  • Many systems now unify the buffer cache and the VM page pool: any page can be used for either, based on LRU access.
  • What happens when a block in the cache is modified?
    • Synchronous writes: immediately write through to disk.
      • Safe: data won't be lost if the machine crashes
      • Slow: process can't continue until disk I/O completes
      • May be unnecessary:
        • Many small writes to the same block
        • File deleted soon (e.g., temporary files)
    • Delayed writes: don't immediately write to disk:
      • Wait a while (30 seconds?) in case there are more writes to a block or the block is deleted
      • Fast: writes return immediately
      • Dangerous: may lose data after a system crash
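The write-back behavior above can be sketched with an LRU cache plus a dirty set. This is a toy model, not any real kernel's design; the "disk" is just a dict, and eviction stands in for the delayed write:

```python
from collections import OrderedDict

class BufferCache:
    """Toy write-back buffer cache (a sketch, not a real kernel design)."""
    def __init__(self, disk, capacity):
        self.disk = disk               # dict: block number -> data
        self.capacity = capacity
        self.cache = OrderedDict()     # block number -> data, in LRU order
        self.dirty = set()             # blocks modified since last write-back

    def read(self, b):
        if b not in self.cache:
            self._evict_if_full()
            self.cache[b] = self.disk[b]      # miss: fetch from "disk"
        self.cache.move_to_end(b)             # mark most recently used
        return self.cache[b]

    def write(self, b, data):
        """Delayed write: update only the cache; disk is written later."""
        if b not in self.cache:
            self._evict_if_full()
        self.cache[b] = data
        self.cache.move_to_end(b)
        self.dirty.add(b)

    def sync(self):
        """Flush all dirty blocks (what a periodic flusher would do)."""
        for b in self.dirty:
            self.disk[b] = self.cache[b]
        self.dirty.clear()

    def _evict_if_full(self):
        if len(self.cache) >= self.capacity:
            victim, data = self.cache.popitem(last=False)   # LRU victim
            if victim in self.dirty:                        # write back if dirty
                self.disk[victim] = data
                self.dirty.discard(victim)
```

The danger noted above shows up directly: between a `write` and the next `sync` (or eviction), a crash loses the data, which is why real systems bound the delay to something like 30 seconds.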

Free Space Management

  • Managing disk free space: many early systems just used a linked list of free blocks.
    • At the beginning, free list is sorted, so blocks in a file are allocated contiguously.
    • Free list quickly becomes scrambled, so files are spread all over disk.
  • 4.3 BSD approach to free space: bit map:
    • Keep an array of bits, one per block.
    • 1 means block is free, 0 means block in use
    • During allocation, search bit map for a block that's close to the previous block of the file.
    • If the disk isn't full, this usually works pretty well.
    • If the disk is nearly full, the search becomes very expensive and doesn't produce much locality.
    • Solution: don't let the disk fill up!
      • Pretend the disk has 10% less capacity than it really has
      • If disk is 90% full, tell user it's full and don't allow any more data to be written.
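The "allocate close to the previous block" step can be sketched by searching the bit map outward from a goal block. A minimal sketch (the function name is made up; 4.3BSD's actual search is more elaborate):

```python
# Toy bit-map allocator. bitmap[i] is True if block i is free, False if in use.

def allocate_near(bitmap, goal):
    """Search outward from `goal` for the nearest free block; None if full."""
    n = len(bitmap)
    for dist in range(n):
        for b in (goal - dist, goal + dist):   # try both directions
            if 0 <= b < n and bitmap[b]:
                bitmap[b] = False              # mark allocated
                return b
    return None
```

When the disk is mostly empty the loop terminates after a few iterations and the block lands near its predecessor; when the disk is nearly full it degenerates into a scan of the whole map, which is exactly the problem the 90%-full rule avoids.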

Block Sizes

  • Many early file systems (e.g. Unix) used a block size of 512 bytes (one sector).
    • Inefficient I/O: more distinct transfers, hence more seeks.
    • Bulkier metadata: a 512-byte indirect block holds only 128 4-byte pointers, so large files need more levels of indirection.
  • Increase block size (e.g. 2KB clusters in FAT32)?
  • 4.3BSD solution: multiple block sizes
    • Large blocks are 4 KBytes; most blocks are large
    • Fragments are multiples of 512 bytes, fitting within a single large block
    • The last block in a file can be a fragment.
    • Bit map for free blocks is based on fragments.
    • One large block can hold fragments from multiple files.
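The space accounting for the block/fragment split is simple arithmetic. A sketch, assuming the 4-KByte blocks and 512-byte fragments given above:

```python
BLOCK = 4096    # large block size
FRAG  = 512     # fragment size: 8 fragments per large block

def blocks_and_frags(size):
    """Storage for a `size`-byte file: (full blocks, trailing fragments)."""
    full, rest = divmod(size, BLOCK)
    frags = -(-rest // FRAG)    # ceiling division for the partial tail
    return full, frags
```

A 100-byte file thus consumes one fragment (512 bytes) instead of a full 4-KByte block, while a 10,000-byte file uses two full blocks plus four fragments; small files stay cheap without giving up big-block transfer efficiency.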

Disk Scheduling

  • If there are several disk I/O's waiting to be executed, what is the best order in which to execute them?
    • Goal is to minimize seek time.
  • First come first served (FCFS, FIFO): simple, but does nothing to optimize seeks.
  • Shortest seek time first (SSTF):
    • Choose next request that is as close as possible to the previous one.
    • Good for minimizing seeks, but can result in starvation for some requests.
  • Scan ("elevator algorithm").
    • Same as SSTF except heads keep moving in one direction across disk.
    • Once the edge of the disk has been reached, the heads reverse direction and sweep back (a circular variant, C-SCAN, instead seeks back to the far edge and starts over). Either way, every request is eventually served, avoiding starvation.
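The two policies are easy to compare on a request queue. A sketch (illustrative, not a real driver) computing the service order each policy produces, given the current head position:

```python
# Toy disk schedulers: each returns the order in which requests are served.

def sstf(requests, head):
    """Shortest seek time first: always serve the closest pending request."""
    pending, order = list(requests), []
    while pending:
        nxt = min(pending, key=lambda r: abs(r - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

def scan(requests, head):
    """Elevator: sweep upward from the head, then sweep back down."""
    up   = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    return up + down
```

For requests [98, 183, 37, 122, 14] with the head at cylinder 53, SSTF hops to whichever request is nearest (37, then 14, then back out), while SCAN finishes the upward sweep before turning around; SSTF minimizes each individual seek, but a steady stream of nearby requests could starve 183 indefinitely, which SCAN's fixed sweep prevents.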