File Systems

Lecture Notes for CS 140
Spring 2019
John Ousterhout

Readings for this topic from Operating Systems: Principles and Practice: Chapter 11, Section 13.3 (up through page 567).

Problems addressed by modern file systems:
- Disk Management:
  - Fast access to files (minimize seeks)
  - Sharing space between users
  - Efficient use of disk space
- Naming: how do users select files?
- Reliability: information must survive OS crashes and hardware failures.
- Protection: isolation between users, controlled sharing.
File: a named collection of bytes stored on durable storage such as disk.
File access patterns:
- Sequential: information is processed in order, one byte after another.
- Random Access: can address any byte in the file directly without passing through its predecessors. E.g. the data set for demand paging, also databases.
- Keyed (or indexed): search for blocks with particular contents, e.g. hash table, associative database, dictionary. Usually provided by databases, not by the operating system.
Issues to consider:
- Most files are small (a few kilobytes or less), so per-file overheads must be low.
- Most of the disk space is in large files.
- Many of the I/O operations are for large files, so performance must be good for large files.
- Files may grow unpredictably over time.

Inodes

Operating system data structure with information about a particular file
- Stored on disk along with file data.
- Kept in memory when file is open.
Info in inode:
- File size
- Sectors occupied by file
- Access times (last read, last write)
- Protection information (owner id, group id, etc.)
How should disk sectors be used to represent the bytes of a file?
Contiguous allocation (also called "extent-based"):
- Allocate files like segmented memory (contiguous run of sectors).
- Inode contains number of first sector, file length in sectors.
- User must specify length when creating a file.
- Keep a free list of unused areas of the disk.
- Advantages:
  - Simple
  - Easy access, both sequential and random
  - Few seeks for I/O
- Drawbacks:
  - Fragmentation will make it hard to use disk space efficiently; large files may be impossible
  - Must predict needs at file creation time
  - Can't extend files
Example: IBM OS/360.
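The "easy access" advantage of contiguous allocation can be sketched in a few lines: because the inode records only the first sector and the length, finding any byte is simple arithmetic. This is an illustrative sketch, not code from any real system; the 512-byte sector size and the name locate_byte are assumptions.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def locate_byte(first_sector, length_sectors, offset):
    """Map a byte offset in a contiguous file to (disk sector, offset in sector).

    first_sector and length_sectors are the two fields the inode stores.
    """
    if offset < 0 or offset >= length_sectors * SECTOR_SIZE:
        raise ValueError("offset outside the file's allocated extent")
    return first_sector + offset // SECTOR_SIZE, offset % SECTOR_SIZE


# A file occupying sectors 100..107: byte 1300 lives in the third sector.
print(locate_byte(100, 8, 1300))  # (102, 276)
```

No disk reads are needed to find the sector, which is why both sequential and random access are cheap with this scheme.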
Linked files:
- Divide disk into fixed-sized blocks (4096 bytes?)
- Keep a linked list of all free blocks.
- In inode, just keep pointer to first block.
- Each block of file contains pointer to next block.
- Advantages?
- Drawbacks?
Examples (more or less): TOPS-10, Xerox Alto.
Windows FAT:
- Like linked allocation, except don't keep the links in the blocks themselves.
- Keep the links for all files in a single table called the File Allocation Table
  - Table is memory resident during normal operation
  - Each FAT entry is disk sector number of next block in file
  - Special values for "last block in file", "free block"
  - Inode stores number of first block in file, size
- Originally, each FAT entry was 16 bits.
- FAT32 supports larger disks:
  - Each entry has 28 bits of sector number
  - Disk addresses refer to clusters: groups of adjacent sectors.
  - Cluster sizes 2 - 32 KBytes; fixed for any particular disk partition.
- Advantages?
- Disadvantages?
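To make the FAT structure concrete, here is a minimal sketch of following a file's chain through an in-memory table. The sentinel values END_OF_FILE and FREE, the function names, and the 4096-byte cluster size are all assumptions chosen for illustration; real FAT variants use their own reserved entry values.

```python
END_OF_FILE = -1  # hypothetical marker for "last block in file"
FREE = -2         # hypothetical marker for "free block"


def file_blocks(fat, first_block):
    """Return the list of block numbers in a file, in order."""
    blocks = []
    block = first_block
    while block != END_OF_FILE:
        if fat[block] == FREE or len(blocks) >= len(fat):
            raise ValueError("corrupt chain: free block or cycle encountered")
        blocks.append(block)
        block = fat[block]  # each entry names the next block in the file
    return blocks


def block_for_offset(fat, first_block, offset, cluster_size=4096):
    """Find the block holding a byte offset by walking the chain from the start."""
    block = first_block
    for _ in range(offset // cluster_size):
        block = fat[block]
        if block == END_OF_FILE:
            raise ValueError("offset is past the end of the file")
    return block


# Blocks 2 -> 5 -> 7 -> 4 form one file; every other block is free.
fat = [FREE, FREE, 5, FREE, END_OF_FILE, 7, FREE, 4]
print(file_blocks(fat, 2))             # [2, 5, 7, 4]
print(block_for_offset(fat, 2, 9000))  # 7
```

Random access still requires walking the chain link by link, but since the table is memory resident the walk touches memory rather than disk.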
Multi-level indexes (4.3 BSD Unix):
- Files divided into blocks of 4 KBytes.
- Blocks of each file managed with multi-level arrays of block pointers.
- Inode = 14 block pointers, initially 0 ("no block").
- First 12 point to data blocks (direct blocks).
- Next entry points to an indirect block (contains 1024 4-byte block pointers).
- Last entry points to a doubly-indirect block.
- Maximum file length is fixed, but large.
- Indirect blocks aren't allocated until needed.
- Advantages?
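The index layout described above can be worked through numerically. The sketch below assumes exactly the parameters given in these notes (4-KByte blocks, 4-byte pointers, 12 direct pointers, one indirect and one doubly-indirect pointer); the function name classify_block is made up for illustration, and real BSD inodes differ in detail.

```python
BLOCK_SIZE = 4096
POINTER_SIZE = 4
NUM_DIRECT = 12
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024 pointers per indirect block


def classify_block(n):
    """Say which inode pointer path reaches logical block n of a file."""
    if n < 0:
        raise ValueError("block number must be non-negative")
    if n < NUM_DIRECT:
        return ("direct", n)
    n -= NUM_DIRECT
    if n < PTRS_PER_BLOCK:
        return ("indirect", n)
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK * PTRS_PER_BLOCK:
        # (slot in the doubly-indirect block, slot in the indirect block it names)
        return ("double", n // PTRS_PER_BLOCK, n % PTRS_PER_BLOCK)
    raise ValueError("beyond the maximum file length")


MAX_BLOCKS = NUM_DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2
MAX_FILE_BYTES = MAX_BLOCKS * BLOCK_SIZE

print(classify_block(5))     # ('direct', 5)
print(classify_block(12))    # ('indirect', 0)
print(classify_block(2061))  # ('double', 1, 1)
print(MAX_FILE_BYTES)        # 4299210752, a little over 4 GBytes
```

Small files (up to 48 KBytes here) need no indirect blocks at all, while the largest files cost at most two extra block lookups per access.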

Block Cache

Use part of main memory to retain recently-accessed disk blocks.
LRU replacement.
Blocks that are referenced frequently (e.g., indirect blocks for large files) are usually in the cache.
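This solves the problem of slow access to large files: their indirect blocks are usually found in memory, so reaching a data block rarely costs extra disk reads.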
Originally, block caches were fixed size.
As memories have gotten larger, so have block caches.
Many systems now unify the block cache and the VM page pool: any page can be used for either, based on LRU access.
What happens when a block in the cache is modified?
- Synchronous writes: immediately write through to disk.
  - Safe: data won't be lost if the machine crashes
  - Slow: process can't continue until disk I/O completes
  - May be unnecessary:
    - Many small writes to the same block
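A block cache with LRU replacement can be sketched as follows. This is a read-only illustration under stated assumptions: the class name BlockCache, the read_block callback standing in for a disk read, and the miss counter are all hypothetical, and write handling is left out entirely.

```python
from collections import OrderedDict


class BlockCache:
    """Fixed-capacity cache of disk blocks with LRU replacement."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # function: block number -> block data
        self.blocks = OrderedDict()   # ordered from least to most recently used
        self.misses = 0

    def get(self, block_num):
        if block_num in self.blocks:
            self.blocks.move_to_end(block_num)  # mark as most recently used
            return self.blocks[block_num]
        self.misses += 1
        data = self.read_block(block_num)       # would be a disk I/O
        self.blocks[block_num] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return data


cache = BlockCache(2, lambda n: f"data{n}")
for n in (1, 2, 1, 3):
    cache.get(n)
print(list(cache.blocks), cache.misses)  # [1, 3] 3
```

In the usage above, block 1 survives because it was referenced again before block 3 arrived, so block 2 is the one evicted.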

Free Space Management

Managing disk free space: early Unix systems just used a linked list of free blocks.
- Each block holds many pointers to free blocks, plus a pointer to the next block of pointers.
- At the beginning, free list is sorted, so blocks in a file are allocated contiguously.
- Free list quickly becomes scrambled, so files are spread all over disk.
4.3 BSD approach to free space (bit map):
- Keep an array of bits, one per block.
- 1 means block is free, 0 means block in use
- During allocation, search bit map for a block that's close to the previous block of the file.
- If disk isn't full, this usually works pretty well.
- If disk is nearly full this becomes very expensive and doesn't produce much locality.
- Solution: don't let the disk fill up!
  - Pretend disk has 10% less capacity than it really has
  - If disk is 90% full, tell users it's full and don't allow any more data to be written.
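The bit-map search and the 10% reserve can be sketched together. This is a simplified illustration, not the actual 4.3 BSD allocator: the bit map is a plain Python list, and the names allocate_near and appears_full and the preference for blocks after the previous one are assumptions.

```python
def allocate_near(bitmap, previous):
    """Claim the free block closest to `previous`; 1 = free, 0 = in use.

    Returns the block number, or None if no block is free.
    """
    n = len(bitmap)
    for distance in range(1, n):
        # Try just after the previous block first, then just before it.
        for candidate in (previous + distance, previous - distance):
            if 0 <= candidate < n and bitmap[candidate] == 1:
                bitmap[candidate] = 0
                return candidate
    return None


def appears_full(bitmap, reserve_percent=10):
    """Report the disk as full once only the reserved fraction remains free."""
    return sum(bitmap) * 100 <= reserve_percent * len(bitmap)


bitmap = [0, 1, 0, 0, 0, 1, 1, 0]
print(allocate_near(bitmap, 3))  # 5
print(allocate_near(bitmap, 5))  # 6
print(allocate_near(bitmap, 6))  # 1 (the only free block left, far away)
```

The third call shows why a nearly full disk hurts: the search runs longer and the block it finds is nowhere near the rest of the file.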

Block Sizes

Many early file systems (e.g. Unix) used a block size of 512 bytes (the size of a sector for many years).
- Inefficient I/O: more distinct transfers, hence more seeks.
- Bulkier inodes: only 128 4-byte pointers fit in an indirect block (pointers will occupy about 1% of disk space).
Increase block size (e.g. 4 KB)?
4.3 BSD solution: multiple block sizes
- Large blocks are 4 KBytes; most blocks are large
- Fragments are multiples of 512 bytes, fitting within a single large block
- The last block in a file can be a fragment.
- One large block can hold fragments from multiple files.
- Bit map for free blocks is based on fragments.
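A short calculation shows what fragments buy. The sketch assumes the sizes given above (4-KByte large blocks, 512-byte fragment units) and that only the tail of a file is stored as a fragment; the function names are made up for illustration.

```python
BLOCK_SIZE = 4096
FRAGMENT_SIZE = 512


def layout(file_size):
    """Return (number of large blocks, 512-byte units in the tail fragment)."""
    full_blocks, tail = divmod(file_size, BLOCK_SIZE)
    tail_units = -(-tail // FRAGMENT_SIZE)  # ceiling division
    return full_blocks, tail_units


def wasted_bytes(file_size, use_fragments=True):
    """Allocated space minus file size, with or without a tail fragment."""
    full_blocks, tail_units = layout(file_size)
    if use_fragments:
        allocated = full_blocks * BLOCK_SIZE + tail_units * FRAGMENT_SIZE
    else:
        allocated = (full_blocks + (1 if tail_units else 0)) * BLOCK_SIZE
    return allocated - file_size


print(layout(10000))               # (2, 4)
print(wasted_bytes(10000))         # 240
print(wasted_bytes(10000, False))  # 2288
```

For a 10000-byte file the tail fragment cuts the wasted space from 2288 bytes to 240, while the first 8192 bytes still transfer as two large blocks.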

Disk Scheduling

If there are several disk I/O's waiting to be executed, what is the best order in which to execute them?
- Goal is to minimize seek time.
First come first served (FCFS, FIFO): simple, but does nothing to optimize seeks.
Shortest seek time first (SSTF):
- Choose next request that is as close as possible to the previous one.
- Good for minimizing seeks, but can result in starvation for some requests.
Scan ("elevator algorithm"):
- Same as SSTF except heads keep moving in one direction across disk.
- Once the edge of the disk has been reached, seek to the farthest pending request and start again.
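The three policies can be compared on a small example. This sketch treats each request as a track number and measures seek cost as the distance the heads travel; the request list and starting position are made-up data. The scan function reads the description above as a sweep toward higher tracks followed by a jump back to the farthest pending request, which is one interpretation of the notes.

```python
def total_seek(start, order):
    """Total head movement when servicing requests in the given order."""
    distance, position = 0, start
    for track in order:
        distance += abs(track - position)
        position = track
    return distance


def sstf(start, requests):
    """Shortest seek time first: always pick the closest pending request."""
    pending, order, position = list(requests), [], start
    while pending:
        nearest = min(pending, key=lambda t: abs(t - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


def scan(start, requests):
    """Sweep upward from start, then jump to the lowest request and sweep again."""
    ahead = sorted(t for t in requests if t >= start)
    behind = sorted(t for t in requests if t < start)
    return ahead + behind


requests = [98, 183, 37, 122, 14, 124, 65, 67]  # arrival (FCFS) order
print(total_seek(53, requests))            # 640
print(total_seek(53, sstf(53, requests)))  # 236
print(total_seek(53, scan(53, requests)))  # 322
```

SSTF travels least on this input, but a steady stream of requests near the heads could postpone track 183 indefinitely; the scan ordering bounds how long any request waits.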
