Directories and Links

Lecture Notes for CS 140
Winter 2012
John Ousterhout

  • Readings for this topic from Operating System Concepts: Section 11.3.
  • Naming: how do users refer to their files? How does OS find file, given name?
  • First step: file descriptor has to be stored on disk, so it will persist across system reboots.
  • Early UNIX versions: all descriptors stored in a fixed-size array on disk.
  • Originally entire descriptor array was at the outer edge of the disk. Result: long seeks between descriptors and file data.
  • Later improvements:
    • Place descriptor array mid-way across disk.
    • Many small descriptor arrays spread across disk, so descriptors can be near to file data.
  • Space for descriptors is fixed when the disk is initialized, and can't be changed.
  • Unix/Linux terms:
    • File descriptor is called an i-node
    • Index of i-node in the descriptor array: i-number. Internally the OS uses the i-number as an identifier for the file.
  • When a file is open, its descriptor is kept in main memory. When the file is closed, the descriptor is stored back to disk.
  • File naming: users want to use text names to refer to files. Special disk structures called directories are used to map names to descriptor indexes.
  • Early approaches to directory management:
    • A single directory for the entire disk:
      • If one user uses a particular name, no one else can.
      • Many early personal computers worked this way.
    • A single directory for each user (e.g. TOPS-10):
      • Avoids problems between users, but still makes it hard to organize information.
  • Modern systems support hierarchical directory structures. Unix/Linux approach:
    • Directories are stored on disk just like regular files (i.e. file descriptor with 14 pointers, etc.) except file descriptor has special flag bit set to indicate that it's a directory.
    • On some systems user programs can read directories just like regular files.
    • Only the operating system can write directories.
    • Each directory contains <name, i-number> pairs in no particular order.
    • The file pointed to by the i-number may be another directory. Hence, get hierarchical tree structure. Names have slashes separating the levels of the tree.
    • There is one special directory, called the root. This directory has no name; it has i-number 2 (i-numbers 0 and 1 have other special purposes).
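
The lookup procedure described above can be sketched as a toy model in Python. This is only an illustration, not how any real kernel stores its data: the dictionary `inodes`, the field names `is_dir`, `entries`, and `data`, and the function `lookup` are all hypothetical. The only detail taken from the notes is that the root directory has i-number 2.

```python
# Toy model: the i-node array is a dict keyed by i-number (hypothetical layout).
# Directory i-nodes hold <name, i-number> pairs; file i-nodes hold data.
ROOT_INO = 2  # i-number of the root directory, as stated in the notes

inodes = {
    2: {"is_dir": True, "entries": {"usr": 3, "etc": 4}},
    3: {"is_dir": True, "entries": {"notes.txt": 5}},
    4: {"is_dir": True, "entries": {}},
    5: {"is_dir": False, "data": b"hello"},
}

def lookup(path):
    """Return the i-number named by an absolute, slash-separated path."""
    ino = ROOT_INO                      # absolute names start at the root
    for name in path.split("/"):
        if not name:                    # skip empty pieces from leading or doubled "/"
            continue
        node = inodes[ino]
        if not node["is_dir"]:          # cannot descend through a regular file
            raise NotADirectoryError(name)
        if name not in node["entries"]:
            raise FileNotFoundError(path)
        ino = node["entries"][name]     # follow the <name, i-number> pair
    return ino
```

Each path component costs one directory search, so `lookup("/usr/notes.txt")` visits i-nodes 2 and 3 before arriving at i-number 5.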

Working directories

  • Constantly specifying the full path name for every file is cumbersome.
  • Have OS remember one distinguished directory per process, called the working directory.
  • If a file name doesn't start with "/" then it is looked up starting in the working directory.
  • Names starting with "/" are looked up starting in the root directory.
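
A small Python sketch can make the two starting points concrete. It uses the standard `os` and `tempfile` modules; the function name `demo_working_directory` and the file name `notes.txt` are made up for the example.

```python
import os
import tempfile

def demo_working_directory():
    """Show that a relative name and an absolute name reach the same file."""
    saved = os.getcwd()                      # remember the current working directory
    with tempfile.TemporaryDirectory() as d:
        d = os.path.realpath(d)              # resolve any symbolic links in the temp path
        abs_name = os.path.join(d, "notes.txt")
        with open(abs_name, "w") as f:       # absolute name: lookup starts at the root
            f.write("hello")
        try:
            os.chdir(d)                      # change this process's working directory
            with open("notes.txt") as f:     # relative name: lookup starts in d
                text = f.read()
            same = os.path.samefile("notes.txt", abs_name)
        finally:
            os.chdir(saved)                  # restore before the temp directory is removed
    return text, same
```

The working directory is per-process state, which is why the sketch saves and restores it rather than leaving the process in a directory that is about to be deleted.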

Links

  • Hard links:
    • It is possible for more than one directory entry to refer to a single file.
    • UNIX uses reference counts in file descriptors to keep track of hard links referring to a file.
    • Files are deleted when the last directory entry goes away (and, in UNIX, no process still has the file open).
  • Symbolic links:
    • A file whose contents are another file name.
    • Stored on disk like regular files, but with special flag set in descriptor.
    • If a symbolic link is encountered during file lookup, switch to target named in symbolic link, continue lookup from there.
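
The difference between the two kinds of link can be observed from user level. The following Python sketch assumes a POSIX system and a file system that supports both hard and symbolic links; the function name `demo_links` and the file names are made up for the example.

```python
import os
import tempfile

def demo_links():
    """Create a hard link and a symbolic link and report what the OS records."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(original, "w") as f:
            f.write("data")

        os.link(original, hard)       # second directory entry for the same i-node
        os.symlink(original, soft)    # separate file whose contents are a path name

        result = {
            "same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
            "count_with_link": os.stat(original).st_nlink,   # reference count
            # lstat examines the link itself instead of following it
            "link_is_own_file": os.lstat(soft).st_ino != os.stat(original).st_ino,
            "target": os.readlink(soft),                     # the stored name
        }

        os.remove(original)           # removes one directory entry only
        result["count_after"] = os.stat(hard).st_nlink
        with open(hard) as f:
            result["survives"] = f.read()    # data still reachable via the hard link
        # exists() follows the link, which now names a missing file
        result["dangling"] = not os.path.exists(soft)
    return result
```

Removing `original.txt` only drops the reference count; the data stays reachable through `hard.txt`, while the symbolic link is left pointing at a name that no longer exists.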