Virtual Machines

Lecture Notes for CS 140
Spring 2019
John Ousterhout

  • Readings for this topic from Operating Systems: Principles and Practice: Section 10.2.
  • What is the abstraction provided by an OS to a process?
    • (Virtual) memory
    • A subset of the instruction set of the underlying machine
    • Most (but not all) of the hardware registers
    • A set of kernel calls with particular arguments for file I/O, etc.
    • Overall: a subset of the facilities of the underlying machine, augmented with extra mechanisms implemented by the operating system.
  • What if we implemented a different abstraction for a process, which looks exactly like the underlying hardware:
    • The complete instruction set of the underlying machine
    • Physical memory
    • Memory management unit (page maps, etc.)
    • I/O devices
    • Traps and interrupts
    • No predefined system calls
  • This abstraction is called a virtual machine:
    • To a "process", it appears that it has its own private machine.
    • Multiple "processes" can share a single machine, each thinking it's running on its own private machine.
    • The operating system for this is called a virtual machine monitor (VMM), also known as a hypervisor.
    • Can run a complete operating system inside a virtual machine: called a guest operating system.
    • Each virtual machine can run a different guest operating system.

Implementing virtual machine monitors

  • One approach: simulation
    • Write a program that simulates instruction execution (e.g., Bochs).
    • Simulate memory, I/O devices also.
    • Examples:
      • Use one large file to hold contents of a "disk"
      • Simulate kernel/user bit, interrupt vectors, etc.
    • Problem: too slow
      • 100x slowdown for CPU/memory
      • 2x slowdown for I/O
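As a rough illustration of the simulation approach, the sketch below interprets a made-up instruction set one instruction at a time. The opcodes (LOADI, ADD, STORE, HALT), the register names, and the `simulate` function are all hypothetical; a real simulator such as Bochs decodes actual machine instructions. The point is only that every guest instruction costs many host instructions, which is where the slowdown comes from.

```python
# Toy instruction set (hypothetical, for illustration only).
def simulate(program, memory):
    """Interpret `program` one instruction at a time, entirely in software."""
    regs = {"r0": 0, "r1": 0}
    pc = 0
    while True:
        op, *args = program[pc]          # fetch and decode
        pc += 1
        if op == "LOADI":                # LOADI reg, constant
            regs[args[0]] = args[1]
        elif op == "ADD":                # ADD dst, src
            regs[args[0]] += regs[args[1]]
        elif op == "STORE":              # STORE reg, address
            memory[args[1]] = regs[args[0]]
        elif op == "HALT":               # stop the simulated CPU
            return regs
        else:
            raise ValueError(f"unknown opcode {op}")


program = [
    ("LOADI", "r0", 2),
    ("LOADI", "r1", 40),
    ("ADD", "r0", "r1"),
    ("STORE", "r0", 0),
    ("HALT",),
]
memory = [0] * 4                         # simulated memory is just a list
final_regs = simulate(program, memory)
```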
  • Better approach: use CPU to simulate itself.
    • Run virtual machine guest OS like a user process (in unprivileged mode).
    • Most instructions execute at the full speed of the CPU.
    • Anything "unusual" causes a trap into the virtual machine monitor, which simulates the appropriate behavior.
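The idea of letting the CPU simulate itself can be sketched as a small model, assuming a toy machine in which a Python exception stands in for the hardware trap. The names `cpu_execute`, `hypervisor_emulate`, `run_guest`, and the opcodes ADDI and SET_PAGE_TABLE are hypothetical. Ordinary instructions run directly; only privileged ones reach the hypervisor, which updates the virtual machine's state rather than the real machine's.

```python
class IllegalInstruction(Exception):
    """Stands in for the trap raised by a privileged instruction in user mode."""


PRIVILEGED = {"HALT", "SET_PAGE_TABLE"}   # hypothetical privileged opcodes


def cpu_execute(vm, instr):
    """Model of the real CPU running guest code in unprivileged mode."""
    op, *args = instr
    if op in PRIVILEGED:
        raise IllegalInstruction(instr)   # hardware traps instead of executing
    if op == "ADDI":                      # ordinary instruction: runs directly
        vm["regs"][args[0]] += args[1]


def hypervisor_emulate(vm, instr):
    """Trap handler: apply the instruction's effect to the virtual machine only."""
    op, *args = instr
    if op == "HALT":
        vm["halted"] = True               # halts the virtual CPU, not the real one
    elif op == "SET_PAGE_TABLE":
        vm["page_table_base"] = args[0]


def run_guest(vm, code):
    """Run guest code, counting how many instructions needed the hypervisor."""
    traps = 0
    for instr in code:
        if vm["halted"]:
            break
        try:
            cpu_execute(vm, instr)
        except IllegalInstruction:
            traps += 1
            hypervisor_emulate(vm, instr)
    return traps


vm = {"regs": {"r0": 0}, "halted": False, "page_table_base": None}
code = [
    ("ADDI", "r0", 5),
    ("SET_PAGE_TABLE", 0x1000),
    ("ADDI", "r0", 1),
    ("HALT",),
    ("ADDI", "r0", 100),                  # never runs: the virtual CPU has halted
]
traps = run_guest(vm, code)
```

In this model only two of the five instructions involve the hypervisor; the rest run without any software intervention.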
  • Special cases:
    • Privileged instructions (e.g. HALT):
      • Since virtual machine runs in user mode, these cause "illegal instruction" traps into hypervisor.
      • Hypervisor catches these traps, simulates appropriate behavior.
    • Kernel calls in guest OS:
      • User program running under guest OS issues kernel call instruction.
      • Traps always go to hypervisor (not guest OS).
      • Hypervisor analyzes trapping instruction, simulates system call to guest OS:
        • Move trap info from hypervisor stack to stack of guest OS
        • Find interrupt vector in memory of guest OS
        • Switch simulated mode to "kernel"
        • Return out of hypervisor to interrupt handler in guest OS.
      • When guest OS returns from system call, this traps to hypervisor also (illegal instruction in user mode); hypervisor simulates return to guest user level.
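The kernel-call sequence above can be sketched as follows, assuming the virtual machine's state is kept in a plain dictionary. The field names, the vector number 0x80, and the addresses are hypothetical; a real hypervisor manipulates saved registers and guest memory directly.

```python
SYSCALL_VECTOR = 0x80            # hypothetical vector number


def reflect_syscall(vm, trap_info):
    """Hypervisor side of a guest system call, following the four steps above."""
    # 1. Move the trap info onto the guest OS's (simulated) kernel stack.
    vm["guest_kernel_stack"].append(trap_info)
    # 2. Look up the handler address in the guest's own interrupt vector table.
    handler = vm["guest_memory"]["interrupt_vectors"][SYSCALL_VECTOR]
    # 3. Switch the simulated mode bit; the real CPU stays unprivileged.
    vm["virtual_mode"] = "kernel"
    # 4. "Return" from the hypervisor into the guest's handler.
    vm["pc"] = handler


def emulate_return_from_trap(vm):
    """The guest OS's privileged return instruction trapped to the hypervisor."""
    trap_info = vm["guest_kernel_stack"].pop()
    vm["virtual_mode"] = "user"
    vm["pc"] = trap_info["return_pc"]


vm = {
    "pc": 0x4000,
    "virtual_mode": "user",
    "guest_kernel_stack": [],
    "guest_memory": {"interrupt_vectors": {SYSCALL_VECTOR: 0xC000}},
}
reflect_syscall(vm, {"syscall_number": 3, "return_pc": 0x4004})
state_in_handler = (vm["pc"], vm["virtual_mode"])
emulate_return_from_trap(vm)
state_after_return = (vm["pc"], vm["virtual_mode"])
```

Note that `virtual_mode` is purely bookkeeping inside the hypervisor: the hardware mode bit is "user" the whole time the guest runs.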
    • I/O devices:
      • Guest OS writes to I/O device register
      • Hypervisor has arranged for the containing page to fault
      • Hypervisor takes page fault, recognizes address as I/O device register
      • Hypervisor simulates instruction and its impact on the simulated I/O device
      • When actual I/O operation completes, hypervisor simulates interrupt into the guest OS
      • For better performance, write new device drivers for the guest OS that call directly into the hypervisor (using system-call-like traps, often called hypercalls) instead of touching device registers: paravirtualization.
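A minimal sketch of the device-register path, assuming a toy MMU modeled in Python. The register address `DISK_CMD`, the interrupt number `DISK_IRQ`, and all function names are hypothetical. The hypervisor leaves the device page unmapped so that any store to it faults, then interprets the faulting store as a command to the simulated device.

```python
PAGE_SIZE = 4096
DISK_REG_BASE = 0xF000           # hypothetical page holding device registers
DISK_CMD = DISK_REG_BASE + 0     # hypothetical command register
DISK_IRQ = 14                    # hypothetical interrupt number


class PageFault(Exception):
    def __init__(self, addr, value):
        super().__init__(f"page fault at {addr:#x}")
        self.addr, self.value = addr, value


def guest_store(vm, addr, value):
    """Model of the MMU: stores to unmapped pages fault into the hypervisor."""
    if addr // PAGE_SIZE in vm["unmapped_pages"]:
        raise PageFault(addr, value)
    vm["memory"][addr] = value


def handle_page_fault(vm, fault):
    """Hypervisor: recognize a device register and simulate the store's effect."""
    if fault.addr == DISK_CMD:
        vm["virtual_disk"]["pending"].append(fault.value)
    else:
        raise fault              # not a device register: an ordinary guest fault


def complete_io(vm):
    """Called when the real I/O finishes: queue a simulated interrupt."""
    vm["virtual_disk"]["pending"].pop(0)
    vm["pending_interrupts"].append(DISK_IRQ)


vm = {
    "unmapped_pages": {DISK_REG_BASE // PAGE_SIZE},
    "memory": {},
    "virtual_disk": {"pending": []},
    "pending_interrupts": [],
}
for addr, value in [(0x1000, 7), (DISK_CMD, "READ sector 9")]:
    try:
        guest_store(vm, addr, value)
    except PageFault as fault:
        handle_page_fault(vm, fault)
queued = list(vm["virtual_disk"]["pending"])
complete_io(vm)
```

Each device register access costs a full page fault in this scheme, which is the overhead paravirtualized drivers are meant to avoid.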
    • Virtual memory: hypervisor uses page maps to simulate virtual memory mapping in guest OS.
      • Three levels of memory:
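        • Guest virtual address space: mapped to guest physical addresses by the guest OS's page maps
        • Guest physical address space: what the guest OS believes is physical memory
        • Machine physical memory: hypervisor must have total control over this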
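The relationship between the three levels can be sketched as two page-map lookups composed in software. The page size, the page numbers, and the function names are assumptions for illustration; these notes do not say how the hypervisor installs the combined mapping in the hardware page maps, so this shows only the address arithmetic.

```python
PAGE_SIZE = 4096


def translate(page_map, addr):
    """Map an address through a page map (dict of page number -> page number)."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in page_map:
        raise KeyError(f"page fault at {addr:#x}")
    return page_map[page] * PAGE_SIZE + offset


# Maintained by the guest OS: guest virtual page -> guest physical page.
guest_page_table = {0: 5, 1: 2}
# Maintained by the hypervisor: guest physical page -> machine page.
hypervisor_map = {2: 40, 5: 17}


def guest_virtual_to_machine(addr):
    """Compose the two mappings to find the machine address actually used."""
    guest_physical = translate(guest_page_table, addr)
    return translate(hypervisor_map, guest_physical)


guest_physical_addr = translate(guest_page_table, 0x1234)
machine_addr = guest_virtual_to_machine(0x1234)
```

Because the guest OS never sees `hypervisor_map`, the hypervisor can place guest physical pages anywhere in machine memory without the guest noticing.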
  • Software development:
    • Tests may corrupt the state of the machine
    • Solution:
      • Run tests in a VM
      • Always start tests from a saved VM configuration
      • Discard VM state after tests
      • Result: reproducible tests
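The testing workflow above can be modeled in a few lines, assuming the VM's state is represented by an ordinary Python dictionary. `SAVED_CONFIG`, `run_in_fresh_vm`, and `destructive_test` are hypothetical names; a real setup would restore a saved disk and memory image instead of copying a dictionary.

```python
import copy

# Hypothetical saved VM configuration; it is never modified by any test.
SAVED_CONFIG = {
    "files": {"/etc/app.conf": "mode=safe"},
    "packages": ["app-1.0"],
}


def run_in_fresh_vm(test):
    """Start from the saved configuration, run the test, then discard the state."""
    vm_state = copy.deepcopy(SAVED_CONFIG)
    return test(vm_state)        # vm_state is dropped when this returns


def destructive_test(vm_state):
    """A test that corrupts the machine it runs on."""
    vm_state["files"].clear()
    vm_state["packages"].append("debug-tools")
    return len(vm_state["packages"])


first = run_in_fresh_vm(destructive_test)
second = run_in_fresh_vm(destructive_test)
```

Both runs see the same starting state, so they produce the same result even though the test destroys its environment.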
  • Security: can monitor all communication into and out of VM.
  • Heavily used in cloud computing (e.g. Amazon Web Services, Google Cloud).