Virtual Machines

Optional readings for this topic from Operating Systems: Principles and Practice: Section 10.2.

What is the abstraction provided by an OS to a process?

  • (Virtual) memory
  • A subset of the instruction set of the underlying machine
  • Most (but not all) of the hardware registers
  • A set of kernel calls with particular arguments for file I/O, etc.
  • Overall: a subset of the facilities of the underlying machine, augmented with extra mechanisms implemented by the operating system.

What if we implemented a different abstraction for a process, one that looks exactly like the underlying hardware:

  • The complete instruction set of the underlying machine
  • Physical memory
  • Memory management unit (page maps, etc.)
  • I/O devices
  • Traps and interrupts
  • No predefined system calls

This abstraction is called a virtual machine:

  • To a "process", it appears that it has its own private machine.
  • Multiple "processes" can share a single machine, each thinking it's running on its own private machine.
  • The operating system that implements virtual machines is called a hypervisor (or virtual machine monitor).
  • Can run a complete operating system inside a virtual machine: called a guest operating system.
  • Each virtual machine can run a different guest operating system.

Implementing hypervisors

One approach: simulation

  • Write a program that simulates instruction execution.
  • Also simulate memory and I/O devices.
  • Examples:
    • Use one large file to hold contents of a "disk"
    • Simulate kernel/user bit, interrupt vectors, etc.
  • Problem: too slow
    • 100x slowdown for CPU/memory
    • 2x slowdown for I/O
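A minimal sketch of the simulation approach, assuming a made-up toy instruction set (LOADI, ADD, STORE, HALT) and a 512-byte sector size; all class and method names are hypothetical. Every guest instruction passes through a software fetch/decode/dispatch loop, which is why pure simulation is so much slower than native execution, and the simulated disk is just one large host file.

```python
from dataclasses import dataclass, field

SECTOR_SIZE = 512  # assumed sector size for the simulated disk

@dataclass
class SimulatedMachine:
    """Toy simulator: every guest instruction is decoded and executed in software."""
    memory: list                                   # simulated physical memory (one int per word)
    regs: list = field(default_factory=lambda: [0] * 4)
    pc: int = 0
    kernel_mode: bool = True                       # simulated kernel/user bit
    halted: bool = False

    def step(self, program):
        op, *args = program[self.pc]               # fetch and decode
        self.pc += 1
        if op == "LOADI":                          # regs[r] = constant
            r, value = args
            self.regs[r] = value
        elif op == "ADD":                          # regs[d] = regs[a] + regs[b]
            d, a, b = args
            self.regs[d] = self.regs[a] + self.regs[b]
        elif op == "STORE":                        # memory[regs[a]] = regs[r]
            r, a = args
            self.memory[self.regs[a]] = self.regs[r]
        elif op == "HALT":                         # privileged: legal only in kernel mode
            if not self.kernel_mode:
                raise RuntimeError("illegal instruction in user mode")
            self.halted = True
        else:
            raise RuntimeError(f"unknown opcode {op!r}")

    def run(self, program):
        while not self.halted:
            self.step(program)

class SimulatedDisk:
    """A "disk" whose contents live in one large file on the host."""

    def __init__(self, path, num_sectors):
        self.path = path
        with open(path, "wb") as f:
            f.write(bytes(num_sectors * SECTOR_SIZE))  # zero-filled disk image

    def read_sector(self, n):
        with open(self.path, "rb") as f:
            f.seek(n * SECTOR_SIZE)
            return f.read(SECTOR_SIZE)

    def write_sector(self, n, data):
        if len(data) != SECTOR_SIZE:
            raise ValueError("data must be exactly one sector")
        with open(self.path, "r+b") as f:
            f.seek(n * SECTOR_SIZE)
            f.write(data)
```

For example, running [("LOADI", 0, 2), ("LOADI", 1, 3), ("ADD", 2, 0, 1), ("LOADI", 3, 5), ("STORE", 2, 3), ("HALT",)] leaves 5 in simulated memory word 5; each of those six guest instructions costs many host operations.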

Better approach: use the CPU to simulate itself.

  • Run guest OS in user mode.
  • Most instructions execute at the full speed of the CPU.
  • Anything "unusual" causes a trap into the hypervisor, which simulates the appropriate behavior.
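A sketch of the hypervisor side of trap-and-emulate, under several assumptions: the trap kinds, the VirtualCPU fields, the vector slots, and the IO_PAGE address are all hypothetical, and only stores to device registers are modeled. The real CPU always runs the guest in user mode; the hypervisor keeps the guest's simulated mode bit and either emulates the trapping operation or reflects it into the guest OS. reflect_to_guest follows the steps listed for guest kernel calls: push trap info onto the guest OS stack, switch the simulated mode to kernel, and continue at the guest's handler.

```python
from dataclasses import dataclass, field

# Hypothetical slots in the guest OS's interrupt vector table.
SYSCALL_VECTOR, ILLEGAL_VECTOR, PAGE_FAULT_VECTOR = 0, 1, 2
IO_PAGE = 0xF000     # assumed: page the hypervisor left unmapped so device accesses fault
PAGE_MASK = ~0xFFF   # assumes 4 KB pages

@dataclass
class VirtualCPU:
    """Per-guest state kept by the hypervisor (all names are hypothetical)."""
    interrupt_vector: list             # guest OS handler addresses, read from guest memory
    pc: int = 0
    virtual_kernel_mode: bool = False  # simulated mode bit; the real CPU stays in user mode
    halted: bool = False
    guest_stack: list = field(default_factory=list)  # stands in for the guest OS stack
    device_regs: dict = field(default_factory=dict)  # simulated I/O device registers

@dataclass
class Trap:
    kind: str          # "kernel_call", "illegal_instruction", or "page_fault"
    pc: int            # guest address of the trapping instruction
    instruction: str = ""
    address: int = 0   # faulting address (page faults only)
    value: int = 0     # value being stored (this sketch models only stores)

def reflect_to_guest(vcpu, vector, resume_pc):
    """Simulate a trap into the guest OS."""
    vcpu.guest_stack.append((resume_pc, vcpu.virtual_kernel_mode))  # trap info onto guest OS stack
    vcpu.virtual_kernel_mode = True                                # simulated mode becomes kernel
    vcpu.pc = vcpu.interrupt_vector[vector]                        # continue in guest handler

def handle_trap(vcpu, trap):
    if trap.kind == "kernel_call":
        # Guest user program made a kernel call; resume after it on return.
        reflect_to_guest(vcpu, SYSCALL_VECTOR, trap.pc + 1)
    elif trap.kind == "illegal_instruction":
        if not vcpu.virtual_kernel_mode:
            # A guest *user* program really did execute something illegal.
            reflect_to_guest(vcpu, ILLEGAL_VECTOR, trap.pc)
        elif trap.instruction == "RETURN_FROM_TRAP":
            # Guest OS returning to guest user level: restore saved PC and mode.
            vcpu.pc, vcpu.virtual_kernel_mode = vcpu.guest_stack.pop()
        elif trap.instruction == "HALT":
            # Privileged instruction issued by the guest OS: emulate it.
            vcpu.halted = True
            vcpu.pc = trap.pc + 1
        else:
            raise NotImplementedError(trap.instruction)
    elif trap.kind == "page_fault" and (trap.address & PAGE_MASK) == IO_PAGE:
        # Access to a virtual device register: emulate the store, skip the instruction.
        vcpu.device_regs[trap.address] = trap.value
        vcpu.pc = trap.pc + 1
    elif trap.kind == "page_fault":
        # Simplification: every other page fault is passed to the guest OS.
        reflect_to_guest(vcpu, PAGE_FAULT_VECTOR, trap.pc)
    else:
        raise ValueError(f"unknown trap kind {trap.kind!r}")
```

The simulated mode bit is what lets the same illegal-instruction trap mean "emulate it" when the guest OS executes HALT, but "deliver a fault to the guest OS" when a guest user program does.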

Special cases:

  • Privileged instructions (e.g. HALT):
    • Since virtual machine runs in user mode, these cause "illegal instruction" traps into hypervisor.
    • Hypervisor catches these traps, simulates appropriate behavior.
  • Kernel calls in guest OS (both guest user and guest OS run in user mode):
    • User program running under guest OS issues kernel call instruction.
    • Traps always go to hypervisor (not guest OS).
    • Hypervisor analyzes trapping instruction, simulates system call to guest OS:
      • Move trap info from hypervisor stack to stack of guest OS
      • Find interrupt vector in memory of guest OS
      • Switch simulated mode to "kernel"
      • Return out of hypervisor to interrupt handler in guest OS.
    • When guest OS returns from system call, this traps to hypervisor also (illegal instruction in user mode); hypervisor simulates return to guest user level.
  • I/O devices:
    • Guest OS reads/writes virtual I/O device register
    • Hypervisor has arranged for the containing page to fault
    • Hypervisor takes page fault, recognizes address as I/O device register
    • Hypervisor simulates instruction and its impact on the simulated I/O device
    • When actual I/O operation completes, hypervisor simulates interrupt into the guest OS
    • For better performance, write new device drivers for the guest OS that call directly into the hypervisor (using system-call-like traps, often called hypercalls): paravirtualization.
  • Virtual memory: hypervisor uses page maps to simulate virtual memory mapping in guest OS.
    • Three levels of memory:
      • Guest virtual address space
      • Guest physical address space
      • Machine physical memory: hypervisor must have total control over this
    • Today's solution: extended page maps (Intel calls them Extended Page Tables, AMD calls them Nested Page Tables):
      • Another layer of address translation.
      • Translates from guest physical addresses (specific to each guest) to machine addresses (real memory).
      • Hypervisor controls all of the extended page maps, while guest OS controls normal page maps.
      • Much simpler and more efficient than shadow page maps.
    • Original solution: shadow page maps
      • Guest OS creates page maps, but these aren't used by actual hardware.
      • Hypervisor manages the real page maps; these are called shadow page maps.
      • Hypervisor traps instruction to set the page map base, records info about the guest OS page maps.
      • On page faults, hypervisor updates shadow page maps using info from guest OS page maps and its knowledge of physical memory.
      • When guest OS modifies its page maps, hypervisor must trap the updates (e.g. by write-protecting the pages that hold the guest page maps) and reflect the changes in the shadow page maps.
      • Two kinds of page faults:
        • Page not in guest physical memory: hypervisor must pass through to guest OS
        • Page in guest physical memory, but not in machine physical memory: hypervisor just updates shadow page map (fault invisible to guest OS)
      • Quite tricky, and potentially slow.
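The two approaches to guest virtual memory can be compared with a small model in which each page map is a Python dict from page number to page number; PAGE_SIZE, the function names, and the free-frame list are assumptions for illustration. extended_translate does what extended-page-map hardware does on a miss (two lookups), while handle_shadow_fault is the hypervisor's software job under shadow page maps: classify the fault and, when the guest OS need not see it, fill in the composed guest-virtual-to-machine entry.

```python
PAGE_SIZE = 4096  # assumed page size

class PageFault(Exception):
    pass

def translate(page_map, addr):
    """One level of translation: map the page number, keep the offset."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in page_map:
        raise PageFault(page)
    return page_map[page] * PAGE_SIZE + offset

def extended_translate(guest_map, extended_map, guest_virtual):
    """Extended page maps: hardware walks both maps."""
    guest_physical = translate(guest_map, guest_virtual)  # guest OS controls guest_map
    return translate(extended_map, guest_physical)        # hypervisor controls extended_map

def handle_shadow_fault(guest_map, phys_to_machine, shadow_map, free_frames, vpage):
    """Shadow page maps: hardware uses shadow_map (guest virtual page -> machine page),
    which the hypervisor fills in lazily. Returns who must handle the fault."""
    if vpage not in guest_map:
        return "guest OS"  # page not in guest physical memory: pass the fault through
    gpage = guest_map[vpage]
    if gpage not in phys_to_machine:
        # In guest physical memory but not in machine memory: hypervisor supplies a
        # frame (a real hypervisor would also fill it, e.g. from its own swap space).
        phys_to_machine[gpage] = free_frames.pop()
    shadow_map[vpage] = phys_to_machine[gpage]  # compose the two translations
    return "hypervisor"    # fault is invisible to the guest OS
```

Once filled in, a shadow entry yields the same machine address as the two-level walk; the cost of shadow page maps is keeping those composed entries in sync whenever the guest OS changes its own page maps.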

Potential problem:

  • Hypervisor must trap any behavior that requires simulation.
    • Special memory locations (e.g. page maps)? Use page faults.
    • Special instructions? Must trap
  • Pathological case:
    • Instruction that is valid in both user mode and kernel mode
    • But, behaves differently in user mode
    • Example: "read processor status" (where kernel/user mode bit is in the status word)
  • Virtualizable: a machine with no such special cases
  • Until recently, very few machines were completely virtualizable (e.g. x86 was not until Intel VT-x and AMD-V added hardware virtualization support in the mid-2000s)
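The "no special cases" definition corresponds to the classic condition stated by Popek and Goldberg (1974): trap-and-emulate works if every sensitive instruction (one that reads or changes machine state the hypervisor must control, such as the kernel/user bit) is also privileged, so that it traps in user mode. A small checker, with a made-up instruction table:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instruction:
    name: str
    privileged: bool  # traps if executed in user mode
    sensitive: bool   # reads or changes state the hypervisor must control

def problem_instructions(isa):
    """Sensitive instructions that do not trap: the hypervisor never gets to see them."""
    return [i.name for i in isa if i.sensitive and not i.privileged]

def is_virtualizable(isa):
    # Popek and Goldberg condition: every sensitive instruction is also privileged.
    return not problem_instructions(isa)

# Hypothetical instruction table.
toy_isa = [
    Instruction("ADD", privileged=False, sensitive=False),
    Instruction("HALT", privileged=True, sensitive=True),
    Instruction("SET_PAGE_MAP_BASE", privileged=True, sensitive=True),
    Instruction("READ_STATUS", privileged=False, sensitive=True),  # exposes the mode bit
]
print(problem_instructions(toy_isa))  # ['READ_STATUS']
```

A guest OS executing READ_STATUS would see the real user-mode bit instead of its simulated kernel mode, and the hypervisor has no chance to intervene.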

Dynamic binary translation: solution for older machines that are not virtualizable:

  • Hypervisor analyzes all code executed in virtual machine
  • Replaces non-virtualizable instructions with traps
  • Very tricky: how to find all code?
  • Can use this to run hypervisor as a user-level program
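One common answer to "how to find all code?" is not to find it in advance: translate lazily, one basic block at a time, when control first reaches it, and cache the result. This sketch assumes a toy instruction format and a hypothetical READ_STATUS as the only non-virtualizable instruction; control transfers are rewritten to return to the translator, so untranslated guest code never runs directly.

```python
NON_VIRTUALIZABLE = {"READ_STATUS"}          # assumed: sensitive but does not trap
BLOCK_ENDERS = {"JUMP", "BRANCH", "RETURN"}  # control transfers end a basic block

class BinaryTranslator:
    """Translate guest code lazily, one basic block at a time, as it is reached."""

    def __init__(self, guest_code):
        self.guest_code = guest_code  # guest instructions, indexed by address
        self.cache = {}               # guest start address -> translated block

    def translate_block(self, start):
        if start in self.cache:       # each block is analyzed only once
            return self.cache[start]
        out, pc = [], start
        while pc < len(self.guest_code):
            inst = self.guest_code[pc]
            op = inst[0]
            if op in NON_VIRTUALIZABLE:
                out.append(("TRAP_TO_HYPERVISOR", pc))  # hypervisor emulates the original
            elif op in BLOCK_ENDERS:
                # Return to the translator instead of jumping into untranslated
                # code; it translates the target block before running it.
                out.append(("EXIT_TO_TRANSLATOR", inst))
            else:
                out.append(inst)                        # safe: runs at full speed
            pc += 1
            if op in BLOCK_ENDERS:
                break
        self.cache[start] = out
        return out
```

A real translator also has to invalidate cached blocks when the guest modifies its own code, and it usually chains translated blocks together rather than returning to the translator on every jump.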

In practice, how much overhead do hypervisors add?

  • CPU-bound applications: < 5%
  • I/O-bound applications: ~30%

History/usage of virtual machines

Invented by IBM in the late 1960s

Original usage:

  • One VM per user
  • Each user ran a different single-user guest OS
  • Single shared hardware platform

Interest waned in the 1980s and 1990s:

  • Personal computers gave each user a private machine

Reinvented and made practical in the late 1990s by Mendel Rosenblum and his graduate students at Stanford, who co-founded VMware.

Software development:

  • Need to test software on different OS versions:
  • Keep one VM for each OS version.
  • Use a single machine to test all versions.

Datacenters:

  • Problem: many machines, each running a single application
    • Need separate machines for isolation: an application crash could bring down the entire machine
    • Most applications need only a fraction of a machine's resources.
  • Solution: datacenter consolidation
    • One VM per application
    • Run several VMs on a single machine
    • Reduce the number of machines
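Choosing which VMs share a machine is a packing problem. As one simple illustration (not a description of any particular product's placement policy), a first-fit-decreasing heuristic packs applications onto machines; the demand figures are made-up percentages of one machine's capacity.

```python
def consolidate(vm_demands, machine_capacity=100):
    """First-fit decreasing: place each VM, largest first, on the first machine with room.
    vm_demands maps VM name -> percent of one machine's capacity it needs."""
    machines = []  # each entry: [remaining capacity, [VM names]]
    for name, demand in sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > machine_capacity:
            raise ValueError(f"{name} does not fit on any single machine")
        for machine in machines:
            if machine[0] >= demand:
                machine[0] -= demand
                machine[1].append(name)
                break
        else:  # no existing machine has room: add one
            machines.append([machine_capacity - demand, [name]])
    return [vms for _, vms in machines]

demands = {"web": 30, "db": 50, "mail": 20, "batch": 40, "dns": 10}  # made-up numbers
print(consolidate(demands))  # [['db', 'batch', 'dns'], ['web', 'mail']]
```

Five single-application machines become two. Real placement must weigh more than one resource (CPU, memory, I/O), which this single-number sketch ignores.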

Encapsulation, restart:

  • Hypervisor can encapsulate entire state of a VM in a file.
  • Can save, continue, restore old state.
  • Datacenter example:
    • Can migrate VMs between machines to balance load
  • Software development:
    • Tests may corrupt the state of the machine
    • Solution:
      • Run tests in a VM
      • Always start tests from a saved VM configuration
      • Discard VM state after tests
      • Results: reproducible tests
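A toy model of encapsulation for reproducible tests, assuming the whole VM state fits in a JSON-serializable dict (registers, memory, disk contents); the function names are hypothetical. Each test loads a fresh copy from the saved file, may corrupt it freely, and the copy is discarded afterward.

```python
import json

def save_snapshot(vm_state, path):
    """Encapsulate the entire (toy) VM state in one file."""
    with open(path, "w") as f:
        json.dump(vm_state, f)

def load_snapshot(path):
    """Restore a VM state previously written by save_snapshot."""
    with open(path) as f:
        return json.load(f)

def run_test_in_fresh_vm(snapshot_path, test):
    """Run one test against a fresh copy of the saved VM configuration."""
    vm = load_snapshot(snapshot_path)  # always start from the saved configuration
    # The test may corrupt vm freely; the snapshot file itself is never modified,
    # and this copy is discarded when the function returns.
    return test(vm)
```

Because the state is just a file, the same snapshot could also be copied to another machine and loaded there; hypervisors that migrate running VMs use more elaborate techniques, such as copying memory while the VM keeps running.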

Heavily used in cloud computing (e.g. Amazon Web Services, Google Cloud).