Virtual Machines

Optional readings for this topic from Operating Systems: Principles and Practice: Section 10.2.

What is the abstraction provided by an OS to a process?

(Virtual) memory
A subset of the instruction set of the underlying machine
Most (but not all) of the hardware registers
A set of kernel calls with particular arguments for file I/O, etc.
Overall: a subset of the facilities of the underlying machine, augmented with extra mechanisms implemented by the operating system.

What if we implemented a different abstraction for a process, which looks exactly like the underlying hardware:

The complete instruction set of the underlying machine
Physical memory
Memory management unit (page maps, etc.)
I/O devices
Traps and interrupts
No predefined system calls

This abstraction is called a virtual machine:

To a "process", it appears that it has its own private machine.
Multiple "processes" can share a single machine, each thinking it's running on its own private machine.
The operating system for this is called a hypervisor.
Can run a complete operating system inside a virtual machine: called a guest operating system.
Each virtual machine can run a different guest operating system.

Implementing hypervisors

One approach: simulation

Write a program that simulates instruction execution.
Simulate memory and I/O devices as well.
Examples:
- Use one large file to hold contents of a "disk"
- Simulate kernel/user bit, interrupt vectors, etc.
Problem: too slow
- 100x slowdown for CPU/memory
- 2x slowdown for I/O
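A minimal sketch of the simulation approach, assuming a made-up accumulator machine (the `ToyMachine` class and the opcodes `LOAD`, `ADD`, `STORE`, `HALT` are hypothetical). Registers, memory, and the kernel/user bit are all ordinary program variables, and every guest instruction passes through a software fetch/decode/execute step:

```python
from dataclasses import dataclass, field


@dataclass
class ToyMachine:
    """Pure software simulation of a hypothetical accumulator machine.
    Each instruction is an (opcode, operand) tuple."""
    program: list
    memory: list = field(default_factory=lambda: [0] * 16)  # simulated physical memory
    acc: int = 0               # simulated accumulator register
    pc: int = 0                # simulated program counter
    kernel_mode: bool = True   # simulated kernel/user bit
    halted: bool = False

    def step(self):
        op, arg = self.program[self.pc]   # fetch and decode in software
        self.pc += 1
        if op == "LOAD":
            self.acc = self.memory[arg]
        elif op == "ADD":
            self.acc += arg
        elif op == "STORE":
            self.memory[arg] = self.acc
        elif op == "HALT":
            if not self.kernel_mode:      # simulated privilege check
                raise RuntimeError("illegal instruction: HALT in user mode")
            self.halted = True
        else:
            raise ValueError(f"unknown opcode {op!r}")

    def run(self, max_steps=1000):
        """Execute until HALT (or max_steps); returns the number of steps."""
        steps = 0
        while not self.halted and steps < max_steps:
            self.step()
            steps += 1
        return steps


m = ToyMachine(program=[("LOAD", 0), ("ADD", 5), ("STORE", 1), ("HALT", None)])
m.memory[0] = 10
m.run()
print(m.memory[1])  # 15
```

Each simulated instruction costs many real instructions of decoding and dispatch, which is where the large CPU/memory slowdown comes from.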

Better approach: use the CPU to simulate itself.

Run guest OS in user mode.
Most instructions execute at the full speed of the CPU.
Anything "unusual" causes a trap into the hypervisor, which simulates the appropriate behavior.
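The trap-and-emulate idea can be sketched with another hypothetical toy instruction set (the opcodes `ADD`, `SYSCALL`, `RTI`, `HALT` and the names `VM`, `Trap`, `cpu_execute`, `handle_trap`, `hypervisor_run` are all made up for illustration). `cpu_execute` stands in for the real CPU, which always runs the guest in user mode: ordinary instructions finish without the hypervisor, while privileged instructions and kernel calls raise a trap that the hypervisor handles, including reflecting a kernel call into the guest OS as in the special cases that follow:

```python
from dataclasses import dataclass, field

# Hypothetical toy ISA: "ADD n" is unprivileged; "RTI" (return from trap)
# and "HALT" are privileged; "SYSCALL" is the kernel-call instruction.
PRIVILEGED = {"RTI", "HALT"}


class Trap(Exception):
    """Raised by the (simulated) real CPU; always handled by the hypervisor."""

    def __init__(self, kind, pc, op):
        super().__init__(kind)
        self.kind, self.pc, self.op = kind, pc, op


@dataclass
class VM:
    program: list
    pc: int = 0
    acc: int = 0
    virtual_kernel_mode: bool = False  # the mode the guest *thinks* it is in
    syscall_vector: int = 0            # guest OS trap-handler address
    guest_kernel_stack: list = field(default_factory=list)
    halted: bool = False
    traps: int = 0                     # how often the hypervisor had to step in


def cpu_execute(vm):
    """Run one guest instruction on the 'real' CPU, which is always in user mode."""
    op, arg = vm.program[vm.pc]
    if op in PRIVILEGED:
        raise Trap("illegal_instruction", vm.pc, op)
    if op == "SYSCALL":
        raise Trap("syscall", vm.pc, op)
    if op == "ADD":
        vm.acc += arg                  # ordinary instruction: no hypervisor involvement
    vm.pc += 1


def handle_trap(vm, t):
    vm.traps += 1
    if t.kind == "syscall":
        # Reflect the kernel call into the guest OS.
        vm.guest_kernel_stack.append((t.pc + 1, vm.virtual_kernel_mode))  # trap info
        vm.virtual_kernel_mode = True   # switch simulated mode to "kernel"
        vm.pc = vm.syscall_vector       # "return" into the guest OS handler
    elif not vm.virtual_kernel_mode:
        # Guest user code used a privileged instruction. A real hypervisor would
        # reflect this to the guest OS; this sketch just reports it.
        raise RuntimeError(f"guest user program executed {t.op}")
    elif t.op == "HALT":
        vm.halted = True                # emulate HALT: stop only this VM
    elif t.op == "RTI":
        # Guest OS returns from the kernel call: restore guest user state.
        vm.pc, vm.virtual_kernel_mode = vm.guest_kernel_stack.pop()


def hypervisor_run(vm, max_steps=1000):
    for _ in range(max_steps):
        if vm.halted or vm.pc >= len(vm.program):
            break
        try:
            cpu_execute(vm)             # fast path: direct execution
        except Trap as t:
            handle_trap(vm, t)          # slow path: simulate the behavior
    return vm


# Guest OS handler at address 0; guest user program starts at address 2.
prog = [("ADD", 10), ("RTI", None),                    # guest OS
        ("ADD", 1), ("SYSCALL", None), ("ADD", 100)]   # guest user
vm = hypervisor_run(VM(program=prog, pc=2, syscall_vector=0))
print(vm.acc, vm.traps, vm.virtual_kernel_mode)  # 111 2 False
```

Only the `SYSCALL` and the `RTI` reached the hypervisor; the three `ADD` instructions ran directly, which is why this approach is so much faster than pure simulation.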

Special cases:

Privileged instructions (e.g. HALT):
- Since the virtual machine runs in user mode, these cause "illegal instruction" traps into the hypervisor.
- Hypervisor catches these traps, simulates appropriate behavior.
Kernel calls in guest OS (both guest user and guest OS run in user mode):
- User program running under guest OS issues kernel call instruction.
- Traps always go to hypervisor (not guest OS).
- Hypervisor analyzes the trapping instruction and simulates a system call into the guest OS:
  - Move trap info from hypervisor stack to stack of guest OS
  - Find interrupt vector in memory of guest OS
  - Switch simulated mode to "kernel"
  - Return out of hypervisor to interrupt handler in guest OS.
- When the guest OS returns from the system call, its return instruction also traps to the hypervisor (illegal instruction in user mode); the hypervisor simulates the return to guest user level.
I/O devices:
- Guest OS reads/writes virtual I/O device register
- Hypervisor has arranged for the containing page to fault
- Hypervisor takes page fault, recognizes address as I/O device register
- Hypervisor simulates instruction and its impact on the simulated I/O device
- When actual I/O operation completes, hypervisor simulates interrupt into the guest OS
- For better performance, write new guest device drivers that call directly into the hypervisor (using system calls): this is called paravirtualization.
Virtual memory: hypervisor uses page maps to simulate virtual memory mapping in guest OS.
- Three levels of memory:
  - Guest virtual address space
  - Guest physical address space
  - Machine physical memory: hypervisor must have total control over this
- Today's solution: extended page maps:
  - Another layer of address translation.
  - Translates from guest physical addresses (guest-specific) to machine addresses (real memory).
  - Hypervisor controls all of the extended page maps, while guest OS controls normal page maps.
  - Much simpler and more efficient than shadow page maps.
- Original solution: shadow page maps
  - Guest OS creates page maps, but these aren't used by actual hardware.
  - Hypervisor manages the real page maps; these are called shadow page maps.
  - Hypervisor traps instruction to set the page map base, records info about the guest OS page maps.
  - On page faults, hypervisor updates shadow page maps using info from the guest OS page maps and its knowledge of physical memory.
  - When the guest OS modifies its page maps, the hypervisor must trap the updates (e.g., by write-protecting the pages that hold the guest page maps) and reflect the changes in the shadow page maps.
  - Two kinds of page faults:
    - Page not in guest physical memory: hypervisor must pass through to guest OS
    - Page in guest physical memory, but not in machine physical memory: hypervisor just updates shadow page map (fault invisible to guest OS)
  - Quite tricky, and potentially slow.
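The two approaches can be compared in a minimal sketch that models page maps as Python dictionaries from page number to page number. The 4096-byte page size, the exception names, and the `allocate_machine_page` callback are assumptions for illustration; real hardware walks multi-level page tables, not dictionaries. `translate_nested` shows the extra layer of translation performed with extended page maps; `fill_shadow_entry` shows how a hypervisor might fill in a shadow page map entry while telling the two kinds of page faults apart:

```python
# Page maps modeled as dicts from page number to page number (an assumption;
# real hardware walks multi-level page tables).
PAGE_SIZE = 4096  # assumed page size in bytes


class GuestPageFault(Exception):
    """Page not in guest physical memory: must be passed through to the guest OS."""


class HypervisorFault(Exception):
    """Guest physical page not in machine memory: handled by the hypervisor alone."""


def translate_nested(vaddr, guest_map, extended_map):
    """Extended page maps: guest virtual -> guest physical -> machine address."""
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage not in guest_map:          # guest_map is controlled by the guest OS
        raise GuestPageFault(vpage)
    gppage = guest_map[vpage]
    if gppage not in extended_map:      # extended_map is controlled by the hypervisor
        raise HypervisorFault(gppage)
    return extended_map[gppage] * PAGE_SIZE + offset


def fill_shadow_entry(vpage, guest_map, hyp_map, shadow_map, allocate_machine_page):
    """Shadow page maps: on a real page fault, build the guest virtual -> machine
    entry that the hardware actually uses."""
    if vpage not in guest_map:
        raise GuestPageFault(vpage)     # kind 1: pass through to the guest OS
    gppage = guest_map[vpage]
    if gppage not in hyp_map:           # kind 2: invisible to the guest OS
        hyp_map[gppage] = allocate_machine_page()
    shadow_map[vpage] = hyp_map[gppage]


guest_map = {0: 7, 1: 3}   # guest virtual page -> guest physical page
extended_map = {7: 42}     # guest physical page -> machine page
print(translate_nested(100, guest_map, extended_map))  # 172132 (= 42 * 4096 + 100)

hyp_map = {7: 42}          # hypervisor's own guest physical -> machine record
shadow_map, free_pages = {}, iter([99])
fill_shadow_entry(1, guest_map, hyp_map, shadow_map, lambda: next(free_pages))
print(shadow_map)  # {1: 99}
```

With extended page maps the hardware does both lookups itself on every translation (cached in the TLB), so the hypervisor never has to watch the guest's page map updates; that bookkeeping is exactly what makes shadow page maps tricky.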

Potential problem:

Hypervisor must trap any behavior that requires simulation.
- Special memory locations (e.g. page maps)? Use page faults.
- Special instructions? Must trap
Pathological case:
- Instruction that is valid in both user mode and kernel mode
- But, behaves differently in user mode
- Example: "read processor status" (where kernel/user mode bit is in the status word)
Virtualizable: a machine with no such special cases
Until recently, very few machines were completely virtualizable (e.g., x86 was not, until Intel VT-x and AMD-V added hardware support for virtualization)
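The virtualizability condition can be stated as: every instruction that is sensitive to (or changes) the mode or machine state the hypervisor simulates must also be privileged, so that it traps (this is essentially the classic Popek and Goldberg condition). A minimal sketch with a hypothetical instruction table; the instruction names and flags are illustrative, not a real ISA:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    name: str
    sensitive: bool   # reads or changes mode/machine state the hypervisor must simulate
    privileged: bool  # traps if executed in user mode


def non_virtualizable(isa):
    """Sensitive instructions that don't trap: the hypervisor never sees them."""
    return [i.name for i in isa if i.sensitive and not i.privileged]


def is_virtualizable(isa):
    return not non_virtualizable(isa)


def read_status(real_kernel_mode):
    """Unprivileged, so it runs directly on the hardware and reports the real mode bit."""
    return "kernel" if real_kernel_mode else "user"


# Hypothetical instruction table (not a real ISA).
isa = [
    Instruction("ADD", sensitive=False, privileged=False),
    Instruction("HALT", sensitive=True, privileged=True),
    Instruction("SET_PAGE_MAP_BASE", sensitive=True, privileged=True),
    Instruction("READ_STATUS", sensitive=True, privileged=False),
]
print(non_virtualizable(isa))  # ['READ_STATUS']

# The guest OS believes it is in kernel mode, but the hypervisor runs it in user mode:
print(read_status(real_kernel_mode=False))  # user
```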

Dynamic binary translation: solution for older machines that are not virtualizable:

Hypervisor analyzes all code executed in virtual machine
Replaces non-virtualizable instructions with traps
Very tricky: how to find all code?
Can use this to run hypervisor as a user-level program
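One way to cope with "how to find all code?" is to translate lazily: translate each basic block only when control first reaches it, and cache the result. A minimal sketch of that idea, assuming the same kind of hypothetical toy opcodes (`READ_STATUS` as the sensitive-but-unprivileged instruction, `JUMP` and `HALT` as block terminators); it is an illustration, not how any particular product works:

```python
# Hypothetical opcodes: sensitive instructions that do NOT trap in user mode,
# and instructions that end a basic block.
NON_VIRTUALIZABLE = {"READ_STATUS"}
BLOCK_ENDS = {"JUMP", "HALT"}


def translate_block(program, start):
    """Copy guest code from `start` up to the end of its basic block,
    replacing each non-virtualizable instruction with a trap."""
    out = []
    pc = start
    while pc < len(program):
        op, arg = program[pc]
        if op in NON_VIRTUALIZABLE:
            out.append(("TRAP", (op, arg)))   # hypervisor will emulate the original
        else:
            out.append((op, arg))             # safe to run directly
        pc += 1
        if op in BLOCK_ENDS:
            break
    return out


def get_translation(program, start, cache):
    """Translate lazily: only code that control actually reaches gets translated."""
    if start not in cache:
        cache[start] = translate_block(program, start)
    return cache[start]


guest = [("ADD", 1), ("READ_STATUS", None), ("JUMP", 4), ("ADD", 2), ("HALT", None)]
cache = {}
print(get_translation(guest, 0, cache))
# [('ADD', 1), ('TRAP', ('READ_STATUS', None)), ('JUMP', 4)]
```

The block at address 3 is never translated because nothing jumps to it. A real translator must also rewrite branch targets to point at translated code and handle self-modifying code, which this sketch ignores.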

In practice, how much overhead do hypervisors add?

CPU-bound applications: < 5%
I/O-bound applications: ~30%

History/usage of virtual machines

Invented by IBM in the late 1960s

Original usage:

One VM per user
Each user ran a different single-user guest OS
Single shared hardware platform

Interest waned in the 1980s and 1990s:

Each user had a private machine

Reinvented and made practical in the late 1990s by Mendel Rosenblum and his graduate students at Stanford, who formed VMware.

Software development:

Need to test software on different OS versions:
Keep one VM for each OS version.
Use a single machine to test all versions.

Datacenters:

Problem: many machines, each running a single application
- Need separate machines for isolation: an application crash could bring down the entire machine
- Most applications only need a fraction of machine's resources.
Solution: datacenter consolidation
- One VM per application
- Run several VMs on a single machine
- Reduce the number of machines
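Choosing which VMs share which machine is a bin-packing problem. A minimal sketch using first-fit decreasing, a standard packing heuristic chosen here only for illustration (the notes don't prescribe a placement policy), assuming each application's needs can be summarized as a single percentage of one machine; the application names and numbers are made up:

```python
def consolidate(demands, capacity=100):
    """First-fit decreasing: place each VM (largest first) on the first machine
    with enough room left; start a new machine only when none fits.
    Returns one list of VM names per machine."""
    machines = []  # each entry: [remaining_capacity, [vm names]]
    for name, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if need > capacity:
            raise ValueError(f"{name} does not fit on any single machine")
        for m in machines:
            if m[0] >= need:
                m[0] -= need
                m[1].append(name)
                break
        else:  # no existing machine had room
            machines.append([capacity - need, [name]])
    return [vms for _, vms in machines]


# Six applications, each using a fraction (in %) of one machine's resources.
demands = {"web": 30, "db": 50, "mail": 20, "wiki": 10, "build": 40, "dns": 5}
print(consolidate(demands))
# [['db', 'build', 'wiki'], ['web', 'mail', 'dns']]: 2 machines instead of 6
```

Real placement also has to balance several resources at once (CPU, memory, I/O) and leave headroom for load spikes, which this one-number model ignores.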

Encapsulation, restart:

Hypervisor can encapsulate entire state of a VM in a file.
Can save the state, continue running, and later restore the old state.
Datacenter example:
- Can migrate VMs between machines to balance load
Software development:
- Tests may corrupt the state of the machine
- Solution:
  - Run tests in a VM
  - Always start tests from a saved VM configuration
  - Discard VM state after tests
  - Result: reproducible tests
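A minimal sketch of the "start from a saved configuration, discard afterwards" pattern, assuming a toy VM state (registers, memory, and a hypothetical `disk` dictionary standing in for a virtual disk) serialized as JSON; a real hypervisor's snapshot format is far more involved and is not shown here:

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class VMState:
    """Toy stand-in for a VM's entire state (a real snapshot is far larger)."""
    registers: dict = field(default_factory=dict)
    memory: list = field(default_factory=list)
    disk: dict = field(default_factory=dict)  # hypothetical: file name -> contents


def save_snapshot(state, path):
    with open(path, "w") as f:
        json.dump(asdict(state), f)


def restore_snapshot(path):
    with open(path) as f:
        return VMState(**json.load(f))


def config_test(state):
    """A test whose result depends on machine state, and which corrupts that state."""
    result = state.disk["/etc/config"]
    state.disk["/etc/config"] = "corrupted"
    return result


def run_from_snapshot(test, path):
    state = restore_snapshot(path)  # always start from the saved configuration
    return test(state)              # the modified state is simply discarded


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "vm.json")
    save_snapshot(VMState({"pc": 0}, [0] * 4, {"/etc/config": "v1"}), path)
    print([run_from_snapshot(config_test, path) for _ in range(3)])  # ['v1', 'v1', 'v1']
    live = restore_snapshot(path)
    print(config_test(live), config_test(live))  # v1 corrupted
```

The last line shows the problem being solved: running the same test twice on one live machine gives different results, while every run from the snapshot sees the same starting state.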

Heavily used in cloud computing (e.g. Amazon Web Services, Google Cloud).