Error Handling

Lecture Notes for CS 190
Spring 2015
John Ousterhout

Errors and exceptions come from many sources:
- From above:
  - Bad input from user or client
  - Misconfiguration and operator errors
- From below: underlying system facilities don't work as desired:
  - Disk I/O error
  - Can't open file (wrong permissions, missing directory, etc.)
  - Out of memory space
  - Network socket already in use
- From peers in a distributed system:
  - Server crashes
  - Slow communication
  - Lost network packets
- From ourselves: internal bugs
Errors and exceptions are a major source of complexity and bugs
- They account for a lot of code in large systems
- They disrupt normal code flow:
  - They happen in the middle of other activities
  - Something didn't work like you expected
- Hard to figure out how to handle them
  - May not be able to complete work in progress
- Language support is clunky
  - Verbose
  - Makes code hard to read
- Hard to test
- Errors don't occur very often in running applications
  - So handlers are rarely exercised and may not work when needed
Programmers often make the exception problem worse:
- Defensive programming: throw exceptions for anything that looks a tiny bit suspicious. More errors are better?
- Expediency: rather than figure out how to solve a problem, just throw an exception and punt it to the next level
- Result: even more exceptions, many of which no one really knows how to handle.

Key idea: reduce the number of exceptions that must be handled. Specific techniques:
- Whenever possible, define errors out of existence:
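  - Deleting variables in Tcl (unset reports an error if the variable doesn't exist; defining it as "make sure the variable no longer exists" eliminates the error)
  - File deletion in Windows (Windows traditionally refuses to delete a file that is open; Unix allows it, removing the name immediately and freeing the contents when the last process closes the file)
  - Bounds checks in Java substring method (substring throws IndexOutOfBoundsException for out-of-range indices; clipping the range to the string would eliminate the exception)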
- Mask errors (recover automatically so the error doesn't have to be reported)
  - E.g., if a server crashes, automatically fetch data from a backup server
  - Or, if a server crashes, wait until it restarts
- Collapse errors (handle several different cases with the same code)
  - Promote one error to another (in RAMCloud, many errors get promoted to "server crash").
    - Reuse existing handler
    - May not work for exceptions that happen frequently (the reused handler, such as crash recovery, may be too expensive to run often)
  - Defer reporting to a point where the caller must already handle other exceptions.
    - Example: in RAMCloud, RPC errors are reported only when the caller waits for the result, not when the request is sent.
  - Advantages of collapsing:
    - Simplifies code (fewer cases).
    - Remaining handlers get invoked more often (will get debugged).
- Just panic (crash app)?
  - If there's not a viable way to handle it
  - Example: running out of memory in malloc
  - Or, throw an error, which isn't handled except at the very top level.
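Two of the "define errors out of existence" examples can be sketched in Python. The functions substring and unset below are hypothetical stand-ins, not Java's String.substring or Tcl's unset. Each one defines its semantics so that the usual error case cannot arise. substring clips the requested range to the string, so an empty overlap yields an empty string. unset only guarantees that the variable is gone afterward.

```python
def substring(s: str, begin: int, end: int) -> str:
    """Hypothetical substring: return the characters of s in [begin, end).

    Out-of-range or inverted bounds are not errors: the range is clipped
    to the string, and an empty overlap yields "".
    """
    begin = max(0, min(begin, len(s)))
    end = max(begin, min(end, len(s)))
    return s[begin:end]


def unset(variables: dict, name: str) -> None:
    """Hypothetical unset: ensure name no longer exists in variables.

    A missing variable is not an error, because the postcondition
    ("the variable does not exist") already holds.
    """
    variables.pop(name, None)
```

Plain Python slicing already clips indices that run past the end. However, negative indices count from the end of the string, so the explicit clamping keeps a negative begin from wrapping around.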
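Masking errors can be sketched as follows. The sketch assumes each replica is reached through a hypothetical callable that raises a hypothetical ServerCrashed exception when its server is down. When one replica crashes, the code tries the next one. The caller sees an error only if every replica fails.

```python
from typing import Callable, Sequence


class ServerCrashed(Exception):
    """Hypothetical error raised by a server stub when its server is down."""


def read_with_failover(key: str, servers: Sequence[Callable[[str], str]]) -> str:
    """Try each replica in order, masking the crash of any single replica."""
    last_error = None
    for server in servers:
        try:
            return server(key)
        except ServerCrashed as e:
            last_error = e  # masked: fall through to the next replica
    # Only reached if every replica failed (or none were given).
    raise ServerCrashed(f"all {len(servers)} replicas failed for {key!r}") from last_error
```

The other masking option, waiting for a crashed server to restart, would replace the loop over replicas with a loop that retries the same server, with some delay between attempts.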
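Collapsing can be sketched the same way. The Rpc class below is hypothetical and far simpler than RAMCloud's real RPC machinery; in particular, its "send" completes synchronously. It shows two ideas. First, distinct low-level failures (Python's built-in ConnectionError and TimeoutError) are promoted to a single ServerCrashed error. Second, that error is recorded at send time but raised only from wait(), where the caller already has to handle errors.

```python
from typing import Callable, Optional


class ServerCrashed(Exception):
    """Hypothetical single error kind that low-level failures are promoted to."""


class Rpc:
    """Hypothetical RPC handle: failures during send are recorded, not raised,
    and reported only when the caller calls wait()."""

    def __init__(self, transport: Callable[[bytes], bytes], request: bytes) -> None:
        self._reply: bytes = b""
        self._error: Optional[ServerCrashed] = None
        try:
            # Simplification: the "send" completes synchronously here.
            self._reply = transport(request)
        except (ConnectionError, TimeoutError) as e:
            # Promotion: different low-level failures collapse into one error kind.
            self._error = ServerCrashed(f"server unreachable: {e}")

    def wait(self) -> bytes:
        # Deferred reporting: errors surface only here, never from the send.
        if self._error is not None:
            raise self._error
        return self._reply
```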
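The last option, an error that nothing handles below the top level, might look like the sketch below. It assumes a hypothetical FatalError class and a hypothetical run_top_level helper. Lower layers simply let FatalError propagate. A single handler at the top reports it and turns it into a nonzero exit status instead of attempting recovery.

```python
import sys
from typing import Callable


class FatalError(Exception):
    """Hypothetical marker for errors with no viable handler below the top level."""


def run_top_level(main: Callable[[], None]) -> int:
    """Run main(); report any FatalError once and return an exit status."""
    try:
        main()
    except FatalError as e:
        # The only handler for FatalError in the whole program.
        print(f"fatal: {e}", file=sys.stderr)
        return 1
    return 0
```

A program would typically end with sys.exit(run_top_level(main)).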
Before throwing an error, think about how the caller will handle it.
- If you can't visualize how the caller will handle it, rethink the error
- Choose between throwing an error and returning a value.
  - If the caller will almost always care about the error, might as well report it with a return value.
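The return-value option can be sketched with a hypothetical parse_port function for validating user input. Every caller must deal with rejected input right away, so the error is part of the ordinary result rather than an exception.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseResult:
    """Hypothetical result type: exactly one of value / error is set."""
    value: Optional[int]
    error: Optional[str]


def parse_port(text: str) -> ParseResult:
    """Parse a TCP port number; bad input is reported in the return value."""
    try:
        port = int(text)
    except ValueError:
        return ParseResult(None, f"not a number: {text!r}")
    if not 0 < port < 65536:
        return ParseResult(None, f"port out of range: {port}")
    return ParseResult(port, None)
```

A caller checks result.error immediately after the call. Exceptions remain available for failures that the immediate caller cannot reasonably handle.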
Make lots of information available after errors:
- Include it in the exception's message?
- Or, output to system log
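The sketch below puts context into the exception message. It assumes a hypothetical parse_records function and a hypothetical RecordError class. The message names the line number and the offending text. The "from e" clause keeps the original ValueError attached as __cause__, so its details survive. The same string could instead be written to the system log, for example with Python's logging module.

```python
from typing import Iterable, List, Tuple


class RecordError(Exception):
    """Hypothetical error type for malformed input records."""


def parse_records(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse "name, quantity" lines, reporting failures with full context."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        try:
            name, qty = line.split(",")
            records.append((name.strip(), int(qty)))
        except ValueError as e:
            # Say where the failure happened and what the input was, and
            # chain the low-level exception so its message is not lost.
            raise RecordError(f"line {lineno}: cannot parse {line!r}: {e}") from e
    return records
```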

CS 190: Software Design Studio (Spring 2015)