Managing Complexity

Lecture Notes for CS 190
Spring 2015
John Ousterhout

Fundamental Philosophy

  • Programs evolve continuously:
    • Can't get the architecture right the first time
    • Additional feature needs arise over time
  • It's not good enough to write code that works
    • Code must also be "beautiful"
    • Why? Real goal is to enable continual improvements over a 10-20 year lifetime
    • It must be easy to make these improvements, even for people who weren't involved in the original construction
      • Original authors are gone, or can't remember what they did
  • Goal for design: allow changes to be made easily
    • How much work has to be done to accomplish a task?
    • How much information does a programmer need in his/her mind to accomplish a task?
    • How easy is it to find the required information?
  • Complexity accretes
    • No one thing makes a system complicated
    • It's an accumulation of thousands of small things over time
    • Once complexity arises, hard to eliminate
    • To prevent complexity, must sweat the small stuff
    • Typical (wrong) developer philosophy: "as long as I don't make things much more complicated, it's OK"
    • Must adopt a zero-tolerance mindset: everything matters.
  • Real-world pressures encourage complexity
    • Fastest way to make progress in the short term is not to worry about complexity.
    • To reduce complexity, must invest extra time now, but the biggest benefits don't come until the future.
    • Must compromise: zero tolerance for complexity probably not viable.
    • Focus on most important things:
      • Good interface design
      • Documentation more important for interfaces than internals
    • Create a budget for refactoring and cleanup
    • Find ways to teach new employees how to write simple code (e.g. code reviews)
    • Investment to reduce complexity pays for itself relatively quickly (6-12 months?)
      • Without care, complexity builds up very fast
      • Once this happens, development becomes much more expensive; it would have been cheaper to invest early on
  • For this class: zero tolerance for complexity
  • Goal for this class: teach you how to make things simple

Modular Design

  • Divide system into modules that are relatively independent
  • Ideal: each module completely independent of the others
    • System complexity = complexity of worst module
  • In reality, modules are not completely independent
    • Some modules must invoke facilities in other modules
    • Design decisions in one module must be known to other modules
    • Can't change one module without understanding parts of other modules

Abstraction

  • Technique for dealing with complexity: find a simple way to think about and manipulate a complex entity
  • Separate essential elements from details that can be ignored
  • Divide each module into two parts:
    • Interface of a module: anything about that module that must be known to other modules
      • Formal aspects: method signatures, public variables, etc.
      • Informal aspects: side effects, algorithms that affect behavior of methods, etc.
    • Implementation: code that enforces the promises made by the interface
  • Goal for interface design: maximize the ratio functionality/(interface complexity), i.e. the most functionality behind the simplest interface (a "sweet" interface or module)

Parnas paper

Information Hiding

  • Each module (class) encapsulates certain knowledge or design decisions
  • No other class should need to understand these details
  • The interface does not expose internal implementation details
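
A minimal sketch of information hiding, in Python. The class name `Inventory` and its methods are hypothetical, chosen only for illustration. The design decision being hidden is the storage representation: callers use `add`, `remove`, and `quantity`, and never learn that a dict sits underneath.

```python
class Inventory:
    """Tracks item quantities. Callers never see how the counts are stored."""

    def __init__(self):
        # Hidden design decision: a dict keyed by item name. It could be
        # swapped for a sorted list or a database without changing callers.
        self._counts = {}

    def add(self, item, quantity=1):
        """Add `quantity` units of `item`."""
        self._counts[item] = self._counts.get(item, 0) + quantity

    def remove(self, item, quantity=1):
        """Remove up to `quantity` units; return how many were removed."""
        available = self._counts.get(item, 0)
        removed = min(available, quantity)
        if available - removed == 0:
            self._counts.pop(item, None)   # keep no entries with a zero count
        else:
            self._counts[item] = available - removed
        return removed

    def quantity(self, item):
        """Return the number of units on hand (0 for unknown items)."""
        return self._counts.get(item, 0)
```

Because no method returns or accepts the dict itself, the representation can change later without touching any other class.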

Classes Should be Thick

  • Thin class:
    • Not much functionality
    • Short methods that don't do much
    • It's almost as much work to invoke a method as it would take to type in the body of the method
    • Classic example: linked list
    • Thin classes can't hide much information
  • Thick class:
    • Lots of functionality, yet simple interface
    • Hides lots of information
  • Classitis: too many classes
  • Rule of thumb: 200-2000 lines is a good size for classes
    • Below 200 lines: probably pretty thin
    • Above 2000 lines: internal complexity of the class can become unmanageable. See if it can be subdivided cleanly.
    • However, size itself isn't the most important metric: it's functionality/(interface complexity)
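
The contrast between thin and thick classes can be sketched as follows. Both classes are hypothetical, and `Settings` is far below the 200-line rule of thumb; it is meant only to show the ratio of functionality to interface complexity, with a simple `key = value` text format assumed for the example.

```python
class ThinList:
    """Thin: each method is about as long as the call that invokes it."""

    def __init__(self):
        self.items = []      # representation is public, so nothing is hidden

    def append(self, value):
        self.items.append(value)

    def size(self):
        return len(self.items)


class Settings:
    """Thicker: a constructor and one lookup method hide parsing, comment
    handling, whitespace trimming, and type conversion."""

    def __init__(self, text):
        self._values = {}
        for raw_line in text.splitlines():
            line = raw_line.split("#", 1)[0].strip()   # drop comments
            if not line or "=" not in line:
                continue                               # skip blank or malformed lines
            key, value = line.split("=", 1)
            self._values[key.strip()] = self._convert(value.strip())

    @staticmethod
    def _convert(value):
        # Try bool, then int, then float; fall back to the raw string.
        if value.lower() in ("true", "false"):
            return value.lower() == "true"
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                pass
        return value

    def get(self, key, default=None):
        """Return the converted value for `key`, or `default` if absent."""
        return self._values.get(key, default)
```

A caller of `Settings` writes one line, `Settings(text).get("port")`, and gets a usable integer; a caller of `ThinList` saves almost nothing over using a list directly.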

Simplicity

  • Must decide what's important, design the interface around that
    • But, how to know what's important?
    • Focus on the things that are done most frequently
    • Technique #1: if a particular task is invoked repeatedly, design an API around that task (or do it automatically, with no explicit feature).
    • Technique #2: if several tasks are similar but not identical, look for common features shared by all of them; design APIs for the common features.
    • It's OK to provide APIs for infrequently-used features, but design them so that you don't need to be aware of them when using the common features.
  • Bad example: Java I/O
  • Good example: device-independent I/O in UNIX/Linux:
    • Before UNIX: different kernel calls for opening and accessing files vs. devices.
      • Different kernel calls for each device: terminal, tape, etc.
      • Different naming mechanisms for each device
    • UNIX emphasized commonality across devices:
      • Devices have names in the file system: special device files
      • All devices have same basic access structure: open, read, write, seek, close
      • Handle device-specific operations with one additional kernel call (shown here in simplified form):
        int result = ioctl(int fd, int request,
                void* inBuffer, int inputSize,
                void* outBuffer, int outputSize);
        
  • High- and low-level APIs are most amenable to simplicity:
    • Primitives (hash table, file block cache, etc.)
      • Should do one thing well; can often restrict functionality to enhance simplicity.
      • If they try to do several things at once they get too confusing.
    • High-level abstractions (distributed transactions)
      • Encapsulate entire tasks with a ridiculously simple interface.
      • Typically conflate a whole bunch of things in their implementation.
    • Intermediate-level APIs: hard to make these simple.
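
The payoff of the common open/read/write/close structure can be sketched with Python's `os` module, which wraps the same kernel calls. The helper names `copy_bytes` and `demo` are hypothetical. The copy routine uses only `read` and `write`, so it does not need to know whether a descriptor refers to a regular file, a pipe, or a device.

```python
import os
import tempfile


def copy_bytes(src_fd, dst_fd, chunk_size=4096):
    """Copy everything from src_fd to dst_fd; return the number of bytes copied.

    Only read() and write() are used, so the descriptors may refer to
    regular files, pipes, terminals, or other devices.
    """
    total = 0
    while True:
        chunk = os.read(src_fd, chunk_size)
        if not chunk:                     # empty result means end of input
            break
        written = 0
        while written < len(chunk):       # write() may accept fewer bytes than offered
            written += os.write(dst_fd, chunk[written:])
        total += len(chunk)
    return total


def demo():
    """Send bytes through a pipe into a temporary file; return the file's contents."""
    read_end, write_end = os.pipe()
    os.write(write_end, b"hello, device")
    os.close(write_end)                   # closing the writer marks end of input
    file_fd, path = tempfile.mkstemp()
    try:
        copy_bytes(read_end, file_fd)
        os.lseek(file_fd, 0, os.SEEK_SET) # rewind before reading back
        return os.read(file_fd, 100)
    finally:
        os.close(read_end)
        os.close(file_fd)
        os.remove(path)
```

Here the source is a pipe and the destination is a regular file, yet `copy_bytes` contains no case analysis for either.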

Generality

  • How general-purpose should a module be?
    • E.g., "Should I implement extra features beyond those that I need today?"
    • If the module is a basic building block likely to be used in multiple places, then design for generality
      • Focus on clean, orthogonal features that are easy to use and can be combined
      • Plan ahead for uses that aren't necessarily needed today
    • If the module is only used in one place, make it specific
      • Specialize its API to make it simpler.
      • Leave out features not currently needed
    • When in doubt, take the more specialized approach
    • If you build a module for a single purpose, then discover it's being reused, refactor to generalize it.
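
One way to picture "clean orthogonal features" in a building-block module: a text buffer that offers just two general operations, with the specific editing actions built on top by its users. This is a sketch under assumptions; `TextBuffer`, `backspace`, and `replace` are hypothetical names.

```python
class TextBuffer:
    """Building block with two orthogonal primitives: insert and delete."""

    def __init__(self, text=""):
        self._text = text

    def insert(self, position, new_text):
        """Insert new_text so that it starts at index `position`."""
        self._text = self._text[:position] + new_text + self._text[position:]

    def delete(self, start, end):
        """Delete the characters in the half-open range [start, end)."""
        self._text = self._text[:start] + self._text[end:]

    def text(self):
        return self._text


# Specific, single-purpose operations live in the caller and are
# composed from the primitives rather than added to the buffer.
def backspace(buffer, cursor):
    """Delete the character before the cursor; return the new cursor."""
    if cursor == 0:
        return 0
    buffer.delete(cursor - 1, cursor)
    return cursor - 1


def replace(buffer, start, end, new_text):
    """Replace the range [start, end) with new_text."""
    buffer.delete(start, end)
    buffer.insert(start, new_text)
```

If the buffer were used in only one place, the advice above would favor the opposite choice: put `backspace` directly in the class and leave out anything not currently needed.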

The Martyr Principle

  • Module writers should embrace suffering:
    • Take on hard problems
    • Solve completely
    • Make solution easy for others to use
    • Take more pain for yourself, so that others have less
  • Push complexity down into modules:
    • Let a few module developers suffer, rather than thousands of users
  • Solve, don't punt:
    • Handle error conditions rather than throwing exceptions
    • Minimize "voodoo constants" (configuration parameters)
      • If you don't know the right value, how will a user or administrator ever figure it out?
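
"Solve, don't punt" can be sketched with two small helpers. The names `ensure_removed`, `ensure_directory`, and `demo` are hypothetical. Each helper is defined in terms of the state the caller wants, so the common "error" (the work is already done) is absorbed inside the module instead of being thrown at every caller. Genuine failures, such as a permissions problem, still propagate.

```python
import os
import tempfile


def ensure_removed(path):
    """Make sure no file exists at `path`.

    The caller's goal is "the file is gone", so an already-missing file
    is treated as success rather than raised as an error.
    """
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def ensure_directory(path):
    """Create `path` (and any missing parents); an existing directory is fine."""
    os.makedirs(path, exist_ok=True)


def demo():
    """Call each helper twice inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "a", "b")
        ensure_directory(target)
        ensure_directory(target)      # second call is a no-op, not an error
        path = os.path.join(target, "x.txt")
        with open(path, "w"):
            pass                      # create an empty file
        ensure_removed(path)
        ensure_removed(path)          # already gone: still success
        return os.path.isdir(target), os.path.exists(path)
```

Callers need no try/except blocks and no existence checks before calling; the module writer took on that work once.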

Applying These Ideas

  • May be hard initially to apply these ideas when writing code.
  • Make 2 designs and compare
  • Pick one and write some code
  • Review this topic to look for potential problems
  • Revise code
  • Take advantage of code reviews
  • Red flags to look for:
    • Thin classes
    • Information leakage
    • Very deep call stacks (especially if one interface calls another that looks similar)
    • Lint: little bits of unnecessary complexity
    • Repeated pieces of code (violates DRY: "don't repeat yourself")
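
The "one interface calls another that looks similar" red flag can be sketched as follows. All three classes are hypothetical. `PassThroughFile` has the same interface as the layer beneath it and adds nothing; `ByteFile` earns its place in the call stack by offering a different abstraction (byte offsets instead of block numbers).

```python
class Storage:
    """Bottom layer: stores fixed blocks by number."""

    def __init__(self):
        self._blocks = {}

    def write_block(self, number, data):
        self._blocks[number] = data

    def read_block(self, number):
        return self._blocks.get(number, b"")


class PassThroughFile:
    """Red flag: same interface as Storage, so it only deepens the call stack."""

    def __init__(self, storage):
        self._storage = storage

    def write_block(self, number, data):
        self._storage.write_block(number, data)

    def read_block(self, number):
        return self._storage.read_block(number)


class ByteFile:
    """Different abstraction: callers use byte offsets and never see blocks."""

    BLOCK_SIZE = 4   # tiny, to keep the example easy to trace

    def __init__(self, storage):
        self._storage = storage

    def _load(self, number):
        # Pad short or missing blocks with zero bytes up to BLOCK_SIZE.
        return self._storage.read_block(number).ljust(self.BLOCK_SIZE, b"\0")

    def write(self, offset, data):
        # Byte-at-a-time for clarity, not speed.
        for i, byte in enumerate(data):
            number, index = divmod(offset + i, self.BLOCK_SIZE)
            block = bytearray(self._load(number))
            block[index] = byte
            self._storage.write_block(number, bytes(block))

    def read(self, offset, length):
        out = bytearray()
        for position in range(offset, offset + length):
            number, index = divmod(position, self.BLOCK_SIZE)
            out.append(self._load(number)[index])
        return bytes(out)
```

The private helper `_load` also avoids repeating the padding logic in both `read` and `write`.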