Managing Complexity

Lecture Notes for CS 190
Spring 2015
John Ousterhout

Fundamental Philosophy

Programs evolve continuously:
- Can't get the architecture right the first time
- Additional feature needs arise over time
It's not good enough to write code that works
- Code must also be "beautiful"
- Why? Real goal is to enable continual improvements over a 10-20 year lifetime
- It must be easy to make these improvements, even for people who weren't involved in the original construction
  - Original authors are gone, or can't remember what they did
Goal for design: allow changes to be made easily
- How much work has to be done to accomplish a task?
- How much information does a programmer need in his/her mind to accomplish a task?
- How easy is it to find the required information?
Complexity accretes
- No one thing makes a system complicated
- It's an accumulation of thousands of small things over time
- Once complexity arises, hard to eliminate
- To prevent complexity, must sweat the small stuff
- Typical (wrong) developer philosophy: "as long as I don't make things much more complicated, it's OK"
- Must adopt a zero-tolerance mindset: everything matters.
Real-world pressures encourage complexity
- Fastest way to make progress in the short term is not to worry about complexity.
- To reduce complexity, must invest extra time now, but the biggest benefits don't come until the future.
- Must compromise: zero tolerance for complexity probably not viable.
- Focus on most important things:
  - Good interface design
  - Documentation more important for interfaces than internals
- Create a budget for refactoring and cleanup
- Find ways to teach new employees how to write simple code (e.g. code reviews)
- Investment to reduce complexity pays for itself relatively quickly (6-12 months?)
  - Without care, complexity builds up very fast
  - Once this happens, development becomes much more expensive; it would have been cheaper to invest early on
For this class: zero tolerance for complexity

Goal for this class: teach you how to make things simple

Modular Design

Divide system into modules that are relatively independent
Ideal: each module completely independent of the others
- System complexity = complexity of worst module
In reality, modules are not completely independent
- Some modules must invoke facilities in other modules
- Design decisions in one module must be known to other modules
- Can't change one module without understanding parts of other modules

Abstraction

Technique for dealing with complexity: find a simple way to think about and manipulate a complex entity
Separate essential elements from details that can be ignored
Divide each module into two parts:
- Interface of a module: anything about that module that must be known to other modules
  - Formal aspects: method signatures, public variables, etc.
  - Informal aspects: side effects, algorithms that affect behavior of methods, etc.
- Implementation: code that enforces the promises made by the interface
Goal for interface design: maximize functionality/interface complexity (a sweet interface or module)
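A minimal C sketch of the interface/implementation split, using a hypothetical integer stack (the names `IntStack`, `intstack_create`, and so on are illustrative, not from any library). The type name and prototypes are the formal part of the interface; the comments state informal aspects, such as what happens on an empty stack or when memory runs out; the struct definition and function bodies are the implementation that keeps those promises. In a real project the first part would live in a header and the rest in a .c file.

```c
#include <stdbool.h>
#include <stdlib.h>

/* ---- Interface: everything other modules must know ---- */
/* Formal aspects: the type name and the function signatures.     */
/* Informal aspects: the behavior promised in the comments below. */
typedef struct IntStack IntStack;

/* Returns a new empty stack, or NULL if memory is exhausted. */
IntStack *intstack_create(void);
/* Pushes value, growing storage as needed. Returns false only if
   memory is exhausted, in which case the stack is unchanged. */
bool intstack_push(IntStack *s, int value);
/* Pops the most recently pushed value into *out. Returns false
   (leaving *out untouched) if the stack is empty. */
bool intstack_pop(IntStack *s, int *out);
/* Frees the stack; NULL is allowed. */
void intstack_destroy(IntStack *s);

/* ---- Implementation: code that keeps the promises above ---- */
struct IntStack {
    int *items;
    size_t count, capacity;
};

IntStack *intstack_create(void) {
    return calloc(1, sizeof(IntStack));   /* items = NULL, count = 0 */
}

bool intstack_push(IntStack *s, int value) {
    if (s->count == s->capacity) {
        size_t cap = s->capacity ? 2 * s->capacity : 8;
        int *grown = realloc(s->items, cap * sizeof *grown);
        if (grown == NULL)
            return false;                 /* stack left unchanged */
        s->items = grown;
        s->capacity = cap;
    }
    s->items[s->count++] = value;
    return true;
}

bool intstack_pop(IntStack *s, int *out) {
    if (s->count == 0)
        return false;
    *out = s->items[--s->count];
    return true;
}

void intstack_destroy(IntStack *s) {
    if (s != NULL) {
        free(s->items);
        free(s);
    }
}
```

Callers can rely on the documented behavior (for example, that a failed push leaves the stack intact) without reading the growth logic.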

Parnas paper

"On the Criteria To Be Used in Decomposing Systems into Modules"
- More than 40 years old, some parts dated (e.g. predates classes)
- Still one of the most important papers in all of systems.
What is the key idea of this paper?

Information Hiding

Each module (class) encapsulates certain knowledge or design decisions:
- No other class should need to understand these details
- The interface does not expose internal implementation details
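One way to picture information hiding in C, assuming a hypothetical record type that is saved to disk: the byte layout (4-byte big-endian id, one length byte, then the name bytes) is a design decision known only to `record_encode` and `record_decode`. If the layout changes, only these two functions change; code that just passes `Record` values around is unaffected. All names here are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Interface: callers see a record and two functions, not a format. */
typedef struct {
    uint32_t id;
    char name[32];   /* NUL-terminated */
} Record;

/* Encodes *r into buf; returns bytes written, or 0 if buf is too small. */
size_t record_encode(const Record *r, unsigned char *buf, size_t buflen);
/* Decodes buf into *r; returns bytes consumed, or 0 if buf is malformed. */
size_t record_decode(const unsigned char *buf, size_t buflen, Record *r);

/* Implementation: the ONLY code that knows the byte layout. */
size_t record_encode(const Record *r, unsigned char *buf, size_t buflen) {
    const char *nul = memchr(r->name, '\0', sizeof r->name);
    size_t n = nul ? (size_t)(nul - r->name) : sizeof r->name - 1;
    size_t need = 4 + 1 + n;
    if (buflen < need)
        return 0;
    buf[0] = (unsigned char)(r->id >> 24);   /* big-endian id */
    buf[1] = (unsigned char)(r->id >> 16);
    buf[2] = (unsigned char)(r->id >> 8);
    buf[3] = (unsigned char)r->id;
    buf[4] = (unsigned char)n;               /* name length */
    memcpy(buf + 5, r->name, n);
    return need;
}

size_t record_decode(const unsigned char *buf, size_t buflen, Record *r) {
    if (buflen < 5)
        return 0;
    size_t n = buf[4];
    if (n >= sizeof r->name || buflen < 5 + n)
        return 0;
    r->id = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
          | ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    memcpy(r->name, buf + 5, n);
    r->name[n] = '\0';
    return 5 + n;
}
```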

Classes Should be Thick

Thin class:
- Not much functionality
- Short methods that don't do much
- Invoking a method is almost as much work as typing in its body
- Classic example: linked list
- Thin classes can't hide much information
Thick class:
- Lots of functionality, yet simple interface
- Hides lots of information
Classitis: too many classes
Rule of thumb: 200-2000 lines is a good size for classes
- Below 200 lines: probably pretty thin
- Above 2000 lines: internal complexity of the class can become unmanageable. See if it can be subdivided cleanly.
- However, size itself isn't the most important metric: it's functionality/(interface complexity)
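A small C sketch contrasting the two, with illustrative names. The linked-list accessors are thin: each call is about as long as the code it hides. `read_all` is thicker: a two-argument interface hides buffer sizing, growth, end-of-file detection, and cleanup on failure.

```c
#include <stdio.h>
#include <stdlib.h>

/* Thin: the interface is as big as the implementation. */
typedef struct Node { struct Node *next; int value; } Node;
Node *node_next(const Node *n) { return n->next; }
void node_set_next(Node *n, Node *next) { n->next = next; }

/* Thick: reads everything remaining in f. Returns a NUL-terminated
   buffer from malloc (caller frees) and stores the byte count in
   *len, or returns NULL on a read error or out of memory. */
char *read_all(FILE *f, size_t *len) {
    size_t cap = 4096, n = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        if (cap - n < 2) {               /* keep room for data + final NUL */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
        size_t got = fread(buf + n, 1, cap - n - 1, f);
        n += got;
        if (got == 0)                    /* end of file or error */
            break;
    }
    if (ferror(f)) {
        free(buf);
        return NULL;
    }
    buf[n] = '\0';
    *len = n;
    return buf;
}
```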

Simplicity

Must decide what's important, design the interface around that
- But, how to know what's important?
- Focus on the things that are done most frequently
- Technique #1: if a particular task is performed repeatedly, design an API around that task (or do it automatically, with no explicit feature).
- Technique #2: if several tasks are similar but not identical, look for the features they all share; design APIs for those common features.
- It's OK to provide APIs for infrequently-used features, but design them so that people using the common features don't need to be aware of them.
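A sketch of designing for the common case in C, assuming a hypothetical field-splitting module. Most callers split on commas and just call `split_fields`; the rarely needed knobs live in `SplitOptions` and `split_fields_ex`, which common-case callers never have to learn about.

```c
#include <stddef.h>

/* Hypothetical options, needed only for uncommon cases. */
typedef struct {
    char separator;   /* field separator */
    int skip_empty;   /* nonzero: drop empty fields */
} SplitOptions;

/* Splits line in place at each opt->separator, storing up to max
   field pointers in fields; returns the number of fields stored. */
size_t split_fields_ex(char *line, char **fields, size_t max,
                       const SplitOptions *opt) {
    size_t count = 0;
    char *start = line;
    for (char *p = line; ; p++) {
        if (*p == opt->separator || *p == '\0') {
            int at_end = (*p == '\0');
            *p = '\0';                            /* terminate this field */
            if (!(opt->skip_empty && p == start) && count < max)
                fields[count++] = start;
            if (at_end)
                break;
            start = p + 1;
        }
    }
    return count;
}

/* Common case: comma-separated, empty fields kept. */
size_t split_fields(char *line, char **fields, size_t max) {
    static const SplitOptions defaults = { ',', 0 };
    return split_fields_ex(line, fields, max, &defaults);
}
```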
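Bad example: Java I/O (e.g. to read a file with buffering, the common case, you must explicitly wrap a FileInputStream in a BufferedInputStream)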
Good example: device-independent I/O in UNIX/Linux:
- Before UNIX: different kernel calls for opening and accessing files vs. devices.
  - Different kernel calls for each device: terminal, tape, etc.
  - Different naming mechanisms for each device
- UNIX emphasized commonality across devices:
  - Devices have names in the file system: special device files
  - All devices have same basic access structure: open, read, write, seek, close
  - Handle device-specific operations with one additional kernel call:
```
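/* Conceptual form: one call, a device-specific request code, plus
   optional input and output buffers. (The real glibc prototype is
   variadic: int ioctl(int fd, unsigned long request, ...);) */
int ioctl(int fd, int request,
        void* inBuffer, int inputSize,
        void* outBuffer, int outputSize);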
```
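The payoff of the uniform open/read/write/close structure can be sketched in C. `count_bytes` (a hypothetical helper) uses only `read`, so the same code works on a regular file, a pipe, a terminal, or a device special file. Device-specific needs go through `ioctl`: `terminal_columns` (also hypothetical) uses the real Linux/BSD request `TIOCGWINSZ` to ask a terminal for its width; on anything that is not a terminal, `ioctl` fails, typically with `ENOTTY`.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Counts the bytes readable from fd until end-of-file. Relies only
   on read(), so it works unchanged for any kind of file descriptor. */
long count_bytes(int fd) {
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            return total;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            return -1;                 /* genuine read error */
        }
        total += n;
    }
}

/* Device-specific operation through the one extra call. Returns the
   terminal's width in columns, or -1 if fd is not a terminal. */
int terminal_columns(int fd) {
    struct winsize ws;
    if (ioctl(fd, TIOCGWINSZ, &ws) == -1)
        return -1;
    return ws.ws_col;
}
```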
High- and low-level APIs are most amenable to simplicity:
- Primitives (hash table, file block cache, etc.)
  - Should do one thing well; can often restrict functionality to enhance simplicity.
  - If they try to do several things at once they get too confusing.
- High-level abstractions (distributed transactions)
  - Encapsulate entire tasks with a ridiculously simple interface.
  - Typically conflate a whole bunch of things in their implementation.
- Intermediate-level APIs: hard to make these simple.

Generality

How general-purpose should a module be?
- E.g., "Should I implement extra features beyond those that I need today?"
- If the module is a basic building block likely to be used in multiple places, then design for generality
  - Focus on clean orthogonal features that are easy to use, can be combined together
  - Plan ahead for uses that aren't necessarily needed today
- If the module is only used in one place, make it specific
  - Specialize its API to make it simpler.
  - Leave out features not currently needed
- When in doubt, take the more specialized approach
- If you build a module for a single purpose, then discover it's being reused, refactor to generalize it.
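For the building-block case, a hypothetical text buffer in C suggests what "clean orthogonal features" might look like: just `text_insert` and `text_delete`, which callers combine. An editor-specific action such as `backspace` is then a one-line composition kept in the caller's code rather than another method on the module. The fixed 256-byte capacity is a simplifying assumption of the sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Zero-initialize before use: TextBuf t = {0}; */
typedef struct {
    char text[256];
    size_t len;
} TextBuf;

/* Inserts s at position pos; returns false if pos is out of range
   or the result would not fit (including the final NUL). */
bool text_insert(TextBuf *t, size_t pos, const char *s) {
    size_t n = strlen(s);
    if (pos > t->len || t->len + n >= sizeof t->text)
        return false;
    /* shift the tail right; the +1 also moves the terminating NUL */
    memmove(t->text + pos + n, t->text + pos, t->len - pos + 1);
    memcpy(t->text + pos, s, n);
    t->len += n;
    return true;
}

/* Deletes the characters in [start, end); returns false if invalid. */
bool text_delete(TextBuf *t, size_t start, size_t end) {
    if (start > end || end > t->len)
        return false;
    memmove(t->text + start, t->text + end, t->len - end + 1);
    t->len -= end - start;
    return true;
}

/* UI-level action built from the general operations; it belongs to
   the user-interface code, not to the text module. */
bool backspace(TextBuf *t, size_t cursor) {
    return cursor > 0 && text_delete(t, cursor - 1, cursor);
}
```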

The Martyr Principle

Module writers should embrace suffering:
- Take on hard problems
- Solve completely
- Make solution easy for others to use
- Take more pain for yourself, so that others have less
Push complexity down into modules:
- Let a few module developers suffer, rather than thousands of users
Solve, don't punt:
- Handle error conditions rather than throwing exceptions
- Minimize "voodoo constants" (configuration parameters)
  - If you don't know the right value, how will a user or administrator ever figure it out?
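A minimal C sketch of solving rather than punting, assuming POSIX `write`: a call may legally write fewer bytes than requested, or fail with `EINTR` when a signal arrives. `write_all` (a hypothetical helper) absorbs both conditions, so callers never write their own retry loops; only genuine errors are reported.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Writes all len bytes of buf to fd, retrying after short writes and
   interrupted calls. Returns 0 on success, or -1 on a genuine error
   (errno is left as set by write). */
int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted: just retry */
            return -1;                 /* real error: report it */
        }
        p += n;                        /* short write: send the rest */
        len -= (size_t)n;
    }
    return 0;
}
```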

Applying These Ideas

May be hard initially to apply these ideas when writing code.
Make 2 designs and compare
Pick one and write some code
Review this topic to look for potential problems
Revise code
Take advantage of code reviews
Red flags to look for:
- Thin classes
- Information leakage
- Very deep call stacks (especially if one interface calls another that looks similar)
- Lint: little bits of unnecessary complexity
- Repeated pieces of code (DRY)

CS 190: Software Design Studio (Spring 2015)