Raft Project 1 Review/Discussion

Lecture Notes for CS 190
Winter 2018
John Ousterhout

Not many hefty deep classes
Biggest overall challenge: when to bring together, when to keep apart?
- Related things should be located near each other (e.g., same class)
- Unrelated things should be kept apart
Reasons to bring together:
- Shared information
- Duplicated code
- Can simplify the interface
Reasons to pull apart:
- Unrelated information
- Separate general-purpose code from special-purpose uses.
Example: you know how a class will be used, so info specific to that usage gets embedded in the class. Result: class over-specialized, higher cognitive load for readers.
Just because you know something doesn't mean you should embed that knowledge in every class!
Rule of thumb: separate general-purpose code from special-purpose code; avoid putting specialized info in classes
Example: tried to divide state machine up into multiple classes, but this didn't work well
Rule of thumb: if two pieces of code access the same information, bring them together
Rule of thumb: try to do each logical task in one place; don't split up parts of a task unless they are relatively independent.
Red flags for things that should be brought together: information leakage, flipping back and forth, code duplication.
Pull related things together, even within a class (Examples 2, 3, 4)

Threads and Synchronization

Other synchronization notes:
- Keep it as simple as possible
- Coarse-grained locking is best, unless its performance is intolerable
- Use simple monitor-style locking
  - Lock on object
  - Acquire lock on method entry, release on exit
- Finer-grain locks are very hard to get right (Example T1)
- Find a way to document your overall synchronization strategy (no good place? See Example T2)
- Persistence has issues similar to synchronization: unsafe to persist term and vote separately.
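The monitor-style bullets above, together with the point about persisting term and vote as a unit, might look roughly like this in Java. This is a minimal sketch, not the project's actual code: the class name ServerState, the method grantVote, and the two-line file format are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical class: the names and the file format are illustrative assumptions.
class ServerState {
    private final Path stateFile;
    private long currentTerm = 0;
    private String votedFor = null;   // null means no vote cast in currentTerm

    ServerState(Path stateFile) {
        this.stateFile = stateFile;
    }

    // Monitor style: each method is synchronized on this object, so the
    // lock is acquired on method entry and released on exit.
    synchronized boolean grantVote(long term, String candidateId) throws IOException {
        if (term < currentTerm) {
            return false;             // stale request
        }
        if (term == currentTerm && votedFor != null && !votedFor.equals(candidateId)) {
            return false;             // already voted for someone else this term
        }
        persist(term, candidateId);
        return true;
    }

    synchronized long getCurrentTerm() {
        return currentTerm;
    }

    synchronized String getVotedFor() {
        return votedFor;
    }

    // Term and vote go into one file that is swapped in with a single rename,
    // so a crash cannot leave a new term paired with an old vote.
    // With ATOMIC_MOVE, replacing an existing target is implementation
    // specific; common platforms replace it.
    private void persist(long term, String vote) throws IOException {
        Path tmp = stateFile.resolveSibling(stateFile.getFileName() + ".tmp");
        String contents = term + "\n" + (vote == null ? "" : vote) + "\n";
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, stateFile, StandardCopyOption.ATOMIC_MOVE);
        // Update memory only after the disk write succeeded.
        currentTerm = term;
        votedFor = vote;
    }
}
```

Holding one lock during disk I/O is the coarse-grained choice; it trades some concurrency for code that is easy to reason about.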
Timers added a lot of complexity
- Starting and stopping is verbose
- Separate election and heartbeat timers also add complexity
- Hard to keep track of which is running (state isn't obvious)
Alternative approach: don't start and stop timers
- One repeating timer
- Pick frequency that is a fraction of election timeout (~1/10th?)
- Keep state variables:
  - Last time AppendEntries requests were sent
  - Last time we heard from a valid leader
- When timer fires, check state variables to see whether time-related actions need to happen
- Eliminates various race conditions that happen with timers.
- Or don't have a timer at all; just specify a time limit on waiting operations, such as selector.select.
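One way to sketch the single repeating timer described above is shown below. The class name RaftTicker, the timeout constants, and the counters are assumptions for illustration; randomized election timeouts and the actual message sending are left out. The current time is passed in as a parameter so the logic can be exercised without real timers.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: names and constants are assumptions, not from the project.
class RaftTicker {
    enum Role { FOLLOWER, CANDIDATE, LEADER }

    static final long ELECTION_TIMEOUT_MS = 1000;
    static final long HEARTBEAT_INTERVAL_MS = 300;
    static final long TICK_MS = ELECTION_TIMEOUT_MS / 10;  // ~1/10th of election timeout

    private Role role = Role.FOLLOWER;
    private long lastHeardFromLeaderMs;      // last time we heard from a valid leader
    private long lastAppendEntriesSentMs;    // last time AppendEntries requests were sent
    private int electionsStarted = 0;        // stand-in for "start an election"
    private int heartbeatsSent = 0;          // stand-in for "send AppendEntries"

    RaftTicker(long nowMs) {
        lastHeardFromLeaderMs = nowMs;
        lastAppendEntriesSentMs = nowMs;
    }

    synchronized void onValidLeaderMessage(long nowMs) {
        role = Role.FOLLOWER;
        lastHeardFromLeaderMs = nowMs;
    }

    synchronized void becomeLeader(long nowMs) {
        role = Role.LEADER;
        lastAppendEntriesSentMs = nowMs;
        heartbeatsSent++;                    // initial heartbeat on taking office
    }

    // Called on every timer tick; all time-related decisions happen here.
    synchronized void tick(long nowMs) {
        if (role == Role.LEADER) {
            if (nowMs - lastAppendEntriesSentMs >= HEARTBEAT_INTERVAL_MS) {
                lastAppendEntriesSentMs = nowMs;
                heartbeatsSent++;
            }
        } else if (nowMs - lastHeardFromLeaderMs >= ELECTION_TIMEOUT_MS) {
            role = Role.CANDIDATE;
            lastHeardFromLeaderMs = nowMs;   // restart the election clock
            electionsStarted++;
        }
    }

    synchronized int getElectionsStarted() { return electionsStarted; }
    synchronized int getHeartbeatsSent() { return heartbeatsSent; }
    synchronized Role getRole() { return role; }

    // The one repeating timer; it is never stopped or restarted while the server runs.
    ScheduledExecutorService start() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> tick(System.currentTimeMillis()),
                TICK_MS, TICK_MS, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

Because nothing is ever cancelled or rescheduled, the only time-related state is the two timestamps, which can be inspected directly when debugging.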

In general, hard to convince myself that exception handling was correct.
- Problem: code split. Throw in one place, catch in another.
Have a plan
First think about how to handle (categories?)
- State file doesn't exist
- State file cannot be read or parsed
- Can't write persistent state (Example E1)
- Incoming message is malformed
- I/O error receiving request
- I/O error sending response
- I/O error in outbound communication (request or response)
- Can't open socket to listen on
The same exception (IOException) may have to be handled differently in different situations
Report exceptions in terms that make sense to handler
- Example: disk error writing persistent state: IOException?
- Ask what is the right abstraction
- May need to define new exception classes
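As a sketch of reporting exceptions in terms that make sense to the handler, the example below wraps low-level failures in a new exception class. PersistenceException, StateLoader, and the file format (term on the first line) are hypothetical names and choices for illustration. Treating a missing state file as a first boot is also an assumption; a real server might need a different policy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical exception class: says "persistent state is unusable"
// rather than exposing which low-level I/O call failed.
class PersistenceException extends Exception {
    PersistenceException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical loader; assumes the term is stored on the first line of the file.
class StateLoader {
    static long loadTerm(Path stateFile) throws PersistenceException {
        List<String> lines;
        try {
            lines = Files.readAllLines(stateFile, StandardCharsets.UTF_8);
        } catch (NoSuchFileException e) {
            // Category: state file doesn't exist. Assumed here to mean first boot.
            return 0;
        } catch (IOException e) {
            // Category: state file cannot be read.
            throw new PersistenceException("cannot read " + stateFile, e);
        }
        try {
            return Long.parseLong(lines.get(0).trim());
        } catch (NumberFormatException | IndexOutOfBoundsException e) {
            // Category: state file cannot be parsed.
            throw new PersistenceException("cannot parse " + stateFile, e);
        }
    }
}
```

The caller now handles one exception whose meaning is clear at its level, and the same IOException type raised by a socket elsewhere cannot be confused with it.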

Most groups implemented messages, not RPCs
- A remote procedure call consists of paired messages: request and response
- Response is paired with request, typically synchronous (wait for response)
- RPC system will typically retry if no response received
- With message-based approach, each message independent, no obvious association between requests and responses.
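To make the difference concrete, the pairing that an RPC layer adds on top of plain messages can be sketched as a table of outstanding requests keyed by a request id. The names PendingRequests, RpcCall, begin, and complete are hypothetical, responses are plain strings for brevity, and retries and timeouts are omitted.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical handle for one outstanding call: the id travels in the
// request message, and the future completes when the matching response arrives.
class RpcCall {
    final long id;
    final CompletableFuture<String> response;

    RpcCall(long id, CompletableFuture<String> response) {
        this.id = id;
        this.response = response;
    }
}

// Hypothetical table that associates responses with requests.
class PendingRequests {
    private final AtomicLong nextId = new AtomicLong(1);
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called before sending a request; the caller puts call.id in the message.
    RpcCall begin() {
        long id = nextId.getAndIncrement();
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(id, future);
        return new RpcCall(id, future);
    }

    // Called when a response message arrives. Returns false if no request
    // with that id is outstanding (for example, a duplicate response).
    boolean complete(long id, String payload) {
        CompletableFuture<String> future = pending.remove(id);
        if (future == null) {
            return false;
        }
        future.complete(payload);
        return true;
    }
}
```

A synchronous caller would block on call.response.get(), ideally with a timeout. A purely message-based design needs none of this bookkeeping.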
For Raft servers, don't really need RPC
- Can process a response without knowing which request it answers
- No need for retry at this level: higher-level timers for elections and heartbeats are sufficient
- Messages are simpler to implement (especially if you can create a new socket for each one!)
- But, what will you do for client communication in Project 2?
Lots of corner cases that weren't considered:
- Only keep count of votes, rather than list of servers? (A duplicate response could then be counted twice.)
- Trying to do I/O without blocking:
  - Only part of a message arrives
  - Immediately wait on new socket for message: will block if sender didn't send a message.
  - Can block on writing as well as reading, if socket backs up.
- Barrier-style broadcasts.
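The "only part of a message arrives" case above can be handled by accumulating bytes until a whole message is present. The sketch below assumes a length-prefixed wire format (a 4-byte big-endian length followed by the body), which is an assumption and not necessarily what any project used; FrameDecoder, feed, and MAX_FRAME_BYTES are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for an assumed format: 4-byte length, then body.
class FrameDecoder {
    static final int MAX_FRAME_BYTES = 1 << 20;   // assumed sanity limit

    // Kept in "write mode" between calls: position marks the end of buffered data.
    private ByteBuffer pending = ByteBuffer.allocate(1024);

    // Appends the first `length` bytes of `data` (whatever one read returned)
    // and returns every message that is now complete, possibly none.
    List<byte[]> feed(byte[] data, int length) {
        if (pending.remaining() < length) {
            // Grow the buffer, preserving bytes already received.
            int needed = pending.position() + length;
            ByteBuffer bigger = ByteBuffer.allocate(Math.max(pending.capacity() * 2, needed));
            pending.flip();
            bigger.put(pending);
            pending = bigger;
        }
        pending.put(data, 0, length);

        List<byte[]> messages = new ArrayList<>();
        pending.flip();                            // switch to reading
        while (pending.remaining() >= 4) {
            pending.mark();
            int bodyLength = pending.getInt();
            if (bodyLength < 0 || bodyLength > MAX_FRAME_BYTES) {
                // Malformed message; the caller should drop the connection.
                throw new IllegalArgumentException("bad frame length: " + bodyLength);
            }
            if (pending.remaining() < bodyLength) {
                pending.reset();                   // body incomplete: un-read the length
                break;
            }
            byte[] body = new byte[bodyLength];
            pending.get(body);
            messages.add(body);
        }
        pending.compact();                         // keep leftovers, back to writing
        return messages;
    }
}
```

With non-blocking I/O, the caller would invoke feed after each read that returns data and simply go back to the selector when the returned list is empty.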

Avoid Pairs (Example M1): expedient but poor abstraction, obscure.
- Define small container class instead
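Example M1 is not reproduced here, so the following is only an illustration of what a small container class might look like in place of a generic Pair of two longs. LogPosition and its method name are hypothetical.

```java
// Hypothetical replacement for something like Pair<Long, Long>:
// the field names say what the two values mean.
final class LogPosition {
    final long term;
    final long index;

    LogPosition(long term, long index) {
        this.term = term;
        this.index = index;
    }

    // Raft's log comparison: a later last term wins; with equal terms,
    // the longer log (higher index) wins.
    boolean isAtLeastAsUpToDateAs(LogPosition other) {
        if (term != other.term) {
            return term > other.term;
        }
        return index >= other.index;
    }
}
```

Compared with pair.getKey() and pair.getValue(), a reader of position.term and position.index does not have to remember which slot holds which value, and related logic has an obvious home.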
Getters and setters: when is it better to make variables public?
- If every instance variable has a getter.
- If class is shallow and likely to stay that way.
ByteBuffer: interface both shallow and obscure (Example M2)
- Constantly have to read the documentation?
- Essentially no information hiding