Raft Project 2 Review/Discussion (Winter 2022)

Everyone made improvements from Project 1.

  • Deeper classes (simpler APIs, less specialization & information leakage)
    • Network communication pretty good in every project
  • Better error detection and logging (but still more work to do)

Exceptions

How many new exception types to define?

  • Just use std::runtime_error?
    • Too generic (catchers need to distinguish between categories of exceptions)
  • Define new types RecoverableException and FatalException?
    • Still too broad
    • Fatal vs. recoverable is determined by the catcher, not the thrower
  • Instead, define a few class-specific exception types:
    • E.g., StorageFailure or NetworkError
    • Start with 1-2 per major class? Can add more later if needed.
  • How to define exceptions? See slides.
  • Must document exceptions in the interfaces
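
As a rough illustration of the class-specific approach, here is a minimal sketch (not the version from the slides). StorageFailure and NetworkError are the names suggested above; the Log interface fragment and the runStep helper are hypothetical and exist only to show a documented exception and a catcher that decides what is recoverable.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Thrown by a storage/log class when a disk operation fails.
class StorageFailure : public std::runtime_error {
public:
    explicit StorageFailure(const std::string& what)
        : std::runtime_error("storage failure: " + what) {}
};

// Thrown by a communication class when a message cannot be delivered.
class NetworkError : public std::runtime_error {
public:
    explicit NetworkError(const std::string& what)
        : std::runtime_error("network error: " + what) {}
};

// Hypothetical interface fragment: the exception is part of the
// documented contract of the method.
class Log {
public:
    /// Appends an entry and makes it durable.
    /// \throws StorageFailure if the entry could not be written.
    void append(const std::string& entry);
};

// Hypothetical caller. The thrower only reports what went wrong; the
// catcher decides whether that is recoverable in this context.
std::string runStep(const std::function<void()>& step) {
    try {
        step();
        return "ok";
    } catch (const NetworkError&) {
        return "retry";   // this caller treats network trouble as transient
    } catch (const StorageFailure&) {
        return "abort";   // this caller cannot continue without storage
    }
}
```

A different caller could catch the same NetworkError and treat it as fatal, which is why the types describe the failure rather than its severity.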

Class Design: Together vs. Apart

Given various pieces of functionality, which belong together in the same class/method and which should be separated in different classes/methods?

Key considerations:

  • Separate general-purpose and special-purpose code
    • Over-specialization creates information leakage
  • Combine things that are related, separate things that are not related
    • Do one thing at a time
  • Do the whole job in one place

Examples:

  • Raft server contains state machine for shell?
  • Client main program also has code to communicate with Raft cluster?
  • Communication libraries for server-server and client-server communication?
  • Raft server also manages communication with clients?
  • Log class also manages other persistent state such as term and vote?
  • Separate code for sending heartbeats and AppendEntries requests (no log entries in heartbeats)?
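
For the last example, one possible answer is to combine the two: the Raft paper describes heartbeats as AppendEntries requests that carry no log entries, so a single code path can produce both. The sketch below assumes that design. The field names follow the Raft paper (leaderId is omitted for brevity); LogEntry, makeAppendEntries, and the in-memory vector standing in for the log are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct LogEntry {
    uint64_t term;
    std::string command;
};

struct AppendEntriesRequest {
    uint64_t term;
    uint64_t prevLogIndex;
    uint64_t prevLogTerm;
    uint64_t leaderCommit;
    std::vector<LogEntry> entries;  // empty when the request is a heartbeat
};

// Builds the request for one follower. Log indexes start at 1, so the
// entry with index i is stored at log[i - 1]. If the follower is already
// up to date (nextIndex == log.size() + 1), the result carries no entries
// and therefore acts as a heartbeat; no separate heartbeat code is needed.
AppendEntriesRequest makeAppendEntries(const std::vector<LogEntry>& log,
                                       uint64_t currentTerm,
                                       uint64_t nextIndex,
                                       uint64_t commitIndex) {
    assert(nextIndex >= 1 && nextIndex <= log.size() + 1);
    AppendEntriesRequest request;
    request.term = currentTerm;
    request.prevLogIndex = nextIndex - 1;
    request.prevLogTerm =
        request.prevLogIndex == 0 ? 0 : log[request.prevLogIndex - 1].term;
    request.leaderCommit = commitIndex;
    // Copies the entries for simplicity; a real implementation might
    // avoid this copy.
    request.entries.assign(
        log.begin() + static_cast<std::ptrdiff_t>(nextIndex - 1), log.end());
    return request;
}
```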

Performance Issues

Log overheads that scale with size of the log:

  • Scan entire log at startup
  • Keep index for entire log in memory
  • Rewrite entire index for each modification
  • These shouldn't be necessary: only the most recent entries are likely to be accessed.
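
One way to keep memory use independent of log size is to hold only a bounded tail of recent entries in memory and go to disk for anything older. The sketch below shows only the in-memory part; RecentEntries and its methods are hypothetical names, and the disk fallback is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

// Holds at most `capacity` of the most recently appended log entries.
class RecentEntries {
public:
    explicit RecentEntries(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Records the entry with the next log index, evicting the oldest
    // in-memory entry once the capacity is exceeded.
    void append(std::string entry) {
        entries_.push_back(std::move(entry));
        if (entries_.size() > capacity_) {
            entries_.pop_front();
            ++firstIndex_;
        }
    }

    // Returns the entry with the given log index, or nullptr if it is not
    // held in memory (the caller would then read it from disk).
    const std::string* get(uint64_t index) const {
        if (index < firstIndex_ || index >= firstIndex_ + entries_.size()) {
            return nullptr;
        }
        return &entries_[static_cast<std::size_t>(index - firstIndex_)];
    }

private:
    std::size_t capacity_;
    uint64_t firstIndex_ = 1;          // log index of entries_.front()
    std::deque<std::string> entries_;  // most recent entries, oldest first
};
```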

Lots of copying of messages
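
A common way to cut down on copies in C++ is to transfer ownership of message buffers with move semantics instead of duplicating them. The following is a minimal sketch of that idea; Message and OutgoingQueue are hypothetical types, not taken from any of the projects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

struct Message {
    uint64_t destination;
    std::vector<char> payload;
};

class OutgoingQueue {
public:
    // Takes the message by value and moves it into the queue. A caller
    // that passes a temporary or uses std::move hands over the payload
    // buffer without copying its bytes.
    void enqueue(Message message) {
        queue_.push_back(std::move(message));
    }

    // Returns a reference, so inspecting the next message does not copy it.
    const Message& front() const {
        assert(!queue_.empty());
        return queue_.front();
    }

    std::size_t size() const { return queue_.size(); }

private:
    std::deque<Message> queue_;
};
```

A caller would write something like queue.enqueue(Message{7, std::move(payload)}), after which the queued message owns the original buffer and payload should no longer be used.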

How to approach performance in general:

  • Learn what operations are unusually expensive
  • Design to avoid these big problems
  • For everything else, design for simplicity.

Smaller Stuff

Name of parameter that determines whether to reset persistent state:

  • notExistOK?
  • When something is important, the name should convey that.
  • Also doesn't have the right semantics.

Distinct methods for opening and closing connections:

  • Not needed: can do automatically when sending messages
  • "Just do the right thing"
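
A minimal sketch of this idea follows, assuming a hypothetical Connection class whose only public operation for callers is send. No real socket is used: a flag and a vector stand in for the network so that the example is self-contained, and markBroken is a hypothetical hook that simulates a dropped connection.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Connection {
public:
    explicit Connection(std::string address) : address_(std::move(address)) {}

    // The only operation callers need: opens (or reopens) the connection
    // if necessary, then sends the message.
    void send(const std::string& message) {
        ensureOpen();
        sent_.push_back(message);  // stand-in for writing to a socket
    }

    // Simulates the connection being dropped by the peer or the network.
    void markBroken() { open_ = false; }

    bool isOpen() const { return open_; }
    int openCount() const { return openCount_; }
    std::size_t sentCount() const { return sent_.size(); }

private:
    void ensureOpen() {
        if (!open_) {
            open_ = true;  // stand-in for connecting to address_
            ++openCount_;
        }
    }

    std::string address_;
    bool open_ = false;
    int openCount_ = 0;
    std::vector<std::string> sent_;
};
```

Callers never see open or close, so they cannot forget to call them or call them in the wrong order.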