Raft Project 1 Review/Discussion (Winter 2024)

Click here for .cc file containing examples.

Class Design

Interesting opportunities for API design:

  • Communication
  • Persistence
  • Raft server state machine
  • Client-side communication
  • State machine (shell command execution)

Most common problems:

  • Specialization:
    • API/implementation tailored to Raft, limits usage for other things
    • Special-purpose code combined with general-purpose
  • Too many classes (shallow)
  • Fuzzy division of responsibility
    • A class handles part of a problem, but not all of it.
    • Or, multiple implementations of the same thing (e.g., for clients and servers)

Network Communication

  • Most common issues:
    • Poor encapsulation (shallow classes, too many classes, leakage)
    • Raft specialization (e.g. server IDs)
  • Connection topology alternatives:
    • Requester opens connections: responses are sent back on the same connection as the request.
    • Sender opens connections: all outgoing traffic (requests and responses) uses the sender's connection.
    • One connection between each pair of servers.
  • Threading architecture:
    • Must allow independent operation of each socket
      • If one socket blocks, this must not prevent communication to/from other sockets
    • Choice #1: single thread:
      • Simple and clean from synchronization standpoint
      • Use epoll or select to wait for any socket to become ready
      • But, must use nonblocking I/O:
        • Reads may return only part of a message (must save it until the rest arrives).
        • Writes may send only part of the message (must save the remainder to try again later).
    • Choice #2: separate thread(s) for each socket
      • Threads use blocking read/write operations
      • Introduces synchronization issues
      • If there are many connections, this becomes inefficient
      • Does the multi-threaded approach increase server throughput?
    • Observation: the sockets streaming API is awkward for RPCs
  • Unifying server-server and client-server communication:
    • What problems motivated the differences?
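
The partial-read problem in the single-threaded choice can be isolated in one small class. The following is a minimal sketch, not a definitive implementation: the class name MessageAssembler and the wire format (a 4-byte big-endian length followed by the payload) are assumptions made for illustration. Note that nothing in it refers to Raft.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: accumulates bytes from a nonblocking socket and
// hands back complete messages. Assumed wire format: a 4-byte big-endian
// length followed by that many payload bytes.
class MessageAssembler {
public:
    // Append whatever bytes the most recent read returned.
    void append(const char* data, std::size_t length) {
        buffer_.append(data, length);
    }

    // Return the next complete message, or std::nullopt if more bytes
    // are needed. Partial data stays buffered until the rest arrives.
    std::optional<std::string> next() {
        constexpr std::size_t kHeaderSize = 4;
        if (buffer_.size() < kHeaderSize) {
            return std::nullopt;
        }
        std::uint32_t payloadSize = 0;
        for (std::size_t i = 0; i < kHeaderSize; i++) {
            payloadSize = (payloadSize << 8) |
                    static_cast<unsigned char>(buffer_[i]);
        }
        if (buffer_.size() - kHeaderSize < payloadSize) {
            return std::nullopt;
        }
        std::string message = buffer_.substr(kHeaderSize, payloadSize);
        buffer_.erase(0, kHeaderSize + payloadSize);
        return message;
    }

    // Encode a payload in the same format, for the sending side.
    static std::string frame(const std::string& payload) {
        std::uint32_t size = static_cast<std::uint32_t>(payload.size());
        std::string result;
        for (int shift = 24; shift >= 0; shift -= 8) {
            result.push_back(static_cast<char>((size >> shift) & 0xff));
        }
        result += payload;
        return result;
    }

private:
    std::string buffer_;
};
```

A fuller version would also reject implausibly large lengths, since the header comes from outside the process, and would keep a matching per-socket buffer for the unsent remainder of partial writes.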

Persistence

  • In most projects this was specialized for Raft:
    • No class: persistence implemented by Raft state machine
    • Separate class, but APIs reflect Raft details such as term and vote
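
One way to avoid this specialization is a persistence class that stores an opaque blob and leaves serialization to the caller. The sketch below is illustrative only: BlobStore and StoreError are hypothetical names, and it assumes a C++17 compiler and a POSIX-like file system where renaming over an existing file replaces it atomically.

```cpp
#include <filesystem>
#include <fstream>
#include <ios>
#include <iterator>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical exception type for storage failures.
struct StoreError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical general-purpose store: saves and loads one opaque blob.
// It knows nothing about terms, votes, or logs; the caller decides how
// to serialize its own state into the blob.
class BlobStore {
public:
    explicit BlobStore(std::filesystem::path path) : path_(std::move(path)) {}

    // Replace the stored blob. Writes a temporary file, then renames it
    // over the old one so readers never see a half-written file.
    // (A real version would also sync the data to disk before renaming.)
    void save(const std::string& blob) {
        std::filesystem::path temp = path_;
        temp += ".tmp";
        {
            std::ofstream out(temp, std::ios::binary | std::ios::trunc);
            out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
            out.flush();
            if (!out) {
                throw StoreError("cannot write " + temp.string());
            }
        }
        std::filesystem::rename(temp, path_);  // throws filesystem_error
    }

    // Return the stored blob, or std::nullopt if the file cannot be
    // opened (for example, because nothing was ever saved).
    std::optional<std::string> load() const {
        std::ifstream in(path_, std::ios::binary);
        if (!in) {
            return std::nullopt;
        }
        return std::string(std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>());
    }

private:
    std::filesystem::path path_;
};
```

With an interface like this, the Raft server would encode its term, vote, and log into a string before calling save, so the same class could persist state for any other application.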

Raft State Machine

  • Collect all of this code into a single class
  • Very simple API:
    • Constructor
    • run method
  • Decomposition choice #1: separate code for each state:
    • One method or class per state
  • Decomposition choice #2: separate code for each message type
  • Threading alternatives:
    • Match network module (e.g. execute commands on per-connection threads)
    • Hybrid (many threads in networking, only one thread in Raft server)
  • Keep synchronization simple!
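
For the hybrid threading alternative, one plausible way to keep synchronization simple is to confine all locking to a single hand-off queue: networking threads push incoming messages, and the one Raft thread pops them from its run method. This is a sketch under that assumption; WorkQueue is a hypothetical name.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical hand-off point between networking threads and the single
// Raft thread. All locking lives here; the Raft code itself needs none.
template <typename T>
class WorkQueue {
public:
    // Called by any networking thread.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
    }

    // Called by the Raft thread: blocks until an item arrives or the
    // queue is closed. Returns std::nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) {
            return std::nullopt;
        }
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

    // Wake the consumer so it can shut down after draining the queue.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
    bool closed_ = false;
};
```

Because only the Raft thread touches Raft state, election timers and log updates need no locks of their own; timeouts could be delivered through the same queue as ordinary items.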

Raft Client

  • Must do two things:
    • Read commands from stdin (and write results to stdout)
    • Communicate with the Raft cluster
  • Most projects combined this in a single class or file
  • Better to separate them:
    • Reading commands from stdin is just one possible approach
    • For Project 2, design general-purpose class for clients to communicate with a Raft cluster

Application State Machine

  • Most projects hard-wired shell state machine into Raft server
  • This specializes the Raft server
  • For Project 2, create APIs that allow Raft server to support any state machine with string commands and string results.
  • Implement shell state machine as one specific instance (separate from Raft server classes)
  • Note: commands may take a long time to execute
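
An API of this kind could be as small as one abstract method. The sketch below is one possibility, with hypothetical names throughout (StateMachine, CountingStateMachine, CommandApplier); the counting machine is a stand-in used only to show that the server side never needs to know which state machine it is driving.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface: string commands in, string results out.
class StateMachine {
public:
    virtual ~StateMachine() = default;
    virtual std::string apply(const std::string& command) = 0;
};

// Stand-in instance for illustration (not the shell state machine):
// reports how many times each command has been applied.
class CountingStateMachine : public StateMachine {
public:
    std::string apply(const std::string& command) override {
        int count = ++counts_[command];
        return command + " #" + std::to_string(count);
    }

private:
    std::map<std::string, int> counts_;
};

// Hypothetical fragment of a Raft server: it owns some StateMachine and
// applies committed commands to it without knowing the concrete type.
class CommandApplier {
public:
    explicit CommandApplier(std::unique_ptr<StateMachine> machine)
        : machine_(std::move(machine)) {}

    std::string applyCommitted(const std::string& command) {
        return machine_->apply(command);
    }

private:
    std::unique_ptr<StateMachine> machine_;
};
```

A shell state machine would be another subclass whose apply runs the command and returns its output. Since such commands may take a long time, a real server would need to decide which thread calls apply.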

Exception Handling

Common problems:

  • Not enough error checks
  • Not enough info in log messages
  • Exceptions not handled in the best way

Must check results of every kernel call

In general, unsafe to assume anything about information coming from outside the process

  • Contents of files holding persistent data (e.g. std::stoi throws an exception if the text is not a valid number).
  • Messages
  • Command-line arguments
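
As an illustration of checking outside input, here is a minimal sketch of a wrapper around std::stoi. The names ParseError and parseNonNegativeInt are hypothetical, as is the choice to reject negative values and trailing characters.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical exception type for malformed external input.
struct ParseError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Parse text as a non-negative int, rejecting anything suspicious.
// The source argument (e.g. a file name) is included in every error
// message so the log shows where the bad data came from.
int parseNonNegativeInt(const std::string& text, const std::string& source) {
    std::size_t used = 0;
    int value = 0;
    try {
        value = std::stoi(text, &used);
    } catch (const std::invalid_argument&) {
        throw ParseError(source + ": not a number: '" + text + "'");
    } catch (const std::out_of_range&) {
        throw ParseError(source + ": number out of range: '" + text + "'");
    }
    if (used != text.size()) {
        // std::stoi stops at the first non-digit, so "12abc" parses as 12
        // unless the number of consumed characters is checked.
        throw ParseError(source + ": trailing characters: '" + text + "'");
    }
    if (value < 0) {
        throw ParseError(source + ": negative value: '" + text + "'");
    }
    return value;
}
```

The same function could be applied to persistent files, message fields, and command-line arguments, which keeps the checks in one place.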

Logging is essential:

  • Log as often as you can possibly afford
  • Include as much information in the log message as possible
  • Log at the scene of the crime, where the most information is available (or, incorporate the info into an exception).

What to do when an error occurs? First, think about how it is likely to be handled.

Don't exit in low-level methods

  • Limits generality
  • Bad for unit testing
  • Instead, throw exception

Define specific exception types: don't just throw std::exception or std::runtime_error (consider how callers are likely to catch and handle each type)

All threads should have top-level exception handlers: catch, log, exit
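
The last few points can be combined in a short sketch. Everything named here is hypothetical: SystemCallError is one example of a specific exception type that carries the errno value from a failed kernel call, and runThreadBody is one way to give every thread the same top-level handler.

```cpp
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Specific exception type for failed kernel calls. It records errno so
// callers can distinguish cases, and the message is built where the
// failure happened, when the most context is available.
class SystemCallError : public std::runtime_error {
public:
    SystemCallError(const std::string& context, int errorNumber)
        : std::runtime_error(context + ": " + std::strerror(errorNumber)),
          errorNumber_(errorNumber) {}

    int errorNumber() const { return errorNumber_; }

private:
    int errorNumber_;
};

// Top-level handler meant to wrap the entry point of each thread:
// catch, log, exit. Low-level code only throws; the decision to exit
// is made here, at the highest level.
void runThreadBody(const std::string& threadName,
        const std::function<void()>& body) {
    try {
        body();
    } catch (const std::exception& e) {
        std::cerr << threadName << ": fatal: " << e.what() << std::endl;
        std::exit(EXIT_FAILURE);
    } catch (...) {
        std::cerr << threadName << ": fatal: unknown exception" << std::endl;
        std::exit(EXIT_FAILURE);
    }
}
```

A thread would then be started with something like std::thread(runThreadBody, "network", body). Whether std::exit is the right call from a non-main thread depends on the program; it is used here only to keep the sketch short.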

Thoughts for Project 2

Make classes general-purpose

Avoid specializations and restrictions

  • Just because you know something doesn't mean you should use that information
  • Delay specialization: push it up to the highest layers of the application

Think about solving big problems, not little ones

  • Design top-down?