Stanford University
Computer Science 244b: Spring 2009

Assignment #2: Distributed Replicated Files

Dates

Assigned: Wednesday May 6, 2009
Due: Thursday, May 28, 2009 at 23:59

The goal of this assignment is to implement client and server prototypes for a distributed file system in which the files are replicated. Its purpose is to explore a service-specific protocol that relies on transactions for reliable delivery rather than on conventional transport techniques.

For this assignment, you will be working individually.

Machine Compatibility

Your program must run on the Linux machines (pod*.stanford.edu) in the Terman Engineering Computer Cluster. You may use multicast packets to disseminate state, as with the previous Mazewar assignment. You are not allowed to use physical broadcast packets to disseminate state. You can reuse the multicast setup code from your first assignment. The multicast group address is the same as in the first assignment, namely 0xe0010101.

Structure, Interfaces, and Implementation

Your goal is to provide us with the following:

We are providing you with:

The framework code we provide is in /usr/class/cs244b/replFs.
Applications which wish to use the distributed filesystem link to the client-side library and use the client-side interface to make write calls to a replicated file. Of course, to this application, the replication and use of the network must be entirely transparent. Your most important tasks are to design and implement the client/server protocol.

Required Client API (Application Program Interface)

The application interface to the client MUST include the following:

int InitReplFs(int portNum, int packetLoss);
int AddServer(char *id);
int OpenFile(char *name);
int WriteBlock(int fd, void *buffer, int byteOffset, int blockSize);
int Commit(int fd);
int Abort(int fd); (Use int instead of void)
int CloseFile(int fd);

Conspicuous by its absence is a ReadBlock() call in this API. Reading is not required for this assignment.
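As an illustration of how an application might drive this API, here is a minimal sketch of one transaction: open, write, commit, close. It assumes that every call returns a negative value on failure, which this handout does not state. The stand-in function bodies exist only so the sketch links on its own and do nothing over the network. AddServer() is left out because its argument is not described here, and save_greeting() is a hypothetical application routine.

```c
#include <assert.h>
#include <string.h>

/* Prototypes copied from the required client API. */
int InitReplFs(int portNum, int packetLoss);
int OpenFile(char *name);
int WriteBlock(int fd, void *buffer, int byteOffset, int blockSize);
int Commit(int fd);
int Abort(int fd);
int CloseFile(int fd);

/* Stand-in state: a real client library would run the replication
 * protocol instead of copying between two local buffers. */
char pending_buf[64];
char committed_buf[64];

int InitReplFs(int portNum, int packetLoss)
{
    (void)portNum; (void)packetLoss;
    return 0;
}
int OpenFile(char *name) { (void)name; return 3; }
int WriteBlock(int fd, void *buffer, int byteOffset, int blockSize)
{
    (void)fd;
    if (byteOffset < 0 || blockSize < 0 ||
        byteOffset + blockSize > (int)sizeof pending_buf)
        return -1;
    memcpy(pending_buf + byteOffset, buffer, (size_t)blockSize);
    return blockSize;
}
int Commit(int fd)
{
    (void)fd;
    memcpy(committed_buf, pending_buf, sizeof committed_buf);
    return 0;
}
int Abort(int fd)
{
    (void)fd;
    memcpy(pending_buf, committed_buf, sizeof pending_buf);
    return 0;
}
int CloseFile(int fd) { (void)fd; return 0; }

/* Hypothetical application routine: write one block as one transaction.
 * ASSUMPTION: a negative return value from any API call means failure. */
int save_greeting(char *name)
{
    char text[] = "hello, replicas\n";
    int fd;

    assert(name != NULL);
    fd = OpenFile(name);
    if (fd < 0)
        return -1;
    if (WriteBlock(fd, text, 0, (int)strlen(text)) < 0) {
        Abort(fd);      /* discard the uncommitted write */
        CloseFile(fd);
        return -1;
    }
    if (Commit(fd) < 0) {
        CloseFile(fd);
        return -1;
    }
    return CloseFile(fd);
}
```

Note that the application never sees a server or a packet: it calls InitReplFs() once and then works with file descriptors only.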

Protocol Skeleton

In providing the API listed above, the library code at the client and the file servers conspire to provide distributed replicated files. While we have provided an outline of how you should do this, your first job should be to flesh out this skeleton.

A brief outline of the steps involved with accessing a file:

  1. The client opens the file by multicasting to the servers and collecting responses. All reachable & running servers must be accounted for.
  2. The client multicasts writes across the net. Upon receiving these writes, the server writes to a copy of the file. Note specifically that there are no ACKs at this stage.
  3. If the application commits, two-phase commit is used to ensure that all updates from the previous step were received. (If any were not received, they must somehow be retransmitted.) When preparing to commit changes to a file, the client needs to identify to the servers, in some way, the list of updates to commit, and must make sure that every server has all of the changes before it sends a commit message. If all of the servers return an "OK to commit" message, the client then sends out a commit message. Alternatively, the client may send an abort message, whereupon the servers revert the file to its previous state.
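One possible way to flesh out step 3 is sketched below; it is an illustration, not the required design. It assumes the client numbers the writes of a transaction from 0 and, when preparing to commit, tells each server how many writes it sent. Each server then reports the sequence numbers it is missing, or votes OK if there are none. The names WriteLog, log_write, find_missing, and decide, and the cap of 64 writes per transaction, are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_WRITES 64 /* assumed per-transaction cap; not from the handout */

/* Server side: one bit per write sequence number seen in this transaction. */
typedef struct {
    uint64_t seen;
} WriteLog;

/* Record an unacknowledged write as it arrives (step 2). */
void log_write(WriteLog *log, unsigned seq)
{
    assert(seq < MAX_WRITES);
    log->seen |= (uint64_t)1 << seq;
}

/* Phase 1 on the server: `total` is the number of writes the client says
 * it sent. Fills `missing` with the absent sequence numbers and returns
 * how many there are. Zero missing means the server can vote OK. */
unsigned find_missing(const WriteLog *log, unsigned total, unsigned *missing)
{
    unsigned n = 0;

    assert(total <= MAX_WRITES);
    for (unsigned seq = 0; seq < total; seq++) {
        if (!(log->seen & ((uint64_t)1 << seq)))
            missing[n++] = seq;
    }
    return n;
}

typedef enum { DECIDE_COMMIT, DECIDE_ABORT } Decision;

/* Phase 2 on the client: commit only if every expected server voted OK. */
Decision decide(const bool *voted_ok, unsigned num_servers)
{
    for (unsigned i = 0; i < num_servers; i++) {
        if (!voted_ok[i])
            return DECIDE_ABORT;
    }
    return DECIDE_COMMIT;
}
```

In a fuller design the client would retransmit the writes a server reports as missing and ask again before calling decide(); aborting on the first missing vote is a simplification made here for brevity.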

Assignment Assumptions

For this assignment, you are allowed and required to assume the following:

System-Wide Details

Server Details

Client Details

Testing Criteria

We will be using a test app to test your filesystem. It will not be supplied to you in advance. It will attempt to stress your system in a number of ways. We will run tests with varying transaction size, with 1-3 servers, etc.

Report

The writeup is intended to be an insightful explication and analysis of your work. As always, we encourage you to employ point form in your answers and focus on protocol issues. Please put the write-up in a README file.

The following sections should be included:

  1. Protocol Specification: Document your protocol by specifying packet formats, sequencing and semantics of packets and protocol events.
  2. Evaluation: Discuss the merits and disadvantages of this approach to replication versus using conventional reliable transport.
  3. Future Directions: Discuss extensions, refinements, and modifications to your protocol and implementation that would be required for real deployment. An answer to this discussion necessarily includes consideration of large scale systems and files.
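For the Protocol Specification section, a packet format is easiest to document when it matches the code that serializes it byte for byte. The sketch below shows one way to do that for a fixed 9-byte header in network byte order. The message types and the fields type, txn_id, and seq are hypothetical examples, not a prescribed format.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message types, roughly one per step of the outline. */
enum { MSG_OPEN = 1, MSG_WRITE, MSG_PREPARE, MSG_VOTE, MSG_COMMIT, MSG_ABORT };

/* In-memory form of the header. The wire layout is:
 *   byte 0     type
 *   bytes 1-4  txn_id, big-endian
 *   bytes 5-8  seq, big-endian */
typedef struct {
    uint8_t  type;
    uint32_t txn_id;
    uint32_t seq;
} Header;

#define HEADER_BYTES 9

/* Serialize field by field so struct padding never reaches the wire.
 * `out` must have room for HEADER_BYTES bytes. */
size_t encode_header(const Header *h, uint8_t *out)
{
    uint32_t txn = htonl(h->txn_id);
    uint32_t seq = htonl(h->seq);

    out[0] = h->type;
    memcpy(out + 1, &txn, sizeof txn);
    memcpy(out + 5, &seq, sizeof seq);
    return HEADER_BYTES;
}

/* Returns 0 on success, -1 if the datagram is too short to hold a header. */
int decode_header(const uint8_t *in, size_t len, Header *h)
{
    uint32_t txn, seq;

    if (len < HEADER_BYTES)
        return -1;
    h->type = in[0];
    memcpy(&txn, in + 1, sizeof txn);
    memcpy(&seq, in + 5, sizeof seq);
    h->txn_id = ntohl(txn);
    h->seq = ntohl(seq);
    return 0;
}
```

Copying each field explicitly, rather than sending the struct as raw memory, keeps the format independent of compiler padding and host byte order.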

What to turn in

Run the submit script /usr/class/cs244b/bin/submit from the directory that contains all the source files, the makefile, and the README file. The README can be in PDF form if you wish. This is assignment/project #2.

Tentative Rubric

50% of your grade will be based on the tests run on your implementation. The other 50% of your grade is based upon the report.