CS110 Winter 2012

Assignment 1: Read Files from a Version 6 Unix Disk

Due: Friday, January 27, 2012 at 11:59PM

Assignment 1 FAQ

Overview

An archaeological expedition in Murray Hill, New Jersey has uncovered some magnetic disk drives from the mid-1970s. After considerable effort, the dig's technical expert read the contents of the drives; for each drive that was uncovered, she created a file on her computer which contains a bit-for-bit copy of the data on the disk.

The archaeologists determined that the data on the disks is stored using the Version 6 Unix file system by examining the disk images. But since none of the staff on hand took CS110, they haven't been able to read the files from the images.

Your task is to write a program which understands the Unix v6 file system and can read all the files on such a disk image.

Purpose

We have four main goals for this assignment.

First, we intend for you to get hands-on experience with the computer systems concepts of layering, naming, and modularity.

Second, we hope you'll learn the basics of how a file system organizes a raw disk device into logical files.

Third, we want you to hone your low-level programming skills. Like all of the assignments in CS110, you'll write this one in C, which provides little to no safety net when you make an error. C can be painful for even the most experienced programmers, but practice helps.

Finally, this assignment will serve as a basis for assignments 3 and 4 in this course. Therefore it is highly recommended that you get it working perfectly now. Otherwise, you will have to spend time allotted for assignment 3 to get this assignment working first.

Grading

We will grade your assignment based on a combination of test results, code quality, and answers to discussion questions:

First, we'll run your code on a set of test disk images and check that your program gives the expected output. You'll have access to all of the images we'll use for grading and the correct output for each of them.

If this were a real archaeological dig, you would have been given only the disk image and a computer. But to make things easier, we've provided you with starter code which tries to read all the files on the image and outputs checksums of the inodes' and files' contents. We've also computed these checksums using a working implementation of the assignment. If your checksums match ours, then your code is correctly reading the data off the disk.

Next, we'll read over your answers to the discussion questions.

Finally, we'll read through your code and give you comments on style and overall quality. We'll look for places where you inappropriately break the layering of the file system or make other mistakes which don't necessarily cause your program to generate the wrong output. We'll also look for common C programming errors such as memory leaks and use of the heap where stack allocation is more appropriate. Finally, we'll check that your code is easy to read: It should be well-formatted, organized, and be easy to follow.

See our code quality guide for more details, but don't stress out too much about this part.

Getting started

You have the option of either working on a virtual machine or on one of Stanford's cluster machines (corn or myth) for this assignment. Each setup has its own drawbacks. Cluster machines are often overflowing with MATLAB jobs; the VM takes a bit more setup, but allows you to work on the assignments offline. It's really up to you, just use the setup that you're most comfortable with.

Follow these steps to set up our VM on your personal computer. If you have problems, you can always use corn or myth.

Getting the starter code

Once you've completed the above steps, you'll need to get the starter code to start working on the assignment.

If you're on one of the cluster machines, simply copy the assignment files from CS110's AFS directory into the directory in which you will be working on the assignment: cp -r /usr/class/cs110/assn1/readfiles . This should give you the test_me script and the code directory.

If you're using the VM, follow the steps in the Getting the Starter Code section of the VM setup guide. Make sure to also copy the test disks and the new version of the test_me script to your VM, or you won't be able to test your assignment (the VM setup guide explains how to do this).

Building and running

To build the project, cd to the code directory and run make

To run the project, cd to the code directory and run ./diskimageaccess <options> <diskimagePath>

<diskimagePath> should be the path to one of the test disks located in /usr/class/cs110/assn1/testdisks. We currently have three test disks: basicDiskImage, depthFileDiskImage, and dirFnameSizeDiskImage.

diskimageaccess recognizes three flags as valid <options>:

-i
test the inode and file layers
-p
test the filename and pathname layers
-r
remove the specified pathname from the disk image (extra credit)

For example, to run both the inode and filename tests on the basic disk, you could run ./diskimageaccess -ip /usr/class/cs110/assn1/testdisks/basicDiskImage

Expected output

The expected output for running diskimageaccess on each disk image X is stored in the X.gold file inside the testdisks directory.

Symlinking the testdisks directory

You can avoid having to type the full path to the testdisks directory by creating a symbolic link: ln -s /usr/class/cs110/assn1/testdisks disks Then the example above would become ./diskimageaccess -ip disks/basicDiskImage

Test disks and the VM

AFS is somewhat flaky when used with the VM. If you are developing on the VM, we strongly recommend that you copy the test disks over to the VM's local file system. See the appropriate section in the VM setup guide for more information.

Testing

You can run the same test script we'll use when we grade your assignment by executing perl test_me from outside your code directory.

Submitting

From the cluster machine, submit your code by running /usr/class/cs110/bin/submit 1 from your readfile directory.

If you developed on the VM, see the section on submitting in the VM setup guide.

A tour through the source files we provide

The files we provide you fall into three categories.

The Version 6 Unix header files (filsys.h, ino.h, direntv6.h)
These are described below.
The test harness (diskimageaccess.c, chksumfile.[ch], unixfilesystems.[ch], Makefile)
These files provide the infrastructure for building your code and running our tests against it.
File system module (diskimg.[ch], inode.[ch], file.[ch], pathname.[ch])
The test code we give you interfaces with the file system using a layered API that we've already designed. For each of the layers that you'll need to implement for this assignment, there is a header file that defines the interface to the layer and a corresponding .c file that should contain the implementation. The layers are:
Block layer (diskimg.[ch])
This defines and implements the interface for reading and writing sectors on the disk image. We give you an implementation of this layer that should be sufficient for this assignment.
Inode layer (inode.[ch])
This defines and implements the interface for reading the file system's inodes. This includes the ability to look up inodes by the inumber and to get the sector number of the n'th block of the inode's data.
File layer (file.[ch])
This defines and implements the interface for reading blocks of data from a file by specifying its inumber. This is one of the two layers our tests explicitly exercise.
Filename layer (directory.[ch])
This defines and implements the interface for implementing Unix directories on top of files. Its main function is to get information about an entry in a directory.
Pathname layer (pathname.[ch])
This defines and implements the interface to look up a file by its absolute pathname. This is the second of the two layers we explicitly test.

You are welcome to modify any of the files we give you. When making changes in the test harness, do so in a backward compatible way so that our testing scripts continue to work. In other words, it's fine to add testing and debugging code, but make sure that when we run your program with -i and -p, its output has the same format as before your changes, so our automated tests won't break.

Before you try to write any code, you'll want to familiarize yourself with the details of the Unix v6 file system by reading section 2.5 in the course textbook and looking over the supplement to that section in this document. You may also find it helpful to read and understand how the test script works.

Suggested implementation order

The starter code contains a number of unfinished functions. We suggest you attack them in the following order:

  1. inode_iget() and inode_indexlookup() in inode.c.
  2. file_getblock() in file.c. After this step, your code should pass the inode level functionality tests.
  3. directory_findname() in directory.c.
  4. pathname_lookup() in pathname.c. Refer to Section 2.5.6 of the course text for this part. Afterward, all the pathname tests should pass.

Unix v6 file system supplement

Section 2.5 of your textbook contains most of what you need to know about the Unix v6 file system in order to do this assignment. The information below is supplementary to the textbook, so it assumes you've already read and understand the material in the textbook.

Header files and structures

In the starter code we've provided C structs corresponding to the file system's on-disk data structures. These structs have the same layout in memory as the structures have on disk. They include:

struct filsys (filsys.h)
Corresponds to the superblock of the file system. This is a slightly modified copy of the header file from Version 6 Unix.
struct inode (ino.h)
Corresponds to a single inode. Again this comes from Version 6 of Unix, with some small modifications.
struct direntv6 (direntv6.h)
Corresponds to a directory entry. This was copied from section 2.5 in the textbook.

In addition, unixfilesystem.h contains a description of the file system layout, including the sector address of the superblock and of the start of the inode region.

Legacy of an old machine

Back in the 1970s, storage space — both on disk and in main memory — was at a premium. As a result, the Unix v6 file system goes to lengths to reduce the size of data it stores. You'll notice that many integer values in the structs we provided are stored using only 16 bits, rather than today's more standard 32 or 64. (In our code we use the type int16_t from stdint.h to get a 16-bit integer, but back in the '70s, the C int type was 16 bits wide.)

In another space-saving move, the designers of the file system stored the inode's size field as a 24-bit integer. There's no 24-bit integer type in C, so we represent this value using two fields in the inode struct: i_size1, which contains the least-significant 16 bits of the value, and i_size0, which contains the most-significant 8 bits of the value. We provide a function inode_getsize() in inode.c which assembles these two fields into a normal C integer for you.

The first inode

Since there is no inode with a inumber of 0, the designers of the file system decided not to waste the 32 bytes of disk space to store it. The first inode in the first inode block has inumber 1; this inode corresponds to the root directory for the file system. (See unixfilesystem.h for details.)

Be careful not to assume that the first inode has an inumber of 0!

inode's i_mode

The 16-bit integer i_mode in the inode struct isn't really a number; rather, the individual bits of the field indicate various properties of the inode. ino.h contains #defines which describe what each bit means.

For instance, we say an inode is allocated if it points to an extant file. The most-significant bit (i.e. bit 15) of i_mode indicates whether the inode is allocated or not. So the C expression (i_mode & IALLOC) == 0 is true if the inode is unallocated and false if it is allocated.

Similarly, bit 12 of i_mode indicates whether the file uses the large file mapping scheme. So if (i_mode & ILARG) != 0 then the inode's i_addr fields point at indirect and doubly-indirect blocks rather than directly at the data blocks.

Bits 14 and 13 form a 2-bit wide field specifying the type of file. This field is 0 for regular files and 2 (i.e. binary 10, or the constant IFDIR) for directories. So the expression (i_mode & IFMT) == IFDIR is true if the inode is a directory, and false otherwise.

Extra credit

After successfully running your program on some of disk images, the dig's management was shocked to discover some of the files on the disks contained content that was illegal to possess today. They therefore requested that your program let them erase a specified file from the disk. We'll give you extra credit if you implement this.

A correct implementation of this portion of the assignment will:

We don't give you shells of the routines you need to implement for this part of the assignment, so you'll need to figure out what functions to add to which layers yourself.

Note that back in the 1970s, the unlink command removed directory entries by zeroing them out. It didn't compact the directory after removing an entry.

To properly delete a file, you will need to modify the freelist. The file system tracks free space on the disk by maintaining a linked list of structures containing two fields: an array of block numbers and the number of valid block numbers in the array. The first element of the free list is kept in the superblock (fields s_nfree and s_free). s_free is an array of block numbers containing up to 100 free blocks on disk. Field s_nfree is the number of valid block numbers in the s_free array.

The 0th element of the array in a free space element (e.g. s_free[0] in the superblock) acts as a link pointer in that it stores the block number of a block that contains the next free space list element. In other words, the first 16-bit word of that block is the number of free block numbers listed in the block, and the next up to 100 16-bit words contain the block numbers of free blocks. Like the other list elements, the first block number in the array points to the next element in the list. A value of zero for a link pointer meaning there are no more elements in the list.

Setting s_ninode to zero will cause the file system to scan through the inodes to find the unallocated ones, so in some sense, the free list only acts as a hint. But you still need to properly update it in order to get full credit.

Extra extra credit

One of dig team's members would like your help pulling off a practical joke: He'd like to be able to add files to a disk image. We'll give you extra credit if you enhance the diskimageaccess program to accept the following options: -a <existingFile> -n <pathname> where <existingFile> is a an absolute or relative path to a file on your computer's file system and <pathname> is the absolute path into the disk image where the file should be placed.

For example: -a myfile -n /dirA/dirB/newname Should place myfile under /dirA/dirB/ as newname.

As in the other extra credit, your implementation should follow system's layering and maintain a consistent file system.

Discussion questions

Please answer the following questions in a plain-text file called discussion.txt in your submit directory.

    1. We might judge a system to have have good modularity if we can change it without touching unnecessarily many modules or reworking the existing modularization boundaries.

      Of course, when you're first desigining a system, you usually don't know what kinds of changes you'll need to make to it in the future. But in the case of the Unix v6 file system, we do know what enhancements were made over time.

      For each of the enhancements to the Unix v6 file system listed below, state which layer or layers would need to be changed in order to implement the feature, and describe how each layer would need to be changed.

      1. Supporting much larger (1 terabyte) disks
      2. Supporting much larger files
      3. Supporting larger (up to 255 character) file and directory names

      Use the layer names and descriptions from Table 2.2 in the textbook.

    2. It's particularly painful to make a non-backwards-compatible change to the API that programs use to communicate with the operating system, since then potentially every application written for that operating system needs to be updated to reflect the change.

      Consider the API for interacting with the Unix v6 file system, listed in Table 2.1 of the textbook. Which of the enhancements from part (a) would require modifying the API?

      Hint: consider how the Unix commands cp, ls, and rm are implemented.

      To get more information on system calls described in Table 2.1 in the textbook, you can look them up in the manuals on any Linux/Unix machine command prompt. For example, to get more information about the open system call, you can do:

      man 2 open

      (Note that the 2 represents the manual "level" -- this is to differentiate between the open system call and, for example, a utility program called "open"). The manual pages will give you information about the types of the formal parameters. To quit out of a man page, press q.

  1. Every naming scheme has three components: a name space, a name mapping algorithm, and a universe of values. For each of the following names from the Unix v6 file system, describe the three components for that naming scheme:

    1. i-number
    2. Absolute pathname
    3. Block within a file
  2. File systems often print an error message when a call to diskimg_readsector fails due to of a problem with the disk media (e.g. a bad sector on disk). These error messages are often rather cryptic to users since diskimg_readsector has access only to low-level identifiers such as i-numbers and block numbers, rather than to user-facing identifiers such as filenames.

    For example, a message might say "Error reading disk #1, i-number 42, block 2", when the user would have preferred "File /etc/passwd unreadable due to disk error."

    For each call to diskimg_readsector in your code, describe what changes you'd need to make to your program in order to print a user-friendly error message if the call to diskimg_readsector fails.

    1. Did you develop on the VM or a cluster machine?

    2. Optional: Did you run into any problems in your development environment? (For instance, did AFS act up within your VM?) If so, please describe.

    3. Optional: Do you have any feedback for us about this assignment?

    4. Optional: How is the class going for you so far? Anything else you'd like to tell us about the class?

  3. Did you complete any of the extra credit for this assignment? If so, please briefly describe your work.