An archaeological expedition in Murray Hill, New Jersey has uncovered some magnetic disk drives from the mid-1970s. After considerable effort, the dig's technical expert read the contents of the drives; for each drive that was uncovered, she created a file on her computer which contains a bit-for-bit copy of the data on the disk.
The archaeologists determined that the data on the disks is stored using the Version 6 Unix file system by examining the disk images. But since none of the staff on hand took CS110, they haven't been able to read the files from the images.
Your task is to write a program which understands the Unix v6 file system and can read all the files on such a disk image.
We have four main goals for this assignment.
First, we intend for you to get hands-on experience with the computer systems concepts of layering, naming, and modularity.
Second, we hope you'll learn the basics of how a file system organizes a raw disk device into logical files.
Third, we want you to hone your low-level programming skills. Like all of the assignments in CS110, you'll write this one in C, which provides little to no safety net when you make an error. C can be painful for even the most experienced programmers, but practice helps.
Finally, this assignment will serve as a basis for assignments 3 and 4 in this course. Therefore it is highly recommended that you get it working perfectly now. Otherwise, you will have to spend time allotted for assignment 3 to get this assignment working first.
We will grade your assignment based on a combination of test results, code quality, and answers to discussion questions:
First, we'll run your code on a set of test disk images and check that your program gives the expected output. You'll have access to all of the images we'll use for grading and the correct output for each of them.
If this were a real archaeological dig, you would have been given only the disk image and a computer. But to make things easier, we've provided you with starter code which tries to read all the files on the image and outputs checksums of the inodes' and files' contents. We've also computed these checksums using a working implementation of the assignment. If your checksums match ours, then your code is correctly reading the data off the disk.
Next, we'll read over your answers to the discussion questions.
Finally, we'll read through your code and give you comments on style and overall quality. We'll look for places where you inappropriately break the layering of the file system or make other mistakes which don't necessarily cause your program to generate the wrong output. We'll also look for common C programming errors such as memory leaks and use of the heap where stack allocation is more appropriate. Finally, we'll check that your code is easy to read: It should be well-formatted, organized, and be easy to follow.
See our code quality guide for more details, but don't stress out too much about this part.
You have the option of either working on a virtual machine or on one of Stanford's cluster machines (corn or myth) for this assignment. Each setup has its own drawbacks. Cluster machines are often overflowing with MATLAB jobs; the VM takes a bit more setup, but allows you to work on the assignments offline. It's really up to you, just use the setup that you're most comfortable with.
Follow these steps to set up our VM on your personal computer. If you have problems, you can always use corn or myth.
Once you've completed the above steps, you'll need to get the starter code to start working on the assignment.
If you're on one of the cluster machines, simply copy the assignment files
from CS110's AFS directory into the directory in which you will be working on
the assignment: cp -r /usr/class/cs110/assn1/readfiles
. This should give you the test_me script and the
code directory.
If you're using the VM, follow the steps in the Getting the Starter Code section of the VM
setup guide. Make sure to also copy the test disks and the new version of the
test_me script to your VM, or you won't be able to test your
assignment (the VM setup guide explains how to do this).
To build the project, cd to the
code directory and run
make
To run the project, cd to the code directory and run
./diskimageaccess <options> <diskimagePath>
<diskimagePath> should be the path to one of the test disks
located in /usr/class/cs110/assn1/testdisks. We currently have three test disks:
basicDiskImage,
depthFileDiskImage, and
dirFnameSizeDiskImage.
diskimageaccess recognizes three flags as valid <options>:
-i-p-rFor example, to run both the inode and filename tests on the basic disk, you
could run
./diskimageaccess -ip /usr/class/cs110/assn1/testdisks/basicDiskImage
The expected output for running diskimageaccess on each disk
image X is stored in the X.gold file inside the
testdisks directory.
You can avoid having to type the full path to the testdisks
directory by creating a
symbolic link:
ln -s /usr/class/cs110/assn1/testdisks disks
Then the example above would become
./diskimageaccess -ip disks/basicDiskImage
AFS is somewhat flaky when used with the VM. If you are developing on the VM, we strongly recommend that you copy the test disks over to the VM's local file system. See the appropriate section in the VM setup guide for more information.
You can run the same test script we'll use when we grade your assignment by executing
perl test_me
from outside your code directory.
From the cluster machine, submit your code by running
/usr/class/cs110/bin/submit 1
from your readfile directory.
If you developed on the VM, see the section on submitting in the VM setup guide.
The files we provide you fall into three categories.
filsys.h,
ino.h, direntv6.h)diskimageaccess.c,
chksumfile.[ch], unixfilesystems.[ch],
Makefile)diskimg.[ch],
inode.[ch], file.[ch],
pathname.[ch])diskimg.[ch])inode.[ch])file.[ch])directory.[ch])pathname.[ch])You are welcome to modify any of the files we give you. When making changes in the
test harness, do so in a backward compatible way so that our testing scripts continue to
work. In other words, it's fine to add testing and debugging code, but make
sure that when we run your program with -i and -p,
its output has the same format as before your changes, so our automated tests
won't break.
Before you try to write any code, you'll want to familiarize yourself with the details of the Unix v6 file system by reading section 2.5 in the course textbook and looking over the supplement to that section in this document. You may also find it helpful to read and understand how the test script works.
The starter code contains a number of unfinished functions. We suggest you attack them in the following order:
inode_iget() and inode_indexlookup() in
inode.c.file_getblock() in file.c. After this step,
your code should pass the inode level functionality tests.directory_findname() in directory.c.pathname_lookup() in pathname.c. Refer to
Section 2.5.6 of the course text for this part. Afterward, all the pathname
tests should pass.Section 2.5 of your textbook contains most of what you need to know about the Unix v6 file system in order to do this assignment. The information below is supplementary to the textbook, so it assumes you've already read and understand the material in the textbook.
In the starter code we've provided C structs corresponding to
the file system's on-disk data structures. These structs have the same layout in
memory as the structures have on disk. They include:
struct filsys (filsys.h)struct inode (ino.h)struct direntv6 (direntv6.h)In addition, unixfilesystem.h contains a
description of the file system layout, including the sector address of the
superblock and of the start of the inode region.
Back in the 1970s, storage space — both on disk and in main memory —
was at a premium. As a result, the Unix v6 file system goes to lengths to reduce
the size of data it stores. You'll notice that many integer values in the structs
we provided are stored using only 16 bits, rather than today's more standard 32
or 64. (In our code we use the type int16_t from
stdint.h to get a 16-bit integer, but back in the '70s, the C
int type was 16 bits wide.)
In another space-saving move, the designers of the file system stored the
inode's size field as a 24-bit integer. There's no 24-bit integer type in C,
so we represent this value using two fields in the inode struct:
i_size1, which contains the least-significant 16 bits of the value,
and i_size0, which contains the most-significant 8 bits of the
value. We provide a function inode_getsize() in
inode.c which assembles these two fields into a normal C integer
for you.
Since there is no inode with a inumber of 0, the designers of the file
system decided not to waste the 32 bytes of disk space to store it. The first
inode in the first inode block has inumber 1; this inode corresponds to the
root directory for the file system. (See unixfilesystem.h for
details.)
Be careful not to assume that the first inode has an inumber of 0!
i_modeThe 16-bit integer i_mode in the inode struct isn't really a
number; rather, the individual bits of the field indicate various properties of
the inode. ino.h contains #defines which describe
what each bit means.
For instance, we say an inode is allocated if it points to an extant file.
The most-significant bit (i.e. bit 15) of i_mode indicates whether
the inode is allocated or not. So the C expression
(i_mode & IALLOC) == 0
is true if the inode is unallocated and false if it is allocated.
Similarly, bit 12 of i_mode indicates whether the file uses the
large file mapping scheme. So if
(i_mode & ILARG) != 0
then the inode's i_addr fields point at indirect and
doubly-indirect blocks rather than directly at the data blocks.
Bits 14 and 13 form a 2-bit wide field specifying the type of file. This
field is 0 for regular files and 2 (i.e. binary 10, or the constant
IFDIR) for directories. So the expression
(i_mode & IFMT) == IFDIR
is true if the inode is a directory, and false otherwise.
After successfully running your program on some of disk images, the dig's management was shocked to discover some of the files on the disks contained content that was illegal to possess today. They therefore requested that your program let them erase a specified file from the disk. We'll give you extra credit if you implement this.
A correct implementation of this portion of the assignment will:
accept a file to erase with the -r command-line
option.
fully erase the file from the disk so even government officials with forensic "undelete" software will not be able to retrieve the file from the disk image.
only erase normal files; directories and other special files shouldn't be erased.
leave the file system in a consistent state so that a file system consistency checker will not detect problems.
respect the file system's layering. You won't get full credit for a working remove implementation which breaks the layering abstraction willy-nilly.
We don't give you shells of the routines you need to implement for this part of the assignment, so you'll need to figure out what functions to add to which layers yourself.
Note that back in the 1970s, the unlink command removed directory entries by zeroing them out. It didn't compact the directory after removing an entry.
To properly delete a file, you will need to modify the freelist. The file
system tracks free space on the disk by maintaining a linked list of structures
containing two fields: an array of block numbers and the number of valid block
numbers in the array. The first element of the free list is kept in the
superblock (fields s_nfree and s_free).
s_free is an array of block numbers containing up to 100 free
blocks on disk. Field s_nfree is the number of valid block numbers
in the s_free array.
The 0th element of the array in a free space element (e.g.
s_free[0] in the superblock) acts as a link pointer in that it stores
the block number of a block that contains the next free space list element. In other
words, the first 16-bit word of that block is the number of free block numbers
listed in the block, and the next up to 100 16-bit words contain the block
numbers of free blocks. Like the other list elements, the first block number in
the array points to the next element in the list. A value of zero for a link
pointer meaning there are no more elements in the list.
Setting s_ninode to zero will cause the file system to scan
through the inodes to find the unallocated ones, so in some sense, the free
list only acts as a hint. But you still need to properly update it in order to
get full credit.
One of dig team's members would like your help pulling off a practical joke:
He'd like to be able to add files to a disk image. We'll give you
extra credit if you enhance the diskimageaccess program to accept the following
options: -a <existingFile> -n <pathname>
where <existingFile> is a an absolute or relative path to a
file on your computer's file system and <pathname> is the
absolute path into the disk image where the file should be placed.
For example:
-a myfile -n /dirA/dirB/newname Should place
myfile under /dirA/dirB/ as newname.
As in the other extra credit, your implementation should follow system's layering and maintain a consistent file system.
Please answer the following questions in a plain-text file called
discussion.txt in your submit directory.
We might judge a system to have have good modularity if we can change it without touching unnecessarily many modules or reworking the existing modularization boundaries.
Of course, when you're first desigining a system, you usually don't know what kinds of changes you'll need to make to it in the future. But in the case of the Unix v6 file system, we do know what enhancements were made over time.
For each of the enhancements to the Unix v6 file system listed below, state which layer or layers would need to be changed in order to implement the feature, and describe how each layer would need to be changed.
Use the layer names and descriptions from Table 2.2 in the textbook.
It's particularly painful to make a non-backwards-compatible change to the API that programs use to communicate with the operating system, since then potentially every application written for that operating system needs to be updated to reflect the change.
Consider the API for interacting with the Unix v6 file system, listed in Table 2.1 of the textbook. Which of the enhancements from part (a) would require modifying the API?
Hint: consider how the Unix commands cp,
ls, and rm are implemented.
To get more information on system calls described in Table 2.1 in the textbook,
you can look them up in the manuals on any Linux/Unix machine command prompt.
For example, to get more information about the open system call,
you can do:
man 2 open
(Note that the 2 represents the manual "level" -- this is to differentiate
between the open system call and, for example, a utility program called "open").
The manual pages will give you information about the types of the formal parameters.
To quit out of a man page, press q.
Every naming scheme has three components: a name space, a name mapping algorithm, and a universe of values. For each of the following names from the Unix v6 file system, describe the three components for that naming scheme:
File systems often print an error message when a call to
diskimg_readsector fails due to of a problem with the disk media
(e.g. a bad sector on disk). These error messages are often rather cryptic to
users since diskimg_readsector has access only to low-level
identifiers such as i-numbers and block numbers, rather than to user-facing
identifiers such as filenames.
For example, a message might say "Error reading disk #1, i-number 42, block 2", when the user would have preferred "File /etc/passwd unreadable due to disk error."
For each call to diskimg_readsector in your
code, describe what changes you'd need to make to your program in
order to print a user-friendly error message if the call to
diskimg_readsector fails.
Did you develop on the VM or a cluster machine?
Optional: Did you run into any problems in your development environment? (For instance, did AFS act up within your VM?) If so, please describe.
Optional: Do you have any feedback for us about this assignment?
Optional: How is the class going for you so far? Anything else you'd like to tell us about the class?
Did you complete any of the extra credit for this assignment? If so, please briefly describe your work.