Stanford University
Computer Science 249B: Winter 2008 - 2009: Answers

CS249B Midterm 2009


Feb 12th, 2009. Closed-book exam.

You should complete the exam yourself without assistance from others. It is a violation of the honor code to receive assistance from others or to copy material without proper attribution.

Feel free to use point form. Each question can be answered satisfactorily in a page of writing or less. I am primarily interested in the key points responsive to each question. Each question is weighted equally.

Class: the formatting and wording of question 4 is slightly different than it was on the exam. See cs249bmidterm09 for the exact wording of the questions.


  1. Software Engineering: Ms. Scrabel, a "diversity consultant", has advised your boss that you need to move away from the rigid Cheriton view of following a strict software development process and hiring disciplined "common culture" engineers who fit into it. With the benefit of her liberal arts BA, she claims that diversity brings different points of view, that giving people liberty in how they approach problems leads to greater creativity, and that there are many ways of solving problems. Describe the key points you would make to counter her position.

    Answer:

    1. A high degree of consistency is necessary in software, given that the details really matter and the complexity can otherwise be overwhelming, especially as software evolves and scales.
    2. No "solution" to a problem is being banned; it just has to pass review by the team as fitting the discipline/conventions or as warranting changing them.
    3. Strict discipline/conventions at the level that the team can agree on frees one's time and mind to solve higher-level problems, leading to more creative solutions and greater productivity. This contrasts with struggling with weirdo code written by Giovanni, who left 2 weeks ago.

      Grading: Full credit for each part if the reason was very close to the above. Partial credit was given to other good reasons.

  2. Audit:
    1. Give an example of a bug that would not likely be caught by conventional functional testing and would be too expensive to check as part of inline asserts, yet would be caught with audit.
    2. A procedure signature does not completely specify the situations in which the procedure can effectively do something useful. Some programmers argue for asserts as a specification of preconditions to handle this. The alternative is to have the procedure check for such conditions and just return if there is nothing it can do. Describe the trade-offs between asserts and the "just return" approach.

    Answer:

    1. One example is: There is a collection of File objects that each (are supposed to) have a pointer back to their corresponding Dir parent object. A bug occurs because some code is setting the parent backptr on its parent, rather than on a child object. Functional testing would likely not catch this unless some test depended on the corrupted parent pointer in a way that caused a visible failure. Asserts would likely not catch it because it would be expensive to check all children's backpointers in each function, and each object checking its own backpointer would only catch this if the parent was called after the corruption. Audit would catch it because audit runs at the end, and checking backptrs is an established part of audit. Another example could be based on the invariant allocated + free = total, if one has to count each of the first two across collections.

      A specific bug was required (4 points). Then the reason why it would not be caught by functional testing and the reason why it would be caught by audit were worth 3 points each.
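A hypothetical sketch of such an audit pass, assuming minimal File and Dir classes (names illustrative), might look like:

```cpp
#include <vector>

struct Dir;  // forward declaration: each File points back at its parent Dir

struct File {
    Dir *parent;                        // backpointer the buggy code corrupts
    explicit File(Dir *p) : parent(p) {}
};

struct Dir {
    std::vector<File *> files;
    File *add() { File *f = new File(this); files.push_back(f); return f; }
};

// Audit pass: run at a quiescent point, walk every collection and check
// every child's backpointer - an invariant too expensive to verify inline
// in each operation.
bool auditBackptrs(const std::vector<Dir *> &dirs) {
    for (Dir *d : dirs)
        for (File *f : d->files)
            if (f->parent != d) return false;   // corruption detected
    return true;
}
```

Because the audit walks every Dir's children at a quiescent point, it catches the corrupted backpointer even though no inline assert or functional test happened to exercise it.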

    2. Just-return is attractive because: i) it arises "naturally" in some cases, such as an optimize routine where it cannot be clear to the caller whether there is something to do or not, ii) it avoids polluting the caller with tests that might compromise the encapsulation of the procedure's behavior/capability, iii) unlike with asserts, the code does not change between development and production. The big negative is that one can have many extraneous calls that "do nothing". However, profiling should reveal this along with other performance problems, and if it does not, it is not really a performance problem.

      We gave 4 points for the first reason and 3 for each of the next two. Full credit required discussing both asserts and just-return, and we required at least one negative and one positive. Multiple other reasons for asserts were accepted.
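The two conventions might be contrasted with a minimal sketch (the Buffer type and its methods are invented for illustration):

```cpp
#include <cassert>

// Invented Buffer type, purely to contrast the two conventions.
struct Buffer {
    int len = 0;
    // Assert style: the caller must ensure the precondition; a violation
    // fails fast in a development build, but with NDEBUG the check compiles
    // away, so development and production behavior differ.
    void truncateAssert(int n) {
        assert(n >= 0 && n <= len);
        len = n;
    }
    // Just-return style: the procedure checks for itself and quietly does
    // nothing when there is nothing useful to do; callers need no
    // precondition tests, and dev and production behave the same.
    void truncateJustReturn(int n) {
        if (n < 0 || n > len) return;   // nothing it can do
        len = n;
    }
};
```

With NDEBUG defined, the assert version compiles away while the just-return version behaves identically in both builds; extraneous do-nothing calls, if they mattered, would show up in profiling.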

  3. Shared Descriptions:
    1. Mr. Knowitall claims that what Cheriton advocates for large value types, namely implementing them as a hidden pointer to a shared description, is just one way of dealing with large values. Describe the argument for this being the best way, perhaps the only way.
    2. PEEVE (pointer equality equals value equality) is challenging to achieve. Describe how to do this, and the costs of doing so.
    Answer:
    1. If a large value is passed by value, it is an expensive copy and can exceed the stack space, causing failure. Putting it on the heap means, if not shared, an allocation and copy on each pass, introducing the possibility of copy-constructor failure on out-of-memory and incurring the copy cost in any case. Moreover, the total memory required for an application is hard to determine if it depends on the calling pattern.

      We gave 4 points for the first reason and 3 for each of the next two. The reasons were the three listed above (exceeding the stack, heap meaning many copies / the copy constructor failing, and predictable memory).
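A minimal sketch of the hidden-pointer-to-shared-description technique, with invented names and only the copy path shown:

```cpp
#include <string>
#include <utility>

// Invented BigValue type: clients see a value type, but the representation
// is a hidden pointer to a shared, reference-counted description.
class BigValue {
    struct Desc {                     // shared description, hidden from clients
        int refCount = 1;
        std::string payload;          // stand-in for a large representation
        explicit Desc(std::string p) : payload(std::move(p)) {}
    };
    Desc *desc_;
public:
    explicit BigValue(std::string p) : desc_(new Desc(std::move(p))) {}
    BigValue(const BigValue &other) : desc_(other.desc_) { ++desc_->refCount; }
    ~BigValue() { if (--desc_->refCount == 0) delete desc_; }
    BigValue &operator=(const BigValue &) = delete;  // kept minimal
    const std::string &payload() const { return desc_->payload; }
    bool sharesWith(const BigValue &o) const { return desc_ == o.desc_; }
};
```

Pass-by-value now copies one pointer and bumps a reference count, so it cannot fail on out-of-memory, stack use is constant, and total memory no longer depends on the calling pattern.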

    2. As described in class, represent the value as fixed-sized blocks in a canonical hierarchical structure, so that it is easy to do duplicate suppression on blocks and the structure is unique for a given value. The key costs are: i) checking for an existing block on every block "creation", ii) restructuring a value as a result of modification, and iii) access to values through the hierarchical structure, compared to direct memory access with a conventional linear representation in memory.

      We gave 4 points for the first reason and 3 for each of the next two. Reasons were the ones listed above. We required some mention of the fixed-size blocks or a hierarchical structure such as a directory. The term "database" alone did not suffice.
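A flat sketch of the duplicate suppression done on every block "creation" (a real PEEVE implementation would use fixed-size blocks in a canonical hierarchy; this illustrates only the intern-on-create cost, with invented names):

```cpp
#include <string>
#include <unordered_map>

// Flat illustration of intern-on-create; a real implementation would build
// values from fixed-size blocks in a canonical hierarchical structure.
class BlockStore {
    std::unordered_map<std::string, const std::string *> table_;
public:
    // Cost i): check for an existing block on every block "creation".
    const std::string *intern(const std::string &bytes) {
        auto it = table_.find(bytes);
        if (it != table_.end()) return it->second;        // duplicate suppressed
        const std::string *blk = new std::string(bytes);  // canonical block, never freed
        table_.emplace(bytes, blk);
        return blk;
    }
};
```

Since equal byte strings always return the same canonical block pointer, pointer equality equals value equality for the interned blocks.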

  4. Concurrency: Consider a sophisticated security monitoring system that is processing multiple streams of video from security cameras in a building, matching objects in the video to door card reader indications of personnel entry and tracking their path through the building, maintaining a log of activities and generating alarms according to departure from profiles. Assuming you are to implement this on a cluster of shared memory multiprocessors, sketch the design of a parallel implementation of this system reflecting the principles of the course, focusing on providing cycles and fault-tolerance and scalability.

    Answer: Each separate module runs as a separate process in the cluster, with some replicated per input, e.g. a video monitoring process per camera. There is also a system database process (with a backup process) maintaining the quasi-persistent state of the system. Input from the video cameras may be wired directly to each video monitoring process, to minimize internal system communication cost. The video processing per camera can be separated into two or more modules that run in parallel, either in the same process or in separate processes - the former to minimize communication overhead. Similarly, card reader processes update the system database with personnel entries and attempted entries - low-bandwidth updates. Separate tracking processes match the personnel entries with the corresponding objects identified by video. Separate diagnosis processes match profiles against the personnel in the building, indicating alarm conditions to the system database on detecting anomalies. Thus, the system can scale with cameras, card readers and personnel by adding corresponding processes, yet achieves fault-tolerance using the system database, allowing fast restart. The video processing process is the only one with local parallelism, given the cycles required and the bandwidth between low-level and high-level video processing.

    4 points were specifically awarded for discussing each of the camera processes, card reader processes, system database, tracking/alarm/logging, and fast restart/fault tolerance, as above. If any of these were missing, additional points were awarded for extra detail in other sections. Other full-credit answers included a discussion of the agent structure.
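The decomposition above might be sketched in miniature, with threads standing in for cluster processes and all names invented:

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Invented SystemDatabase: the central quasi-persistent state, one process
// (with a backup) in the real cluster design; lock-protected here.
class SystemDatabase {
    std::mutex m_;
    int entries_ = 0;
public:
    void recordEntry() { std::lock_guard<std::mutex> g(m_); ++entries_; }
    int entries() { std::lock_guard<std::mutex> g(m_); return entries_; }
};

// Simulate n card-reader processes each reporting one personnel entry;
// the system scales by adding more such workers per input device.
int runReaders(int n) {
    SystemDatabase db;
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&db] { db.recordEntry(); });
    for (auto &t : workers) t.join();
    return db.entries();
}
```

Each worker stands in for a card reader process posting low-bandwidth updates to the central database; restart recovers from the database state rather than from the workers (compile with -pthread).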

  5. Collections and Iterators:
    1. Mr. Knowitall argues that non-invasive collections are far superior because: i) they handle objects in other modules that you cannot modify (unlike invasive collections), ii) objects can be in arbitrarily many collections, iii) memory is cheap. Describe the counter-argument to each of these points. Answer: i) you normally need extra information about a remote object, so you end up with a locally defined entry object pointing at the remote one, which allows an invasive collection. ii) in theory yes, but in practice an object is only in one or two collections, and one often needs some modification per collection anyway, e.g. PacketDesc and packet queues. iii) yes, but a key challenge is how a system scales; you need to know how much memory is required, and collections times objects grows quadratically in some cases.

      We gave 4 points for the first reason and 3 for next 2 reasons. Reasons were the ones listed above.
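Counter-argument i) might be sketched as follows (names invented): a local entry object carries the invasive link plus the extra per-collection state and points at the unmodifiable remote object:

```cpp
// RemoteObject stands in for an object defined in another module that you
// cannot modify.
struct RemoteObject { int id; };

// Locally defined entry: carries the invasive link plus the extra
// per-collection information, and points at the remote object.
struct Entry {
    RemoteObject *obj;
    Entry *next = nullptr;   // invasive next pointer lives here
    int localState = 0;      // the extra information you normally need anyway
    explicit Entry(RemoteObject *o) : obj(o) {}
};

// Invasive singly linked list of entries: no per-insert allocation.
struct EntryList {
    Entry *head = nullptr;
    void push(Entry *e) { e->next = head; head = e; }
    int length() const {
        int n = 0;
        for (Entry *e = head; e; e = e->next) ++n;
        return n;
    }
};
```

The wrapper is needed for the extra local information regardless, so the invasive link comes essentially for free.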

    2. Cheriton's version approach to detecting that a collection has changed during iteration seems pretty questionable at first glance, given that it does not directly guarantee the application a consistent view of the collection and may cause livelock if the application retries until there is no version change during iteration. Describe why this is the best of the feasible solutions in many situations. Answer: You cannot lock a collection during iteration, given that it may lock indefinitely and may deadlock the current process if a function it calls performs a modification during the iteration. It can be too expensive to make a copy of the collection, as with snapshot semantics, and that too requires either locking or retries unless you use a fancy sparse-array structure, as per the String in Chapter 12. Throwing an exception to allow the application to decide may impose excessive overhead when applications do not need to care. And many iterations are not sensitive to minor insertions and deletions during the iteration.

      We gave 4 points for the first reason and 3 for each of the next two. Reasons were the ones listed above (cannot lock during iteration, too expensive to make a copy, throwing an exception is excessive, and many iterations are not sensitive to minor changes).
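A minimal sketch of the version approach (names invented): the collection bumps a version counter on each modification, and the iteration detects the change rather than locking or copying:

```cpp
#include <cstddef>
#include <vector>

// Invented collection: bumps a version on every modification.
class VersionedColl {
    std::vector<int> items_;
    unsigned version_ = 0;
public:
    void insert(int v) { items_.push_back(v); ++version_; }
    unsigned version() const { return version_; }
    std::size_t size() const { return items_.size(); }
    int at(std::size_t i) const { return items_[i]; }
};

// Iterate without locking or copying; return false if the collection
// changed mid-iteration, leaving the caller to retry or to tolerate it.
bool sumAll(const VersionedColl &c, int &out) {
    unsigned v0 = c.version();
    out = 0;
    for (std::size_t i = 0; i < c.size(); ++i) out += c.at(i);
    return c.version() == v0;
}
```

The caller decides what a version change means: retry, or simply accept the result when the iteration is not sensitive to minor insertions and deletions.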

    The End