MS&E 130/231: Information Systems, Autumn 2005-06

Instructor: Ashish Goel

Handout 13 – practice problems for the final

 

A) Final exam from last year

Closed book. All questions worth ten points. Total time: 120 minutes

Please be warned – I am planning to make this year’s exam a little harder.

 

  1. What are the technical design features of KaZaA that make it (at least till now) legal whereas much of Napster’s functionality was deemed illegal?

 

  1. A wants to send a file to B. A and B do not trust each other, and would like the file to pass through an intermediary C who will maintain a record of the transaction. A and B do not want C to be able to recover the file  (since the information in the file is private). However, in case of future litigation, either A or B must be able to give the original file to C and get C to verify that this was indeed the file that was transferred from A to B. Suggest a simple protocol for implementing this scheme. Ignore network intruders and the like – assume a fully secure channel between any pair.

 

  1. Prove that public key cryptography is at least as powerful as shared key cryptography.

 

  1. Consider the following relational schema representing the collection of CDs which have been recorded by a collection of artists:

 

CompactDiscs(artistID, artistName, title, year);

Artists(artistID, artistName, artistAddress);

                 

Here you may assume that artistID uniquely identifies an artist. Explain why the above schema is not a well-designed one. How would you fix the problem? What are the appropriate primary and foreign key constraints for your new schema?

 

  1. A sink component of the World Wide Web is a set of web pages such that none of the web pages in the set points to any of the web pages outside this set. Explain why sink components are a problem for the naďve version of PageRank discussed in class. Explain how PageRank addresses this problem.

 

  1. Which of the three is the largest (in total size) out of

 

a)        A corpus (i.e. collection) of documents,

b)       A set of indices built on each document in the corpus,

c)        A set of reverse indices built on each word that occurs somewhere in the corpus.

 

Which is the smallest? Explain your answer. We are not interested in small differences in size, but in big “orders-of-magnitude” differences.

 

B) Some additional problems

1.        Suppose p = 7, q = 5, and n = pq = 35

a.        What is f(n)?

b.        Suppose e = 11. By trial and error, find d in the range 1…f(n)-1 such that ed = 1 (mod f(n)).

c.        Suppose the RSA public key is (n,e) and the RS private key is (n.d). Apply the public key to the number 6.

2.        Suppose you have a basic AIMD (additive increase, multiplicative decrease) implementation of TCP – each time all the packets in a window get successfully acked, the window size increase by one. When there is a packet drop, the window size is halved. If a TCP connection suffers a packet loss every 10th packet, what is the rate at which it transmits? Assume a fixed RTT of 100ms, a packet size of 1KB, and assume that no ACKs get lost.

3.        You are given a database with the following four tables:

Class(instructorSSN, year, quarter, classNumber);

EnrolledIn(studentSSN, enrolledYear, enrolledQuarter, enrolledClassNumber);

Instructors(name, SSN);

Students(name, SSN);

The first table stores information for all the classes in the history of a University. The second has enrollment data for all the classes. The third stores the name and the SSN of every past and present instructor, and the last does the same for students.

    1. What are the appropriate key constraints for each of these tables?
    2. Write an SQL query (forget about efficiency etc.) that lists the names of all the students who took a class with someone named “Ashish Goel”. No name should be listed twice.

a.        Write an SQL query that lists the name of the instructor with SSN 123456789. What index should you build to efficiently support this query?

4.        You are given N + M web pages, N of which are maintained by men and the other M by women. Every web page has a hyperlink to every web page maintained by someone of the opposite sex. What is the PageRank of a web page maintained by a man? Assume that the reset probability is 0.

  1. Consider our running example database:

                  movie(title, year, studioName);
                  movieStar(name, address, gender, birthDate);
                  starsIn(starName, movieTitle, movieYear);


Write SQL statements for the following two queries:

a.        Who were the male stars in the movie titled “Serendipity”?

b.        Who were the stars (names only) in the movies produced by MGM in 1995? Each name should be output at most once.

 

  1. How would you write 5(a) using nested SELECTs to guarantee that the query is executed efficiently? Why would this help? Also, what index, if any, should you build to further speed up queries which are the same as 5(a) but with different movie names?