STANFORD CS 124 / LINGUIST 180 - Winter 2009
Homework 5: Exercises and Search Engine Comparison
Due: Feb 17 before the start of class

This assignment takes the form of a problem set. Many of the questions are similar to ones that might show up on the final. Answer them fully but succinctly. Please turn in your problem sets IN CLASS, ON PAPER on the day they're due. Also remember to staple your homework and write your name on it. (Writing your name on every page is highly advised.) Late submissions may be slipped under the door of Jenny/David's office in Gates 232 or handed in during office hours.

Questions

  1. IIR Exercise 1.1

  2. IIR Exercise 1.4

  3. IIR Exercise 2.1

  4. IIR Exercise 2.3

  5. Describe how Pang, Lee, and Vaithyanathan (2002) dealt with negation words like "not", "no", or "didn't" in their sentiment analysis. Make up two review sentences that show different kinds of negation that their method would fail on. Propose an improved algorithm.

  6. Using Levenshtein distance, is "ride" closer to "shard" or to "bier"? (Give the two Levenshtein distances.)

  7. Finish the computations in Figure 6.10 of Speech and Language Processing, and tell us the final Viterbi value v3(end) (also called v3(qf)). Also, as I mentioned in class, note that there is a typo in the computation of v2(2), so you'll have to fix that along the way. Show your work.

  8. Describe some differences between the snippets (the "keywords in context") that Google and Yahoo! return. For example, you might comment on how "approximate query matching" (in which the word in the document may not be exactly in the same form as in the query) relates to the snippets produced by the two. Give two queries and the results provided by each search engine. Explain what you think the search engines are doing differently.

  9. Some search engines seem to do more complex morphological analysis (stemming) than others. Give example queries (and results for the queries) that demonstrate how differences in the approach to morphology/stemming result in different pages returned by Google and Yahoo!. Give at least one example where Google's approach seems better, and at least one where Yahoo!'s approach seems better. Explain your reasoning.
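
For question 6, a hand computation can be checked against a small dynamic-programming routine. The following is a minimal sketch under stated assumptions, not a reference solution: the function name levenshtein and the sub_cost parameter are illustrative, and insertions and deletions are assumed to cost 1. Some presentations charge 2 for a substitution instead of 1, so say which convention you are using when you report your distances.

```python
def levenshtein(source: str, target: str, sub_cost: int = 1) -> int:
    """Minimum edit cost to turn source into target.

    Assumption: insertions and deletions cost 1; substitutions cost
    sub_cost (1 by default, 2 in some presentations).
    """
    # prev[j] is the distance between the previous prefix of source
    # and the first j characters of target.
    prev = list(range(len(target) + 1))
    for i, s_ch in enumerate(source, start=1):
        curr = [i]  # turning i source characters into "" takes i deletions
        for j, t_ch in enumerate(target, start=1):
            cost = 0 if s_ch == t_ch else sub_cost
            curr.append(min(
                prev[j] + 1,         # delete s_ch
                curr[j - 1] + 1,     # insert t_ch
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Word pair chosen for illustration only; it is not from the assignment.
    print(levenshtein("kitten", "sitting"))              # unit substitution cost
    print(levenshtein("kitten", "sitting", sub_cost=2))  # substitution costs 2
```

Keeping only two rows of the table is enough to get the distance; keep the full table instead if you also want to recover the alignment for showing your work.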