CS 124 / LINGUIST 180 - Winter 2009
Homework 5: Exercises and Search Engine Comparison
Due: Feb 17 before the start of class
This assignment takes the form of a problem set. Many of the questions are
similar to the ones that might show up on the final. Answer them fully but
succinctly. Please turn in your problem sets IN CLASS, ON PAPER, on the day
they're due. Also remember to staple your homework and write your name on
it. (Writing your name on every page is highly advised.) Late submissions may be
slipped under the door of Jenny/David's office in Gates 232 or handed in during
office hours.
Questions
- IIR Exercise 1.1
- IIR Exercise 1.4
- IIR Exercise 2.1
- IIR Exercise 2.3
- Describe how Pang, Lee, and Vaithyanathan (2002) dealt with negation
words like "not", "no", or "didn't" in their sentiment analysis.
Make up two review sentences that show different kinds of
negation that their method would fail on. Propose an improved
algorithm.
- Using Levenshtein distance, is "ride" closer to "shard"
or to "bier"? (Give the two Levenshtein distances.)
- Finish the computations in Figure 6.10 of Speech and Language Processing, and
tell us the final Viterbi value v3(end) (also called v3(qf)). Also, as I
mentioned in class, note that there is a typo in the computation of v2(2), so
you'll have to fix that along the way. Show your work.
- Describe some differences between the snippets (the "keywords in context") that
Google and Yahoo! return. For example, you might comment on how "approximate
query matching" (in which the word in the document may not be exactly in the
same form as in the query) relates to the snippets produced by the two. Give
two queries and the results provided by each search engine. Explain what you think
the search engines are doing differently.
- Some search engines seem to do more complex morphological analysis (stemming)
than others. Give example queries (and results for the queries) that
demonstrate how differences in the approach to morphology/stemming result in
different pages returned by Google and Yahoo!. Give at least one example
where Google's approach seems better, and at least one where Yahoo!'s approach
seems better. Explain your reasoning.
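
For the Levenshtein question above, it may help to check hand-computed tables against a small program. The following is a minimal sketch, not an official solution: the function name `levenshtein` and the `sub_cost` parameter are illustrative choices, and it assumes insertions and deletions each cost 1. Substitution cost is left configurable because some textbook presentations charge 2 for a substitution; confirm which convention the course expects before relying on it.

```python
def levenshtein(source, target, sub_cost=1):
    """Minimum edit distance between two strings.

    Assumes insertion and deletion each cost 1. The substitution cost
    defaults to 1 (an assumption); pass sub_cost=2 for the variant in
    which a substitution counts as a deletion plus an insertion.
    """
    # prev[j] holds the distance between the first i-1 characters of
    # source and the first j characters of target.
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]  # turning i source characters into "" takes i deletions
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,                                # delete s
                curr[j - 1] + 1,                            # insert t
                prev[j - 1] + (0 if s == t else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(levenshtein("intention", "execution", sub_cost=2))
```

Only two rows of the table are kept in memory at a time, which is enough to recover the distance but not the alignment; when showing your work on paper, fill in the full table.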