DATA HOMEWORK 3

  1. Skim this for background: Mann, W.C. and S.A. Thompson. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text 8(3): 243-281.

  2. Look in this directory of the RST-annotated corpus from the LDC:
    /afs/ir/data/linguistic-data/mnt/mnt19/rst_discourse_treebank/data/RSTtrees-WSJ-main-1.0/TEST/wsj_0655.out.rst
    
    and look at the labeling of the following sentence (we've given the discourse segmentation):
    1. Mr. Baker's assistant for inter-American affairs, Bernard Aronson,
    2. while maintaining
    3. that the Sandinistas had also broken the cease-fire,
    4. acknowledged:
    5. "It's never very clear who starts what."
    

    Draw the RST tree for this sentence; do it on paper if that's easier. It has been argued that this particular example is better modeled not as a tree but as a directed acyclic graph (DAG). The argument concerns a relation that needs to be captured between segments 2-3 and segments 4-5, and the relation between segments 1 and 4. See if you can reconstruct this argument, and see what you think of it. After you have turned in the homework (not before), you may want to check the relevant paper on this DAG/tree question, which is

    Wolf, Florian and Edward Gibson. 2005. Representing Discourse Coherence:
    A Corpus-Based Study. Computational Linguistics 31(2).
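
As a reminder of what separates the two representations, here is a minimal Python sketch. It assumes a structure is given as a list of (parent, child) edges; the node names are hypothetical placeholders and are not an analysis of the sentence above. In a tree, no node has more than one parent; a DAG drops that restriction.

```python
from collections import defaultdict


def max_parents(edges):
    """Return the largest number of distinct parents that any node has."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return max((len(p) for p in parents.values()), default=0)


def is_tree_shaped(edges):
    """True if no node has more than one parent (a necessary condition for a tree)."""
    return max_parents(edges) <= 1


# Hypothetical toy graphs with placeholder node names (not the RST analysis).
tree_edges = [("root", "a"), ("root", "b"), ("a", "c")]
dag_edges = tree_edges + [("b", "c")]  # "c" now has two parents: "a" and "b"
```

Note that the check covers only the single-parent condition; it does not verify connectedness or acyclicity.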
    

  3. Now look at the rest of the LDC RST-annotated corpus and write a paragraph on something interesting you notice in the data. Again, the data is here:

    /afs/ir/data/linguistic-data/RST_discourse_treebank
    
    We recommend you look in this directory
    data/RSTtrees-WSJ-main-1.0/TRAINING
    
    and look at the .lisp.name files in the various .out.rst directories.
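
If you would rather collect the files with a script than browse by hand, the following is a minimal Python sketch. The function name is our own, and it assumes only what is stated above: that the files of interest have names ending in .lisp.name and sit somewhere under the directory you pass in. It does not parse the file contents.

```python
import os


def find_lisp_name_files(root):
    """Return the sorted paths of all files under root whose names end in .lisp.name."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".lisp.name"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Call it with the full path to the TRAINING directory (the corpus path above joined with data/RSTtrees-WSJ-main-1.0/TRAINING), then open a few of the returned files to read the annotations.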