GETTING STARTED

You can download a 14-day evaluation copy of SPSS for Windows from
www.spss.com.  Or you can use copies installed on the Leland systems
computers (the elaines) or the Linguistics department student computer
cluster.  This tutorial will assume that you've already installed SPSS.

1) Start SPSS.

2) Load the data file.  

  a) Go to File | Read Text Data.  Choose the right file name.
 
  b) You'll enter the Text Import Wizard.  In step 1 of 6, it asks
    "Does your text file match a predefined format?"  Choose
    "No". Then press "next".

  c) In step 2, where it asks "How are your variables arranged?"
    choose "Delimited".  Where it asks "Are variable names included at
    the top of your file?" choose "No".  Then press "next".

  d) In steps 3 and 4, leave everything as is and press
     "next".

  e) In step 5, you get to name your variables.  In turn, click on
     each column, go to the box that says "Variable name", and enter a
     name for the variable.  In our case, I used the following names:

       Column 1: Deletion
       Column 2: POS (part of speech)
       Column 3: Environment
       Column 4: class (for social class)

     After you name a variable, you can also choose the data format --
     basically the relevant choices will be whether they are numbers
     or strings.  In our case, you should make sure that "class" has
     data format "Numeric" so that SPSS interprets its values (1
     through 4) as numbers.

     Hit "next".

  f) In step 6 you have the option of saving this set of choices so
     that you can quickly load another file with exactly the same data
     format.  We'll ignore this for now.  Then hit "finish".
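
   To make it clearer what the wizard is doing in step 2, here is a
   minimal sketch of the same import done outside SPSS.  It is plain
   Python, not SPSS syntax.  The sample rows, the tab delimiter, and
   the value codes are all made up for illustration and are not taken
   from the tutorial's data file; read_tokens is a hypothetical helper
   name.

```python
import csv
import io

# Hypothetical sample: four tab-separated columns and no header row.
# These rows are invented; they are not the tutorial's data.
SAMPLE_TEXT = "1\tnoun\tconsonant\t1\n0\tverb\tvowel\t4\n1\tnoun\tpause\t2\n"

# The variable names entered in step 5 of the wizard.
COLUMN_NAMES = ["Deletion", "POS", "Environment", "class"]


def read_tokens(handle, delimiter="\t"):
    """Read delimited rows that have no header and attach the column names."""
    rows = []
    for fields in csv.reader(handle, delimiter=delimiter):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(COLUMN_NAMES, fields))
        row["class"] = int(row["class"])  # treat class as numeric (1 through 4)
        rows.append(row)
    return rows


tokens = read_tokens(io.StringIO(SAMPLE_TEXT))
print(tokens[0])
```

   Converting "class" to an integer plays the same role as choosing
   the "Numeric" data format in the wizard; the other three columns
   stay as strings.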


3) You should now have a spreadsheet-type view of the data file, with
   the first four columns being "Deletion", "POS", "Environment",
   "class". You can do a lot of manipulations (editing cells, cutting
   and pasting cells or groups of cells) directly on your data in
   Excel-type fashion.  Take this opportunity to save your data to
   disk.  You can then re-open the file directly from the 

     File | Open | Data

   menu selection, without going through the whole "Read Text Data"
   process again.

4) You're now ready to do all sorts of analysis on the dataset.  Most
   of the important options are under the "Analyze" menu in Windows
   (it's the "Statistics" menu in UNIX).  For example, click on
   
     Analyze | Descriptive Statistics | Frequencies

   and then move all the variables from the left side to the right
   side by clicking on them one by one and pressing the button with
   the right-pointing arrow.  Then click "OK".  An "Output" window
   will pop up with tables for frequencies of each value for each
   variable.  From these tables you can see things like:

     o 57.6% of the tokens have /-s/ deletion

     o few of the tokens are verbs (1.3%)

     o the majority of the tokens precede a consonant (63.3%)
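
   A frequency table is just a count of each value plus its share of
   all tokens.  The following Python sketch shows that computation on
   a few invented tokens; it is not SPSS syntax, the data are
   hypothetical, and frequency_table is an illustrative name.

```python
from collections import Counter

# Hypothetical tokens; the values are invented, not the tutorial's data.
tokens = [
    {"Deletion": "1", "POS": "noun", "Environment": "consonant", "class": 1},
    {"Deletion": "0", "POS": "verb", "Environment": "vowel", "class": 4},
    {"Deletion": "1", "POS": "noun", "Environment": "pause", "class": 2},
    {"Deletion": "1", "POS": "noun", "Environment": "consonant", "class": 1},
]


def frequency_table(rows, variable):
    """Return {value: (count, percent of all rows)} for one variable."""
    counts = Counter(row[variable] for row in rows)
    total = sum(counts.values())
    return {value: (n, 100.0 * n / total) for value, n in counts.items()}


for variable in ("Deletion", "POS", "Environment", "class"):
    print(variable)
    for value, (n, pct) in sorted(frequency_table(tokens, variable).items()):
        print(f"  {value!s:<10} {n:>3} {pct:5.1f}%")
```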

5) Since we're interested in the effects of the second-, third-, and
   fourth-column variables (part of speech, following environment, and
   social class) on the first variable (/-s/ deletion), we need to
   look at the frequency of deletion for different values of the other
   variables.  From the menu, select

     Analyze | Descriptive Statistics | Crosstabs

   and move Deletion to the box under "Columns" and move POS to the
   box under "Rows". (I think you get the most readable layout for
   this data if you choose to list the independent variable values
   down the side of the table, and the dependent variable values
   across the top, especially because the dependent variable has only
   two values.)

   By default, the table will include only absolute counts, but it's
   nicer to see percentages too, so click on "Cells" and then
   check "Row" under "Percentages" in the window that comes up,
   then click Continue.

   Finally, click OK, and in the Output window you will get a table of
   final /-s/ deletion frequency as it varies with the word's part of
   speech.  You can see that deletion is most frequent (70.3%) for
   nouns, least frequent (33.6%) for verbs.

   If you follow all the steps above but use Environment or class
   instead of POS, you get similar tables for these independent
   variables.  You can see that "pause" is the following environment
   most favoring deletion, and that deletion is much less common
   (28.5%) in the highest social class than in all the other social
   classes (all > 50%).  Personally, the biggest surprise to me so far
   is that deletion is less common for a following-consonant
   environment than for a following-vowel environment, but hey, I'm
   not a phonologist.  Also, we can look to see whether this effect is
   maybe an artifact...
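
   The crosstab with row percentages in step 5 can be sketched as
   follows.  This is a Python illustration of the computation, not
   what SPSS runs internally; the tokens are invented and
   crosstab_row_percent is a hypothetical name.

```python
from collections import Counter, defaultdict

# Hypothetical tokens; the values are invented, not the tutorial's data.
tokens = [
    {"Deletion": "1", "POS": "noun"},
    {"Deletion": "1", "POS": "noun"},
    {"Deletion": "1", "POS": "noun"},
    {"Deletion": "0", "POS": "noun"},
    {"Deletion": "1", "POS": "verb"},
    {"Deletion": "0", "POS": "verb"},
]


def crosstab_row_percent(rows, row_var, col_var):
    """Count col_var values within each row_var value, with row percentages."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[row_var]][row[col_var]] += 1
    table = {}
    for row_value, cells in counts.items():
        row_total = sum(cells.values())
        # Each percentage is relative to its own row, so every row sums to 100.
        table[row_value] = {
            col_value: (n, 100.0 * n / row_total)
            for col_value, n in cells.items()
        }
    return table


table = crosstab_row_percent(tokens, "POS", "Deletion")
for pos in sorted(table):
    for deletion in sorted(table[pos]):
        n, pct = table[pos][deletion]
        print(f"{pos:<6} Deletion={deletion}  {n:>3}  {pct:5.1f}%")
```

   Because the percentages are computed within each row, they answer
   the question "of the tokens with this part of speech, what share
   show deletion?", which is the comparison step 5 is after.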

6) We can also create nested tables, which helps us look at the data
   in a bit more detail.  Try this: choose Crosstabs again, and then
   put Environment in the "Rows" box, then put POS in the "Layer" box.
   When you click OK, you will get a nested table.  For every part of
   speech, you can conveniently compare the percentages of deletion
   for each environment.  You can see that for plural markers on
   adjectives and nouns, and for monomorphemic words, following
   consonants seem to discourage deletion, but for verbs and
   determiners, following environment doesn't seem to have much of an
   effect.  In the next few lectures we'll learn how to test more
   rigorously whether differing environment makes a difference for
   each part of speech, and vice versa.
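
   A nested (layered) table is the same row-percentage computation
   repeated inside each value of the layer variable.  A minimal Python
   sketch, again on invented tokens and with a hypothetical function
   name (layered_crosstab); it is not SPSS syntax:

```python
from collections import Counter

# Hypothetical tokens; the values are invented, not the tutorial's data.
tokens = [
    {"Deletion": "1", "POS": "noun", "Environment": "consonant"},
    {"Deletion": "0", "POS": "noun", "Environment": "consonant"},
    {"Deletion": "1", "POS": "noun", "Environment": "vowel"},
    {"Deletion": "0", "POS": "verb", "Environment": "consonant"},
    {"Deletion": "0", "POS": "verb", "Environment": "vowel"},
    {"Deletion": "1", "POS": "verb", "Environment": "vowel"},
]


def layered_crosstab(rows, layer_var, row_var, col_var):
    """Row percentages of col_var within each (layer, row) combination."""
    cell_counts = Counter((r[layer_var], r[row_var], r[col_var]) for r in rows)
    row_totals = Counter((r[layer_var], r[row_var]) for r in rows)
    return {
        (layer, row, col): (n, 100.0 * n / row_totals[(layer, row)])
        for (layer, row, col), n in cell_counts.items()
    }


nested = layered_crosstab(tokens, "POS", "Environment", "Deletion")
for layer, row, col in sorted(nested):
    n, pct = nested[(layer, row, col)]
    print(f"{layer:<6} {row:<10} Deletion={col}  {n:>3}  {pct:5.1f}%")
```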