Linguist 278: Programming for Linguists
Stanford Linguistics, Fall 2009
Christopher Potts

Assignment 2 - Using egrep
Distributed 2009-09-28
Due 2009-10-04 by 17:00 PDT

======================================================================
1 ENGLISH WORDS

a. Find all and only the words in words-english.txt that begin with
   exactly two consonants, end with exactly two consonants, and have
   exactly three vowels (a, e, i, o, u) in between.

   SUBMIT: Your command

b. Find all and only the words in words-english.txt that end with a
   string of three or more consonants, excluding y (to make it more
   interesting). Have egrep print just the total number of lines that
   match.

   SUBMIT: Your command

c. Same as (b), but with words-german.txt.

   SUBMIT: Your command

d. What is the longest sequence of vowels (a, e, i, o, u) in
   words-english.txt? Write a series of egrep commands to find out.

   SUBMIT: Your commands

e. Find all and only the words in the word list that contain exactly
   two NON-ADJACENT z characters. (Count both Z and z as z
   characters.) Number your output and store it in a file.

   SUBMIT: Your command

======================================================================
2 UNRULY TEXT

Download this small collection of literature:

https://www.stanford.edu/class/linguist278/restricted/data/literature.tgz

(If you're on Windows and have trouble with the .tgz archive, change
tgz to zip in the URL.)

a. Write a regular expression that matches all vowel-less words of four
   or more letters (where the vowels are just a, e, i, o, and u). Then
   use egrep to search through all the files in the literature
   directory, printing only the matching words, without filenames,
   line numbers, etc. Store your results in a file called
   vowelless.txt.

   SUBMIT: Your command

b. Same as (a), but now print just the count for each file, so that
   you get something like:

     file1:count1
     file2:count2
     ...

   Append this to vowelless.txt, preceded by the string
   "Individual file counts".

   SUBMIT: Your command

c. Same as (a), but now print just the total number of matches across
   all files, so that the output is a single number. (This might take
   a very different form than your solution to (b).) Append this to
   vowelless.txt, preceded by the string "Total".

======================================================================
3 FILE-SYSTEM INFO

a. Go into the word-lists directory and use ls and egrep to print just
   the days on which the files were modified, sorted from newest to
   oldest.

   SUBMIT: Your command

b. Same as (a), but now print just the set of dates (that is, the list
   from (a) with duplicates removed).

   SUBMIT: Your command

c. In the word-lists directory, print just the file names that contain
   the strings "english" or "words" (or both).

======================================================================
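Tip: the egrep options these exercises lean on (-c, -i, -o, -n, -h) can be tried on a tiny scratch file before you run them on the real word lists. The sketch below is illustrative only: the files toy-words.txt and toy-words-2.txt and their contents are made up, and it calls grep -E, which is equivalent to egrep. It does not solve any of the exercises; it just shows what each option does to the output.

```shell
#!/bin/sh
# Hypothetical scratch files (not part of the course data),
# written to the current directory.
printf 'Zebra\npuzzle\nrhythm\ncat\n' > toy-words.txt
printf 'fizz\nbuzz\n' > toy-words-2.txt

# -c prints only the number of matching lines.
grep -E -c 'zz' toy-words.txt        # prints 1 (only "puzzle" matches)

# -i ignores case, so both Z and z match.
grep -E -i -c 'z' toy-words.txt      # prints 2 ("Zebra" and "puzzle")

# -o prints only the matched text, one match per line, not the whole line.
grep -E -o 'zz' toy-words.txt        # prints zz

# -n prefixes each matching line with its line number.
grep -E -n 'z' toy-words.txt         # prints 2:puzzle

# With several files, -c reports one file:count line per file.
grep -E -c 'zz' toy-words.txt toy-words-2.txt
# prints toy-words.txt:1 and then toy-words-2.txt:2

# -h suppresses the filename prefix grep adds when searching several files.
grep -E -h -o 'zz' toy-words.txt toy-words-2.txt   # prints zz three times
```

Delete the two scratch files when you are done experimenting.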