Linguist 278: Programming for Linguists
Stanford Linguistics, Fall 2009
Christopher Potts

Assignment 1 - Command-line basics (navigation, pipes, filters)

Distributed 2009-09-21
Due 2009-09-27 by 17:00 PDT

======================================================================

1 In the beginning was the command line

Download "In the beginning was the command line", by Neal Stephenson:

  http://www.cryptonomicon.com/beginning.html

Search for the string

  THE HOLE HAWG OF OPERATING SYSTEMS

and read at least that section and up through the end of the OS SHOCK
section, so that you have the proper Unix mindset going forward.

======================================================================

2 Course required installs

Head to

  http://www.stanford.edu/class/linguist278/resources.html

and run through the required installs. Please come to office hours
after class on Thursday if you encounter any problems that you can't
surmount yourself.

Note: NLTK's Python link is currently not working. Just head to

  http://www.python.org/download/

to get Python. Get version 2.6.2.

======================================================================

3 Getting Choose Your Own Career in Linguistics

Download the archive at

  https://www.stanford.edu/class/linguist278/restricted/data/choose_your_career_in_linguistics.tgz

Navigate to the directory to which you have downloaded the file
choose_your_career_in_linguistics.tgz and issue the following commands
to unpack the archive:

  gunzip choose_your_career_in_linguistics.tgz
  tar xvf choose_your_career_in_linguistics.tar

Comments:

* gunzip creates the tar file (it can also be used to unpack zip
  files)

* tar unpacks to a directory
    x = extract from an archive
    f = read from the file specified
    v = verbose output, i.e., a listing of the contents

* If you have trouble with these commands, then just double-click the
  file (Windows users might do better with
  https://www.stanford.edu/class/linguist278/restricted/data/choose_your_career_in_linguistics.zip ).
  However, try to get the commands to work, so that you stay in the
  command-line mindset.

What you have after unpacking is a text-only version of Trey Jones's
Choose Your Own Career in Linguistics, a choose-your-own-adventure
website (http://specgram.com/choose/). You might want to play around
with the Web version before proceeding, to get a feel for what it's
all about.

======================================================================

4 Following a career path

Navigate to the root of the choose_your_career_in_linguistics
directory and list the contents with ls. The file to start with is
the_start. View that with fmt or cat.

At the bottom of this and subsequent files, there are directions for
how to navigate. If the instruction says "Down into D", then you move
down to daughter directory D. If the instruction says, "Up one level",
then you move up to the parent directory. And so forth.

When you arrive at a new directory, type ls to see what's there.

Navigate around in this manner until you have completed your
professional journey.

--------------------------------------------------
OPTIONAL EXTENSION

Run through another career path, but without ever leaving the root
directory. You can do this with just ls and cat or fmt.
--------------------------------------------------

======================================================================

5 Recording a career path

In the root directory of the game, create a new, empty file called
my_career_path.txt.

Now run through another career path, but this time, whenever you view
a file, append its name and then its contents to my_career_path.txt.
When you are finished, you should have a file containing all the steps
you took.

--------------------------------------------------
OPTIONAL EXTENSION

Work on making the file readable and informative --- think about
including dates, times, file sizes, and other things that Unix
commands can give you. Try:

  man fmt
  man date
  man cal
  man ls
  man cat
  man finger

for ideas.
--------------------------------------------------

SUBMIT: my_career_path.txt

======================================================================

6 Hiding your career details

Let's suppose you are shy about your professional choices, so you want
to keep my_career_path.txt out of sight. Rename your file to

  .my_career_path.txt

that is, the same name as before, but with a dot in front of it.

Now type ls. Your file should be invisible. Use ls with the "show
hidden" option to reveal even these hidden files.

At this point, make your file visible again.

SUBMIT: the sequence of commands you used to hide your file, view its
name, and make it visible again.

======================================================================

7 Clean-up

a. Create a directory called "tmp" as a sister directory to
   choose_your_career_in_linguistics.

   SUBMIT: the command for creating this directory

b. Move my_career_path.txt to that directory.

   SUBMIT: the command for moving this file

======================================================================

8 Advanced ls

Use "man ls" to learn more about the options for ls. Then, move to the
root directory of choose_your_career_in_linguistics. Now formulate
commands for the following:

a. List all directory names and files in the whole game.

   SUBMIT: your command

b. List all directory names and files in the whole game, but with a /
   marking directory names.

   SUBMIT: your command

c. List all and only the files (no directory names).

   SUBMIT: your command

d. List all and only the files (no directory names), sorted so that
   the largest file is listed first.

   SUBMIT: your command

======================================================================

9 Getting the word lists

Download the word-lists archive:

  https://www.stanford.edu/class/linguist278/restricted/data/word_lists.tgz

(Change tgz to zip if you're on Windows and have issues.)

This is a collection of lists of words, one per line.
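
The two-step unpacking from section 3 should carry over to this
archive. Here is a minimal sketch, assuming the archive behaves like
the earlier one. To stay self-contained, it first builds a throwaway
archive in a temporary directory; the names demo_lists and fruits.txt
are made up for illustration and are not part of the course data.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a tiny stand-in archive (hypothetical names, not course files).
mkdir demo_lists
printf 'apple\nbanana\n' > demo_lists/fruits.txt
tar cf demo_lists.tar demo_lists        # c = create an archive
gzip demo_lists.tar                     # produces demo_lists.tar.gz
mv demo_lists.tar.gz demo_lists.tgz     # .tgz is shorthand for .tar.gz
rm -r demo_lists

# The two steps from section 3, applied to the stand-in archive.
gunzip demo_lists.tgz                   # leaves demo_lists.tar
tar xvf demo_lists.tar                  # x = extract, v = verbose, f = file

# Check the result: the list should have one word per line.
wc -l demo_lists/fruits.txt
```

For the real download, substitute word_lists.tgz for demo_lists.tgz in
the last three commands. Both GNU and BSD tar can also do the two
steps at once with tar xzvf, where z means "run the archive through
gzip".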
======================================================================

10 Basic sort

Use the unadorned sort command to sort the file words-english.txt.
What is unintuitive about the results?

SUBMIT: A description of why the results seem off.

Now explore the man page for sort, then formulate a command using it
that sorts words_english.txt more intuitively. If you don't succeed in
getting what you are after, try to articulate what the problem is,
with reference to the man page.

SUBMIT: your command, possibly with commentary

======================================================================

11 Pipe and sort

Formulate a command that prints out the filename of just the most
recently modified file in the directory word-lists.

SUBMIT: your command

======================================================================

12 Pipe and uniq

The following files are lists of English first names, one per line:

  word_list_mobywords_given_names_english_female.flat.txt
  word_list_mobywords_given_names_english_male.flat.txt

Use sort, uniq, cat, and the pipe to determine whether either of these
lists contains any duplicates. Assume case-insensitivity, so that,
e.g., "Sam" and "sam" would count as the same name.

SUBMIT: your commands and a description of the output

======================================================================

13 Overlap?

How many names are on both the male and female lists?

SUBMIT: the number, and the commands you used to find out

======================================================================

14 Formatting Stephenson

Navigate to your copy of Neal Stephenson's 'In the beginning was the
command line'. Then:

a. Print out just the word count for this file.

   SUBMIT: your command

b. Print out just the number of lines in this file.

c. Calculate the average line length. (Use whatever you like for this.
   You could even dive into Python: type python, hit return, then type
   out your numerical statement, using / for division, and adding .0
   to the end of your numbers so that Python returns a real/float.)

d. Observe that the lines are very long.

e. Write out a command that prints just the first 6 lines in a
   readable format.

   SUBMIT: your command

f. Write out a command that prints just the first 6 lines in a
   readable format with line numbers. This should be an extension of
   your command for (e). The printed numbers will run much higher than
   6 due to the way fmt works.

   SUBMIT: your command

--------------------------------------------------
OPTIONAL EXTENSION

Create a file called command-fmt.txt that contains a version of
command.txt with the following tweaks:

a. The title and author's name are centered at the top.

b. Everything else is nicely formatted using fmt.

Notes:

* Whatever command you write will be tailored to the specific spacing
  of this file.

* This can be done with head, tail, fmt, and >>. (There might be other
  ways using more advanced utilities too. Just get the job done!)
--------------------------------------------------

======================================================================

15 curlmirror

You should have all your advisors' papers! You shouldn't, though,
waste any of your time clicking links. So:

a. Download curlmirror.pl:

     http://curl.haxx.se/programs/curlmirror.txt

   Change the .txt extension to .pl. This is a Perl script. You don't
   need to worry about its innards, though, because it has a nice
   interface. When you have it, type

     perl curlmirror.pl --help

   to see the options it provides.

b. Use curlmirror to download your advisors' websites to directories
   named in their honor. Take care to ensure that you have the
   download-size limitations set high enough to accommodate even very
   prolific researchers.

======================================================================
======================================================================