Assignment 1

Logistics

Due Date

Monday, April 20 at 11:59 PM

Policies

Late Policy: All assignments and projects are due at 11:59pm on the due date. Each assignment and project may be turned in up to 24 hours late for a 10% penalty and up to 48 hours late for a 30% penalty. No assignments or projects will be accepted more than 48 hours late. Students have four free late days they may use to turn in work late with no penalty: four 24-hour periods, no pro-rating. This late policy is enforced without exception.

Honor Code: Under the Honor Code at Stanford, you are expected to submit your own original work for assignments, projects, and exams. On many occasions when working on assignments or projects (but never exams!) it is useful to ask others -- the instructor, the TAs, or other students -- for hints, or to talk generally about aspects of the assignment. Such activity is both acceptable and encouraged, but you must indicate on all submitted work any assistance that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding, writing up, and being able to explain all work that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.

Datasets

This assignment makes use of the following two datasets. These datasets are also available on the Datasets page.

  • Titanic
    • Download link: Titanic.csv
    • Description: Data on passengers of the RMS Titanic. Entries include the name, age, class, fare, gender, and whether or not the passenger survived
    • Notes:
      • A blank entry for age means that the age is unknown
      • Fare can have more than two digits because money was not base-10 at that time Titanic Fare Data
  • World Cup
    • Download links: Players.csv, Teams.csv
    • Description: 2010 World Cup data including last name, team, position, minutes played, and game statistics for each player (Players.csv) as well as world ranking, games played in tournaments, and game statistics for each team (Teams.csv)
    • Notes:
      • Statistics, including yellowCards and redCards, are for the entire tournament (excluding final game).
      • Team ranking is the world ranking going into the tournament, so it may not be 1-32 even though there are only 32 teams

Setup Instructions

Please use Google Sheets to complete the problems. If you use Excel or another spreadsheet program we may not be able to help you with special features, and you should explain any nonstandard features that you use.

  1. Download the two datasets by clicking on the links above.
  2. Go to the Google Drive website. You may use your personal Gmail account to log in, or you can use your SUNet ID by entering your Stanford email address -- Google should automatically redirect you to WebLogin.
  3. To upload the CSV files, drag them from your desktop or file browser to the main Google Drive page.
  4. To open a newly uploaded CSV file, right-click the CSV, select "Open with" and choose "Google Sheets". After you open your CSV file in Google Sheets once, it will save a new version that opens automatically in Sheets from then on.
  5. You're now ready to solve the problems -- have fun!

If you want more info on how to use Google Sheets, check out the Google Spreadsheets references page.

Submission Instructions

You must create one PDF file containing all of your answers for this assignment, then submit it on Gradescope.

Formatting your submission

  • Your answers will consist of text for Problems 1-10 and a combination of text and graphics for Problems 11-19.
  • Your final submission should be a PDF file. You can use any software you want to create your submission. You can use the following template for Google Docs.
    • If you'd like to use the Google Docs template locally (e.g., with Microsoft Word), open the template and click on File > Download as > Microsoft Word (.docx).
    • If you'd like to use the template through Google Docs, open the template and click on File > Make a copy. In the dialog box, rename the document and ensure "Share it with the same people" is not checked. The template should now be copied to and editable on your Google Drive account.
  • We expect each question to be on its own page and properly tagged on Gradescope, mistagged pages or submissions on the same page will result in a 5 point deduction.
  • To add graphics to your document, you can either take a screenshot of the chart or copy the chart directly from Google Sheets. To do so, click on your chart, click the three dots at the top right corner of the chart, and select "Copy chart" from the menu.

Submitting to Gradescope

  1. Save your final document as a PDF and check it over. Make sure that each answer is on its own page. If you mistag questions or answer multiple on the same page, you will receive a 4-point deduction.
  2. Log into the Gradescope website (gradescope.com) using your Stanford email address. If you haven't used Gradescope before, make an account using your Stanford email and full name. Gradescope auto-syncs with Canvas, which syncs from Axess. So, as along as you are enrolled in the course, you should find CS102 in your Gradescope dashboard.
  3. Click on the CS 102 class and click on Assignment 1.
  4. Upload your PDF and tag the pages corresponding to each question.
  5. You're done! You may submit as many times as you like before the submission deadline, and we will use your latest submission both for grading and for the late policy.

For more detailed instructions on submitting homework, take a look at the Gradescope FAQ.



Part 1: Spreadsheet Data Analysis (2 points per problem)

For each of these problems, submit an answer obtained by performing spreadsheet operations on the provided data and a description of how you manipulated the spreadsheet(s) to obtain your answer. We do not require that you write down the exact formula, just describe the procedure and functions used to come to your answer.

World Cup Data

Problem 1. Which team has the highest ratio of goalsFor/goalsAgainst?

Problem 2. What is the average number of passes made by defenders? By forwards?

Problem 3. What player on a team with "ia" in the team name played less than 200 minutes and made more than 100 passes?

Problem 4. Which team has the highest average number of passes per minute played (and what is that average)?

Problem 5. How many players on a team ranked under 10 (i.e. ranks 1 to 9) played more than 350 minutes?

Titanic Data

Problem 6. What characteristics are shared by all passengers whose fare is 0?

Problem 7. How many married women over age 50 embarked in Cherbourg? (Married women are denoted by "Mrs.")

Problem 8. Which embarkation city had the highest-paying passengers on average?

Problem 9. What is the most common last name among passengers? What is the average number of passengers per last name?

Problem 10. What's the survival rate for passengers in the three different classes, i.e., what fraction of passengers in each class survived? Find the answer using spreadsheet functions only - don't perform any arithmetic by hand!


Part 2: Data Visualization (2 points per problem)

For each of these problems, submit a graphic: a bar graph, pie chart, scatterplot, or other visualization. You may either copy and paste visualizations from Google Sheets into your document, or do a screenshot - either is fine. Also include a description of what the graphic is showing and how you used the data to create it.

World Cup Data

Problem 11. Create a bar chart showing the average number of minutes played by players in each position.

Problem 12. Create a stacked bar chart showing teams that played more than 4 games, with their total number of games divided into wins, draws, and losses.

Problem 13. Create a pie chart showing the relative percentage of teams with 0, 1, and 2 red cards.

Problem 14. Create a scatterplot of players showing passes (x-axis) versus minutes (y-axis). (If you know anything about the World Cup you might think about why there are lines of dots.)

Problem 15. Create a map of countries colored light to dark blue based on how many goals their team made ("goalsFor").

Titanic Data

Problem 16. Create a bar chart showing the average fare paid by passengers in each class. The three bars should be labeled "first", "second", "third".

Problem 17. Create a stacked bar chart showing the number of passengers in each class, divided into male and female (three bars). Then reverse roles and show the number of passengers of each gender, divided into class (two bars).

Problem 18. Create a pie chart showing the relative number of male survivors, male non-survivors, female survivors, and female non-survivors (four slices).

Problem 19. Let "youth" denote passengers whose age is under 18, "adult" denote passengers age 18-59, and "senior" denote passengers whose age is 60 and above. Create a pie chart with four slices showing the relative number of youth, adult, senior, and those whose age is unknown. Hint: consider using function =ifs().


FAQ


Why does my MATCH function return unexpected value?
TLDR: If the column in match() is not sorted, you need add a third parameter 0 to the match function.
Long answer: The MATCH function takes an argument match_type. The default value is 1, which means MATCH will find the largest value that is less than or equal to lookup_value and the values in the lookup_array argument must be placed in ascending order. Is that what we want? Check out the documentation here for more details.


How to avoid the blank line in my pivot table?
Instead of selecting all the cells using the top left button, use Cmd+A (Mac OS) or Ctrl + A to select all the data cell for the pivot table. Make sure your cursor is in the data cell when you select them.