1. Look at the source data in Excel
2. Start SAS Enterprise Guide (EG)
3. Name the process flow Island Reviews
4. Add a note to the project describing the source file
5. Import the location sheet (call the dataset islands)
6. Make a new dataset called cntLocation. It will contain the island chains that have more than 3 islands
a. Add the location variable
b. Add the location variable again
c. Set the second copy of the variable to be summarized with the count
d. Run the query
e. Modify the query so that the filter on the summarized data is set to require more than 3 islands in the chain.
7. Import the rating sheet, which is called Sheet3 in Excel (call the dataset rating)
8. Note which variable holds the island name in the islands dataset and in the rating dataset
9. Make a dataset called rating2, based on rating, that changes the spelling of St. Maarten to St. Maarten/St. Martin. The tweaked island name variable should be called island
10. Add the island chain location to the rating2 file (call the dataset ratingChain). Only keep the ratings if you know the island chain.
a. In the resulting dataset, rename the island chain variable to chain
11. Use an EXCEPT clause in SQL to find which islands are in the rating2 dataset but not in the ratingChain dataset. The resulting dataset should be called extraRating2
12. Take the SQL from step 10 and tweak it to run off of islands and to use the left-aligned version of the island names. Have it produce a dataset called matched.
13. Add the average rating of beaches, scenery, and shopping to the matched dataset. Call the resulting dataset analysis
14. Import the Airport sheet into a dataset called airport
15. Select the island names and airport codes for islands in chains with more than 3 islands. The dataset should be called big chain airports
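
The SQL logic behind steps 6, 10, 11, and 13 can be sketched outside EG as a rough check of what each query should return. This is only an illustration using Python's built-in sqlite3 module, not the code EG generates. The rows are made-up toy data, the column name island_name is a hypothetical stand-in for whatever the islands sheet actually uses, and step 13 is read here as a per-row mean of the three ratings, computed from ratingChain rather than matched to keep the sketch short.

```python
import sqlite3

# Toy stand-ins for the imported sheets. All rows, and the column name
# island_name, are hypothetical and not taken from the workbook.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE islands (island_name TEXT, location TEXT);
INSERT INTO islands VALUES
  ('A1', 'ChainA'), ('A2', 'ChainA'), ('A3', 'ChainA'), ('A4', 'ChainA'),
  ('B1', 'ChainB'), ('B2', 'ChainB');
CREATE TABLE rating2 (island TEXT, beaches REAL, scenery REAL, shopping REAL);
INSERT INTO rating2 VALUES
  ('A1', 4, 5, 3), ('B1', 2, 3, 4), ('Z9', 5, 5, 5);
""")

# Step 6: count islands per chain, then filter on the summarized count
# (HAVING filters after grouping, unlike WHERE).
cnt_location = con.execute("""
    SELECT location, COUNT(location) AS n
    FROM islands
    GROUP BY location
    HAVING COUNT(location) > 3
""").fetchall()

# Step 10: an inner join keeps only ratings whose island chain is known,
# and the chain variable is renamed to chain.
con.execute("""
    CREATE TABLE ratingChain AS
    SELECT r.island, r.beaches, r.scenery, r.shopping, i.location AS chain
    FROM rating2 AS r
    INNER JOIN islands AS i ON r.island = i.island_name
""")

# Step 11: EXCEPT returns islands in rating2 that are missing from ratingChain.
extra_rating2 = con.execute("""
    SELECT island FROM rating2
    EXCEPT
    SELECT island FROM ratingChain
""").fetchall()

# Step 13 (one reading): per-row mean of the three ratings.
analysis = con.execute("""
    SELECT island, chain, (beaches + scenery + shopping) / 3.0 AS avg_rating
    FROM ratingChain
    ORDER BY island
""").fetchall()

print(cnt_location)
print(extra_rating2)
print(analysis)
```

With this toy data, only ChainA passes the count filter, and Z9 is the one rated island with no known chain, so it is the only row the EXCEPT query returns.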