ED161 Winer 2000 Start-up Problem Solutions

---------
Note: Alex drafted these solutions, and his method of obtaining the data for these two schools is elegant. A more basic method is to access the data through a browser or the command-line leland display, then cut and paste the data sets to your desktop. Some key things to note:

1. The use of the "read" command, which you will also see in the various course examples. If you had only cut and pasted the data for the two schools, you would be reading in much smaller data sets.
2. The use of the Manip...Subset Worksheet menu item to select out these schools is quite elegant (and something I hadn't thought of).

drr
---------

These problems were done in Minitab 12.1. After enabling the command language from the 'Editor' menu, I either typed the code below directly or generated it through the dialog boxes; the output follows each command. The commands are included so you can reproduce the output and see how simple the command language is.

After FTPing the data files to my computer, I used the File...Other Files...Special text menu options to import the data into Minitab. The program translates this into the command language below. Some may find it easier just to type the commands at the prompt.

MTB > Read "C:\My Documents\ED161\Hsb1.dat" c1-c5;
SUBC>   Decimal ".".
Entering data from file: C:\My Documents\ED161\Hsb1.dat
7185 rows read.
MTB > Read "C:\My Documents\ED161\Hsb2.dat" c1-c6;
SUBC>   Decimal ".".
Entering data from file: C:\My Documents\ED161\Hsb2.dat
160 rows read.

Problem 1)

First, look at Hsb2.dat to determine which school is the first public school (coded 0) and which is the first Catholic school (coded 1). These turn out to be the schools with ID# 1224 and 1308, respectively. There are many ways to extract just the subsets of data you need. I used the handy Manip...Subset Worksheet menu item, which produced a separate worksheet for each school.
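For readers working outside Minitab, the same two steps (find the first public and first Catholic school in the school-level file, then keep only those schools' student rows) can be sketched in Python. This is a minimal sketch under assumptions not stated in the original: each .dat file is whitespace-delimited numbers, the school ID sits in the first column of both files, and the public/Catholic indicator sits in a hypothetical column `SECTOR_COL` of the school file. Check the real column layout before relying on it.

```python
from __future__ import annotations

from typing import Sequence

# Hypothetical column positions (assumptions): adjust to the real file layout.
ID_COL = 0       # school ID column in both Hsb1.dat and Hsb2.dat
SECTOR_COL = 1   # 0 = public, 1 = Catholic, in the school-level file

Row = Sequence[float]


def load_dat(path: str) -> list[list[float]]:
    """Read a whitespace-delimited numeric text file, one record per line."""
    with open(path) as f:
        return [[float(x) for x in line.split()] for line in f if line.strip()]


def first_school_ids(schools: list[Row]) -> tuple[int, int]:
    """Return the IDs of the first public (0) and first Catholic (1) school."""
    public = next(int(r[ID_COL]) for r in schools if r[SECTOR_COL] == 0)
    catholic = next(int(r[ID_COL]) for r in schools if r[SECTOR_COL] == 1)
    return public, catholic


def subset_by_id(students: list[Row], school_id: int) -> list[Row]:
    """Keep only one school's student rows.

    Similar in spirit to Minitab's Subset with a Where "ID=..." condition.
    """
    return [r for r in students if int(r[ID_COL]) == school_id]
```

A typical call would be `public_id, catholic_id = first_school_ids(load_dat("Hsb2.dat"))`; if the assumed column positions are right, it should return the IDs reported above, 1224 and 1308.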
In Minitab, the Subset Worksheet dialog generates this session code for the public school 1224 worksheet:

Current worksheet: hsbstudent.MTW

MTB > Subset;
SUBC>   Where "ID=1224";
SUBC>   Name "Subset of hsbstudent.MTW";
SUBC>   NoMatrices;
SUBC>   NoConstants;
SUBC>   Include.
Subset worksheet 'Subset of hsbstudent.MTW' created.

Now we are ready to do the problems. With the public school subset as the active worksheet, type the following command to get a stem-and-leaf display of math achievement:

MTB > Stem-and-Leaf 'mathach'.

Character Stem-and-Leaf Display

Stem-and-leaf of mathach  N = 47
Leaf Unit = 1.0

    1   -0  2
    2   -0  1
    7    0  00001
   13    0  222233
   18    0  44455
   23    0  66666
  (5)    0  88999
   19    1  01
   17    1  33
   15    1  4
   14    1  6667
   10    1  99
    8    2  0000111
    1    2  3

You could also do a boxplot with these commands:

MTB > GStd.
* NOTE * Character graphs are obsolete.
MTB > BoxPlot 'mathach'.

[Character boxplot of mathach, school 1224; axis from 0.0 to 20.0]

Now we do the same for the Catholic school.

Current worksheet: Subset of hsbstudent.MTW[W3]

MTB > Stem-and-Leaf 'mathach'.

Character Stem-and-Leaf Display

Stem-and-leaf of mathach  N = 20
Leaf Unit = 1.0

    1    0  2
    1    0
    2    0  6
    3    0  9
    4    1  0
    8    1  3333
   10    1  55
   10    1  667
    7    1
    7    2  11
    5    2  2233
    1    2  4

MTB > GStd.
MTB > BoxPlot 'mathach'.

[Character boxplot of mathach, school 1308; axis from 0.0 to 25.0]

What can we say about these plots? First, school 1224 is slightly positively skewed, with a median of about 8. School 1308 has a median of about 16 and appears less variable than school 1224. Both distributions are somewhat bimodal.

Problem 2)

To create a numerical descriptive summary of our two subsets, the 'describe' command is handy. For school 1224, we get this:

MTB > Describe 'mathach'.
Descriptive Statistics

Variable     N     Mean   Median   TrMean    StDev  SE Mean
mathach     47     9.72     8.30     9.67     7.59     1.11

Variable   Minimum  Maximum       Q1       Q3
mathach      -2.83    23.58     3.15    16.41

And for school 1308, we get this:

MTB > Describe 'mathach'.

Descriptive Statistics

Variable     N     Mean   Median   TrMean    StDev  SE Mean
mathach     20    16.26    16.02    16.53     6.11     1.37

Variable   Minimum  Maximum       Q1       Q3
mathach       2.51    24.99    13.36    22.17

You can also get this with the Stat...Basic Stats...Display Descriptive Statistics menu option.

What do these summaries tell us? School 1224 has lower measures of central tendency (mean 9.72 vs. 16.26, median 8.30 vs. 16.02) and slightly more variability (standard deviation 7.59 vs. 6.11) than school 1308. Interestingly, some of the students in 1224 had negative math achievement scores (possible typos [or odd coding]??). Because school 1224 has over twice as many observations as school 1308, its standard error of the mean is smaller (1.11 vs. 1.37), even though its standard deviation is larger.

Problem 3)

For this problem, we want the data for the two schools in the same worksheet. This can be done by cutting and pasting, or by creating another subset from the big data set.

----------------
Note: you can use the "stack" command (or the menu) to create one column containing mathach and another containing the school indicator:

MTB > Stack C5 C15 c25;
SUBC>   Subscripts c26.
------------------

We want to do a two-sample t-test and get a 95% confidence interval for the difference between the two schools' mean math achievement. This can be done with the following commands:

MTB > TwoT 95.0 'mathach' 'ID';
SUBC>   Alternative 0.

Two Sample T-Test and Confidence Interval

Two sample T for mathach

ID       N    Mean   StDev  SE Mean
1224    47    9.72    7.59      1.1
1308    20   16.26    6.11      1.4

95% CI for mu (1224) - mu (1308): ( -10.1, -3.0)
T-Test mu (1224) = mu (1308) (vs not =): T = -3.72  P = 0.0006  DF = 44

Problem 4)

Again, separate data sets are useful for this problem. For each school, we want a simple plot of mathach against SES, a sample correlation coefficient, and the value of beta, the regression coefficient. For school 1224:

MTB > GStd.
MTB > Plot 'mathach' 'ses';
SUBC>   Symbol 'x'.

[Character scatterplot of mathach against ses, school 1224; ses axis from -1.50 to 1.00]

MTB > Correlation 'ses' 'mathach';
SUBC>   NoPValues.

Correlations (Pearson)

Correlation of ses and mathach = 0.207

MTB > Regress 'mathach' 1 'ses';
SUBC>   Constant;
SUBC>   Brief 2.

Regression Analysis

The regression equation is
mathach = 10.8 + 2.51 ses

Predictor       Coef      StDev        T        P
Constant      10.805      1.337     8.08    0.000
ses            2.509      1.765     1.42    0.162

S = 7.510       R-Sq = 4.3%     R-Sq(adj) = 2.2%

Analysis of Variance

Source            DF         SS         MS        F        P
Regression         1     113.90     113.90     2.02    0.162
Residual Error    45    2538.01      56.40
Total             46    2651.92

Unusual Observations
Obs      ses  mathach      Fit  StDev Fit  Residual  St Resid
 24     0.97     2.06    13.24       2.71    -11.18    -1.60 X

X denotes an observation whose X value gives it large influence.

For school 1308, we get the following:

Current worksheet: Subset of hsbstudent.MTW[W3]

MTB > GStd.
* NOTE * Character graphs are obsolete.
MTB > Plot 'mathach' 'ses';
SUBC>   Symbol 'x'.

[Character scatterplot of mathach against ses, school 1308; ses axis from -0.40 to 1.20]

MTB > GPro.
MTB > Correlation 'ses' 'mathach';
SUBC>   NoPValues.

Correlations (Pearson)

Correlation of ses and mathach = 0.010

MTB > Regress 'mathach' 1 'ses';
SUBC>   Constant;
SUBC>   Brief 2.
Regression Analysis

The regression equation is
mathach = 16.2 + 0.13 ses

Predictor       Coef      StDev        T        P
Constant      16.189      2.118     7.64    0.000
ses            0.126      3.003     0.04    0.967

S = 6.281       R-Sq = 0.0%     R-Sq(adj) = 0.0%

Analysis of Variance

Source            DF         SS         MS        F        P
Regression         1       0.07       0.07     0.00    0.967
Residual Error    18     710.23      39.46
Total             19     710.30

Unusual Observations
Obs      ses  mathach      Fit  StDev Fit  Residual  St Resid
 13     0.10     2.51    16.20       1.90    -13.69    -2.29R
 20    -0.57    21.12    16.12       3.58      5.00     0.97 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

School 1224 (public) has a greater slope (2.51 vs. 0.13) for the regression of math achievement on SES than school 1308 (Catholic), although neither slope is statistically significant at the .05 level (P = 0.162 and P = 0.967).

---------------
Do Catholic schools do a better job of reducing class inequality?
drr
----------------------
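Several numbers in the Minitab output for Problems 2 through 4 can be checked by hand from the printed summaries. The sketch below recomputes them in Python under one stated assumption: TwoT ran the unpooled (Welch) two-sample t-test. A pooled test would have 47 + 20 - 2 = 65 degrees of freedom, so the reported DF = 44 points to the unpooled version. The regression checks use identities that hold for a single predictor: R-Sq equals r squared, S equals the square root of the residual mean square, and F equals t squared. Because the inputs are the rounded printed values, agreement is only to the printed precision.

```python
import math


def se_mean(sd: float, n: int) -> float:
    """Standard error of the mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


def welch_t(m1: float, s1: float, n1: int,
            m2: float, s2: float, n2: int) -> tuple[float, float]:
    """Unpooled two-sample t statistic and Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Summary statistics for mathach as printed by Describe (Problem 2).
m1224, s1224, n1224 = 9.72, 7.59, 47
m1308, s1308, n1308 = 16.26, 6.11, 20

print(f"SE mean 1224: {se_mean(s1224, n1224):.2f}")  # Describe printed 1.11
print(f"SE mean 1308: {se_mean(s1308, n1308):.2f}")  # Describe printed 1.37

# Problem 3: df is truncated to an integer to match the printed DF.
t, df = welch_t(m1224, s1224, n1224, m1308, s1308, n1308)
print(f"T = {t:.2f}, DF = {math.floor(df)}")  # TwoT printed T = -3.72, DF = 44

# Problem 4, school 1224: single-predictor regression identities.
r = 0.207                                  # printed correlation of ses and mathach
print(f"R-Sq = {r**2:.1%}")                # Regress printed 4.3%
print(f"S = {math.sqrt(56.40):.3f}")       # sqrt(MS residual); printed 7.510
print(f"F = t^2 = {1.42**2:.2f}")          # printed F = 2.02
```

The same identities explain the school 1308 output: r = 0.010 gives R-Sq of about 0.0%, which matches the near-zero slope.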