From regina@u.washington.edu Thu May 6 14:26:44 1993
Return-Path:
Received: from carson.u.washington.edu by scrp.Stanford.EDU (4.1/inc-1.0)
	id AA00421; Thu, 6 May 93 14:26:38 PDT
Received: by carson.u.washington.edu (5.65/UW-NDC Revision: 2.22 )
	id AA04611; Thu, 6 May 93 14:26:38 -0700
Date: Thu, 6 May 1993 14:24:27 -0700 (PDT)
From: Regina Rushe
Sender: Regina Rushe

greetings. john asked me to email a file to you, but when he gave me the floppy he also gave me your FAX to him. so, i'll pop the floppy in the mail today with the original wordstar 6.0d file, and the readme and ascii files i created. you should get it within the week. i've also taken the liberty to email you the ascii version in this message. alternatively, i could ftp everything in binary mode, but you would have to tell me a machine name, login and password. here at the u washington, we can also send email with mime attachments, but i don't know if you can use that standard -- i've used it locally with no problems.

Regina Rushe  regina@u.washington.edu
University of Washington, Department of Psychology NI-25
Seattle WA 98195  phone: 206/685-3110

==========================================================================

Myths about Longitudinal Research*

David Rogosa

This chapter is concerned with methods for the analysis of longitudinal data. Longitudinal research in the behavioral and social sciences has been dominated, for the past 50 years or more, by a collection of damaging myths and misunderstandings. The development and application of useful methods for the analysis of longitudinal data have been impeded by these myths. In debunking these myths the chapter seeks to convey "right thinking" about longitudinal research; in particular, productive statistical analyses require the identification of sensible research questions, appropriate statistical models, and unambiguous quantities to be estimated.
The heroes of this chapter are statistical models for collections of individual growth (learning) curves. The myths to be discussed are:

1. Two observations a longitudinal study make.
2. The difference score is intrinsically unreliable and unfair.
3. You can determine from the correlation matrix for the longitudinal data whether or not you are measuring the same thing over time.
4. The correlation between change and initial status is (a) negative (b) zero (c) positive (d) all of the above.
5. You can't avoid regression toward the mean.
6. Residual change cures what ails the difference score.
7. Analyses of covariance matrices inform about change.
8. Stability coefficients estimate (a) the consistency over time of an individual (b) the consistency over time of an average individual (c) the consistency over time of individual differences (d) none of the above (e) some of the above.
9. Casual analyses support causal inferences about reciprocal effects.

The most prevalent type of longitudinal data in the behavioral and social sciences is longitudinal panel data. Longitudinal panel data consist of observations on many individual cases (persons) on relatively few (two or more) occasions (waves) of observation. An observation on a variable X at time t_i for individual rho is written X_{i rho}, where i = 1, ..., T and rho = 1, ..., n. (For statistical methods based on individual growth curves, observations need not be made at the same times for all individuals. But as this is necessary for the standard methods that predominate in the behavioral and social sciences, in my examples all individuals have the same values of t_i, which means everyone is measured at the same times.) The X_{i rho} are presumed to be composed of a true score xi_rho(t_i) and an error of measurement epsilon_{i rho} according to the classical test theory model:

   X_{i rho} = xi_rho(t_i) + epsilon_{i rho}.
Many of the examples are in terms of the xi_rho(t_i) and thus assume perfect measurement; the justification is that the problems at issue arise even under perfect measurement and thus cannot be blamed on measurement error. Estimation of individual growth curves is not jeopardized by the presence of measurement error within reasonable bounds, but measurement errors cause more severe problems for methods based on the covariance matrix of the X_i (e.g., regression-based procedures).

The individual growth curves are functions of true score over time, xi_rho(t). Research questions about growth, development, learning, and the like center on the systematic change in an attribute over time, and thus the individual growth curves are the natural foundation for modeling the longitudinal data. The growth curve models are kept relatively simple because the basic ideas and approaches remain valid for more complex growth models. The simplest and most widely used example will be straight-line growth, which specifies a constant rate of change denoted by theta_rho. A second growth curve example is exponential growth to an asymptote. The straight-line growth curve for individual rho is written:

   xi_rho(t) = xi_rho(0) + theta_rho * t.

A collection of straight-line growth curves is shown in Figure 5-1; the individual growth curves have different values of the rate of change theta_rho and the level xi_rho(0). The values of the growth curve at the discrete times t_i yield the xi_rho(t_i), and the X_{i rho} are formed by the addition of measurement error. (In particular, for the many examples based on the collection of growth curves in Figure 5-1, the numerical values are obtained for a population of growth curves illustrated by the 15 growth curves in Figure 5-1, not for a sample or population of size 15.)

For some variables, such as attitudinal measures, the volatility over time may be far more important in the data than a systematic trend. The myth about stability over occasions will address this, using measures of consistency over time based on growth curve models.
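As a concrete illustration of this setup, the sketch below generates a collection of straight-line growth curves and fallible observed scores under the classical test theory model. All numerical values are hypothetical, chosen for illustration; none come from the chapter's figures or tables.

```python
import random

random.seed(0)

# Hypothetical illustration: a collection of straight-line individual growth
# curves,
#     xi_rho(t) = xi_rho(0) + theta_rho * t,
# with observed scores formed by the classical test theory model
#     X_{i rho} = xi_rho(t_i) + epsilon_{i rho}.
n = 200                                  # individuals in the collection
curves = [(random.gauss(50, 5),          # xi_rho(0): initial true status
           random.gauss(2, 1))           # theta_rho: constant rate of change
          for _ in range(n)]

def true_score(rho, t):
    """True score xi_rho(t) for individual rho at time t."""
    level0, rate = curves[rho]
    return level0 + rate * t

def observed_score(rho, t, sd_error=2.0):
    """Fallible observed score: true score plus measurement error."""
    return true_score(rho, t) + random.gauss(0, sd_error)

# For straight-line growth, true change over [t, t+c] is exactly theta_rho * c,
# whatever the starting time t.
change = true_score(0, 5) - true_score(0, 2)        # c = 3
print(change, 3 * curves[0][1])                     # equal (up to rounding)
```

Individual differences enter only through the pair (xi_rho(0), theta_rho); this is the population of growth curves on which the later examples are built.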
The discussion of each myth is based on simple numerical examples, using either the xi_rho(t_i) or the X_{i rho}. Although these examples are constructed to illustrate a particular message, each message is supported by technical results from my papers on statistical methods for the analysis of longitudinal data. This chapter is intended to serve as a less formal, and more accessible, exposition of the key ideas in those publications. In fact, the exposition deliberately avoids the presentation of mathematical results; citations throughout the text and the "Reference Notes" section at the end of each myth locate the relevant technical presentations. A partial listing of the papers that serve as primary sources for this chapter is:

Rogosa, D. R. (1980). A critique of cross-lagged correlation. Psychological Bulletin, 88, 245-258.

Rogosa, D. R. (1985). Analysis of reciprocal effects. In T. Husen & N. Postlethwaite (Eds.), International Encyclopedia of Education (pp. 4221-4225). London: Pergamon Press.

Rogosa, D. R. (1987). Causal models do not support scientific conclusions. Journal of Educational Statistics, 12, 185-195.

Rogosa, D. R., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 92, 726-748.

Rogosa, D. R., Floden, R. E., & Willett, J. B. (1984). Assessing the stability of teacher behavior. Journal of Educational Psychology, 76, 1000-1027.

Rogosa, D. R., & Willett, J. B. (1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20, 335-343.

Rogosa, D. R., & Willett, J. B. (1985a). Satisfying a simplex structure is simpler than it should be. Journal of Educational Statistics, 10, 99-107.

Rogosa, D. R., & Willett, J. B. (1985b). Understanding correlates of change by modeling individual differences in growth. Psychometrika, 50, 203-228.
MYTH 1: TWO OBSERVATIONS A LONGITUDINAL STUDY MAKE

Strictly speaking, two repeated observations do constitute a longitudinal study. A more exact statement of the myth would be that two observations are presumed to be adequate for studying change. This misunderstanding is inspired by the dominance of pre-test, post-test longitudinal designs in the methodological and empirical work of the behavioral and social sciences. Two observations do provide some information about change over time, but this design has many critical limitations. In Rogosa et al. (1982, p. 744), I expressed this by the motto: "Two waves of data are better than one, but maybe not much better." Longitudinal designs with only two observations may address some research questions marginally well, but many others rather poorly.

Two Observations Permit Estimation of the Amount of Change but Not of the Individual Growth Curve

Consider two observations on true score xi for a single individual plotted against time; that is, time is on the horizontal axis and true score is on the vertical axis. With just two observations over time, what can be learned about an individual?
Although it is statistically shaky, a growth curve can be fit to two points in time. A straight line passing through the two points is the most complex functional form that can be fit. Even then, the data contain no information on the adequacy of the straight-line functional form for growth or on the amount of scatter in the data. Furthermore, two points in time provide no basis for distinguishing among alternative growth curves; a variety of exponential or logistic growth curves, for example, would pass equally well through the two points. Even if the functional form of the growth curve were known (e.g., exponential), two observations are not sufficient to provide any estimates of the parameters of the growth curve. Although the investigation of the functional form of growth will often require far more than two points in time, two observations do allow estimation of the amount of change between t1 and t2.

These remarks are obvious, and this discussion would be of little import if it were not for the preponderance of two-wave panel designs in methodological discussions and empirical studies of change and development.

The formulation of Coleman (1968) founded an alternative tradition for the study of change, mainly among sociologists. In this formulation the parameters of the growth function do not differ over individuals. This tradition assumes that "the process is identical for all persons" (p. 437) and allows the estimation of complex growth curves (e.g., exponential, logistic) where "the data may be two waves of a panel with two observations on many individuals or many observations on the same individual" (p. 432). Additional examples of this tradition are Nielson and Rosenfeld (1981), Salemi and Tauchen (1982), and Tuma and Hannan (1984, chap. 11). In order to estimate complex growth curves from only two observations on each individual, observations from many individuals must be "combined" into a single growth curve.
Individual differences in growth preclude the validity of this approach unless some exogenous individual characteristics can be used to completely account for the individual differences. That is, violations of the assumption that the parameters of the growth function are the same for all individuals can be extremely consequential.

The Amount of Change Will Often Be Deceptive

The amount of change over a specified time interval is a natural quantity to estimate from longitudinal data. Define

   delta_rho(t, t+c) = xi_rho(t+c) - xi_rho(t)

as the amount of true change for individual rho over the time interval starting at time t and extending c units. For straight-line growth, delta_rho(t, t+c) = theta_rho * c. The amount of change between time t and time t+c depends on t for growth curves having a nonconstant rate of change (i.e., growth curves other than straight-line) and will often be a complex function of t and c. Thus, in a two-wave study, choices of the time 1 and time 2 measurements are likely to be especially deceptive in comparing growth among individuals, because observations over alternative time intervals may yield contradictory information. Below is an example showing that the amount of change is no guide to individual differences in growth.

Consider a collection of six individual growth curves, for individuals labeled A, B, C, D, E, F. Each growth curve has the form of exponential growth toward a ceiling or asymptote, governed by the equation

   xi_rho(t_i) = lambda_rho - (lambda_rho - xi_rho(0)) * exp(-gamma_rho * t_i).     (5.2)

Table 5-1 gives the parameter values for these six growth curves. The individuals differ on the asymptote lambda, on the starting level xi(0), and on the curvature parameter gamma. These growth curves also produce individual differences in the amount of change. Table 5-2 presents the amount of change delta(tI, tI+1) for individuals A, B, C, D, E, F for initial observation at time tI and final observation at tI+1, with tI = 0, 4, 10.
For tI = 0 the individual ranking on delta is A, B, C, D, E, F (A improves the most in the interval [0,1], B the next most, C the next, and so on), with the largest delta nearly double the smallest. If instead tI = 10, the ranking for the amount of change is reversed, with the largest delta nearly three times the smallest. So two different studies might obtain exactly the opposite results for individual differences in change depending on the choice of initial time of measurement. Furthermore, for tI = 4, the delta values are nearly equal (smaller individual differences in change), with yet a different ranking of individuals.

The reversals of individual standing on the amount of change may be most consequential for studies of the correlation of change with an exogenous background variable. Such a correlation might be found to be big and positive for a study using tI = 0; big and negative for tI = 10; and about zero for tI = 4, even if all three studies had perfect measurement.

The example above illustrates the danger of characterizing the growth of individuals by the amount of change over a specific time interval. Even with perfect measurement, the pre-post longitudinal design provides meager information. Two-wave designs permit at best the study of individual differences in delta or, equivalently, in some sort of average rate of change. Consequently, designs with only two observations are usually inadequate for the study of individual growth and individual differences in growth.

Reference Notes

The limitations of two-wave designs for the measurement of change are examined in Rogosa et al. (1982) and Rogosa and Willett (1985b). Mathematical results corresponding to the example in Table 5-1 are given in Section 1.4 of Rogosa and Willett (1985b). The advantages of multiwave data for the estimation of individual change are enumerated in Rogosa et al. (1982, pp. 741-743).
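The reversal illustrated by Table 5-2 is easy to reproduce in miniature. The sketch below uses two hypothetical individuals following the exponential-to-asymptote form of Eq. (5.2); the parameter values are invented for illustration, not taken from Table 5-1.

```python
import math

# Hypothetical parameters (not the chapter's Table 5-1 values): two individuals
# growing toward an asymptote by xi(t) = lam - (lam - xi0) * exp(-gamma * t).
people = {
    "A": dict(lam=30.0, xi0=10.0, gamma=0.5),   # fast approach, low ceiling
    "B": dict(lam=40.0, xi0=10.0, gamma=0.1),   # slow approach, high ceiling
}

def xi(p, t):
    """True score for person p at time t under exponential growth."""
    q = people[p]
    return q["lam"] - (q["lam"] - q["xi0"]) * math.exp(-q["gamma"] * t)

def delta(p, tI, c=1.0):
    """Amount of true change over the interval [tI, tI + c]."""
    return xi(p, tI + c) - xi(p, tI)

# The ranking by amount of change depends on the time of initial measurement:
early = {p: delta(p, 0) for p in people}    # interval [0, 1]
late = {p: delta(p, 10) for p in people}    # interval [10, 11]
print(early["A"] > early["B"])   # True: A changes more early on
print(late["A"] < late["B"])     # True: B changes more later
```

Two studies measuring the same two people over intervals of the same length, but starting at different times, would reach opposite conclusions about who changes more.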
MYTH 2: THE DIFFERENCE SCORE IS INTRINSICALLY UNRELIABLE AND UNFAIR

An impressive amount of psychometric literature over the last 50 years has sought to demonstrate deficiencies in the difference score. With only two observations, the difference score, D = X2 - X1, is a natural estimate of the amount of true change, delta(t1, t2), regardless of the form of the growth curve. For a straight-line growth curve model the difference score estimates the (constant) rate of change times the time interval. In general, the difference score divided by the time elapsed estimates an average rate of change over the time interval.

Unreliability of the Difference Score

The traditional tabulation of the reliability of the difference score is shown in Table 5-3, which also appears in Linn and Slinde (1977) and in various forms in many other publications. The pre-test, post-test correlation of observed scores and the reliability of observed scores look reasonable, and for most combinations the difference score has little reliability. This type of numerical demonstration supports the assertion by Lord that "differences between scores tend to be much more unreliable than either" (1963, p. 32). The untold story is the limited and constrained nature of this table. The table employs the constraints of equal reliabilities, rho(X1) = rho(X2) = rho(X), and equal variances, sigma^2(X1) = sigma^2(X2) = sigma^2(X), for the fallible observed scores X1 and X2. Also, these constraints imply equal true-score variances at times 1 and 2 and a negative value of rho_{xi(1) delta}, the correlation between true change and true initial status. The most prominent feature of Table 5-3 is that the time 1-time 2 true-score correlation rho_{xi(1) xi(2)} is very large in almost all regions; this can be seen from the standard disattenuation formula rho_{xi(1) xi(2)} = rho_{X1 X2} / rho(X). In particular, rho_{xi(1) xi(2)} is 1.0 along the diagonal of zero reliability for the difference score.
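The disattenuation argument can be sketched in a couple of lines. Under the equal-reliability, equal-variance constraints, the true-score correlation is the observed between-wave correlation divided by the common reliability; the numerical values below are illustrative, not those of Table 5-3.

```python
# Disattenuation sketch under the equal-reliability, equal-variance
# constraints of the traditional tabulation:
#     rho_true = r12 / rel_x,
# where r12 is the observed time 1-time 2 correlation and rel_x the common
# reliability.  Along the diagonal where the difference score has zero
# reliability (r12 = rel_x), the true-score correlation is forced to 1.0,
# i.e., perfectly parallel growth curves.

def true_score_correlation(r12, rel_x):
    return r12 / rel_x

print(true_score_correlation(r12=0.8, rel_x=0.8))           # 1.0
print(round(true_score_correlation(r12=0.72, rel_x=0.9), 3))
```

The tabulation's "reasonable-looking" observed correlations thus quietly build in nearly parallel growth curves, which is exactly the condition under which individual differences in change vanish.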
What are the implications for individual growth of the table's restriction to this small portion of the parameter space? A collection of growth curves that exhibits a high time 1-time 2 correlation and equal variances at times 1 and 2 will have all the growth curves nearly parallel. Thus all individuals are growing at nearly the same rate, which translates into almost no individual differences in true change. (Figure 1 of Rogosa et al., 1982, shows such a collection of straight-line growth curves with time 1-time 2 correlation about .95.) If there are no individual differences in true change, the difference score cannot be expected to detect them. So after building into the traditional tabulations the constraints that there be almost no individual differences in growth, the low reliability of the difference score should be no surprise.

If, instead, a moderate true-score correlation, rho_{xi(1) xi(2)} = .4, is used in conjunction with the other constraints in Table 5-3, the difference score appears much stronger. The ratio rho(D)/rho(X) is then substantial for reliabilities rho(X) of .7, .8, and .9. Thus, even with the other constraints, the difference score is nearly as reliable as the measure X. A moderate time 1-time 2 correlation corresponds to numerous crossings of the growth curves and considerable individual differences in change. (Rogosa et al., 1982, Figure 2, shows a time 1-time 2 correlation of about .5 for a collection of straight-line growth curves.)

Table 5-4 presents a slightly different tabulation of the reliability of the difference score, in terms of the time 1-time 2 true-score correlation and the reliability of X. The reliability of X2 is set to .9, and the reliability of X1 is varied. (Setting rho(X2) > rho(X1) maintains approximately equal error variances at times 1 and 2.) The correlation between true change and initial status is set to zero, which is a useful benchmark case, also known as the Overlap Hypothesis.
For these parameter values, the difference score is more reliable than the average reliability of the measures. Even for a high time 1-time 2 correlation, the difference score does rather well compared to the reliability of X, and in absolute terms rho(D) is also substantial. In sum, when there are individual differences in change, the difference score has decent reliability.

The message that debunks this myth is that the difference score is reliable when individual differences in true change exist. After all, the reliability of the difference score is the variance of true change divided by the sum of the variance of true change and the variance of the difference of the errors. For parameter configurations that require all individuals to grow at about the same rate, the low reliability of the difference score properly reveals that you can't detect individual differences that ain't there.

Unfairness of the Difference Score

The belief that the difference score is somehow not a "fair" measure of change is reflected in the statements that difference scores "give an advantage to persons with certain values of the pretest score" (Linn & Slinde, 1977, p. 125) and that "the correlation between change and initial status made it inappropriate to use change [difference] scores to evaluate individuals with different initial scores" (O'Connor, 1972, p. 78). The difference score is an unbiased estimate of true change. How can an unbiased estimate be inequitable? That is a question to which I have no answer. The confusion is bound up with misunderstandings about the correlation between change and initial status and with misguided motivations for the use of residual change measures. These will be untangled in subsequent myths.

Reference Notes

A presentation of the reliability of the difference score in terms of individual differences in growth is given in Rogosa et al. (1982, pp. 731-734).
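The reliability argument above, under the equal-variance and equal-reliability constraints, reduces to a one-line formula that can be checked numerically. This is the standard textbook expression, not a result special to this chapter, and the parameter values below are illustrative rather than drawn from Tables 5-3 or 5-4.

```python
# Standard formula for difference-score reliability under the equal-variance,
# equal-reliability constraints:
#     rho(D) = (rho(X) - r12) / (1 - r12),
# where rho(X) is the common reliability and r12 the observed time 1-time 2
# correlation.  Illustrative values only.

def reliability_of_difference(rel_x, r12):
    return (rel_x - r12) / (1.0 - r12)

# Nearly parallel growth curves: between-wave correlation close to the
# reliability, hence almost no individual differences in true change.
low = reliability_of_difference(rel_x=0.9, r12=0.88)

# Moderate true-score correlation (.4), so r12 = .4 * rho(X) = .36 after
# attenuation: the growth curves cross and true change varies.
high = reliability_of_difference(rel_x=0.9, r12=0.36)

print(round(low, 3))    # 0.167: can't detect differences that ain't there
print(round(high, 3))   # 0.844: respectable reliability
```

The same measuring instrument yields a nearly useless or a respectably reliable difference score, depending entirely on whether individual differences in true change exist.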
The nontechnical exposition of Rogosa and Willett (1983) provides numerical examples demonstrating the reliability of the difference score when individual differences in growth exist. Statistical properties of D_rho for estimating delta_rho are described in Rogosa et al. (1982); in particular, the construction and properties of "improved difference score" (Kelley-type, Lord-McNemar, and empirical Bayes) estimates, which use information from all n individuals in the estimation of delta_rho, are examined in detail in Rogosa et al. (1982, pp. 735-738, 742-743, and the Appendix).

MYTH 3: YOU CAN DETERMINE FROM THE CORRELATION MATRIX FOR THE LONGITUDINAL DATA WHETHER OR NOT YOU ARE MEASURING THE SAME THING OVER TIME

A typical statement of the third myth is that with low correlations over time "it is questionable whether one is measuring the same thing on both occasions, and consequently the notion of change becomes questionable" (Bond, 1979). A very serious question in studies of development (whether in early child development or later in the aging process) is whether measures "change out from under you" in the sense of measuring something different on different occasions of observation. The important issue is whether asking about quantitative change in the measures over time is meaningful. The assumption that the psychological variable or dimension being studied retains the same meaning over the occasions of observation is a logical prerequisite for the measurement of quantitative change. This view is reflected by Lord (1958), who discussed an instructional setting in which "the test no longer measures the same thing when given after instruction. If this is asserted, then the pretest and posttest are measuring different dimensions and no amount of statistical manipulation will produce a measure of gain or of growth" (p. 440).
Similarly, Bereiter (1963) wrote: "Once it is allowed that the pretest and posttest measure different things it becomes embarrassing to talk about change. There seems no longer any way to answer the question, change on what?" (p. 11). (See also Cronbach & Furby, 1970, p. 76; Linn & Slinde, 1977, p. 24; Lord, 1963, p. 21.)

In many situations these concerns may preclude the study of quantitative change. Nonetheless, valid and answerable questions about change should be pursued. Thus, the myth addresses a very important consideration; the misunderstanding is in thinking that this issue can be resolved by the between-wave correlation matrix. The truth is that much more, and very different, information may be required to resolve this issue.

Consider the picture of a collection of straight-line growth curves in Figure 5-1. Table 5-5 presents the corresponding correlation matrix, with entries of the correlation between xi(t_i) and xi(t_j) for t_i, t_j = 0, 1, ..., 8. Now, between times 5 and 7 the correlation between true scores is very big, .94; even with some measurement error there would be a healthy correlation. Should we "conclude" that the same thing is being measured over this time interval? If, instead, the interval is from time 1 to time 5, the correlation is .385. Should this correlation be taken to indicate that different things are being measured at times 1 and 5? Furthermore, for the interval with end points at times 1 and 7 (the concatenation of the two time intervals above) the correlation is .056. Are unrelated quantities being measured at times 1 and 7? According to the myth, the above three questions receive affirmative answers. Furthermore, the correlation between times 0 and 8 is -.24; should this correlation be taken to indicate that opposite attributes are being measured at times 0 and 8?

The correlations in Table 5-5 correspond to the collection of straight-line growth curves in Figure 5-1.
As each individual has a constant rate of change on the attribute xi, it is hard to imagine a configuration of individual growth that shows less discontinuity. Clearly, a way of thinking that indicates that different things are measured by xi(t_i) and xi(t_j) has deep flaws. In the same vein, large correlations cannot "prove" that the same thing is being measured at both ends of the observation interval, only that the ordering of individuals on the initial measure is similar to the ordering of individuals on the final measure. Whether or not the same thing is being measured over time simply cannot be answered from the correlation matrix on a couple of occasions of measurement, and it is dangerous to try. Even plotting the individual growth curves cannot completely resolve this question, although large discontinuities in individual growth would be cause for concern.

A sidenote message to this myth is that large individual differences in growth lower the between-wave correlations. Myth 3 serves to discourage the study of change for variables that have sizable individual differences in growth, on the grounds that these variables do not retain the same meaning over time. Thus, variables that are chosen for study have high time 1-time 2 correlations, which often results in a small variance of true change, sigma^2(delta) (i.e., not much in the way of individual differences in change). In reference to Myth 2, if there are few individual differences in change, what will the difference score show? Low reliability.

Reference Notes

The results of Rogosa and Willett (1985b) can be used to obtain the between-wave covariance and correlation functions for different forms of individual growth; the results for straight-line growth were used in constructing the example in Table 5-5. Rogosa et al. (1982, pp. 731-733) discuss the consequences for the reliability of the difference score of limiting studies of change to variables with high between-wave correlations (stability).
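The qualitative pattern of Table 5-5 can be reproduced analytically from straight-line growth. For xi_rho(t) = xi_rho(0) + theta_rho * t, the between-wave covariance is cov(xi(t), xi(s)) = var(xi(0)) + (t + s) cov(xi(0), theta) + t*s*var(theta). The parameter values below are hypothetical, chosen only so that the curves cross near t = 3 in the spirit of Figure 5-1; they are not the values behind Table 5-5.

```python
import math

# Analytic between-wave true-score correlations for straight-line growth,
#     xi_rho(t) = xi_rho(0) + theta_rho * t,
# from the variances and covariance of (xi(0), theta).  Hypothetical values.
VAR0 = 15.0        # var of initial status xi(0)
VAR_THETA = 1.0    # var of the rates theta
COV = -3.0         # cov(xi(0), theta): negative, so the curves cross

def cov_xi(t, s):
    """cov(xi(t), xi(s)) implied by the straight-line growth model."""
    return VAR0 + (t + s) * COV + t * s * VAR_THETA

def corr_xi(t, s):
    return cov_xi(t, s) / math.sqrt(cov_xi(t, t) * cov_xi(s, s))

# One perfectly smooth growth process yields a high, a low, and a negative
# between-wave correlation, depending only on the interval examined:
print(round(corr_xi(5, 7), 2))   # 0.94  -- "same thing measured"?
print(round(corr_xi(1, 5), 2))   # 0.2   -- "different things"?
print(round(corr_xi(0, 8), 2))   # -0.42 -- "opposite attributes"?
```

Every individual here has a constant rate of change throughout, so the wildly varying correlations say nothing about the measure changing out from under the investigator.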
MYTH 4: THE CORRELATION BETWEEN CHANGE AND INITIAL STATUS IS (a) negative (b) zero (c) positive (d) all of the above

Myth 4 is a multiple-choice myth whose distractors have long-standing substantive interpretations. A negative correlation between change and initial status is best known as the Law of Initial Values (Lacey & Lacey, 1962; Wilder, 1957). The negative correlation is also bound up with regression toward the mean, as will be seen in Myth 5. A zero correlation between change and initial status is known as the Overlap Hypothesis, which dates back to Anderson (1939) and was prominent in Bloom (1964). One interpretation of the Overlap Hypothesis is that growth occurs via independent increments (similar to the formulation of simplex models in Humphreys, 1960). A positive correlation between change and initial status corresponds to "fanspread," where variances increase over time. The positive correlation can be described as "them that has, gets."

The correct answer is (d), "all of the above," because the correlation between change and initial status depends crucially on the choice of tI, the time at which initial status is measured. For straight-line growth, the correlation between change and initial status is monotonically increasing in tI, having a lower asymptote of -1.0 as tI approaches -infinity. For almost any collection of growth curves, a very different correlation between true change and true initial status will be obtained depending on whether the time of initial status is chosen to be later, earlier, or in between; this is a likely reason that studies of academic growth obtain disparate estimates of the correlation between true change and true initial status.

One sidenote to the myth is that with fallible scores, the correlation between observed change and observed initial status is a poor estimate of the correlation between true change and true initial status. The estimate is negatively biased in addition to being attenuated (see, e.g., Rogosa et al., 1982, Eq. 11).
Thus, because of the poor properties of this estimate, negative correlations between observed change and observed initial status are often obtained when the true-score correlation is zero or positive. The myth is stated and discussed in terms of true scores because these are of primary substantive interest; although of less interest, a similar dependence on the time of initial status also holds for the observed-score correlation.

Table 5-6 gives values of the correlation between the amount of true change delta(tI, tI+c) and true initial status xi(tI) for tI = 0, ..., 7, using the collection of straight-line growth curves for true scores shown in Figure 5-1. The correlation does not depend on c. For each choice of tI, a different value for the correlation between change and initial status will be obtained. In this example, if initial status is chosen to be time 1, the correlation is big and negative. If initial status is time 3, the correlation is zero. And if initial status is time 5, the correlation is positive. Time 3 is the only time of initial status that would satisfy Anderson's Overlap Hypothesis. The Law of Initial Values would be satisfied for any tI < 3.

Table 5-6 also gives values of the correlation between observed initial status X_I and observed change X_{I+c} - X_I for c = 1, 3. The X_i are based on the xi(t_i) for this example, with the addition of measurement error (having equal error variance over the t_i), producing reliabilities of the X_i between .74 and .87. The difference between the c = 1 and c = 3 values is attributable to the larger reliability of the difference score for c = 3; except for tI = 2, the c = 3 observed-score correlation is closer to the true-score correlation. The relation between the observed-score and the true-score correlations is somewhat complex.
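The dependence on tI displayed in Table 5-6 can be computed directly for straight-line growth: change over [tI, tI + c] is theta * c, so the correlation of change with initial status is corr(theta, xi(tI)), free of c. The parameter values below are hypothetical, chosen so the curves cross near t = 3 in the spirit of Figure 5-1; they are not the values behind Table 5-6.

```python
import math

# Correlation between true change and true initial status for straight-line
# growth, as a function of the time tI at which "initial status" is measured.
# For xi(t) = xi(0) + theta * t, change over [tI, tI + c] is theta * c, so
# the correlation is corr(theta, xi(tI)) regardless of c.  Hypothetical values.
VAR0, VAR_THETA, COV = 15.0, 1.0, -3.0

def corr_change_initial(tI):
    cov_t = COV + tI * VAR_THETA                      # cov(theta, xi(tI))
    var_t = VAR0 + 2 * tI * COV + tI**2 * VAR_THETA   # var(xi(tI))
    return cov_t / math.sqrt(VAR_THETA * var_t)

for tI in range(8):
    print(tI, round(corr_change_initial(tI), 2))
# The correlation sweeps monotonically from negative through zero (tI = 3)
# to positive: answers (a), (b), and (c) all arise from the same growth curves.
```

One collection of growth curves thus supplies the Law of Initial Values, the Overlap Hypothesis, and fanspread, depending solely on when "initial status" is measured.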
For tI >= 2 the observed-score correlation is always less than the true-score correlation; for negative values of the true-score correlation, the attenuation and the negative bias in the observed-score correlation may offset each other.

Table 5-7 repeats the example for a different type of growth curve: exponential growth to an asymptote lambda instead of straight-line growth. This collection of growth curves is illustrated in Figure 5-2. The exponential growth curves have the form of equation (5.2) with gamma_rho = gamma. This collection of growth curves was constructed to have a between-wave correlation structure similar to that for the straight-line growth example (with a translation of the time scale by 3 units). The correlation between change and initial status is monotone increasing in tI, as with straight-line growth, but the correlation is no longer symmetric about its zero value, which for this example occurs at tI = 6.

Reference Notes

Mathematical results for the form of rho_{xi(t) delta} are obtained in Rogosa and Willett (1985b) for straight-line growth, exponential growth, and the simplex model (Eqs. 9, 16, and 13, respectively). In terms of the notation and parameters of Rogosa and Willett (1985b), the parameter specifications for the straight-line growth example are t_0 = 3 and kappa = 3. For the exponential growth example in Figure 5-2 and Table 5-7, the parameter specifications are t_0 = 6, gamma_rho = gamma = .23, mu_lambda = 30, sigma^2(lambda) = 1.4, and sigma^2(xi(6)) = .437. Rogosa and Willett (1985b) also obtain the form of the regression of change on initial status. Rogosa et al. (1982, pp. 734-735) examine the bias of the correlation between observed change and observed initial status. Blomqvist (1977, Eq. 3.2), using straight-line growth and a linear representation of individual differences in growth as a function of initial status (Eq.
3.1), obtains maximum likelihood estimates of the elements of the covariance matrix of xi(0) and theta. The results of Rogosa and Willett (1985b, Section 2) for straight-line growth allow the construction of maximum-likelihood estimates of the correlation between change and initial status or the regression of change on initial status for tI other than tI = 0.

MYTH 5: YOU CAN'T AVOID REGRESSION TOWARD THE MEAN

Typical statements of this myth are Furby (1973, p. 172), "Regression toward the mean is ubiquitous in developmental psychological research" and Lord (1963, p. 24), "The regression effect is one of the two main reasons why studies of growth may become confusing or confused." What is nearly ubiquitous about regression toward the mean is the absence of explicit, defensible definitions of the phenomenon. That is, regression toward the mean is often talked about but rarely explicitly stated. Intuitively, regression toward the mean says that on the average you are going to be closer to the mean at time 2 than you were at time 1. The few formal statements of regression toward the mean in the literature define it in standard-deviation units; for example, Furby (1973, p. 174) and Nesselroade, Stigler, and Baltes (1980, p. 623). Thus, in the population, regression toward the mean for true scores at times t1 and t2 is said to occur when

[Epsilon{xi(t2) | xi(t1) = C} - Mu xi(t2)] / sigma xi(t2) < [C - Mu xi(t1)] / sigma xi(t1)     (5.3)

Because this inequality is satisfied whenever rho xi(t1) xi(t2) < 1, regression toward the mean is thought to be unavoidable. The formulation in Eq. (5.3) is best thought of as a harmless mathematical tautology and one which provides little insight for the study of change. A more realistic definition of regression toward the mean uses the actual metric of xi to express closer to the mean at time 2 than at time 1.
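The contrast between the standard-deviation-unit definition and the raw-metric definition can be simulated directly. In this sketch (invented fan-spread growth parameters, with variance increasing over time), an individual starting two standard deviations above the mean is expected to be closer to the mean in SD units at time 2, yet farther from the mean in the raw metric:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Fan-spread collection of straight-line growth curves (illustrative
# parameters, not the chapter's): status at t = 3 is uncorrelated with
# the rate, so variance increases for t > 3.
status3 = rng.normal(50.0, 2.0, n)
theta   = rng.normal(1.0, 0.3, n)
x1 = status3 + theta * (5 - 3)   # true scores at time 1 (t = 5)
x2 = status3 + theta * (7 - 3)   # true scores at time 2 (t = 7)

rho  = np.corrcoef(x1, x2)[0, 1]
beta = np.cov(x1, x2)[0, 1] / np.var(x1)   # time-2-on-time-1 regression slope

C = x1.mean() + 2 * x1.std()               # initial score 2 SD above the mean
dev1_raw = C - x1.mean()
dev2_raw = beta * dev1_raw                 # expected time-2 deviation, raw metric

# In SD units, "regression toward the mean" is automatic whenever rho < 1:
z1 = dev1_raw / x1.std()
z2 = rho * z1
print(f"SD units:   {z1:.2f} -> {z2:.2f} (closer to the mean: {z2 < z1})")
# In the raw metric the expected value is farther from the mean when beta > 1:
print(f"raw metric: {dev1_raw:.2f} -> {dev2_raw:.2f} (farther: {dev2_raw > dev1_raw})")
```

Because the slope of the time 2 on time 1 regression exceeds one, the raw-metric inequality fails even though the correlation is below one.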
The alternative formulation of regression toward the mean is

Epsilon{xi(t2) | xi(t1) = C} - Mu xi(t2) < C - Mu xi(t1)     (5.4)

Only if sigma xi(t1) and sigma xi(t2) are constrained to be equal, as is done in Lord (1963, p. 21) and in Furby (1973, p. 173), are Eqs. (5.3) and (5.4) equivalent. Eq. (5.4) is satisfied only when rho xi(t1) delta < 0, where delta = delta(t1, t2). So, for the formulation in Eq. (5.4), regression toward the mean is not ubiquitous; regression toward the mean pertains only when the correlation between change and initial status is negative. Myth 4 discusses conditions for this to hold. The formulation in (5.4) corresponds to the original notion of Galton (1886) much more closely than does Eq. (5.3). Specifically, Galton would indicate no regression toward the mean if the time 2 on time 1 regression coefficient beta xi(t2) xi(t1) is greater than or equal to one. This is equivalent to rho xi(t1) delta > or equal to 0, for which the inequality in Eq. (5.4) is not satisfied. By expressing the severity of the regression effect as the ratio [Epsilon{xi(t2) | xi(t1) = C} - Mu xi(t2)] / [C - Mu xi(t1)], the correspondence of Eq. (5.4) to Galton's formulation is seen. The standard textbook representation of regression toward the mean employs a picture of the time 2 on time 1 plot with an ellipse representing the bivariate data (e.g., Nesselroade et al., 1980, Figure 1). For a choice of a time 1 value C, the time 2 on time 1 regression line gives the expected value at time 2. The peculiar aspect of this standard picture is that it is always drawn to show equal variances at time 1 and time 2, making Eq. (5.4) equivalent to Eq. (5.3). An alteration of the standard picture in Figure 5-3 allows variance to increase over time. Figure 5-3 shows that the expected value is farther away from the mean at time 2 than at time 1. Thus, regression toward the mean does not hold. Another example is seen in the collection of straight-line growth curves in Rogosa et al. (1982, Figure 3).

Reference Notes

Healy and Goldstein (1978), Rogosa et al. (1982, p.
735), and Rogosa and Willett (1985b, Section 2.5) provide similar discussions of regression toward the mean with reference to collections of individual growth curves. Rogosa and Willett (1985b) define explicitly the conditions for Eq. (5.4) to hold. Nesselroade et al. (1980) examine the structure of regression toward the mean for multioccasion data. In the Nesselroade et al. paper, regression toward the mean is analyzed in terms of correlation structures. Consequently, some regression toward the mean will always pertain because of the standardization involved in the correlation matrix. Nesselroade et al. use the term "egression from the mean" to describe regression toward the mean that is less severe between t1 and t3 than between t1 and t2 (even though there is regression toward the mean between t1 and t3). Perhaps a better use of this term would be egression from the mean as the opposite of regression toward the mean, which would exist over the time interval [t1, t2] if and only if the correlation between xi(t1) and delta(t1, t2) is positive.

MYTH 6: RESIDUAL CHANGE CURES WHAT AILS THE DIFFERENCE SCORE

What ails the difference score, according to the psychometric literature, is low reliability and negative correlation with initial status. The discussion of previous myths has shown such deficiencies of the difference score to be more illusory than real. Nonetheless, these concerns have motivated the use of residual change scores. In terms of true scores, residual change is a deviation of true outcome at time 2 from the regression prediction using time 1 information; using xi(t1) as the time 1 information yields a true residual change of the form xi p(t2) - Mu xi(t2) - Beta xi2 xi1 [xi p(t1) - Mu xi(t1)]. With fallible measures, the usual sample estimate of residual change is the residual from the observed-score time 2 on time 1 regression, which is denoted by R. A look at the properties of R is not pretty. Bias? Yes; R may be a badly biased estimate of true residual change.
Precision? Not much; the sampling variability is rather large because R contains uncertainty both from measurement error and from finite sample size in the regression adjustment. Reliability? At best, not much better than the reliability of the difference score. Various modifications of R, mainly intended to ameliorate the effects of measurement error on the regression adjustment, do little to mend its severe deficiencies. The demonstrations in the literature of superior reliability for residual change use time 1-time 2 true-score correlations near one and equal true-score and observed-score variances across time (Linn & Slinde, 1977, Table 2). Then, the reliability of the difference score is near zero, yet the reliability of residual change (even assuming an infinite sample size for making the regression adjustment) is .09 for rho(X) = .8 and .05 for rho(X) = .9. Outside the extreme limitations of that comparison, not even the slight advantage for the residual change score holds up. Table 5-8 presents the reliability of the residual change score for different values of t1 and t2 - t1, using the same Xi configuration as described for Table 5-6. The reliability of residual change increases with t2 - t1 and depends strongly on the choice of t1. Compare these entries with the reliability of the difference score of .133 for t2 - t1 = 1 and .58 for t2 - t1 = 3; this reliability does not depend on t1, as sigma squared delta does not change. The values given in Table 5-8 are obtained from Rogosa et al. (1982, Eq. 20), which is the squared correlation between the true residual change and R; this formula inflates the actual reliability of R, as all available formulas for the reliability of residual change assume an infinite sample size for the regression adjustment (i.e., Beta x2 x1 known). The logical problems of the residual-change approach dwarf its technical shortcomings.
Instead of addressing the relatively simple question-how much did individual p change on the attribute Xi?-residual change attempts to assess how much individual p would have changed on Xi if all individuals had started out "equal." The obvious question is, equal on what-true initial status, observed initial status, true initial status and other background characteristics? The correct answer is unknown, and it depends on the correct specification for the prediction of change. The difficulties with residual change are analogous to those with statistical comparisons of treatment effects in nonequivalent groups. Residual change is one example of attempts to statistically adjust for preexisting differences, which the literature on the analysis of quasi-experiments has shown to be doomed to failure. A major use of residual change measures is to detect correlates of change. Questions about correlates of change are of the type, "What kind of people are improving or gaining the most?" When the potential correlate is a variable defining membership in an experimental group, the question is whether people getting care or treatment are improving more than people who are not. Questions about correlates of change can be expressed in terms of systematic individual differences in growth. Individual differences in growth exist when parameters of individual growth curves (e.g., the theta p) differ across individuals (i.e., some people grow faster than others). Individual differences in growth are systematic if individual differences in a growth parameter can be linked with one or more exogenous characteristics. A common analysis consists of correlating the observed residual change with an exogenous, individual characteristic denoted by W. Tucker, Damarin, and Messick (1966) formed estimates of the correlation between the exogenous variable and the true residual-change score.
Lord (1963) presented a slightly different measure, which is equivalent to a partial correlation instead of the part correlation in Tucker et al. The failure of these measures to assess systematic individual differences in growth is demonstrated by an example using the collection of straight-line growth curves illustrated in Figure 5-1. The example includes two cases. Case 1 is no systematic individual differences in growth; that is, the correlation Rho theta w = 0. Case 2 is large systematic individual differences in growth, with Rho theta w = .7. Table 5-9 displays values of the part correlation from Tucker et al. (1966). Table 5-10 repeats the display for the partial correlation from Lord (1963), rho xi(t2) w multiplied by xi(t1). When there are no systematic individual differences in growth, the correlations may be large positive or large negative depending on the choice of t1. Even large systematic individual differences in growth may result in near zero or even negative values of these correlations. Thus, neither of these correlations can be counted on to assess correlates of change. Residual change correlations, whether partial or part correlations, are based on an adjustment for the effects of initial status. And this adjustment naturally depends on the choice of time at which initial status is measured. Thus, the attempt to purge initial status from the measure of change fails. The fatal flaw of the residual change procedures is the attempt to assess correlates of change by ignoring individual growth. Questions about systematic individual differences in growth cannot be answered without reference to individual growth. Yet these time 1-time 2 correlation procedures valiantly attempt to do so.

Reference Notes

Rogosa et al. (1982, pp. 738-741, p. 743, Appendix) enumerate the statistical, psychometric, and logical shortcomings of the residual-change score as a measure of individual change for both two-wave and multiwave longitudinal data. Rogosa and Willett (1985b, Section 3) obtain the mathematical forms for the Tucker et al.
(1966) and Lord (1963) correlations and demonstrate the failure of these procedures for the assessment of correlates of change. The values in Tables 5-9 and 5-10 were obtained from Rogosa and Willett (1985b, Eqs. 23 & 24, respectively) for a collection of straight-line growth curves with parameter values t degrees = 3, kappa = 3; for Case 1, Rho w xi(t degrees) = .91, and for Case 2, Rho theta w = .7, Rho w xi(t degrees) = .6, t mu = 6.5, and tl = .43. With multiwave data, an estimate of Rho theta w can be obtained by correcting the observed correlation between Theta and W for attenuation using a maximum-likelihood estimate of the reliability of Theta constructed by substituting estimates from Blomqvist (1977) into Equation 22 of Rogosa et al. (1982).

MYTH 7: ANALYSES OF COVARIANCE MATRICES INFORM ABOUT CHANGE

This myth serves as an umbrella for illustrations of the unattractiveness of three related approaches to the analysis of longitudinal data: path analysis, structural regressions, and simplex models. These three procedures all use the between-wave covariance matrix as the starting point for the statistical analysis. The main message of this myth is that the between-wave covariance matrix provides little information about change or growth. The examples illustrate this message.

Path Regressions Inform About Change?

Path analysis models for longitudinal data use the temporal ordering of the measurements to delimit the possible paths between the variables. Consider the example of a three-wave design with measures on X at times t1, t2, t3. The path regressions for the unstandardized variables are

X2 = Alpha2 + Beta1 X1 + e2
X3 = Alpha3 + Beta2 X2 + Beta3 X1 + e3     (5.6)

Thus, the path analysis model includes direct paths from X1 to X2 and to X3 (parameters Beta1 and Beta3, respectively) and from X2 to X3 (parameter Beta2). The path coefficients are functions of the entries of the between-wave covariance matrix.
An example of the use of this model is Goldstein (1979), in which X is a reading test score obtained on a nationwide British sample with measurements at ages 7, 11, and 16. Goldstein obtains the following estimates: Beta1 = .841, Beta2 = 1.11, Beta3 = -.147. The negative estimate for Beta3 causes considerable discomfort, as summarized by Goldstein:

This is difficult to interpret and may indicate that non-linear or interaction terms should be included in the model, or perhaps that the change in score between seven and 11 years is more important than the seven-year score itself. However, the addition of non-linear terms does not change this picture to any extent. (p. 139)

(Although not central to the present discussion, Goldstein's analysis employs complex transformations of the measures to straighten the Xi, Xi' scatterplots and disattenuation of the sample regression coefficients.) Compare those path analysis results with the following simple facts. Let the true scores xi(ti) (i = 1, 2, 3) be determined by a straight-line growth curve for each individual (cf. Figure 5-1). Then the partial regression coefficients are -(t3 - t2)/(t2 - t1) for xi(t1) and (t3 - t1)/(t2 - t1) for xi(t2). (5.7) Remarkably, the parameters depend only on the times at which the observations were taken, and thus neither regression coefficient contains any information about growth! Estimates of either parameter are totally independent of the information in the data. The implications of Eq. (5.7) for the path analysis in Eq. (5.6) are devastating. The first parameter in Eq. (5.7) corresponds to Beta3 in Eq. (5.6) and agrees with Goldstein's negative value of Beta3, with the magnitude affected by the data transformations and the success of the disattenuation procedures. The second parameter corresponds to Beta2 and is consistent with Goldstein's positive value for Beta2. Different results for the coefficients in Eq. (5.7) will be obtained for different forms of the individual growth curve.
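The independence of these coefficients from the data is easy to check numerically. The sketch below (the growth-curve parameters are invented; only the measurement ages 7, 11, and 16 come from Goldstein's design) fits the path regressions to true scores from two very different collections of straight-line growth curves and recovers the same, time-determined coefficients both times:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
t1, t2, t3 = 7.0, 11.0, 16.0   # Goldstein's measurement ages

def path_coefficients(mu_status, sd_status, mu_theta, sd_theta):
    # Straight-line true scores xi(t) = status + theta * t
    status = rng.normal(mu_status, sd_status, n)
    theta  = rng.normal(mu_theta,  sd_theta,  n)
    x1, x2, x3 = (status + theta * t for t in (t1, t2, t3))
    # OLS of x3 on x2 and x1 (the second path regression)
    X = np.column_stack([np.ones(n), x2, x1])
    _, beta2, beta3 = np.linalg.lstsq(X, x3, rcond=None)[0]
    return beta2, beta3

# Two very different (invented) collections of growth curves ...
res1 = path_coefficients(100, 15, 2.0, 0.5)
res2 = path_coefficients(40,   5, 0.3, 0.1)
print(res1)
print(res2)
# ... versus the values fixed by the times alone:
print((t3 - t1) / (t2 - t1), -(t3 - t2) / (t2 - t1))
```

For ages 7, 11, 16 the coefficients are 9/4 = 2.25 on X2 and -5/4 = -1.25 on X1, whatever the distribution of intercepts and rates.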
The comparison of the path analysis with the mathematical results for straight-line growth attempts to illustrate some of the perils of summarizing the longitudinal data by the analysis of the between-wave covariance matrix of the Xi or even the xi(ti), thereby ignoring the analysis of individual growth.

Structural Regression Models Inform about Change?

Structural regression models are a more sophisticated but equally flawed approach to the analysis of longitudinal data. These models incorporate regression relations among latent variables (i.e., xi(ti)), with measurement models relating the observed indicators (Xi) to the latent variables. Estimation of these models is based on fitting the covariance structure implied by the structural equation model to the between-wave covariance matrix of the observations. Consider the simple structural regression model shown in Figure 5-4, with one latent variable xi observed at times t1 and t2 and a latent background measure, W. Each latent variable has two indicators. This model is equivalent to the model for change in alienation that appears frequently as an example in Joreskog's papers. The path from W to xi(t2) represents the exogenous influence on change. The structural parameter for that path is the regression coefficient for the latent variable at time 2 on the exogenous variable, with the latent variable at time 1 partialed out.
In Joreskog's example, where xi is alienation and W is socioeconomic status (SES), a negative estimate of this parameter is interpreted as indicating that high SES reduces alienation. What does the structural parameter Beta xi(t2) w multiplied by xi(t1) reveal about exogenous influences on growth? Not very much. For the simple case of a collection of straight-line growth curves, this structural parameter has a complicated functional form that depends strongly on the time chosen for the initial measurement. The time span that pertains to a particular study is unknown and depends on the particular substantive problem. For a specified relation between the exogenous variable and individual change, the structural parameter may be positive, negative, or zero, depending on the choice of time of initial status. Also, the structural parameter increases with the length of the interval between measurements. Consider two numerical examples based on the collection of growth curves in Figure 5-1: (1) large influences of the exogenous variable (Rho w theta = .7) and (2) no relation between the exogenous variable and rate of change. Table 5-11 shows values of the structural parameter for these two cases, with t2 - t1 of 5 units. The entries in the Rho theta w = 0 column should be compared with the zero value of the corresponding regression coefficient Beta delta(t, t+5) w = 5 Beta theta w; the entries in the Rho theta w = .7 column should be compared with Beta delta(t, t+5) w = 5 Beta theta w = .77. Thus, for both cases the structural regression coefficient may badly mislead about exogenous influences on growth.

Simplex Models Describe Most Longitudinal Data?

A third example of longitudinal analyses based on the between-wave covariance matrix is the simplex model, which specifies a first-order autoregressive process for true scores. The numerical example in this section seeks to caution against the propensity to base many analyses of longitudinal data on a simplex structure without careful consideration of the longitudinal data or of alternative growth models.
Expositions of covariance structure analyses have encouraged such thinking. Moreover, Werts, Linn, and Joreskog (1977) assert, "The simplex model appears to be particularly appropriate for studies of academic growth" (p. 745). Well, maybe, maybe not. Consider the 5 x 5 correlation matrix for observed scores Xip over five occasions of observation in Table 5-12. To the eye, this correlation matrix corresponds extremely well to a simplex. Correlations decrease away from the main diagonal, and on each subdiagonal the correlations are nearly equal. A covariance structure analysis of the corresponding covariance matrix, using LISREL with a quasi-simplex covariance structure, is exceptionally successful. The reproduced covariance and correlation matrices are almost perfect; the root mean square residuals are .003 and .006, respectively. The median discrepancy for the 10 fitted correlations is .003. The chi-square fit statistic, which has five degrees of freedom, is 2.13 (figured for 500 observations) with a p-value of .831. So it seems LISREL is very successful in fitting a simplex model to this example. Guttman's (1954) condition for a simplex specifies that the partial correlation between earlier and later true scores, with an intervening time partialed out, is zero. This is the first-order Markov assumption. Straight-line growth turns out to be maximally "unsimplex" in that this partial correlation is -1 instead of 0. (For exponential growth the partial correlation is also -1.) The example in Table 5-12 actually was generated from straight-line growth in the true scores. Thus, the example shows that a simplex covariance structure marvelously fits a covariance matrix from growth curves that are maximally unsimplex. The consequences are far from benign because even when the simplex model fits wonderfully, the results of the covariance structure analysis can badly mislead.
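The phenomenon can be reproduced in outline with a simulation (the parameters are invented, not those behind Table 5-12): observed scores built from straight-line true-score growth plus measurement error yield a between-wave correlation matrix with a simplex appearance, while the true scores violate the first-order Markov condition as badly as possible:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
times = np.arange(5.0)           # five occasions of observation

# Straight-line true-score growth plus measurement error
# (illustrative parameters only).
status = rng.normal(50.0, 2.0, n)
theta  = rng.normal(1.0, 0.5, n)
true_scores = status[:, None] + theta[:, None] * times[None, :]
observed = true_scores + rng.normal(0.0, 1.5, (n, 5))

R = np.corrcoef(observed, rowvar=False)
print(np.round(R, 2))            # simplex-looking: correlations decrease
                                 # away from the main diagonal

def partial_r(C, i, j, k):       # corr(i, j) with occasion k partialed out
    return (C[i, j] - C[i, k] * C[j, k]) / np.sqrt(
        (1 - C[i, k] ** 2) * (1 - C[j, k] ** 2))

T = np.corrcoef(true_scores, rowvar=False)
partial02 = partial_r(T, 0, 2, 1)
print(f"true-score partial correlation r(0,2 | 1) = {partial02:.3f}")
# exactly -1 for straight-line growth: maximally "unsimplex"
```

The observed-score matrix would invite a simplex fit, yet the generating true scores are perfectly anti-Markov.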
The covariance structure analyses usually go on to compute growth statistics and reliability estimates based on the simplex model, and these growth statistics (such as the correlation between true change and true initial status), estimated from the LISREL analysis, can differ markedly from the actual values. Covariance structure analyses provide very limited information about growth, in the sense that covariance matrices arising from very different collections of growth curves can be indistinguishable. Therefore, analyses of covariance structures cannot support conclusions about growth. To reiterate my central message, analysis of the collection of growth curves cannot be ignored.

Reference Notes

Rogosa and Willett (1985b, Section 3.2.2) give mathematical results for the form of the structural regression parameter examined in "Structural Regression Models Inform about Change?" (p. 193). In their notation, the example in Table 5-11 used a collection of straight-line growth curves with parameter values t degrees = 3, kappa = 3. For Equation 27 of Rogosa and Willett, with Rho theta w = 0, Rho w xi(t degrees) = .91; sigma squared w = 1, tau = 5, and sigma squared xi(t degrees) = .438. For Equation 26, with Rho theta w = .7: Rho w xi(t degrees) = .6, t mu = 6.5, and tl = .43. The simplex example is excerpted from the more extensive discussion in Rogosa and Willett (1985a).

MYTH 8: STABILITY COEFFICIENTS ESTIMATE
(a) the consistency over time of an individual
(b) the consistency over time of an average individual
(c) the consistency over time of individual differences
(d) none of the above
(e) some of the above

The absence or obscurity of definitions of stability, along with the proliferation of stability coefficients, results in considerable ambiguity as to what a particular stability coefficient is supposed to be estimating. Thus, it is fitting that this multiple-choice myth possesses a lack of clarity in the identification of the correct answer.
For some stability coefficients (d) is most correct; for others (e) is most appropriate, though it is not always clear which of (a), (b), or (c) would be identified. A coefficient corresponding to choice (a) would be based on an assessment of the heterogeneity (or lack thereof) in an individual's data over time. One procedure corresponding to choice (b) would be inferences about the average growth curve; for example, is the average growth curve flat? Regarding choice (c), correlation coefficients are often used as measures of consistency of individual differences. Rogosa et al. (1984) formulated two kinds of questions about stability, with application to the stability of behavior. The first question-is an individual consistent over time?-is rarely investigated. Unfortunately, substantive questions about the heterogeneity of an individual's data over time or about individual differences in heterogeneity rarely are addressed. The second question-are individual differences consistent over time?-has been the focus of most empirical investigations and the major use for the menagerie of stability coefficients. Among the methods used for assessing stability of individual differences are time 1-time 2 correlations, intraclass correlations and generalizability coefficients, repeated-measures ANOVA, path analysis regression, and structural equation models with exogenous variables. The path analysis and structural regression coefficients are described in Wheaton, Muthen, Alwin, and Summers (1977, Figures 1, 2). When the between-wave correlations are not all nearly equal, the intraclass correlation approach will yield poor results. An example is the science education question-asking data in Rosenshine (1973), in which a zero intraclass coefficient is obtained because the between-wave correlation matrix contains both big positive and big negative entries.
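A hypothetical version of the Rosenshine situation can be sketched as follows. The data are invented, and the intraclass coefficient computed here is the two-way ANOVA single-measure consistency form, one common choice; the point is only that mixed-sign between-wave correlations drive the coefficient toward (or below) zero even when each wave is reliable:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 5_000, 3

# Invented data: reliable waves whose between-wave correlations carry
# both signs (wave 2 reverses relative to waves 1 and 3).
z = rng.normal(0.0, 1.0, n)
e = rng.normal(0.0, 0.5, (n, k))
Y = np.column_stack([z, -z, z]) + e

C_waves = np.corrcoef(Y, rowvar=False)
print(np.round(C_waves, 2))      # big positive and big negative entries

# Two-way repeated-measures ANOVA, single-measure consistency ICC
grand = Y.mean()
msr = k * np.sum((Y.mean(axis=1) - grand) ** 2) / (n - 1)          # persons
row = Y.mean(axis=1, keepdims=True)
col = Y.mean(axis=0, keepdims=True)
mse = np.sum((Y - row - col + grand) ** 2) / ((n - 1) * (k - 1))   # residual
icc = (msr - mse) / (msr + (k - 1) * mse)
print(f"intraclass correlation = {icc:.2f}")   # low despite strong correlations
```

The mixed signs cannot be represented by a single person variance component, so the intraclass summary collapses.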
The most attractive approach to assessing consistency of individual differences is the indices of tracking from the biometric literature, which assess maintenance of individual differences over time. Figure 5-5 depicts collections of growth curves displaying perfect maintenance of individual differences over time; in Figure 5-5 individual differences are consistent across time whether the criterion is maintenance of rank order or of absolute distance. The index of tracking, gamma, presented by Foulkes and Davis (1981) assesses maintenance of rank order over time; this index is the probability of two growth curves not crossing in the specified time interval. Intersections of the individual growth curves are thus evidence against tracking. No tracking is said to exist for gamma < or equal to .5, the "chance level" for the probability of no crossings. As the time interval is lengthened, gamma tends to decrease, as it is more difficult to maintain individual differences over a longer interval. Data on physical growth are used to illustrate the assessment of stability of individual differences. Measurements of the height (in millimeters) of the mandibular ramus bone on a sample of 20 boys at four half-year intervals from 8.0 to 9.5 years of age are given in Goldstein (1979, Table 4.1) and have been used as an illustrative example in many papers on the analysis of growth curves. Each individual's data are very well described by a straight line; the median correlation for the straight-line fit to the four observations is .95 for this sample, with upper and lower quartiles of .99 and .91. Figure 5-6 plots the 20 fitted straight-line growth curves. The estimate of the Foulkes-Davis gamma index of tracking is .826, with an estimated standard error of .032. Thus, these data show strong, but not perfect, maintenance of individual differences over the 18-month interval.
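The Foulkes-Davis index can be estimated directly from the fitted lines as the proportion of pairs that do not cross in the observation interval. A sketch with invented ramus-like data (not Goldstein's actual measurements; the intercepts, slopes, and noise level are made up):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n, times = 20, np.array([8.0, 8.5, 9.0, 9.5])

# Invented straight-line growth plus measurement noise
intercept = rng.normal(48.0, 2.0, n)
slope = rng.normal(1.8, 0.4, n)
Y = intercept[:, None] + slope[:, None] * times[None, :] \
    + rng.normal(0.0, 0.3, (n, 4))

# Per-person straight-line fits
X = np.column_stack([np.ones(4), times])
coefs, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
a, b = coefs                                 # fitted intercepts and slopes

def crosses(i, j, t0=times[0], t1=times[-1]):
    if b[i] == b[j]:                         # parallel lines never cross
        return False
    t_star = -(a[i] - a[j]) / (b[i] - b[j])  # crossing time of the two lines
    return t0 <= t_star <= t1

pairs = list(combinations(range(n), 2))
gamma = sum(not crosses(i, j) for i, j in pairs) / len(pairs)
print(f"Foulkes-Davis gamma estimate = {gamma:.3f}")  # 1 = perfect tracking
```

Shrinking the slope variance (or shortening the interval) raises gamma; widening it lowers gamma toward the chance level.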
Whereas the index of tracking provides a useful quantification of the consistency of individual differences, the stability coefficients widely used in the behavioral and social sciences mainly provide confusion. Numerical examples based on the collection of straight-line growth curves in Figure 5-1 are used to illustrate the properties of some of the stability coefficients. The coefficients for measurements over a time interval [tI, tF] are:

Gamma(tI, tF), the estimated index of tracking from Foulkes and Davis (1981);
Rho xi(tI) xi(tF), the product-moment correlation;
Beta xi(tF) xi(tI), the regression coefficient for later on earlier consecutive waves of measurement, proposed by Heise (1969);
Beta xi(tF) xi(tI) multiplied by w, the structural regression coefficient for later on earlier latent variables, with an exogenous variable partialed out, used by Wheaton et al. (1977).

Tables 5-13 and 5-14 are structured to show the effects of different [tI, tF] intervals on the coefficients. The values of all coefficients except gamma are determined by formulas using the population moments of the collection of growth curves; only gamma is based on the xi(ti) values for the 15 growth curves and has an estimated standard error less than .05 for all time intervals in the tables. All coefficients are computed in terms of true scores; only gamma will be relatively unaffected by errors of measurement. The coefficients differ among themselves for a given [tI, tF] interval and differ, often in strange ways, over different intervals. Using the criterion gamma - 2[s.e.(gamma)] > .50, tracking exists for [tI, 7] in Table 5-13 for tI > or equal to 3, and for [0, tF] in Table 5-14 tracking exists for tF < or equal to 4. None of the other stability coefficients has an easily interpretable scale. In fact, for the same degree of consistency of individual differences (as assessed by gamma) the other coefficients vary wildly.
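That the correlation and regression coefficients drift with the choice of interval, for one fixed collection of curves, can be seen in a simulation (invented fan-spread parameters, not the Figure 5-1 collection):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

# One fixed fan-spread collection of straight-line true-score curves
# (illustrative parameters), examined over different [tI, tF] intervals.
status3 = rng.normal(50.0, 2.0, n)
theta   = rng.normal(1.0, 0.3, n)

def xi(t):
    return status3 + theta * (t - 3)

results = {}
for tI, tF in [(0, 4), (3, 7), (0, 3), (5, 7)]:
    x, y = xi(tI), xi(tF)
    rho  = np.corrcoef(x, y)[0, 1]
    beta = np.cov(x, y)[0, 1] / np.var(x)    # later-on-earlier slope
    results[(tI, tF)] = (rho, beta)
    print(f"[{tI},{tF}]: rho = {rho:.2f}, beta = {beta:.2f}")
```

The same curves yield a slope below one on some intervals and above one on others, with the correlation shifting as well; nothing about the collection of growth curves has changed, only the window.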
Table 5-15 displays two sets of [tI, tF] intervals with matching values of gamma. For the intervals [0, 4] and [3, 7] the values of gamma are essentially equal, yet the regression coefficients differ sharply, being much larger for [3, 7]. The second set of intervals, [5, 7] and [0, 3], shows stronger tracking and similar discordance in the regression coefficients.

Reference Notes

Wohlwill (1973, Chap. 12) provides a lucid discussion and illustration of research questions about stability arising in developmental research. Foulkes and Davis (1981) propose indices of tracking to assess consistency of individual differences. Rogosa and Willett (1983a) provide empirical comparisons of the two indices. Rogosa et al. (1984) formulate research questions about the stability of behavior; they also develop and illustrate statistical procedures for the assessment of stability. The parameter values displayed in the tables are obtained from results in Rogosa and Willett (1985b).

MYTH 9: CASUAL ANALYSES SUPPORT CAUSAL INFERENCES ABOUT RECIPROCAL EFFECTS

The best-known procedure associated with Myth 9 is cross-lagged correlation. A remarkable statement of the myth is provided by Crano and Mellon (1978): "With the introduction of the cross-lagged panel correlation method..., causal inferences based on correlational data obtained in longitudinal studies can be made to enjoy the same logical status as those derived in the more standard experimental settings" (p. 41). In other words, the use of cross-lagged correlation dispenses with the need for experiments, statistical models, or careful data analysis; a quick comparison of a few correlation coefficients is all that is required to study reciprocal effects. Well, I suppose that would be wonderful if it were true. The important thing to keep in mind is that questions about reciprocal effects are very, very complex and difficult.
A hierarchy of research questions about longitudinal data might start with describing how a single attribute, say aggression, changes over time. A next step would be questions about individual differences in change of aggression over time, especially correlates of change in aggression. Only after such questions are well understood does it seem reasonable to address a question about feedback or reciprocal effects, such as how change in aggression relates to change in exposure to TV violence, or, does TV violence cause aggressive behavior? Despite the complexity of research questions about reciprocal effects, empirical research has attempted to answer the oversimplified question, does X cause Y or does Y cause X? by casually comparing a couple of correlations.

The mathematical and numerical demonstrations of the failures of cross-lagged correlation in Rogosa (1980) had the following simple, limited structure. Start with a basic path-analysis regression model for two variables, X and Y, measured at times 1 and 2 (the popular two-wave, two-variable panel design):

   X2 = Beta0 + Beta1 X1 + Gamma2 Y1 + u,
   Y2 = Gamma0 + Beta2 X1 + Gamma1 Y1 + v.        (5.8)

In the context of the statistical model in Eq. (5.8) the parameters Beta1 and Gamma1 represent the influence of a variable on itself over time. The parameters Beta2 and Gamma2 represent the lagged, reciprocal causal effects between X and Y; thus, the relative magnitudes of Beta2 and Gamma2 indicate the nature of the reciprocal causal effects. In Rogosa (1980) combinations of Beta2 and Gamma2 values are compared with the results of the method of cross-lagged correlation. Three examples from Rogosa (1980) are shown in Figure 5-7. In the first frame, the cross-lagged correlations are equal (.63), which indicates the conclusion of "spuriousness," no direct causal influences between X and Y. Yet in the model the effect from X to Y (Beta2 = .42) is twice the effect from Y to X (Gamma2 = .21).
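The arithmetic behind a "spurious" verdict can be sketched directly from Eq. (5.8). With X1 and Y1 standardized and correlated rho, and the disturbances u, v uncorrelated with the time-1 variables, the model implies Cov(X1, Y2) = Beta2 + Gamma1*rho and Cov(Y1, X2) = Gamma2 + Beta1*rho, so the two cross-lagged covariances are equal whenever Beta2 - Gamma2 = (Beta1 - Gamma1)*rho. The parameter values below are hypothetical, chosen only to satisfy that condition (they echo, but are not claimed to reproduce, the Figure 5-7 setup; the variance scaling that converts covariances into the .63 correlations is omitted):

```python
# Hypothetical parameter values satisfying
#   beta2 - gamma2 = (beta1 - gamma1) * rho,
# which equalizes the cross-lagged covariances even though beta2 = 2 * gamma2.
rho = 0.7                   # correlation of standardized X1 and Y1
beta1, gamma1 = 0.8, 0.5    # stability paths X1 -> X2 and Y1 -> Y2
beta2, gamma2 = 0.42, 0.21  # cross paths X1 -> Y2 and Y1 -> X2

# Model-implied cross-lagged covariances under Eq. (5.8).
cov_x1_y2 = beta2 + gamma1 * rho  # Cov(X1, Y2)
cov_y1_x2 = gamma2 + beta1 * rho  # Cov(Y1, X2)

print(cov_x1_y2, cov_y1_x2)  # both 0.77: equal cross-lagged covariances,
                             # yet the X -> Y effect is double the Y -> X effect
```

Equal cross-lagged covariances would be read as "spuriousness" by the cross-lagged method, while the model plainly specifies unequal reciprocal effects.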
In the second frame the model stipulates lagged influences of equal magnitude, yet cross-lagged correlation identifies X as the causal winner. In the third frame the model stipulates an effect from Y to X nearly double the effect from X to Y. Yet the attribution of causal predominance by cross-lagged correlation is the opposite: X would be chosen the causal winner. These examples are simplified by the assumption of equal variances for X and Y; when variances change over time, equations in Rogosa (1980) show that the comparison of the cross-lagged correlations is even more unsatisfactory.

The major (and perhaps only) virtue of the path-analysis model in Eq. (5.8) is the identification of specific parameters believed to represent the reciprocal effects. If this model of the reciprocal influences between X and Y were valid, then estimation of Beta2 and Gamma2 would inform about reciprocal effects. Perhaps the best way to think about Eq. (5.8) and the related structural regression models is that these comprise a simple statistical model for reciprocal effects which, however, may be a far from satisfactory scientific model of the psychological (etc.) process.

The real moral about the analysis of reciprocal effects is that you can't estimate something without first defining it, and statistical models are a good way of defining the key parameters. But this does not imply that all statistical models are sensible. The progress that has been made, especially in the use of structural equation modeling, is to move from no model at all to some statistical model. But having a statistical model does not mean it is an adequate scientific model. Regrettably, the seductive simplicity of cross-lagged correlation has inhibited serious work on the complex question of reciprocal effects.

Reference Notes

Rogosa (1980) was only one in a tradition of papers, starting with Goldberger (1971) and Heise (1970), sharply critical of cross-lagged correlation. Even Cook and Campbell (1979, chap.
7) are unenthusiastic about the usefulness of cross-lagged correlation, yet most advocates and users of this procedure remain undaunted. Rogosa (1980) exposits a number of simple statistical models for reciprocal effects, among them multiple-time-series models. The mathematical results in Rogosa (1980) demonstrate the inability of the method of cross-lagged correlation to recover the structure of the reciprocal effects specified by these models. Results and numerical examples are presented for two-wave and multiwave data. Rogosa (1985) provides a nontechnical overview and extensive references on approaches to the analysis of reciprocal effects.

DISCUSSION

The message of the myths, which is carried through into my work on statistical methods for longitudinal data, is that models for collections of growth curves are the proper basis for the statistical analysis of longitudinal data. The nature of research questions about growth and development makes these models a natural, if not essential, starting point. What I tried to do with these myths was to indicate some of the beliefs that have impeded doing good longitudinal research. The myths have served either to make the analysis of change appear prohibitively difficult or to direct research in unproductive directions. Rather simple approaches work well with longitudinal data, and much progress can be made using straightforward descriptive analysis of individual trajectories followed by statistical estimation procedures for collections of growth curves. Although only a small number of observations often are available in empirical research, the resulting difficulties in statistical estimation arising from these limited longitudinal designs should not alter the research questions or the proper statistical models.

The nine myths discussed in this chapter are not exhaustive. Two additional candidates deserve some mention.
The first could be stated as "The average growth curve informs about individual growth." This myth dominated practice in psychological learning experiments, although Estes (1956) demonstrated that the learning curve obtained from averaging individual responses at each trial was equivalent to the average of the individual learning curves only for special forms of the learning curve. This myth has also impeded studies of physical maturation (Bock, 1979). Another setting for this myth is the analysis of longitudinal data with a hierarchical or multilevel structure (Rogosa, 1979, pp. 168-174). A second candidate myth is that "Standardizing longitudinal data can be useful." An inexplicable champion of this myth is Goldstein (1983). Standardization renders impossible useful analyses of longitudinal data by removing essential information about individual growth and individual differences in growth. A related but complex issue is the effect of different metrics/transformations of X on the longitudinal analysis.

The myths speak against what I call the "Avoid Change at Any Cost Academy of Longitudinal Research," which recommends analyses that try to draw complex conclusions about change over time without any examination of individual growth. That doctrine appears counterproductive, as these myths and my technical papers demonstrate. The doctrine of this Academy is sometimes justified by overinterpretations of the often-quoted last sentence of Cronbach and Furby (1970, p. 79): "Investigators who ask questions regarding gain [difference] scores would ordinarily be better advised to frame their questions another way." This statement could be regarded as a meta-myth. The factual basis for their conclusion is the shortcomings of the estimate of the amount of change from only two observations. But such facts do not support abandoning the framing of research questions about growth and change in a natural way.
The suggested surrender to uninformative regression and residual-change analyses is to be much lamented; the proper lesson to draw from difficulties with the difference score is that richer longitudinal designs and the application of appropriate statistical models for the longitudinal data are needed.

An appropriate question to be raised at this point is, where do we go from here? The myths serve more to discredit popular analysis procedures than to prescribe replacements. This function is important in the sense of "first things first"; the groundwork for new approaches requires some appreciation of the flaws of past and current thinking. Statistical methods respond to (well-formulated) research questions. Naturally, there is no single statistical procedure for the analysis of longitudinal data; different research questions dictate different data structures and thus different statistical models and methods. Although at present the "toolkit" of dependable methods for the analysis of longitudinal data is not complete, I do believe that the natural approach of statistical modeling of individual time trajectories (promoted in this chapter and in my technical papers) serves well as the common basis for the development of statistical methods.

To follow on this theme of the linking of research questions and useful statistical methods, I close this chapter with an organization of seven research topics, questions commonly addressed with longitudinal data. The parenthetical listing of Myths under each topic indicates relevant portions of this chapter, but no attempt is made here to survey the available statistical procedures and literature.

1. Individual and group growth (Myths 1, 2, 5, 6). A basic type of question in longitudinal research concerns description of the form and amount of change. Such questions may be posed for an individual case or for the average of a group or subgroup of cases.
Interest centers on the estimation of the individual (or group) growth curve, the heterogeneity (individual differences) in the individual growth curves, and the statistical and psychometric properties of these estimates.

2. Correlates and predictors of change (Myths 6, 7). Questions about systematic individual differences in growth are a natural sequel to the description of individual growth. A typical research question is given by "What kind of persons learn [grow] fastest?" (Cronbach & Furby, 1970, p. 77). The key quantities are the associations between parameters of the individual growth curves and the correlate(s) of change, which may be exogenous individual characteristics (e.g., gender, IQ) or the initial status on the attribute measured over time.

3. Stability over time (Myth 8). Questions about consistency over time are a natural complement to questions about change. In the behavioral sciences literature many different research questions fall under the heading of "stability." Two key topics are the consistency over time of an individual and the consistency of individual differences over time.

4. Comparing experimental groups. The comparison of change across experimental groups is a standard, well-developed area of statistical methodology employing some form of repeated-measures analysis of variance. When the effects of each treatment can be assumed identical for all members within each group (no individual differences in response to treatment), comparison of the parameters of the group growth curves yields inferences about the "treatment effects."

5. Comparing nonexperimental groups (Myths 1, 6). The comparison of change among nonexperimental or nonequivalent groups has been a central topic in the methodology for the evaluation of social programs. The practical or political difficulties of random assignment of individuals to treatment are sometimes overwhelming in a field trial of a program.
Yet the question of the relative efficacies of each program/treatment remains. The extensive literature on this topic is dominated by the application of statistical adjustment procedures (analysis of covariance and relatives) to very meager (pretest-posttest) longitudinal data.

6. Analysis of reciprocal effects (Myth 9). As discussed in Myth 9, questions about reciprocal effects are common and complex. Clearly, considerable empirical research on simpler longitudinal questions should precede attempts to assess reciprocal effects. Despite the complexity of these questions, empirical research has attempted to answer the oversimplified question, "Does X cause Y or does Y cause X?" from meager longitudinal data by casually comparing a couple of correlations (or structural regression coefficients).

7. Growth in multiple measures. All questions about growth in a single attribute have extensions to multiple attributes. Natural questions include relative strengths and weaknesses in individual and group growth and associations of rates of growth across multiple attributes.

REFERENCES

Anderson, J.E. (1939). The limitations of infant and preschool tests in the measurement of intelligence. Journal of Psychology, 8, 351-379.
Bereiter, C. (1963). Some persisting dilemmas in the measurement of change. In C.W. Harris (Ed.), Problems in measuring change (pp. 3-20). Madison, WI: University of Wisconsin Press.
Blomqvist, N. (1977). On the relation between change and initial value. Journal of the American Statistical Association, 72, 746-749.
Bloom, B.S. (1964). Stability and change in human characteristics. New York: Wiley.
Bond, L. (1979). On the base-free measure of change proposed by Tucker, Damarin, and Messick. Psychometrika, 44, 351-355.
Coleman, J.S. (1968). The mathematical study of change. In H.M. Blalock & A.B. Blalock (Eds.), Methodology in social research (pp. 428-478). New York: McGraw-Hill.
Cook, T.D., & Campbell, D.T. (1979).
Quasi-experimentation: Design and analysis for field settings. Boston: Houghton Mifflin.
Crano, W.D., & Mellon, P.M. (1978). Causal influence of teachers' expectations on children's academic performance: A cross-lagged panel analysis. Journal of Educational Psychology, 70, 39-49.
Cronbach, L.J., & Furby, L. (1970). How should we measure "change" - or should we? Psychological Bulletin, 74, 68-80.
Estes, W.K. (1956). The problem of inference from curves based on group data. Psychological Bulletin, 53, 134-140.
Foulkes, M.A., & Davis, C.E. (1981). An index of tracking for longitudinal data. Biometrics, 37, 439-446.
Furby, L. (1973). Interpreting regression toward the mean in developmental research. Developmental Psychology, 8, 172-179.
Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute, 15, 246-263.
Goldberger, A.S. (1971). Econometrics and psychometrics: A survey of communalities. Psychometrika, 36, 83-105.
Goldstein, H. (1979). The design and analysis of longitudinal studies. London: Academic Press.
Goldstein, H. (1983). Measuring changes in educational attainment over time: Problems and possibilities. Journal of Educational Measurement, 20, 369-378.
Guttman, L.A. (1954). A new approach to factor analysis: The radex. In P.F. Lazarsfeld (Ed.), Mathematical thinking in the social sciences. New York: Columbia University Press.
Healy, M.J.R., & Goldstein, H. (1978). Regression to the mean. Annals of Human Biology, 5, 277-280.
Heise, D.R. (1969). Separating reliability and stability in test-retest correlation. American Sociological Review, 34, 93-101.
Heise, D.R. (1970). Causal inference from panel data. In E.F. Borgatta & G.W. Bohrnstedt (Eds.), Sociological methodology 1970. San Francisco: Jossey-Bass.
Humphreys, L.G. (1960). Investigations of the simplex. Psychometrika, 25, 313-323.
Joreskog, K.G. (1979). Analyzing psychological data by structural analysis of covariance matrices. In K.G. Joreskog & D.
Sorbom (Eds.), Advances in factor analysis and structural equation models. Cambridge, MA: Abt Books.
Lacey, J.I., & Lacey, B.C. (1962). The law of initial value in the longitudinal study of autonomic constitution: Reproducibility of autonomic responses and response patterns over a four year interval. In W.M. Wolf (Ed.), Rhythmic functions in the living system. Annals of the New York Academy of Sciences, 98, 1257-1290.
Linn, R.L., & Slinde, J.A. (1977). The determination of the significance of change between pre- and post-testing periods. Review of Educational Research, 47, 121-150.
Lord, F.M. (1956). The measurement of growth. Educational and Psychological Measurement, 16, 421-437.
Lord, F.M. (1958). Further problems in the measurement of growth. Educational and Psychological Measurement, 18, 437-454.
Lord, F.M. (1963). Elementary models for measuring change. In C.W. Harris (Ed.), Problems in measuring change (pp. 21-38). Madison, WI: University of Wisconsin Press.
McMahan, C.A. (1981). An index of tracking. Biometrics, 37, 447-455.
Nesselroade, J.R., Stigler, S.M., & Baltes, P.B. (1980). Regression toward the mean and the study of change. Psychological Bulletin, 88, 622-637.
Nielsen, F., & Rosenfeld, R.A. (1981). Substantive interpretations of differential equation models. American Sociological Review, 46, 159-174.
O'Connor, E.F. (1972). Extending classical test theory to the measurement of change. Review of Educational Research, 42, 73-98.
Rogosa, D.R. (1979). Time and time again: Some analysis problems in longitudinal research. In C.E. Bidwell & D.M. Windham (Eds.), The analysis of educational productivity, volume II: Issues in microanalysis (pp. 153-201). Boston, MA: Ballinger Press.
Rogosa, D.R. (1980). A critique of cross-lagged correlation. Psychological Bulletin, 88, 245-258.
Rogosa, D.R. (1985). Analysis of reciprocal effects. In T. Husen & N. Postlethwaite (Eds.), International encyclopedia of education (pp. 4221-4225). London: Pergamon Press.
Rogosa, D.R., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 90, 726-748.
Rogosa, D.R., & Willett, J.B. (1983a). Comparing two indices of tracking. Biometrics, 39, 795-796.
Rogosa, D.R., & Willett, J.B. (1983b). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20, 335-343.
Rogosa, D.R., & Willett, J.B. (1985a). Satisfying a simplex structure is simpler than it should be. Journal of Educational Statistics, 10, 99-107.
Rogosa, D.R., & Willett, J.B. (1985b). Understanding correlates of change by modeling individual differences in growth. Psychometrika, 50, 203-228.
Rosenshine, B. (1973). The smallest meaningful sample of classroom transactions. Journal of Research in Science Teaching, 10, 221-226.
Salemi, M.K., & Tauchen, G.E. (1982). Estimation of nonlinear learning models. Journal of the American Statistical Association, 77, 725-731.
Tucker, L.R., Damarin, F., & Messick, S.A. (1966). A base-free measure of change. Psychometrika, 31, 457-473.
Tuma, N.B., & Hannan, M.T. (1984). Social dynamics: Models and methods. New York: Academic Press.
Werts, C.E., Linn, R.L., & Joreskog, K.G. (1977). A simplex model for analyzing academic growth. Educational and Psychological Measurement, 37, 745-756.
Wheaton, B., Muthen, B., Alwin, D., & Summers, G. (1977). Assessing reliability and stability in panel models with multiple indicators. In D.R. Heise (Ed.), Sociological methodology 1977 (pp. 84-136). San Francisco: Jossey-Bass.
Wilder, J. (1957). The law of initial value in neurology and psychiatry. Journal of Nervous and Mental Disease, 125, 73-86.
Wohlwill, J.F. (1973). The study of behavioral development. New York: Academic Press.
*This chapter is a revised version of a colloquium of the same title presented at National Institutes of Health, Stanford University, University of California-Berkeley, Center for Advanced Studies in the Behavioral Sciences, and Vanderbilt University. Preparation of this chapter has been supported by a Seed Grant from the Spencer Foundation. I would like to thank Ghassan Ghandour, John B. Willett, and Gary Williamson for computational assistance in preparing the examples.