Grants
Current Grants:
Previous Grants:
Exploratory Study of the Relationship between Students' Mental Models of Climate Change and Their Environmentally Sustainable Decisions and Behavior
Sponsor Ref Number: 0746137
Period of Support: 08/07 - 09/09
Amount of Award: $188,312
Project summary:
The purpose of this SGER is to develop measures of students' conceptions of climate change and examine the conjecture that environmentally sustainable decisions and behavior are related to these conceptions of the natural world. Specifically, the study asks: (1) Can mental models of climate change be measured reliably, validly, and efficiently to provide a gauge of students' understanding? (2) If so, what mental models of climate change do middle school, high school, and college students hold? (3) What is the correlation between students' mental models and their demographic characteristics, reported behavioral and policy preferences regarding climate change?
The project is timely because environmental issues are receiving wide media coverage and much public discussion. Improving our understanding of what kinds of education interventions alter students' behaviors around climate change is of national importance. The study would initiate an innovative approach to building the empirical knowledge base about how and why secondary school and college students alter their behaviors around the critical issue of climate change.
Understanding Academic Performance in Organic Chemistry: An Investigation Examining Underrepresented Groups
Sponsor Ref Number: 0814559
Period of Support: 08/08 - 08/10
Amount of Award: $804,594
Project summary:
Thousands of students enroll in "Introduction to Organic Chemistry" (O-CHEM) each year. Successful completion of O-CHEM is a prerequisite for many graduate and professional STEM programs, yet the failure rate is notoriously high. O-CHEM has unique knowledge representation protocols that often challenge even students who initially master general chemistry. There are very few large-scale studies examining why some students succeed while others have difficulty in O-CHEM. Such issues are of particular importance when considering the impact on under-represented minority students and women. A large body of evidence indicates that these groups perform significantly worse in O-CHEM, contributing to the under-representation of these groups in STEM careers. Previous studies by the PI focusing on how students succeed in STEM courses used techniques such as concept-mapping to examine "knowledge organization." Recent studies show that both experts and high-achieving students demonstrate enhanced knowledge bases. No study has examined how knowledge organization mediates academic performance in O-CHEM. Furthermore, no study has examined the activities employed by students to organize their knowledge appropriately, although more generally, investigations into both expert performance and academic success have demonstrated that concentrated, goal-directed activities are correlated with superior performance. Previous studies by the PI have successfully identified the importance of such activities using techniques such as think aloud protocol analysis, structured interviews, and diaries.
The goal of this study is to combine the insights provided by multiple measurement techniques such as concept-mapping, think-aloud protocol analysis, and diaries in order to examine factors contributing to both academic success and difficulties in O-CHEM, particularly among under-represented minority students and women. This is a "frontier research empirical study." It is being undertaken collaboratively between departments of education, psychology (cognitive science), and chemistry. It is examining equal numbers of minority and non-minority students, and equal numbers of males and females within each group. Groups are also comprised of equal numbers of high-, average-, and low-achieving students.
This study has four specific objectives: 1) examine O-CHEM knowledge structures to identify major conceptual difficulties; 2) compare student-instructor O-CHEM knowledge structure correspondence and identify specific discrepancies; 3) compare O-CHEM problem-solving success and knowledge structures; and 4) compare specific study activities and knowledge structures. It may lead to the design of randomized trial studies involving individualized and group tutoring in O-CHEM and, ultimately, the restructuring of O-CHEM syllabi and teaching methods in order to close potential gaps between student knowledge and instructor ideals.
Stanford Challenge Award: Marine Science Curriculum for K-12
Period of Support: 09/08 - 09/09
Amount of Award: $70,000
Project summary:
Stanford University's Hopkins Marine Station, SEAL, and the Hilton Bialek Habitat (affiliated with Carmel Unified School District) are collaborating on a program in which Hopkins graduate students, post-docs and faculty teach K-12 students from underserved communities in Monterey County about the oceans, marine conservation, and sustainability using a place-based, outdoor learning approach. We call this the Hopkins Ocean Literacy Program. Learning objectives include demonstrated student understanding of the oceans and how they relate to all forms of life in the watershed, as well as an increased sense of environmental stewardship indicated by demonstrated changes in attitudes and behaviors. Activities will include ocean-based restoration projects, weekend ocean and watershed explorations, after-school programs, and trips to the Hopkins Marine Station. Through our curriculum, students in communities without outdoor programs will have the opportunity to learn from nature by participating in place-based learning activities that are both informative and fun. We will use the experience gained through this program to develop an on-line curriculum to help students, teachers and parents increase their understanding of marine science and the central role healthy oceans play in the global environment. These web-based learning tools will be hosted by Stanford and made available to teachers nationally.
National Center for Research on Evaluation,
Standards and Student Learning (CRESST), Studies of
Performance-Based Assessments-Measurement of Progress
Sponsor Ref Number: UCLA 0070 G
CC908-06
Period of Support: 2/2001-2/2006
Amount of Award: $979,080
Project summary:
Standards and accountability measures require concrete links to
everyday practices of teachers and the learning opportunities of
students if they are going to make a real difference in student
learning. Needed are classroom assessments that are aligned with
external accountability requirements, which teachers can use on a
regular basis to assess their students’ progress, provide feedback,
and take action accordingly. Such measures also can communicate to
students what is important to learn as well as provide essential
feedback on how they are doing. In one of our projects we
investigate strategies for creating such classroom
assessments—embedding assessments in science curricula that
systematically tap various dimensions of learning, and explore
the integration of classroom-based assessments and measures of
students’ opportunities to learn. In the other, we conduct
psychometric studies—theoretical and empirical—to examine the
reliability and validity of cognitive interpretations of different
kinds of science achievement assessments.
Assessing Student Learning and Accounting for Their Achievement:
The Quest to Hold Higher Education Accountable
Award Number: Atlantic Philanthropies (USA) Inc.
Period of Support: 9/2000-9/2005
Amount of Award: $720,863.75
Project summary: This four-year study evaluates alternative
accountability systems and recommends principles for designing
such systems with the goal of improving teaching and learning,
while recognizing the critical importance that research
plays in some of these institutions. The study takes the
perspective of a decision-maker and looks at the intended
and unintended consequences of alternative accountability
system designs, evaluating the fit between input, process,
and output indicators with valued outcomes. Inevitably,
the project will have to conceptualize and empirically test ideas
for matching output indicators, especially learning-assessment
indicators, with valued outcomes; identify and develop alternative
accountability models for tracking progress in improving student
outcomes; and recommend elements of a comprehensive indicator
system for higher education institutions.
List of publications & presentations acknowledging award:
- Shavelson, R. J., Ruiz-Primo, M. A., & Wiley, E.
Windows into the Mind. Paper (In Press). International Journal
of Higher Education.
- Shavelson, R.J., & Huang, L. (2003). Responding
responsibly to the frenzy to assess learning in higher education, Change,
35(1), 10-19.
- Naughton, B.A., Suen, A.Y., & Shavelson, R.J. (April
2003). Accountability for what? Understanding the learning objectives in
state higher education accountability programs. Annual meeting of the American
Educational Research Association, Chicago.
- Shavelson, R. "Assessment and Achievement: The Quest to
Hold Higher Education Accountable." Stanford University Invited Address,
California Association for Institutional Research, Rohnert Park, CA, November
14, 2003.
Embedding Assessments
in the FAST Curriculum:
On Beginning the Romance among Curriculum, Teaching and
Assessment
Award Number: NSF ESI-0095520
Period of Support: 8/15/2001 - 7/31/2003
Amount of Award: $764,316
Project summary: Paul Black and Dylan Wiliam (1998),
in a comprehensive review of the effects of formative evaluation
on student performance, concluded that if this feedback
were closely connected to instruction and provided information
about how to improve performance, it would produce a large
positive effect on student performance. They also noted that
this kind of feedback rarely occurred in classrooms. The
importance of this kind of evaluation was recognized in the
National Science Education Standards where a link was forged
between content, teaching and assessment standards. This feasibility
study begins a "romance" among curriculum, teaching and
assessment. It sets forth a framework for conceiving of science
achievement with links to methods for assessing different
aspects of achievement. This framework is then used to create
a set of assessments, both formative (embedded within a unit)
and summative (end-of-unit) for a unit from the "Foundational
Approaches in Science Teaching" (FAST) middle school science
program. The developmental process will be documented; the
assessment-embedded-FAST unit will be evaluated in a small,
randomized experiment focusing on both student learning and
teacher implementation (especially formative feedback to students);
and claims about the link between types of science achievement
and assessment methods will be evaluated empirically. If the
feasibility study demonstrates that the framework and methods
have a salutary effect on teaching and learning, a full-scale
research and development effort to link assessments with curricula
will be proposed to NSF, using FAST and at least one other
middle-school program.
List of publications & presentations acknowledging
award:
- On The Integration of Formative Assessment in Teaching
and Learning: Implications for New Pathways in Teacher Education
Richard J. Shavelson, Stanford University, USA, for the Stanford Education
Assessment Laboratory and the University of Hawaii Curriculum Research and
Development Group. Paper presented at the biennial meeting of the European
Association for Research on Learning and Instruction, 2003, Padua, Italy.
- On The Integration of Formative Assessment in Teaching
and Learning: Implications for New Pathways in Teacher Education (In
Press)
Richard J. Shavelson, Stanford University, USA, for the Stanford Education
Assessment Laboratory and the University of Hawaii Curriculum Research and
Development Group. Pergamon Publication in an edited book: Achtenhagen,
F., & Oser, F. (Eds.) (In Press), New Pathways in the Field of Teacher
Education.
Developing, Supporting,
and Aligning Classroom and Large-Scale Assessment to Sustain
Education Reform: Phase One
Award number: NSF REC-9909370
Period of support: 4/15/2000 - 3/31/2003
Amount of award: $1,505,125
Project summary: This project investigates an innovation
that has the potential to raise student achievement in science
education to the level and quality espoused in standards-based
reform efforts: the development of models and practices
to enhance teachers' formative assessment repertoires. Using
a variety of research methods, this project investigates
and develops assessment procedures that teachers can employ
to promote the quality of science teaching and learning,
and related areas of mathematics. While the focus initially is on
formative assessment practices - paying particular attention to
the impact of these types of classroom assessment on student
learning, engagement, and sense of purpose - it then moves
to teachers' summative judgments and their link to formative
work.
An additional aspect of the project is to identify issues
associated with broader implementation of programs for teachers
that support these assessment practices, then develop and
test model plans for high quality professional-development
activities about classroom assessment procedures that comport
with the research findings. The research also investigates
the challenges associated with promulgating these models
and practices to large numbers of teachers. Thus the study
aims for wide dissemination of ways in which students respond
to changes in science instruction and assessment that aim to enhance
their roles in classroom assessment, including in peer- and
self-assessment.
Toward the end of the initial three-year period, the study
begins to probe how classroom assessment and large-scale
assessment might be mutually reinforcing to raise educational
quality. A second phase - to study and formulate alternative
large-scale assessment systems that integrate classroom assessments
with examination data for accountability and monitoring purposes
- may follow if evidence warrants.
List of publications & presentations acknowledging
award:
- Coffey, J., Sato, M., & Schneider, B. (2001, April).
Classroom Assessment Up Close-And Personal.
Paper presented at the Annual Meeting of the American Educational
Research Association, Seattle, WA.
- Shavelson, R.J. (January 2003). Bridging the Gap
between Formative and Summative Assessment. Paper presented at the National
Research Council Workshop on Bridging the Gap between Large-scale and Classroom
Assessment. National Academies Building, Washington, DC.
- On Linking Formative and Summative Functions in the
Design of Large-Scale Assessment Systems (in publication)
Richard J. Shavelson, Stanford University
Paul J. Black, Dylan Wiliam, King's College London
Janet Coffey, Stanford University
Center for Assessment
and Evaluation of Student Learning
Award Number: WestEd ESI-0119790
Period of Support: 9/2001-8/2006
Amount of Award: $727,019
Project Summary: The aim of the proposed national Center
for Assessment and Evaluation of Student Learning (hereafter
CAESL or "the Center") is to address the need for increasing
the assessment capacity within the K-12 science education
system, thereby increasing student literacy in science. CAESL
is a collaboration among The Concord Consortium, CRESST/University
of California Los Angeles, Stanford University, Lawrence
Hall of Science (LHS) and the Graduate School of Education
at the University of California-Berkeley, and WestEd. CAESL
will also partner with the Bay Area Schools for Excellence
in Education (BASEE, an NSF-funded local systemic change
project involving eight school districts in the San Francisco Bay
Area), El Centro Unified School District, Fresno Unified School
District, Garvey School District, Kings Canyon Unified School
District, Pomona Unified School District, Sacramento City
Unified School District, San Francisco Unified School District, and
San Diego Unified School District, which will serve as test
beds for Center activities and resources. In addition, the
Center will work with San Jose State University to co-develop
and integrate a variety of resources into its preservice and
graduate programs. Apple Computer, Inc. will support the
Center by providing significant cost share ($1.25 million)
in the form of consulting, equipment and dissemination.
In order to achieve its goal, CAESL will initiate research
and development projects to address all facets of assessment
through the following:
1. Enhance the capacity of prospective assessment and
evaluation professionals through a collaborative graduate
program among the university partners.
2. Develop and field test models for enhancing the formative
and summative assessment capabilities of practicing science
teachers through professional development and developing
teachers' capacity to lead assessment efforts in their districts.
3. Enhance the formative and summative assessment capabilities
of preservice science teachers through the development of
assessment course modules.
4. Conduct applied research to inform the Center itself,
the field, and practitioners on (a) formative and summative
assessment practices and (b) technology-intensive assessment
environments, and use findings from this research to generate
new products.
5. Enhance the capacity of parents, school administrators,
policy makers, and the public to make decisions about and support
the appropriate educational roles of different kinds of
assessment and evaluation through outreach programs.
Inverness Research Associates will provide formative and summative
evaluation services to assess Center outcomes and success.
List of publications & presentations acknowledging
award:
Transferring
New Assessment Technologies to Teachers and Other Educators
Award number: NSF ESI-9596080, A004
Period of support: 3/1/1995 - 7/31/2000
Amount of award: $854,738
Project summary: The current grant was a supplement
to a grant awarded in 1990 (NSF TPE-905543; $1,038,718).
The original grant was carried out in four phases: (1) compile
and classify emerging concepts and procedures in new assessment
from currently available sources, (2) create performance
assessments and study the procedures for doing so, (3)
design and evaluate student portfolios, and (4) evaluate the
quality of the assessments. The supplement to this grant sought
to disseminate the professional development parts of the original
grant beyond research reports and professional development
materials (e.g., Brown, J. & Shavelson, R.J. (1996). Assessing
hands-on science: A teacher's guide to performance assessment.
Thousand Oaks, CA: Corwin Press, Inc.), and extended what
we had learned by designing a study to evaluate inquiry-based
science education programs to meet Congressional demands for
accountability. This latter extension occupied a significant
amount of project time. It incorporated traditional and performance-based
assessments into a multilevel-multifaceted model of evaluation
to assess the learning impacts of science education reform
programs. The multilevel aspect of the evaluation conceived
of learning assessments varying in distance from the enacted
curriculum in the classroom. Science journals provided immediate
measures of curricular impact. Close measures took one step
further away from classroom activities in pushing students
to reason. Proximal measures focused on the same concepts but in
a novel way. Distal measures were consistent with a state science
framework but students had not been prepared directly in the
curriculum for the assessments. The multifaceted aspect of the
model recognized that science achievement involved more than
extensive propositional knowledge as measured by multiple-choice
tests and recognized the importance of procedural and strategic
knowledge (see Figure 1). We collected extensive evaluation
data and provided evidence of the capacity of the model to
pick up even weak "treatment" effects of inquiry curriculum.
The present proposal draws heavily on the work of this prior
grant.
List of publications & presentations acknowledging
award:
- Stecher, B.M., Klein, S.P., Solano-Flores, G., McCaffrey,
D., Robyn, A., Shavelson, R. J., and Haertel, E. (2000).
The effects of content, format, and inquiry level on performance
on science performance assessment scores. Applied Measurement
in Education, 13(2), 139-160.
- Shavelson, R.J., & Ruiz-Primo, M.A. (1999). On
the psychometrics of assessing science understanding. In
J.J. Mintzes, J.H. Wandersee & J.D. Novak (Eds.), Assessing
science understanding: A human constructivist view. New
York: Academic Press.
- Shavelson, R.J. (1999). On the role
of assessment in self-directed learning. In W. Althof, F.
Baeriswyl & K.H. Reich (Eds.), Autonomie und Entwicklung:
Festschrift zum 60. Geburtstag von Fritz Oser, 65-93.
University of Freiburg Press.
- Shavelson, R.J., & Ruiz-Primo, M.A.
(1999). Leistungsbewertung im naturwissenschaftlichen Unterricht
(Evaluation in natural science instruction). Unterrichtswissenschaft,
27, 102-127.
- Solano-Flores, G., Jovanovic, J., Shavelson, R.J.,
& Bachman, M. (1999). On the development and evaluation
of a shell for generating science performance assessments.
International Journal of Science Education,
21 (3), 293-315.
- Shavelson, R.J., Ruiz-Primo, M.A., &
Wiley, E.W. (1999). Note on sources of sampling variability
in science performance assessments. Journal of Educational
Measurement, 36 (1), 61-71.
- Shavelson, R.J., Solano-Flores, G.,
& Ruiz-Primo, M.A. (1998). Toward a science performance assessment
technology. Evaluation and Program Planning, 21, 171-184.
- Klein, S.P., Stecher, B.M., Shavelson, R.J., McCaffrey,
D., Ormseth, T., Bell, R.M., Comfort, K., & Othman, A.R.
(1998). Analytic versus holistic scoring of science performance
tasks. Applied Measurement in Education,
11(2), 121-137.
- Shavelson, R.J. (1997). On a science
performance assessment technology: Implications for the future
of the national assessment of educational progress. In National
Academy of Education (Ed.), Assessment in transition: Monitoring
the nation's educational progress, background studies.
Stanford, CA: National Academy of Education.
- Solano-Flores, G., & Shavelson,
R.J. (1997). Development of performance assessments in science:
Conceptual, practical, and logistical issues. Educational
Measurement: Issues and Practice, 16(3), 16-24.
- Klein, S.P., Jovanovic, J., Stecher, B.M.,
McCaffrey, D., Shavelson, R.J., Haertel, E., Solano-Flores,
G., & Comfort, K. (1997). Gender and racial/ethnic differences
on performance assessments in science. Educational Evaluation
and Policy Analysis, 19(2), 83-97.
- Shavelson, R.J. (1996). On school reform:
Curriculum and instruction, yes... but don't forget assessment!
Hong Kong Educational Research Journal, 11 (2),
147-156.
- Ruiz-Primo, M.A., & Shavelson, R.J. (1996). Rhetoric
and reality in science performance assessments: An update.
Journal of Research in Science Teaching, 33(10), 1045-1063.
- Ruiz-Primo, M.A., Shavelson, R.J., &
Baxter, G.P. (1995). Evaluation of a prototype teacher enhancement
program on performance assessment. In P. Kansanen (Ed.),
Discussions of some education issues VI.
(Research Report 145). Finland: University of Helsinki, Department
of Teacher Education, 173-211. Reform in the United States.
In D.K. Sharpes & A-L Leino (Eds.), The dynamic concept
of curriculum: Invited papers to honour the memory of Paul
Hellgren. (Research Bulletin 90). Finland: University
of Helsinki, Department of Education, 57-76.
- Shavelson, R.J., Gao, X., & Baxter, G.P. (1995).
On the content validity of performance assessments: Centrality
of domain specification. In M. Birenbaum & F. Dochy (Eds.),
Alternatives in assessment of achievements,
learning focuses and prior knowledge. Boston: Kluwer
Academic Publishers.
- Shavelson, R.J. (2000, May). Trends
in science assessment: Linking methods to facets of achievement.
Invited talk to the Department of Educational Measurement,
Umeå University, Sweden.
- Shavelson, R.J., & Ruiz-Primo, M.A. (1999, March). Assessing
NSF Programming-Standards-Based Reform Assessment Technologies:
San Francisco Unified School District.
Briefing, Performance Evaluation Review, Washington, DC:
NSF.
- Ayala, C.C., & Shavelson, R.J. (2000,
April). New dimensions for performance assessments.
Paper presented at the American Educational Research Association
(AERA) Annual Meeting, New Orleans.
- Shavelson, R.J. (2000, March). On
balancing accountability and learning goals in assessing science
achievement. Invited talk for the Washington Educational
Research Association (WERA) Research Conference, Seattle,
WA.
- Shavelson, R.J. (2000, March). Accountability issues
from invited talk & generalizability of performance
measurements. Invited breakout session for the Washington
Educational Research Association (WERA) Research Conference,
Seattle, WA.
- Shavelson, R.J. (1999, May). On linking
assessment to a cognitive model of science achievement.
Invited talk to Berkeley Evaluation and Assessment Research
(BEAR), Berkeley, CA.
- Shavelson, R.J. (1998, October). On
assessment and evaluation in science education reform.
Presentation to Bay Area Schools for Excellence in Education
(BASEE), Palo Alto, CA.
- Backman, J., Hardy, C., & Shavelson,
R.J. (1998, July). Assessing student learning. Presentation
to the California LASER K-8 Science Education Strategic
Planning Institute.
- Shavelson, R.J. (1998, June). On linking
assessment of learning with sustainable development. Invited
address at the Conference on Sustainable Development, Rantasalmi,
Finland.
Models-Based
Assessment Design: Individual and Group Problem Solving in Science
Award number: UCLA 0070-G-9H813
Period of support: 6/15/1996-6/30/2001
Amount of award: $875,776
Project summary: Assessment plays a central role in
current education reform. Indeed, it is one of the major
instruments of reform, if not the major instrument.
The rationale behind this policy instrument goes something
like: (a) if teachers teach to the test, and they do; and
(b) if students tend to learn what they are taught, and they
do; then (c) by using authentic, alternative,
or performance assessments that tap into "higher-order"
thinking and/or problem-solving in a subject-matter, the chances
are that the way teachers teach and what students learn may
be changed in a manner consistent with these assessments and
the reform (Glaser, Raghavan, & Baxter, 1992; Shavelson,
Carey & Webb, 1989; Shavelson, Baxter & Pine, 1991).
This project addresses one aspect of this chain of reasoning:
the claim that authentic, alternative or performance assessments
tap into problem-solving in a subject-matter
domain. Messick's work (1995) provides conceptual background
for addressing the construct validity issues raised by such claims
(see also Embretson, 1995).
More specifically, this project examines the problem-solving
and cognitive-structure claims through a series of studies
that address the (a) conceptual underpinnings of the design
of alternative, especially performance, assessments (e.g.,
Shavelson, in press); (b) exchangeability of alternative
methods of measuring performance, (e.g., notebooks based
on hands-on investigations, computer simulations, pencil & paper
objective tests), especially as they impact diverse populations
such as language minority and handicapped students (e.g.,
Baxter, & Shavelson, 1994; Dalton, Morocco, Tivnan, &
Rawson, 1994; Solano-Flores et al., in preparation); and (c) structural
representations of knowledge generated by concept maps (Ruiz-Primo
& Shavelson, in press).
The team of researchers conducting studies in this project
seeks, then, to understand how variability in students'
problem-solving performances arises over different tasks,
over different methods, over differences in students' backgrounds,
and the like. Ambitiously stated, the project attempts to
make progress toward building a framework or "theory" of
subject-matter achievement (in large part achievement in mathematics
and science) that underlies claims that alternative assessments
tap higher-order knowledge and can address policy needs (e.g.,
use of alternative assessment for different student populations).
Such a theory of learning and performance (Glaser &
Bassok, 1989; Glaser & Silver, 1994) would, for example,
illuminate the variability observed in students' performances
from task to task, from occasion to occasion, and from one
measurement method to another (e.g., Cronbach, Linn, Brennan,
& Haertel, 1995), perhaps by explaining the transition
of students' knowledge from little to partial to full to
expert in a subject-matter domain. This theory would also
address the propositional and procedural knowledge, problem-solving
heuristics, and attitudes of mind that students are to acquire
in the domain. Measurement methods vary as to what aspect
of achievement they tap. The theory would link measurement
methods to the components of the domain, indicating which
methods are best suited to measure each type of knowledge,
or the relation between knowledge types in applications or problem
solving. For example, paper-and-pencil methods (e.g., multiple-choice)
appear to be particularly apt for measuring facts and concepts
(at least on face validity grounds); concept maps at measuring
relations among concepts; performance assessments at measuring
procedural knowledge. Equally important, the theory would
also link the medium or symbolic representation used by
a measurement method to the propositional or procedural
knowledge tapped by that medium. Finally, the theory would
address both demographics and content, paying particular attention
to the learning and performance of language minority and disabled
students (see Shavelson, in press). More realistically, the
studies proposed in this project will shed light on some of
the parameters of such a theory.
We now turn to descriptions of studies in each of the
following areas: (a) conceptual underpinnings of the design
of alternative, especially performance, assessments (Solano-Flores
& Shavelson); (b) exchangeability of alternative methods
of measuring performance, especially as they impact diverse
populations (Solano-Flores, Ruiz-Primo, & Shavelson);
and (c) concept-map representation of knowledge structure
(Ruiz-Primo & Shavelson).
Study I: Validity of Conceptual Underpinnings For The Design
Of Performance Assessments
We have developed a framework for conceptualizing science
performance-assessment tasks and their corresponding scoring
systems (Shavelson, in press). Four types of tasks that
parallel investigations carried out by scientists have been
identified so far: comparative (compare two or more objects
along some dimension), component-identification (decompose
a whole into its component parts), classification (classify
a set of objects along a set of dimensions for a particular purpose),
and observation (systematically observe and record data in
time series) investigations (Shavelson, 1994; Solano-Flores
et al., 1995). For each type of task, we have identified
a corresponding scoring system. For example, procedure-based
scoring is used with comparative investigations (Baxter et
al., 1992) and evidence-based scoring is used with component-identification
investigations (Shavelson et al, 1991). With this framework,
we are learning how to structure performance assessment tasks
and build scoring systems to go with them.
Background and Research Objectives. Although this
framework holds promise for developing a performance-assessment
technology, its cognitive foundations have not been systematically
studied. What is needed, then, is a cognitive analysis linked
to observed performance on the four types of performance tasks.
Research conducted by Baxter and Glaser on cognitive analysis
of science performance assessment shows that talk aloud techniques
can be used fruitfully to validate cognitive interpretations
of these assessments (e.g., Baxter & Elder, 1995; Baxter
et al., 1994; Glaser et al., 1992). However, talk aloud techniques
may potentially affect performance (e.g., Shavelson, Webb
& Burstein, 1986).
The purpose of this study, then, is twofold: to examine
the reasoning used by students as they take different types
of science performance assessments and to see whether different
methods of cognitive data collection provide consistent
information on this reasoning.
Technical Approach. Fifth- to seventh-grade students
with instruction in a specific domain of science (e.g.,
electricity) will be given one of four types of science
performance assessments corresponding to that knowledge
domain: for example, Incline Planes (comparative), Electric
Mysteries (component identification), Sink and Float (classification),
and Daytime Astronomy (observation); for descriptions of
these assessments, see Shavelson (in press) and Shavelson, Baxter
& Pine (1991). For each type of assessment, half the students
(n=10) will perform the investigation individually, talking
aloud as they complete it. Students' talk aloud will be tape
recorded. The other half (n=20) will perform the assessment
in pairs (npairs=10), and their verbal interaction will be tape recorded
as they complete the investigation. The talk aloud and verbal
interaction protocols will be analyzed to determine: (a)
the kinds of reasoning used by the students across assessments
(cf. Baxter et al., 1995), and (b) the kind of information
on the students' reasoning obtained with each method of data
collection.
Anticipated Impact. This study will provide evidence
as to whether the four types of investigations defined by
the conceptual framework effectively tap relevant, different
higher-order scientific reasoning. It will also shed light
on whether talk aloud and verbal interaction protocols provide
comparable information about cognition. A better understanding
of the cognitive processes underlying problem solving with
hands-on science assessments will bear directly on the validity
of the conceptual framework, enrich the process of developing
and validating science performance assessments, and bring
data to bear on proposed interpretations that performance assessments
measure higher-order thinking.
Study II: Exchangeability of Alternative Methods of Measuring
Performance
Previous research has found that science performance assessments
are highly sensitive not only to the tasks sampled (Shavelson,
Baxter & Gao, 1993), but also to the method used to
measure performance: direct observation, notebooks, computer
simulation, and paper-and-pencil (e.g., Baxter & Shavelson,
1994). Assuming observation of hands-on investigations is
the benchmark, paper-and-pencil (i.e., multiple-choice and
short-answer) methods are the least exchangeable for the benchmark
(r < .30). In contrast, notebooks were found to be an adequate
surrogate for the benchmark (r > .80). To our surprise,
computer simulations fell in between notebooks and paper-and-pencil
methods as surrogates (r = .45); we had expected them to be as good
a surrogate to observation as notebooks. We have interpreted
these findings to indicate that: (a) a fundamental difference
between paper-and-pencil methods and the other methods is
that paper-and-pencil assessments do not react to the actions
taken by students (Shavelson, Baxter & Pine, 1992), and
(b) students have incomplete knowledge and skills in the domain
assessed and that the symbolic representations used to assess
that performance may or may not tap that partial understanding.
These results notwithstanding, many large-scale assessments
rely on scores obtained using paper-and-pencil methods (e.g.,
short answer) not only for practical reasons, but also because
the assessment developers assume that students' achievement
does not depend on the method used to measure achievement.
Finally, the exchangeability issue is also fundamental to the question
of how to accommodate large scale assessment to individual
differences in students (e.g., language minority, special
needs students): If different methods tap different aspects
of understanding in a subject matter, how should assessments
deal with accommodation (see Dalton et al., 1994)?
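The exchangeability coefficients reported above are correlations between scores from the benchmark method and scores from a surrogate method. A minimal sketch of that computation follows. It assumes a Pearson correlation (the text does not name the coefficient), and the function name and scores are hypothetical illustrations, not study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary within each method")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five students on two methods (illustrative only).
observation = [3, 5, 2, 4, 1]
notebook = [3, 4, 2, 5, 1]
print(round(pearson_r(observation, notebook), 2))
```

Under this reading, a surrogate would be judged against the benchmark by how close its coefficient comes to the values reported for notebooks.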
A. Exchangeability of Hands-on and Computer-Simulation
Science Investigations
Background and Research Objectives. Unlike the
paper-and-pencil methods, computer simulations have great
potential as surrogates for direct observation in that they
react to students' actions. Moreover, computer simulations
are affordable and offer logistical advantages over direct
observation and notebooks. Further exploration of this assessment
method, then, seems warranted.
The moderate correlation between computer simulation and
direct observation is intriguing, especially because students
do not have problems interpreting the two-dimensional computer
simulation as representing the three-dimensional real object
(Ruiz-Primo, Solano-Flores, Brown, Druker, & Shavelson,
1994). At least two reasons, then, might account for the unexpectedly
moderate correlations between direct observation and computer
simulation, and between notebook and computer simulation:
(a) Students' performances are inherently unstable from one performance-assessment
occasion to another, that is, unstable from the time that
the hands-on task is administered to the time the computer simulation
is administered (see Ruiz-Primo, Baxter, & Shavelson,
1993); or (b) Computer simulations pose different cognitive
demands than hands-on methods of measurement.
The purpose of this study, then, is to test these competing
explanations of performance differences: Are they due to
instability of students' performances, to cognitive demands
imposed by computer simulations, or to some combination?
Technical Approach. Two substudies are envisioned.
In Study IIA, students will take the same assessment [either
electric mysteries (EM) or bugs (B); see Shavelson et al.,
1991] using two methods: hands-on and computer simulation.
The assessments will be administered in two sequences (hands-on
followed by computer simulation, and computer simulation followed by hands-on)
and with two times between administrations (same day, two
weeks). Variations between scores produced by the different
versions and administration times will be examined statistically.
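The layout of Study IIA can be sketched by listing its cells. This is only an illustration: it assumes the assessment, the administration sequence, and the time between administrations are fully crossed, which the text does not state, and the labels and function name are hypothetical.

```python
from itertools import product

# Factors named in the text; fully crossing them is an assumption.
ASSESSMENTS = ["Electric Mysteries", "Bugs"]
SEQUENCES = [
    ("hands-on", "computer simulation"),
    ("computer simulation", "hands-on"),
]
INTERVALS = ["same day", "two weeks"]


def design_cells():
    """Return every assessment x sequence x interval combination."""
    return [
        {
            "assessment": assessment,
            "first": sequence[0],
            "second": sequence[1],
            "interval": interval,
        }
        for assessment, sequence, interval in product(
            ASSESSMENTS, SEQUENCES, INTERVALS
        )
    ]


for cell in design_cells():
    print(cell)
```

Listing the cells this way makes it easy to check that each method appears first equally often at each time interval.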
In Study IIB, students will carry out hands-on and the
computer simulation versions of EM and B while talking aloud
as they conduct their investigations. Interview protocols
and observations will be analyzed with respect to the solution
strategy used in conducting the investigations and the reasoning
underlying conclusions (e.g., systematic strategy or trial
and error; cf. Baxter et al., 1995) and cognitive demands imposed
by each method (e.g., Do students understand the problem
in the same way with both methods? Do explanations and reasoning
change from one method to the other?).
Anticipated Impact. Results of these substudies
will lead to a better understanding of the differences among
the assessment methods and the implications of using specific
methods in measuring students' achievement. Of particular
importance are issues of cost-savings and accommodation of
differences among students using alternative methods.
B. Searching for accommodations
Background and Research Objectives. The concept
of accommodation of individual differences in student performance
assumes that alternative methods measure the same construct.
The foregoing studies of exchangeability investigate part
of this assumption. Even if measurement methods are not completely
exchangeable (Baxter & Shavelson, 1994), they may measure
important, overlapping aspects of the construct of interest.
In that case, which method or combination of methods might
be relied upon to evaluate performance for different individuals?
Should the highest score be taken? Some combination of scores (e.g.,
discard the highest and lowest and take the average of the
remainder)? How are individuals' performances affected by
variation in exchangeability and scoring method? The appropriate
design of testing policies relies heavily on (a) finding
psychometrically justifiable score combinations, and (b)
identifying patterns of performance associated with different
students. This study explores these questions about accommodation.
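Two of the score combinations mentioned above can be written down directly. The sketch below assumes each student has one score per method and that all scores are already on a common scale, which the text does not specify. The function names and the example scores are hypothetical.

```python
def highest_score(scores):
    """Take the best score a student earned across methods."""
    return max(scores)


def trimmed_average(scores):
    """Discard one highest and one lowest score, then average the rest."""
    if len(scores) < 3:
        raise ValueError("need at least 3 scores to discard two of them")
    ordered = sorted(scores)
    remainder = ordered[1:-1]  # drop the lowest and the highest
    return sum(remainder) / len(remainder)


# Hypothetical scores for one student on five methods (illustrative only).
student_scores = {
    "observation": 4.0,
    "notebook": 5.0,
    "computer simulation": 3.0,
    "multiple-choice": 1.0,
    "short-answer": 2.0,
}
values = list(student_scores.values())
print(highest_score(values), trimmed_average(values))
```

Whether either rule is psychometrically justifiable is exactly the open question the study poses; the code only makes the candidate rules concrete.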
Technical Approach. Data from 300 students from
a previous investigation (e.g., Shavelson, Baxter, & Pine,
1992) will be reanalyzed. These data contain information about
students' performance on different investigations ("Paper
Towels", "Bugs", "Electric Mysteries"), across assessment methods
(observation, notebook, computer simulation, multiple-choice
and short-answer), as well as information on student characteristics
(e.g., ethnicity, cognitive ability). The reanalyses will focus
on: (a) patterns of performance across different measurement
methods associated with individuals and groups (e.g., ethnicity)
of students, (b) different combinations of scores reflecting
students' performance levels in the context of accommodation,
and (c) the psychometric properties of these new scores or
profiles.
Anticipated Impact. Results of this study will
provide a better understanding of the challenges posed by
accommodation and possible ways to address these challenges.
Study III. Validity of Concept-Map Representations of Knowledge
Structure
Alternative assessments are intended to provide evidence
about what students know and can do in a subject matter. Performance
assessments in science, for example, provide evidence especially
about what students can do in carrying out investigations.
Other assessment techniques, such as concept maps, are supposed
to provide information on another aspect of science learning:
development of knowledge structures, that is, representations of the
interrelation of important science concepts in students'
minds. Even though we know little about their psychometric
properties, concept maps have been used in large scale
assessment for this purpose (e.g., Lomask et al., 1992).
Background and Research Objectives. Our review
of literature has revealed the use of a myriad of mapping
techniques (e.g., Ruiz-Primo & Shavelson, in press),
varying in the tasks presented to students, in the response required
from students, and in methods of scoring. We suspect that
these variations may tap different aspects of students' cognitive
structures even though they all are interpreted as representing
the same structure. Before concept maps are formally used as assessment
tools we need to investigate whether they tap important aspects
of the students' knowledge in a subject domain. Preference
for one or another technique should be based on the accuracy
of their cognitive representations, psychometric qualities,
and practicality.
The purpose of this study, then, is to investigate concept
maps as tools for large-scale assessment of students' knowledge
structures. Among the questions addressed are: Do concept
maps actually provide evidence on students' propositional
knowledge of a topic? Do maps provide reliable scores? Do
different mapping techniques tap the same aspects of students'
conceptual knowledge; that is, do concept maps varying in
task and format produce exchangeable representations of structure?
Technical Approach. This study will examine the
cognitive processes involved in and the exchangeability of
mapping techniques. The selection of mapping techniques will
be based on criteria developed in our review, including differences
in the cognitive demands required by the task, the structure
of content domain to be mapped and practicality of the methods
used. Since the focus of the study is on large-scale assessment,
techniques that require one-to-one interaction between student
and tester will be excluded.
High school students will be asked to construct a map
on, say, "atomic structure," after they have been taught
that unit, under three different mapping conditions that
vary in the flexibility afforded students in constructing
the map: (a) construct a map on paper using circles as concepts
and labeled lines connecting the concepts; (b) construct a map
using "node" cards (i.e., cards with concept labels) so the
student can physically move the concepts around until a
satisfactory structure is reached and then draw the labeled
lines between the "node" cards (cf. White & Gunstone, 1991);
and (c) construct a map using a computer program that allows
students to manipulate icons. A sample of students (n=30 in each
condition) will be randomly assigned to each condition, and a
random subsample (n=10) in each condition will be asked to talk
aloud as they complete the task to reveal possible differences
in the cognitive processes involved in constructing the map.
The mapping techniques will be compared as to their psychometric
(i.e., reliability and validity) and practical (e.g., training
time, construction time, scoring time) properties.
Anticipated Impact. This study will provide information
on the practical and psychometric characteristics of different
mapping techniques. It will recommend promising concept-mapping
techniques for large-scale assessment, if any.
List of publications & presentations acknowledging
award:
- Baxter, G.P., Elder, A.D., & Glaser,
R. (1995). Cognitive analysis of a science performance
assessment. CSE Technical Report 398. Los Angeles, CA:
UCLA-CRESST.
- Baxter, G.P., Glaser, R., & Raghavan, K. (1994).
Analysis of cognitive demand in selected
alternative science assessments. CSE Technical Report
382. Los Angeles, CA: UCLA-CRESST.
- Baxter, G.P., & Shavelson, R.J. (1994). Science
performance assessments: benchmarks and surrogates. International
Journal of Educational Research, 21, 267-297.
- Cronbach, L.J., Linn, R., Brennan, R., & Haertel,
E. (1995 summer). Generalizability analysis for educational
assessments. Evaluation Comment, 1-29.
- Dalton, B., Morocco, C.C., Tivnan, T., & Rawson,
P. (1994). Effect of format on learning disabled and non-learning
disabled students' performance on a hands-on science assessment.
International Journal of Educational
Research, 21, 299-315.
- Embretson, S.E. (1995). A measurement model for linking
individual learning to processes and knowledge: Application
to mathematical reasoning. Journal of Educational Measurement,
32, 277-294.
- Glaser, R., & Bassok, M. (1989). Learning theory and
the study of instruction. Annual Review of Psychology,
40, 631-666.
- Glaser, R., Raghavan, K. & Baxter, G.P. (1992).
Cognitive theory as the basis for design
of innovative assessment: Design characteristics of science
assessments. CSE Technical Report 349. Los Angeles, CA:
UCLA-CRESST.
- Glaser, R., & Silver, E. (1994). Assessment,
testing, and instruction: Retrospect and prospect. CSE
Technical Report 379. Los Angeles, CA: UCLA-CRESST.
- Messick, S. (1995). Validity of psychological assessment:
Validation of inferences from persons' responses and performances
as scientific inquiry into score meaning. American Psychologist,
50, 741-749.
- Ruiz-Primo, M.A., & Shavelson, R.J. (in press).
Concept maps as potential alternative assessments in science.
Journal of Research in Science Teaching.
- Ruiz-Primo, Solano-Flores, Brown, Druker, & Shavelson,
1994.
- Shavelson, R.J., Baxter, G.P., & Gao, X. (1993). Sampling variability
of performance assessment. Journal of Educational Measurement,
30, 215-232.
- Shavelson, R.J., Baxter, G.P., & Pine, J. (1991).
Performance assessment in science. Applied Measurement
in Education, 4, 347-362.
- Shavelson, R.J., Webb, N.M., & Burstein, L. (1986).
The measurement of teaching. In M. Wittrock (Ed.), Handbook
of Research on Teaching. New York: Macmillan.
- Solano-Flores, G., Ruiz-Primo, M.A., Baxter, G.P.,
& Shavelson, R.J. (in preparation) Science performance
assessments with language minority students. Stanford, CA:
Stanford University School of Education.
- White, R., & Gunstone, R. (1991). Probing understanding.
New York: Falmer Press.
Construct Validity of Problem Solving Assessment
Award number: UCLA 0070-G-9H813
Period of support: 6/15/1996 - 6/30/2001
Amount of award: $318,987
Project summary: This project seeks to capitalize on
our prior research showing that psychologically meaningful
and useful subscores can be obtained from conventional achievement
tests designed for large-scale educational surveys. The
prior research analyzed the NELS: 88 math and science tests
at 8th, 10th, and 12th grade levels, using both statistical
analyses of data from the national sample and small-scale
interview studies of local high school students to obtain
think aloud protocols associated with task performance. The
results show that these subscores: 1) represent important ability
distinctions in high school mathematics and science achievement;
2) show significantly different patterns of relationships
with instructional, course taking, and educational program
variables as well as gender, ethnicity, and other student
background variables; and 3) are derivable from multiple choice
tests not designed for this purpose. This in turn suggests
a new approach to test design and validation in which content
x process distinctions in test specification tables are subjected
to multivariate statistical and intensive cognitive analysis
and redrawn to identify component ability constructs explicitly.
For details on this work, see Kupermintz, Ennis, Hamilton, Talbert,
& Snow (1995); Hamilton, Nussbaum, Kupermintz, Kerkhoven,
& Snow (1995); Kupermintz & Snow (submitted); Nussbaum,
Hamilton, & Snow (submitted); Hamilton, Kupermintz, &
Snow (submitted); and Hamilton, Nussbaum, & Snow
(in press).
Background and Research Objectives. The next step
in this research is to examine the construct validity of such
distinctions in an expanded array of both multiple choice
and constructed response tasks. Of particular interest are
performance assessment tasks explicitly designed to assess
different kinds of complex knowledge and problem solving
in large scale survey tests such as those used in NAEP and
various state assessments. For example, our previous studies distinguished
knowledge and reasoning subscores in both the NELS: 88 math
and the NELS: 88 science tests. The
results suggest that these or other constructs might be distinguishable
in tasks of the sorts now being developed for new performance
assessments and that performance tasks might be explicitly
designed to sharpen and improve such distinctions. Furthermore,
this possibility might exist in other subject-matter fields,
such as history and geography.
These prior results also show that instructional variables
such as teacher emphasis on understanding and higher order
thinking relate more to math knowledge development than
to math reasoning, and that the source of gender differences
in science may be located in the spatial mechanical reasoning
dimension of science problem solving tasks, rather than
in other aspects of science achievement. But we do not yet
understand the properties of math and science assessment
tasks that underlie such correlations. Construct validation research
must seek an understanding of such properties in the design
and evaluation of new as well as old forms of performance
assessment. Furthermore, such research may provide clues
for targeted instructional improvements. For example, if some
female students typically fail science assessment tasks requiring
spatial mechanical reasoning, understanding the properties
of such tasks may suggest how instruction can be revised to
improve the preparation of such students.
Other previous results indicate differential relations
between cognitive achievement subscores and measures of affective
and conative variables such as student motivation, anxiety,
learning style, and self regulation. Thus, a new multidimensional
approach to achievement test validation should include affective
and conative as well as cognitive reference constructs. Previous
work in our project has provided a catalogue of such constructs
relevant to research on instruction (see Snow & Jackson,
1994; Snow, Corno, & Jackson, in press). But so far
there have been no systematic attempts to examine affective
and conative aspects of new forms of educational assessment.
In brief then, the primary objective of this study is
to determine if knowledge and ability distinctions previously
found important in high school math and science achievement
tests occur also in other multiple choice and constructed
response assessments, particularly in those used in large-scale
educational surveys. A second objective is to examine alternative
assessment designs that would sharpen and elaborate such knowledge
and ability distinctions in such fields as math, science, and
history-geography, and suggest instructional intervention strategies
related to them.
Technical Approach. Our procedure builds on the
previous large-scale statistical analyses of multiple choice
and constructed response tasks from NELS: 88. We would extend
our local interview procedure to examine constructed response
tasks chosen from NAEP, but also similar tasks specifically
designed to afford use of different cognitive structures and
processes to bring out particular knowledge and ability distinctions.
We would also administer cognitive tests and conative-affective
questionnaires to groups of local high school students, to
select students who show different cognitive-conative-affective
aptitude profiles, and thus to obtain interview think-aloud protocols
on the selected tasks from students with known aptitude profiles.
Our interview procedures are designed to obtain in-depth
verbal descriptions of student thinking as they work through
constructed response tasks. These procedures are described
in Ennis, Kerkhoven, and Snow (1993) and Hamilton, Nussbaum,
and Snow (in press). Small-scale instructional interventions
will be designed as miniature experiments or case studies.
Statistical procedures for large scale analysis rely on routine
correlational and regression methods as well as new methods
for item factor analysis provided by Bock (see Bock, Gibbons,
& Muraki, 1988).
Time Line. Year 1: June 1996-May 1997. Complete
review and analysis of NELS: 88 math, science and history
survey data on knowledge and ability distinctions and their
cognitive and affective correlates. Review existing NAEP
and other large scale assessment instruments to identify
tasks in which these and other such distinctions might occur.
Plan experimental redesign of chosen tasks. Year 2: June
1997-May 1998. Plan and conduct local high school study
of assessment instruments along with test and questionnaire measures
of cognitive and affective reference constructs. Select subsamples
with different aptitude and achievement profiles and conduct
interview study of knowledge and reasoning contrast. Year
3: June 1998-May 1999. Redesign tasks to sharpen contrasts.
Replicate Year 2 study adding redesigned tasks and refinement
of reference measures. Again conduct subsample interview study.
Year 4-5: June 1999-May 2001. Summarize
validation evidence and implications for review by local teachers.
Conduct survey and interview study of local teachers to explore
links between knowledge and ability distinctions in assessment
tasks and instructional tasks and practices. Complete major
monograph on multidimensional knowledge and ability components
in achievement assessment, the redesign of assessment tasks
to represent them validly in large scale surveys, and their
use in targeting and evaluating local instruction in high
school math and science.
Anticipated Impact. This study is expected to influence
the way large-scale achievement assessments are designed
and validated. It provides the concepts and methodological
tools that will better align interpretations of achievement
test scores with the evidence of what they are measuring.
List of publications & presentations acknowledging
award:
- Bock, D., Gibbons, R., & Muraki,
E. (1988). Full-information item factor analysis. Applied
Psychological Measurement, 12, 261-280.
- Ennis, M., Kerkhoven, J.I.M., &
Snow, R.E. (1993). Enhancing the validity and usefulness
of large-scale educational assessments (Report No. P93-151).
Stanford University, Center for Research
on the Context of Secondary School Teaching.
- Kupermintz, H., Ennis, M.N., Hamilton,
L.S., Talbert, J.E., & Snow, R.E. (1995). Enhancing the
validity and usefulness of large-scale educational assessments:
I. NELS: 88 mathematics achievement. American Educational
Research Journal, 32, 525-554.
- Hamilton, L.S., Nussbaum, E.M., Kupermintz,
H., Kerkhoven, J.I.M., & Snow, R.E. (1995). Enhancing
the validity and usefulness of large-scale educational assessments:
II. NELS: 88 science achievement. American Educational
Research Journal, 32, 555-581.
- Kupermintz, H. & Snow, R.E. (submitted).
Enhancing the validity and usefulness of large-scale educational
assessments: III, NELS: 88 mathematics achievement to twelfth
grade.
- Nussbaum, E.M., Hamilton, L.S., &
Snow, R.E. (submitted). Enhancing the validity and usefulness
of large-scale educational assessments: IV. NELS: 88 science
achievement to twelfth grade.
- Hamilton, L.S., Kupermintz, H., &
Snow, R.E. (submitted). Enhancing the validity and usefulness
of large-scale educational assessments: V, NELS: 88 mathematics
and science achievement and affective interrelationships.
- Hamilton, L.S., Nussbaum, E.M., & Snow,
R.E. (submitted). Interview procedures for validating science
assessments.
- Snow, R.E. & Jackson, D.N. III (1994).
Individual differences in conation: Selected constructs
and measures. In H.F. O'Neill, Jr. and M. Drillings (Eds.)
Motivation: Research and Theory.
(pp. 72-99). Hillsdale, NJ: Lawrence Erlbaum Associates.
- Snow, R.E., Corno, L., & Jackson,
D.N. III (1995). Individual differences in affective and conative
functions. In D.C. Berliner & R.C. Calfee (Eds.)
Handbook of Educational Psychology. New
York: Macmillan.
On Enhancing Teachers' Formative Assessment Practices: The Case for Science Journals
Award number: NSF ESI-9910020
Period of support: 10/1/1999 - 9/30/2001
Amount of award: $100,000
Project summary: The purpose of this project is to
explore the use of students' science journals as a staff development
tool to improve formative assessment practices in science
classrooms at the elementary level by: 1) creating a conceptual
and practical framework that can guide effective use of journals
by teachers and students; 2) creating a pilot teacher enhancement
program (Ruiz-Primo, 1994) that, by using science journals
as an example of a classroom assessment, helps teachers reflect
on their formative assessment practices and provides them
with a framework that can be used to improve this practice;
and 3) exploring the impact of the implementation of the program
on the teachers' formative assessment practices and on student
performance.
Two products will result from this project, a preliminary
conceptual framework on the use of journals as a formative
assessment tool and a pilot teacher enhancement program (TEP)
for improving teachers' formative assessment practices. The
next step in this project will be the revision, implementation,
evaluation, and dissemination of the TEP in a larger context,
as well as the exploration of science journals as an assessment tool
for self- and peer-evaluation.
List of publications & presentations acknowledging
award:
- Li, M., Ruiz-Primo, M.A., Ayala, C.C., & Shavelson, R.J.
(2000, April). Study of the reliability and validity
of inferring students' understanding from their science
journals. Paper presented at the American Educational
Research Association (AERA) Annual Meeting, New Orleans.
- Ruiz-Primo, M.A., Li, M., Ayala, C.C., & Shavelson,
R.J. (2000, April). Students' science journals as an
assessment tool. Paper presented at the American Educational
Research Association (AERA) Annual Meeting, New Orleans.
Multidimensional Student Assessments for High School Mathematics and Science
Award number: NSF REC-9628293
Period of support: 8/15/1996 - 7/31/2001
Amount of award: $786,433
Project summary: This project develops a multidimensional
construct validation approach to the design and analysis
of student achievement assessments in high school mathematics
and science. The general aim is to elaborate the indicator
measures of student achievement and thus to enhance the
validity and usefulness of large-scale educational surveys.
Specific objectives are to: 1) identify knowledge and reasoning
components of the math and science tests, both multiple-choice
and constructed response, used in the NELS:88 High School Effects
Study (HSES) and relate these to knowledge and reasoning
components previously distinguished in the NELS:88 regular
national sample; 2) analyze the HSES constructed response
tasks in relation to student gender and ethnic differences,
as well as other student background and instructional variables;
3) extend the construct validation approach to selected items
from the National Assessment of Educational Progress (NAEP);
4) develop improved guidelines for other investigators using
NELS:88 data to apply multidimensional scoring of student
tests.
The results of this study should help develop improved indicators
of student achievement in science and mathematics by distinguishing
educationally and psychologically important components of
total scores and showing their differential relation to student
gender, ethnicity, other student background variables, and
course-taking and instructional differences. They should
also help develop the methodology of item construct validation
for the improvement of achievement assessments in future national
and international surveys.
List of publications & presentations acknowledging
award:
- Hamilton, L.S., Nussbaum, E.M., & Snow, R.E. (1997).
Interview procedures for validating science assessments.
Applied Measurement in Education, 10,
181-200.
- Kupermintz, H., & Snow, R. E. (1997).
Enhancing the validity and usefulness of large-scale educational
assessments: III. NELS:88 mathematics achievement to 12th grade.
American Educational Research Journal, 34, 124-150.
- Nussbaum, M., Hamilton, L.S., &
Snow, R.E. (1997). Enhancing the validity and usefulness
of large-scale educational assessments: IV. NELS:88 Science
Achievement to 12th Grade. American Educational Research
Journal, 34(1), 151-173.
- Hamilton, L.S. (1997). Identifying
differential item functioning on science achievement tests.
Paper presented at the annual meeting of the National Council on
Measurement in Education, Chicago.