The Evaluation Team collected students' oral proficiency data throughout the evaluation process. Students' oral proficiency in the target language was examined by three means of assessment: teacher FLOSEM ratings, student language self-ratings, and COCIs completed with the case study students. First, classroom language teachers were asked to assess every student's oral proficiency with the Stanford FLOSEM three times a year during the 1994-95 and 1995-96 school years. Second, at the end of the 1995-96 school year, every high school student was asked to rate his or her own oral proficiency using the Stanford FLOSEM and to complete another self-rating proficiency scale for four language skills developed by the Evaluation Team (the Sung Language Assessment Questionnaire). Third, language evaluators visited each school site in the spring and conducted face-to-face interviews with our designated case study students using the COCI.
The results of the oral proficiency assessment will be described to show how much progress students made during one school year and also across the various levels of language instruction. Growth in language proficiency will first be described for each assessment instrument, and the correlations between the different types of assessment instruments will then be examined.
Classroom language teachers were asked to rate their students' oral proficiency using the FLOSEM in September, January, and May. Since some teachers were unable to complete the mid-year rating, only the September and May FLOSEM ratings for the 1995-96 school year were compared. Proficiency scores were collected from 1,319 students at the beginning of the school year and from 1,189 students at the end of the school year. Comparisons of the two FLOSEM scores were computed by the paired t-test procedure, and the results showed that students made significant progress in their oral language proficiency within one school year, t (1,131) = 24.747, p < .0001. Significant improvement in oral proficiency ratings was found for both the elementary school students [t (369) = 7.237, p < .0001] and the high school students [t (740) = 27.035, p < .0001]. Detailed analyses of the teacher FLOSEM ratings between the beginning and end of the school year by each language program showed that students in most of the language programs made significant gains in oral proficiency (p < .05) within a school year, the exceptions being the elementary and middle school Cantonese programs. Mean FLOSEM scores collected at the beginning and end of the school year, with t-test results for significant differences, are shown in Table 2.
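The paired t-test procedure used above can be sketched in a few lines. The ratings below are invented for illustration and the helper function is only a sketch of the computation; the Evaluation Team's actual analyses would have been run in a standard statistics package, which would also supply the p-value from the t distribution.

```python
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t-test statistic for matched before/after ratings.

    Returns (t, df); the p-value would come from a t distribution
    with df degrees of freedom in a full statistics package.
    """
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / n ** 0.5)
    return t, n - 1

# Hypothetical September and May FLOSEM ratings for six students.
september = [8, 10, 12, 9, 11, 13]
may       = [11, 12, 15, 10, 14, 16]
t, df = paired_t(september, may)
```

Because each student is rated twice, the test is computed on the within-student difference scores rather than on the two group means, which is why the degrees of freedom reflect the number of matched pairs.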
Students' proficiency growth at each level of instruction is also clearly evident in Table 2. Students in levels I through IV of most language programs showed a significant increase (at least p < .05) in oral proficiency from September to May. Significant growth in oral proficiency was not observed, however, in high school Russian levels I and III or in the third and fifth grades of the elementary Japanese programs, where teachers recorded little or no growth in the target language from September to May.
Growth in Target Language Across Language Levels
Students' FLOSEM scores were also examined by level of instruction. For high school students, FLOSEM scores collected at the end of the 1994-95 and 1995-96 school years were analyzed by level of instruction (e.g., Japanese I and Japanese II). One-way Analysis of Variance (ANOVA) results showed a significant difference in FLOSEM scores by instructional level, F (4, 601) = 334.093, p < .0001 for the 1994-95 school year, and F (4, 535) = 278.199, p < .0001 for the 1995-96 school year, indicating a significant cross-sectional increase in students' oral proficiency by language level.
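A one-way ANOVA of this kind compares the variability between instructional levels to the variability within them. The sketch below, with invented scores for three hypothetical levels, shows how the F ratio is formed; the real analyses, with their F distributions and p-values, would have been run in a statistics package.

```python
from statistics import mean

def one_way_anova_F(groups):
    """Between-groups vs. within-groups mean-square ratio.

    Returns (F, df_between, df_within); the p-value comes from the
    F distribution with these degrees of freedom.
    """
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Hypothetical end-of-year FLOSEM scores for three instructional levels.
levels = [[8, 9, 10, 9], [12, 13, 12, 14], [17, 18, 16, 17]]
F, dfb, dfw = one_way_anova_F(levels)
```

A large F means the level means are far apart relative to the spread of scores within each level, which is exactly the cross-sectional increase by instructional level reported above.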
Thus, the teachers' FLOSEM ratings presented in Table 2 showed steady improvement in students' target language proficiency across levels of instruction. One feature of the growth between levels of instruction that requires highlighting is found between levels III and IV of the high school Mandarin, Korean, and Russian programs. Although the sample sizes are extremely small for two of these programs [Mandarin level IV (N = 3) and Russian level IV (N = 2)], the enrollment in Korean level IV was large enough (N = 29) to argue that the large FLOSEM increase in these level IV classes was most likely due to the greater proportion of heritage language students enrolled in these advanced classes. For example, over 90% of the students in the Mandarin, Korean, and Russian level IV classes were from heritage language background households. This contrasts with the Japanese level IV classes, where only about 10% of the students were of full or mixed Japanese heritage.
Growth in Target Language Proficiency for Continuing Students
For students who continued in their language instruction from the 1994-95 to the 1995-96 school year, FLOSEM ratings taken at the end of each school year were compared to examine language development across language levels. A paired t-test comparison was computed for the 569 continuing students for whom ratings were available for the two-year period. The results showed that continuing students, regardless of language level, made significant progress in their oral proficiency between the end of the 1994-95 and the 1995-96 school years, t(568) = 16.266, p < .0001. Significant growth in target language proficiency was found for both elementary and high school students [t(220) = 2.193, p = .029 for elementary students, and t(339) = 25.831, p < .0001 for high school students]. As noted in Table 3, detailed analyses of growth in foreign language oral proficiency for continuing students by language program and instruction level showed that students in most of the language programs and levels made significant improvements (p < .05) across the two school years, except for the elementary Cantonese program and several of the high school levels. We had FLOSEM ratings for only three continuing students in the elementary Cantonese program; although their ratings increased across school years, there were too few students for statistical purposes.
A closer look at the Russian program showed that a group of students from one school site repeated the level I class, and their FLOSEM ratings for the two academic years showed no significant improvement in oral proficiency across the two-year period. Students who advanced from Russian level II to level III showed some improvement, with mean FLOSEM ratings increasing from 9.75 to 10.38, but the increase was not statistically significant. The minimal growth in oral proficiency among students in the Russian classes contrasted with the significant improvements found for high school students in the other language programs. Table 3 shows the mean FLOSEM ratings collected for all students in all language programs and levels at the end of the 1994-95 and 1995-96 school years, along with t-test results for significant differences.
Comparisons of 1994-95 and 1995-96 Target Language Proficiency Development
In addition to comparing individual student progress within and across the school year by means of paired t-tests, each language program's "Overall Proficiency Level" was examined by comparing this year's proficiency results with last year's student outcomes. Table 4 summarizes each language program's FLOSEM ratings by language level for the 1994-95 and 1995-96 school years. Perusal of Table 4 shows large gains in student oral proficiency for the high school Japanese language programs; the changes were especially apparent in Japanese levels II and IV. Another very noticeable finding in our data was an enrollment increase, especially in the Japanese language programs. Although some enrollment increases were observed in the Mandarin and Korean programs, the sharpest increase was found in the Japanese programs, which grew from 310 students in 1994-95 to 466 students in 1995-96.
High School Students' Self-Rated Proficiency
A total of 708 high school students participated in the self-rating proficiency task -- 407 students in Japanese, 72 in Mandarin Chinese, 148 in Korean, and 81 in Russian. Students' self-ratings on the FLOSEM were compared, by means of a Pearson product-moment correlation, with their teachers' FLOSEM ratings collected at the same time. When students' ratings across all language programs were compared to their teachers', the students' self-rated FLOSEM scores and the teachers' FLOSEM ratings were significantly correlated, r = .708, p < .0001. Significant correlations between students' and teachers' ratings were also found within each of the four language programs: r = .567, p < .0001 for Japanese; r = .435, p < .001 for Mandarin; r = .767, p < .0001 for Korean; and r = .621, p < .0001 for Russian (see Table 5).
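The Pearson product-moment correlation underlying these comparisons can be sketched as follows. The rating lists are invented for illustration; the actual analyses would have been computed in a statistics package over the full student rosters.

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two rating lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance of the paired ratings over the product of their
    # standard deviations (the n's cancel, so raw sums suffice).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical FLOSEM self-ratings vs. teacher ratings for five students.
self_ratings    = [10, 14, 12, 18, 16]
teacher_ratings = [9, 12, 11, 16, 13]
r = pearson_r(self_ratings, teacher_ratings)
```

Note that r measures agreement in rank ordering, not in absolute level: two sets of ratings can correlate highly even when one rater is consistently more generous, which is exactly the student-over-teacher pattern described in the next paragraph.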
An interesting finding in these ratings was that students' self-ratings (Mean = 15.121) were consistently higher than the teachers' ratings (Mean = 13.056), and the difference between students' and teachers' ratings proved to be statistically significant, t (1,646) = 11.143, p < .0001. As can be noted in Table 5, the rating differences between students and teachers were greater at the lower instructional levels (levels I, II, and III), while for upper-level students there was almost no difference between students' and teachers' ratings. A similar pattern was found in every language program except Mandarin Chinese, where the teachers' ratings were higher than the students', though not significantly so. This pattern is clearly depicted in Figure 1. Table 5 provides detailed information on student-teacher rating differences in each language program by level of instruction.
Self-rated proficiency ratings on the Sung Language Assessment Questionnaire were subjected to a confirmatory factor analysis to determine whether the four blocks of items [speaking, listening, reading and writing] demonstrated the same statistical consistency with the high school students as they did with the original sample of adults for whom the scale was initially designed. This statistical procedure is a necessary step to ensure the validity of this type of questionnaire. Factor analysis confirmed that the 10 items which formed each of the four language clusters loaded as predicted and that the measure could be used with the high school data.
The eigenvalues for a one-factor solution for each of the four scales were very high (over 5.0) and percent of total variance explained for each scale was: Listening (56.893%), Speaking (63.004%), Reading (65.622%), and Writing (69.533%). Mean scores for each language skill on the Sung Language Assessment Questionnaire were computed for all participating students. Pearson correlations were computed to examine the inter-relationship between mean scores on each of the four language skills. Inter-correlations between all four language skill areas were significant (p < .0001 level), and this was true for all four language programs. When students' self-rated proficiency ratings were examined by instructional level, the inter-correlations among the four language skills were again high and significant (p < .001 level) for all instructional levels.
Students' self-ratings on the Sung Language Assessment Questionnaire were then compared to their self-rated FLOSEM scores. Self-rated FLOSEM scores correlated significantly (p < .0001) with all four self-rated scales on the Sung Language Assessment Questionnaire. Significant correlations between the two instruments were also obtained when self-rating scores were examined for each of the four language programs. In every case, the correlations were highly significant (p < .0001). However, students' self-rated proficiency on the four language skills did not always correlate significantly with teachers' FLOSEM ratings.
Self-rated FLOSEM scores and self-rated proficiency on the Sung scale for the four language skills were examined by language program type, level of instruction, and students' ethnic heritage background. First, students' self-rated proficiency differed significantly by level of language instruction, F (4, 670) = 106.181, p < .0001: the higher the instructional level, the higher the self-rated FLOSEM scores (see Figure 1). This instructional level difference on the self-rated FLOSEM was found between every level of instruction from the first year through Level 5. Significant instructional level differences were also found on the four language skill scales of the Sung Language Assessment Questionnaire (see Figure 2): F (4, 679) = 69.530, p < .0001 for listening comprehension; F (4, 680) = 72.404, p < .0001 for speaking; F (4, 679) = 60.713, p < .0001 for reading comprehension; and F (4, 677) = 80.094, p < .0001 for writing. These instructional level differences held even when the significance level was set very stringently (p < .001). The only contrast that was not significant on the Sung scale was between Level 2 and Level 3: self-rated proficiency did not differ significantly between these two levels, and this was true across all four language skills. There was, however, a significant difference between these two levels on the self-rated FLOSEM, p = .004, and also on the teachers' FLOSEM ratings, p < .0001.
Students' self-rated proficiency also differed significantly by language program type. The significant difference was found with the self-rated FLOSEM, F (3, 693) = 92.238, p < .0001, and also with the Sung instrument for all four language skills: listening [F (3, 703) = 94.178, p < .0001]; speaking [F (3, 704) = 83.400, p < .0001]; reading [F (3, 703) = 83.001, p < .0001]; and writing [F (3, 701) = 86.785, p < .0001]. These significant results were all due to the fact that students in the Korean program rated their proficiency significantly higher than students in the other three language programs (see Figure 3).
Students' ethnic heritage was also entered into the analysis of self-rated proficiency, and there was a significant difference in self-rated proficiency by ethnic heritage. First, FLOSEM self-rating scores differed by ethnic background, F (2, 503) = 86.799, p < .0001, with the highest FLOSEM ratings given by ethnic heritage students (M = 19.876), followed by mixed heritage students (M = 14.208), and the lowest ratings given by non-ethnic students (M = 13.537). The same pattern of significant proficiency differences was obtained on the Sung instrument for the four language skills: listening [F (2, 510) = 105.730, p < .0001]; speaking [F (2, 511) = 78.879, p < .0001]; reading [F (2, 510) = 94.729, p < .0001]; and writing [F (2, 509) = 92.986, p < .0001] (see Figure 4). Tukey's HSD multiple comparisons showed that the proficiency of ethnic heritage students was significantly higher than that of the other two groups, whose proficiency did not differ significantly from each other.
COCI Ratings by Language Evaluators
Students' communicative proficiency was also measured by means of the Classroom Oral Competency Interview (COCI) during the last several weeks of the 1994-95 and 1995-96 school years. Since this method of measuring oral proficiency requires a face-to-face interview lasting around seven minutes, COCI interviews were conducted only with our six case study students from each language program and language level. The Evaluation Team's language evaluators visited each school site in late May to conduct the COCI interviews. COCIs were administered to a total of 375 students over the last two years: 199 students in the 1994-95 school year and 245 students in the 1995-96 school year. Of these, there were 71 continuing case study students for whom we have COCIs for both school years. These continuing students were all high school students: in the 1994-95 school year, COCI ratings were not assigned to elementary school students after their interviews were completed, since the COCI was technically designed for use with students in high school level 2 or higher foreign language classes. In 1995-96, however, we decided to use the COCI rating scale with the elementary level students as well.
The COCI scale has ten grades, and each grade was assigned a numeric code that could then be used in statistical analysis. The numeric scale devised for the COCI ratings ranged from 1 for "Prefunctional", through 2 to 4 for "Formulaic Low" to "Formulaic High", 5 to 7 for "Created Low" to "Created High", and 8 to 10 for "Planned Low" to "Planned High." Overall, the 1995-96 COCI ratings for case study students (M = 3.208; n = 245) were slightly lower than the 1994-95 ratings (M = 3.538; n = 199). Even after deleting those elementary school students who received low COCI ratings and who had no ratings in the 1994-95 school year, the 1995-96 COCI ratings were still lower (M = 3.450; n = 189) than the 1994-95 ratings. Detailed COCI scores for each language program are shown in Table 6. The 1995-96 ratings are lower than the 1994-95 scores for every language program except Mandarin, where the most recent COCI ratings were slightly higher than in the previous school year. The lower portion of Table 6 also provides the COCI ratings of the 71 continuing case study students. For all continuing case study students, COCI ratings were significantly higher this year than in 1994-95, t (70) = 3.972, p = .001. When comparisons were made within each language program, however, the only significant difference between the two time periods was found for the Russian language students.
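The numeric coding of the COCI grades can be sketched as a simple lookup table. The band endpoints ("Prefunctional", "Formulaic Low"/"High", etc.) follow the text; the middle label in each band ("Formulaic Mid" and so on) is an assumed name for illustration only, and the sample ratings are invented.

```python
# Sketch of the 10-point numeric coding described above. The "Mid"
# labels are hypothetical placeholders for the middle grade of each band.
COCI_CODES = {
    "Prefunctional": 1,
    "Formulaic Low": 2, "Formulaic Mid": 3, "Formulaic High": 4,
    "Created Low": 5, "Created Mid": 6, "Created High": 7,
    "Planned Low": 8, "Planned Mid": 9, "Planned High": 10,
}

def mean_coci(grades):
    """Mean numeric COCI rating for a list of grade labels."""
    codes = [COCI_CODES[g] for g in grades]
    return sum(codes) / len(codes)

# Hypothetical ratings for three case study students.
ratings = ["Formulaic High", "Created Low", "Formulaic Mid"]
```

Coding the ordinal grades this way is what makes means, t-tests, and ANOVAs such as those reported for the COCI possible, at the cost of treating the steps between grades as equal intervals.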
Using the COCI ratings shown in the top portion of Table 6, a one-way analysis of variance by language program type was computed for each school year separately. A significant difference was found for each school year: F (3, 195) = 10.215, p < .0001 for the 1994-95 COCI ratings and F (3, 241) = 16.051, p < .0001 for the 1995-96 ratings. Multiple comparisons of the language program types showed that case study students in the Korean program had significantly higher COCI ratings than students in all of the other language programs (p < .0001); no other comparisons were significant. This finding was identical in both time periods.
Since the COCI was administered to case study students by language evaluators from the Evaluation Project, the COCI ratings were next correlated with the classroom teachers' Stanford FLOSEM ratings of these same students. The top portion of Table 7 depicts the correlations between the COCI and FLOSEM ratings for the 1994-95 school year by language program. As can be seen in Table 7, all of the correlations except for the Russian program (r = .493) were positive and greater than r = .750, and all attained statistical significance. Similar correlations computed for the 1995-96 school year yielded similar findings, this time even for students in the Russian programs. These correlations can be found in the lower portion of Table 7.
Documentation of language acquisition and growth across instructional time was the central concern of this chapter. Using two rating measures -- the Stanford FLOSEM and the Sung Language Assessment Questionnaire -- and the COCI, we found that, with few exceptions, students in all four language programs, at all grade levels and levels of instruction, and on each of our language assessment instruments showed improvement within a school year (September to May) and across school years. The FLOSEM ratings obtained from teachers indicated that teachers saw improvement in their students' oral proficiency in the language of instruction. Since the teachers knew that their program was being evaluated, it is possible that they were biased in favor of positively evaluating students' oral performance on the September to May FLOSEM ratings. However, the fact that students' own self-ratings on the FLOSEM were correlated with their teachers' ratings lends confidence to the validity of the FLOSEM as a measure of oral proficiency and as a technique for assessing growth in the language following one and two years of instruction.
The fact that the students' self-ratings on the Sung Language Assessment Questionnaire also correlated with their self-rated FLOSEM scores provides another layer of evidence that both teachers and students recognized the growing linguistic competence shown by students learning a LCTL. Finally, the COCI ratings obtained by independent language evaluators confirm that case study students across all language programs and levels of instruction do make progress in becoming orally proficient in the language that they are learning in school. Again, high and significant correlations between the COCI and teacher ratings on the FLOSEM provide concurrent validation of both instruments as easy-to-use measures of oral proficiency.
Finally, as a cautionary note, it is important to mention that some statistical comparisons failed to attain significance. This is due to the extreme heterogeneity of teachers, students, languages, and levels of instruction in the data. Nonetheless, it seems that most students, at all grade levels and in most language-by-level-of-instruction combinations, showed progress in learning a LCTL. Also observed in the findings was a significant advantage in oral proficiency attainment for ethnic heritage students when compared to non-ethnic heritage students. This attests to the linguistic advantage that ethnic heritage students may have in a language program if they already know the language prior to instruction and/or if they are able to obtain assistance in the language from someone other than the teacher. We will address issues of ethnic heritage background in the results chapters that follow.