A Comprehensive System for the Evaluation of Schools
William J. Webster
School Effectiveness Indices
The final tier of the accountability system is the most important from the standpoint of defining and rewarding outstanding schools. Inherent in the task of identifying outstanding schools are two complex issues:
- how to define effectiveness, and
- how to develop a model to assess effectiveness.
For accountability purposes, the only fair and equitable method of comparison among schools is one that statistically adjusts the outcome variables for the important inputs that relate to those outcomes but are not under the control of the schools. The difference between predicted and actual achievement can then be interpreted both as a comparison with other statistically similar schools and as the school's own effect on achievement. It is important to note that a longitudinal database is necessary for these types of studies, since cohorts must be used in the analyses.
The Anatomy of Effectiveness Indices
The school effectiveness methodology, as implemented in the Dallas Independent School District, defines a school's effectiveness in terms of exceptional measured performance above or below that which would be expected across the entire District. When a school's population of students departs markedly from its own pre-established trend or from the more general trend of similar students throughout the District, this departure is attributed to school effect. The problem of measuring a school's effect, then, becomes one of establishing expected student levels of accomplishment on the various important outcome variables, setting levels of performance based on these expectations, and determining the extent to which the school's students, on average, exceed or fall short of expectation. The procedures use hierarchical linear modeling with student level variables and multiple regression analysis with aggregate school level variables to compute prediction equations by grade level for each outcome variable; those equations are then applied within schools to obtain gains over expectation. Relative weights are assigned to the outcomes by the Accountability Task Force. Once weighted levels of performance have been determined, the methodology provides an indicator of how well a school performs relative to other schools throughout the District.
The same targets that were used in the School Improvement Plan and District Improvement Plan processes are used as outcome variables in the school effectiveness indices. Thus schools work on improving target variables in an absolute sense through their School Improvement Plans and are judged both in terms of meeting their goals and in terms of a normative rank through the effectiveness indices. The effectiveness indices are also used to establish meaningful targets for the School Improvement Plan.
School performance on the effectiveness indices is considered in terms of overall District patterns on the important outcome variables. If the District experiences a year of greatly increased achievement, individual school ranks on the effectiveness indices are not so important as long as improvement is shown. However, because of the link between the effectiveness indices and School Improvement Plan goals, should such a year occur most schools would meet their School Improvement Plan objectives. The emphasis of the methodology is on the valid identification of effective schools. Once effective schools are reliably and validly identified, detailed studies can be done of the process variables that contributed to their effectiveness.
The first step in developing the effectiveness methodology involves what educational practitioners have called "leveling the playing field". The Accountability Task Force, as well as most practitioners, was extremely concerned that all schools, regardless of the students that they served, have an opportunity to rank high on the effectiveness indices if they improved. Thus, the first step in developing the equations was to eliminate the variance in outcomes associated with student contextual variables over which the schools had no control. To accomplish this, each outcome and predictor variable was regressed on a set of important student background variables and their interactions to produce a set of residuals for each of the predictor and outcome variables (Webster, Mendro, and Almaguer, 1992).
The basic OLS regression model is generated from the standard OLS equation. This is represented by equation 1 for student-level variables:
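A generic form consistent with the symbols described in the text can be written as follows. This is a sketch only: the number of predictors k and the coefficient names β are assumptions, not the author's notation.

```latex
\begin{equation}
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + r_i
\tag{1}
\end{equation}
```

Here i indexes students, Y_i is the student's value on the variable being adjusted, X_{1i} through X_{ki} are that student's background variables, and r_i is the residual.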
In this model, Y represents any of the predictor or outcome variables in the system. The Xs represent the predictor variables used in the first-stage equations. (These are student demographic variables, without reference to school at this point.) After a coefficient is estimated for each X, the model is solved for each student and the value of the residual r_i is determined. This value of r represents the portion of the student's score that cannot be attributed to the background variables, including any individual error for the student on the particular outcome measure Y. The equation is solved for each of the possible Y variables, and residuals are determined for each student. Student level variables included in the first stage are:
The stage one equations appear as follows:
The reader will note that the first stage OLS regression equations include first and second level interactions. These equations account for between nine and twenty percent of the variance in student achievement.
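The first-stage adjustment described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the District's implementation. It assumes that "first and second level interactions" means two-way and three-way products of the background variables. The function names and the three unnamed indicator columns in the usage note are hypothetical.

```python
import itertools

import numpy as np


def design_matrix(background):
    """Build intercept, main effects, and two- and three-way products.

    `background` is an (n_students, k) array of student background
    variables. Treating interactions as products of columns is an
    assumption made for this sketch.
    """
    n, k = background.shape
    columns = [np.ones(n)]  # intercept
    for order in (1, 2, 3):
        for idx in itertools.combinations(range(k), order):
            columns.append(np.prod(background[:, list(idx)], axis=1))
    return np.column_stack(columns)


def first_stage_residuals(y, background):
    """Regress y on the background design matrix and return residuals.

    The residual is the part of each student's score that the
    background variables do not account for.
    """
    X = design_matrix(background)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta
```

In use, the same function would be applied to every outcome and every prior-year predictor, for example `first_stage_residuals(scores, background)` with `background` holding three hypothetical 0/1 indicator columns. The resulting residuals are uncorrelated with every column of the design matrix.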
Hierarchical linear modeling is then used on the residuals of both the outcome and predictor variables. Student level equations are developed utilizing individual student data rather than school means. Satisfactory prediction was achieved in all cases without having to go back more than one year (R² > .70). This maintained the degrees of freedom associated with the equations. A previous model that was utilized by the District in 1984 used a variant of time-series analysis, but since this model required at least three years of historical data, it suffered from severe subject mortality due to a high student mobility rate (Webster and Olson, 1988). Sanders and Horn (1995) use five years of data in their equations in the Tennessee value-added system, but they estimate missing data points as part of their procedures.
The standard equations for the random effects HLM model are given in equations 2-4 for a single level 1 predictor and a single level 2 conditioning variable. Note that level 1 contains a model of student level data and level 2 contains a model of school level data. The two types of data are modeled simultaneously in an HLM model. As in the case of the OLS regression model, these equations can be expanded by the inclusion of more level 1 student predictor variables (X) and more level 2 school conditioning variables (W). School effects are estimated directly from shrinkage-adjusted empirical Bayes residuals resulting from the application of the HLM model (Bryk and Raudenbush, 1992). A series of research papers developed by Dallas staff contains more explicit formulations of the model under many different conditions. The interested reader is referred to Webster et al. (1995, 1996, 1997, 1998), Mendro et al. (1995), Orsak et al. (1997), or Weerasinghe et al. (1997) for more detailed models and discussions of these applications.
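For orientation, the textbook two-level random effects model with one level 1 predictor X and one level 2 conditioning variable W can be written as below. The notation follows the common Bryk and Raudenbush convention. Whether it matches the author's equations 2-4 term for term is an assumption.

```latex
\begin{align}
Y_{ij} &= \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \tag{2} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01} W_j + u_{0j} \tag{3} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} W_j + u_{1j} \tag{4}
\end{align}
```

Here i indexes students and j indexes schools. Y_{ij} and X_{ij} would be the first-stage residuals of the outcome and the prior-year predictor, r_{ij} is the student level error, and u_{0j} and u_{1j} are the school level random effects. In this form the school effect would typically be read from the empirical Bayes estimate of u_{0j}.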
The HLM models utilized in Dallas are two-stage, two-level random models that include a number of school level contextual variables. These variables include:
The stage two equations appear as follows:
To summarize, the models have the following steps:
- School variables are predicted in a regular OLS regression using two years of prior outcome variable data. Effectiveness scores are computed from the residuals of the regression. School level variables have not been discussed in this paper but involve the use of basic OLS regression models to obtain school level residuals. (For details about the school level models see Webster et al., 1998.)
- Student variables are predicted from two-stage, two-level modified OLS regression and HLM models.
- The first stage of the student variable process regresses outcome variables and prior predictor variables against student-level concomitant variables, adjusts the residuals for homogeneity, and provides residuals for the HLM stage.
- The second stage of the student variable process uses one year of prior level residuals from the first stage to predict the outcome residuals from the first level in a two-level HLM random effects model with an array of school-level conditioning variables at the second-level.
- The results of each HLM analysis by student outcome variable and the school-level outcome variable OLS regressions are standardized and weighted by Accountability Task Force determined weights.
- The weighted results are combined to give a total school effectiveness estimate for each school.
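The last two steps can be sketched as follows. The source does not give the standardization or the combining rule, so this sketch assumes z-scores across schools and a weighted average normalized by the sum of the weights. The function name, outcome names, and weight values are all hypothetical.

```python
import numpy as np


def combine_effects(effects, weights):
    """Combine per-outcome school effects into one index per school.

    effects: dict mapping outcome name -> sequence of school effect
             estimates, one per school, in the same school order.
    weights: dict mapping outcome name -> relative weight.

    Each outcome is converted to z-scores across schools (assumed
    standardization), multiplied by its weight, summed, and divided
    by the total weight (assumed combining rule).
    """
    n_schools = len(next(iter(effects.values())))
    combined = np.zeros(n_schools)
    total_weight = 0.0
    for name, values in effects.items():
        values = np.asarray(values, dtype=float)
        z = (values - values.mean()) / values.std()
        combined += weights[name] * z
        total_weight += weights[name]
    return combined / total_weight
```

For example, `combine_effects({"reading": [1, 2, 3], "math": [3, 2, 1]}, {"reading": 2, "math": 1})` gives the first school a negative index and the third a positive one, because the more heavily weighted reading effect dominates.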
The Accountability System includes a number of criterion variables. Student level variables include Iowa Tests of Basic Skills, grades K through 8, reading and mathematics; Tests of Achievement and Proficiency, grade 10, reading and mathematics; student attendance; Texas Assessment of Academic Skills, grades 3 through 8 and 10, reading, mathematics, writing, social studies, and science; Texas Assessment of Academic Skills-Spanish, grades 3 through 6, reading, writing, and mathematics; Spanish Assessment of Basic Education, grades 1 through 6, reading and mathematics; Assessments of Course Performance, grades 9 through 12, reading/language arts, mathematics, social studies, science, world languages, and ESOL; Woodcock-Munoz Language Survey, grades 1 through 6; Scholastic Aptitude Test, verbal and quantitative; American College Test; and Preliminary Scholastic Aptitude Test, verbal and quantitative. School level variables include promotion rate; graduation rate; percent tested on the Preliminary Scholastic Aptitude Test, the Scholastic Aptitude Test, and the American College Test; dropout rate; percent of students enrolled in pre-honors and honors courses; percent of students enrolled in advanced placement courses; and percent of students passing advanced placement examinations.
Authentic Assessment and Performance Testing
Schools are encouraged to use portfolios, protocol analysis, and other forms of authentic assessment in monitoring their programs. This information can then be used to provide evidence of accomplishment in instances where the more standard types of assessment fail to show progress. Performance testing was being built into the District's Assessment of Course Performance (ACP) tests. The ACPs are standard final examinations in 143 courses, grades 7-12. Each examination was to run two hours, with one hour of multiple choice items and one hour of performance tests. The performance testing was vetoed by school administrators because it was too much work and did not help their students drill and practice for the state test.
While it is not certain that the necessary reliability across scorers on the performance tests is attainable, it is important that the message be communicated to teachers that the kinds of skills and activities measured by performance tests are the kinds of skills and activities that the District wants them to teach their students. Early evidence on performance tests suggests that they are much more difficult than the average multiple choice test (Dryden, 1991). Figure 2 shows the formative and summative data currently available to the schools. Indicators that are collected centrally and provided to schools are specified with an "E". Formative indicators that should be part of a school's "action research" process are specified with a "C".