By: Dr. Peter M. Nelson—Penn State University College of Education
Testing is not a new concept in education—we’ve used tests in some form since children were first crowded into one-room schoolhouses. Yet it’s difficult to ignore the increasingly prominent role of assessment in education. Schools are investing more time and resources than ever in different forms of assessment.
It’s important to remember the reason behind all this testing: assessments are designed to produce data that improve learning outcomes for students. To take advantage of assessments, schools need (1) assessments that provide technically defensible scores (i.e., reliable and valid) and (2) people on staff who know how to use and interpret those assessments. If either of those needs is unmet, schools must work to meet it. Below are a few reminders for those who select, administer, or interpret assessments in schools.
When we are faced with an important decision, we often spend a fair amount of time thinking about the issue at hand. For example, if we plan to buy a car, we might research the quality of different makes and models, or get familiar with some basic dealership terminology (e.g., what the heck does MSRP stand for? Do I need to know it before I look at a car? [Yes. You do.]). It’s unlikely that we will buy a car without doing some homework, and the same is true for selecting a test.
When evaluating a potential test, you’ll need to focus on two key components: reliability and validity.
- Reliability deals primarily with the consistency of scores. In other words, a test needs to demonstrate that it measures consistently. A test developer may do this by showing that scores are consistent within the same test, across alternate forms of the test, and on the same test given twice over a short period of time (see the sketch after this list).
- Validity evidence should establish that a test is not just measuring something, but that it is measuring the right thing. For example, if you’re interested in adopting a new math test, there should be evidence that the test you are interested in is related to other commonly accepted assessments of math achievement.
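For readers who like to see the numbers, here is a minimal sketch of how those two ideas are often quantified—as correlations between sets of scores. All of the student scores below are made up for illustration; the test names, sample size, and values are hypothetical, not drawn from any real assessment.

```python
# Two common pieces of evidence, sketched with hypothetical data:
#   - test-retest reliability: correlate scores from two administrations
#     of the same test given a few weeks apart
#   - criterion validity: correlate scores on the new test with scores on
#     an established, commonly accepted test of the same skill
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for ten students.
first_administration  = [12, 18, 25, 9, 30, 22, 15, 27, 11, 20]
second_administration = [14, 17, 26, 10, 29, 21, 16, 28, 12, 19]  # same test, two weeks later
established_math_test = [45, 52, 68, 38, 75, 60, 48, 70, 40, 55]  # widely accepted measure

print(f"Test-retest reliability: {pearson(first_administration, second_administration):.2f}")
print(f"Criterion validity:      {pearson(first_administration, established_math_test):.2f}")
```

In practice, test developers report these coefficients (and several others) in the technical manual; the point of the sketch is simply that “reliable” and “valid” are claims backed by data, not labels a publisher gets to assign.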
There are at least two ways you can do your homework on the technical quality of an assessment. First, most publicly available assessments are associated with a technical document (for example, the FAST technical manual is available on request). While these documents may contain a lot of jargon, they generally provide a description of different reliability and validity concepts as well as the existing evidence for the use and interpretation of a test. If you’re looking for help interpreting the jargon, consider connecting with someone who has training in that area (e.g., a school psychologist).
Second, just as we might turn to reputable sources when deciding on a car (e.g., Consumer Reports or Kelley Blue Book), we can access similar sources for researching assessments. For example, the National Center on Response to Intervention (NCRTI) provides a user-friendly summary of reliability and validity evidence for commonly used screening and progress monitoring assessments. The summary tables at NCRTI are a great spot for educators to learn more about commonly used assessments.
Fidelity in Administration: Assessment knowledge also includes a strong understanding of how to administer and (when applicable) score the assessments. Remember all those fancy statistics about reliability and validity you made your school psychologist review? They don’t mean much unless you use the test for its intended purpose and follow the guidelines for administration.
Whether you are using the data for screening or progress monitoring purposes, it’s likely that there are guidelines for that assessment’s practical use. The farther you stray from those guidelines, the less meaningful your results will become. It’s likely that you’ll find the guidelines in the technical document mentioned above.
Know the Basics for Interpretation: When making decisions with data, it’s helpful to understand what various scores mean (and don’t mean). In an increasingly data-heavy context, knowing how to interpret assessment data is a skill that is directly relevant for all stakeholders at the school.
While everyone has had some experience with tests, we’ve come a long way since “percent correct,” and it’s useful to know how to understand all the different kinds of data that are tied to assessments. With just one administration, teachers may be able to consider a student’s performance relative to a content-related criterion (e.g., what skills has he or she mastered?) and several normative criteria (e.g., how did he or she perform relative to same-aged peers at the national, state, district, and/or school level?). Without recognizing the potential utility of those comparisons, it may be difficult to make an informed decision about service delivery. Similar comparisons are necessary when interpreting student growth across multiple administrations of an assessment. As with other characteristics of the assessment, information on score interpretation should be readily available and clearly communicated in a technical or user manual.
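To make the criterion-referenced versus norm-referenced distinction concrete, here is a small sketch with hypothetical numbers—the student score, the mastery cut score, and the peer sample are all invented for illustration and are not taken from any particular assessment.

```python
# Two ways to interpret the same raw score, using hypothetical data:
#   - criterion-referenced: did the student reach a mastery cut score?
#   - norm-referenced: where does the student fall relative to a peer sample?

def percentile_rank(score, norm_sample):
    """Percent of the norm sample scoring at or below the student's score."""
    return 100 * sum(1 for s in norm_sample if s <= score) / len(norm_sample)

student_score = 42          # hypothetical raw score on one administration
mastery_cut   = 40          # hypothetical criterion for "skill mastered"
norm_sample   = [28, 33, 35, 38, 40, 41, 44, 47, 50, 55]  # hypothetical same-grade peers

print("Criterion-referenced:", "mastered" if student_score >= mastery_cut else "not yet mastered")
print(f"Norm-referenced: {percentile_rank(student_score, norm_sample):.0f}th percentile of the peer sample")
```

The same score can look strong against one comparison and unremarkable against the other, which is exactly why it helps to know which comparison a given report is making before acting on it.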
Dr. Nelson is an Assistant Professor of School Psychology at Penn State University. He completed his doctoral training in school psychology at the University of Minnesota after obtaining his M.A. in education from the University of Mississippi. A former high school teacher, his primary research interests focus on data-based decision-making, prevention, and intervention in the classroom setting. He has published and presented on issues related to effective math intervention, classroom environment assessment, teacher development, screening, and progress monitoring.