Not All Tests Are Bad. Some Are.

March 18, 2022

6 key components to help you evaluate assessments and know the difference

Not all assessments are equal. There are high-quality assessments. There are mediocre assessments. And then there are tools that are counter-productive to efforts to understand and support student learning.

In this article, we focus on the classroom assessments embedded within a curriculum. These everyday tools help teachers understand how students respond to classroom instruction. Our team believes they are potentially the most important and meaningful assessment tools for teachers. Nonetheless, being embedded within a curriculum does not make them quality assessments. How can you know the difference? We cover six elements of assessments to help you and your colleagues identify quality assessment tools in your classrooms.

Assessment Tasks

Assessment tasks are the essential elements of assessments. There are a variety of task types that range from open performance-based tasks — such as writing, presentations, and projects — to limited tasks, like multiple choice and true/false questions.

In considering assessment tasks from instructional programs, the two primary considerations should be the accuracy of the information and the quality of the information gathered. The accuracy of the information depends on a few things. Consider first whether the task is aligned with what it measures. Curricular programs have an advantage over most other assessment types in this regard, since the tasks align directly with the topics of the units of study. One concern with the assessments of curricular programs arises when the contexts and topics of the assessment relate too strongly to those of the unit. When this happens, the tasks can end up measuring familiarity with those contexts more than the general idea. For example, if a math assessment claims to measure a certain standard, but the assessment tasks relate directly to a game that was learned in the unit, it is difficult to know if the student learned generalizable knowledge of the standard. They may have only learned the idea within the context of the game.

Accuracy of assessment information also depends on the ability of the student to get the problem right by chance. This is always an issue with multiple choice tasks. Another issue with curricular assessments can come from inherent bias due to language or context that is written into the task. Teachers should make sure that students have access to the language, contexts, and other aspects of a task that could otherwise limit students’ abilities to demonstrate their knowledge. The accuracy of an assessment is important. Generally, assessments that align well with the content being taught provide the most accurate information. Because assessments directly embedded into instructional programs can be precise and directly aligned to the course of study, they have a distinct advantage over more general assessments of knowledge. They can provide very accurate information about whether the student is responding to instruction.
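
To get a feel for how much chance can matter on multiple choice tasks, here is a rough sketch in Python. It assumes a student guesses blindly and independently on every question, which is a simplification, and the quiz size and number of answer options are made-up values rather than figures from any particular program.

```python
from math import comb


def prob_at_least(correct_needed, num_questions, num_options):
    """Chance of getting at least `correct_needed` questions right
    by blind guessing alone (assumes independent, equally likely guesses)."""
    p = 1 / num_options  # chance of guessing a single question correctly
    return sum(
        comb(num_questions, k) * p**k * (1 - p) ** (num_questions - k)
        for k in range(correct_needed, num_questions + 1)
    )


# Hypothetical quiz: 5 questions with 4 answer options each.
# Chance of getting 3 or more right without knowing anything:
print(round(prob_at_least(3, 5, 4), 3))  # prints 0.104
```

Under these assumptions, roughly one student in ten could reach 3 out of 5 by guessing, which is one reason short multiple choice quizzes on their own give a shaky picture of learning.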

Whereas the accuracy of information from assessments embedded in instructional programs is often very good, the quality of the information provided from assessments varies widely. The quality of the information relates to the potential for using the information to better understand student thinking, not just whether they got answers correct or not.

Consider the following questions. Do the tasks:

Provide detailed formative information that guides instruction?
Help teachers to understand student thinking?
Help teachers provide targeted and constructive feedback?
Communicate progress clearly and completely?

For each of the questions above, open tasks that prompt students to express their thinking and reasoning provide the richest information. Closed tasks, where student responses are evaluated as simply right or wrong, do not provide the kinds of insights teachers need. While open tasks can take more time for teachers to evaluate and score, there is potential for increased efficiency: by looking directly at student work, teachers can identify areas where students might benefit from additional support. In contrast to open tasks that provide insights into student thinking, some curricular programs provide only multiple choice or short answer questions. While these questions might help a teacher get a general idea of performance relative to benchmarks, they rarely provide the kinds of detailed information that more open tasks provide. Computer administered and scored assessments often put the sharpest constraints on the quality of the information provided. When teachers are only given a computer analysis of the correctness or incorrectness of responses, the information is generally so limited as to be almost useless for anything other than the most basic analysis, and it often only confirms what teachers already know.

Assessments with a variety of assessment task types are more helpful. Short answer tasks can help teachers assess basic knowledge, while open ended tasks can reveal deeper understanding and student thinking. The variety of task types also provides students with multiple opportunities to demonstrate what they have learned. Having a variety of task types, some of which can be answered quickly and others that require more processing and analysis, helps balance the depth and breadth of an assessment. It allows teachers to understand student learning across a variety of skills and concepts, while also providing insights into students' understanding of important topics and their ability to communicate their reasoning.

If the assessments provided with an instructional program have only closed, multiple choice, short answer, or all computer scored tasks, consider how much information will be lost and the limitations of these assessments.

Standards Alignments

The quality of assessments in instructional programs also depends on the documentation that accompanies the tasks. Assessment documents need to clearly call out what they intend to assess. Whether these are alignments to standards, targets, outcomes, big ideas, or something else, they need to be clearly stated in order for teachers to evaluate progress and gather the formative information needed to guide instruction. Assessments that provide only general standards alignments for an entire assessment (e.g., “this assessment measures these 6 standards”) are not precise enough. The best assessments provide standards alignments for each task. Sometimes tasks align with more than one standard. This is not an issue unless every task aligns with a whole handful of standards. Alignments that become too general lack precision, can be more difficult to evaluate and score, and are not as helpful as assessments that target specific learning outcomes.

Beware of assessments that provide no standards alignments. When evaluating the assessments of an instructional program, look for documents that provide detailed standards alignments. If these are not included, accuracy comes into question and the information will not be as rich. What often results is the need for teachers to reverse engineer the tasks of assessments to figure out what they are intended to assess.

Answer Keys

Answer keys are helpful for closed question types that have only one correct answer: multiple choice, multiple select, short answer, and true/false. These can help ensure that assessments with these question types are marked accurately. In contrast, answer keys for open question types, where the key says something like “Responses will vary,” are not helpful.

Rubrics

Open-ended assessment tasks (performance tasks, interviews, writing assignments, projects, and portfolios) require rubrics. Rubrics are leveled descriptors that help evaluate the quality of work. Generally, rubrics have 3 to 5 levels and include detailed information to help students and teachers understand expectations. They also guide teachers in their evaluation and grading of student responses. Well written rubrics can provide insights not only into where a student is in their learning process, but also into how to support students in their continued learning.

Rubrics for more open tasks are essential to improve the consistency of grading practices across classrooms. This fascinating study shares how rubrics helped diminish implicit bias in grading practices in 1,500 second grade classrooms.

If the assessments of your program have no rubrics, this is likely because the tasks are closed and do not provide enough information for this kind of analysis. But not all rubrics are the same. Consider also the quality of the rubrics themselves. Basic, generic rubrics that do not provide details about how students might respond to specific tasks are not helpful and will lead to too much variability in scoring.

Scoring Guides

Scoring guides, in contrast to both answer keys and rubrics, help teachers to understand how to measure the responses and are an essential element of almost all assessments (except perhaps those that are entirely rubric based, in which case the rubrics become the scoring guide). For example:

“This question is worth 2 points. One point for providing the correct answer, and one point for creating a drawing that matches.”

Scoring guides are critical for achieving consistency in grading practices across classes. Some scoring guides also include information about performance levels for individual tasks. For example:

“If the student received all the points (6) for this question set, they should be considered ready for the work of this unit. For students who scored 4 – 5 points, some scaffolded support may be necessary. Students who scored 1 – 3 points should be provided with targeted supports.”
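
For teams that track results in a spreadsheet or script, the performance levels in a scoring guide like the one quoted above could be captured in a few lines of Python. This is only a sketch: the function name and labels are made up, and because the quoted guide does not say what to do with a score of 0, the code assumes it falls into the same band as 1 to 3 points.

```python
def support_level(points, max_points=6):
    """Map a score on the 6-point question set to a support band.

    Bands follow the example scoring guide; treating 0 points like
    1 to 3 points is an assumption, since the guide does not mention it.
    """
    if not 0 <= points <= max_points:
        raise ValueError("points must be between 0 and max_points")
    if points == max_points:
        return "ready for the work of this unit"
    if points >= 4:
        return "some scaffolded support may be necessary"
    return "targeted supports"


for score in (6, 5, 2):
    print(score, "->", support_level(score))
```

Writing the bands down this explicitly, whether in code or on paper, is what lets two teachers reach the same conclusion from the same student score.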

When no scoring guides are provided, a variety of issues arise. If, for example, one teacher decides to give 3 points for each question in order to give partial credit, while another teacher gives each question only one point for correctness, discrepancies will inevitably arise. This raises questions of fairness and also makes collaborative conversations about student progress inefficient and less effective. When the program does not provide scoring guides, teachers should work together as a team to build them. While this can be productive work, it takes time, so leaders need to provide that time and set the expectation that the task will be completed. For these reasons, assessments from instructional programs are best when they come with detailed scoring guides for teachers to work with. While teachers might choose to modify the scoring guide for their purposes, having a publisher-provided scoring guide gives teachers a good starting point and immediately improves consistency and efficiency.

Benchmarks and Cutoffs

Benchmarks and cutoff scores provide guidance for teachers to understand the expectations of whole assessments.

Example: If the student received more than x number of points they have demonstrated mastery of the subject.

These can be helpful for teachers to understand how to evaluate results, get perspective on performance, and communicate to parents.

While these are helpful for establishing consistent expectations, they are only one small element of the big picture. The overall results should not be the center of focus so much that the details get overlooked.

It is also important to be wary of point-based cutoff scores. When different questions have different possible point values, the questions with more points become more influential in determining the overall score. Sometimes this is intentional, sometimes it isn’t. If, for instance, a series of short answer questions creates a multi-part question worth 8 points, while another question is worth only 2 points (one for a correct answer and one for showing work), the 8 point question will have four times the impact on the overall performance. While this might be intentional, it can also be problematic. High quality assessments are designed so that the weights of different topics have proportionate impacts on the overall score.
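
The arithmetic behind that example can be checked with a short Python sketch. It assumes, purely for illustration, an assessment made up of just those two questions; the question labels are invented.

```python
def score_shares(point_values):
    """Return each question's share of the total possible points."""
    total = sum(point_values.values())
    return {name: points / total for name, points in point_values.items()}


# Hypothetical assessment with only the two questions from the example.
shares = score_shares({"multi-part question": 8, "single question": 2})
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the total score")
# prints:
# multi-part question: 80% of the total score
# single question: 20% of the total score
```

Running the same check on a real assessment, with one entry per question or per standard, is a quick way to see whether any single topic dominates a point-based cutoff.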

Student Work Samples/Anchor Papers

To further support teachers in understanding the expectations of an assessment, sometimes student work samples or anchor papers are provided. These are often provided in combination with rubrics. Anchor papers help with the scoring of common assessments and can also be created internally by districts if not provided by the assessment creator. The highest quality assessments often contain these kinds of documents.

Conclusion

As you evaluate the assessments provided with the instructional programs used in your district, review programs you are considering for adoption, or prepare your assessments for the transition to standards-based grading, consider these 6 key components of quality assessments.

A special note about percentile rankings/norms:

Larger assessment systems that have collected large data sets, such as state assessments, usually provide information about percentile bands. These include charts or tables to compare results with the larger peer group.

These sorts of assessments require standardized practices for administration to ensure reliable results. Although these percentile bands can be helpful when considering special programming for a student, they are generally not helpful for communicating to parents. These rarely accompany assessments from instructional programs.
