Valid and Reliable Conclusions - Reliable Conclusions

Now that the data has been analyzed, a valid and reliable conclusion must be drawn from the results. Chicago Public Schools has this to say about valid and reliable performance assessments:

  • "An assessment is reliable if it yields results that are accurate and stable. In order for a Performance Assessment to be reliable, it should be administered and scored in a consistent way for all the students who take the assessment."
  • "An assessment is valid for a particular purpose if it in fact measures what it was intended to measure. An assessment of a learning outcome is valid to the extent that scores truly measure that outcome and are not affected by anything irrelevant to the outcome."
  • "Some important aspects of validity are content coverage, generalizability and fairness. The assessments for a given outcome should be aligned with both the outcome and instruction and, when taken together, should cover all important aspects of the outcome. The assessments should address the higher-order thinking skills specified in the outcome. The tasks used should have answers or solutions that can't be memorized, but which, instead, call on the student to apply knowledge and skills to a new situation."
  • "Assessment results are generalizable [to the extent that] if there is evidence that scores on one assessment can predict how well students perform on another assessment of the same outcome." (Chicago Public Schools Instructional Intranet)

Validity and Reliability in Numbers

All data has some validity and some reliability. Each data point is a valid number. Collect those numbers together and they have some reliability. But validity is established by asking, "Was each number collected properly?" Reliability is established when those numbers can be trusted across classrooms or across years.

Questions To Ask

Asking questions can help to establish validity and reliability. For example, was the correct assessment model chosen, and did it really measure what was intended to be measured? This website has discussed surveying audience members and how the wording of a question can limit the outcome or give great insight into an audience member’s thinking. If one asks, “Did you like the play?”, then the answer given is only valid as far as it measures an audience member’s opinion of “liking” the play.

Table of Numbers Example

Here is an example of student scores collected in the form of a table…

Student ID    Score on a scale from 1-4
001           2.5
002           3.4
003           3.2

…it is difficult to draw a conclusion from this data. Each score is valid, but until the scores are compared to each other, or to a standard for performance, they have no reliability.

Establishing Validity and Reliability

In the table below, the scores have been sorted and compared both to each other and to a performance standard. Organizing the data this way allows validity and reliability for student performance to be established…

Student ID    Scores in descending order    Result
002           3.4                           achieved standards
003           3.2                           achieved standards
001           2.5                           did not achieve standards
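The sorting and labeling shown in the table above can be sketched in a few lines of Python. This is only an illustration: the article does not state the numeric cutoff for achieving standards, so the value 3.0 (`STANDARD`) and the function name `rank_against_standard` are assumptions made for this example.

```python
# Assumption: the article never states the cutoff for "achieved standards".
# 3.0 is a hypothetical value that happens to fit the three scores shown.
STANDARD = 3.0

# Scores from the first table (student ID -> score on a 1-4 scale).
scores = {"001": 2.5, "002": 3.4, "003": 3.2}


def rank_against_standard(scores, standard=STANDARD):
    """Return (student_id, score, label) rows sorted by score, highest first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (
            student_id,
            score,
            "achieved standards" if score >= standard else "did not achieve standards",
        )
        for student_id, score in ranked
    ]


for student_id, score, label in rank_against_standard(scores):
    print(f"{student_id}  {score:.1f}  {label}")
```

Changing `STANDARD` changes the labels, which is a reminder that the conclusion is only as sound as the standard the scores are compared against.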

If one wants to add more rigor to the validity and reliability of the evidence of student achievement of standards, one might reorganize the table to show each student's progress over the term. Simply add columns for the pre-, mid-, and post-term scores and draw some more conclusions. For example:

Student ID    Pre-    Mid-    Post-
001           2.5     2.6     3.1
002           3.2     3.0     2.9
003           3.4     3.3     3.5

A reader of this data can conclude that the evidence points to a variety of valid and reliable statements. For example, “student 001 is steadily improving over time,” “student 002 is slipping over time and is failing to achieve standards,” and “student 003 is holding steady.”
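One way such statements could be derived consistently for every student is to compare each post- score with the pre- score. The sketch below does this in Python; the 0.2 tolerance for "holding steady" (`TOLERANCE`) and the function name `describe_trend` are assumptions chosen for this example, not rules stated in the article.

```python
# Scores from the table above: (pre, mid, post) for each student.
progress = {
    "001": (2.5, 2.6, 3.1),
    "002": (3.2, 3.0, 2.9),
    "003": (3.4, 3.3, 3.5),
}

# Assumption: a pre-to-post change of 0.2 or less counts as "holding steady".
TOLERANCE = 0.2


def describe_trend(pre, post, tolerance=TOLERANCE):
    """Label the change from the pre- score to the post- score."""
    change = post - pre
    if change > tolerance:
        return "improving"
    if change < -tolerance:
        return "slipping"
    return "holding steady"


trends = {
    student_id: describe_trend(pre, post)
    for student_id, (pre, _mid, post) in progress.items()
}

for student_id, label in trends.items():
    print(f"student {student_id} is {label}")
```

Applying the same rule to every student is one way to keep the scoring consistent, which is what the reliability definition quoted earlier asks for.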

Now let’s say that one wants to add still more rigor to the validity and reliability of the evidence of student achievement of standards. The table can be extended to show the progress of these students compared to the overall class progress. Simply average each column and draw some more conclusions. For example:

Student ID       Pre-    Mid-    Post-
001              2.5     2.6     3.1
002              3.2     3.0     2.9
003              3.4     3.3     3.5
Class average    3.03    2.96    3.16

Findings: Student achievement rose from an average of 3.03 to an average of 3.16 over the course of the term.

Conclusion: Although student achievement slipped in the mid marking period, overall, more than half the students improved in their ability to achieve standards.
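The finding and the conclusion above can be checked with a short calculation. The following Python sketch averages each column and counts the students whose post- score is higher than their pre- score; the function names `column_averages` and `count_improved` are made up for this example. Rounded to two decimals, the mid and post averages come out to 2.97 and 3.17, so the table's 2.96 and 3.16 appear to be truncated rather than rounded.

```python
from statistics import mean

# Scores from the table above: (pre, mid, post) for each student.
progress = {
    "001": (2.5, 2.6, 3.1),
    "002": (3.2, 3.0, 2.9),
    "003": (3.4, 3.3, 3.5),
}


def column_averages(progress):
    """Return the class average for the pre, mid, and post columns."""
    pre, mid, post = zip(*progress.values())
    return mean(pre), mean(mid), mean(post)


def count_improved(progress):
    """Count students whose post- score is higher than their pre- score."""
    return sum(1 for pre, _mid, post in progress.values() if post > pre)


pre_avg, mid_avg, post_avg = column_averages(progress)
improved = count_improved(progress)

print(f"Class averages: pre {pre_avg:.2f}, mid {mid_avg:.2f}, post {post_avg:.2f}")
print(f"{improved} of {len(progress)} students improved from pre to post")
```

Reporting both the averages and the count supports the balanced wording of the conclusion: the class average rose, but not every student improved.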

Balancing Validity and Reliability

Balancing validity and reliability extends from the assessment model to the analysis and finally into the conclusion. The conclusion is valid and reliable when it reveals this balance. Rarely are conclusions thought to be valid and reliable if they overstate one side of the findings. Nothing in the world is ever one-sided. Good stories always have a balance that confirms the ups and downs and the good and the bad of life, and in research, the expected and the unexpected outcomes of our work.

For example, when one concludes that “all audience members liked the show,” the reader is bound to ask, “Really? All members liked the show?” But if one's conclusion is worded as “all audience members liked the show, and maybe the seats could be improved,” this is getting closer to a balance. Finally, if one concludes, “Most of the audience members liked the show, and the seats in the balcony could be improved,” this begins to convince the reader that the writer has tried to balance the results of the survey from a valid and reliable point of view.

Go Forth and Analyze

One should now be ready to collect data in a real situation, prepare it, organize it, analyze it, and draw powerful conclusions from it. Perhaps the most important thing to remember is to keep balancing ideas. If one initially asks a limited question, one should resolve to ask a better, more informative question the next time. If one thinks everyone loves a play, one should take a moment to really ask what could be improved. If one wants to know what students really learn, one should help them unpack that in a meaningful way, a way that helps to improve both the teaching and their learning.
