
Conners 4 Manual

Chapter 8: Test-Retest Reliability




Test-retest reliability is computed using the correlation between scores obtained on two occasions over a specified period of time for the same youth by the same rater. Measures with stable scores are expected to have high correlations, indicating little change in scores from one administration to another. The test-retest reliability of the Conners 4 was assessed by computing the correlation of T-scores obtained on two separate administrations over a 2- to 4-week interval (14 to 30 days) within a subset of youth from the general population portion of the Normative Sample (N = 81 for Parent, N = 61 for Teacher, and N = 68 for Self-Report; in appendix F, please see Table F.1 for demographic characteristics of the youth being rated and Table F.2 for demographic characteristics of the parent and teacher raters).
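The computation described above — a Pearson correlation between the same youths' T-scores at two administrations — can be sketched as follows. This is an illustrative sketch only, not the Conners 4 scoring software; the T-score values are invented for demonstration.

```python
# Sketch of test-retest reliability as a Pearson correlation between
# paired T-scores from two administrations. All data here are hypothetical.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical T-scores for five youths, rated 2 to 4 weeks apart.
time1 = [55, 62, 48, 70, 51]
time2 = [57, 60, 50, 69, 53]
r = pearson_r(time1, time2)  # close to 1.0 when ratings are stable
```

With stable scores, each youth's Time 2 rating tracks their Time 1 rating closely, so the correlation approaches 1.0.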

Correlation coefficients provide a statistical measure of the degree of association between two variables. The reliability coefficients are Pearson’s correlations, ranging from -1 to 1, with higher values indicating greater consistency or agreement between ratings. Although there are several approaches to interpretation, the correlation coefficients are categorized herein as follows: absolute values lower than .20 are classified as very weak; values of .20 to .39 are weak; values of .40 to .59 are moderate; values of .60 to .79 are strong; and absolute values greater than or equal to .80 are very strong (Evans, 1996).
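The interpretation bands above translate directly into a small lookup. This is a hypothetical helper for illustration, not part of any published scoring procedure:

```python
def classify_correlation(r):
    """Interpret a correlation's magnitude using the Evans (1996)
    bands cited in the text: |r| < .20 very weak, .20-.39 weak,
    .40-.59 moderate, .60-.79 strong, >= .80 very strong."""
    a = abs(r)
    if a < 0.20:
        return "very weak"
    if a < 0.40:
        return "weak"
    if a < 0.60:
        return "moderate"
    if a < 0.80:
        return "strong"
    return "very strong"
```

For example, a corrected test-retest correlation of .83 falls in the "very strong" band.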

The obtained correlations, as well as those corrected for variation (Bryant & Gokhale, 1972), are provided in Tables 8.9 to 8.11. These tables also show the means, medians, and SDs at each time point. Overall, the results provide evidence of excellent test-retest reliability for the Conners 4 scales and indicate that the effect of time across administrations is negligible (i.e., corrected correlations ranged from .83 to .99 for Parent, .81 to .97 for Teacher, and .63 to .86 for Self-Report, all p < .001). As further evidence of score stability over the retest period, mean scores from each time point are closely aligned, as seen in Tables 8.9 to 8.11. The stability of the scores, as demonstrated by the test-retest reliability coefficients, provides assurance that changes observed in scores over time are due to a true change in symptoms or impairments, as opposed to imprecise measurement.

The stability of the Conners 4 scores was further evaluated in the test-retest samples by calculating the difference between each individual’s Time 1 and Time 2 ratings. If scores increased or decreased by 10 or more T-score points (i.e., 1 SD or greater), the change was considered notable. Tables 8.12 to 8.14 present the percentage of the sample with increases and decreases in scores; most individuals showed differences of fewer than 10 points. These tables also present the mean differences, as well as differences in SDs, between ratings from Time 1 to Time 2 (positive differences indicate that scores increased at Time 2, while negative differences indicate that scores decreased at Time 2). The differences in scores from Time 1 to Time 2 were slight (mean differences ranged from -2.4 to 1.0 points across all forms) for Parent, Teacher, and Self-Report, indicating consistency in responses across the time interval. Additionally, the differences between the SDs were quite small (ranging from -1.7 to 2.2 across all forms), showing a similar dispersion of scores from Time 1 to Time 2. Taken together, these results (the proportion of the sample with minimal change in their scores, and the small mean differences in scores across the retest interval) demonstrate the stability of scale scores for the Conners 4 across administrations.
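The difference analysis described above can be sketched as follows: compute each individual's Time 2 minus Time 1 difference, flag changes of 10 or more T-score points (1 SD) as notable, and summarize. This is an illustrative sketch with invented data, not the procedure used to build Tables 8.12 to 8.14:

```python
def score_changes(time1, time2, threshold=10):
    """Summarize Time 2 - Time 1 T-score differences: the percentage of
    individuals with notable increases/decreases (>= 1 SD, i.e., 10
    T-score points) and the mean difference. Positive differences mean
    scores increased at Time 2; negative differences mean they decreased."""
    diffs = [t2 - t1 for t1, t2 in zip(time1, time2)]
    n = len(diffs)
    increased = sum(d >= threshold for d in diffs)
    decreased = sum(d <= -threshold for d in diffs)
    return {
        "pct_notable_increase": 100 * increased / n,
        "pct_notable_decrease": 100 * decreased / n,
        "mean_difference": sum(diffs) / n,
    }

# Hypothetical T-scores for four youths at each time point.
summary = score_changes([50, 60, 55, 45], [52, 49, 56, 58])
```

In a stable measure, both percentages are small and the mean difference hovers near zero, mirroring the pattern reported for the Conners 4.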







