
CAARS 2 Manual

Chapter 8: Test-Retest Reliability

Test-Retest Reliability

Test-retest reliability is estimated as the correlation between scores obtained for the same individual, from the same rater, on two occasions separated by a specified interval. Measures with reliable and consistent scores are expected to show high correlations across time, indicating little change in scores from one administration to the next. The test-retest reliability of the CAARS 2 was assessed by correlating T-scores from two separate administrations, 2 to 4 weeks apart (14 to 30 days), within a subset of individuals from the general population portion of the Normative Sample (N = 88 for Self-Report and N = 61 for Observer; see Appendix J for demographic characteristics of the test-retest reliability sample).
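As an illustration of the computation described above, the following minimal sketch correlates scores from two administrations. The function name `pearson_r` and the six pairs of T-scores are hypothetical and are not CAARS 2 data; this is not the scoring software's implementation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    # Sum of cross-products and sums of squares of the deviations.
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical T-scores for six individuals (illustrative only).
time1 = [42, 48, 51, 55, 60, 67]
time2 = [44, 47, 53, 54, 62, 65]

retest_r = pearson_r(time1, time2)
print(f"test-retest r = {retest_r:.2f}")
```

Each position in `time1` and `time2` must refer to the same individual rated by the same rater; pairing scores from different people would not yield a test-retest coefficient.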

Correlation coefficients provide a statistical measure of the degree of association between two variables. The reliability coefficients reported here are Pearson correlation coefficients, which range from -1 to 1, with higher values indicating greater consistency between ratings. Although there are several approaches to interpretation, correlation coefficients are categorized herein by absolute value as follows: lower than .20 is very weak; .20 to .39 is weak; .40 to .59 is moderate; .60 to .79 is strong; and .80 or greater is very strong (Evans, 1996).
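The interpretive categories above can be expressed as a small lookup. This is a sketch only; the function name `interpret_r` is hypothetical, and the cutoffs are the ones quoted in the text.

```python
def interpret_r(r):
    """Label a correlation by absolute value, using the cutoffs quoted above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    size = abs(r)  # the sign is ignored; only the magnitude is classified
    if size >= 0.80:
        return "very strong"
    if size >= 0.60:
        return "strong"
    if size >= 0.40:
        return "moderate"
    if size >= 0.20:
        return "weak"
    return "very weak"

for r in (0.15, -0.45, 0.79, 0.87):
    print(r, interpret_r(r))
```

The checks run from the highest cutoff downward, so each value falls into the first band whose lower bound it meets.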

The obtained correlations, as well as those corrected for variation (Bryant & Gokhale, 1972), are provided in Tables 8.5 and 8.6. These tables also show the means, medians, and standard deviations at each time point. Overall, the results provide evidence of very strong test-retest reliability for the CAARS 2 scales (corrected correlations ranged from .83 to .95 for Self-Report and .82 to .91 for Observer, all p < .001) and show that the effect of time across administrations was negligible. As further evidence of score stability over the retest period, mean scores from each time point are closely aligned, as seen in Tables 8.5 and 8.6. The stability of the scores, as demonstrated by the test-retest reliability coefficients, supports the interpretation that changes observed in CAARS 2 scores over time reflect true changes in symptoms or impairments rather than imprecise measurement.


Table 8.6. Test-Retest Reliability: CAARS 2 Observer

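| Scale Type | Scale | Obtained r | Corrected r | Time 1 M | Time 1 Mdn | Time 1 SD | Time 2 M | Time 2 Mdn | Time 2 SD | Cohen's d |
|---|---|---|---|---|---|---|---|---|---|---|
| Content Scales | Inattention/Executive Dysfunction | .87 | .90 | 49.7 | 48 | 9.1 | 49.1 | 48 | 9.2 | -0.07 |
| Content Scales | Hyperactivity | .88 | .84 | 48.9 | 45 | 10.6 | 49.0 | 44 | 11.1 | 0.02 |
| Content Scales | Impulsivity | .79 | .83 | 49.3 | 47 | 9.3 | 49.1 | 47 | 9.2 | -0.03 |
| Content Scales | Emotional Dysregulation | .81 | .82 | 49.5 | 48 | 10.0 | 48.8 | 47 | 9.7 | -0.07 |
| Content Scales | Negative Self-Concept | .86 | .84 | 49.6 | 47 | 10.4 | 49.9 | 48 | 10.7 | 0.02 |
| DSM Symptom Scales | ADHD Inattentive Symptoms | .87 | .91 | 49.8 | 47 | 9.0 | 49.5 | 47 | 9.0 | -0.04 |
| DSM Symptom Scales | ADHD Hyperactive/Impulsive Symptoms | .87 | .84 | 48.9 | 45 | 10.5 | 49.1 | 45 | 10.9 | 0.02 |
| DSM Symptom Scales | Total ADHD Symptoms | .89 | .90 | 49.2 | 46 | 9.9 | 49.1 | 47 | 9.8 | 0.00 |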
Note. N = 61. Time between administrations = 2 to 4 weeks (14 to 30 days). All correlations significant, p < .001. Guidelines for interpreting |r|: very weak < .20, weak = .20 to .39, moderate = .40 to .59, strong = .60 to .79, very strong ≥ .80. Guidelines for interpreting Cohen’s |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Positive d-ratio values indicate higher scores at Time 2 than Time 1.
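To make the effect-size guidelines in the note concrete, the sketch below computes a standardized Time 2 minus Time 1 difference and labels it. The manual does not state which formula was used for Cohen's d; dividing by the pooled standard deviation of the two time points is an assumption here, and the function names `cohens_d` and `interpret_d` are hypothetical. The example uses the Inattention/Executive Dysfunction means and standard deviations from Table 8.6. Because the tabled means and standard deviations are rounded, values recomputed this way can differ slightly from the tabled d for some scales.

```python
import math

def cohens_d(mean1, sd1, mean2, sd2):
    """Standardized Time 2 minus Time 1 mean difference.

    ASSUMPTION: the denominator is the pooled SD of the two time points
    (equal n at both administrations); the source does not give the formula.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean2 - mean1) / pooled_sd

def interpret_d(d):
    """Label |d| using the guidelines quoted in the table note."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"

# Inattention/Executive Dysfunction, Table 8.6 (Observer):
# Time 1 M = 49.7, SD = 9.1; Time 2 M = 49.1, SD = 9.2.
d = cohens_d(49.7, 9.1, 49.1, 9.2)
print(f"d = {d:.2f} ({interpret_d(d)})")
```

A negative result indicates a lower mean at Time 2 than at Time 1, matching the sign convention stated in the note.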