CAARS 2 Manual Chapter 8: Test-Retest Reliability
Test-retest reliability is computed using the correlation between scores obtained on two occasions over a specified
period of time for the same individual by the same rater. Measures with reliable and consistent scores are expected
to have high correlations across time, indicating little change in scores from one administration to another. The
test-retest reliability of the CAARS 2 was assessed by computing the correlation of T-scores obtained on two
separate administrations over a 2- to 4-week interval (14 to 30 days) within a subset of individuals from the
general population portion of the Normative Sample (N = 88 for Self-Report and N = 61 for Observer; please see
Appendix J for demographic characteristics of the test-retest reliability sample).
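The computation described above can be sketched in a few lines. This is a minimal illustration, assuming paired T-scores are available as two equally long lists; the function name `pearson_r` and the six score pairs are hypothetical and are not CAARS 2 data.

```python
from math import sqrt


def pearson_r(time1, time2):
    """Pearson correlation between paired scores from two administrations."""
    if len(time1) != len(time2) or len(time1) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    n = len(time1)
    mean1 = sum(time1) / n
    mean2 = sum(time2) / n
    # Sum of cross-products and sums of squares of the deviations from each mean.
    cross = sum((a - mean1) * (b - mean2) for a, b in zip(time1, time2))
    ss1 = sum((a - mean1) ** 2 for a in time1)
    ss2 = sum((b - mean2) ** 2 for b in time2)
    if ss1 == 0 or ss2 == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cross / sqrt(ss1 * ss2)


# Hypothetical T-scores for six individuals (illustrative only).
time1_scores = [45, 52, 48, 61, 39, 55]
time2_scores = [47, 50, 49, 63, 41, 53]
retest_r = pearson_r(time1_scores, time2_scores)
```

Each position in the two lists must refer to the same individual rated by the same rater, since the coefficient is only meaningful for paired observations.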
Correlation coefficients provide us with a statistical measure of the degree of association between two variables.
The reliability coefficients are Pearson correlation coefficients, ranging from -1 to 1, with higher values
indicating greater consistency or agreement between ratings. Although there are several approaches to
interpretation, the correlation coefficients are categorized herein as follows: absolute values lower than .20 are
classified as very weak; values of .20 to .39 are considered weak; values of .40 to .59 are moderate; values of .60
to .79 are strong; and absolute values greater than or equal to .80 are very strong (Evans, 1996).
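The interpretive bands above can be expressed as a small lookup. This is a sketch only: the function name `interpret_r` is hypothetical, and because the bands are stated to two decimals, the code assumes that unrounded values falling between bands (for example, .395) belong to the lower band.

```python
def interpret_r(r):
    """Return the descriptive label for a correlation, per the bands cited from Evans (1996)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)  # the bands apply to absolute values
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"


# Labels for a few illustrative coefficients.
labels = {r: interpret_r(r) for r in (0.15, -0.25, 0.45, 0.79, 0.82)}
```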
The obtained correlations, as well as those corrected for variation (Bryant & Gokhale, 1972), are provided in
Tables 8.5 and 8.6. These tables also show the means, medians, and standard deviations at each time point, along
with Cohen's d for the difference between time points. Overall, the results provide evidence of very strong
test-retest reliability for the CAARS 2 scales (corrected correlations ranged from .83 to .95 for Self-Report and
.82 to .90 for Observer, all p < .001) and show that the effect of time across administrations was negligible (all
Cohen's d values fell between -0.07 and 0.08). As further evidence of score stability over the course of the
retest period, mean scores from each time point are closely aligned, as seen in Tables 8.5 and 8.6. The stable
nature of the scores, as demonstrated by the test-retest reliability coefficients, provides assurance that changes
observed in CAARS 2 scores over time are due to true changes in the symptoms or impairments, as opposed to
imprecise measurement.
Table 8.5. Test-Retest Reliability: CAARS 2 Self-Report

| Scale | Obtained r | Corrected r | Time 1 M | Time 1 Mdn | Time 1 SD | Time 2 M | Time 2 Mdn | Time 2 SD | Cohen's d |
|---|---|---|---|---|---|---|---|---|---|
| Content Scales | | | | | | | | | |
| Inattention/Executive Dysfunction | .88 | .95 | 47.5 | 45 | 7.5 | 47.9 | 47 | 8.0 | 0.05 |
| Hyperactivity | .82 | .93 | 47.5 | 46 | 7.4 | 47.6 | 46 | 8.0 | 0.01 |
| Impulsivity | .89 | .92 | 47.7 | 46 | 8.9 | 48.0 | 46 | 9.6 | 0.03 |
| Emotional Dysregulation | .92 | .89 | 49.6 | 47 | 11.1 | 49.4 | 47 | 11.0 | -0.02 |
| Negative Self-Concept | .88 | .83 | 50.0 | 47 | 10.7 | 50.8 | 49 | 11.8 | 0.08 |
| DSM Symptom Scales | | | | | | | | | |
| ADHD Inattentive Symptoms | .86 | .95 | 48.0 | 47 | 7.3 | 48.2 | 47 | 7.9 | 0.03 |
| ADHD Hyperactive/Impulsive Symptoms | .82 | .92 | 47.7 | 47 | 7.3 | 47.6 | 45 | 8.2 | -0.01 |
| Total ADHD Symptoms | .87 | .95 | 47.7 | 46 | 7.2 | 47.7 | 46 | 8.0 | 0.01 |

Note. N = 88. Time between administrations = 2 to 4 weeks (14 to 30 days). All correlations significant,
p < .001. Guidelines for interpreting |r|: very weak < .20, weak = .20 to .39, moderate = .40 to .59,
strong = .60 to .79, very strong ≥ .80. Positive Cohen's d values indicate higher scores at Time 2 than Time 1.

Table 8.6. Test-Retest Reliability: CAARS 2 Observer

| Scale | Obtained r | Corrected r | Time 1 M | Time 1 Mdn | Time 1 SD | Time 2 M | Time 2 Mdn | Time 2 SD | Cohen's d |
|---|---|---|---|---|---|---|---|---|---|
| Content Scales | | | | | | | | | |
| Inattention/Executive Dysfunction | .87 | .90 | 49.7 | 48 | 9.1 | 49.1 | 48 | 9.2 | -0.07 |
| Hyperactivity | .88 | .84 | 48.9 | 45 | 10.6 | 49.0 | 44 | 11.1 | 0.02 |
| Impulsivity | .79 | .83 | 49.3 | 47 | 9.3 | 49.1 | 47 | 9.2 | -0.03 |
| Emotional Dysregulation | .81 | .82 | 49.5 | 48 | 10.0 | 48.8 | 47 | 9.7 | -0.07 |
| Negative Self-Concept | .86 | .84 | 49.6 | 47 | 10.4 | 49.9 | 48 | 10.7 | 0.02 |
| DSM Symptom Scales | | | | | | | | | |
| ADHD Inattentive Symptoms | .87 | .91 | 49.8 | 47 | 9.0 | 49.5 | 47 | 9.0 | -0.04 |
| ADHD Hyperactive/Impulsive Symptoms | .87 | .84 | 48.9 | 45 | 10.5 | 49.1 | 45 | 10.9 | 0.02 |
| Total ADHD Symptoms | .89 | .90 | 49.2 | 46 | 9.9 | 49.1 | 47 | 9.8 | 0.00 |

Note. N = 61. Time between administrations = 2 to 4 weeks (14 to 30 days). All correlations significant,
p < .001. Guidelines for interpreting |r|: very weak < .20, weak = .20 to .39, moderate = .40 to .59,
strong = .60 to .79, very strong ≥ .80. Guidelines for interpreting Cohen’s |d|: negligible effect size < 0.20;
small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Positive Cohen's d
values indicate higher scores at Time 2 than Time 1.
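
The effect-size guidelines in the note can be illustrated with a short sketch. The manual does not state which formula was used for Cohen's d, so the code assumes a common choice: the Time 2 minus Time 1 mean difference divided by the root mean square of the two standard deviations. The function names are hypothetical, and values recomputed from the rounded table entries may differ from the published ones in the second decimal.

```python
from math import sqrt


def cohens_d(mean1, sd1, mean2, sd2):
    """Standardized Time 2 minus Time 1 difference (assumed formula, see lead-in)."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean2 - mean1) / pooled_sd


def interpret_d(d):
    """Label |d| using the guidelines given in the note to Table 8.6."""
    size = abs(d)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


# Observer Inattention/Executive Dysfunction row of Table 8.6:
# Time 1 M = 49.7, SD = 9.1; Time 2 M = 49.1, SD = 9.2.
d = cohens_d(49.7, 9.1, 49.1, 9.2)
label = interpret_d(d)
```

Under this assumed formula, the example row rounds to the tabled value of -0.07, and the negative sign reflects a slightly lower mean at Time 2.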