CAARS 2 Manual, Chapter 12: CAARS 2–ADHD Index
Reliability
- Internal Consistency and Standard Error of Measurement
- Test Information
- Test-Retest Reliability
- Inter-Rater Reliability
Multiple indicators of reliability are provided for scores from the CAARS 2–ADHD Index, including internal consistency, test information, test-retest reliability, and inter-rater reliability. A thorough explanation of reliability, as well as the reliability evidence for the full-length CAARS 2, can be found in chapter 8, Reliability. Note that the CAARS 2–ADHD Index raw scores are used primarily in this section to permit correlational analyses.
Internal Consistency and Standard Error of Measurement
Internal consistency estimates for the CAARS 2–ADHD Index are presented in Table 12.8 for the Normative and ADHD Reference Samples (see chapter 7, Standardization, for a description of these samples; see chapter 8, Reliability, for an in-depth description of internal consistency and the coefficients reported in this section).
The reliability coefficients presented in Table 12.8 exceed guidelines for high reliability and indicate that the CAARS 2–ADHD Index has excellent internal consistency on all forms across all age groups. Across all age groups and genders in the Normative Sample, the median omega of the CAARS 2–ADHD Index was .90 (ranging from .85 to .92) for Self-Report, and .90 (ranging from .83 to .93) for Observer. For the ADHD Reference Samples, the median omega was .83 (ranging from .74 to .84) for Self-Report, and .86 (ranging from .82 to .88) for Observer. In summary, multiple metrics indicate that the CAARS 2–ADHD Index provides cohesive, consistent, and reliable estimates of key predictive ADHD symptoms.
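As an illustration of how an internal consistency coefficient of this kind is computed, the sketch below implements coefficient alpha from an item-score matrix. The data are fabricated for illustration only; they are not CAARS 2 item responses, and the manual's reported values come from the standardization samples.

```python
# Cronbach's alpha from a respondents x items score matrix.
# Illustrative data only -- not actual CAARS 2 item responses.

def cronbach_alpha(scores):
    """alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(totals))."""
    k = len(scores[0])                      # number of items
    n = len(scores)                         # number of respondents

    def var(xs):                            # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Four respondents rating three items on a 0-3 scale (fabricated).
data = [
    [0, 1, 1],
    [1, 1, 2],
    [2, 2, 2],
    [3, 2, 3],
]
print(round(cronbach_alpha(data), 2))  # -> 0.9
```

Omega coefficients, reported alongside alpha in Table 12.8, are computed instead from factor loadings in a one-factor model, and typically require a factor-analysis routine rather than a few lines of arithmetic.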
The standard error of measurement (SEM) is a statistic that quantifies the amount of error in the obtained raw scores (more details are provided in chapter 8, Reliability). Overall, the median SEM was 3.16 for Self-Report and 3.18 for Observer in the CAARS 2 Normative Sample across age groups, and 4.12 for Self-Report and 3.78 for Observer in the ADHD Reference Sample (see Table 12.8). The low values reported here indicate a small standard error of measurement, or very little error in the estimated true scores for the CAARS 2–ADHD Index, and therefore a high degree of reliability.
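The SEM follows directly from the score standard deviation and the reliability coefficient via the standard formula SEM = SD × √(1 − r). A minimal sketch with illustrative values (the manual's exact inputs are not restated in this section):

```python
import math

# SEM = SD * sqrt(1 - reliability).
# The inputs below are illustrative, not values from Table 12.8.
def sem(sd, reliability):
    return sd * math.sqrt(1.0 - reliability)

# e.g., a score SD of 10 with a reliability of .90 gives an SEM of about 3.16
print(round(sem(10, 0.90), 2))  # -> 3.16
```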
Table 12.8. Internal Consistency and Standard Error of Measurement: CAARS 2–ADHD Index Normative and ADHD Reference Samples
Form | Sample | Age Group | Combined Gender (N, α, ω, SEM) | Male (N, α, ω, SEM) | Female (N, α, ω, SEM)
Self-Report | Normative Sample | 18–24 | 110, .89, .90, 3.23 | 55, .89, .89, 3.31 | 55, .89, .89, 3.29
Self-Report | Normative Sample | 25–29 | 110, .90, .90, 3.14 | 55, .87, .88, 3.52 | 55, .91, .92, 2.89
Self-Report | Normative Sample | 30–39 | 220, .91, .91, 3.03 | 110, .90, .90, 3.14 | 110, .90, .91, 3.06
Self-Report | Normative Sample | 40–49 | 220, .91, .91, 2.96 | 110, .90, .90, 3.13 | 110, .92, .92, 2.83
Self-Report | Normative Sample | 50–59 | 220, .90, .90, 3.16 | 110, .88, .89, 3.39 | 110, .91, .91, 2.99
Self-Report | Normative Sample | 60–69 | 220, .90, .90, 3.18 | 110, .86, .87, 3.62 | 110, .91, .92, 2.87
Self-Report | Normative Sample | 70+ | 220, .86, .86, 3.72 | 94, .86, .87, 3.64 | 126, .84, .85, 3.94
Self-Report | ADHD Reference Sample | 18+ | 255, .82, .83, 4.12 | 129, .83, .84, 4.00 | 124, .73, .74, 5.10
Observer | Normative Sample | 18–24 | 110, .89, .89, 3.26 | 55, .90, .90, 3.16 | 55, .89, .89, 3.26
Observer | Normative Sample | 25–29 | 110, .92, .93, 2.72 | 55, .81, .83, 4.14 | 55, .92, .93, 2.72
Observer | Normative Sample | 30–39 | 220, .92, .92, 2.74 | 110, .92, .92, 2.84 | 110, .92, .92, 2.74
Observer | Normative Sample | 40–49 | 220, .91, .91, 2.99 | 110, .90, .90, 3.10 | 110, .91, .91, 2.99
Observer | Normative Sample | 50–59 | 220, .90, .90, 3.18 | 110, .90, .90, 3.13 | 110, .90, .90, 3.18
Observer | Normative Sample | 60–69 | 220, .89, .89, 3.27 | 110, .88, .88, 3.39 | 110, .89, .89, 3.27
Observer | Normative Sample | 70+ | 220, .88, .88, 3.40 | 94, .88, .89, 3.38 | 126, .88, .88, 3.40
Observer | ADHD Reference Sample | 18+ | 170, .85, .86, 3.78 | 87, .88, .88, 3.42 | 81, .80, .82, 4.24
Test Information
Test information was explored in the Total Samples for Self-Report and Observer (for details about the Total Sample, see Table 6.4 in chapter 6, Development; for details about test information, see chapter 8, Reliability). As seen in Figure 12.3, precision of measurement for the CAARS 2–ADHD Index is excellent for both Self-Report and Observer forms. The obtained precision of measurement values exceed 10 at both average (denoted as 0 along the x-axis) and clinical (denoted as ≥ 2 along the x-axis) levels of the constructs being measured. Values of this magnitude exceed guidelines for test information, suggesting excellent precision of measurement for the CAARS 2–ADHD Index across both rater forms and further supporting the reliability of this index.
Test-Retest Reliability
The test-retest reliability of the CAARS 2–ADHD Index was assessed by computing the correlation of raw scores from individuals who completed the test on two separate occasions. It was expected that the ADHD Index would yield stable scores that do not vary significantly upon re-administration. Ratings of individuals from the general population (i.e., individuals who did not have a clinical diagnosis) were collected twice, with a 2- to 4-week interval between administrations (N = 88 for Self-Report; N = 61 for Observer; refer to appendix J for demographic characteristics of the test-retest samples).
The correlation coefficients and descriptive statistics of the raw scores from the test-retest sample are provided in Table 12.9. Results demonstrated that the CAARS 2–ADHD Index has very strong test-retest reliability (i.e., r = .84 for Self-Report, .86 for Observer; p < .001). As further evidence of score stability over the retest period, mean scores from each administration were closely aligned, as seen in Table 12.10. The stability of the CAARS 2–ADHD Index probability score was further evaluated by calculating the difference between the percentage of the samples within each probability score category range from Time 1 to Time 2. Overall, fewer than 5% of the Self-Report and Observer samples showed a shift in probability score category; that is, scores that were classified as “Low” at Time 1 were overwhelmingly likely to also be classified as “Low” at Time 2. The stable nature of the scores, as demonstrated by the test-retest reliability coefficient, provides confidence that any change observed in scores over time is likely due to a true change, as opposed to unreliable or imprecise measurement.
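Test-retest reliability of this kind is simply the Pearson correlation between Time 1 and Time 2 raw scores. A minimal sketch with fabricated score pairs (the reported coefficients of .84 and .86 come from the actual retest samples):

```python
# Pearson correlation between Time 1 and Time 2 raw scores.
# The score pairs below are fabricated illustrations, not study data.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [3, 7, 8, 12, 15, 4, 9, 11]
time2 = [4, 6, 9, 11, 14, 5, 8, 13]
print(round(pearson_r(time1, time2), 2))
```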
Table 12.9. Test-Retest Reliability: CAARS 2–ADHD Index
Form | r | Time 1 (M, Mdn, SD) | Time 2 (M, Mdn, SD)
Self-Report | .84 | 7.7, 7, 5.5 | 7.6, 7, 6.1
Observer | .86 | 7.5, 6, 6.8 | 6.9, 5, 6.3
Table 12.10. Percentage of Change in Probability Score Range: CAARS 2–ADHD Index
Form | Administration | Very Low (%) | Low (%) | Borderline (%) | High (%) | Very High (%) | Cliff's d
Self-Report | Time 1 | 87.5 | 5.7 | 1.1 | 2.3 | 3.4 | -.05
Self-Report | Time 2 | 83.0 | 5.7 | 3.4 | 5.7 | 2.3 |
Observer | Time 1 | 70.5 | 14.8 | 8.2 | 0.0 | 6.6 | .04
Observer | Time 2 | 73.8 | 14.8 | 4.9 | 3.3 | 3.3 |
Inter-Rater Reliability
Inter-rater reliability refers to the degree of agreement between two raters who are rating the same individual. Estimates of inter-rater reliability help describe levels of score consistency between raters, typically indexed by a correlation coefficient (in this case, Pearson’s r; LeBreton & Senter, 2008). Consistency between raw scores from various raters was explored for the CAARS 2–ADHD Index Self-Report and Observer. Two inter-rater studies were conducted with the CAARS 2–ADHD Index: (a) two raters of the same rater type rated the same individual (e.g., two relatives or two friends), and (b) two raters of different rater types rated the same individual (i.e., self-report/observer, or two different observers).
Study 1: Two Observers, Same Rater Type. In the first inter-rater study, dyads of Observers (N = 29) completed the CAARS 2–ADHD Index about the same individual (refer to appendix J for the demographic characteristics of the inter-rater samples).
The obtained inter-rater reliability coefficient is provided in Table 12.11. Strong inter-rater agreement was found for the CAARS 2–ADHD Index (r = .65, p < .001). Table 12.11 also shows the descriptive statistics for each rater; the very small differences between mean raw scores highlight the similarity of average scores between the two raters. The consistency of the CAARS 2–ADHD Index probability score between raters of the same type was further evaluated by calculating the difference between the percentage of the sample within each probability score category range from Rater 1 to Rater 2. The effect size of the difference between raters was negligible (Cliff’s d = -.11; see Table 12.12), indicating that different raters typically provided ratings that led to scores within the same category, as seen in the similar percentages of the sample with scores in each category. These findings support the reliability of the CAARS 2–ADHD Index.
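Cliff's d, the effect size used here and in the tables that follow, is the probability that a score drawn from one rater exceeds a score drawn from the other, minus the reverse probability. A minimal sketch with fabricated ratings:

```python
# Cliff's d for two sets of scores: P(x > y) - P(x < y) across all pairs.
# The ratings below are fabricated, not data from the inter-rater study.

def cliffs_d(xs, ys):
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

rater1 = [10, 14, 18, 22, 25]
rater2 = [12, 15, 19, 23, 26]
print(round(cliffs_d(rater1, rater2), 2))  # -> -0.2
```

Values near zero, such as the -.11 reported above, indicate that neither rater's scores systematically exceed the other's.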
Table 12.11. Inter-Rater Reliability Study 1 (Two Observers, Same Rater Type): CAARS 2–ADHD Index
Measure | r | Rater 1 (M, Mdn, SD) | Rater 2 (M, Mdn, SD)
ADHD Index | .65 | 17.6, 19, 6.8 | 18.9, 19, 6.9
Table 12.12. Percentage of Change in Probability Score Range: CAARS 2–ADHD Index Inter-Rater Reliability Study 1
Rater | Very Low (%) | Low (%) | Borderline (%) | High (%) | Very High (%) | Cliff's d
Rater 1 | 13.8 | 13.8 | 10.3 | 24.1 | 37.9 | -.11
Rater 2 | 3.4 | 17.2 | 10.3 | 24.1 | 44.8 |
Study 2: Different Types of Raters. In the second inter-rater study, comparisons were made across types of raters. Because the CAARS 2–ADHD Index Self-Report and Observer measure the same constructs, similarity in scores across different types of Observer raters, as well as between observer and self-reported ratings, provides evidence of the reliability of the test scores. Although some degree of similarity between informants is expected, given that they are rating the same individual on the same constructs, some disparity is also expected: different informants observe the individual in different contexts and from different vantage points, and so may have dissimilar perceptions of or experiences with the individual’s behavior and functioning.
For the CAARS 2–ADHD Index, correlation coefficients (Pearson’s r; LeBreton & Senter, 2008) were calculated between scores for the following pairs of raters: (a) Observer and Self-Report (N = 211), and (b) two Observers with differing relationships to the individual (N = 47). Each Observer/Observer dyad comprised two different relationship types; that is, a spouse/romantic partner and a friend could rate the same individual, but two friends could not. All raters completed the CAARS 2–ADHD Index within a 30-day period of each other. Refer to appendix J for demographic characteristics of the rated individuals and raters.
Results, as seen in Table 12.13, were similar to those found for the full-length CAARS 2 and the CAARS 2–Short. The correlations between CAARS 2–ADHD Index raw scores were moderate for both comparisons (r = .54, p < .001). In addition, the effect sizes of the differences between raters were small (Cliff’s d = -.17 and -.19; see Table 12.14). These results indicate that although raters largely provided ratings that led to similar probability score categories, some differences may arise, as evidenced by the slight differences in the proportions of each rater sample in each score category in Table 12.14. A number of factors, including setting differences, may contribute to this range in agreement across raters. Whereas individuals can report on their own behaviors across multiple settings, observers are typically limited in the number of specific environments they can report on (e.g., work but not home). The different roles occupied by raters relative to the individual affect not only which contexts they are privy to but also the standards and bases for their perceptions, adding further to the inconsistency in ratings produced by different observer types. Given that different observers each provide some unique information, there is considerable value in obtaining ratings from multiple sources when administering the CAARS 2–ADHD Index.
Table 12.13. Inter-Rater Reliability Study 2: CAARS 2–ADHD Index
Dyad | r | First Rater (M, Mdn, SD) | Second Rater (M, Mdn, SD)
Self-Report/Observer | .54 | 21.7, 23, 7.0 | 19.2, 20, 7.7
Observer/Observer | .54 | 16.2, 17, 7.6 | 18.4, 18, 7.6
Table 12.14. Percentage of Change in Probability Score Range: CAARS 2–ADHD Index Study 2
Dyad | Rater | Very Low (%) | Low (%) | Borderline (%) | High (%) | Very High (%) | Cliff's d
Self-Report/Observer | Self-Report | 8.5 | 6.6 | 3.3 | 11.4 | 70.1 | -.17
Self-Report/Observer | Observer | 11.8 | 9.5 | 9.5 | 16.6 | 52.6 |
Observer/Observer | Observer 1 | 17.0 | 19.1 | 8.5 | 25.5 | 29.8 | -.19
Observer/Observer | Observer 2 | 14.9 | 6.4 | 8.5 | 23.4 | 46.8 |