CAARS 2 Manual, Chapter 8: Inter-Rater Reliability
Inter-rater reliability refers to the degree of agreement between two raters who are rating the same individual. Estimates of inter-rater reliability describe the level of consistency between raters and are typically indexed by a correlation coefficient such as Pearson’s r (LeBreton & Senter, 2008). Two inter-rater studies were conducted with the CAARS 2: (a) two Observers of the same rater type (e.g., two friends) rated the same individual; and (b) two different types of raters (e.g., self-report/observer, or parent/friend) rated the same individual.
Study 1: Two Observers, Same Rater Type. In the first inter-rater study, dyads of Observers (N = 29) completed the CAARS 2 about the same individual. The dyads consisted of either two relatives (79.3%; most frequently a parent and a sibling, two different parents, or two different children) or two friends (20.7%). Raters were paired based on similarity in the setting in which they could observe the rated individual (e.g., siblings, parents, and children may have similar exposure to home life, whereas friends may see the individual primarily in social settings). Ratings were obtained from the dyads on two separate administrations (all ratings completed within a 30-day period). Demographic characteristics of these samples are provided in Appendix J.
Scores on the CAARS 2 scales for the paired raters were compared using Pearson’s correlations, which range from -1 to 1, with higher values indicating greater consistency or agreement between raters. The obtained inter-rater reliability coefficients and those corrected for range variation (Bryant & Gokhale, 1972), as well as the means, medians, and standard deviations for each rater in the dyads, are provided in Table 8.7. Results of the inter-rater study indicated moderate to strong levels of consistency within rater dyads across all CAARS 2 scales (i.e., corrected r ranged from .40 to .70). Because agreement was substantial but not perfect, this pattern of results also indicates that different raters can provide different perspectives and is a reminder of the value of obtaining ratings from multiple raters and evaluating multiple sources of information when conducting a comprehensive assessment.
Table 8.7. Inter-Rater Reliability Study 1 (Two Observers, Same Rater Type): CAARS 2
Scale Type | Scale | Obtained r | Corrected r | p | Rater 1 M | Rater 1 Mdn | Rater 1 SD | Rater 2 M | Rater 2 Mdn | Rater 2 SD | Cohen's d
Content Scales | Inattention/Executive Dysfunction | .70 | .70 | < .001 | 59.9 | 60 | 10.2 | 61.0 | 61 | 9.8 | 0.11
Content Scales | Hyperactivity | .59 | .45 | .019 | 55.1 | 53 | 12.0 | 58.2 | 57 | 12.2 | 0.26
Content Scales | Impulsivity | .55 | .40 | .039 | 56.1 | 55 | 12.5 | 54.8 | 51 | 12.1 | -0.11
Content Scales | Emotional Dysregulation | .84 | .64 | < .001 | 57.7 | 57 | 13.3 | 56.9 | 53 | 13.8 | -0.06
Content Scales | Negative Self-Concept | .76 | .59 | .001 | 60.3 | 61 | 13.3 | 61.3 | 61 | 12.2 | 0.08
DSM Symptom Scales | ADHD Inattentive Symptoms | .62 | .59 | .001 | 59.5 | 59 | 10.6 | 60.3 | 60 | 10.1 | 0.07
DSM Symptom Scales | ADHD Hyperactive/Impulsive Symptoms | .53 | .40 | .039 | 55.1 | 52 | 11.7 | 57.0 | 55 | 12.2 | 0.16
DSM Symptom Scales | Total ADHD Symptoms | .61 | .49 | .010 | 57.9 | 54 | 10.5 | 59.3 | 56 | 10.7 | 0.13
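As an illustration of the two statistics reported in Table 8.7, the following minimal Python sketch computes Pearson’s r for a set of paired ratings and Cohen’s d from summary statistics. The paired scores are hypothetical values invented for the example, not data from the study. The Cohen’s d formula (Rater 2 mean minus Rater 1 mean, divided by a pooled SD that assumes equal group sizes) is an assumption: the manual does not state the exact formula, although this form reproduces the tabled value for the Inattention/Executive Dysfunction scale from its rounded means and SDs. The correction for range variation (Bryant & Gokhale, 1972) is not shown.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cohens_d(m1, sd1, m2, sd2):
    """Standardized mean difference (second rater minus first rater).

    Assumption: pooled SD for equal group sizes; the manual does not
    specify which variant of Cohen's d was used.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m2 - m1) / pooled_sd


# Hypothetical T-scores from two raters describing the same six individuals.
rater1 = [52, 61, 70, 48, 66, 58]
rater2 = [55, 59, 72, 50, 61, 63]
r = pearson_r(rater1, rater2)
print(f"Pearson r for hypothetical dyads: {r:.2f}")

# Summary statistics from Table 8.7, Inattention/Executive Dysfunction row.
d = cohens_d(59.9, 10.2, 61.0, 9.8)
print(f"Cohen's d: {d:.2f}")  # 0.11, matching the tabled value
```

Other rows may differ from the tabled d in the second decimal place, because the published means and SDs are themselves rounded.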
Study 2: Different Types of Raters. In the second inter-rater study, comparisons were made across different types of raters, including comparison of Self-Report and Observer, as well as comparison of different types of Observers (e.g., spouse and friend). Because the CAARS 2 Self-Report and Observer both measure the same constructs, similarity in scores across the different types of Observer raters, as well as between Observer ratings and self-reported ratings, would provide additional evidence of the reliability of the test scores. Some degree of similarity in ratings is expected between informants, given that they are rating the same individual on the same constructs. A certain degree of incongruence is nonetheless also expected, because different informants see the individual in different contexts and may have different perceptions of or experiences with the individual’s behavior.
For the CAARS 2, correlation coefficients (Pearson’s r; LeBreton & Senter, 2008) were calculated between scores for the following pairs of raters: (a) Observer and Self-Report (N = 211), and (b) two Observers with differing relations to the individual (N = 47; all ratings completed within a 30-day period). Dyads with two Observers were required to represent different relationship types for Study 2 (e.g., a spouse/romantic partner and a friend, but not two friends). Refer to Appendix J for the demographic characteristics of the raters and the individual being rated.
The obtained correlation coefficients between different rater types, as well as those corrected for range variation (Bryant & Gokhale, 1972), are provided in Table 8.8 (Self-Report/Observer) and Table 8.9 (Observer/Observer). The corrected correlations were weak to moderate in the Self-Report/Observer dyads (median r = .44, ranging from r = .37 to .46, p < .001), as well as in the Observer/Observer dyads (median r = .45, ranging from r = .26 to .58; the p values reported in Table 8.9 ranged from < .001 to .080).
Table 8.8. Inter-Rater Reliability Study 2 (Different Rater Types: Self-Report and Observer): CAARS 2
Scale Type | Scale | Obtained r | Corrected r | Self-Report M | Self-Report Mdn | Self-Report SD | Observer M | Observer Mdn | Observer SD | Cohen's d
Content Scales | Inattention/Executive Dysfunction | .59 | .46 | 65.5 | 67 | 11.8 | 62.8 | 62 | 11.7 | -0.23
Content Scales | Hyperactivity | .54 | .37 | 61.5 | 61 | 12.8 | 57.8 | 56 | 12.5 | -0.29
Content Scales | Impulsivity | .53 | .39 | 61.8 | 62 | 12.4 | 56.0 | 55 | 11.8 | -0.47
Content Scales | Emotional Dysregulation | .54 | .44 | 59.4 | 61 | 11.6 | 57.5 | 57 | 11.3 | -0.17
Content Scales | Negative Self-Concept | .56 | .45 | 60.3 | 61 | 10.5 | 63.2 | 63 | 12.7 | 0.26
DSM Symptom Scales | ADHD Inattentive Symptoms | .57 | .44 | 64.8 | 65 | 11.7 | 62.7 | 63 | 12.0 | -0.18
DSM Symptom Scales | ADHD Hyperactive/Impulsive Symptoms | .55 | .37 | 62.3 | 63 | 13.3 | 57.4 | 55 | 12.1 | -0.38
DSM Symptom Scales | Total ADHD Symptoms | .57 | .45 | 64.4 | 65 | 12.1 | 60.7 | 61 | 11.3 | -0.31
Table 8.9. Inter-Rater Reliability Study 2 (Different Rater Types: Two Types of Observers): CAARS 2
Scale Type | Scale | Obtained r | Corrected r | p | Observer 1 M | Observer 1 Mdn | Observer 1 SD | Observer 2 M | Observer 2 Mdn | Observer 2 SD | Cohen's d
Content Scales | Inattention/Executive Dysfunction | .69 | .58 | < .001 | 58.9 | 59 | 11.9 | 61.2 | 60 | 11.3 | 0.20
Content Scales | Hyperactivity | .43 | .26 | .080 | 56.5 | 54 | 12.7 | 59.3 | 56 | 13.6 | 0.21
Content Scales | Impulsivity | .51 | .38 | .011 | 55.4 | 53 | 11.5 | 57.1 | 54 | 12.6 | 0.14
Content Scales | Emotional Dysregulation | .64 | .46 | .002 | 57.2 | 55 | 12.3 | 60.9 | 59 | 13.1 | 0.29
Content Scales | Negative Self-Concept | .63 | .49 | < .001 | 58.5 | 57 | 11.7 | 60.1 | 59 | 12.3 | 0.13
DSM Symptom Scales | ADHD Inattentive Symptoms | .62 | .50 | < .001 | 58.6 | 56 | 12.0 | 61.2 | 61 | 11.3 | 0.22
DSM Symptom Scales | ADHD Hyperactive/Impulsive Symptoms | .44 | .29 | .054 | 56.3 | 52 | 12.7 | 58.4 | 55 | 12.7 | 0.17
DSM Symptom Scales | Total ADHD Symptoms | .53 | .44 | .003 | 57.9 | 57 | 11.6 | 60.5 | 61 | 11.1 | 0.23
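The medians and ranges quoted for Study 2 can be checked directly against the corrected coefficients in Tables 8.8 and 8.9. The short Python sketch below does this using the tabled values; the dictionary keys and the `summarize` helper are names chosen for this illustration only.

```python
from statistics import median

# Corrected r values, in table row order, copied from Tables 8.8 and 8.9.
corrected_r = {
    "Self-Report/Observer (Table 8.8)": [.46, .37, .39, .44, .45, .44, .37, .45],
    "Observer/Observer (Table 8.9)": [.58, .26, .38, .46, .49, .50, .29, .44],
}


def summarize(values):
    """Return the median (rounded to 2 decimals), minimum, and maximum."""
    return {
        "median": round(median(values), 2),
        "min": min(values),
        "max": max(values),
    }


summary = {name: summarize(values) for name, values in corrected_r.items()}
for name, stats in summary.items():
    print(f"{name}: median r = {stats['median']:.2f}, "
          f"range = {stats['min']:.2f} to {stats['max']:.2f}")
```

With eight scales per table, each median is the average of the fourth and fifth ordered values, which gives .44 for the Self-Report/Observer dyads and .45 for the Observer/Observer dyads.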
Overall, these findings suggest modest differences between Self-Report and Observer ratings, with self-reported ratings typically yielding higher scores. Observers of different types showed a similar pattern: correlations were modest, indicating some agreement between raters while each rater also provided unique insight. The low-to-moderate agreement observed between Observer and Self-Report results may be due to a variety of factors, such as differences in setting (e.g., consistency of behaviors can vary), in level of insight (e.g., ability to observe internal processes, and/or one’s own self-awareness), and in the nature of the relationship (e.g., willingness to disclose may vary for one’s parent as opposed to one’s spouse).
Discrepancies between raters’ scores, as seen in these studies, emphasize the importance of consulting multiple sources to capture unique information that can reveal relevant differences. Self-report captures an individual’s unique insight into their own experiences and behaviors that may not be readily evident to observers, especially for behaviors that may be more inwardly felt than outwardly expressed (e.g., subjective feelings of restlessness). Two observers with different roles in the individual’s life may draw upon experiences with the individual in dissimilar contexts or from different vantage points, which may lead to slightly different responses and thereby reduce the similarity of their ratings. These results highlight the importance of examining information from multiple sources when conducting a comprehensive assessment.