CAARS 2 Manual

Chapter 11: Key Findings


Key Findings

Development and Standardization. The CAARS 2–Short was developed following best practices, as outlined within this chapter. Items from the full-length CAARS 2 Self-Report and Observer forms were selected to ensure optimal construct coverage while retaining psychometric properties similar to the full-length forms. Shortened scales were developed for the Inattention/Executive Dysfunction, Hyperactivity, Impulsivity, Emotional Dysregulation, and Negative Self-Concept scales by examining item performance, interrelationships and scale structure, associations with the full-length scales, and relationship with criterion variables. Standardized scores (i.e., T-scores and percentiles) were calculated based on data from the CAARS 2 Normative and ADHD Reference Samples.

Reliability. Evidence of reliability was established through the following:

  • Excellent internal consistency (median omega coefficient in the Normative Sample: Self-Report = .91 and Observer = .93).

  • Low standard error of measurement (SEM) values (indicating very little error in the estimated true scores; median SEM in the Normative Samples: Self-Report = 3.06 and Observer = 2.69).

  • A high degree of test information (mirroring similar trends seen in the full-length CAARS 2 scales, providing evidence for precision of measurement).

  • Statistically significant (p < .001), strong, and positive test-retest reliability across a 2- to 4-week interval (correlation coefficient ranges for Self-Report: r = .78 to .95; Observer: r = .76 to .90).

  • Moderate to strong inter-rater reliability between two observers of the same type (e.g., two friends; r = .40 to .69), and weak to moderate relationships between different rater types (e.g., comparing self-report ratings to observer ratings, or comparing two different observer types; r = .25 to .48), highlighting the importance of examining information from multiple sources with meaningfully different perspectives.
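
As a rough illustration of how the reliability and SEM values above relate, the classical formula SEM = SD × sqrt(1 − reliability) can be applied to the reported median omega coefficients. This is a minimal sketch, not the manual's documented procedure: it assumes SEM is expressed in T-score units (SD = 10), and the function name is hypothetical. Because the reported SEMs are medians across scales, they need not match these values exactly.

```python
import math


def standard_error_of_measurement(reliability: float, sd: float = 10.0) -> float:
    """Classical SEM: sd * sqrt(1 - reliability).

    sd defaults to 10, the standard deviation of the T-score metric
    (an assumption made for this illustration).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Median omega coefficients reported for the Normative Sample.
for rater, omega in (("Self-Report", 0.91), ("Observer", 0.93)):
    sem = standard_error_of_measurement(omega)
    print(f"{rater}: omega = {omega:.2f}, approximate SEM = {sem:.2f}")
```

The results (about 3.00 and 2.65 T-score points) are close to the reported median SEM values of 3.06 and 2.69.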

Validity. Evidence of validity was established through the following:

  • A high degree of association between the CAARS 2–Short and the full-length CAARS 2 (median tau: Self-Report = .88, Observer = .87).

  • Replication of the internal structure found for the full-length CAARS 2 Content scales: confirmatory factor analyses (CFA) supported the structure of the CAARS 2–Short scales (final models for the scales had strong fit statistics for both raters [CFI and TLI ≥ .962; SRMR ≤ .047; RMSEA ≤ .057]).

  • Meaningful differences (a high degree of criterion-related validity) between average scores of clinical groups on the CAARS 2–Short, such that individuals with ADHD yielded higher scores than individuals from the General Population (median Cohen’s d = 1.08 across all raters and scales) as well as higher scores than individuals with Depression and/or Anxiety (median Cohen’s d = 0.61 across all raters and scales).

  • Moderate to high levels of classification accuracy were demonstrated using scores from the CAARS 2–Short when distinguishing between individuals from the General Population and those diagnosed with ADHD (overall correct classification rates: 89.7% Self-Report, 84.1% Observer).
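
The two statistics cited in the last two points can be sketched as follows: Cohen's d as a mean difference divided by the pooled standard deviation, and the overall correct classification rate as the share of cases classified correctly. This is an illustrative sketch only. The function names, scores, and counts below are hypothetical and are not data from the CAARS 2 samples, and the manual may compute these values differently (e.g., with covariates or a different pooling method).

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def overall_correct_rate(true_pos, true_neg, false_pos, false_neg):
    """Proportion of all cases assigned to the correct group."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total


# Made-up T-scores for illustration only.
adhd_scores = [60, 65, 70]
general_scores = [45, 50, 55]
print(f"Cohen's d = {cohens_d(adhd_scores, general_scores):.2f}")

# Made-up classification counts for illustration only.
rate = overall_correct_rate(true_pos=45, true_neg=44, false_pos=6, false_neg=5)
print(f"Overall correct classification rate = {rate:.1%}")
```

A positive d here means the first group scored higher on average, which matches the direction of the ADHD versus General Population comparison described above.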

Fairness. Evidence of fairness, in terms of the absence of meaningful psychometric differences, is provided with regard to gender, race/ethnicity, country of residence, and education level (EL):

  • Gender. Results provide evidence for the equivalent measurement of males and females and the absence of meaningful gender differences. Measurement invariance was supported, there was no evidence of measurement bias (maximum ETSSD = |.06|), and scores for ratings of males and females did not differ to a statistically significant degree on most scales across rater forms. The exceptions were slightly higher self-reported ratings of males on Hyperactivity and Impulsivity and slightly higher scores for ratings of females on Negative Self-Concept (Cohen’s d = |0.01| to |0.29|).

  • Race/ethnicity. There was no evidence of measurement bias across race/ethnicity (maximum ETSSD = |.12|), and measurement invariance was supported across comparisons (ratings of White individuals compared to ratings of Black and Hispanic individuals). For Asian vs. White comparisons, no meaningful scale score differences were found across all raters. Scores for Hispanic vs. White and Black vs. White individuals did not differ based on Observer ratings, whereas Self-Report ratings yielded small effects on three scales (Cohen’s d = |0.30| to |0.71|), with ratings of White individuals resulting in slightly higher scores than ratings of Hispanic and Black individuals.

  • Country of residence. Results provide evidence for equivalence between ratings of individuals in the U.S. and Canada. Assumptions of invariance were upheld, there was no evidence of measurement bias (maximum ETSSD = |.06|), and mean score differences were not statistically significant, with negligible to small effect sizes (Cohen’s d = |0.02| to |0.39|).

  • Education level (EL). Results provide evidence for equivalence between ratings of individuals with different levels of education. Assumptions of invariance were upheld, there was no evidence of measurement bias (maximum ETSSD = |.07|), and mean score differences were not statistically significant with negligible to small effect sizes for all but one scale (Cohen’s d = 0.00 to |0.32|).