CAARS 2 Manual

Chapter 11: Validity


Validity

Validity refers to the accuracy with which a test measures its intended construct, or the degree to which evidence supports the interpretation of test scores for an intended use (AERA, APA, & NCME, 2014). Multiple sources of validity evidence are considered when designing and evaluating a test. For the CAARS 2–Short, validity evidence based on relationships between test scores and criterion variables was investigated. Additional validity evidence, namely the factor structure and the correlations between the full-length and shortened forms, is reported under Analyses and Results earlier in this chapter. Please see chapter 9, Validity, for detailed descriptions of each type of analysis discussed in this section.

Clinical Group Differences

To provide evidence of the criterion-related validity of the scores from the CAARS 2–Short, mean scores were compared across the groups in each of the following sets:

  1. Individuals diagnosed with ADHD Inattentive Presentation (ADHD Inattentive), ADHD Combined Presentation (ADHD Combined), and individuals from the general population.

  2. Individuals diagnosed with ADHD (ADHD Inattentive and ADHD Combined), individuals with Depression or Anxiety (diagnostic groups include Major Depressive Disorder, Persistent Depressive Disorder, Generalized Anxiety Disorder, Panic Disorder, Separation Anxiety Disorder, and Social Anxiety Disorder), and individuals from the general population.

All analyses were conducted via a series of analysis of variance tests (ANOVA; conducted in R via the stats package, version 3.6.1; R Core Team, 2013). Given the large number of comparisons conducted, a conservative significance level (p < .01) was adopted to determine statistical significance. Effect sizes, as measured by eta-squared (η2) and by Cohen’s d, are provided for all analyses.
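
For readers who want to check the reported effect sizes, the following is a minimal Python sketch (it is not the R code used for the reported analyses). It assumes a one-way ANOVA, for which eta-squared can be recovered from the F statistic and its degrees of freedom. The function names are illustrative, and the interpretive cut-offs are the ones given in the table notes of this chapter.

```python
def eta_squared_from_f(f_value, df_between, df_within):
    """Eta-squared for a one-way ANOVA, recovered from F and its degrees of freedom.

    Uses the identity eta^2 = SS_between / SS_total
                            = (F * df_between) / (F * df_between + df_within).
    """
    numerator = f_value * df_between
    return numerator / (numerator + df_within)


def label_eta_squared(eta_sq):
    """Apply the eta-squared guidelines given in this chapter's table notes."""
    if eta_sq < 0.01:
        return "negligible"
    if eta_sq < 0.06:
        return "small"
    if eta_sq < 0.14:
        return "medium"
    return "large"


# Observer Inattention/Executive Dysfunction (Table 11.20a): F(2, 275) = 102.85
eta_sq = eta_squared_from_f(102.85, 2, 275)
print(round(eta_sq, 2), label_eta_squared(eta_sq))
```

Rounded to two decimals, the result agrees with the η2 of .43 reported for that scale in Table 11.20a.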

ADHD and General Population Comparisons

To provide additional evidence of the criterion-related validity of the scores from the CAARS 2–Short, mean score differences between individuals from the general population and those with ADHD (including those diagnosed with ADHD Inattentive or ADHD Combined) were compared across CAARS 2–Short Content Scale scores. ADHD Predominantly Hyperactive/Impulsive Presentation was excluded from the analyses, as the sample size was too small to provide stable estimates (consistent with population estimates for this presentation type in adults; APA, 2013). To facilitate comparisons between these groups, a subsample of the general population sample was selected to match the demographics of the ADHD sample (i.e., the ADHD Inattentive and ADHD Combined groups taken together). The demographic characteristics of the General Population and ADHD groups and their raters are presented in appendix J.

Individuals in the ADHD Inattentive and ADHD Combined groups were expected, on average, to have higher scores on all CAARS 2–Short Content Scales than individuals in the General Population group, as the CAARS 2–Short is designed to capture symptoms related to adult ADHD. Individuals in the ADHD Combined group were expected to score higher on Hyperactivity, Impulsivity, and Emotional Dysregulation than individuals in the ADHD Inattentive group, as these scales capture features of ADHD that are not as prominent for the Inattentive Presentation. Comparisons between the General Population and ADHD groups were analyzed with a series of ANOVAs, and significant omnibus F-tests were followed up with Tukey’s honestly significant difference (HSD) post-hoc tests for pairwise comparisons. Results are displayed in Tables 11.19 and 11.20 and depicted graphically in Figures 11.4 and 11.5.

As expected, significant differences were observed for all scale-level comparisons between the General Population and ADHD groups, for both the CAARS 2–Short Self-Report and Observer forms. These differences were large, as measured by eta-squared (η2; see Cohen, 1973, for detailed interpretation; guidelines are provided as notes below each table): Self-Report η2 = .26 to .61 and Observer η2 = .14 to .43. Across both rater types, the largest effects were observed on the Inattention/Executive Dysfunction scale.

Pairwise comparisons between the individual ADHD groups and the General Population group yielded moderate to large effect sizes, as measured by Cohen’s d. Specifically, large effect sizes were found when looking at comparisons between the ADHD Inattentive and General Population groups for Self-Report (median Cohen’s d = 1.10), and moderate to large effects were found for Observer (median Cohen’s d = 0.85). Similarly, large effect sizes were found when comparing the ADHD Combined and General Population groups (median Cohen’s d = 2.47 for Self-Report and 1.17 for Observer). These results provide strong evidence to support the validity of the CAARS 2–Short, as they demonstrate that the CAARS 2–Short Content scale scores successfully distinguish the profiles of individuals with and without ADHD.
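
The pairwise effect sizes can be approximated from the summary statistics in the tables. The sketch below is a hedged illustration in Python: it assumes Cohen's d is computed with a pooled standard deviation (the exact variant used for the reported values is not stated in this section), and the function names are illustrative. Because the published means and standard deviations are rounded, recomputed values may differ slightly from the tabled ones.

```python
import math
from statistics import median


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d with a pooled SD (an assumption of this sketch).

    Positive when the second group scores higher than the first group.
    """
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_2 - mean_1) / math.sqrt(pooled_var)


def label_cohens_d(d):
    """Apply the |d| guidelines given in this chapter's table notes."""
    size = abs(d)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


# Observer Hyperactivity, GenPop vs. ADHDin (summary statistics from Table 11.20a)
d = cohens_d(49.3, 9.2, 139, 58.3, 13.3, 63)
print(round(d, 2), label_cohens_d(d))

# Median of the Observer GenPop vs. ADHDin column of Table 11.20b
observer_d_values = [2.01, 0.85, 0.57, 0.62, 1.33]
print(median(observer_d_values))
```

The median is used in the text as a single summary of the five scale-level effect sizes for each contrast.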

Furthermore, significant differences between presentations of ADHD (i.e., Inattentive vs. Combined Presentations) were also observed for the following Self-Report scales: Hyperactivity, Impulsivity, and Emotional Dysregulation, with higher scores observed for ADHD Combined than for ADHD Inattentive in all instances (see Figure 11.4). A similar pattern of results was noted on the Observer, although the results were not statistically significant (see Figure 11.5). These findings were as expected, given the presence of more hyperactive and impulsive symptoms among those diagnosed with ADHD Combined than those diagnosed with ADHD Inattentive. These clear, statistically significant, and expected group differences provide strong evidence for the validity of the CAARS 2–Short scores included in these analyses.

Table 11.19a. Differences between General Population and ADHD Groups: CAARS 2–Short Self-Report

| CAARS 2–Short Scale | Statistic | GenPop (N = 197) | ADHDin (N = 96) | ADHDc (N = 101) | F (2, 391) | η2 | Tukey's HSD Post-Hoc Tests |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 48.8 | 70.3 | 72.6 | 305.27 | .61 | ADHDin, ADHDc > GenPop |
| | SD | 8.5 | 10.3 | 9.2 | | | |
| Hyperactivity | M | 48.9 | 58.9 | 71.3 | 193.35 | .50 | ADHDc > ADHDin > GenPop |
| | SD | 8.7 | 10.2 | 10.0 | | | |
| Impulsivity | M | 48.7 | 70.3 | 72.6 | 184.30 | .48 | ADHDc > ADHDin > GenPop |
| | SD | 8.7 | 10.3 | 9.2 | | | |
| Emotional Dysregulation | M | 48.9 | 58.9 | 71.3 | 113.55 | .37 | ADHDc > ADHDin > GenPop |
| | SD | 8.7 | 10.2 | 10.0 | | | |
| Negative Self-Concept | M | 49.6 | 60.1 | 62.6 | 67.46 | .26 | ADHDin, ADHDc > GenPop |
| | SD | 10.3 | 9.3 | 10.8 | | | |
Note. All F tests statistically significant, p < .001. GenPop = individuals from the general population; ADHDin = individuals diagnosed with ADHD Predominantly Inattentive Presentation; ADHDc = individuals diagnosed with ADHD Combined Presentation. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. The > symbol indicates that the post-hoc difference is statistically significant at p < .01, and a comma indicates no significant difference between groups.

Table 11.19b. Differences between General Population and ADHD Groups: CAARS 2–Short Self-Report Effect Sizes

| CAARS 2–Short Scale | GenPop vs. ADHDin | GenPop vs. ADHDc | ADHDin vs. ADHDc |
| --- | --- | --- | --- |
| Inattention/Executive Dysfunction | 2.37 | 2.74 | 0.23 |
| Hyperactivity | 1.09 | 2.45 | 1.23 |
| Impulsivity | 2.35 | 2.71 | 0.23 |
| Emotional Dysregulation | 1.10 | 2.47 | 1.24 |
| Negative Self-Concept | 1.06 | 1.25 | 0.25 |
Note. GenPop = individuals from the general population (N = 197); ADHDin = individuals diagnosed with ADHD Predominantly Inattentive Presentation (N = 96); ADHDc = individuals diagnosed with ADHD Combined Presentation (N = 101). Values presented are Cohen's d effect sizes; guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Cohen's d values for which the corresponding post-hoc test was statistically significant (p < .01) are shaded in grey. A positive Cohen's d value indicates that the second group listed scored higher than the first group listed.

Table 11.20a. Differences between General Population and ADHD Groups: CAARS 2–Short Observer

| CAARS 2–Short Scale | Statistic | GenPop (N = 139) | ADHDin (N = 63) | ADHDc (N = 76) | F (2, 275) | η2 | Tukey's HSD Post-Hoc Tests |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 48.6 | 66.1 | 64.7 | 102.85 | .43 | ADHDin, ADHDc > GenPop |
| | SD | 8.1 | 10.0 | 11.9 | | | |
| Hyperactivity | M | 49.3 | 58.3 | 63.2 | 45.44 | .25 | ADHDin, ADHDc > GenPop |
| | SD | 9.2 | 13.3 | 10.8 | | | |
| Impulsivity | M | 49.9 | 55.3 | 59.9 | 24.64 | .15 | ADHDc > ADHDin, GenPop |
| | SD | 9.3 | 9.9 | 11.8 | | | |
| Emotional Dysregulation | M | 49.8 | 56.1 | 59.3 | 21.82 | .14 | ADHDin, ADHDc > GenPop |
| | SD | 9.7 | 11.3 | 11.4 | | | |
| Negative Self-Concept | M | 49.5 | 63.2 | 61.9 | 47.43 | .26 | ADHDin, ADHDc > GenPop |
| | SD | 9.2 | 12.8 | 12.9 | | | |
Note. All F tests statistically significant, p < .001. GenPop = individuals from the general population; ADHDin = individuals diagnosed with ADHD Predominantly Inattentive Presentation; ADHDc = individuals diagnosed with ADHD Combined Presentation. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. The > symbol indicates that the post-hoc difference is statistically significant at p < .01, and a comma indicates no significant difference between groups.

Table 11.20b. Differences between General Population and ADHD Groups: CAARS 2–Short Observer Effect Sizes

| CAARS 2–Short Scale | GenPop vs. ADHDin | GenPop vs. ADHDc | ADHDin vs. ADHDc |
| --- | --- | --- | --- |
| Inattention/Executive Dysfunction | 2.01 | 1.67 | -0.13 |
| Hyperactivity | 0.85 | 1.43 | 0.42 |
| Impulsivity | 0.57 | 0.98 | 0.42 |
| Emotional Dysregulation | 0.62 | 0.93 | 0.28 |
| Negative Self-Concept | 1.33 | 1.17 | -0.11 |
Note. GenPop = individuals from the general population (N = 139); ADHDin = individuals diagnosed with ADHD Predominantly Inattentive Presentation (N = 63); ADHDc = individuals diagnosed with ADHD Combined Presentation (N = 76). Values presented are Cohen's d effect sizes; guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Cohen's d values for which the corresponding post-hoc test was statistically significant (p < .01) are shaded in grey. A positive Cohen's d value indicates that the second group listed scored higher than the first group listed.

Figure 11.4. Profiles for General Population and ADHD Groups: CAARS 2–Short Self-Report

Figure 11.5. Profiles for General Population and ADHD Groups: CAARS 2–Short Observer

ADHD, Depression/Anxiety, and General Population Comparisons

Additional group comparisons investigated differences between the scores of individuals with ADHD (all presentations combined into a single group; Self-Report N = 122; Observer N = 80), individuals with Depression or Anxiety (Self-Report N = 123; Observer N = 97), and a subset of the General Population selected to correspond demographically to the combination of clinical cases in the other two groups (Self-Report N = 245; Observer N = 177). The Depression/Anxiety sample included individuals with a confirmed diagnosis of one or more of Major Depressive Episode, Major Depressive Disorder, Persistent Depressive Disorder, Generalized Anxiety Disorder, Separation Anxiety Disorder, Social Anxiety Disorder, or Panic Disorder. For these analyses, individuals were excluded from the Depression/Anxiety sample if they had a co-occurring diagnosis of ADHD and, similarly, from the ADHD sample if they had any of the aforementioned depressive or anxiety disorders. For the General Population sample, a subsample of the full General Population sample was selected that corresponded to the combined ADHD and Depression/Anxiety samples in terms of gender, age group, race/ethnicity, and education level. The demographic characteristics of the ADHD, Depression/Anxiety, and General Population groups and their raters are presented in appendix J.

Individuals with Depression and/or Anxiety were expected to score higher than the General Population but lower than those with ADHD on most CAARS 2–Short Content Scales. Specifically, individuals with Depression and/or Anxiety were anticipated to have elevated scale scores where there is symptom overlap between these internalizing disorders and ADHD (i.e., Inattention/Executive Dysfunction, Negative Self-Concept, and Emotional Dysregulation). However, individuals with ADHD were expected to score significantly higher than individuals with Depression and Anxiety on scales that reflect features more unique to ADHD (i.e., Hyperactivity and Impulsivity).

The results of ANOVAs conducted to explore the differences between the ADHD, Depression/Anxiety, and correspondingly sized General Population groups are presented in Tables 11.21 and 11.22. Significant differences were observed between the groups for all scales (p < .001), with effect sizes that ranged from moderate to very large (η2 ranged from .19 to .48 for Self-Report, and .11 to .38 for Observer). Sizeable differences were observed for nearly all pairwise comparisons, highlighting the clearly distinct profiles of scores across CAARS 2–Short Content Scales for these three groups (see Figure 11.6 and Figure 11.7 for a graphical depiction of mean scale scores by group for Self-Report and Observer, respectively).

The ADHD group scored significantly higher than the General Population across all scales for Self-Report and Observer, with large effect sizes for all contrasts (median Cohen’s d = 1.41 for Self-Report, 1.08 for Observer). The Depression/Anxiety group also scored significantly higher than the General Population sample for most scales (median Cohen’s d = 0.72 for Self-Report, 0.55 for Observer). Moreover, differences between the ADHD group and Depression/Anxiety group revealed moderate to large effect sizes for the Inattention/Executive Dysfunction, Hyperactivity, and Impulsivity scales (median Cohen’s d = 1.05 for Self-Report, 0.92 for Observer), while there were no significant differences between the groups for Emotional Dysregulation (Cohen’s d = 0.20 for Self-Report and 0.25 for Observer) and Negative Self-Concept (Cohen’s d = -0.14 for Self-Report and -0.20 for Observer). The size and direction of the effects matched expectations. Overall, for scales containing features and symptoms of ADHD that do not typically overlap with those of internalizing disorders (e.g., hyperactivity, impulsivity), the differences were significant and moderate to large in size, whereas for scales such as Emotional Dysregulation and Negative Self-Concept that capture features common to both ADHD and internalizing disorders, the effects were not significant and were negligible to small in size. These results provide strong evidence for the validity of these CAARS 2–Short scores, as there were clear differences between groups that were hypothesized to score differently, resulting in clearly distinct profiles of scores.

Table 11.21a. Differences between ADHD, Depression/Anxiety, and General Population Groups: CAARS 2–Short Self-Report

| CAARS 2–Short Scale | Statistic | GenPop (N = 245) | ADHD (N = 122) | Dep/Anx (N = 123) | F (2, 487) | η2 | Tukey's HSD Post-Hoc Tests |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 49.2 | 70.6 | 55.7 | 223.85 | .48 | ADHD > Dep/Anx > GenPop |
| | SD | 8.6 | 9.5 | 9.8 | | | |
| Hyperactivity | M | 49.5 | 64.0 | 54.0 | 80.52 | .25 | ADHD > Dep/Anx > GenPop |
| | SD | 9.4 | 12.0 | 10.3 | | | |
| Impulsivity | M | 49.2 | 65.7 | 54.0 | 104.40 | .30 | ADHD > Dep/Anx > GenPop |
| | SD | 9.5 | 11.7 | 10.6 | | | |
| Emotional Dysregulation | M | 48.9 | 59.8 | 57.6 | 57.14 | .19 | ADHD, Dep/Anx > GenPop |
| | SD | 9.4 | 10.5 | 11.5 | | | |
| Negative Self-Concept | M | 50.0 | 59.7 | 61.2 | 64.08 | .21 | ADHD, Dep/Anx > GenPop |
| | SD | 9.6 | 11.1 | 10.4 | | | |
Note. All F tests statistically significant, p < .001. GenPop = individuals from the general population; Dep/Anx = individuals with a diagnosis of a Depressive Disorder and/or Anxiety Disorder. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. The > symbol indicates that the post-hoc difference is statistically significant at p < .01, and a comma indicates no significant difference between groups.

Table 11.21b. Differences between ADHD, Depression/Anxiety, and General Population Groups: CAARS 2–Short Self-Report Effect Sizes

| CAARS 2–Short Scale | ADHD vs. GenPop | Dep/Anx vs. GenPop | ADHD vs. Dep/Anx |
| --- | --- | --- | --- |
| Inattention/Executive Dysfunction | 2.41 | 0.72 | 1.56 |
| Hyperactivity | 1.41 | 0.47 | 0.90 |
| Impulsivity | 1.62 | 0.49 | 1.05 |
| Emotional Dysregulation | 1.11 | 0.86 | 0.20 |
| Negative Self-Concept | 0.95 | 1.13 | -0.14 |
Note. GenPop = individuals from the general population (N = 245); ADHD = individuals with a diagnosis of ADHD (N = 122); Dep/Anx = individuals with a diagnosis of a Depressive Disorder and/or Anxiety Disorder (N = 123). Values presented are Cohen's d effect sizes; guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Cohen's d values for which the corresponding post-hoc test was statistically significant (p < .01) are shaded in grey. A positive Cohen's d value indicates that the first group listed scored higher than the second group listed.

Table 11.22a. Differences between ADHD, Depression/Anxiety, and General Population Groups: CAARS 2–Short Observer

| CAARS 2–Short Scale | Statistic | GenPop (N = 177) | ADHD (N = 80) | Dep/Anx (N = 97) | F (2, 351) | η2 | Tukey's HSD Post-Hoc Tests |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 48.0 | 67.3 | 54.4 | 107.38 | .38 | ADHD > Dep/Anx > GenPop |
| | SD | 8.4 | 10.6 | 11.2 | | | |
| Hyperactivity | M | 48.9 | 61.7 | 51.8 | 47.67 | .21 | ADHD > Dep/Anx, GenPop |
| | SD | 8.6 | 12.0 | 9.6 | | | |
| Impulsivity | M | 49.3 | 59.2 | 51.0 | 27.60 | .14 | ADHD > Dep/Anx, GenPop |
| | SD | 8.9 | 12.0 | 10.1 | | | |
| Emotional Dysregulation | M | 49.5 | 58.1 | 55.3 | 20.78 | .11 | ADHD, Dep/Anx > GenPop |
| | SD | 10.5 | 11.0 | 10.8 | | | |
| Negative Self-Concept | M | 48.9 | 60.4 | 63.0 | 58.54 | .25 | ADHD, Dep/Anx > GenPop |
| | SD | 9.7 | 12.9 | 13.0 | | | |
Note. GenPop = individuals from the general population; Dep/Anx = individuals with a diagnosis of a Depressive Disorder and/or Anxiety Disorder. Guidelines for interpreting Cohen's d (Cohen, 1988): small effect size = .20; medium effect size = .50; large effect size = .80. Guidelines for interpreting η2: small effect size = .01; medium effect size = .06; large effect size = .14. The > symbol indicates that the post-hoc difference is statistically significant at p < .01, and a comma indicates no significant difference between groups.

Table 11.22b. Differences between ADHD, Depression/Anxiety, and General Population Groups: CAARS 2–Short Observer Effect Sizes

| CAARS 2–Short Scale | ADHD vs. GenPop | Dep/Anx vs. GenPop | ADHD vs. Dep/Anx |
| --- | --- | --- | --- |
| Inattention/Executive Dysfunction | 2.12 | 0.67 | 1.19 |
| Hyperactivity | 1.32 | 0.33 | 0.92 |
| Impulsivity | 1.00 | 0.19 | 0.75 |
| Emotional Dysregulation | 0.81 | 0.55 | 0.25 |
| Negative Self-Concept | 1.08 | 1.29 | -0.20 |
Note. GenPop = individuals from the general population (N = 177); ADHD = individuals with a diagnosis of ADHD (N = 80); Dep/Anx = individuals with a diagnosis of a Depressive Disorder and/or Anxiety Disorder (N = 97). Values presented are Cohen's d effect sizes; guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Cohen's d values for which the corresponding post-hoc test was statistically significant (p < .01) are shaded in grey. A positive Cohen's d value indicates that the first group listed scored higher than the second group listed.

Figure 11.6. Profiles of ADHD, Depression/Anxiety, and General Population Groups: CAARS 2–Short Self-Report

Note. Dep/Anx = individuals with a diagnosis of a Depressive Disorder and/or Anxiety Disorder.

Figure 11.7. Profiles of ADHD, Depression/Anxiety, and General Population Groups: CAARS 2–Short Observer

Note. Dep/Anx = individuals with a diagnosis of a Depressive Disorder and/or Anxiety Disorder.

Classification Accuracy

Classification accuracy statistics were derived for the CAARS 2–Short to consider the extent to which scores from each form correctly classify individuals into their respective groups (i.e., General Population vs. ADHD). For an operational definition of classification accuracy and detailed description of methods used, see chapter 9, Validity. For a description of the classification accuracy statistics that were examined and how each is derived, see chapter 6, Development.

For the CAARS 2–Short, the binary classification modelling consisted of two sets of analyses: (a) binomial logistic regression, followed by (b) the creation of confusion matrices and the derivation of classification accuracy statistics. The logistic regressions were used to assess how well the CAARS 2–Short Content Scales’ T-scores distinguish individuals from the General Population from those diagnosed with ADHD (all presentations).
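
As an illustration of step (b), the sketch below derives the usual classification accuracy statistics from the four cells of a 2 × 2 confusion matrix. It is a minimal Python example under stated assumptions: the cell counts are hypothetical (they are not the study data), the function name is illustrative, and the formulas are the standard textbook definitions rather than a reproduction of the procedure documented in chapter 6.

```python
def classification_stats(tp, fn, fp, tn):
    """Classification accuracy statistics from a 2 x 2 confusion matrix.

    tp: ADHD cases classified as ADHD            fn: ADHD cases classified as GenPop
    fp: GenPop cases classified as ADHD          tn: GenPop cases classified as GenPop
    """
    total = tp + fn + fp + tn
    observed = (tp + tn) / total  # overall correct classification rate
    # Agreement expected by chance, from the row and column totals
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    return {
        "overall": observed,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts for two equally sized groups (not the study data)
stats = classification_stats(tp=88, fn=12, fp=9, tn=91)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

When the two groups are the same size and predictions are roughly balanced, chance agreement is close to .50, so kappa is approximately twice the amount by which the overall correct classification rate exceeds .50.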

To facilitate unbiased predictions, matched samples were created to ensure that each of the General Population and ADHD groups used for binary classification modelling were nearly equivalent in terms of their demographic composition. The matched General Population and ADHD samples used to examine the classification accuracy of the CAARS 2–Short were the same as those used to examine the classification accuracy of the full-length forms. The demographic characteristics of each sample are presented in appendix J.

Predicting diagnostic status depends on the prevalence of ADHD in the population. The prevalence (i.e., the base rate of ADHD in the relevant population) can vary widely depending on the purpose of the evaluation and the setting. For example, in a screening setting the prevalence of ADHD might be around 10% or less, whereas a prevalence of approximately 50% may be more likely in a clinically referred sample, and 60% to 80% in an ADHD-specific clinical practice. Accordingly, the classification accuracy statistics of the CAARS 2–Short, assuming a 50% base rate, are summarized in Table 11.23, and the Positive and Negative Predictive Values for other base rates are provided in Table 11.24. The overall correct classification rate was high (89.7% for Self-Report and 84.1% for Observer), with the desired balance between sensitivity and specificity. These results support the use of the CAARS 2–Short for effective classification of individuals with and without ADHD.
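
The dependence of predictive values on the base rate can be illustrated with the textbook relationship between sensitivity, specificity, and prevalence. The Python sketch below is a hedged illustration only: the function name is an assumption of this example, and the calculation holds sensitivity and specificity fixed while the base rate changes. It is not claimed to be the procedure behind Table 11.24, whose values may reflect additional modelling steps, so it should not be expected to reproduce that table.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Positive and negative predictive values at a given base rate (Bayes' rule).

    All three arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * base_rate
    false_neg = (1 - sensitivity) * base_rate
    true_neg = specificity * (1 - base_rate)
    false_pos = (1 - specificity) * (1 - base_rate)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Self-Report sensitivity and specificity from Table 11.23, at the 50% base rate
ppv, npv = predictive_values(0.881, 0.913, 0.50)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

At the 50% base rate this closely approximates the Self-Report predictive values in Table 11.23, with small differences attributable to rounding of the inputs. With sensitivity and specificity held fixed, lowering the base rate lowers the PPV and raises the NPV.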

Table 11.23. Classification Accuracy Statistics: CAARS 2–Short

| Form | Overall Correct Classification Rate (%) | Sensitivity (%) | Specificity (%) | Positive Predictive Value (%) | Negative Predictive Value (%) | Kappa |
| --- | --- | --- | --- | --- | --- | --- |
| Self-Report | 89.7 | 88.1 | 91.3 | 90.9 | 88.5 | .79 |
| Observer | 84.1 | 87.3 | 80.9 | 82.1 | 86.4 | .68 |
Note. Predictor Scales = CAARS 2–Short Content Scales. Classification accuracy is based on classifying individuals with ADHD and individuals in the general population.

Table 11.24. Classification Accuracy Statistics Adjusted for Base Rates: CAARS 2–Short

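| Form | 10% Base Rate: PPV (%) | 10% Base Rate: NPV (%) | 60% Base Rate: PPV (%) | 60% Base Rate: NPV (%) | 70% Base Rate: PPV (%) | 70% Base Rate: NPV (%) | 80% Base Rate: PPV (%) | 80% Base Rate: NPV (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Self-Report | 66.8 | 76.5 | 92.4 | 86.5 | 93.4 | 84.6 | 94.2 | 82.8 |
| Observer | 47.8 | 96.9 | 84.6 | 84.2 | 86.5 | 81.9 | 88.0 | 79.9 |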
Note. PPV = Positive Predictive Value. NPV = Negative Predictive Value. Predictor Scales = CAARS 2–Short Content Scales. Classification accuracy is based on classifying individuals with ADHD and individuals in the general population.