CAARS 2 Manual

Chapter 11: Development


Development

The goal of the CAARS 2–Short is to assess core and associated symptoms of ADHD. Therefore, all five CAARS 2 Content Scales were included: Inattention/Executive Dysfunction, Hyperactivity, Impulsivity, Emotional Dysregulation, and Negative Self-Concept. Note that the CAARS 2–Short also includes the Response Style Analysis, ADHD Index, and Additional Questions; these components are parallel to the full-length form and are not analyzed or discussed separately in this chapter. Please see Table 1.1 in chapter 1, Introduction, for a comparison of the content included in the full-length CAARS 2 and CAARS 2–Short. To mitigate the risks to measurement precision, reliability, and validity that can occur with abbreviated versions of scales, recommended practices for developing short forms were followed for the CAARS 2–Short (Emons et al., 2007; Kruyen et al., 2013; Smith et al., 2000; Ziegler et al., 2014).

Samples

The CAARS 2–Short was derived and validated using the CAARS 2 Total Sample (see Table 6.4 in chapter 6, Development). The Total Sample, which included individuals from the general population and from clinical groups, comprised 2,232 individuals aged 18 or older who completed the CAARS 2 Self-Report and 2,150 observers who rated adults aged 18 or older. The Total Sample was used to select and validate the items for the shortened Content Scales. Note that two individuals from the Observer Total Sample were excluded from analyses because they had omitted items that affected analyses relevant to the creation of a shortened form. The samples were split into calibration and validation subsamples (see Tables 11.1a and 11.1b for demographic characteristics of the rated individuals and the raters, respectively).

Table 11.1a. Demographic Characteristics of the Rated Individuals: CAARS 2–Short Calibration and Validation Samples

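| Demographic | Group | Self-Report Calibration N | Self-Report Calibration % | Self-Report Validation N | Self-Report Validation % | Observer Calibration N | Observer Calibration % | Observer Validation N | Observer Validation % |
|---|---|---|---|---|---|---|---|---|---|
| Gender | Male | 544 | 46.3 | 487 | 46.1 | 517 | 47.6 | 504 | 47.4 |
| | Female | 621 | 52.9 | 569 | 53.8 | 565 | 52.0 | 559 | 52.5 |
| | Other | 10 | 0.9 | 1 | 0.1 | 4 | 0.4 | 1 | 0.1 |
| U.S. Race/Ethnicity | Hispanic | 117 | 10.0 | 112 | 10.6 | 126 | 11.6 | 106 | 10.0 |
| | Asian | 50 | 4.3 | 42 | 4.0 | 34 | 3.1 | 35 | 3.3 |
| | Black | 94 | 8.0 | 93 | 8.8 | 97 | 8.9 | 102 | 9.6 |
| | White | 693 | 59.0 | 643 | 60.8 | 650 | 59.9 | 630 | 59.2 |
| | Other | 28 | 2.4 | 15 | 1.4 | 20 | 1.8 | 21 | 2.0 |
| U.S. Region | Northeast | 165 | 14.0 | 176 | 16.7 | 165 | 15.2 | 172 | 16.2 |
| | Midwest | 232 | 19.7 | 198 | 18.7 | 219 | 20.2 | 198 | 18.6 |
| | South | 382 | 32.5 | 328 | 31.0 | 333 | 30.7 | 328 | 30.8 |
| | West | 203 | 17.3 | 203 | 19.2 | 210 | 19.3 | 196 | 18.4 |
| Canadian Region | Central | 124 | 10.6 | 84 | 7.9 | 105 | 9.7 | 111 | 10.4 |
| | East | 9 | 0.8 | 15 | 1.4 | 11 | 1.0 | 14 | 1.3 |
| | West | 60 | 5.1 | 53 | 5.0 | 43 | 4.0 | 45 | 4.2 |
| Canadian Race/Ethnicity | Not a visible minority | 161 | 13.7 | 125 | 11.8 | 129 | 11.9 | 133 | 12.5 |
| | Visible minority | 32 | 2.7 | 27 | 2.6 | 30 | 2.8 | 37 | 3.5 |
| Education Level | No high school diploma | 87 | 7.4 | 67 | 6.3 | 88 | 8.1 | 85 | 8.0 |
| | High school diploma/GED | 291 | 24.8 | 268 | 25.4 | 301 | 27.7 | 322 | 30.3 |
| | Some college or associate degree | 389 | 33.1 | 369 | 34.9 | 340 | 31.3 | 325 | 30.5 |
| | Bachelor's degree | 251 | 21.4 | 222 | 21.0 | 214 | 19.7 | 204 | 19.2 |
| | Graduate or professional degree | 157 | 13.4 | 131 | 12.4 | 143 | 13.2 | 128 | 12.0 |
| Diagnosis | ADHD Inattentive | 64 | 5.4 | 50 | 4.7 | 35 | 3.2 | 30 | 2.8 |
| | ADHD Hyperactive/Impulsive | 0 | 0.0 | 0 | 0.0 | 1 | 0.1 | 8 | 0.8 |
| | ADHD Combined | 76 | 6.5 | 55 | 5.2 | 60 | 5.5 | 36 | 3.4 |
| | Anxiety | 105 | 8.9 | 86 | 8.1 | 71 | 6.5 | 67 | 6.3 |
| | Depression | 90 | 7.7 | 75 | 7.1 | 58 | 5.3 | 61 | 5.7 |
| | Other Diagnosis | 69 | 5.9 | 45 | 4.3 | 49 | 4.5 | 44 | 4.1 |
| | No Diagnosis | 930 | 79.1 | 863 | 81.6 | 923 | 85.0 | 912 | 85.7 |
| Age in years | M (SD) | 47.4 (19.3) | | 47.6 (19.3) | | 47.7 (19.8) | | 47.8 (19.6) | |
| Total | | 1,175 | 100.0 | 1,057 | 100.0 | 1,086 | 100.0 | 1,064 | 100.0 |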
Note. Anxiety includes Generalized Anxiety Disorder, Panic Disorder, Separation Anxiety, Specific Phobia, and Social Anxiety Disorder. Depression includes Major Depressive Disorder, Major Depressive Episode, and Persistent Depressive Disorder. Other diagnoses include less frequently reported co-occurring diagnoses, such as Autism Spectrum Disorder and Substance-Related and Addictive Disorders. The sum of diagnoses is greater than the total N because individuals with co-occurring diagnoses count towards more than one diagnostic group.

Analyses and Results

Identical procedures were used to develop the CAARS 2–Short for the Self-Report and Observer forms. Consistent with recommended practice in developing shortened forms, both statistical methods and expert judgment were employed to ensure breadth of coverage of the target construct was retained in the shortened forms (Kruyen et al., 2013; Smith et al., 2000; Ziegler et al., 2014). The steps involved in item selection and subsequent validation of the shortened forms were as follows:

Step 1–Core items selected. Five experts in adult ADHD (see Acknowledgements) were asked to identify items from the full-length CAARS 2 that best represented the core construct for each scale. Experts were asked to identify core items for Self-Report and Observer separately. The number of experts who endorsed each item as core was summed, producing a score that ranged from 0 to 5 for each item. Expert consensus on a core item was defined as an item score of 4 or higher, which represented agreement from at least 4 out of 5 experts. All core items were initially included in the shortened scales, though a small subset of core items were later excluded due to statistical considerations as outlined in Step 2.
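
The consensus rule in Step 1 can be illustrated with a short sketch. The item names and endorsement patterns below are hypothetical and are not taken from the CAARS 2 data; only the 0-to-5 scoring and the cutoff of 4 come from the text.

```python
from typing import Dict, List

CONSENSUS_THRESHOLD = 4  # at least 4 of 5 experts, as described in Step 1


def core_items(endorsements: Dict[str, List[int]],
               threshold: int = CONSENSUS_THRESHOLD) -> List[str]:
    """Return the items whose summed expert endorsements meet the threshold."""
    return [item for item, votes in endorsements.items()
            if sum(votes) >= threshold]


# Hypothetical endorsements: 1 = expert marked the item as core, 0 = did not.
example = {
    "item_a": [1, 1, 1, 1, 0],  # score 4, counts as core
    "item_b": [1, 0, 1, 0, 0],  # score 2, not core
    "item_c": [1, 1, 1, 1, 1],  # score 5, counts as core
}
print(core_items(example))
```

Because experts rated Self-Report and Observer items separately, a function like this would be run once per rater form.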

Step 2–Items excluded due to statistical considerations. Items were examined for local dependence (LD) as well as differential item functioning (DIF) across demographic groups (i.e., gender, race/ethnicity, language(s)¹ spoken, and education level [EL]) in the Total Sample. LD refers to a violation of the IRT assumption that the items in a scale share variance only because of a common factor and are not related to one another in other ways (e.g., LD arises when a response to one question depends on a response to an earlier question; Embretson & Reise, 2000). DIF refers to a violation of the IRT assumption that items are free of statistical bias across groups (Embretson & Reise, 2000). If there was evidence for significant and meaningful LD or DIF, the item was excluded from consideration on the CAARS 2–Short. Including such items could mean the scale score would be affected by factors other than the construct being measured (that is, significant DIF would indicate item responses are unduly influenced by group characteristics, while meaningful LD would suggest that responses are influenced by item similarity such that the items may be related for a reason beyond their shared latent construct). While LD and DIF had negligible impact on the full-length CAARS 2 (see chapter 6, Development, for item selection procedures that evaluated these same statistics and found little evidence for the meaningful influence of either statistic in the full-length CAARS 2 items), LD and DIF can have a larger influence on shorter scales, and therefore more stringent criteria were set for the CAARS 2–Short.

Items with a medium DIF effect size in terms of the tested demographic groups were excluded from consideration for the short form. Using this criterion, no items were excluded from the Self-Report form and only one item was excluded from the Observer form (as there was a moderate effect size between Hispanic and White individuals for Observer).

LD was assessed using (a) residual correlations among items greater than .15, (b) modification indices for 1-factor confirmatory factor analysis (CFA) models for each of the scales to assess residual correlation pairs, and (c) the presence of a significant χ² test (Chen & Thissen, 1997). When LD was detected for an item pair, the item with the better measurement properties overall was considered for inclusion on the shortened forms.
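
As a rough illustration of criterion (a) only, the sketch below flags item pairs whose residual correlation exceeds .15. The residual matrix is hypothetical, and comparing the absolute value against the cutoff is an assumption; the text does not say whether negative residuals were treated the same way. The modification indices and χ² test in (b) and (c) are not shown.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def flag_ld_pairs(items: Sequence[str],
                  resid: Sequence[Sequence[float]],
                  cutoff: float = 0.15) -> List[Tuple[str, str, float]]:
    """Return item pairs whose residual correlation exceeds the cutoff."""
    flagged = []
    for i, j in combinations(range(len(items)), 2):
        # Assumption: magnitude matters, so the absolute value is compared.
        if abs(resid[i][j]) > cutoff:
            flagged.append((items[i], items[j], resid[i][j]))
    return flagged


# Hypothetical residual correlations from a 1-factor model of three items.
names = ["i1", "i2", "i3"]
residuals = [
    [0.00, 0.05, 0.22],
    [0.05, 0.00, -0.18],
    [0.22, -0.18, 0.00],
]
for first, second, value in flag_ld_pairs(names, residuals):
    print(first, second, value)
```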

Step 3–Remaining items selected. Item selection was done using the calibration sample, by systematically adding items one at a time to the core set of items for each scale, based on the following considerations:

  • Statistical Properties. Additional items were added based on item discrimination, precision of measurement, and ability to discriminate between the General Population and ADHD samples.

    • Item discrimination assesses an item’s ability to distinguish individuals at low versus high levels of the trait. Item discrimination was measured using the slope parameter of each item from an IRT model. Higher values (e.g., > .75) were favored, as they indicate better discrimination (Embretson & Reise, 2000).

    • Precision of measurement is inversely related to the amount of error, so that an item with low error has high precision. Precision of measurement for items was assessed using item information curves (IICs). An IIC graphically shows precision of measurement across the range of the construct being measured, also known as theta. Precision at or above 1.5 SD from the average level of the construct was targeted, to best capture both subclinical and clinical levels of the construct. Greater amounts of information indicate higher precision of measurement and lower standard error (more details can be found in Test Information in chapter 8, Reliability).

    • Cliff’s delta (Cliff’s d; Cliff, 1993) was employed to examine how well each item distinguished between the General Population and ADHD samples. Cliff’s d is a measure of effect size used for non-parametric data. Items with higher effect sizes were preferred as they indicate better discrimination between groups.

  • Expert ratings. When items had similar statistics, the item with the higher expert rating was retained.

  • Content represented. Many scales assess different content areas or facets within the construct. For example, for the full-length CAARS 2 Hyperactivity scale, 61% of the items assess behavioral aspects of hyperactivity, 31% assess verbal hyperactivity, and 8% assess both behavioral and verbal aspects. A similar ratio of items was retained for the shortened form, and across both raters, to ensure proportional coverage of all facets of the construct measured.
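
The Cliff's delta statistic named in the list above has a simple definition: across all cross-group pairs, the proportion in which the first group's score is higher minus the proportion in which it is lower. The sketch below is a minimal pure-Python version with hypothetical item scores. The magnitude labels follow the interpretation guidelines given in the note to Table 11.8; applying them as open-ended cutoffs is an assumption.

```python
from typing import Sequence


def cliffs_delta(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cliff's delta: P(a > b) - P(a < b) over all cross-group pairs."""
    greater = sum(1 for a in group_a for b in group_b if a > b)
    less = sum(1 for a in group_a for b in group_b if a < b)
    return (greater - less) / (len(group_a) * len(group_b))


def magnitude(delta: float) -> str:
    """Label |d| using the guidelines in the note to Table 11.8."""
    size = abs(delta)
    if size < 0.15:
        return "negligible"
    if size < 0.33:
        return "small"
    if size < 0.47:
        return "medium"
    return "large"


# Hypothetical responses to one item on a 0-3 scale.
adhd_scores = [3, 2, 3, 1]
general_scores = [0, 1, 1, 2]
d = cliffs_delta(adhd_scores, general_scores)
print(d, magnitude(d))
```

A positive value means the first group tends to endorse the item more strongly, which matches the direction convention stated in the note to Table 11.8.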

Note that the CAARS 2–Short Self-Report and Observer were developed separately; while they both cover the same core content areas, they differ at the item-level. Experts identified different items as core for the different rater types, and statistical analysis dictated empirical selection of certain items for the Self-Report and other items for the Observer form. As a result, the CAARS 2–Short Self-Report and Observer items are overlapping, but not completely aligned.

Step 4–Alternate shortened versions compared. The development team set a minimum and maximum length for each scale on the short form (see Table 11.2). The minimum was the smallest number of items that would still allow for reasonable breadth of coverage; the maximum was approximately two-thirds of the full-length scale. For example, Inattention/Executive Dysfunction is the longest scale, with the most content to cover; it therefore required more items than other scales (as seen in Table 11.3). Starting with the minimum length, alternate-length short forms were created sequentially by adding one item at a time; for example, a 7-item version was compared to an 8-item version that differed only by one additional item. This approach enabled testing for the ideal length that balanced efficiency with reliability and validity (Smith et al., 2000).
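
The sequential construction in Step 4 yields nested candidate forms, each a superset of the previous one. A minimal sketch, assuming the additional items have already been ranked by the Step 3 considerations (the item labels and the function name are hypothetical):

```python
from typing import List, Sequence


def nested_forms(core: Sequence[str], ranked_additions: Sequence[str],
                 min_len: int, max_len: int) -> List[List[str]]:
    """Build candidate forms that grow by exactly one item at a time."""
    # Core items always come first; ranked additions follow in priority order.
    ordered = list(core) + [i for i in ranked_additions if i not in core]
    start = max(min_len, len(core))      # never drop a core item
    stop = min(max_len, len(ordered))    # cannot exceed the available items
    return [ordered[:n] for n in range(start, stop + 1)]


# Hypothetical scale with three core items and four ranked candidates.
forms = nested_forms(["c1", "c2", "c3"], ["r1", "r2", "r3", "r4"],
                     min_len=4, max_len=6)
for form in forms:
    print(len(form), form)
```

Each candidate would then be scored on the criteria that follow, and the shortest form that performs as well as longer ones would be preferred.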

The following criteria were used to assess reliability and validity:

  • Measurement precision of the scale, with an emphasis on peak precision at 1.5 or 2 standard deviations above the mean for a given construct. Ensuring precision at this range was the focus, as that is typically understood to capture the clinical range of the constructs measured (see also Test Information in chapter 8, Reliability). Information values greater than 10 indicate high precision, values below 10 are moderately precise, and values near 5 are considered adequate (Flannery et al., 1995; Reeve & Fayers, 2005).

  • Goodness-of-fit statistics were explored to ensure consistency in the factor structure between shortened and full-length scales. This comparison is helpful for ensuring that construct validity is retained (Rammstedt & Beierlein, 2014) and that all dimensions of the construct are proportionally represented in the short form (Maloney et al., 2011). A detailed discussion of the multiple fit indices considered is provided in Internal Structure in chapter 9, Validity.

  • Internal consistency, as measured by alpha and omega, was evaluated (see Internal Consistency in chapter 8, Reliability, for a detailed discussion of these metrics).

  • Correlations between raw scores on the shortened scales and the full-length scales were assessed (via Kendall’s tau coefficient, given the non-normality of the distribution of the scales). High correlation coefficients provide evidence that the scales are measuring the same construct. Reliability, validity, and construct coverage were prioritized over correlation between form lengths.
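
Of the criteria above, coefficient alpha is the simplest to compute by hand. The sketch below uses the standard formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores), on a small hypothetical response matrix. It is an illustration only and not the software used for the analyses reported here; omega requires a fitted factor model and is not shown.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha for rows of respondents and columns of items."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the total (summed) scale score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: four respondents, three items scored 0-3.
data = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 2, 3],
]
print(round(cronbach_alpha(data), 3))  # 0.875
```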

The statistical properties for each of the alternate versions were evaluated, and results for each were compared against the full-length CAARS 2 as a reference point. In instances where a shorter version performed as well statistically as a version with more items, the version that included the fewest items was favored. The process is illustrated with the CAARS 2–Short Observer Impulsivity scale as an example. As seen in Table 11.3, 4-, 5-, 6-, and 7-item versions of this scale were compared, and the analyses revealed acceptable and similar results for all versions in terms of correlations to the full-length scale, internal consistency, and model fit. However, compared to the other versions, the 6-item version had slightly less desirable fit statistics (higher RMSEA and SRMR and lower CFI and TLI), and the 4-item version had slightly lower internal consistency estimates. The precision of measurement, as seen in Figure 11.1, showed that the 7-item version was the only one to surpass a value of 10. Based on these results, the 7-item version was selected for the CAARS 2–Short Impulsivity scale. This process of comparing various scale lengths for each scale on the CAARS 2–Short was repeated until a final set of items was selected for all scales.

Table 11.3. Comparison of Short Form Options: CAARS 2 Observer Impulsivity Scale

| Form | Number of Items | Correlation with Full-Length (τ) | α | ω | χ² | df | CFI | TLI | RMSEA (95% CI) | SRMR | General Population & ADHD Group Differences: Cliff's d (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Full-Length | 13 | -- | .91 | .91 | 232.661*** | 65 | .973 | .968 | .072 (.066, .079) | .045 | .60 (.50, .69) |
| Short Form Options | 7 | .83 | .88 | .88 | 47.454*** | 14 | .988 | .982 | .076 (.062, .090) | .034 | .61 (.51, .70) |
| | 6 | .81 | .86 | .86 | 41.435*** | 9 | .986 | .977 | .089 (.073, .107) | .036 | .63 (.54, .71) |
| | 5 | .79 | .85 | .85 | 16.712** | 5 | .993 | .986 | .079 (.057, .103) | .025 | .65 (.55, .72) |
| | 4 | .77 | .82 | .83 | 6.88 | 2 | .996 | .987 | .085 (.052, .123) | .020 | .65 (.56, .73) |

Note. N = 1,362. α and ω are internal consistency coefficients; χ², df, CFI, TLI, RMSEA, and SRMR are goodness-of-fit statistics. τ = Kendall's tau correlation coefficient; guidelines for interpreting |τ|: weak ≤ .20; medium = .21 to .34; strong ≥ .35. CFI = Comparative Fit Index; TLI = Tucker-Lewis index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual. All χ² results are non-significant; p > .05.

Figure 11.1. Comparison of Short Form Options: CAARS 2–Short Observer Impulsivity Scale

Step 5–Final short form tested. The final set of items selected for the CAARS 2–Short Content Scales included 37 items each for Self-Report and Observer. Once the final versions for each scale were selected, items were recalibrated using IRT analyses for both the calibration and validation samples. The items selected for the CAARS 2–Short Content Scales, along with the slope (a) and location (b) parameters of the recalibration, can be found in Tables 11.4 and 11.5. Overall, the CAARS 2–Short demonstrated strong item discrimination, with a minimum slope greater than 1.0 for all samples tested. These results suggest that the selected items distinguish well between low and high levels of the construct being measured by each scale.
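
To show how slope and location parameters translate into the precision values discussed in this chapter, the sketch below computes item and scale information under a graded response model. Several things here are assumptions rather than statements from the text: that the items follow Samejima's graded response model (suggested by one slope and three thresholds per item), that the parameters are on the logistic metric with no 1.7 scaling constant, and the function names. The parameter values are the Observer Impulsivity calibration-sample values from Table 11.5.

```python
from math import exp, sqrt
from typing import List, Sequence, Tuple


def boundary_prob(theta: float, a: float, b: float) -> float:
    """Probability of responding at or above a threshold (logistic curve)."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def item_information(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Fisher information of one graded-response item at theta."""
    # Cumulative curves, padded with 1 below the first and 0 above the last threshold.
    cum = [1.0] + [boundary_prob(theta, a, b) for b in thresholds] + [0.0]
    # Derivative of each cumulative curve with respect to theta.
    slope = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(cum) - 1):
        prob = cum[k] - cum[k + 1]       # probability of response category k
        dprob = slope[k] - slope[k + 1]  # derivative of that probability
        if prob > 0.0:
            info += dprob ** 2 / prob
    return info


def scale_information(theta: float,
                      items: Sequence[Tuple[float, List[float]]]) -> float:
    """Scale (test) information is the sum of the item information values."""
    return sum(item_information(theta, a, bs) for a, bs in items)


# (a, [b1, b2, b3]) for Observer Impulsivity, calibration sample, Table 11.5.
observer_impulsivity = [
    (2.01, [0.22, 1.60, 2.43]),   # Rushes
    (3.01, [0.44, 1.32, 2.11]),   # Interrupts others
    (2.11, [0.17, 1.35, 2.19]),   # Impulsive
    (3.07, [0.51, 1.39, 2.21]),   # Difficulty with turn-taking
    (2.05, [0.59, 1.66, 2.33]),   # Risky behavior
    (2.36, [0.68, 1.64, 2.30]),   # Intrudes
    (2.28, [-0.04, 1.24, 2.03]),  # Speaks without thinking first
]
for theta in (1.5, 2.0):
    info = scale_information(theta, observer_impulsivity)
    # The standard error of the trait estimate is 1 / sqrt(information).
    print(theta, round(info, 2), round(1.0 / sqrt(info), 3))
```

Values computed this way will not necessarily match the published figures, since the estimation software and scaling used for the manual are not described in this chapter.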

Table 11.4. IRT Parameters: CAARS 2–Short Self-Report

Column groups: Cal. = CAARS 2–Short: Calibration Sample; Val. = CAARS 2–Short: Validation Sample; Full = Full-Length CAARS 2: Total Sample.

| CAARS 2–Short Scale | Item Stem | Cal. a | Cal. b1 | Cal. b2 | Cal. b3 | Val. a | Val. b1 | Val. b2 | Val. b3 | Full a | Full b1 | Full b2 | Full b3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Inattention/Executive Dysfunction | Loses focus in conversations | 2.47 | -0.07 | 1.18 | 2.02 | 2.72 | -0.02 | 1.26 | 2.07 | 3.92 | 0.05 | 0.77 | 1.34 |
| | Has trouble with multi-step tasks | 2.65 | 0.30 | 1.23 | 2.06 | 2.70 | 0.40 | 1.44 | 2.17 | 3.69 | -0.33 | 0.58 | 1.27 |
| | Difficulty prioritizing | 3.24 | 0.17 | 1.09 | 1.78 | 3.14 | 0.26 | 1.19 | 1.79 | 2.72 | 0.01 | 0.99 | 1.84 |
| | Has difficulty paying attention to details | 3.06 | 0.24 | 1.29 | 1.99 | 3.07 | 0.37 | 1.35 | 2.18 | 2.69 | -0.05 | 1.01 | 1.78 |
| | Difficulty organizing | 2.87 | -0.09 | 0.98 | 1.71 | 2.61 | 0.03 | 1.04 | 1.80 | 2.31 | -0.02 | 1.13 | 2.02 |
| | Makes careless mistakes | 2.25 | -0.23 | 1.26 | 2.16 | 2.35 | -0.12 | 1.33 | 2.22 | 2.27 | -0.26 | 0.77 | 1.70 |
| | Difficulty planning ahead | 2.31 | 0.26 | 1.26 | 2.19 | 2.18 | 0.34 | 1.29 | 2.08 | 1.67 | 0.41 | 1.71 | 2.84 |
| | Misses deadlines | 2.29 | 0.44 | 1.56 | 2.34 | 2.41 | 0.43 | 1.59 | 2.22 | 2.15 | 0.66 | 1.89 | 2.78 |
| | Forgets to do things | 2.54 | -0.39 | 1.10 | 1.92 | 2.71 | -0.34 | 1.15 | 1.96 | 2.08 | -0.30 | 1.25 | 2.20 |
| | Distracted easily | 3.07 | -0.35 | 0.72 | 1.45 | 2.65 | -0.27 | 0.84 | 1.52 | 3.13 | -0.32 | 0.75 | 1.47 |
| | Difficulty following instructions | 3.20 | 0.33 | 1.40 | 2.25 | 3.33 | 0.48 | 1.45 | 2.21 | 2.86 | 0.40 | 1.48 | 2.33 |
| | Inattentive | 2.61 | 0.29 | 1.31 | 2.06 | 2.93 | 0.35 | 1.38 | 2.25 | 2.61 | 0.06 | 1.14 | 1.97 |
| Hyperactivity | Distracts others | 1.91 | 0.56 | 1.75 | 2.58 | 1.76 | 0.56 | 1.80 | 2.74 | 1.99 | 0.10 | 1.39 | 2.43 |
| | Taps hands or feet | 1.60 | -0.10 | 1.02 | 1.80 | 1.50 | -0.06 | 1.09 | 2.00 | 2.60 | -0.06 | 1.22 | 2.06 |
| | Feels restless when still | 2.39 | -0.38 | 0.74 | 1.75 | 2.42 | -0.26 | 0.88 | 1.75 | 2.53 | -0.07 | 1.09 | 1.92 |
| | Difficulty staying still | 2.91 | 0.01 | 0.96 | 1.81 | 3.05 | 0.03 | 0.98 | 1.74 | 3.18 | 0.19 | 1.15 | 1.82 |
| | Moves around when they should not | 3.49 | 0.18 | 1.08 | 1.84 | 3.83 | 0.18 | 1.12 | 1.84 | 2.26 | 0.30 | 1.47 | 2.32 |
| | Struggles with being quiet | 1.72 | 0.26 | 1.36 | 2.38 | 1.55 | 0.31 | 1.58 | 2.70 | 2.75 | -0.38 | 1.11 | 1.93 |
| | Leaves seat when they shouldn't | 2.04 | 0.80 | 1.90 | 2.67 | 2.10 | 0.84 | 1.96 | 2.94 | 1.66 | -1.31 | 0.03 | 0.91 |
| Impulsivity | Speaks without thinking first | 2.00 | -0.28 | 1.22 | 2.28 | 2.16 | -0.30 | 1.28 | 2.11 | 2.01 | 0.53 | 1.71 | 2.56 |
| | Intrudes | 2.38 | 0.59 | 1.75 | 2.62 | 2.05 | 0.75 | 2.02 | 2.88 | 2.64 | 0.17 | 1.36 | 2.25 |
| | Risky behavior | 1.84 | 0.38 | 1.64 | 2.64 | 1.80 | 0.43 | 1.67 | 2.79 | 1.70 | -0.08 | 1.00 | 1.81 |
| | Difficulty with turn-taking | 2.02 | 0.26 | 1.46 | 2.39 | 2.20 | 0.37 | 1.56 | 2.37 | 1.93 | -0.19 | 1.15 | 2.09 |
| | Impulsive | 2.00 | -0.16 | 1.09 | 2.06 | 2.14 | -0.22 | 1.15 | 1.99 | 3.51 | -0.16 | 0.86 | 1.68 |
| | Interrupt others | 2.31 | 0.14 | 1.38 | 2.33 | 2.52 | 0.22 | 1.42 | 2.29 | 2.89 | 0.29 | 1.35 | 2.12 |
| | Rushes | 1.91 | -0.29 | 1.26 | 2.34 | 2.18 | -0.31 | 1.21 | 2.34 | 3.12 | 0.18 | 1.13 | 1.92 |
| Emotional Dysregulation | Difficulty controlling anger | 2.03 | 0.07 | 1.37 | 2.36 | 1.86 | 0.14 | 1.45 | 2.54 | 2.38 | 0.35 | 1.37 | 2.21 |
| | Moods change quickly | 2.35 | -0.07 | 1.10 | 1.95 | 2.71 | -0.05 | 1.10 | 1.91 | 2.45 | -0.32 | 0.80 | 1.75 |
| | Easily frustrated | 3.14 | -0.20 | 0.86 | 1.68 | 3.38 | -0.11 | 0.89 | 1.72 | 2.24 | -0.19 | 1.31 | 2.23 |
| | Overreacts | 2.94 | -0.18 | 1.07 | 1.92 | 2.85 | -0.14 | 1.17 | 1.91 | 2.10 | 0.29 | 1.31 | 2.21 |
| | Difficulty controlling emotions | 2.31 | -0.09 | 1.09 | 1.99 | 2.41 | 0.06 | 1.16 | 2.01 | 2.26 | 0.43 | 1.61 | 2.34 |
| | Difficulty calming down | 2.70 | 0.07 | 1.05 | 1.96 | 2.93 | 0.06 | 1.18 | 1.90 | 2.68 | -0.11 | 0.88 | 1.69 |
| Negative Self-Concept | Lacks confidence from past failures | 2.88 | 0.03 | 0.80 | 1.44 | 3.72 | 0.09 | 0.79 | 1.35 | 2.02 | -0.31 | 1.24 | 2.35 |
| | Lacks confidence | 5.04 | -0.33 | 0.53 | 1.16 | 3.77 | -0.31 | 0.59 | 1.35 | 2.87 | -0.16 | 1.12 | 1.92 |
| | Feels inferior | 2.47 | -0.29 | 0.72 | 1.59 | 2.21 | -0.23 | 0.79 | 1.80 | 1.78 | 0.27 | 1.41 | 2.42 |
| | Avoids challenges | 2.78 | -0.17 | 0.84 | 1.58 | 2.68 | -0.05 | 0.90 | 1.79 | 2.10 | 0.81 | 1.93 | 2.80 |
| | Self-critical | 1.64 | -1.33 | -0.01 | 0.92 | 1.76 | -1.27 | 0.09 | 0.90 | 2.76 | 0.30 | 1.35 | 2.16 |

Note. a refers to the discrimination (slope) parameter of an IRT model; b1, b2, and b3 refer to the threshold (location) parameters of an IRT model.

Table 11.5. IRT Parameters: CAARS 2–Short Observer

Column groups: Cal. = CAARS 2–Short: Calibration Sample; Val. = CAARS 2–Short: Validation Sample; Full = Full-Length CAARS 2: Total Sample.

| CAARS 2–Short Scale | Item Stem | Cal. a | Cal. b1 | Cal. b2 | Cal. b3 | Val. a | Val. b1 | Val. b2 | Val. b3 | Full a | Full b1 | Full b2 | Full b3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Inattention/Executive Dysfunction | Loses focus in conversations | 2.24 | 0.51 | 1.77 | 2.56 | 2.46 | 0.45 | 1.56 | 2.31 | 2.38 | 0.47 | 1.65 | 2.42 |
| | Has trouble with multi-step tasks | 2.91 | 0.59 | 1.49 | 2.15 | 3.03 | 0.54 | 1.42 | 2.11 | 2.82 | 0.56 | 1.48 | 2.16 |
| | Difficulty prioritizing | 3.82 | 0.36 | 1.11 | 1.77 | 3.37 | 0.30 | 1.17 | 1.94 | 3.71 | 0.32 | 1.13 | 1.85 |
| | Has difficulty paying attention to details | 3.30 | 0.54 | 1.41 | 2.22 | 3.91 | 0.45 | 1.31 | 1.96 | 3.43 | 0.49 | 1.37 | 2.11 |
| | Difficulty organizing | 2.75 | 0.28 | 1.19 | 1.95 | 3.02 | 0.20 | 1.20 | 1.95 | 2.93 | 0.23 | 1.18 | 1.94 |
| | Makes careless mistakes | 2.26 | 0.30 | 1.52 | 2.28 | 2.74 | 0.22 | 1.43 | 2.13 | 2.36 | 0.25 | 1.50 | 2.24 |
| | Difficulty planning ahead | 2.62 | 0.20 | 1.25 | 1.96 | 2.65 | 0.19 | 1.18 | 1.88 | 2.60 | 0.19 | 1.22 | 1.93 |
| | Misses deadlines | 2.49 | 0.52 | 1.57 | 2.28 | 2.67 | 0.46 | 1.50 | 2.31 | 2.56 | 0.48 | 1.54 | 2.30 |
| | Forgets to do things | 2.71 | -0.11 | 1.34 | 2.10 | 2.77 | -0.03 | 1.30 | 2.04 | 2.92 | -0.07 | 1.30 | 2.03 |
| | Distracted easily | 2.64 | 0.14 | 1.13 | 1.81 | 3.02 | 0.09 | 1.18 | 1.86 | 2.90 | 0.11 | 1.14 | 1.83 |
| | Difficulty following instructions | 3.29 | 0.56 | 1.51 | 2.19 | 3.92 | 0.46 | 1.38 | 2.12 | 3.25 | 0.51 | 1.47 | 2.21 |
| | Inattentive | 2.05 | 0.60 | 1.79 | 2.57 | 2.69 | 0.57 | 1.55 | 2.29 | 2.35 | 0.58 | 1.66 | 2.42 |
| Hyperactivity | Distracts others | 1.84 | 0.72 | 1.82 | 2.75 | 1.53 | 0.77 | 1.97 | 2.82 | 1.96 | 0.69 | 1.75 | 2.57 |
| | Taps hands or feet | 1.61 | 0.63 | 1.69 | 2.31 | 1.61 | 0.57 | 1.76 | 2.51 | 1.70 | 0.59 | 1.67 | 2.33 |
| | Appears restless when still | 2.73 | 0.27 | 1.22 | 1.98 | 3.17 | 0.27 | 1.17 | 1.94 | 2.70 | 0.28 | 1.22 | 2.00 |
| | Difficulty staying still | 4.03 | 0.47 | 1.31 | 2.01 | 4.73 | 0.46 | 1.27 | 1.89 | 3.73 | 0.48 | 1.32 | 1.99 |
| | Moves around when they should not | 4.55 | 0.57 | 1.38 | 2.05 | 3.78 | 0.57 | 1.40 | 1.98 | 3.72 | 0.58 | 1.41 | 2.05 |
| | Struggles with being quiet | 1.69 | 0.37 | 1.40 | 2.26 | 1.55 | 0.34 | 1.44 | 2.48 | 1.92 | 0.33 | 1.31 | 2.17 |
| | Leaves seat when they shouldn't | 2.34 | 1.06 | 2.01 | 2.75 | 2.26 | 0.97 | 1.91 | 2.73 | 2.48 | 0.99 | 1.90 | 2.66 |
| Impulsivity | Rushes | 2.01 | 0.22 | 1.60 | 2.43 | 2.28 | 0.13 | 1.35 | 2.20 | 2.08 | 0.17 | 1.48 | 2.34 |
| | Interrupts others | 3.01 | 0.44 | 1.32 | 2.11 | 3.09 | 0.37 | 1.48 | 2.08 | 3.28 | 0.40 | 1.37 | 2.07 |
| | Impulsive | 2.11 | 0.17 | 1.35 | 2.19 | 2.13 | 0.21 | 1.34 | 2.22 | 2.02 | 0.19 | 1.37 | 2.26 |
| | Difficulty with turn-taking | 3.07 | 0.51 | 1.39 | 2.21 | 2.92 | 0.61 | 1.53 | 2.19 | 3.18 | 0.54 | 1.44 | 2.18 |
| | Risky behavior | 2.05 | 0.59 | 1.66 | 2.33 | 2.07 | 0.52 | 1.64 | 2.46 | 1.92 | 0.57 | 1.69 | 2.47 |
| | Intrudes | 2.36 | 0.68 | 1.64 | 2.30 | 2.39 | 0.64 | 1.63 | 2.49 | 2.34 | 0.66 | 1.65 | 2.42 |
| | Speaks without thinking first | 2.28 | -0.04 | 1.24 | 2.03 | 2.38 | -0.14 | 1.22 | 1.93 | 2.39 | -0.10 | 1.22 | 1.96 |
| Emotional Dysregulation | Difficulty controlling anger | 2.38 | 0.14 | 1.20 | 1.94 | 2.47 | 0.18 | 1.22 | 2.11 | 2.45 | 0.15 | 1.20 | 2.02 |
| | Moods change quickly | 3.06 | 0.11 | 1.14 | 1.94 | 3.32 | 0.20 | 1.22 | 1.90 | 3.22 | 0.14 | 1.18 | 1.93 |
| | Easily frustrated | 3.10 | -0.03 | 1.02 | 1.78 | 3.14 | -0.03 | 1.04 | 1.91 | 3.18 | -0.04 | 1.03 | 1.84 |
| | Overreacts | 3.45 | 0.04 | 1.04 | 1.70 | 3.41 | 0.03 | 1.12 | 1.84 | 3.47 | 0.03 | 1.07 | 1.77 |
| | Difficulty controlling emotions | 2.64 | 0.09 | 1.28 | 2.04 | 2.80 | 0.11 | 1.26 | 2.01 | 2.76 | 0.09 | 1.27 | 2.03 |
| | Difficulty calming down | 3.29 | 0.27 | 1.21 | 1.94 | 3.07 | 0.23 | 1.22 | 1.90 | 2.97 | 0.25 | 1.23 | 1.96 |
| Negative Self-Concept | Lacks confidence from past failures | 3.47 | 0.32 | 1.14 | 1.77 | 3.57 | 0.29 | 1.17 | 1.74 | 3.94 | 0.30 | 1.13 | 1.71 |
| | Lacks confidence | 3.74 | 0.03 | 0.93 | 1.59 | 3.44 | 0.04 | 1.00 | 1.65 | 3.23 | 0.04 | 0.98 | 1.65 |
| | Feels inferior | 1.92 | 0.40 | 1.47 | 2.32 | 1.86 | 0.37 | 1.58 | 2.53 | 1.83 | 0.39 | 1.54 | 2.46 |
| | Avoids challenges | 2.52 | 0.22 | 1.20 | 2.11 | 2.37 | 0.33 | 1.28 | 2.07 | 2.49 | 0.27 | 1.22 | 2.07 |
| | Self-critical | 2.03 | -0.28 | 0.89 | 1.73 | 1.77 | -0.33 | 0.99 | 1.84 | 1.90 | -0.30 | 0.94 | 1.79 |

Note. a refers to the discrimination (slope) parameter of an IRT model; b1, b2, and b3 refer to the threshold (location) parameters of an IRT model.

The same criteria used in selecting the items for the CAARS 2–Short scales were used to evaluate the shortened forms with the validation sample. Correlations were computed with Kendall’s tau to evaluate the relationship between the full-length and shortened forms. The full-length CAARS 2 and CAARS 2–Short showed very strong, positive, and statistically significant (p < .001) correlations for all scales, ranging from .83 to .93 across forms (see Table 11.6).
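
For readers unfamiliar with Kendall's tau, the sketch below computes it by counting concordant and discordant pairs. The text does not say which variant was used; the tau-b form, which adjusts for ties, is assumed here because raw scale scores are tied often. The scores are hypothetical, and the quadratic-time loop is meant for illustration, not for samples of this size.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two paired score lists."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied on both variables: ignored
        elif dx == 0:
            ties_x += 1           # tied on x only
        elif dy == 0:
            ties_y += 1           # tied on y only
        elif dx * dy > 0:
            concordant += 1       # same ordering on both forms
        else:
            discordant += 1       # opposite ordering
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom


# Hypothetical raw scores for five people on a short and a full-length scale.
short_scores = [2, 4, 4, 7, 9]
full_scores = [5, 9, 10, 15, 20]
print(round(kendall_tau_b(short_scores, full_scores), 3))  # 0.949
```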

Table 11.6. Correlations Between CAARS 2 and CAARS 2–Short Scales: Validation Sample

| Scale | Self-Report: Full-Length & Short (τ) | Observer: Full-Length & Short (τ) |
|---|---|---|
| Inattention/Executive Dysfunction | .87 | .86 |
| Hyperactivity | .88 | .87 |
| Impulsivity | .84 | .83 |
| Emotional Dysregulation | .90 | .91 |
| Negative Self-Concept | .93 | .89 |

Note. N = 1,057 Self-Report; N = 1,064 Observer; τ = tau correlation coefficient. All correlations significant, p < .001. Guidelines for interpreting |τ|: weak ≤ .20; medium = .21 to .34; strong ≥ .35.

Estimates of internal consistency for the CAARS 2–Short scale scores within the validation sample all demonstrated high reliability, with alpha and omega values at or above .85 for Self-Report and Observer (see Table 11.7). The maximum decrease in internal consistency from the CAARS 2 to the CAARS 2–Short was .05 (see Table 11.7), indicating minimal compromises when using the shortened version.
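
The maximum decrease of .05 can be checked directly from the values in Table 11.7. The sketch below copies the (full-length, short) pairs from that table and reports the largest drop; the variable names are illustrative only.

```python
# (full-length, short) coefficients copied from Table 11.7.
table_11_7 = {
    ("Self-Report", "Inattention/Executive Dysfunction"): {"alpha": (.97, .95), "omega": (.97, .95)},
    ("Self-Report", "Hyperactivity"): {"alpha": (.92, .88), "omega": (.92, .88)},
    ("Self-Report", "Impulsivity"): {"alpha": (.92, .90), "omega": (.92, .90)},
    ("Self-Report", "Emotional Dysregulation"): {"alpha": (.93, .91), "omega": (.93, .91)},
    ("Self-Report", "Negative Self-Concept"): {"alpha": (.90, .85), "omega": (.90, .86)},
    ("Observer", "Inattention/Executive Dysfunction"): {"alpha": (.97, .94), "omega": (.97, .94)},
    ("Observer", "Hyperactivity"): {"alpha": (.91, .87), "omega": (.92, .87)},
    ("Observer", "Impulsivity"): {"alpha": (.91, .87), "omega": (.91, .87)},
    ("Observer", "Emotional Dysregulation"): {"alpha": (.92, .89), "omega": (.92, .89)},
    ("Observer", "Negative Self-Concept"): {"alpha": (.91, .88), "omega": (.91, .88)},
}

# Decrease from full-length to short, rounded to the table's two decimals.
drops = {
    (form, scale, coef): round(full - short, 2)
    for (form, scale), coefs in table_11_7.items()
    for coef, (full, short) in coefs.items()
}
max_drop = max(drops.values())
largest = sorted(key for key, value in drops.items() if value == max_drop)
print(max_drop)
for key in largest:
    print(key)
```

The largest drops are in Self-Report Negative Self-Concept (alpha) and Observer Hyperactivity (omega), both .05.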

Test information was also explored in the validation samples for Self-Report and Observer, as seen in Figure 11.2. Nearly all scales on both forms had test information values above 10 at 2 SD above the mean; for Self-Report, Impulsivity showed moderate test information, with a peak value greater than 5. These results are similar to the full-length CAARS 2 (see Test Information in chapter 8, Reliability). The test information of the CAARS 2–Short shows minimal loss in reliability compared to the full-length CAARS 2, providing strong evidence for the validity of the shortened scales.

Table 11.7. Internal Consistency of CAARS 2–Short and Full-Length CAARS 2 Content Scales: Validation Sample

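| Scale | Self-Report Full α | Self-Report Full ω | Self-Report Short α | Self-Report Short ω | Observer Full α | Observer Full ω | Observer Short α | Observer Short ω |
|---|---|---|---|---|---|---|---|---|
| Inattention/Executive Dysfunction | .97 | .97 | .95 | .95 | .97 | .97 | .94 | .94 |
| Hyperactivity | .92 | .92 | .88 | .88 | .91 | .92 | .87 | .87 |
| Impulsivity | .92 | .92 | .90 | .90 | .91 | .91 | .87 | .87 |
| Emotional Dysregulation | .93 | .93 | .91 | .91 | .92 | .92 | .89 | .89 |
| Negative Self-Concept | .90 | .90 | .85 | .86 | .91 | .91 | .88 | .88 |

Note. N = 1,057 Self-Report; N = 1,064 Observer. α = coefficient alpha; ω = coefficient omega; Full = full-length CAARS 2; Short = CAARS 2–Short.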

The ability of the shortened scales to distinguish between the General Population and individuals diagnosed with ADHD (Predominantly Inattentive or Combined Presentation) was explored in the validation sample. Results, as measured by Cliff’s d effect sizes of group differences, are presented in Table 11.8, and show that the full-length CAARS 2 and the CAARS 2–Short are comparable with respect to how well they differentiate between General Population and ADHD groups. For both Self-Report and Observer, effect sizes are only marginally different between the two versions, and the overlapping confidence intervals indicate that differences between the form lengths are not significant. Replicating the discriminating ability of the CAARS 2 scales with the CAARS 2–Short scales provides additional evidence that the selected items for the CAARS 2–Short perform well.
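
The interval-overlap comparison described above can be reproduced from Table 11.8. The sketch below checks the Self-Report rows for ADHD Inattentive vs. General Population; the helper name is hypothetical. Overlap of confidence intervals is the heuristic this chapter relies on and is not a formal significance test of the difference.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intervals_overlap(ci_a: Interval, ci_b: Interval) -> bool:
    """True when two closed intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# (full-length 95% CI, short-form 95% CI) copied from Table 11.8,
# Self-Report, ADHD Inattentive vs. General Population.
self_report_inattentive = {
    "Inattention/Executive Dysfunction": ((.68, .89), (.63, .87)),
    "Hyperactivity": ((.24, .60), (.26, .60)),
    "Impulsivity": ((.13, .53), (.19, .57)),
    "Emotional Dysregulation": ((.05, .47), (.04, .46)),
    "Negative Self-Concept": ((.43, .76), (.47, .77)),
}
for scale, (full_ci, short_ci) in self_report_inattentive.items():
    print(scale, intervals_overlap(full_ci, short_ci))
```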

Table 11.8. Clinical Group Differences: CAARS 2–Short Validation Sample

Column groups: Inatt. = ADHD Inattentive vs. General Population; Comb. = ADHD Combined vs. General Population; Full = Full-Length; Short = Short Form.

| Form | Scale | Inatt. Full Cliff's d | Inatt. Full 95% CI | Inatt. Short Cliff's d | Inatt. Short 95% CI | Comb. Full Cliff's d | Comb. Full 95% CI | Comb. Short Cliff's d | Comb. Short 95% CI |
|---|---|---|---|---|---|---|---|---|---|
| Self-Report | Inattention/Executive Dysfunction | .81 | .68, .89 | .78 | .63, .87 | .79 | .66, .87 | .75 | .62, .85 |
| | Hyperactivity | .44 | .24, .60 | .45 | .26, .60 | .70 | .56, .80 | .71 | .58, .80 |
| | Impulsivity | .35 | .13, .53 | .38 | .19, .57 | .59 | .41, .72 | .60 | .41, .74 |
| | Emotional Dysregulation | .27 | .05, .47 | .26 | .04, .46 | .61 | .44, .73 | .59 | .42, .71 |
| | Negative Self-Concept | .62 | .43, .76 | .65 | .47, .77 | .65 | .48, .77 | .66 | .49, .77 |
| Observer | Inattention/Executive Dysfunction | .93 | .90, .96 | .90 | .85, .94 | .97 | .94, .98 | .95 | .92, .97 |
| | Hyperactivity | .63 | .48, .75 | .65 | .52, .75 | .95 | .93, .97 | .93 | .90, .95 |
| | Impulsivity | .71 | .56, .81 | .68 | .53, .78 | .94 | .91, .96 | .93 | .90, .95 |
| | Emotional Dysregulation | .52 | .35, .65 | .50 | .33, .64 | .84 | .76, .89 | .85 | .78, .90 |
| | Negative Self-Concept | .63 | .49, .74 | .65 | .51, .76 | .75 | .64, .84 | .76 | .64, .84 |

Note. Self-Report: N = 858 General Population, N = 49 ADHD Inattentive, and N = 55 ADHD Combined; Observer: N = 912 General Population, N = 30 ADHD Inattentive, N = 36 ADHD Combined. Guidelines for interpreting Cliff's |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff's d value indicates greater endorsement by the listed ADHD group, relative to the General Population group.

Mirroring what was tested in the full-length CAARS 2 (see Internal Structure in chapter 9, Validity), a CFA was used to determine whether the structure of the full-length CAARS 2 was retained in the CAARS 2–Short. Results from the CFA, presented in Table 11.9, indicate that the 5-factor model is an excellent fit for both the full-length and shortened forms of the CAARS 2. The fit statistics for the full-length and shortened versions support the replicated factor structure across the two lengths.

Table 11.9. Confirmatory Factor Analysis Model Fit Comparison: CAARS 2 and CAARS 2–Short

| Form | Version | χ² | df | CFI | TLI | RMSEA | RMSEA 95% CI | SRMR |
|---|---|---|---|---|---|---|---|---|
| Self-Report | Full-Length CAARS 2 | 7328.74 | 2474 | .960 | .959 | .043 | .042, .044 | .044 |
| | CAARS 2–Short | 1536.57 | 619 | .974 | .972 | .049 | .047, .051 | .039 |
| Observer | Full-Length CAARS 2 | 8546.31 | 2474 | .954 | .952 | .045 | .044, .046 | .051 |
| | CAARS 2–Short | 2142.23 | 619 | .965 | .962 | .057 | .055, .059 | .047 |

Note. N = 1,057 Self-Report; N = 1,064 Observer. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual. All χ² are significant at p < .001.

The development phase goal was to create a shortened version of the CAARS 2 that measured symptoms associated with ADHD efficiently and with minimal reduction of empirical psychometric properties. Overall, the results from this validation sample demonstrated that the CAARS 2–Short has psychometric properties that are comparable to the full-length CAARS 2, in terms of correspondence, internal consistency, test information, ability to distinguish between General Population and ADHD groups, and internal structure.


1 Raters were asked to indicate what languages the individual speaks, and response options included English only, English and Non-English, and Non-English only. For ease of presentation in this chapter, this variable will be referred to as “language(s) spoken.”
