Chapter 11: CAARS 2–Short

CAARS 2 Manual

Development

The goal of the CAARS 2–Short is to assess the core and associated symptoms of ADHD. Therefore, all five CAARS 2 Content Scales were included: Inattention/Executive Dysfunction, Hyperactivity, Impulsivity, Emotional Dysregulation, and Negative Self-Concept. The CAARS 2–Short also includes the Response Style Analysis, ADHD Index, and Additional Questions; because these components are parallel to the full-length form, they are not analyzed or discussed separately in this chapter. See Table 1.1 in chapter 1, Introduction, for a comparison of the content included in the full-length CAARS 2 and the CAARS 2–Short. To mitigate the risks to measurement precision, reliability, and validity that can arise with abbreviated versions of scales, the CAARS 2–Short was developed following recommended practices for short-form development (Emons et al., 2007; Kruyen et al., 2013; Smith et al., 2000; Ziegler et al., 2014).

Samples

The CAARS 2–Short was derived and validated using the CAARS 2 Total Sample (see Table 6.4 in chapter 6, Development). The Total Sample included individuals from the general population and from clinical groups and comprised 2,232 individuals aged 18 or older who completed the CAARS 2 Self-Report and 2,150 observers who rated adults aged 18 or older. The Total Sample was used to select and validate the items for the shortened Content Scales. Note that two individuals from the Observer Total Sample were excluded from analyses because they had omitted items needed for the analyses used to create the shortened form. The samples were split into calibration and validation subsamples (see Tables 11.1a and 11.1b for the demographic characteristics of the rated individuals and the raters, respectively).

Table 11.1a. Demographic Characteristics of the Rated Individuals: CAARS 2–Short Calibration and Validation Samples

Demographic		Self-Report				Observer
		Calibration		Validation		Calibration		Validation
		N	%	N	%	N	%	N	%
Gender	Male	544	46.3	487	46.1	517	47.6	504	47.4
	Female	621	52.9	569	53.8	565	52.0	559	52.5
	Other	10	0.9	1	0.1	4	0.4	1	0.1
U.S. Race/Ethnicity	Hispanic	117	10.0	112	10.6	126	11.6	106	10.0
	Asian	50	4.3	42	4.0	34	3.1	35	3.3
	Black	94	8.0	93	8.8	97	8.9	102	9.6
	White	693	59.0	643	60.8	650	59.9	630	59.2
	Other	28	2.4	15	1.4	20	1.8	21	2.0
U.S. Region	Northeast	165	14.0	176	16.7	165	15.2	172	16.2
	Midwest	232	19.7	198	18.7	219	20.2	198	18.6
	South	382	32.5	328	31.0	333	30.7	328	30.8
	West	203	17.3	203	19.2	210	19.3	196	18.4
Canadian Region	Central	124	10.6	84	7.9	105	9.7	111	10.4
	East	9	0.8	15	1.4	11	1.0	14	1.3
	West	60	5.1	53	5.0	43	4.0	45	4.2
Canadian Race/Ethnicity	Not a visible minority	161	13.7	125	11.8	129	11.9	133	12.5
	Visible minority	32	2.7	27	2.6	30	2.8	37	3.5
Education Level	No high school diploma	87	7.4	67	6.3	88	8.1	85	8.0
	High school diploma/GED	291	24.8	268	25.4	301	27.7	322	30.3
	Some college or associate degree	389	33.1	369	34.9	340	31.3	325	30.5
	Bachelor's degree	251	21.4	222	21.0	214	19.7	204	19.2
	Graduate or professional degree	157	13.4	131	12.4	143	13.2	128	12.0
Diagnosis	ADHD Inattentive	64	5.4	50	4.7	35	3.2	30	2.8
	ADHD Hyperactive/Impulsive	0	0.0	0	0.0	1	0.1	8	0.8
	ADHD Combined	76	6.5	55	5.2	60	5.5	36	3.4
	Anxiety	105	8.9	86	8.1	71	6.5	67	6.3
	Depression	90	7.7	75	7.1	58	5.3	61	5.7
	Other Diagnosis	69	5.9	45	4.3	49	4.5	44	4.1
	No Diagnosis	930	79.1	863	81.6	923	85.0	912	85.7
Age in years M (SD)		47.4 (19.3)		47.6 (19.3)		47.7 (19.8)		47.8 (19.6)
Total		1,175	100.0	1,057	100.0	1,086	100.0	1,064	100.0

Note. Anxiety includes Generalized Anxiety Disorder, Panic Disorder, Separation Anxiety, Specific Phobia, and Social Anxiety Disorder. Depression includes Major Depressive Disorder, Major Depressive Episode, and Persistent Depressive Disorder. Other diagnoses include less frequently reported co-occurring diagnoses, such as Autism Spectrum Disorder and Substance-Related and Addictive Disorders. The sum of diagnoses is greater than the total N because individuals with co-occurring diagnoses count towards more than one diagnostic group.

Table 11.1b. Demographic Characteristics of Raters: CAARS 2–Short Calibration and Validation Samples

Rater Demographic		Calibration		Validation
Rater Demographic		N	%	N	%
Gender	Male	693	63.8	371	34.9
	Female	393	36.2	691	64.9
	Other	0	0.0	2	0.2
U.S. Race/Ethnicity	Hispanic	120	11.0	118	11.1
	Asian	25	2.3	34	3.2
	Black	95	8.7	89	8.4
	White	665	61.2	626	58.8
	Other	19	1.7	23	2.2
Canadian Race/Ethnicity	Not a visible minority	134	12.3	137	12.9
	Visible minority	28	2.6	37	3.5
U.S. Region	Northeast	160	14.7	168	15.8
	Midwest	217	20.0	192	18.0
	South	356	32.8	351	33.0
	West	191	17.6	179	16.8
Canadian Region	Central	109	10.0	115	10.8
	East	10	0.9	15	1.4
	West	43	4.0	44	4.1
Education Level	No high school diploma	38	3.5	30	2.8
	High school diploma/GED	257	23.7	252	23.7
	Some college or associate degree	403	37.1	405	38.1
	Bachelor's degree	259	23.8	260	24.4
	Graduate or professional degree	129	11.9	117	11.0
Relation to Individual Being Rated	Spouse	334	30.8	286	26.9
	Friend	257	23.7	273	25.7
	Other Family Member	482	44.4	491	46.1
	Other	13	1.2	14	1.3
Length of Relationship	1-5 months	8	0.7	6	0.6
	6-11 months	11	1.0	5	0.5
	1-3 years	66	6.1	74	7.0
	More than 3 years	1,001	92.2	979	92.0
How well does the rater know the individual being rated?	Moderately well	61	5.6	80	7.5
	Very well	1,025	94.4	984	92.5
How often does the rater interact with the individual being rated?	Monthly	64	5.9	86	8.1
	Weekly	315	29.0	292	27.4
	Daily	707	65.1	686	64.5
Age in years M (SD)		43.7 (16.0)		44.0 (15.5)
Total		1,086	100.0	1,064	100.0

Analyses and Results

Identical procedures were used to develop the CAARS 2–Short Self-Report and Observer forms. Consistent with recommended practice in developing shortened forms, both statistical methods and expert judgment were used to ensure that the shortened forms retained breadth of coverage of the target construct (Kruyen et al., 2013; Smith et al., 2000; Ziegler et al., 2014). Item selection and subsequent validation of the shortened forms proceeded in the following steps:

Step 1–Core items selected. Five experts in adult ADHD (see Acknowledgements) were asked to identify the items from the full-length CAARS 2 that best represented the core construct of each scale, separately for the Self-Report and Observer forms. The number of experts who endorsed each item as core was summed, producing a score from 0 to 5 for each item. Expert consensus on a core item was defined as a score of 4 or higher (i.e., agreement from at least 4 of the 5 experts). All core items were initially included in the shortened scales, although a small subset was later excluded for statistical reasons, as outlined in Step 2.
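The consensus rule in Step 1 amounts to counting endorsements per item and keeping items at or above a cutoff. The following Python sketch assumes each expert's judgments are available as a set of item IDs; the expert and item IDs are hypothetical placeholders, not CAARS 2 content.

```python
from collections import Counter
from typing import Dict, List, Set


def consensus_scores(endorsements: Dict[str, Set[str]]) -> Counter:
    """Count how many experts endorsed each item as core (0 to number of experts)."""
    scores: Counter = Counter()
    for items in endorsements.values():
        scores.update(items)  # each expert contributes at most 1 per item
    return scores


def core_items(endorsements: Dict[str, Set[str]], threshold: int = 4) -> List[str]:
    """Return the items endorsed by at least `threshold` experts, sorted by ID."""
    scores = consensus_scores(endorsements)
    return sorted(item for item, score in scores.items() if score >= threshold)


# Hypothetical endorsements from five experts for one scale.
endorsements = {
    "expert_1": {"item_a", "item_b", "item_c"},
    "expert_2": {"item_a", "item_b"},
    "expert_3": {"item_a", "item_c"},
    "expert_4": {"item_a", "item_b"},
    "expert_5": {"item_b", "item_d"},
}

print(core_items(endorsements))  # ['item_a', 'item_b']
```

Items never endorsed simply score 0, because a `Counter` returns 0 for missing keys.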

Step 2–Items excluded due to statistical considerations. Items were examined in the Total Sample for local dependence (LD) and for differential item functioning (DIF) across demographic groups (i.e., gender, race/ethnicity, language(s)¹ spoken, and education level [EL]). IRT models assume local independence: the items in a scale share variance only through a common factor and are not related to one another in other ways. LD is a violation of this assumption (e.g., a response to one question depends on the response to an earlier question; Embretson & Reise, 2000). IRT models also assume that there is no statistical item bias across groups; DIF is present when individuals from different groups who have the same level of the construct differ in their expected responses to an item (Embretson & Reise, 2000). Items with evidence of significant and meaningful LD or DIF were excluded from consideration for the CAARS 2–Short, because including them could allow factors other than the construct being measured to affect the scale score: significant DIF would indicate that item responses are unduly influenced by group characteristics, and meaningful LD would suggest that responses are influenced by item similarity, such that items are related for a reason beyond their shared latent construct. LD and DIF had negligible impact on the full-length CAARS 2 (see chapter 6, Development, where item selection procedures evaluated these same statistics and found little evidence of meaningful influence from either). However, LD and DIF can have a larger influence on shorter scales, so more stringent criteria were set for the CAARS 2–Short.

Items showing a medium DIF effect size for any of the tested demographic groups were excluded from consideration for the short form. Under this criterion, no items were excluded from the Self-Report form, and only one item was excluded from the Observer form (because it showed a medium DIF effect size between Hispanic and White individuals).

LD was assessed using (a) residual correlations greater than .15 between item pairs, (b) modification indices from 1-factor confirmatory factor analysis (CFA) models fit to each scale, used to identify item pairs with correlated residuals, and (c) significant LD χ² statistics (Chen & Thissen, 1997). When LD was detected for an item pair, only the item with the better overall measurement properties was considered for inclusion on the shortened forms.
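Criterion (a) can be illustrated with a standardized 1-factor model, under which the model-implied correlation between two items is the product of their loadings; the residual is the observed correlation minus that product. The Python sketch below assumes the correlation matrix and standardized loadings have already been estimated (the manual does not say how, and for ordinal items they would typically come from a polychoric-based model); the numbers are made up for illustration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def residual_correlations(
    corr: Sequence[Sequence[float]], loadings: Sequence[float]
) -> List[Tuple[int, int, float]]:
    """Observed minus model-implied correlation for every item pair.

    Under a standardized 1-factor model, the implied correlation between
    items i and j is loadings[i] * loadings[j].
    """
    return [
        (i, j, corr[i][j] - loadings[i] * loadings[j])
        for i, j in combinations(range(len(loadings)), 2)
    ]


def flag_local_dependence(
    corr: Sequence[Sequence[float]], loadings: Sequence[float], cutoff: float = 0.15
) -> List[Tuple[int, int, float]]:
    """Return item pairs whose residual correlation exceeds `cutoff`."""
    return [
        (i, j, round(r, 3))
        for i, j, r in residual_correlations(corr, loadings)
        if r > cutoff
    ]


# Hypothetical 3-item scale: items 1 and 2 correlate more than the factor explains.
loadings = [0.8, 0.7, 0.6]
corr = [
    [1.00, 0.58, 0.50],
    [0.58, 1.00, 0.60],
    [0.50, 0.60, 1.00],
]
print(flag_local_dependence(corr, loadings))  # [(1, 2, 0.18)]
```

Only pairs exceeding the cutoff are returned; criteria (b) and (c) require a fitted model and are not covered by this sketch.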

Step 3–Remaining items selected. Using the calibration sample, items were added systematically, one at a time, to the core set of items for each scale, based on the following considerations:

Statistical properties. Items were added based on item discrimination, precision of measurement, and ability to discriminate between the General Population and ADHD samples.
- Item discrimination assesses an item’s ability to distinguish individuals at low versus high levels of the trait. Item discrimination was measured using the slope parameter of each item from an IRT model. Higher values (e.g., > .75) were favored, as they indicate better discrimination (Embretson & Reise, 2000).
- Precision of measurement is inversely related to the amount of measurement error: an item with low error has high precision. Precision of measurement for items was assessed using item information curves (IICs). An IIC shows graphically how precisely an item measures across the range of the construct (also known as theta). Greater information indicates higher precision of measurement and a lower standard error (see Test Information in chapter 8, Reliability, for details). Items providing high precision at theta levels 1.5 SD or more above the mean were targeted, to best capture both subclinical and clinical levels of the construct.
- Cliff's delta (Cliff's d; Cliff, 1993), a non-parametric effect size measure, was used to examine how well each item distinguished between the General Population and ADHD samples. Items with larger effect sizes were preferred because they indicate better discrimination between groups.
Expert ratings. When items had similar statistics, the item with the higher expert rating was retained.
Content represented. Many scales assess different content areas or facets within the construct. For example, on the full-length CAARS 2 Hyperactivity scale, 61% of the items assess behavioral aspects of hyperactivity, 31% assess verbal hyperactivity, and 8% assess both. A similar ratio of items was retained on the shortened form, for both rater forms, to ensure proportional coverage of all facets of the construct.
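The slope and information criteria above can be made concrete with Samejima's graded response model, a common IRT model for ordered rating-scale items. The sketch below assumes that model, a logistic link with no 1.7 scaling constant, and four response categories (three thresholds, as in Tables 11.4 and 11.5). The manual does not state which parameterization produced its curves, so values computed this way are illustrative and may not reproduce its figures. Summing item information gives scale (test) information.

```python
import math
from typing import Sequence, Tuple


def _logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grm_item_information(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Fisher information of one graded-response item at `theta`.

    `thresholds` must be in increasing order (b1 < b2 < ...).
    """
    # Cumulative probabilities P*(X >= k), padded with 1.0 below the lowest
    # category and 0.0 above the highest one.
    p_star = [1.0] + [_logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # dP*/dtheta = a * P* * (1 - P*); this is 0.0 for both padded entries.
    d_star = [a * p * (1.0 - p) for p in p_star]
    info = 0.0
    for k in range(len(thresholds) + 1):
        p_k = p_star[k] - p_star[k + 1]  # probability of category k
        d_k = d_star[k] - d_star[k + 1]  # its derivative with respect to theta
        if p_k > 0.0:
            info += d_k * d_k / p_k
    return info


def scale_information(theta: float, items: Sequence[Tuple[float, Sequence[float]]]) -> float:
    """Scale information is the sum of the item informations."""
    return sum(grm_item_information(theta, a, b) for a, b in items)


# Slope and thresholds for "Interrupts others" (Observer, calibration sample, Table 11.5).
item = (3.01, [0.44, 1.32, 2.11])
for theta in (0.0, 1.5, 2.0):
    print(theta, round(grm_item_information(theta, *item), 2))
```

For a single threshold the function reduces to the familiar two-category result a²P(1 − P), which equals a²/4 at θ = b.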

Note that the CAARS 2–Short Self-Report and Observer forms were developed separately; although they cover the same core content areas, they differ at the item level. Experts identified different items as core for the two rater types, and statistical analyses led to the empirical selection of certain items for the Self-Report form and other items for the Observer form. As a result, the item sets of the CAARS 2–Short Self-Report and Observer forms overlap but are not identical.

Step 4–Alternate shortened versions compared. The development team set a minimum and maximum length for each scale on the short form (see Table 11.2). The minimum was the fewest items that would still allow reasonable breadth of coverage; the maximum was no more than approximately two-thirds of the full-length scale. For example, Inattention/Executive Dysfunction is the longest scale and has the most content to cover, so it required more items than the other scales (see Table 11.2). Starting with the minimum length, alternate-length short forms were created sequentially by adding one item at a time; for example, a 7-item version was compared with an 8-item version that differed from it by only one additional item. This approach made it possible to identify the length that best balanced efficiency with reliability and validity (Smith et al., 2000).
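Because each alternate form adds exactly one item to the previous one, the candidate versions form a nested sequence. A minimal Python sketch, assuming the item order (core items first, then items in the order they were added in Step 3) is already fixed; the item IDs are hypothetical, and the 4-to-7 limits are those for a 13-item scale in Table 11.2.

```python
from typing import Dict, List, Sequence


def nested_versions(ordered_items: Sequence[str], min_len: int, max_len: int) -> Dict[int, List[str]]:
    """Build alternate-length forms, each one item longer than the previous.

    `ordered_items` lists the core items first, followed by the remaining
    candidates in the order they were added.
    """
    if not 1 <= min_len <= max_len <= len(ordered_items):
        raise ValueError("require 1 <= min_len <= max_len <= number of items")
    return {n: list(ordered_items[:n]) for n in range(min_len, max_len + 1)}


# Hypothetical ordering for a 13-item scale with Table 11.2 limits of 4 to 7 items.
order = [f"item_{i:02d}" for i in range(1, 14)]
versions = nested_versions(order, min_len=4, max_len=7)
for n, items in versions.items():
    print(n, items[-1])  # each version adds exactly one item to the previous one
```

Each resulting version is then evaluated against the criteria listed below.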

Table 11.2. Minimum and Maximum Potential Scale Length for CAARS 2–Short

Scale	Full-Length Item Count	Minimum Short Item Count	Maximum Short Item Count
Inattention/Executive Dysfunction	30	7	12
Hyperactivity	13	4	7
Impulsivity	13	4	7
Emotional Dysregulation	9	4	6
Negative Self-Concept	7	4	5

The following criteria were used to assess reliability and validity:

Measurement precision of the scale, with an emphasis on peak precision at 1.5 or 2 standard deviations above the mean of the construct. Precision in this range was the focus because it is typically understood to capture the clinical range of the constructs measured (see also Test Information in chapter 8, Reliability). Information values greater than 10 indicate high precision, values between 5 and 10 indicate moderate precision, and values near 5 are considered adequate (Flannery et al., 1995; Reeve & Fayers, 2005).
Goodness-of-fit statistics were explored to ensure consistency in the factor structure between shortened and full-length scales. This comparison is helpful for ensuring that construct validity is retained (Rammstedt & Beierlein, 2014) and that all dimensions of the construct are proportionally represented in the short form (Maloney et al., 2011). A detailed discussion of the multiple fit indices considered is provided in Internal Structure in chapter 9, Validity.
Internal consistency, as measured by alpha and omega, was evaluated (see Internal Consistency in chapter 8, Reliability, for a detailed discussion of these metrics).
Correlations between raw scores on the shortened scales and the full-length scales were assessed (via Kendall’s tau coefficient, given the non-normality of the distribution of the scales). High correlation coefficients provide evidence that the scales are measuring the same construct. Reliability, validity, and construct coverage were prioritized over correlation between form lengths.
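The information thresholds in the first criterion are easier to interpret as standard errors or reliabilities. Assuming theta is scaled to a standard normal distribution (mean 0, variance 1), the conditional standard error is 1/√I(θ) and the corresponding reliability is approximately 1 − 1/I(θ), so an information value of 10 corresponds to a reliability of about .90 and a value of 5 to about .80. A short Python sketch of the conversion:

```python
import math


def information_to_se(info: float) -> float:
    """Conditional standard error of theta: SE = 1 / sqrt(I)."""
    if info <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(info)


def information_to_reliability(info: float, theta_variance: float = 1.0) -> float:
    """Approximate reliability: 1 - SE^2 / var(theta), i.e., 1 - 1/I when var(theta) = 1."""
    return 1.0 - information_to_se(info) ** 2 / theta_variance


for info in (5.0, 10.0):
    print(info, round(information_to_se(info), 3), round(information_to_reliability(info), 2))
```

The `theta_variance` parameter is included only to make the standard-normal assumption explicit.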

The statistical properties of each alternate version were evaluated, and the results were compared against the full-length CAARS 2 as a reference point. Where a shorter version performed as well statistically as a version with more items, the version with the fewest items was favored. The process is illustrated here with the CAARS 2–Short Observer Impulsivity scale. As seen in Table 11.3, 4-, 5-, 6-, and 7-item versions of this scale were compared, and all versions showed acceptable and similar correlations with the full-length scale, internal consistency, and model fit. However, compared with the other versions, the 6-item version had slightly less desirable fit statistics (higher RMSEA and SRMR and lower CFI and TLI), and the 4-item version had slightly lower internal consistency estimates. The precision of measurement, shown in Figure 11.1, indicated that the 7-item version was the only one whose information exceeded 10. Based on these results, the 7-item version was selected for the CAARS 2–Short Observer Impulsivity scale. This process of comparing scale lengths was repeated for each scale on the CAARS 2–Short until a final set of items was selected for all scales.

Table 11.3. Comparison of Short Form Options: CAARS 2 Observer Impulsivity Scale

Form	Number of Items	Correlation with Full-Length	Internal Consistency		Goodness-of-Fit Statistics						General Population & ADHD Group Differences
Form	Number of Items	τ	α	ω	χ²	df	CFI	TLI	RMSEA (95% CI)	SRMR	Cliff's d (95% CI)
Full-Length	13	--	.91	.91	232.661***	65	.973	.968	.072 (.066, .079)	.045	.60 (.50, .69)
Short Form Options	7	.83	.88	.88	47.454***	14	.988	.982	.076 (.062, .090)	.034	.61 (.51, .70)
	6	.81	.86	.86	41.435***	9	.986	.977	.089 (.073, .107)	.036	.63 (.54, .71)
	5	.79	.85	.85	16.712**	5	.993	.986	.079 (.057, .103)	.025	.65 (.55, .72)
	4	.77	.82	.83	6.88	2	.996	.987	.085 (.052, .123)	.020	.65 (.56, .73)

Note. N = 1,362. τ = Kendall's tau correlation coefficient; guidelines for interpreting |τ|: weak ≤ .20; medium = .21 to .34; strong ≥ .35. CFI = Comparative Fit Index; TLI = Tucker-Lewis index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual. **p < .01. ***p < .001.

Figure 11.1. Comparison of Short Form Options: CAARS 2–Short Observer Impulsivity Scale

Step 5–Final short form tested. The final set of items selected for the CAARS 2–Short Content Scales included 37 items each for Self-Report and Observer. Once the final versions for each scale were selected, items were recalibrated using IRT analyses for both the calibration and validation samples. The items selected for the CAARS 2–Short Content Scales, along with the slope (a) and location (b) parameters of the recalibration, can be found in Tables 11.4 and 11.5. Overall, the CAARS 2–Short demonstrated strong item discrimination, with a minimum slope greater than 1.0 for all samples tested. These results suggest that the selected items distinguish well between low and high levels of the construct being measured by each scale.

Table 11.4. IRT Parameters: CAARS 2–Short Self-Report

CAARS 2–Short Scale	Item Stem	CAARS 2–Short: Calibration Sample				CAARS 2–Short: Validation Sample				Full-Length CAARS 2: Total Sample
CAARS 2–Short Scale	Item Stem	a	b1	b2	b3	a	b1	b2	b3	a	b1	b2	b3
Inattention/Executive Dysfunction	Loses focus in conversations	2.47	-0.07	1.18	2.02	2.72	-0.02	1.26	2.07	3.92	0.05	0.77	1.34
	Has trouble with multi-step tasks	2.65	0.30	1.23	2.06	2.70	0.40	1.44	2.17	3.69	-0.33	0.58	1.27
	Difficulty prioritizing	3.24	0.17	1.09	1.78	3.14	0.26	1.19	1.79	2.72	0.01	0.99	1.84
	Has difficulty paying attention to details	3.06	0.24	1.29	1.99	3.07	0.37	1.35	2.18	2.69	-0.05	1.01	1.78
	Difficulty organizing	2.87	-0.09	0.98	1.71	2.61	0.03	1.04	1.80	2.31	-0.02	1.13	2.02
	Makes careless mistakes	2.25	-0.23	1.26	2.16	2.35	-0.12	1.33	2.22	2.27	-0.26	0.77	1.70
	Difficulty planning ahead	2.31	0.26	1.26	2.19	2.18	0.34	1.29	2.08	1.67	0.41	1.71	2.84
	Misses deadlines	2.29	0.44	1.56	2.34	2.41	0.43	1.59	2.22	2.15	0.66	1.89	2.78
	Forgets to do things	2.54	-0.39	1.10	1.92	2.71	-0.34	1.15	1.96	2.08	-0.30	1.25	2.20
	Distracted easily	3.07	-0.35	0.72	1.45	2.65	-0.27	0.84	1.52	3.13	-0.32	0.75	1.47
	Difficulty following instructions	3.20	0.33	1.40	2.25	3.33	0.48	1.45	2.21	2.86	0.40	1.48	2.33
	Inattentive	2.61	0.29	1.31	2.06	2.93	0.35	1.38	2.25	2.61	0.06	1.14	1.97
Hyperactivity	Distracts others	1.91	0.56	1.75	2.58	1.76	0.56	1.80	2.74	1.99	0.10	1.39	2.43
	Taps hands or feet	1.60	-0.10	1.02	1.80	1.50	-0.06	1.09	2.00	2.60	-0.06	1.22	2.06
	Feels restless when still	2.39	-0.38	0.74	1.75	2.42	-0.26	0.88	1.75	2.53	-0.07	1.09	1.92
	Difficulty staying still	2.91	0.01	0.96	1.81	3.05	0.03	0.98	1.74	3.18	0.19	1.15	1.82
	Moves around when they should not	3.49	0.18	1.08	1.84	3.83	0.18	1.12	1.84	2.26	0.30	1.47	2.32
	Struggles with being quiet	1.72	0.26	1.36	2.38	1.55	0.31	1.58	2.70	2.75	-0.38	1.11	1.93
	Leaves seat when they shouldn't	2.04	0.80	1.90	2.67	2.10	0.84	1.96	2.94	1.66	-1.31	0.03	0.91
Impulsivity	Speaks without thinking first	2.00	-0.28	1.22	2.28	2.16	-0.30	1.28	2.11	2.01	0.53	1.71	2.56
	Intrudes	2.38	0.59	1.75	2.62	2.05	0.75	2.02	2.88	2.64	0.17	1.36	2.25
	Risky behavior	1.84	0.38	1.64	2.64	1.80	0.43	1.67	2.79	1.70	-0.08	1.00	1.81
	Difficulty with turn-taking	2.02	0.26	1.46	2.39	2.20	0.37	1.56	2.37	1.93	-0.19	1.15	2.09
	Impulsive	2.00	-0.16	1.09	2.06	2.14	-0.22	1.15	1.99	3.51	-0.16	0.86	1.68
	Interrupts others	2.31	0.14	1.38	2.33	2.52	0.22	1.42	2.29	2.89	0.29	1.35	2.12
	Rushes	1.91	-0.29	1.26	2.34	2.18	-0.31	1.21	2.34	3.12	0.18	1.13	1.92
Emotional Dysregulation	Difficulty controlling anger	2.03	0.07	1.37	2.36	1.86	0.14	1.45	2.54	2.38	0.35	1.37	2.21
	Moods change quickly	2.35	-0.07	1.10	1.95	2.71	-0.05	1.10	1.91	2.45	-0.32	0.80	1.75
	Easily frustrated	3.14	-0.20	0.86	1.68	3.38	-0.11	0.89	1.72	2.24	-0.19	1.31	2.23
	Overreacts	2.94	-0.18	1.07	1.92	2.85	-0.14	1.17	1.91	2.10	0.29	1.31	2.21
	Difficulty controlling emotions	2.31	-0.09	1.09	1.99	2.41	0.06	1.16	2.01	2.26	0.43	1.61	2.34
	Difficulty calming down	2.70	0.07	1.05	1.96	2.93	0.06	1.18	1.90	2.68	-0.11	0.88	1.69
Negative Self-Concept	Lacks confidence from past failures	2.88	0.03	0.80	1.44	3.72	0.09	0.79	1.35	2.02	-0.31	1.24	2.35
	Lacks confidence	5.04	-0.33	0.53	1.16	3.77	-0.31	0.59	1.35	2.87	-0.16	1.12	1.92
	Feels inferior	2.47	-0.29	0.72	1.59	2.21	-0.23	0.79	1.80	1.78	0.27	1.41	2.42
	Avoids challenges	2.78	-0.17	0.84	1.58	2.68	-0.05	0.90	1.79	2.10	0.81	1.93	2.80
	Self-critical	1.64	-1.33	-0.01	0.92	1.76	-1.27	0.09	0.90	2.76	0.30	1.35	2.16

Note. a refers to the discrimination (slope) parameter of an IRT model; b1, b2, and b3 refer to the threshold (location) parameters of an IRT model.

Table 11.5. IRT Parameters: CAARS 2–Short Observer

CAARS 2–Short Scale	Item Stem	CAARS 2–Short: Calibration Sample				CAARS 2–Short: Validation Sample				Full-Length CAARS 2: Total Sample
CAARS 2–Short Scale	Item Stem	a	b1	b2	b3	a	b1	b2	b3	a	b1	b2	b3
Inattention/Executive Dysfunction	Loses focus in conversations	2.24	0.51	1.77	2.56	2.46	0.45	1.56	2.31	2.38	0.47	1.65	2.42
	Has trouble with multi-step tasks	2.91	0.59	1.49	2.15	3.03	0.54	1.42	2.11	2.82	0.56	1.48	2.16
	Difficulty prioritizing	3.82	0.36	1.11	1.77	3.37	0.30	1.17	1.94	3.71	0.32	1.13	1.85
	Has difficulty paying attention to details	3.30	0.54	1.41	2.22	3.91	0.45	1.31	1.96	3.43	0.49	1.37	2.11
	Difficulty organizing	2.75	0.28	1.19	1.95	3.02	0.20	1.20	1.95	2.93	0.23	1.18	1.94
	Makes careless mistakes	2.26	0.30	1.52	2.28	2.74	0.22	1.43	2.13	2.36	0.25	1.50	2.24
	Difficulty planning ahead	2.62	0.20	1.25	1.96	2.65	0.19	1.18	1.88	2.60	0.19	1.22	1.93
	Misses deadlines	2.49	0.52	1.57	2.28	2.67	0.46	1.50	2.31	2.56	0.48	1.54	2.30
	Forgets to do things	2.71	-0.11	1.34	2.10	2.77	-0.03	1.30	2.04	2.92	-0.07	1.30	2.03
	Distracted easily	2.64	0.14	1.13	1.81	3.02	0.09	1.18	1.86	2.90	0.11	1.14	1.83
	Difficulty following instructions	3.29	0.56	1.51	2.19	3.92	0.46	1.38	2.12	3.25	0.51	1.47	2.21
	Inattentive	2.05	0.60	1.79	2.57	2.69	0.57	1.55	2.29	2.35	0.58	1.66	2.42
Hyperactivity	Distracts others	1.84	0.72	1.82	2.75	1.53	0.77	1.97	2.82	1.96	0.69	1.75	2.57
	Taps hands or feet	1.61	0.63	1.69	2.31	1.61	0.57	1.76	2.51	1.70	0.59	1.67	2.33
	Appears restless when still	2.73	0.27	1.22	1.98	3.17	0.27	1.17	1.94	2.70	0.28	1.22	2.00
	Difficulty staying still	4.03	0.47	1.31	2.01	4.73	0.46	1.27	1.89	3.73	0.48	1.32	1.99
	Moves around when they should not	4.55	0.57	1.38	2.05	3.78	0.57	1.40	1.98	3.72	0.58	1.41	2.05
	Struggles with being quiet	1.69	0.37	1.40	2.26	1.55	0.34	1.44	2.48	1.92	0.33	1.31	2.17
	Leaves seat when they shouldn't	2.34	1.06	2.01	2.75	2.26	0.97	1.91	2.73	2.48	0.99	1.90	2.66
Impulsivity	Rushes	2.01	0.22	1.60	2.43	2.28	0.13	1.35	2.20	2.08	0.17	1.48	2.34
	Interrupts others	3.01	0.44	1.32	2.11	3.09	0.37	1.48	2.08	3.28	0.40	1.37	2.07
	Impulsive	2.11	0.17	1.35	2.19	2.13	0.21	1.34	2.22	2.02	0.19	1.37	2.26
	Difficulty with turn-taking	3.07	0.51	1.39	2.21	2.92	0.61	1.53	2.19	3.18	0.54	1.44	2.18
	Risky behavior	2.05	0.59	1.66	2.33	2.07	0.52	1.64	2.46	1.92	0.57	1.69	2.47
	Intrudes	2.36	0.68	1.64	2.30	2.39	0.64	1.63	2.49	2.34	0.66	1.65	2.42
	Speaks without thinking first	2.28	-0.04	1.24	2.03	2.38	-0.14	1.22	1.93	2.39	-0.10	1.22	1.96
Emotional Dysregulation	Difficulty controlling anger	2.38	0.14	1.20	1.94	2.47	0.18	1.22	2.11	2.45	0.15	1.20	2.02
	Moods change quickly	3.06	0.11	1.14	1.94	3.32	0.20	1.22	1.90	3.22	0.14	1.18	1.93
	Easily frustrated	3.10	-0.03	1.02	1.78	3.14	-0.03	1.04	1.91	3.18	-0.04	1.03	1.84
	Overreacts	3.45	0.04	1.04	1.70	3.41	0.03	1.12	1.84	3.47	0.03	1.07	1.77
	Difficulty controlling emotions	2.64	0.09	1.28	2.04	2.80	0.11	1.26	2.01	2.76	0.09	1.27	2.03
	Difficulty calming down	3.29	0.27	1.21	1.94	3.07	0.23	1.22	1.90	2.97	0.25	1.23	1.96
Negative Self-Concept	Lacks confidence from past failures	3.47	0.32	1.14	1.77	3.57	0.29	1.17	1.74	3.94	0.30	1.13	1.71
	Lacks confidence	3.74	0.03	0.93	1.59	3.44	0.04	1.00	1.65	3.23	0.04	0.98	1.65
	Feels inferior	1.92	0.40	1.47	2.32	1.86	0.37	1.58	2.53	1.83	0.39	1.54	2.46
	Avoids challenges	2.52	0.22	1.20	2.11	2.37	0.33	1.28	2.07	2.49	0.27	1.22	2.07
	Self-critical	2.03	-0.28	0.89	1.73	1.77	-0.33	0.99	1.84	1.90	-0.30	0.94	1.79

Note. a refers to the discrimination (slope) parameter of an IRT model; b1, b2, and b3 refer to the threshold (location) parameters of an IRT model.

The same criteria used to select items for the CAARS 2–Short scales were used to evaluate the shortened forms in the validation sample. Correlations between the full-length and shortened forms were computed with Kendall's tau. The full-length CAARS 2 and CAARS 2–Short showed very strong, positive, statistically significant (p < .001) correlations for all scales, ranging from .83 to .93 across forms (see Table 11.6).
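Summed raw scores contain many ties, so a tie-corrected version of Kendall's tau is the natural choice. The manual does not state which variant was used; the sketch below assumes tau-b, the variant that `scipy.stats.kendalltau` computes by default, and uses a direct pairwise count (quadratic in sample size, which is fine for samples of about 1,000). The scores in the usage line are hypothetical.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b, which corrects for ties (common in summed raw scores)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[j] > x[i]) - (x[j] < x[i])  # sign of the x difference
            dy = (y[j] > y[i]) - (y[j] < y[i])  # sign of the y difference
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from both margins
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx == dy:
                concordant += 1
            else:
                discordant += 1
    # Pairs untied on x and pairs untied on y, respectively.
    untied_x = concordant + discordant + ties_y_only
    untied_y = concordant + discordant + ties_x_only
    denom = math.sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom > 0 else float("nan")


# Hypothetical full-length and short-form raw scores for six respondents.
full = [10, 14, 14, 20, 25, 31]
short = [4, 6, 5, 8, 8, 12]
print(round(kendall_tau_b(full, short), 2))
```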

Table 11.6. Correlations Between CAARS 2 and CAARS 2–Short Scales: Validation Sample

Scale	Correlations: Full-Length & Short (τ)
Scale	Self-Report	Observer
Inattention/Executive Dysfunction	.87	.86
Hyperactivity	.88	.87
Impulsivity	.84	.83
Emotional Dysregulation	.90	.91
Negative Self-Concept	.93	.89

Note. N = 1,057 Self-Report; N = 1,064 Observer. τ = Kendall's tau correlation coefficient. All correlations significant, p < .001. Guidelines for interpreting |τ|: weak ≤ .20; medium = .21 to .34; strong ≥ .35.

Internal consistency estimates for the CAARS 2–Short scale scores in the validation sample all indicated high reliability, with alpha and omega values at or above .85 for both Self-Report and Observer (see Table 11.7). The largest decrease in internal consistency from the CAARS 2 to the CAARS 2–Short was .05, indicating minimal loss of reliability when using the shortened version.
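For readers who want to compute comparable estimates on their own data, the following Python sketch shows coefficient alpha from a respondents-by-items score matrix and omega from standardized loadings. The manual does not state which omega variant or estimation method was used; this sketch assumes omega total for a one-factor model with uncorrelated errors, and all inputs in the usage lines are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha for a respondents-by-items score matrix."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def omega_total(loadings: Sequence[float]) -> float:
    """Omega from standardized loadings of a one-factor model with uncorrelated errors."""
    common = sum(loadings) ** 2
    unique = sum(1.0 - lam * lam for lam in loadings)
    return common / (common + unique)


scores = [[1, 2], [2, 1], [3, 3]]  # hypothetical: 3 respondents x 2 items
print(round(cronbach_alpha(scores), 3))
print(round(omega_total([0.8, 0.8, 0.8]), 3))
```

Item and total variances use the same (sample) denominator, so the choice of n or n − 1 cancels out of alpha.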

Test information was also examined in the validation samples for Self-Report and Observer, as seen in Figure 11.2. Nearly all scales on both forms had test information values above 10 at 2 SD above the mean; notably, Self-Report Impulsivity showed moderate test information, with a peak value greater than 5. These results are similar to those for the full-length CAARS 2 (see Test Information in chapter 8, Reliability). The test information of the CAARS 2–Short shows minimal loss of measurement precision compared with the full-length CAARS 2, providing strong support for the reliability of the shortened scales.

Table 11.7. Internal Consistency of CAARS 2–Short and Full-Length CAARS 2 Content Scales: Validation Sample

Scale	Self-Report				Observer
	Full		Short		Full		Short
	α	ω	α	ω	α	ω	α	ω
Inattention/Executive Dysfunction	.97	.97	.95	.95	.97	.97	.94	.94
Hyperactivity	.92	.92	.88	.88	.91	.92	.87	.87
Impulsivity	.92	.92	.90	.90	.91	.91	.87	.87
Emotional Dysregulation	.93	.93	.91	.91	.92	.92	.89	.89
Negative Self-Concept	.90	.90	.85	.86	.91	.91	.88	.88

Note. N = 1,057 Self-Report; N = 1,064 Observer. α = coefficient alpha; ω = coefficient omega; Full = full-length CAARS 2; Short = CAARS 2–Short

Figure 11.2. Test Information for CAARS 2–Short: Validation Sample

a) Inattention/Executive Dysfunction

b) Hyperactivity

c) Impulsivity

d) Emotional Dysregulation

e) Negative Self-Concept

The ability of the shortened scales to distinguish between the General Population and individuals diagnosed with ADHD (Predominantly Inattentive or Combined Presentation) was examined in the validation sample. The results, expressed as Cliff's d effect sizes for group differences, are presented in Table 11.8 and show that the full-length CAARS 2 and the CAARS 2–Short differentiate comparably between the General Population and ADHD groups. For both Self-Report and Observer, effect sizes differ only marginally between the two versions, and the confidence intervals overlap substantially, suggesting that the differences between form lengths are not meaningful. Replicating the discriminating ability of the CAARS 2 scales with the CAARS 2–Short scales provides additional evidence that the items selected for the CAARS 2–Short perform well.
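Cliff's d compares every ADHD score with every General Population score: it is the proportion of pairs in which the ADHD score is higher minus the proportion in which it is lower, so positive values match the sign convention in the note to Table 11.8. A minimal Python sketch with hypothetical scores; the magnitude labels follow the interpretation guidelines in that note, and the confidence intervals reported in the table are not reproduced here because the manual does not state how they were computed.

```python
from typing import Sequence


def cliffs_delta(group: Sequence[float], reference: Sequence[float]) -> float:
    """Cliff's d = P(group > reference) - P(group < reference) over all pairs.

    Positive values mean higher scores in `group` (e.g., an ADHD group)
    than in `reference` (e.g., the General Population group).
    """
    greater = less = 0
    for g in group:
        for r in reference:
            if g > r:
                greater += 1
            elif g < r:
                less += 1
    return (greater - less) / (len(group) * len(reference))


def magnitude(d: float) -> str:
    """Label |d| using the guidelines in the note to Table 11.8."""
    size = abs(d)
    if size < 0.15:
        return "negligible"
    if size < 0.33:
        return "small"
    if size < 0.47:
        return "medium"
    return "large"


adhd = [3, 4, 5]        # hypothetical scale scores
general_pop = [1, 2, 3]
d = cliffs_delta(adhd, general_pop)
print(round(d, 3), magnitude(d))
```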

Table 11.8. Clinical Group Differences: CAARS 2–Short Validation Sample

Form	Scale	ADHD Inattentive vs. General Population				ADHD Combined vs. General Population
		Full-Length		Short Form		Full-Length		Short Form
		Cliff's d	95% CI	Cliff's d	95% CI	Cliff's d	95% CI	Cliff's d	95% CI
Self-Report	Inattention/Executive Dysfunction	.81	.68, .89	.78	.63, .87	.79	.66, .87	.75	.62, .85
	Hyperactivity	.44	.24, .60	.45	.26, .60	.70	.56, .80	.71	.58, .80
	Impulsivity	.35	.13, .53	.38	.19, .57	.59	.41, .72	.60	.41, .74
	Emotional Dysregulation	.27	.05, .47	.26	.04, .46	.61	.44, .73	.59	.42, .71
	Negative Self-Concept	.62	.43, .76	.65	.47, .77	.65	.48, .77	.66	.49, .77
Observer	Inattention/Executive Dysfunction	.93	.90, .96	.90	.85, .94	.97	.94, .98	.95	.92, .97
	Hyperactivity	.63	.48, .75	.65	.52, .75	.95	.93, .97	.93	.90, .95
	Impulsivity	.71	.56, .81	.68	.53, .78	.94	.91, .96	.93	.90, .95
	Emotional Dysregulation	.52	.35, .65	.50	.33, .64	.84	.76, .89	.85	.78, .90
	Negative Self-Concept	.63	.49, .74	.65	.51, .76	.75	.64, .84	.76	.64, .84

Note. Self-Report: N = 858 General Population, N = 49 ADHD Inattentive, and N = 55 ADHD Combined; Observer: N = 912 General Population, N = 30 ADHD Inattentive, N = 36 ADHD Combined. Guidelines for interpreting Cliff's |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff's d value indicates greater endorsement by the listed ADHD group, relative to the General Population group.

Mirroring what was tested in the full-length CAARS 2 (see Internal Structure in chapter 9, Validity), a CFA was used to determine whether the structure of the full-length CAARS 2 was retained in the CAARS 2–Short. Results from the CFA, presented in Table 11.9, indicate that the 5-factor model is an excellent fit for both the full-length and shortened forms of the CAARS 2. The fit statistics for the full-length and shortened versions support the replicated factor structure across the two lengths.

Table 11.9. Confirmatory Factor Analysis Model Fit Comparison: CAARS 2 and CAARS 2–Short

Form	Version	χ²	df	CFI	TLI	RMSEA	RMSEA 95% CI	SRMR
Self-Report	Full-Length CAARS 2	7328.74	2474	.960	.959	.043	.042, .044	.044
Self-Report	CAARS 2–Short	1536.57	619	.974	.972	.049	.047, .051	.039
Observer	Full-Length CAARS 2	8546.31	2474	.954	.952	.045	.044, .046	.051
Observer	CAARS 2–Short	2142.23	619	.965	.962	.057	.055, .059	.047

Note. N = 1,057 Self-Report; N = 1,064 Observer. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual. All χ² are significant at p < .001.
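The degrees of freedom in Tables 11.3 and 11.9 follow from counting free parameters. Assuming each item loads on exactly one factor, factor variances are fixed at 1, factors are freely correlated, and residuals are uncorrelated, the count below reproduces them (the usual count for ordinal items with estimated thresholds yields the same total). The items-per-factor lists come from Table 11.2 and the final 37-item forms (12, 7, 7, 6, and 5 items).

```python
from typing import Sequence


def cfa_degrees_of_freedom(items_per_factor: Sequence[int]) -> int:
    """Degrees of freedom for a simple-structure CFA on p items and k factors.

    Assumes each item loads on exactly one factor, factor variances are
    fixed at 1, factors are freely correlated, and residuals are uncorrelated.
    """
    p = sum(items_per_factor)
    k = len(items_per_factor)
    observed_moments = p * (p + 1) // 2          # unique variances and covariances
    free_parameters = p + p + k * (k - 1) // 2   # loadings + residual variances + factor correlations
    return observed_moments - free_parameters


print(cfa_degrees_of_freedom([30, 13, 13, 9, 7]))  # full-length CAARS 2
print(cfa_degrees_of_freedom([12, 7, 7, 6, 5]))    # CAARS 2–Short
print(cfa_degrees_of_freedom([13]))                # 1-factor full-length Impulsivity
```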

The goal of the development phase was to create a shortened version of the CAARS 2 that measures symptoms associated with ADHD efficiently and with minimal loss of psychometric quality. Overall, the results from the validation sample demonstrate that the CAARS 2–Short has psychometric properties comparable to those of the full-length CAARS 2 in terms of correspondence, internal consistency, test information, ability to distinguish between General Population and ADHD groups, and internal structure.

¹ Raters were asked to indicate what languages the individual speaks, and response options included English only, English and Non-English, and Non-English only. For ease of presentation in this chapter, this variable will be referred to as “language(s) spoken.”
