Chapter 9: Validity

CAARS 2 Manual

Internal Structure

The internal structure of the CAARS 2 Content Scales was explored to provide evidence that the scales measure their intended constructs. The extent to which items interrelate and conform to the theoretical framework can provide evidence for the intended interpretation and use of an instrument (AERA, APA, & NCME, 2014). The structure of the CAARS 2 was examined through confirmatory factor analysis (CFA), and alternative, competing measurement models were tested to determine which model best fit the data for the CAARS 2 Self-Report and Observer.

The relationships among the CAARS 2 Content Scales (i.e., Inattention/Executive Dysfunction, Hyperactivity, Impulsivity, Emotional Dysregulation, and Negative Self-Concept) were inspected to provide evidence concerning the internal structure of the CAARS 2. The dimensional structure of the constructs measured by the CAARS has been debated in the literature (e.g., Adler et al., 2017; Martel et al., 2012; Park et al., 2018), particularly whether Inattention and Executive Dysfunction, and Hyperactivity and Impulsivity, should be modeled as separate or unified constructs. To address these questions, the models described in Table 9.1 were tested.

Table 9.1. Alternative Models Tested for the CAARS 2 Scale Structure

CAARS 2 Content	4-Factor Model	5-Factor Model	6-Factor Model
Inattention	Inattention/Executive Dysfunction	Inattention/Executive Dysfunction	Inattention
Executive Dysfunction	Inattention/Executive Dysfunction	Inattention/Executive Dysfunction	Executive Dysfunction
Hyperactivity	Hyperactivity/Impulsivity	Hyperactivity	Hyperactivity
Impulsivity	Hyperactivity/Impulsivity	Impulsivity	Impulsivity
Emotional Dysregulation	Emotional Dysregulation	Emotional Dysregulation	Emotional Dysregulation
Negative Self-Concept	Negative Self-Concept	Negative Self-Concept	Negative Self-Concept
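To make the nesting in Table 9.1 concrete, the three models can be written as mappings from the six content areas to factors. The sketch below is illustrative only: the dictionary names and helper functions are hypothetical and not part of any CAARS 2 software. Assuming each item loads on exactly one factor and the same items are used in every model, the models differ only in the number of inter-factor correlations, k(k − 1)/2 for k factors, which matches the differences in the df column of Table 9.2 (2478 − 2474 = 4; 2474 − 2469 = 5).

```python
# Hypothetical encoding of Table 9.1: each model maps the six content areas
# to the factor that their items load on.
CONTENT_AREAS = [
    "Inattention", "Executive Dysfunction", "Hyperactivity",
    "Impulsivity", "Emotional Dysregulation", "Negative Self-Concept",
]

SIX = {area: area for area in CONTENT_AREAS}  # every content area is its own factor
FIVE = {**SIX,
        "Inattention": "Inattention/Executive Dysfunction",
        "Executive Dysfunction": "Inattention/Executive Dysfunction"}
FOUR = {**FIVE,
        "Hyperactivity": "Hyperactivity/Impulsivity",
        "Impulsivity": "Hyperactivity/Impulsivity"}
MODELS = {"4-factor": FOUR, "5-factor": FIVE, "6-factor": SIX}


def n_factors(model):
    """Number of distinct factors in a model."""
    return len(set(model.values()))


def n_factor_correlations(model):
    """Free inter-factor correlations in a correlated-factor model: k(k - 1) / 2."""
    k = n_factors(model)
    return k * (k - 1) // 2


def is_nested_in(restricted, general):
    """True if `restricted` can be obtained by merging factors of `general`,
    i.e. every factor of `general` maps onto a single factor of `restricted`."""
    targets = {}
    for area in CONTENT_AREAS:
        targets.setdefault(general[area], set()).add(restricted[area])
    return all(len(t) == 1 for t in targets.values())


for smaller, larger in [("4-factor", "5-factor"), ("5-factor", "6-factor")]:
    extra = n_factor_correlations(MODELS[larger]) - n_factor_correlations(MODELS[smaller])
    nested = is_nested_in(MODELS[smaller], MODELS[larger])
    print(f"{smaller} vs. {larger}: nested={nested}, extra correlations={extra}")
```

Because each smaller model is obtained by merging factors of the larger one, the models are nested, which is what allows the χ² difference tests reported later in this section.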

The following criteria for goodness-of-fit statistics were used to evaluate these models:

Comparative Fit Index (CFI; Bentler, 1990): ≥ .90 for acceptable fit and ≥ .95 for good fit (Hu & Bentler, 1999; McDonald & Ho, 2002).
Tucker-Lewis Index (TLI; Tucker & Lewis, 1973): ≥ .90 for acceptable fit and ≥ .95 for good fit (Hu & Bentler, 1999; McDonald & Ho, 2002).
Root mean square error of approximation (RMSEA; Browne & Cudeck, 1992): ≤ .08 for acceptable fit and ≤ .06 for good fit.
Standardized root mean square residual (SRMR; Bentler, 1995): ≤ .08 for good fit.
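The cutoffs above can be expressed as a small lookup. The function below is a hypothetical helper written for illustration; it labels each index against the thresholds listed, and because only a good-fit cutoff is given for SRMR, values above .08 are labeled "not met" rather than graded.

```python
def rate_fit(cfi, tli, rmsea, srmr):
    """Label each fit index using the cutoffs listed above (hypothetical helper)."""

    def higher_is_better(value):
        # CFI and TLI: >= .95 good, >= .90 acceptable
        if value >= 0.95:
            return "good"
        if value >= 0.90:
            return "acceptable"
        return "not met"

    def rmsea_label(value):
        # RMSEA: <= .06 good, <= .08 acceptable
        if value <= 0.06:
            return "good"
        if value <= 0.08:
            return "acceptable"
        return "not met"

    return {
        "CFI": higher_is_better(cfi),
        "TLI": higher_is_better(tli),
        "RMSEA": rmsea_label(rmsea),
        # SRMR: only a good-fit cutoff (<= .08) is specified
        "SRMR": "good" if srmr <= 0.08 else "not met",
    }


# Self-Report 4-factor model values from Table 9.2
print(rate_fit(cfi=0.949, tli=0.947, rmsea=0.049, srmr=0.044))
# {'CFI': 'acceptable', 'TLI': 'acceptable', 'RMSEA': 'good', 'SRMR': 'good'}
```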

CFI ranges from 0 to 1, and TLI usually falls within the same range (as a non-normed index, TLI can occasionally fall slightly outside it); for both, higher values indicate better fit. Conversely, RMSEA and SRMR have a lower bound of 0, with lower values indicating better fit.
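For reference, the conventional (maximum likelihood) definitions of three of these indices are sketched below. This is an illustration, not the computation used for the CAARS 2: the robust estimators that lavaan applies to ordinal items rely on scaled test statistics, so plugging the χ² values from Table 9.2 into these formulas will not always reproduce the reported values. CFI and TLI also require the χ² of a baseline (independence) model, which is not reported in this chapter. Software also differs on whether N or N − 1 appears in the RMSEA denominator; N − 1 is assumed here.

```python
import math


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index from model (m) and baseline (b) chi-square statistics."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)  # denominator is never smaller than the numerator
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis Index; it is not constrained to the 0-1 range."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def rmsea(chi2_m, df_m, n):
    """Point estimate of RMSEA, using N - 1 in the denominator."""
    return math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))


# Self-Report 4-factor model from Table 9.2 (chi-square, df, N)
print(round(rmsea(15621.08, 2478, 2226), 3))  # 0.049
```

For the Self-Report 4-factor model the plain formula agrees with the reported RMSEA of .049; for other rows the plain formula and the robust estimates can differ in the third decimal.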

Because the models are nested, they were compared for statistically significant differences in fit. A scaled chi-square (χ²) difference test was used with a conservative significance level of p ≤ .01, because multiple comparisons were made and χ² is known to be sensitive to large sample sizes (Tanaka, 1987). The change in CFI was also evaluated: CFI had to improve by more than .01 for the difference between models to be considered meaningful (Cheung & Rensvold, 2002). In addition to these nested-model comparisons, each model was evaluated by first examining overall fit indices and then examining factor loadings and correlations among factors.

In this last step, an inter-factor correlation at or above .95 was taken to indicate that two factors are not meaningfully distinct, in which case parsimony should be favored (i.e., the model in which those factors are combined, rather than separated, is selected). Confidence intervals around the inter-factor correlations were also examined: when the confidence interval for a correlation does not include 1 (a perfect correlation, meaning the two factors completely overlap), the factors are understood to represent distinct constructs (Brown, 2006) and can be retained as separate factors.
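The two nested-model criteria can be combined into one check. The sketch below uses hypothetical helper names: it takes a chi-square difference statistic and its degrees of freedom (for example, a scaled difference reported by the estimation software; the scaling itself is not reproduced here), computes the upper-tail p value from the closed forms available for integer degrees of freedom, and applies the p ≤ .01 and ΔCFI > .01 rules described above.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1.

    Uses closed forms of the regularized upper incomplete gamma function
    Q(df / 2, x / 2). Intended for the small df of difference tests; very large
    x and df can overflow the intermediate powers.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df = 2n + 1:
    # Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{n} y**(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def compare_nested(delta_chi2, delta_df, delta_cfi, alpha=0.01, min_delta_cfi=0.01):
    """Apply the significance and Delta-CFI rules to one nested-model comparison."""
    p = chi2_sf(delta_chi2, delta_df)
    return {
        "p": p,
        "significant": p <= alpha,
        # Round to three decimals, as reported, so floating-point noise around .010
        # is not mistaken for an improvement larger than .01.
        "meaningful_cfi_gain": round(delta_cfi, 3) > min_delta_cfi,
    }


# Rows of Table 9.3: (form, comparison, delta chi-square, delta df, delta CFI)
rows = [
    ("Self-Report", "4 vs. 5", 108.47, 4, 0.004),
    ("Self-Report", "5 vs. 6", 135.32, 5, 0.008),
    ("Observer", "4 vs. 5", 72.40, 4, 0.005),
    ("Observer", "5 vs. 6", 132.77, 5, 0.013),
]
for form, comparison, d_chi2, d_df, d_cfi in rows:
    result = compare_nested(d_chi2, d_df, d_cfi)
    print(form, comparison, result["significant"], result["meaningful_cfi_gain"])
```

Keeping the two criteria separate in the result makes the situation discussed below visible: every comparison is significant, but only one clears the ΔCFI rule, and only narrowly.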

Analyses were conducted with complete cases from the Total Samples, which include all available data from the clinical and general population groups (N = 2,226 for Self-Report; N = 2,150 for Observer; see Standardization Phase in chapter 6, Development, for details about these samples). Correlated-factor models were estimated with robust estimation methods for ordinal items using the lavaan package in R (Rosseel, 2012). As shown in Table 9.2, the competing models fit the data well for both Self-Report and Observer and performed similarly to one another. All fit indices met at least the guidelines for acceptable fit: RMSEA and SRMR met the criteria for good fit in every model, and CFI and TLI reached the good-fit threshold of .95 in the 5- and 6-factor Self-Report models and the 6-factor Observer model. Model fit improved (i.e., CFI and TLI increased, and SRMR and RMSEA decreased) as more factors were added to the model.

Table 9.2. Fit Indices for Confirmatory Factor Analysis Models: CAARS 2 Content Scales

Form	Model	χ²	df	CFI	TLI	SRMR	RMSEA	RMSEA Confidence Interval
Self-Report	4-factor	15621.08	2478	.949	.947	.044	.049	.049, .050
	5-factor	14321.28	2474	.953	.951	.042	.047	.047, .048
	6-factor	12875.15	2469	.957	.956	.041	.045	.045, .046
Observer	4-factor	17910.77	2478	.940	.938	.051	.051	.050, .051
	5-factor	16013.27	2474	.945	.943	.048	.049	.048, .049
	6-factor	13388.60	2469	.953	.951	.044	.045	.044, .046

Note. CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; RMSEA = Root mean square error of approximation. All χ² values are significant, p < .01.

Because all three models fit the data adequately, additional analyses were conducted to determine which model fit best across CAARS 2 Self-Report and Observer. Results of the χ² difference tests for nested models are shown in Table 9.3. All model comparisons were statistically significant (p < .01), indicating that adding factors significantly improved fit. However, the gains in CFI were small: ΔCFI was below .01 for three of the four comparisons and only slightly above it (.013) for the Observer 5-factor vs. 6-factor comparison. Therefore, further investigation was warranted to determine the most appropriate model for the data.

Table 9.3. Comparison of Nested Confirmatory Factor Analysis Models: CAARS 2 Content Scales

Form	Models Compared	Δχ²	Δdf	p	ΔCFI
Self-Report	4-factor vs. 5-factor	108.47	4	< .01	.004
Self-Report	5-factor vs. 6-factor	135.32	5	< .01	.008
Observer	4-factor vs. 5-factor	72.40	4	< .01	.005
Observer	5-factor vs. 6-factor	132.77	5	< .01	.013

Note. Δχ² = scaled chi-square difference statistic; Δdf = difference in degrees of freedom; ΔCFI = change in Comparative Fit Index value. All Δχ² values are statistically significant, p < .01.

The next step in this series of analyses was to inspect the inter-factor correlations. In the 6-factor model, the correlation between Inattention and Executive Dysfunction was at or near the .95 threshold for meaningfully distinct factors (Self-Report r = .951, Observer r = .942), and the confidence intervals for these estimates, when rounded to three decimals, included 1, indicating possible overlap. Given this finding, the 6-factor model was rejected, as the data did not support separating Inattention and Executive Dysfunction. All goodness-of-fit statistics and the χ² difference tests indicated that the 4-factor model performed worse than the 5-factor model; therefore, the 5-factor model was inspected further. The inter-factor correlations of the 5-factor model (Tables 9.4 and 9.5) indicated that Hyperactivity and Impulsivity were very strongly correlated but not entirely overlapping constructs (Self-Report r = .910, Observer r = .897). In addition, the confidence intervals for these correlations did not include 1, providing further evidence that the two constructs are distinct. Therefore, the 5-factor model was chosen as the best-fitting model for the CAARS 2 Content Scales for both Self-Report and Observer.
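The two screening rules for a pair of factors can be sketched as follows. The helper name is hypothetical, and the confidence interval bounds in the example are made up, since this chapter reports only the point estimates. Rounding to three decimals mirrors the way the intervals are described above, so an upper bound such as .9996 counts as including 1.

```python
def assess_factor_pair(r, ci_lower, ci_upper, threshold=0.95, decimals=3):
    """Apply the inter-factor correlation rules to one pair of factors (hypothetical helper).

    Factors are kept separate only if the correlation is below `threshold` and
    the confidence interval, rounded to `decimals`, excludes 1.
    """
    at_or_above_threshold = round(r, decimals) >= threshold
    ci_includes_one = round(ci_lower, decimals) <= 1.0 <= round(ci_upper, decimals)
    return {
        "at_or_above_threshold": at_or_above_threshold,
        "ci_includes_one": ci_includes_one,
        "retain_as_separate": not (at_or_above_threshold or ci_includes_one),
    }


# Point estimates from the text and tables; the interval bounds are made up.
print(assess_factor_pair(0.942, 0.930, 0.9996))  # Observer: Inattention vs. Executive Dysfunction
print(assess_factor_pair(0.910, 0.895, 0.925))   # Hyperactivity vs. Impulsivity (Table 9.4)
```

The first pair is below .95 yet is still not retained, because its interval reaches 1 after rounding; this is the situation that led to rejecting the 6-factor model.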

Examination of the factor loadings provided additional support for the 5-factor model. All factor loadings were positive, statistically significant, and above a typical minimum threshold (loading ≥ .40; Tabachnick & Fidell, 2007). For Self-Report, loadings ranged from .470 to .951 (median = .799); for Observer, they ranged from .519 to .968 (median = .815). Together, these results provide strong evidence for the structural validity of the CAARS 2 domains.
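The loading summary above (range, median, and the ≥ .40 check) is simple to compute once standardized loadings have been extracted from a fitted model. The sketch below uses a hypothetical helper and made-up loadings, because the item-level values are not listed in this section.

```python
from statistics import median


def summarize_loadings(loadings, minimum=0.40):
    """Range, median, and threshold checks for a set of standardized loadings."""
    return {
        "min": min(loadings),
        "max": max(loadings),
        "median": median(loadings),
        "all_positive": all(x > 0 for x in loadings),
        "all_meet_minimum": all(x >= minimum for x in loadings),
    }


# Made-up standardized loadings, for illustration only
example = [0.47, 0.62, 0.71, 0.80, 0.95]
print(summarize_loadings(example))
```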

Table 9.4. Five-Factor Model Inter-Factor Correlations: CAARS 2 Self-Report Content Scales

Scale	Inattention/Executive Dysfunction	Hyperactivity	Impulsivity	Emotional Dysregulation	Negative Self-Concept
Inattention/Executive Dysfunction	--	--	--	--	--
Hyperactivity	.818	--	--	--	--
Impulsivity	.877	.910	--	--	--
Emotional Dysregulation	.777	.765	.873	--	--
Negative Self-Concept	.774	.621	.664	.738	--

Note. N = 2,226. Guidelines for interpreting |r|: very weak < .20, weak = .20 to .39, moderate = .40 to .59, strong = .60 to .79, very strong ≥ .80.

Table 9.5. Five-Factor Model Inter-Factor Correlations: CAARS 2 Observer Content Scales

Scale	Inattention/Executive Dysfunction	Hyperactivity	Impulsivity	Emotional Dysregulation	Negative Self-Concept
Inattention/Executive Dysfunction	--	--	--	--	--
Hyperactivity	.778	--	--	--	--
Impulsivity	.838	.897	--	--	--
Emotional Dysregulation	.738	.729	.871	--	--
Negative Self-Concept	.695	.480	.544	.634	--

Note. N = 2,150. Guidelines for interpreting |r|: very weak < .20, weak = .20 to .39, moderate = .40 to .59, strong = .60 to .79, very strong ≥ .80.
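The correlations in Tables 9.4 and 9.5 can be screened against the .95 rule and labeled with the interpretive guidelines in the table notes. In this sketch the lower triangles are transcribed from the tables; the helper names are hypothetical, and each band's lower bound is treated as inclusive (e.g., .80 is "very strong").

```python
SCALES = ["Inattention/Executive Dysfunction", "Hyperactivity", "Impulsivity",
          "Emotional Dysregulation", "Negative Self-Concept"]

# Lower triangles from Tables 9.4 and 9.5: row i holds correlations with scales 0..i-1.
SELF_REPORT = [[], [.818], [.877, .910], [.777, .765, .873], [.774, .621, .664, .738]]
OBSERVER = [[], [.778], [.838, .897], [.738, .729, .871], [.695, .480, .544, .634]]


def label(r):
    """Verbal label for |r| following the guidelines in the table notes."""
    a = abs(r)
    if a >= .80:
        return "very strong"
    if a >= .60:
        return "strong"
    if a >= .40:
        return "moderate"
    if a >= .20:
        return "weak"
    return "very weak"


def pairs(lower):
    """Yield (row scale, column scale, r) for every off-diagonal entry."""
    for i, row in enumerate(lower):
        for j, r in enumerate(row):
            yield SCALES[i], SCALES[j], r


def screen(lower, threshold=.95):
    """Return the most strongly correlated pair and any pairs at or above `threshold`."""
    all_pairs = list(pairs(lower))
    largest = max(all_pairs, key=lambda p: p[2])
    flagged = [p for p in all_pairs if p[2] >= threshold]
    return largest, flagged


for name, matrix in [("Self-Report", SELF_REPORT), ("Observer", OBSERVER)]:
    largest, flagged = screen(matrix)
    print(name, largest, label(largest[2]), "pairs >= .95:", len(flagged))
```

In both forms the largest correlation in the 5-factor model is between Hyperactivity and Impulsivity, and no pair reaches .95.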
