CAARS 2 Manual

Chapter 11: Fairness


Fairness

As with the CAARS 2, a goal for the CAARS 2–Short was to provide fair and unbiased measurement for diverse populations. In order to establish that this goal was met, differences between demographic groups (e.g., gender, race/ethnicity, country of residence, and education level) were examined to ensure that background characteristics of the individual being rated do not affect their test scores. Two main methods of evaluating bias were employed: invariance tests and mean group difference tests. Analyses related to fairness included measurement invariance (MI), differential test functioning (DTF), and comparisons of mean group differences; more information about the methodologies used can be found in appendix M. Visual inspections of DTF results were carried out for all scales (see Figure 10.1 in chapter 10, Fairness, for an example); however, for ease of presentation, effect size statistics are provided within this chapter to summarize the results. Because invariance testing relies on modeled data (i.e., estimating the population, rather than describing the sample), larger sample sizes are required, and a greater range of responses is desired. Therefore, the Total Sample (as described in the Standardization Phase in chapter 6, Development) was used for these invariance analyses, because it includes a considerable number of individuals from the general population, as well as individuals with clinical diagnoses (which will extend the variability of responses). Mean group differences were calculated on subsets of the Normative Sample (see the Fairness section in appendix J for demographic characteristics of the samples).

Gender

Gender, for the purposes of these fairness-related analyses, is defined as the rated individual’s gender identity. Analyses were conducted to compare males and females on the CAARS 2–Short in terms of MI, DTF, and mean group differences. The very small sample size for individuals who are non-binary or indicated “Other” for gender (Self-Report N = 11 and Observer N = 5) did not allow for meaningful testing. Therefore, when assessing invariance by gender, only males (Self-Report N = 1,028; Observer N = 1,021) and females (Self-Report N = 1,186; Observer N = 1,123) were included.

Invariance between males and females for the CAARS 2–Short was first explored via MI analyses (see Table 11.25). For both Self-Report and Observer, some model comparisons yielded a significant Satorra-Bentler χ2 test (e.g., strict models); however, the absence of any decline in many other fit statistics, such as CFI and SRMR, suggests that invariance was upheld. As more constraints were added throughout the process of testing MI, model fit did not change in a meaningful way, indicating that the factor model is invariant between males and females.

In addition to MI results, DTF analyses were conducted to explore the invariance of the CAARS 2–Short for males and females through a different framework. The effect sizes of the DTF statistics, as measured by the expected test score standardized difference (ETSSD), are summarized in Table 11.26 for the CAARS 2–Short. Negligible differences (i.e., |ETSSD| ≤ .06) between males and females were observed across all scales and both forms, demonstrating invariance by gender.

Table 11.26. Differential Test Functioning Effect Sizes by Gender

| CAARS 2–Short Scale | Self-Report | Observer |
| --- | --- | --- |
| Inattention/Executive Dysfunction | .00 | .00 |
| Hyperactivity | .03 | .02 |
| Impulsivity | -.01 | .05 |
| Emotional Dysregulation | -.01 | .04 |
| Negative Self-Concept | -.06 | .01 |
Note. Values presented are expected test score standardized differences (ETSSD); guidelines for interpreting |ETSSD|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Positive ETSSD values indicate that females had higher scores than males who had the same level of the construct being measured.

To examine observed group differences between genders, a matched sample of male individuals was selected at random from the Normative Samples. Individuals were matched by education level (EL), language(s) spoken, clinical status, race/ethnicity, and age. The demographic characteristics of the rated individuals in the matched samples (and their raters, where applicable) are presented in appendix J.
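The matching step can be illustrated with a minimal sketch. The manual names the matching variables but does not describe the procedure or data layout, so the field names, the use of age bands, and the exact-profile matching rule below are all assumptions made for illustration only.

```python
import random
from collections import defaultdict

# Hypothetical field names; the manual lists the matching variables but not a data layout.
MATCH_KEYS = ("education_level", "language", "clinical_status", "race_ethnicity", "age_band")

def draw_matched_sample(targets, pool, keys=MATCH_KEYS, seed=0):
    """Pair each target with one randomly chosen, not-yet-used pool record sharing its profile."""
    rng = random.Random(seed)
    by_profile = defaultdict(list)
    for candidate in pool:
        by_profile[tuple(candidate[k] for k in keys)].append(candidate)
    for candidates in by_profile.values():
        rng.shuffle(candidates)  # random order, so pop() below is a random draw
    pairs = []
    for target in targets:
        candidates = by_profile.get(tuple(target[k] for k in keys), [])
        if candidates:
            pairs.append((target, candidates.pop()))  # sampling without replacement
    return pairs

def person(pid, gender, el, age_band):
    """Build a toy record; every value here is made up for the example."""
    return {"id": pid, "gender": gender, "education_level": el, "language": "English",
            "clinical_status": "general population", "race_ethnicity": "White",
            "age_band": age_band}

females = [person(1, "F", "EL 3", "30-39"), person(2, "F", "EL 4", "40-49")]
males = [person(10, "M", "EL 3", "30-39"), person(11, "M", "EL 3", "30-39"),
         person(12, "M", "EL 4", "40-49"), person(13, "M", "EL 1", "18-29")]

pairs = draw_matched_sample(females, males)
for female, male in pairs:
    print(female["id"], "matched with", male["id"])
```

Drawing without replacement keeps the two groups the same size, which is consistent with the equal group Ns reported in the group difference tables.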

The matched samples of males and females were then compared for significant differences in mean scores. Results of the ANOVAs and descriptive statistics for each scale are presented in Tables 11.27 and 11.28. For Self-Report, statistically significant differences were observed for the Inattention/Executive Dysfunction, Hyperactivity, and Impulsivity scales, wherein males yielded slightly higher scores than females; however, the effect sizes were small (|Cohen’s d| = 0.23 to 0.29). For Observer, the only statistically significant effect was on the Negative Self-Concept scale, wherein ratings of females yielded slightly higher scores than ratings of males; however, the effect size was small (|Cohen’s d| = 0.26). The effect of gender on the remaining scale scores was not statistically significant.
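As a rough check on how such effect sizes relate to the reported means and standard deviations, the sketch below computes Cohen's d with a pooled standard deviation and labels it using the guidelines from the table notes. The pooled-SD formula is an assumption (the chapter does not state which variant was used), and the inputs are the rounded values from Table 11.27, so results can differ slightly from values computed on unrounded data.

```python
import math

def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference using the pooled standard deviation (assumed formula)."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)

def effect_size_label(d):
    """Label |d| with the guidelines from the table notes, after rounding to two decimals."""
    size = round(abs(d), 2)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"

# Self-Report rows from Table 11.27; males are entered first, so positive d = higher male scores.
d_inattention = cohens_d(50.8, 9.8, 463, 48.6, 9.2, 463)
d_impulsivity = cohens_d(51.1, 10.1, 463, 48.3, 9.1, 463)

print(round(d_inattention, 2), effect_size_label(d_inattention))
print(round(d_impulsivity, 2), effect_size_label(d_impulsivity))
```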

Overall, these results support the absence of meaningful gender differences. Taken together, results from the MI, DTF, and mean group difference analyses indicate psychometric equivalence between males and females for the CAARS 2–Short Content Scales. There was no strong evidence for meaningful differences in terms of latent structure nor in terms of test functioning between the two gender groups, and scores for males and females were not meaningfully different. Although results of these analyses suggest that use of Combined Gender as the primary reference group is appropriate, there may be instances where specific comparisons to a gender group are desired. Gender Specific and Combined Gender normative scoring options are available; please see Scoring and Report Options in chapter 3, Administration and Scoring, for details.

Table 11.27. Group Differences by Gender (Male vs. Female): CAARS 2–Short Self-Report

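| CAARS 2–Short Scale | | Male (N = 463) | Female (N = 463) | Cohen's d | F(1, 924) | p | Partial η2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 50.8 | 48.6 | 0.23 | 12.18 | .001 | .01 |
| | SD | 9.8 | 9.2 | | | | |
| Hyperactivity | M | 50.7 | 48.5 | 0.23 | 12.52 | < .001 | .01 |
| | SD | 9.9 | 9.5 | | | | |
| Impulsivity | M | 51.1 | 48.3 | 0.29 | 19.67 | < .001 | .02 |
| | SD | 10.1 | 9.1 | | | | |
| Emotional Dysregulation | M | 50.0 | 49.1 | 0.09 | 2.07 | .151 | .00 |
| | SD | 9.7 | 9.7 | | | | |
| Negative Self-Concept | M | 49.6 | 49.9 | -0.03 | 0.16 | .685 | .00 |
| | SD | 10.0 | 9.7 | | | | |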
Note. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates higher scores for males than females.

Table 11.28. Group Differences by Gender (Male vs. Female): CAARS 2–Short Observer

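| CAARS 2–Short Scale | | Male (N = 444) | Female (N = 444) | Cohen's d | F(1, 886) | p | Partial η2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 49.9 | 49.3 | 0.07 | 0.94 | .332 | .00 |
| | SD | 9.9 | 9.8 | | | | |
| Hyperactivity | M | 49.7 | 49.4 | 0.03 | 0.22 | .642 | .00 |
| | SD | 9.4 | 10.1 | | | | |
| Impulsivity | M | 50.0 | 49.1 | 0.09 | 1.81 | .178 | .00 |
| | SD | 9.5 | 10.1 | | | | |
| Emotional Dysregulation | M | 49.9 | 49.9 | -0.01 | 0.01 | .934 | .00 |
| | SD | 10.0 | 10.3 | | | | |
| Negative Self-Concept | M | 48.4 | 50.9 | -0.26 | 14.46 | < .001 | .02 |
| | SD | 9.2 | 10.4 | | | | |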
Note. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates higher scores for males than females.

Race/Ethnicity

Establishing that the CAARS 2 operates equivalently and produces equally valid scores that can be interpreted in the same way across racial and ethnic groups is a critical component of ensuring that psychometric fairness standards are met. For the U.S. portion of the CAARS 2 Normative Sample, race and ethnicity were categorized according to the U.S. Census Bureau classifications into the following five groups: Hispanic (regardless of race), Black, Asian, White, and Other (which includes Native, Multiracial, and other racial identities not otherwise listed). Individuals whose race/ethnicity was classified as Other were excluded from MI and DTF analyses due to small sample sizes (meaningful interpretation of group-level scores is challenging given that this category includes a multitude of race groups). Additionally, race/ethnicity analyses in this section are limited to individuals who live in the U.S., as the Canadian sample sizes were too small to permit meaningful analyses. More details about the correspondence to the Census classifications can be found in Race/Ethnicity in chapter 7, Standardization.

Differences among the U.S. racial and ethnic groups were explored with regard to the CAARS 2–Short structure and scores. It was expected that there would be negligible differences in terms of race/ethnicity, as the test was designed to minimize the impact of an individual’s background, with the goal of generalizing to diverse populations.

First, MI was explored within the U.S. subsamples of the CAARS 2–Short Total Sample. Due to the smaller sample sizes for Black individuals on the CAARS 2–Short forms (Self-Report N = 186; Observer N = 199), Hispanic individuals (Self-Report N = 229; Observer N = 232) and Black individuals were combined into a larger group (Self-Report N = 415; Observer N = 431), and the combined group was compared to White individuals (Self-Report N = 1,311; Observer N = 1,021) for MI analysis. It should be acknowledged that combining the groups in this way, while necessitated by small sample sizes, may nonetheless limit interpretability of results; it is important for future analyses to explore White vs. Hispanic and White vs. Black comparisons in larger samples to confirm the pattern of results presented within this section.

Table 11.29 presents the MI results for Self-Report and Observer, respectively. For Self-Report, there were no significant decreases in model fit (in particular, the Satorra-Bentler χ2 test) when comparing the increasingly strict models tested at each subsequent level. For Observer, although the comparison between strong (third level of testing) and strict models (fourth level of testing) was significantly different when examined with the Satorra-Bentler χ2 test, no other fit statistics showed any meaningful decreases in model fit, indicating that strong invariance is met. This evidence supports the CAARS 2–Short’s ability to measure the construct with the same structure for White individuals as it does for a combined sample of Black and Hispanic individuals.

Table 11.29. Measurement Invariance by U.S. Race/Ethnicity (Hispanic/Black vs. White): CAARS 2–Short

| Form | Model | χ2 | df | RMSEA | CFI | TLI | SRMR | Comparison | Satorra-Bentler χ2 | df | ∆CFI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Self-Report | Configural | 3602.11*** | 1238 | .047 | .973 | .971 | .043 | -- | | | |
| | Weak | 3635.97*** | 1273 | .046 | .973 | .972 | .043 | configural vs. weak | 36.25 | 35 | .000 |
| | Strong | 3607.36*** | 1305 | .045 | .974 | .974 | .043 | weak vs. strong | 56.78 | 32 | .001 |
| | Strict | 3546.37*** | 1337 | .044 | .975 | .975 | .043 | strong vs. strict | 48.65* | 32 | .001 |
| Observer | Configural | 474.66*** | 1238 | .058 | .948 | .944 | .051 | -- | | | |
| | Weak | 4754.45*** | 1268 | .057 | .949 | .946 | .051 | configural vs. weak | 26.37 | 30 | .001 |
| | Strong | 430.51*** | 1300 | .052 | .956 | .955 | .051 | weak vs. strong | 39.70 | 32 | .007 |
| | Strict | 4251.14*** | 1332 | .051 | .957 | .957 | .051 | strong vs. strict | 7.99*** | 32 | .001 |
Note. N = 415 Black and Hispanic individuals; N = 1,331 White individuals for Self-Report. N = 431 Black and Hispanic individuals; N = 1,280 White individuals for Observer. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001.
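The ∆CFI column can be reproduced from the CFI values in Table 11.29, as in the sketch below. The cutoff used to flag a meaningful decline (a CFI drop larger than .01) is a commonly cited convention and an assumption here; this chapter does not state which threshold was applied.

```python
# CFI values by model, copied from Table 11.29.
cfi = {
    "Self-Report": {"configural": 0.973, "weak": 0.973, "strong": 0.974, "strict": 0.975},
    "Observer": {"configural": 0.948, "weak": 0.949, "strong": 0.956, "strict": 0.957},
}

# Assumed cutoff (not stated in this chapter): a CFI drop larger than .01 flags non-invariance.
MAX_CFI_DROP = 0.01

def cfi_changes(cfi_by_model):
    """Return (comparison, change in CFI) for each pair of adjacent nested models."""
    levels = ["configural", "weak", "strong", "strict"]
    return [
        (f"{a} vs. {b}", round(cfi_by_model[b] - cfi_by_model[a], 3))
        for a, b in zip(levels, levels[1:])
    ]

for form, values in cfi.items():
    for comparison, change in cfi_changes(values):
        verdict = "no meaningful decline" if change >= -MAX_CFI_DROP else "possible non-invariance"
        print(f"{form:12s} {comparison:20s} change in CFI = {change:+.3f} ({verdict})")
```

In this table CFI does not decrease at any step, so every comparison passes under the assumed cutoff regardless of the Satorra-Bentler result.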

DTF analyses were examined next to explore the invariance of the CAARS 2–Short across race/ethnicity, specifically between Hispanic and White individuals and between Black and White individuals. Black and Hispanic groups were not directly compared, as the investigation here was more concerned with comparisons of historically marginalized groups to a majority group; however, the DTF analyses permitted examining the Black and Hispanic groups separately, rather than combining them as had been done for MI. The effect sizes of the DTF statistics, as measured by the ETSSD, are summarized in Table 11.30. The largest effect size across Self-Report and Observer, when comparing either Black or Hispanic individuals to White individuals, was |ETSSD| = 0.12. The differences between groups are negligible, demonstrating a lack of measurement bias between White, Black, and Hispanic groups and reinforcing the generalizability of the CAARS 2.

Table 11.30. Differential Test Functioning Effect Sizes by U.S. Race/Ethnicity

| CAARS 2–Short Scale | Hispanic/White: Self-Report | Hispanic/White: Observer | Black/White: Self-Report | Black/White: Observer |
| --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | -.02 | -.01 | -.01 | -.02 |
| Hyperactivity | -.02 | .02 | -.02 | .00 |
| Impulsivity | -.05 | -.03 | -.09 | -.07 |
| Emotional Dysregulation | -.01 | .02 | -.02 | -.03 |
| Negative Self-Concept | -.01 | -.03 | -.12 | -.07 |
Note. Values presented are expected test score standardized differences (ETSSD); guidelines for interpreting |ETSSD|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Positive ETSSD values indicate that Black or Hispanic individuals received higher scores than White individuals who had the same level of the construct being measured.
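A short sketch of how the values in Table 11.30 map onto the interpretation guidelines in the note: it finds the largest absolute ETSSD and labels every value. The dictionary layout and function name are illustrative choices, not part of the manual.

```python
# ETSSD values copied from Table 11.30, keyed by (comparison, form), in scale order.
etssd = {
    ("Hispanic/White", "Self-Report"): [-0.02, -0.02, -0.05, -0.01, -0.01],
    ("Hispanic/White", "Observer"): [-0.01, 0.02, -0.03, 0.02, -0.03],
    ("Black/White", "Self-Report"): [-0.01, -0.02, -0.09, -0.02, -0.12],
    ("Black/White", "Observer"): [-0.02, 0.00, -0.07, -0.03, -0.07],
}

def etssd_label(value):
    """Label |ETSSD| using the guidelines given in the table note."""
    size = abs(value)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"

largest = max(abs(v) for values in etssd.values() for v in values)
labels = {etssd_label(v) for values in etssd.values() for v in values}

print("largest |ETSSD|:", largest)
print("labels observed:", labels)
```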

To investigate mean score differences across racial/ethnic groups, subsamples of Hispanic, Black, and Asian individuals from the U.S. portion of the Normative Sample were each compared to a corresponding subsample of White individuals from the Normative Sample, matched on gender, education level (EL), language(s) spoken, clinical status, and age. The demographic characteristics of the rated individuals in the matched samples (and their raters, where applicable) are presented in appendix J.

Comparisons between the matched Hispanic and White samples, Black and White samples, and Asian and White samples were analyzed via a series of ANOVAs, and results are presented in Tables 11.31 to 11.36. These tables present the means and standard deviations of the CAARS 2 scale scores, along with the significance tests and effect sizes.

When comparing ratings of Hispanic and White individuals, Observer results showed no statistically significant differences on any scale, with negligible to small effect sizes between groups (median |Cohen’s d| = 0.08). For Self-Report, a statistically significant difference was observed between Hispanic and White individuals on the Negative Self-Concept scale, with a small effect size (Cohen’s d = 0.43), wherein White individuals endorsed slightly more symptoms or slightly greater severity than Hispanic individuals.

When comparing ratings of Black and White individuals, Observer results showed no statistically significant differences on any scale, with negligible to small effects (median |Cohen’s d| = 0.07). For Self-Report, there were statistically significant differences for the Hyperactivity, Impulsivity, and Negative Self-Concept scales. Lower scores were observed for Black individuals than for White individuals, with small to medium effect sizes for the significant differences (Cohen’s d = 0.32 to 0.71; median |Cohen’s d| across all five scales = 0.32), indicating that White individuals endorsed more symptoms or greater severity than Black individuals.

When comparing ratings of Asian and White individuals across both forms, results showed no statistically significant effects on any scale. Cohen’s d effect sizes, capturing the size of the difference between group means, were negligible to small (|Cohen’s d| = 0.04 to 0.41).

Overall, the differences observed between White individuals and Hispanic, Black, and Asian individuals on the CAARS 2–Short were small, and the majority were not statistically significant. Taken together, results from the MI, DTF, and mean group difference analyses indicate psychometric equivalence for Hispanic and White individuals, Black and White individuals, and Asian and White individuals. Together with the absence of evidence for measurement bias, there is support for equity in terms of race/ethnic groups for the CAARS 2–Short and its appropriate use in racially and ethnically diverse populations.
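For a one-way ANOVA with a single grouping factor, η2 can be recovered from the F statistic and its degrees of freedom, which offers a quick consistency check on the group difference tables. The sketch below applies this to the F values reported for the White vs. Black Self-Report comparison (Table 11.33); the function names are illustrative, and the inputs are rounded table values.

```python
def eta_squared(f_value, df_effect, df_error):
    """Eta squared implied by F for a one-way ANOVA: SS_effect / (SS_effect + SS_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def eta_squared_label(value):
    """Label eta squared using the guidelines from the table notes (two-decimal rounding)."""
    size = round(value, 2)
    if size < 0.01:
        return "negligible"
    if size < 0.06:
        return "small"
    if size < 0.14:
        return "medium"
    return "large"

# F(1, 274) values from Table 11.33, in scale order.
f_values = {
    "Inattention/Executive Dysfunction": 5.82,
    "Hyperactivity": 9.01,
    "Impulsivity": 7.00,
    "Emotional Dysregulation": 3.04,
    "Negative Self-Concept": 34.33,
}

eta_by_scale = {scale: eta_squared(f, 1, 274) for scale, f in f_values.items()}
for scale, eta in eta_by_scale.items():
    print(f"{scale}: eta squared = {eta:.2f} ({eta_squared_label(eta)})")
```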

Table 11.31. Group Difference by U.S. Race/Ethnicity (White vs. Hispanic): CAARS 2–Short Self-Report

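| CAARS 2–Short Scale | | White (N = 93) | Hispanic (N = 93) | Cohen's d | F(1, 184) | p | η2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 51.8 | 48.8 | 0.30 | 4.19 | .042 | .02 |
| | SD | 11.5 | 8.3 | | | | |
| Hyperactivity | M | 51.6 | 48.4 | 0.34 | 5.19 | .024 | .03 |
| | SD | 10.8 | 8.3 | | | | |
| Impulsivity | M | 51.1 | 47.8 | 0.36 | 6.09 | .014 | .03 |
| | SD | 10.4 | 7.4 | | | | |
| Emotional Dysregulation | M | 50.8 | 47.9 | 0.31 | 4.30 | .039 | .02 |
| | SD | 10.5 | 8.5 | | | | |
| Negative Self-Concept | M | 52.6 | 48.4 | 0.43 | 8.35 | .004 | .04 |
| | SD | 11.1 | 8.2 | | | | |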
Note. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates that scores were higher for White individuals than Hispanic individuals.

Table 11.32. Group Difference by U.S. Race/Ethnicity (White vs. Hispanic): CAARS 2–Short Observer

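| CAARS 2–Short Scale | | White (N = 105) | Hispanic (N = 105) | Cohen's d | F(1, 204) | p | η2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 47.9 | 49.5 | -0.19 | 1.79 | .182 | .01 |
| | SD | 8.4 | 8.5 | | | | |
| Hyperactivity | M | 47.7 | 49.9 | -0.24 | 2.91 | .089 | .01 |
| | SD | 8.9 | 10.1 | | | | |
| Impulsivity | M | 49.1 | 49.2 | -0.01 | 0.01 | .918 | .00 |
| | SD | 9.2 | 8.4 | | | | |
| Emotional Dysregulation | M | 49.1 | 48.4 | 0.08 | 0.32 | .571 | .00 |
| | SD | 9.3 | 7.9 | | | | |
| Negative Self-Concept | M | 48.9 | 48.8 | 0.02 | 0.01 | .908 | .00 |
| | SD | 8.4 | 7.7 | | | | |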
Note. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates that scores were higher for White individuals than Hispanic individuals.

Table 11.33. Group Difference by U.S. Race/Ethnicity (White vs. Black): CAARS 2–Short Self-Report

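| CAARS 2–Short Scale | | White (N = 138) | Black (N = 138) | Cohen's d | F(1, 274) | p | η2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 51.1 | 48.2 | 0.29 | 5.82 | .016 | .02 |
| | SD | 10.6 | 9.0 | | | | |
| Hyperactivity | M | 51.2 | 47.8 | 0.36 | 9.01 | .003 | .03 |
| | SD | 10.7 | 8.1 | | | | |
| Impulsivity | M | 50.7 | 47.7 | 0.32 | 7.00 | .009 | .02 |
| | SD | 10.5 | 8.3 | | | | |
| Emotional Dysregulation | M | 50.7 | 48.6 | 0.21 | 3.04 | .083 | .01 |
| | SD | 10.3 | 9.7 | | | | |
| Negative Self-Concept | M | 53.0 | 45.8 | 0.71 | 34.33 | < .001 | .11 |
| | SD | 11.1 | 9.4 | | | | |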
Note. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates that scores were higher for White individuals than Black individuals.

Table 11.34. Group Difference by U.S. Race/Ethnicity (White vs. Black): CAARS 2–Short Observer

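| CAARS 2–Short Scale | | White (N = 152) | Black (N = 152) | Cohen's d | F(1, 302) | p | η2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 49.0 | 49.4 | -0.04 | 0.11 | .736 | .00 |
| | SD | 9.7 | 10.6 | | | | |
| Hyperactivity | M | 48.7 | 48.8 | -0.01 | 0.01 | .935 | .00 |
| | SD | 9.2 | 9.7 | | | | |
| Impulsivity | M | 49.7 | 49.0 | 0.07 | 0.33 | .564 | .00 |
| | SD | 9.3 | 10.9 | | | | |
| Emotional Dysregulation | M | 50.9 | 49.2 | 0.17 | 2.07 | .151 | .01 |
| | SD | 10.0 | 11.4 | | | | |
| Negative Self-Concept | M | 49.1 | 47.5 | 0.17 | 2.25 | .135 | .01 |
| | SD | 10.2 | 9.0 | | | | |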
Note. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates that scores were higher for White individuals than Black individuals.

Table 11.35. Group Difference by U.S. Race/Ethnicity (White vs. Asian): CAARS 2–Short Self-Report

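| CAARS 2–Short Scale | | White (N = 52) | Asian (N = 52) | Cohen's d | F(1, 102) | p | η2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 50.4 | 49.6 | 0.08 | 0.18 | .673 | .00 |
| | SD | 10.4 | 10.4 | | | | |
| Hyperactivity | M | 51.2 | 49.4 | 0.17 | 0.72 | .399 | .01 |
| | SD | 11.2 | 10.9 | | | | |
| Impulsivity | M | 50.1 | 50.5 | -0.04 | 0.04 | .836 | .00 |
| | SD | 10.4 | 11.0 | | | | |
| Emotional Dysregulation | M | 50.5 | 48.8 | 0.17 | 0.77 | .383 | .01 |
| | SD | 10.1 | 9.5 | | | | |
| Negative Self-Concept | M | 51.9 | 48.7 | 0.31 | 2.41 | .124 | .02 |
| | SD | 11.5 | 9.1 | | | | |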
Note. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates that scores were higher for White individuals than Asian individuals.

Table 11.36. Group Difference by U.S. Race/Ethnicity (White vs. Asian): CAARS 2–Short Observer

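| CAARS 2–Short Scale | | White (N = 35) | Asian (N = 35) | Cohen's d | F(1, 68) | p | η2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 46.8 | 47.6 | -0.10 | 0.18 | .675 | .00 |
| | SD | 7.7 | 8.0 | | | | |
| Hyperactivity | M | 46.4 | 49.8 | -0.41 | 2.85 | .096 | .04 |
| | SD | 6.0 | 10.1 | | | | |
| Impulsivity | M | 46.6 | 49.9 | -0.40 | 2.69 | .106 | .04 |
| | SD | 6.7 | 9.6 | | | | |
| Emotional Dysregulation | M | 47.6 | 49.2 | -0.17 | 0.51 | .477 | .01 |
| | SD | 9.8 | 9.2 | | | | |
| Negative Self-Concept | M | 50.5 | 48.2 | 0.25 | 1.08 | .302 | .02 |
| | SD | 11.6 | 6.2 | | | | |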
Note. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates that scores were higher for White individuals than Asian individuals.

Country of Residence

To address equivalence of scores across different countries, ratings of individuals in the U.S. and Canada were compared on the CAARS 2–Short. Cross-cultural differences were expected to be minimal, and the lack of meaningful differences would support the generalizability and utility of the CAARS 2–Short in both countries.

The invariance of the factor structure was then tested across the individual’s country of residence. Results examining MI between the U.S. (Self-Report N = 1,881; Observer N = 1,821) and Canada (Self-Report N = 344; Observer N = 329) are found in Table 11.37.

Across the CAARS 2–Short Self-Report and Observer scales, there were no statistically significant Satorra-Bentler χ2 tests and no declines in model fit statistics when comparing different levels of invariance. Because the CAARS 2–Short meets the most stringent level of invariance tested in terms of country of residence, these results support generalizability of the CAARS 2–Short Self-Report and Observer forms to individuals who live in the U.S. or Canada.

Next, DTF was evaluated with regard to country of residence. Effect sizes of the DTF analyses for both CAARS 2–Short forms are presented in Table 11.38. Negligible differences between countries were found (i.e., |ETSSD| ≤ .06), which further supports the generalizability of the CAARS 2–Short to U.S. and Canadian populations alike.

Table 11.38. Differential Test Functioning Effect Sizes by Country of Residence

| CAARS 2–Short Scale | Self-Report | Observer |
| --- | --- | --- |
| Inattention/Executive Dysfunction | -.01 | -.01 |
| Hyperactivity | -.03 | .01 |
| Impulsivity | -.01 | .00 |
| Emotional Dysregulation | .03 | -.01 |
| Negative Self-Concept | .06 | -.03 |
Note. Values presented are expected test score standardized differences (ETSSD); guidelines for interpreting |ETSSD|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Positive ETSSD values indicate that Canadians received higher scores than Americans who had the same level of the construct being measured.

To examine group differences between countries, a subsample of the individuals from the U.S. portion of the Normative Sample was randomly selected to match a sample of Canadian individuals from the Normative Sample in terms of gender, education level (EL), language(s) spoken, clinical status, and age (Self-Report N = 172 and Observer N = 162). Refer to appendix J for the demographic characteristics of the samples.

The matched U.S. and Canadian samples were then compared for significant differences in mean scores. Results of the ANOVAs and descriptive statistics for each scale are presented in Table 11.39 and Table 11.40. Across both rater forms and all scales, there were no statistically significant effects of country of residence. Ratings of individuals from the U.S. and Canada resulted in very similar mean scores as described by Cohen’s d; all effect sizes were negligible to small (median Cohen’s d = 0.19 and 0.07 for Self-Report and Observer, respectively). These results indicate that country of residence (specifically, U.S. vs. Canada) had no significant effect on the CAARS 2–Short scale scores.
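The median effect sizes quoted for the country comparison can be reproduced from the Cohen's d columns of Tables 11.39 and 11.40, as in this brief sketch (the variable names are illustrative):

```python
from statistics import median

# Cohen's d values by scale, copied from Table 11.39 (Self-Report) and Table 11.40 (Observer).
cohens_d = {
    "Self-Report": [0.19, 0.39, 0.27, 0.03, 0.17],
    "Observer": [0.10, 0.02, 0.10, 0.06, 0.07],
}

# Take absolute values first so that the sign (direction of the difference) does not affect the median.
median_abs_d = {form: median(abs(d) for d in values) for form, values in cohens_d.items()}
largest_abs_d = {form: max(abs(d) for d in values) for form, values in cohens_d.items()}

for form in cohens_d:
    print(f"{form}: median |d| = {median_abs_d[form]:.2f}, largest |d| = {largest_abs_d[form]:.2f}")
```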

Taken together, results from the MI, DTF, and mean group difference analyses indicate psychometric equivalence between individuals from the U.S. and Canada on the CAARS 2–Short scales. There was no evidence for meaningful differences in terms of latent structure or test functioning between the groups, and scores were not meaningfully different.

Table 11.39. Group Differences by Country of Residence: CAARS 2–Short Self-Report

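| CAARS 2–Short Scale | | U.S. (N = 86) | Canada (N = 86) | Cohen's d | F(1, 170) | p | η2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 50.9 | 49.2 | 0.19 | 1.54 | .216 | .01 |
| | SD | 10.3 | 7.7 | | | | |
| Hyperactivity | M | 51.2 | 47.6 | 0.39 | 6.40 | .012 | .04 |
| | SD | 11.2 | 6.9 | | | | |
| Impulsivity | M | 51.2 | 48.8 | 0.27 | 3.22 | .075 | .02 |
| | SD | 10.0 | 8.1 | | | | |
| Emotional Dysregulation | M | 49.7 | 49.5 | 0.03 | 0.03 | .855 | .00 |
| | SD | 8.5 | 9.1 | | | | |
| Negative Self-Concept | M | 51.8 | 50.2 | 0.17 | 1.18 | .279 | .01 |
| | SD | 10.4 | 9.1 | | | | |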
Note. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates that scores were higher for individuals from the U.S. than individuals from Canada.

Table 11.40. Group Differences by Country of Residence: CAARS 2–Short Observer

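| CAARS 2–Short Scale | | U.S. (N = 81) | Canada (N = 81) | Cohen's d | F(1, 160) | p | η2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inattention/Executive Dysfunction | M | 49.5 | 48.7 | 0.10 | 0.39 | .535 | .00 |
| | SD | 9.9 | 8.3 | | | | |
| Hyperactivity | M | 48.5 | 48.4 | 0.02 | 0.01 | .907 | .00 |
| | SD | 8.8 | 8.0 | | | | |
| Impulsivity | M | 49.2 | 48.3 | 0.10 | 0.43 | .514 | .00 |
| | SD | 8.3 | 8.9 | | | | |
| Emotional Dysregulation | M | 49.5 | 49.0 | 0.06 | 0.13 | .717 | .00 |
| | SD | 9.7 | 8.1 | | | | |
| Negative Self-Concept | M | 49.8 | 49.1 | 0.07 | 0.19 | .660 | .00 |
| | SD | 10.4 | 8.5 | | | | |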
Note. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates that scores were higher for individuals from the U.S. than individuals from Canada.

Education Level

An individual’s education level (EL) can sometimes be considered a proxy for or a contributing factor to one’s socioeconomic status (SES), which is among the sociocultural variables that may influence the fairness of a test. It was expected that the constructs measured on the CAARS 2–Short would be independent of influence from EL. To test this hypothesis and ensure generalizability of scores from the CAARS 2–Short Content Scales, individuals in the Self-Report and Observer samples reported the EL of the rated individual using one of five options: No high school diploma (EL 1), High school diploma/GED (EL 2), Some college/university or associate degree (EL 3), Bachelor’s degree (EL 4), or Graduate or professional degree (EL 5; more information about the representativeness of these groups can be found in Education Level in chapter 7, Standardization). For the sake of invariance analyses, EL was re-categorized into two groups comprising individuals with and without post-secondary education (i.e., Group 1 consists of EL 1 and EL 2: N = 1,515 for Self-Report and N = 1,134 for Observer; Group 2 consists of EL 3, EL 4, and EL 5: N = 710 for Self-Report and N = 796 for Observer).

First, differences in the factor structure based on EL groups were evaluated with MI. With more stringent models tested at each level, neither the CAARS 2–Short Self-Report nor the CAARS 2–Short Observer displayed meaningful deterioration in model fit (see Table 11.41). For Self-Report, some comparisons were significant using the Satorra-Bentler χ2 test (e.g., the loading versus intercept model comparison; p < .001); however, the indicators must be considered together, and no other model fit statistics indicated meaningful change. Therefore, the observed change in model fit is minor and not meaningful, such that invariance between the EL groups on the construct assessed by the CAARS 2–Short can reasonably be assumed. These results support the invariance of the CAARS 2–Short across factor structure, thresholds, loadings, and intercepts between individuals with and without post-secondary education, meeting the first-step criteria for establishing its unbiased and generalizable use across these populations.

Table 11.41. Measurement Invariance by Education Level: CAARS 2–Short

| Form | Model | χ2 | df | RMSEA | CFI | TLI | SRMR | Comparison | Satorra-Bentler χ2 | df | ∆CFI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Self-Report | Configural | 4755.71*** | 1238 | .051 | .970 | .968 | .041 | -- | | | |
| | Weak | 4796.63*** | 1273 | .050 | .970 | .969 | .041 | configural vs. weak | 34.98 | 35 | .000 |
| | Strong | 4688.43*** | 1305 | .048 | .971 | .970 | .041 | weak vs. strong | 38.16 | 32 | .001 |
| | Strict | 4688.11*** | 1337 | .047 | .971 | .971 | .041 | strong vs. strict | 85.94*** | 32 | .000 |
| Observer | Configural | 5556.52*** | 1238 | .057 | .949 | .946 | .048 | -- | | | |
| | Weak | 5596.72*** | 1268 | .056 | .949 | .947 | .048 | configural vs. weak | 37.16 | 30 | .000 |
| | Strong | 5136.21*** | 1300 | .052 | .955 | .954 | .048 | weak vs. strong | 39.57 | 32 | .006 |
| | Strict | 5142.45*** | 1332 | .052 | .955 | .955 | .048 | strong vs. strict | 109.40*** | 32 | .000 |
Note. N = 710 individuals with high school education or less (EL 1 and EL 2); N = 1,515 individuals with post-secondary education (EL 3, EL 4, and EL 5) for Self-Report. N = 796 individuals with high school education or less (EL 1 and EL 2); N = 1,134 individuals with post-secondary education (EL 3, EL 4, and EL 5) for Observer. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001.

Next, differences in the CAARS 2–Short scales’ functioning for the two broad EL groups were explored with DTF. The effect sizes from DTF analyses are summarized in Table 11.42. Results from both Self-Report and Observer show negligible differences between EL groups (maximum |ETSSD| = .07). The lack of differential functioning of the CAARS 2–Short scales between the different EL groups is further evidence of the test’s equivalence across demographic subgroups.

Table 11.42. Differential Test Functioning Effect Sizes by Education Level

| CAARS 2–Short Scale | Self-Report | Observer |
|---|---|---|
| Inattention/Executive Dysfunction | .00 | -.03 |
| Hyperactivity | .02 | .03 |
| Impulsivity | .03 | -.01 |
| Emotional Dysregulation | .01 | .01 |
| Negative Self-Concept | .07 | .01 |
Note. Values presented are expected test score standardized differences (ETSSD); guidelines for interpreting |ETSSD|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Positive ETSSD values indicate that higher scores would be observed for individuals who do not have post-secondary education (EL 1 and EL 2) relative to individuals who have post-secondary education (EL 3, EL 4, and EL 5) with the same level of the construct being measured.
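The interpretation guidelines in the note can be applied mechanically. The sketch below labels each ETSSD value from Table 11.42 using those thresholds; the function and variable names are hypothetical and serve only as an illustration.

```python
# ETSSD values copied from Table 11.42 as (Self-Report, Observer) pairs.
ETSSD = {
    "Inattention/Executive Dysfunction": (0.00, -0.03),
    "Hyperactivity": (0.02, 0.03),
    "Impulsivity": (0.03, -0.01),
    "Emotional Dysregulation": (0.01, 0.01),
    "Negative Self-Concept": (0.07, 0.01),
}


def classify_effect_size(value):
    """Label |value| using the thresholds given in the note to Table 11.42."""
    size = abs(value)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


labels = {
    scale: tuple(classify_effect_size(v) for v in pair)
    for scale, pair in ETSSD.items()
}
largest = max(abs(v) for pair in ETSSD.values() for v in pair)

for scale, (self_report, observer) in labels.items():
    print(f"{scale}: Self-Report {self_report}, Observer {observer}")
print(f"Largest |ETSSD| = {largest:.2f}")
```

The same thresholds appear in the notes for Cohen's d, so the classifier can be reused for the pairwise effect sizes reported later in this section.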

Next, the five EL groups were compared in terms of the observed mean score group differences. These group differences were analyzed using the entire Normative Sample. EL was compared via a series of ANCOVAs, with covariates to statistically control for the effects of other demographic factors (i.e., gender, age, race/ethnicity, primary language spoken, clinical status). Significant ANCOVA results (i.e., p < .01) were followed up with Tukey’s Honestly Significant Difference (HSD) post-hoc test to evaluate pairwise comparisons, alongside estimates of effect sizes for the omnibus test and pairwise differences.

Results of the ANCOVAs for Self-Report are provided in Table 11.43a, with effect sizes of the pairwise comparisons between group means provided in Table 11.43b. Corresponding results for Observer are provided in Tables 11.44a and 11.44b. For both forms, significant differences between EL groups were observed only for the CAARS 2–Short Emotional Dysregulation scale, and the size of these effects fell at the lower bound of the small range (partial η2 = .01 for both forms). The post-hoc analysis and pairwise comparisons revealed that ratings of individuals from the EL 1 and EL 4 groups resulted in statistically significantly different mean scores, but effect sizes were small (Cohen’s d = 0.32 for Self-Report and 0.27 for Observer). This effect manifests as a difference of approximately 4 T-score points higher for individuals in the EL 1 group than for individuals in the EL 4 group on the CAARS 2–Short Emotional Dysregulation scale (estimated marginal means of 57.2 vs. 52.9 for Self-Report and 54.9 vs. 51.1 for Observer).
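The reported partial η2 values can be recovered from the F statistics and their degrees of freedom, because for a fixed-effect F test partial η2 = (F × df effect) / (F × df effect + df error). The sketch below applies this identity to the Emotional Dysregulation results in Tables 11.43a and 11.44a; the function name is hypothetical, and the manual does not describe how its values were computed.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Emotional Dysregulation omnibus tests, copied from Tables 11.43a and 11.44a.
self_report = partial_eta_squared(4.47, df_effect=4, df_error=1298)
observer = partial_eta_squared(3.44, df_effect=4, df_error=1303)

print(f"Self-Report partial eta squared: {self_report:.4f}")
print(f"Observer partial eta squared:    {observer:.4f}")
```

Both values round to the tabled .01.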

Overall, these results support the absence of meaningful differences in the measurement properties of the test across low and high EL groups. Taken together, results from the MI, DTF, and mean group difference analyses indicate that the CAARS 2–Short can generalize across EL groups for the Content Scales. There was no strong evidence for meaningful differences in terms of latent structure nor in terms of test functioning between the two groups, and scores were not meaningfully different, supporting the unbiased use of the CAARS 2–Short for individuals with a range of educational backgrounds and levels.

Table 11.43a. Group Differences by Education Level: CAARS 2–Short Self-Report

| CAARS 2–Short Scale | Statistic | EL 1 (N = 127) | EL 2 (N = 378) | EL 3 (N = 385) | EL 4 (N = 281) | EL 5 (N = 149) | F (4, 1298) | p | Partial η2 |
|---|---|---|---|---|---|---|---|---|---|
| Inattention/Executive Dysfunction | EMM | 56.8 | 55.2 | 56.1 | 55.7 | 55.8 | 0.93 | .448 | .00 |
| | SD | 11.8 | 16.0 | 15.0 | 13.5 | 11.3 | | | |
| Hyperactivity | EMM | 55.3 | 52.8 | 53.9 | 52.8 | 52.8 | 2.28 | .059 | .01 |
| | SD | 12.2 | 16.6 | 15.6 | 14.0 | 11.7 | | | |
| Impulsivity | EMM | 55.9 | 53.0 | 53.9 | 53.8 | 53.8 | 2.16 | .072 | .01 |
| | SD | 12.2 | 16.6 | 15.6 | 14.0 | 11.7 | | | |
| Emotional Dysregulation | EMM | 57.2 a | 54.1 a,b | 54.0 a,b | 52.9 b | 53.6 a,b | 4.47 | .001 | .01 |
| | SD | 12.3 | 16.8 | 15.7 | 14.1 | 11.9 | | | |
| Negative Self-Concept | EMM | 55.0 | 54.2 | 54.8 | 55.2 | 54.3 | 0.51 | .726 | .00 |
| | SD | 12.0 | 16.3 | 15.3 | 13.8 | 11.5 | | | |
Note. EMM = estimated marginal means. EL = Education level; EL 1 = No high school diploma; EL 2 = High school diploma/GED; EL 3 = Some college or associate degree; EL 4 = Bachelor's degree; EL 5 = Graduate or professional degree. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. EMMs without a common superscript letter differ (p < .01) as per Tukey's HSD post-hoc tests; values with common superscript letters are not significantly different.

Table 11.43b. Group Differences by Education Level: CAARS 2–Short Self-Report Effect Sizes

| CAARS 2–Short Scale | EL 1 vs. EL 2 | EL 1 vs. EL 3 | EL 1 vs. EL 4 | EL 1 vs. EL 5 | EL 2 vs. EL 3 | EL 2 vs. EL 4 | EL 2 vs. EL 5 | EL 3 vs. EL 4 | EL 3 vs. EL 5 | EL 4 vs. EL 5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Inattention/Executive Dysfunction | 0.11 | 0.04 | 0.09 | 0.08 | -0.06 | -0.03 | -0.04 | 0.03 | 0.02 | -0.01 |
| Hyperactivity | 0.16 | 0.09 | 0.19 | 0.21 | -0.07 | 0.00 | 0.00 | 0.08 | 0.07 | 0.00 |
| Impulsivity | 0.18 | 0.14 | 0.15 | 0.18 | -0.05 | -0.05 | -0.05 | 0.00 | 0.01 | 0.00 |
| Emotional Dysregulation | 0.20 | 0.22 | 0.32 | 0.30 | 0.01 | 0.08 | 0.04 | 0.07 | 0.03 | -0.05 |
| Negative Self-Concept | 0.05 | 0.01 | -0.01 | 0.06 | -0.03 | -0.06 | -0.01 | -0.03 | 0.03 | 0.07 |
Note. EL = Education level; EL 1 = No high school diploma; EL 2 = High school diploma/GED; EL 3 = Some college or associate degree; EL 4 = Bachelor's degree; EL 5 = Graduate or professional degree. Values reported are Cohen's d effect size estimates; guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80.
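For readers who want to see how a pairwise effect size relates to the group statistics in Table 11.43a, the sketch below computes Cohen's d as the difference in estimated marginal means divided by a pooled standard deviation. The pooling formula (weighting each variance by N - 1) is an assumption, since the manual does not state which standardizer was used; under that assumption the result rounds to the tabled values for the two Emotional Dysregulation comparisons checked here.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d using a pooled SD (assumed formula, not confirmed by the manual)."""
    pooled_variance = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_variance)


# Emotional Dysregulation, Self-Report: (EMM, SD, N) copied from Table 11.43a.
EL1 = (57.2, 12.3, 127)
EL2 = (54.1, 16.8, 378)
EL4 = (52.9, 14.1, 281)

d_el1_vs_el4 = cohens_d(*EL1, *EL4)
d_el1_vs_el2 = cohens_d(*EL1, *EL2)

print(f"EL 1 vs. EL 4: d = {d_el1_vs_el4:.2f}")
print(f"EL 1 vs. EL 2: d = {d_el1_vs_el2:.2f}")
```

A positive d here means the first group listed (EL 1) has the higher estimated marginal mean.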

Table 11.44a. Mean Group Differences by Education Level: CAARS 2–Short Observer

| CAARS 2–Short Scale | Statistic | EL 1 (N = 130) | EL 2 (N = 386) | EL 3 (N = 380) | EL 4 (N = 268) | EL 5 (N = 156) | F (4, 1303) | p | Partial η2 |
|---|---|---|---|---|---|---|---|---|---|
| Inattention/Executive Dysfunction | EMM | 56.2 | 53.4 | 53.3 | 52.9 | 53.3 | 2.64 | .032 | .01 |
| | SD | 12.6 | 17.1 | 15.9 | 14.2 | 11.9 | | | |
| Hyperactivity | EMM | 53.7 | 51.7 | 51.7 | 51.5 | 52.2 | 1.26 | .284 | .00 |
| | SD | 12.9 | 17.5 | 16.3 | 14.5 | 12.2 | | | |
| Impulsivity | EMM | 53.7 | 50.8 | 50.6 | 50.3 | 51.5 | 2.87 | .022 | .01 |
| | SD | 12.8 | 17.4 | 16.2 | 14.5 | 12.2 | | | |
| Emotional Dysregulation | EMM | 54.9 a | 52.0 a,b | 51.7 a,b | 51.1 b | 52.7 a,b | 3.44 | .008 | .01 |
| | SD | 12.9 | 17.5 | 16.3 | 14.5 | 12.2 | | | |
| Negative Self-Concept | EMM | 56.3 | 53.7 | 54.9 | 53.9 | 54.4 | 2.30 | .056 | .01 |
| | SD | 12.2 | 16.6 | 15.4 | 13.7 | 11.5 | | | |
Note. EMM = estimated marginal means. EL = Education level; EL 1 = No high school diploma; EL 2 = High school diploma/GED; EL 3 = Some college or associate degree; EL 4 = Bachelor's degree; EL 5 = Graduate or professional degree. Guidelines for interpreting η2: negligible effect size < .01; small effect size = .01 to .059; medium effect size = .06 to .13; large effect size ≥ .14. EMMs without a common superscript letter differ (p < .01) as per Tukey's HSD post-hoc tests; values with common superscript letters are not significantly different.

Table 11.44b. Mean Group Differences by Education Level: CAARS 2–Short Observer Effect Sizes

| CAARS 2–Short Scale | EL 1 vs. EL 2 | EL 1 vs. EL 3 | EL 1 vs. EL 4 | EL 1 vs. EL 5 | EL 2 vs. EL 3 | EL 2 vs. EL 4 | EL 2 vs. EL 5 | EL 3 vs. EL 4 | EL 3 vs. EL 5 | EL 4 vs. EL 5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Inattention/Executive Dysfunction | 0.17 | 0.19 | 0.24 | 0.24 | 0.01 | 0.03 | 0.01 | 0.02 | 0.00 | -0.02 |
| Hyperactivity | 0.12 | 0.13 | 0.16 | 0.12 | 0.00 | 0.01 | -0.03 | 0.02 | -0.03 | -0.05 |
| Impulsivity | 0.18 | 0.20 | 0.24 | 0.18 | 0.01 | 0.03 | -0.04 | 0.02 | -0.06 | -0.09 |
| Emotional Dysregulation | 0.18 | 0.21 | 0.27 | 0.18 | 0.02 | 0.06 | -0.04 | 0.04 | -0.06 | -0.12 |
| Negative Self-Concept | 0.17 | 0.10 | 0.18 | 0.16 | -0.08 | -0.01 | -0.05 | 0.07 | 0.03 | -0.04 |
Note. EL = Education level; EL 1 = No high school diploma; EL 2 = High school diploma/GED; EL 3 = Some college or associate degree; EL 4 = Bachelor's degree; EL 5 = Graduate or professional degree. Values presented are Cohen's d effect size estimates; guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80.