
CAARS 2 Manual

Chapter 12: Fairness


Fairness

To provide evidence that the CAARS 2–ADHD Index provides fair and unbiased measurement for diverse populations, differences between demographic groups were examined. Measurement invariance was explored via differential test functioning (DTF), supplemented by visual inspection of the test characteristic curves for each group (see appendix M for details about the methodology). Effect sizes for the differences in test characteristic curves are reported throughout this section as the estimated test score standardized difference (ETSSD).

Given the multidimensional nature of ADHD (as evidenced by the factor structure presented in Internal Structure in chapter 9, Validity), and because the CAARS 2–ADHD Index comprises items from multiple content domains, measurement invariance testing with nested confirmatory factor analyses (CFAs) was not practical: the CAARS 2–ADHD Index was not designed to mirror the full multidimensional structure of the CAARS 2. The items on the CAARS 2–ADHD Index are indicators that together produce a unidimensional measure of ADHD symptoms and a corresponding score that can reflect one’s likely group membership (i.e., the likelihood of resembling scores from the ADHD Reference Sample); therefore, DTF alone was a sufficient method of evaluating invariance. Before the DTF analyses were conducted, CFAs were run to confirm that the statistical requirement of unidimensionality was met. Results supported a 1-factor model for both the Self-Report (CFI = .986, TLI = .983, RMSEA = .075, SRMR = .037) and the Observer (CFI = .964, TLI = .957, RMSEA = .109, SRMR = .058).

Because invariance testing relies on modeled data (i.e., estimating the population rather than describing the sample), larger sample sizes are required and a greater range of responses is desirable. Therefore, the Total Sample (as described in the Standardization Phase in chapter 6, Development) is used for these invariance analyses: it includes a considerable number of individuals from the general population as well as individuals with clinical diagnoses, which extends the variability of responses.

To examine the generalizability of the obtained scores, the effects of demographic group membership were analyzed by comparing the score distributions of the groups with effect sizes. Group differences were calculated on subsets of the Normative Sample (see appendix J for demographic characteristics of the samples used in this section).

Gender

The invariance of the CAARS 2–ADHD Index between males and females was explored via DTF analyses using the Total Sample for each rater form. In these comparisons, a positive ETSSD value indicates that females would receive a higher score than males when the two groups are equal with respect to ADHD-related symptoms, while a negative value indicates that females would receive a lower score than males. The ETSSD was -0.01 for both the Self-Report and the Observer, an effect of trivial size. These DTF results indicate that the CAARS 2–ADHD Index demonstrates psychometric equivalence across males and females, as there was no evidence of meaningful differences in test functioning between the two groups.

To further explore potential gender differences, observed group differences in CAARS 2–ADHD Index scores were also investigated. These group differences were analyzed using a matched sample (matched by age, race/ethnicity, education level, language[s] spoken, and clinical status; refer to appendix J for demographic characteristics of the matched samples). Differences are presented as Cliff’s d values, which compare the distribution of probability score categories between the two gender groups. Results of these analyses are presented in Table 12.18 for both forms. Effect sizes were negligible for both comparisons (Cliff’s d = -.04 for Self-Report and .00 for Observer).
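This section does not describe how the matched samples were constructed. As an assumption for illustration only, the sketch below shows 1:1 exact matching: people are grouped into strata by their covariate values, and within each stratum the same number of people is randomly kept from each group. The field names, the use of age bands, and the reduced covariate set are hypothetical.

```python
import random
from collections import defaultdict


def exact_match(group_a, group_b, covariates, seed=0):
    """Hypothetical 1:1 exact matching: within each stratum defined by the covariate
    values, randomly keep min(n_a, n_b) people from each group."""
    rng = random.Random(seed)
    strata_a, strata_b = defaultdict(list), defaultdict(list)
    for person in group_a:
        strata_a[tuple(person[c] for c in covariates)].append(person)
    for person in group_b:
        strata_b[tuple(person[c] for c in covariates)].append(person)

    matched_a, matched_b = [], []
    for key, members_a in strata_a.items():
        members_b = strata_b.get(key, [])  # strata present in only one group contribute nobody
        n = min(len(members_a), len(members_b))
        matched_a.extend(rng.sample(members_a, n))
        matched_b.extend(rng.sample(members_b, n))
    return matched_a, matched_b


COVARIATES = ("age_band", "education", "clinical")  # hypothetical subset of covariates
males = [
    {"id": 1, "age_band": "18-29", "education": 3, "clinical": False},
    {"id": 2, "age_band": "18-29", "education": 3, "clinical": False},
    {"id": 3, "age_band": "30-49", "education": 4, "clinical": True},
]
females = [
    {"id": 10, "age_band": "18-29", "education": 3, "clinical": False},
    {"id": 11, "age_band": "30-49", "education": 4, "clinical": True},
    {"id": 12, "age_band": "50+", "education": 2, "clinical": False},
]

matched_m, matched_f = exact_match(males, females, COVARIATES)
print(len(matched_m), len(matched_f))  # 2 2
```

Equal group sizes with identical covariate profiles are what allow a remaining score difference to be attributed to the grouping variable rather than to the matched covariates.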

Overall, the very small differences observed between gender groups add support for the generalizable use of the CAARS 2–ADHD Index across males and females. These findings, together with the DTF results, provide evidence for the equitable measurement by gender when using the CAARS 2–ADHD Index.


Table 12.18. Group Differences by Gender: CAARS 2–ADHD Index

Values in the Very Low through Very High columns are the percentage of matched samples in each probability score category.

Form          Gender   Very Low   Low    Borderline   High   Very High   Cliff's d
Self-Report   Male     81.2       6.5    4.1          1.5    6.7         -.04
Self-Report   Female   77.3       8.2    1.7          3.7    9.1
Observer      Male     72.5       11.0   3.8          5.9    6.8         .00
Observer      Female   72.5       9.9    5.4          5.9    6.3
Note. For Self-Report, N = 463 males and N = 463 females. For Observer, N = 444 males and N = 444 females. Guidelines for interpreting Cliff's |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff's d value indicates that scores were higher for males than females.
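Because each comparison is summarized by category percentages, Cliff’s d can be computed directly from a table like Table 12.18: it is the probability that a randomly chosen member of the first group falls in a higher probability score category than a randomly chosen member of the second group, minus the reverse probability (ties count toward neither). The minimal sketch below is not the publisher's code, and the function name is hypothetical. Applied to the rounded Table 12.18 percentages, it gives -.04 for Self-Report and a value that rounds to .00 for Observer, matching the table; values computed from rounded percentages may differ slightly from those computed on individual scores.

```python
def cliffs_d(p, q):
    """Cliff's d = P(X > Y) - P(X < Y) for two distributions over the same ordered
    categories (lowest category first). Inputs may be percentages; they are normalized."""
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [y / total_q for y in q]
    greater = sum(p[i] * q[j] for i in range(len(p)) for j in range(len(q)) if i > j)
    less = sum(p[i] * q[j] for i in range(len(p)) for j in range(len(q)) if i < j)
    return greater - less


# Table 12.18 percentages, ordered Very Low, Low, Borderline, High, Very High.
self_report_male = [81.2, 6.5, 4.1, 1.5, 6.7]
self_report_female = [77.3, 8.2, 1.7, 3.7, 9.1]
observer_male = [72.5, 11.0, 3.8, 5.9, 6.8]
observer_female = [72.5, 9.9, 5.4, 5.9, 6.3]

# Males are passed first, so a positive d means males scored higher, as in the table note.
print(f"{cliffs_d(self_report_male, self_report_female):.2f}")  # -0.04
print(f"{cliffs_d(observer_male, observer_female):.3f}")
```

Swapping the two arguments flips the sign of d, which is why each table note states which group a positive value favors.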

Race/Ethnicity

Invariance of the CAARS 2–ADHD Index between Hispanic and White individuals and between Black and White individuals was explored via DTF analyses using the Total Sample for each rater form (the sample size for Asian individuals was too small to permit DTF). When comparing Hispanic and White individuals, the ETSSD was -0.04 for Self-Report and 0.00 for Observer. When comparing Black and White individuals, the ETSSD was 0.05 for Self-Report and 0.06 for Observer. All effects were trivial in size, indicating no evidence of DTF by race/ethnicity. These results indicate a lack of race/ethnicity-based measurement bias among the White, Black, and Hispanic groups and support the generalizability of the CAARS 2–ADHD Index through psychometric equivalence across these groups.

To further explore potential race/ethnicity differences, observed group differences in CAARS 2–ADHD Index scores were also investigated. These group differences were analyzed using matched samples from the Normative Sample (matched on gender, language[s] spoken, clinical status, and education level [EL]; refer to appendix J for demographic characteristics of the matched samples). Results of these analyses for both forms are presented in Table 12.19. Effects were negligible when comparing ratings of White and Black, White and Hispanic, and White and Asian individuals (Cliff’s d ranging from -.14 to -.10 for Self-Report and .00 to .08 for Observer).

Overall, the very small differences observed between race/ethnicity groups add support for the fair and generalizable use of the CAARS 2–ADHD Index across these populations, and, together with the DTF results, provide evidence for equitable measurement when using the CAARS 2–ADHD Index.


Table 12.19. Group Differences by U.S. Race/Ethnicity: CAARS 2–ADHD Index

Values in the Very Low through Very High columns are the percentage of matched samples in each probability score category.

Form          Comparison Group     Race/Ethnicity of Individual Being Rated   Very Low   Low    Borderline   High   Very High   Cliff's d
Self-Report   White vs. Black      White                                      71.0       9.4    4.3          3.6    11.6        -.11
Self-Report   White vs. Black      Black                                      81.2       7.2    1.4          4.3    5.8
Self-Report   White vs. Hispanic   White                                      73.1       7.5    2.2          5.4    11.8        -.10
Self-Report   White vs. Hispanic   Hispanic                                   82.8       7.5    2.2          1.1    6.5
Self-Report   White vs. Asian      White                                      71.2       11.5   3.8          5.8    7.7         -.14
Self-Report   White vs. Asian      Asian                                      86.5       3.8    0.0          1.9    7.7
Observer      White vs. Black      White                                      73.7       10.5   3.9          5.3    6.6         .00
Observer      White vs. Black      Black                                      74.3       8.6    4.6          5.3    7.2
Observer      White vs. Hispanic   White                                      80.6       7.8    2.9          1.9    6.8         .08
Observer      White vs. Hispanic   Hispanic                                   71.8       11.7   4.9          5.8    5.8
Observer      White vs. Asian      White                                      80.0       5.7    5.7          5.7    2.9         .01
Observer      White vs. Asian      Asian                                      80.0       5.7    0.0          11.4   2.9
Note. For Self-Report matched samples, N = 138 per group for White vs. Black, N = 93 per group for White vs. Hispanic, and N = 52 per group for White vs. Asian. For Observer matched samples, N = 152 per group for White vs. Black, N = 103 per group for White vs. Hispanic, and N = 35 per group for White vs. Asian. Guidelines for interpreting Cliff's |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff's d value indicates higher scores for the second listed race/ethnicity group than for the first; a negative value indicates higher scores for the first listed group.
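The interpretation guidelines in the table notes amount to a banding rule on |d|. The sketch below encodes that rule and applies it to the Cliff’s d values reported in Table 12.19. The function name is hypothetical, and assigning values that fall between the stated bands (for example, .325) to the lower band is an assumption about how the two-decimal cut points are applied.

```python
def cliffs_d_magnitude(d: float) -> str:
    """Label |d| using this chapter's guidelines: negligible < .15, small .15 to .32,
    medium .33 to .46, large >= .47 (values between bands go to the lower band)."""
    magnitude = abs(d)
    if magnitude < 0.15:
        return "negligible"
    if magnitude < 0.33:
        return "small"
    if magnitude < 0.47:
        return "medium"
    return "large"


# Cliff's d values as reported in Table 12.19.
reported = {
    ("Self-Report", "White vs. Black"): -0.11,
    ("Self-Report", "White vs. Hispanic"): -0.10,
    ("Self-Report", "White vs. Asian"): -0.14,
    ("Observer", "White vs. Black"): 0.00,
    ("Observer", "White vs. Hispanic"): 0.08,
    ("Observer", "White vs. Asian"): 0.01,
}

for (form, comparison), d in reported.items():
    print(f"{form:<12} {comparison:<19} {d:+.2f}  {cliffs_d_magnitude(d)}")
```

Because the rule uses |d|, the sign convention affects only which group scored higher, not the size label.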

Country of Residence

To address equivalence of scores across countries, individuals in the U.S. and Canada were compared on the CAARS 2–ADHD Index. Cross-cultural differences were expected to be minimal, and the lack of meaningful differences would support the generalizability and utility of the CAARS 2–ADHD Index for use in both the U.S. and Canada.

The invariance of the CAARS 2–ADHD Index by country of residence was explored via DTF analyses using the Total Sample for each rater form. The DTF effect sizes were trivial for both forms (ETSSD = -0.01 for Self-Report and 0.00 for Observer; negative ETSSD values indicate higher expected scores for Americans than for Canadians when the two groups are equal on the construct being measured). Effect sizes this small (at or below |0.01|) demonstrate the invariance of the CAARS 2–ADHD Index across countries and support its generalizability to U.S. and Canadian populations.

To further explore potential differences by country of residence, observed group differences in CAARS 2–ADHD Index scores were also investigated. These group differences were analyzed using a matched sample (matched on gender, language[s] spoken, clinical status, and education level; refer to appendix J for demographic characteristics of the matched samples). Group differences were analyzed via Cliff’s d effect size values, which compare the two countries in terms of the percentage of the sample scoring within each category of the CAARS 2–ADHD Index probability score. Results of these analyses are presented in Table 12.20 for both forms. Effect sizes were negligible for both comparisons (Cliff’s d = .05 for both Self-Report and Observer).

Overall, the very small differences observed between the country of residence groups add support for the fair and generalizable use of the CAARS 2–ADHD Index with individuals from the U.S. and Canada and, together with the DTF results, provide evidence for equitable measurement of individuals from both countries when using the CAARS 2–ADHD Index.


Table 12.20. Group Differences by Country of Residence: CAARS 2–ADHD Index

Values in the Very Low through Very High columns are the percentage of matched samples in each probability score category.

Form          Country of Residence   Very Low   Low    Borderline   High   Very High   Cliff's d
Self-Report   Canada                 80.2       14.0   1.2          1.2    3.5         .05
Self-Report   U.S.                   76.7       7.0    2.3          3.5    10.5
Observer      Canada                 77.8       14.8   2.5          2.5    2.5         .05
Observer      U.S.                   74.1       11.1   4.9          4.9    4.9
Note. For Self-Report matched sample, N = 86 per group, and for Observer matched sample, N = 81 per group. Guidelines for interpreting Cliff's |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff's d value indicates that scores were higher for individuals from the U.S. than individuals from Canada.

Education Level

Education level (EL) can be considered a proxy for, or a contributing factor to, socioeconomic status (SES), which is among the sociocultural variables that may influence the fairness of a test. It was expected that the constructs measured by the CAARS 2 would be independent of EL. To test this hypothesis and ensure the generalizability of scores from the CAARS 2–ADHD Index, respondents in the Self-Report and Observer samples reported the EL of the rated individual by selecting one of five options: No high school diploma (EL 1), High school diploma/GED (EL 2), Some college or associate degree (EL 3), Bachelor’s degree (EL 4), and Graduate or professional degree (EL 5). Within the normative samples, the proportion of individuals in each of these groups matched recent U.S. and Canadian census values (more information can be found in Education Level in chapter 7, Standardization).

Equivalence across the EL groups was investigated with DTF analyses using the Total Sample for the Self-Report and Observer. Because the DTF analyses require a binary grouping variable, EL was recategorized into two groups: individuals without post-secondary education (EL 1 and EL 2; N = 505 for Self-Report and N = 515 for Observer) and individuals with post-secondary education (EL 3, EL 4, and EL 5; N = 815 for Self-Report and N = 804 for Observer). The DTF analysis revealed trivial effect sizes (ETSSD = 0.02 for Self-Report and 0.06 for Observer), supporting the invariance of the CAARS 2–ADHD Index with regard to the educational background of the individual being rated.
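The binary recoding used for DTF is a simple mapping from the five EL levels to two groups. As a consistency check, the per-level Self-Report Ns listed in the notes to Table 12.21a sum to the Self-Report group sizes reported above. The function and constant names below are hypothetical.

```python
# Self-Report Ns by education level, as listed in the notes to Table 12.21a.
SELF_REPORT_N_BY_EL = {1: 127, 2: 378, 3: 385, 4: 281, 5: 149}


def has_post_secondary(el: int) -> bool:
    """EL 1-2 -> no post-secondary education; EL 3-5 -> post-secondary education."""
    if el not in (1, 2, 3, 4, 5):
        raise ValueError(f"unknown education level: {el}")
    return el >= 3


def binary_group_sizes(n_by_el: dict[int, int]) -> dict[bool, int]:
    """Collapse per-level counts into the two DTF groups (False = no post-secondary)."""
    sizes = {False: 0, True: 0}
    for el, n in n_by_el.items():
        sizes[has_post_secondary(el)] += n
    return sizes


print(binary_group_sizes(SELF_REPORT_N_BY_EL))  # {False: 505, True: 815}
```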

To further explore potential EL differences, the frequency distributions of CAARS 2–ADHD Index scores were also compared across EL groups. Unlike the previous analyses in this section, which used matched subsamples of the Normative Sample, this analysis used the entire Normative Sample, because the EL groups were too small to form demographically matched groups. Given the lack of differences found for the other demographic variables earlier in this section, the EL results are unlikely to change if those variables were included as covariates. Differences are presented as Cliff’s d values, which compare the percentage of each EL group scoring within each category of the CAARS 2–ADHD Index probability score. Results of this analysis are presented in Table 12.21a (score distributions) and Table 12.21b (effect sizes) for both forms. Effect sizes were negligible for all comparisons (Cliff’s d ranging from -.06 to .05 for Self-Report and -.02 to .08 for Observer).

Overall, the very small differences observed between EL groups demonstrate a lack of influence of education level on the CAARS 2–ADHD Index. This finding adds support for the fair and generalizable use of the CAARS 2–ADHD Index, and together with the DTF results, provides evidence for equitable measurement across EL groups when using the CAARS 2–ADHD Index.


Table 12.21a. Group Differences by Education Level: CAARS 2–ADHD Index

Values are the percentage of the normative samples in each probability score category.

Form          Education Level   Very Low   Low    Borderline   High   Very High
Self-Report   EL 1              74.8       10.2   4.7          3.9    6.3
Self-Report   EL 2              80.2       9.8    1.6          2.9    5.6
Self-Report   EL 3              76.6       7.0    1.8          2.9    11.7
Self-Report   EL 4              75.8       8.5    2.8          3.2    9.6
Self-Report   EL 5              78.5       3.4    4.7          3.4    10.1
Observer      EL 1              66.2       16.2   1.5          9.2    6.9
Observer      EL 2              74.1       9.3    4.4          4.1    8.0
Observer      EL 3              69.5       11.1   4.2          7.4    7.9
Observer      EL 4              71.6       10.4   4.1          6.3    7.5
Observer      EL 5              67.3       10.3   7.1          7.7    7.7
Note. EL = Education level; EL 1 = No high school diploma (N = 127 for Self-Report and 130 for Observer); EL 2 = High school diploma/GED (N = 378 for Self-Report and 386 for Observer); EL 3 = Some college or associate degree (N = 385 for Self-Report and 380 for Observer); EL 4 = Bachelor's degree (N = 281 for Self-Report and 268 for Observer); EL 5 = Graduate or professional degree (N = 149 for Self-Report and 156 for Observer).

Table 12.21b. Group Differences Effect Sizes by Education Level: CAARS 2–ADHD Index

Form EL1 vs. EL2 EL1 vs. EL3 EL1 vs. EL4 EL1 vs. EL5 EL2 vs. EL3 EL2 vs. EL4 EL2 vs. EL5 EL3 vs. EL4 EL3 vs. EL5 EL4 vs. EL5
Self-Report -.06 .00 .00 -.02 -.05 .05 .03 .00 -.02 .02
Observer -.02 .00 .08 .01 .06 .02 .00 .00 .00 .00
Note. EL = Education level; EL 1 = No high school diploma (N = 127 for Self-Report and 130 for Observer); EL 2 = High school diploma/GED (N = 378 for Self-Report and 386 for Observer); EL 3 = Some college or associate degree (N = 385 for Self-Report and 380 for Observer); EL 4 = Bachelor's degree (N = 281 for Self-Report and 268 for Observer); EL 5 = Graduate or professional degree (N = 149 for Self-Report and 156 for Observer). Values presented are Cliff's d. Guidelines for interpreting Cliff's |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff's d value indicates that scores were higher for the group listed first in the heading than the group listed second.
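Table 12.21b reports all ten pairwise comparisons among the five EL groups. The sketch below enumerates the pairs with `itertools.combinations` and applies a category-based Cliff’s d (P(X > Y) minus P(X < Y), with the first-listed group as X) to the Self-Report percentages in Table 12.21a; the Observer rows can be handled the same way. Because it starts from rounded percentages rather than individual scores, and because its sign convention is fixed by this sketch rather than taken from the publisher's software, its output will not necessarily match Table 12.21b value for value. All names are hypothetical.

```python
from itertools import combinations


def cliffs_d(p, q):
    """P(X > Y) - P(X < Y) for category distributions p and q (lowest category first)."""
    total_p, total_q = sum(p), sum(q)
    greater = sum(
        (p[i] / total_p) * (q[j] / total_q)
        for i in range(len(p)) for j in range(len(q)) if i > j
    )
    less = sum(
        (p[i] / total_p) * (q[j] / total_q)
        for i in range(len(p)) for j in range(len(q)) if i < j
    )
    return greater - less


# Self-Report percentages from Table 12.21a (Very Low, Low, Borderline, High, Very High).
SELF_REPORT = {
    "EL 1": [74.8, 10.2, 4.7, 3.9, 6.3],
    "EL 2": [80.2, 9.8, 1.6, 2.9, 5.6],
    "EL 3": [76.6, 7.0, 1.8, 2.9, 11.7],
    "EL 4": [75.8, 8.5, 2.8, 3.2, 9.6],
    "EL 5": [78.5, 3.4, 4.7, 3.4, 10.1],
}


def pairwise_cliffs_d(groups):
    """Cliff's d for every unordered pair of groups; positive = first-listed group higher."""
    return {(a, b): cliffs_d(groups[a], groups[b]) for a, b in combinations(groups, 2)}


for (a, b), d in pairwise_cliffs_d(SELF_REPORT).items():
    print(f"{a} vs. {b}: {d:+.2f}")
```

With five groups, `combinations` yields 5 × 4 / 2 = 10 pairs, one per column of Table 12.21b.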