Chapter 12: CAARS 2–ADHD Index

Fairness

To provide evidence that the CAARS 2–ADHD Index provides fair and unbiased measurement for diverse populations, differences between demographic groups were examined. Measurement invariance was explored via differential test functioning (DTF), with a visual inspection of test characteristic curves for each group (see appendix M for details about the methodology). Effect sizes for the differences in test characteristic curves are summarized throughout this section as the estimated test score standardized difference (ETSSD).

Given the multidimensional nature of ADHD (as evidenced by the factor structure presented in Internal Structure in chapter 9, Validity) and that the CAARS 2–ADHD Index comprises items from multiple content domains, measurement invariance testing with nested confirmatory factor analyses (CFAs) was not practical, as the CAARS 2–ADHD Index was not designed to mirror the full multidimensional structure of the CAARS 2. The items on the CAARS 2–ADHD Index are indicators that together produce a unidimensional measure of ADHD symptoms and a corresponding score that can reflect one’s group membership (i.e., likelihood of resembling scores from the ADHD Reference Sample); therefore, DTF alone was a sufficient method of evaluating invariance.

Prior to conducting the DTF analyses, CFAs were conducted to ensure the statistical requirement for unidimensionality was met. Results supported a 1-factor model for both the Self-Report (CFI = .986, TLI = .983, RMSEA = .075, SRMR = .037) and Observer (CFI = .964, TLI = .957, RMSEA = .109, SRMR = .058) forms. Because invariance testing relies on modeled data (i.e., estimating the population, rather than describing the sample), larger sample sizes are required and a greater range of responses is desirable. Therefore, the Total Sample (as described in the Standardization Phase in chapter 6, Development) is used for these invariance analyses, because it includes a considerable number of individuals from the general population, as well as individuals with clinical diagnoses (which extends the variability of responses).

To examine the generalizability of the obtained scores, the effects of demographic group membership were analyzed by a comparison of effect sizes between response distributions of the groups. Group differences were calculated on subsets of the Normative Sample (see appendix J for demographic characteristics of the samples used in this section).

Gender

Invariance between males and females for the CAARS 2–ADHD Index was explored via DTF analyses using the Total Sample for each rater form. In these comparisons, a positive ETSSD value denotes that females would receive a higher score than males when the two groups are equal with respect to ADHD-related symptoms, while a negative value indicates that females would receive a lower score than males. The ETSSD was -0.01 for both Self-Report and Observer, an effect that is trivial in size. These DTF results indicate that the CAARS 2–ADHD Index demonstrates psychometric equivalence across males and females, as there was no evidence of meaningful differences in test functioning between the two groups.
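To make the idea behind the ETSSD more concrete, the following Python sketch compares two test characteristic curves and standardizes their average difference. It is a simplified illustration, not the procedure documented in appendix M: it assumes dichotomous two-parameter logistic items, the item parameters and trait levels are hypothetical, and dividing by the standard deviation of the focal-parameter expected scores is one possible choice of standardizer. The sign is set so that a positive value means the focal group (here taken to correspond to females) would receive a higher expected score at equal trait levels.

```python
import math
from statistics import mean, pstdev


def p_endorse(theta, a, b):
    """Two-parameter logistic probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_test_score(theta, items):
    """Test characteristic curve: sum of item endorsement probabilities at theta."""
    return sum(p_endorse(theta, a, b) for a, b in items)


def etssd_sketch(thetas, focal_items, reference_items):
    """Mean (focal - reference) expected test score over the given trait levels,
    divided by the SD of the focal expected scores (an assumed standardizer)."""
    focal_scores = [expected_test_score(t, focal_items) for t in thetas]
    reference_scores = [expected_test_score(t, reference_items) for t in thetas]
    mean_difference = mean([f - r for f, r in zip(focal_scores, reference_scores)])
    return mean_difference / pstdev(focal_scores)


# Hypothetical trait levels and (discrimination, difficulty) pairs, for illustration only.
thetas = [x / 2 for x in range(-6, 7)]
reference_items = [(1.2, -0.5), (0.9, 0.0), (1.5, 0.8)]
identical_items = list(reference_items)
shifted_items = [(1.2, -0.7), (0.9, 0.0), (1.5, 0.8)]  # first item easier to endorse

no_dtf = etssd_sketch(thetas, identical_items, reference_items)
some_dtf = etssd_sketch(thetas, shifted_items, reference_items)
```

With identical item parameters in both groups the two curves coincide and the value is zero; making one item easier to endorse for the focal group raises its curve and yields a positive value.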

To further explore potential gender differences, observed group differences between scores on the CAARS 2–ADHD Index were also investigated. These group differences were analyzed using a matched sample (matched by age, race/ethnicity, education level, language[s] spoken, and clinical status; refer to appendix J for demographic characteristics of the matched samples). Differences are presented as Cliff’s d values, comparing the frequency of each probability score category within each gender group. Results of these analyses are presented in Table 12.18 for both forms. Effect sizes were negligible for both comparisons (Cliff’s d = -.04 for Self-Report and Cliff’s d = .00 for Observer).

Overall, the very small differences observed between gender groups add support for the generalizable use of the CAARS 2–ADHD Index across males and females. These findings, together with the DTF results, provide evidence for the equitable measurement by gender when using the CAARS 2–ADHD Index.

Table 12.18. Group Differences by Gender: CAARS 2–ADHD Index

Form	Gender	Percentage of Normative Samples in Probability Score Category					Cliff's d
Form	Gender	Very Low	Low	Borderline	High	Very High	Cliff's d
Self-Report	Male	81.2	6.5	4.1	1.5	6.7	-.04
Self-Report	Female	77.3	8.2	1.7	3.7	9.1	-.04
Observer	Male	72.5	11.0	3.8	5.9	6.8	.00
Observer	Female	72.5	9.9	5.4	5.9	6.3	.00

Note. For Self-Report, N = 463 males and N = 463 females. For Observer, N = 444 males and N = 444 females. Guidelines for interpreting Cliff's |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff's d value indicates that scores were higher for males than females.
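As an illustration of how Cliff's d relates to the category percentages in Table 12.18, the following Python sketch recomputes the effect size from the published percentages. It is a minimal sketch under stated assumptions: the function name is hypothetical, the categories are treated as ordered from Very Low to Very High, and the inputs are the rounded table percentages rather than the raw data, so agreement with the reported values at two decimals is expected here but not guaranteed in general.

```python
def cliffs_d_from_percentages(first, second):
    """Cliff's d for two groups described by percentages in ordered categories
    (lowest category first). Positive when the first group tends to fall in
    higher categories than the second group."""
    total_first = sum(first)
    total_second = sum(second)
    p_first = [x / total_first for x in first]
    p_second = [x / total_second for x in second]

    prob_first_higher = 0.0
    prob_first_lower = 0.0
    for i, pf in enumerate(p_first):
        for j, ps in enumerate(p_second):
            if i > j:
                prob_first_higher += pf * ps
            elif i < j:
                prob_first_lower += pf * ps
    return prob_first_higher - prob_first_lower


# Percentages from Table 12.18 (Very Low, Low, Borderline, High, Very High).
self_report_male = [81.2, 6.5, 4.1, 1.5, 6.7]
self_report_female = [77.3, 8.2, 1.7, 3.7, 9.1]
observer_male = [72.5, 11.0, 3.8, 5.9, 6.8]
observer_female = [72.5, 9.9, 5.4, 5.9, 6.3]

d_self_report = cliffs_d_from_percentages(self_report_male, self_report_female)
d_observer = cliffs_d_from_percentages(observer_male, observer_female)
print(round(d_self_report, 2))  # -0.04
```

Listing males first matches the sign convention in the table note: a positive value would indicate higher scores for males than for females.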

Race/Ethnicity

Invariance between Hispanic and White individuals and between Black and White individuals for the CAARS 2–ADHD Index was explored via DTF analyses using the Total Sample for each rater form (the sample size for Asian individuals was too small to permit DTF). When comparing Hispanic and White individuals, the ETSSD effect size was -0.04 for Self-Report and 0.00 for Observer. When comparing Black and White individuals, effect sizes were 0.05 for Self-Report and 0.06 for Observer. All effects were trivial in size, indicating no evidence of DTF by race/ethnicity. These results demonstrate a lack of race/ethnicity-based measurement bias between White, Black, and Hispanic groups, reinforcing the generalizability of the CAARS 2–ADHD Index by demonstrating psychometric equivalence for White, Hispanic, and Black individuals.

To further explore potential race/ethnicity differences, observed group differences between scores on the CAARS 2–ADHD Index were also investigated. These group differences were analyzed using matched samples from the Normative Sample (matched on gender, language[s] spoken, clinical status, and education level; refer to appendix J for demographic characteristics of the matched samples). Results of these analyses are presented in Table 12.19 for both forms. Effects were negligible when comparing ratings of White and Black, White and Hispanic, and White and Asian individuals (Cliff’s d ranging from -.10 to -.14 for Self-Report and .00 to .08 for Observer).

Overall, the very small differences observed between race/ethnicity groups add support for the fair and generalizable use of the CAARS 2–ADHD Index across these populations, and, together with the DTF results, provide evidence for equitable measurement when using the CAARS 2–ADHD Index.

Table 12.19. Group Differences by U.S. Race/Ethnicity: CAARS 2–ADHD Index

Form	Comparison Group	Race/Ethnicity of Individual Being Rated	Percentage of Matched Samples in Probability Score Category					Cliff's d
Form	Comparison Group	Race/Ethnicity of Individual Being Rated	Very Low	Low	Borderline	High	Very High	Cliff's d
Self-Report	White vs. Black	White	71.0	9.4	4.3	3.6	11.6	-.11
	White vs. Black	Black	81.2	7.2	1.4	4.3	5.8	-.11
	White vs. Hispanic	White	73.1	7.5	2.2	5.4	11.8	-.10
	White vs. Hispanic	Hispanic	82.8	7.5	2.2	1.1	6.5	-.10
	White vs. Asian	White	71.2	11.5	3.8	5.8	7.7	-.14
	White vs. Asian	Asian	86.5	3.8	0.0	1.9	7.7	-.14
Observer	White vs. Black	White	73.7	10.5	3.9	5.3	6.6	.00
	White vs. Black	Black	74.3	8.6	4.6	5.3	7.2	.00
	White vs. Hispanic	White	80.6	7.8	2.9	1.9	6.8	.08
	White vs. Hispanic	Hispanic	71.8	11.7	4.9	5.8	5.8	.08
	White vs. Asian	White	80.0	5.7	5.7	5.7	2.9	.01
	White vs. Asian	Asian	80.0	5.7	0.0	11.4	2.9	.01

Note. For Self-Report matched samples, N = 138 per group for White vs. Black, N = 93 per group for White vs. Hispanic, and N = 52 per group for White vs. Asian. For Observer matched samples, N = 152 per group for White vs. Black, N = 103 per group for White vs. Hispanic, and N = 35 per group for White vs. Asian. Guidelines for interpreting Cliff's |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. Consistent with the percentages shown, a positive Cliff's d value indicates higher scores for the second listed race/ethnic group than for the first listed group.

Country of Residence

To address equivalence of scores across countries, individuals in the U.S. and Canada were compared on the CAARS 2–ADHD Index. Cross-cultural differences were expected to be minimal, and the lack of meaningful differences would support the generalizability and utility of the CAARS 2–ADHD Index for use in both the U.S. and Canada.

Invariance by country of residence for the CAARS 2–ADHD Index was explored via DTF analyses using the Total Sample for each rater form. The effect size of the DTF statistic was trivial for both forms (ETSSD = -0.01 for Self-Report and 0.00 for Observer; negative ETSSD values indicate higher expected scores for Americans than Canadians even when matched on the construct being measured). These trivial differences (|ETSSD| ≤ 0.01) demonstrate the invariance of the CAARS 2–ADHD Index across countries and support its generalizability to U.S. and Canadian populations.

To further explore potential country of residence differences, the CAARS 2–ADHD Index was also investigated in terms of observed group differences between scores. These group differences were analyzed using a matched sample (matched on gender, language[s] spoken, clinical status, and education level; refer to appendix J for demographic characteristics of the matched samples). Group differences were analyzed via Cliff’s d effect size values, which compared the two countries in terms of the percentage of the sample scoring within each category of the CAARS 2–ADHD Index probability score. Results of these analyses are presented in Table 12.20 for both forms. Effect sizes were negligible for both forms (Cliff’s d = .05 for both Self-Report and Observer).

Overall, the very small differences observed between the country of residence groups add support for the fair and generalizable use of the CAARS 2–ADHD Index with individuals from the U.S. and Canada and, together with the DTF results, provide evidence for equitable measurement of individuals from both countries when using the CAARS 2–ADHD Index.

Table 12.20. Group Differences by Country of Residence: CAARS 2–ADHD Index

Form	Country of Residence	Percentage of Matched Samples in Probability Score Category					Cliff's d
Form	Country of Residence	Very Low	Low	Borderline	High	Very High	Cliff's d
Self-Report	Canada	80.2	14.0	1.2	1.2	3.5	.05
Self-Report	U.S.	76.7	7.0	2.3	3.5	10.5	.05
Observer	Canada	77.8	14.8	2.5	2.5	2.5	.05
Observer	U.S.	74.1	11.1	4.9	4.9	4.9	.05

Note. For Self-Report matched sample, N = 86 per group, and for Observer matched sample, N = 81 per group. Guidelines for interpreting Cliff's |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff's d value indicates that scores were higher for individuals from the U.S. than individuals from Canada.

Education Level

Education level (EL) can sometimes be considered a proxy for, or a contributing factor to, one’s socioeconomic status (SES), which is among the sociocultural variables that may influence the fairness of a test. It was expected that the constructs measured on the CAARS 2 would be independent of influence from EL. To test this hypothesis and ensure generalizability of scores from the CAARS 2–ADHD Index, individuals in the Self-Report and Observer samples reported the EL of the rated individual from one of five options: No high school diploma (EL 1), High school diploma/GED (EL 2), Some college or associate degree (EL 3), Bachelor’s degree (EL 4), and Graduate or professional degree (EL 5). Within the normative samples, the proportion of individuals in each of these groups matched recent U.S. and Canadian census values (more information can be found in Education Level in chapter 7, Standardization).

Equivalence across the EL groups was investigated with DTF analyses using the Total Sample for the Self-Report and Observer forms. Because the DTF analyses require a binary grouping variable, EL was re-categorized into two groups: individuals without post-secondary education (EL 1 and EL 2; N = 505 for Self-Report and N = 515 for Observer) and individuals with post-secondary education (EL 3, EL 4, and EL 5; N = 815 for Self-Report and N = 804 for Observer). The DTF analysis revealed trivial effect sizes (ETSSD = 0.02 for Self-Report and 0.06 for Observer). The very small values observed in this analysis support the invariance of the CAARS 2–ADHD Index with regard to the educational background of the individual being rated.

To further explore potential EL differences, the CAARS 2–ADHD Index was also investigated for potential group differences in terms of the frequency distribution of scores for each group. These group differences were analyzed using the entire Normative Sample. Although previous sections of this chapter looked at group differences based on matched subsamples of the Normative Sample, it was not possible to create demographically matched groups for the EL analysis due to small sample sizes of the EL groups. Given the lack of differences found for other demographic variables (as evidenced by results earlier in this section), results for the EL analysis are unlikely to be influenced by the inclusion of covariates. Differences are presented as Cliff’s d values, which compare the percentage of the sample scoring within each category of the CAARS 2–ADHD Index probability score between the EL groups. Results of this analysis are presented in Table 12.21a and Table 12.21b for Self-Report and Observer. Effect sizes were negligible for all of the comparisons (Cliff’s d ranges from -.06 to .05 for Self-Report and -.02 to .08 for Observer).

Overall, the very small differences observed between EL groups demonstrate a lack of influence of education level on the CAARS 2–ADHD Index. This finding adds support for the fair and generalizable use of the CAARS 2–ADHD Index, and together with the DTF results, provides evidence for equitable measurement across EL groups when using the CAARS 2–ADHD Index.

Table 12.21a. Group Differences by Education Level: CAARS 2–ADHD Index

Form	Education Level	Percentage of Normative Samples in Probability Score Category
Form	Education Level	Very Low	Low	Borderline	High	Very High
Self-Report	EL 1	74.8	10.2	4.7	3.9	6.3
	EL 2	80.2	9.8	1.6	2.9	5.6
	EL 3	76.6	7.0	1.8	2.9	11.7
	EL 4	75.8	8.5	2.8	3.2	9.6
	EL 5	78.5	3.4	4.7	3.4	10.1
Observer	EL 1	66.2	16.2	1.5	9.2	6.9
	EL 2	74.1	9.3	4.4	4.1	8.0
	EL 3	69.5	11.1	4.2	7.4	7.9
	EL 4	71.6	10.4	4.1	6.3	7.5
	EL 5	67.3	10.3	7.1	7.7	7.7

Table 12.21b. Group Differences Effect Sizes by Education Level: CAARS 2–ADHD Index

Form	EL 1 vs. EL 2	EL 1 vs. EL 3	EL 1 vs. EL 4	EL 1 vs. EL 5	EL 2 vs. EL 3	EL 2 vs. EL 4	EL 2 vs. EL 5	EL 3 vs. EL 4	EL 3 vs. EL 5	EL 4 vs. EL 5
Self-Report	-.06	.00	.00	-.02	-.05	.05	.03	.00	-.02	.02
Observer	-.02	.00	.08	.01	.06	.02	.00	.00	.00	.00

Note. EL = Education level; EL 1 = No high school diploma (N = 127 for Self-Report and 130 for Observer); EL 2 = High school diploma/GED (N = 378 for Self-Report and 386 for Observer); EL 3 = Some college or associate degree (N = 385 for Self-Report and 380 for Observer); EL 4 = Bachelor's degree (N = 281 for Self-Report and 268 for Observer); EL 5 = Graduate or professional degree (N = 149 for Self-Report and 156 for Observer). Values presented are Cliff's d. Guidelines for interpreting Cliff's |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff's d value indicates that scores were higher for the group listed first in the heading than the group listed second.
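The interpretation guidelines repeated in the table notes can be written as a small lookup. The Python sketch below is illustrative only: the function name is hypothetical, and rounding to two decimals before applying the cut-offs is an assumption made because the guidelines are stated at two-decimal precision. It is applied here to the Cliff's d values reported in Table 12.21b.

```python
def interpret_cliffs_d(d):
    """Label the magnitude of Cliff's d using the guidelines in the table notes:
    negligible < .15; small = .15 to .32; medium = .33 to .46; large >= .47."""
    magnitude = round(abs(d), 2)  # assumed: guidelines apply to two-decimal values
    if magnitude < 0.15:
        return "negligible"
    if magnitude <= 0.32:
        return "small"
    if magnitude <= 0.46:
        return "medium"
    return "large"


# Cliff's d values as reported in Table 12.21b.
table_12_21b = {
    "Self-Report": [-0.06, 0.00, 0.00, -0.02, -0.05, 0.05, 0.03, 0.00, -0.02, 0.02],
    "Observer": [-0.02, 0.00, 0.08, 0.01, 0.06, 0.02, 0.00, 0.00, 0.00, 0.00],
}

labels = {
    form: {interpret_cliffs_d(d) for d in values}
    for form, values in table_12_21b.items()
}
print(labels)  # every comparison on both forms is labeled "negligible"
```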
