Conners 4 Manual

Chapter 10: Gender

Gender


Gender, for the purposes of fairness-related analyses, is defined as the rated youth’s gender identity. Analyses were conducted to compare males and females on the Conners 4 in terms of MI, DTF, and mean group differences. The sample size for youth who are non-binary or indicated “Other” for gender (N = 6 for Parent, N = 1 for Teacher, and N = 3 for Self-Report) did not allow for meaningful testing. Therefore, when assessing invariance by gender, only males (N = 1,705 for Parent; N = 1,473 for Teacher; and N = 788 for Self-Report) and females (N = 1,539 for Parent; N = 1,404 for Teacher; and N = 796 for Self-Report) from the Total Sample were included.

Invariance between males and females for the Conners 4 was first explored via MI analyses (see Tables 10.1 to 10.3). There were no meaningful differences in the fit of progressively stringent models in the Parent form. While the Satorra-Bentler χ² was statistically significant for some comparisons (e.g., within the Content Scales, as seen in Table 10.1), the other fit statistics, such as CFI and SRMR, did not show a meaningful decline in model fit for any of the comparisons, so the assumption of invariance was not clearly violated. Similar results were found for the Teacher and Self-Report forms. Some models had a significant Satorra-Bentler χ² test (e.g., the intercept models tested within the Content Scales); however, the absence of a meaningful decline in the other fit statistics suggests invariance was upheld. As more constraints were added throughout the process of testing MI, model fit did not change in a meaningful way, indicating that the factor structure, loadings, thresholds, and intercepts are invariant between males and females.
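As a minimal sketch of how the change in CFI between progressively constrained models can be tabulated, the Python snippet below uses the CFI values reported for the Self-Report Impairment & Functional Outcome Scales in Table 10.3. The function name `delta_cfi` and the .01 cutoff are illustrative assumptions (a commonly cited rule of thumb), not a criterion stated in this chapter.

```python
# Illustrative sketch only: `delta_cfi` and the 0.01 cutoff are assumptions,
# not the procedure or criterion documented in this manual.

def delta_cfi(cfi_by_model, cutoff=0.01):
    """Return (comparison, absolute CFI change, within_cutoff) for adjacent models."""
    names = list(cfi_by_model)
    rows = []
    for prev, curr in zip(names, names[1:]):
        change = round(abs(cfi_by_model[curr] - cfi_by_model[prev]), 3)
        rows.append((f"{prev} v. {curr}", change, change <= cutoff))
    return rows

# CFI values for the Self-Report Impairment & Functional Outcome Scales (Table 10.3).
self_report_impairment = {
    "configural": 0.945,
    "threshold": 0.944,
    "loading": 0.946,
    "intercept": 0.942,
}

results = delta_cfi(self_report_impairment)
for comparison, change, ok in results:
    print(f"{comparison}: change in CFI = {change:.3f}, within cutoff: {ok}")
```

Under these assumptions, the three computed changes correspond to the ∆CFI column for those rows of Table 10.3, and none exceeds the assumed cutoff.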


Table 10.1. Measurement Invariance by Gender: Conners 4 Parent

Scales	Model	χ2	df	RMSEA	CFI	TLI	SRMR	Comparison	Satorra-Bentler χ2	df	∆CFI
Content Scales	Configural	12632.05***	3274	.042	.969	.967	.040	—
	Threshold	12720.44***	3333	.042	.969	.968	.040	configural v. threshold	78.53*	59	.000
	Loading	12699.78***	3386	.041	.969	.969	.040	threshold v. loading	85.00**	53	.000
	Intercept	12831.59***	3439	.041	.969	.969	.040	loading v. intercept	281.95***	53	.000
Impairment & Functional Outcome Scales	Configural	2364.94***	298	.065	.980	.977	.044	—
	Threshold	2417.30***	317	.064	.980	.978	.044	configural v. threshold	11.54	19	.000
	Loading	2379.56***	333	.062	.981	.980	.044	threshold v. loading	20.28	16	.001
	Intercept	2351.25***	349	.059	.981	.981	.044	loading v. intercept	33.19**	16	.000
DSM Oppositional Defiant Disorder Symptoms Scale	Configural	949.39***	70	.088	.980	.974	.038	—
	Threshold	985.38***	80	.084	.980	.977	.038	configural v. threshold	5.34	10	.000
	Loading	942.70***	89	.077	.981	.981	.038	threshold v. loading	8.05	9	.001
	Intercept	884.31***	98	.070	.982	.984	.038	loading v. intercept	12.07	9	.001
DSM Conduct Disorder Symptoms Scale	Configural	1219.26***	180	.060	.964	.958	.080	—
	Threshold	1259.88***	195	.058	.963	.960	.080	configural v. threshold	15.67	15	.001
	Loading	1245.62***	209	.055	.964	.964	.080	threshold v. loading	16.17	14	.001
	Intercept	1112.03***	223	.050	.969	.971	.080	loading v. intercept	21.14	14	.005

Note. N = 1,705 males; N = 1,539 females. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001.


Table 10.2. Measurement Invariance by Gender: Conners 4 Teacher

Scales	Model	χ2	df	RMSEA	CFI	TLI	SRMR	Comparison	Satorra-Bentler χ2	df	∆CFI
Content Scales	Configural	14556.81***	3274	.049	.964	.963	.053	—
	Threshold	14643.71***	3333	.049	.964	.963	.053	configural v. threshold	62.02	59	.000
	Loading	14578.55***	3386	.048	.964	.964	.053	threshold v. loading	67.82	53	.000
	Intercept	14514.18***	3439	.047	.965	.965	.053	loading v. intercept	216.76***	53	.001
Impairment & Functional Outcome Scales	Configural	1457.23***	106	.094	.979	.974	.055	—
	Threshold	1537.77***	118	.092	.978	.975	.055	configural v. threshold	34.24**	12	.001
	Loading	1544.13***	128	.088	.978	.977	.055	threshold v. loading	15.51	10	.000
	Intercept	1607.12***	138	.086	.977	.978	.055	loading v. intercept	71.53***	10	.001
DSM Oppositional Defiant Disorder Symptoms Scale	Configural	1047.94***	70	.099	.985	.980	.037	—
	Threshold	1092.69***	80	.094	.984	.982	.037	configural v. threshold	12.76	10	.001
	Loading	1064.38***	89	.087	.985	.985	.037	threshold v. loading	5.69	9	.001
	Intercept	1067.34***	98	.083	.985	.986	.037	loading v. intercept	53.28***	9	.000
DSM Conduct Disorder Symptoms Scale	Configural	916.88***	130	.065	.963	.955	.111	—
	Threshold	949.97***	143	.063	.962	.958	.111	configural v. threshold	8.51	13	.001
	Loading	921.39***	155	.059	.964	.964	.111	threshold v. loading	8.88	12	.002
	Intercept	806.74***	167	.052	.970	.972	.112	loading v. intercept	16.66	12	.006

Note. N = 1,473 males; N = 1,404 females. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001.


Table 10.3. Measurement Invariance by Gender: Conners 4 Self-Report

Scales	Model	χ2	df	RMSEA	CFI	TLI	SRMR	Comparison	Satorra-Bentler χ2	df	∆CFI
Content Scales	Configural	7142.83***	3390	.037	.956	.954	.051	—
	Threshold	7213.38***	3450	.037	.956	.955	.051	configural v. threshold	69.33	60	.000
	Loading	7189.31***	3504	.036	.957	.956	.051	threshold v. loading	60.12	54	.001
	Intercept	7266.83***	3558	.036	.956	.957	.051	loading v. intercept	165.33***	54	.001
Impairment & Functional Outcome Scales	Configural	1301.77***	298	.065	.945	.937	.065	—
	Threshold	1342.24***	317	.064	.944	.940	.065	configural v. threshold	22.48	19	.001
	Loading	1315.20***	333	.061	.946	.945	.065	threshold v. loading	30.58*	16	.002
	Intercept	1404.73***	349	.062	.942	.944	.066	loading v. intercept	95.42***	16	.004
DSM Oppositional Defiant Disorder Symptoms Scale	Configural	615.73***	70	.099	.942	.926	.066	—
	Threshold	645.42***	80	.095	.940	.933	.066	configural v. threshold	5.22	10	.002
	Loading	601.67***	89	.085	.946	.945	.067	threshold v. loading	11.09	9	.006
	Intercept	597.72***	98	.080	.947	.951	.067	loading v. intercept	26.87**	9	.001
DSM Conduct Disorder Symptoms Scale	Configural	595.10***	180	.054	.946	.937	.091	—
	Threshold	616.51***	194	.052	.945	.940	.091	configural v. threshold	13.70	14	.001
	Loading	601.42***	208	.049	.949	.948	.091	threshold v. loading	9.94	14	.004
	Intercept	558.75***	222	.044	.956	.958	.094	loading v. intercept	21.34	14	.007

Note. N = 788 males; N = 796 females. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001.

In addition to MI results, DTF analyses were conducted to explore the invariance of the Conners 4 for males and females through a different framework. An example of a DTF graph is provided in Figure 10.1. Test functioning curves for males and females are depicted, each with a shaded band displaying a 95% confidence interval. The two groups’ curves overlap almost completely, demonstrating a lack of difference for the Inattention/Executive Dysfunction scale. Similar results were found for all scales across all forms in terms of gender.

The effect sizes of the DTF analyses for all scales, as measured by the ETSSD, are summarized in Table 10.4. There was a small effect of gender on the DSM Conduct Disorder Symptoms scale for Teacher (ETSSD = −.23). The value is negative, indicating females would score slightly higher than males when females and males actually had an equal standing in terms of Conduct Disorder symptoms as a construct. The test-level effect appears to result from an accumulation of negligible to small effects on the test items; given this and the small size of the effect, there is little evidence of a concerning lack of invariance on this scale. All other differences were trivial in nature across the Conners 4 scales and across all forms, demonstrating invariance by gender.

Figure 10.1. Differential Test Functioning by Gender: Inattention/Executive Dysfunction

a) Parent	b) Teacher
c) Self-Report


Table 10.4. Differential Test Functioning Effect Sizes by Gender

Scale		Parent	Teacher	Self-Report
Content Scales	Inattention/Executive Dysfunction	.00	.01	−.02
	Hyperactivity	.02	.06	.01
	Impulsivity	.01	.00	−.03
	Emotional Dysregulation	.01	.01	−.08
	Depressed Mood	−.01	.01	−.02
	Anxious Thoughts	−.02	−.03	−.03
Impairment & Functional Outcome Scales	Schoolwork	−.03	−.05	−.03
	Peer Interactions	.01	−.08	.05
	Family Life	−.03	—	−.12
DSM Symptom Scales	Oppositional Defiant Disorder Symptoms	−.01	.00	.10
	Conduct Disorder Symptoms	−.01	−.23	.01

Note. Values presented are expected test score standardized differences (ETSSD); guidelines for interpreting |ETSSD|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Positive ETSSD values indicate that female youth received higher scores than male youth who had the same level of the construct being measured.
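The interpretation guidelines in the note above can be expressed as a small lookup, sketched below in Python. The function name `classify_effect` is hypothetical. The example values are the Teacher Conduct Disorder Symptoms and Self-Report Family Life entries from Table 10.4.

```python
# Illustrative sketch: `classify_effect` is a hypothetical helper that applies
# the |ETSSD| guidelines quoted in the note to Table 10.4.

def classify_effect(value):
    """Label an effect size by its absolute magnitude."""
    magnitude = abs(value)
    if magnitude < 0.20:
        return "negligible"
    if magnitude < 0.50:
        return "small"
    if magnitude < 0.80:
        return "medium"
    return "large"

# Two entries taken from Table 10.4.
examples = {
    "Teacher, Conduct Disorder Symptoms": -0.23,
    "Self-Report, Family Life": -0.12,
}

labels = {name: classify_effect(value) for name, value in examples.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Because the guidelines are stated for the absolute value, the sign of the ETSSD affects only the direction of the difference, not its label.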

Taken together, results from both MI and DTF analyses indicate psychometric equivalence between males and females for the Conners 4 scales, as there was no strong evidence for meaningful differences in latent structure or in test functioning between the two gender groups. Although the DTF analyses showed an effect for scores from the DSM Conduct Disorder Symptoms scale on the Teacher form, the effect was small and likely stemmed from the accumulation of trivial item-level effects. The effect was not corroborated by the MI analyses, which investigated a similar question using a slightly different method.

To examine observed group differences by gender, a subsample of male youth was selected at random to match a sample of females from the Normative Samples. Youth were matched by PEL (for Parent and Self-Report only), language(s) spoken, clinical status, race/ethnicity, and age (see Table F.36 in appendix F for the demographic characteristics of the youth being rated and Table F.37 for demographic characteristics of the parent and teacher raters).
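One way to picture this kind of matched random selection is the Python sketch below. It is an assumption-laden illustration, not the procedure used for the Conners 4: the record fields, the exact-match rule on every variable (including age), and the function name `match_males_to_females` are all hypothetical, and PEL would be added to the keys for Parent and Self-Report.

```python
import random

# Hypothetical field names; exact matching on every key is an assumption.
MATCH_KEYS = ("language", "clinical_status", "race_ethnicity", "age")

def match_males_to_females(males, females, keys=MATCH_KEYS, seed=0):
    """Pair each female with one randomly chosen, not-yet-used male sharing all keys."""
    rng = random.Random(seed)
    pool = {}
    for male in males:
        pool.setdefault(tuple(male[k] for k in keys), []).append(male)
    for candidates in pool.values():
        rng.shuffle(candidates)  # random order, so pop() draws at random
    pairs = []
    for female in females:
        candidates = pool.get(tuple(female[k] for k in keys), [])
        if candidates:
            pairs.append((candidates.pop(), female))
    return pairs

def record(rid, age, clinical):
    """Build a toy record with made-up demographic values."""
    return {"id": rid, "language": "English", "clinical_status": clinical,
            "race_ethnicity": "A", "age": age}

males = [record("m1", 8, False), record("m2", 8, False), record("m3", 12, True)]
females = [record("f1", 8, False), record("f2", 12, True), record("f3", 15, False)]

pairs = match_males_to_females(males, females)
for male, female in pairs:
    print(male["id"], "matched to", female["id"])
```

In this toy data, the third female has no male with the same profile and is left unmatched. A real procedure would need a rule for such cases.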

The paired samples of males and females were then compared for significant differences in mean scores. Results of the ANOVAs and descriptive statistics for each scale are presented in Tables 10.5 to 10.7. When comparing ratings of male and female youth, the Parent results showed no statistically significant effects across all scales. Cohen’s d effect sizes, capturing the size of the difference between group means, demonstrated negligible effects (with Cohen’s d ranging from 0.00 to |0.13|). For the Teacher results, statistically significant effects were observed for all scales except Depressed Mood and Anxious Thoughts. For scales with statistically significant effects, ratings of males resulted in slightly higher scores than ratings of females, and the effect sizes were negligible to small (with Cohen’s d ranging from |0.04| to |0.40|), yielding scores up to approximately 4 points higher for male students. For the Self-Report results, the only statistically significant effect observed was for the Anxious Thoughts scale, wherein females yielded slightly higher scores than males; however, the effect size was small (Cohen’s d = −0.21).
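To make the relationship between the reported statistics concrete, the Python sketch below approximates Cohen’s d, F, and η² from the rounded means and standard deviations of the Teacher Inattention/Executive Dysfunction row in Table 10.6. It assumes two equal-sized independent groups and a pooled-SD definition of d. The manual does not state the exact formulas used, and the function name is hypothetical.

```python
import math

# Illustrative sketch: formulas assume two independent groups of equal size;
# this is not confirmed to be the computation used in the manual.

def summarize_two_groups(mean_a, sd_a, mean_b, sd_b, n_per_group):
    """Approximate Cohen's d, one-way ANOVA F, and eta squared from summary statistics."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)  # equal-n pooled SD
    d = (mean_a - mean_b) / pooled_sd
    f_stat = d ** 2 * n_per_group / 2   # F equals t squared when there are two groups
    df_error = 2 * n_per_group - 2
    eta_sq = f_stat / (f_stat + df_error)
    return d, f_stat, eta_sq

# Teacher, Inattention/Executive Dysfunction (Table 10.6): male vs. female.
d, f_stat, eta_sq = summarize_two_groups(51.8, 10.5, 47.9, 8.9, 644)
print(f"d = {d:.2f}, F(1, 1286) = {f_stat:.2f}, eta squared = {eta_sq:.2f}")
```

Because the inputs are rounded to one decimal place, the recomputed F is close to, but not identical with, the tabled value of 52.53, while d and η² round to the tabled 0.40 and .04.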

Overall, these results support the absence of meaningful gender differences, and together with the MI and DTF results, there is evidence for equivalent measurement for males and females when using the Conners 4. Additionally, for Parent and Self-Report, scores for males and females were not meaningfully different; for Teacher, some differences were observed that might reflect differences in teachers’ perceptions of students that are independent of the test. Assessors may wish to make note of these differences when interpreting scores from teacher raters. Note that Gender Specific and Combined Gender normative scoring options are available; please see chapter 3, Scoring and Reports, for details.


Table 10.5. Group Differences by Gender (Male vs. Female): Conners 4 Parent

Scales			Male (N = 492)	Female (N = 492)	Cohen’s d	F (1, 982)	p	η2
Content Scales	Inattention/Executive Dysfunction	M	49.9	49.3	0.06	0.97	.326	.00
		SD	10.1	9.9
	Hyperactivity	M	49.7	49.6	0.00	0.00	.950	.00
		SD	10.0	9.7
	Impulsivity	M	49.4	49.8	−0.05	0.51	.477	.00
		SD	10.0	10.1
	Emotional Dysregulation	M	49.8	49.7	0.00	0.01	.943	.00
		SD	10.4	9.6
	Depressed Mood	M	49.6	49.8	−0.02	0.10	.753	.00
		SD	9.8	9.8
	Anxious Thoughts	M	49.1	50.1	−0.11	2.87	.090	.00
		SD	9.2	9.8
Impairment & Functional Outcome Scales	Schoolwork	M	50.5	49.2	0.13	4.17	.041	.00
		SD	10.6	9.8
	Peer Interactions	M	50.3	49.5	0.08	1.43	.231	.00
		SD	10.5	9.4
	Family Life	M	49.8	49.8	0.00	0.01	.940	.00
		SD	10.1	9.9
DSM Symptom Scales	Oppositional Defiant Disorder Symptoms	M	50.2	49.4	0.08	1.63	.202	.00
		SD	10.6	9.5
	Conduct Disorder Symptoms	M	49.9	50.0	−0.01	0.03	.862	.00
		SD	10.2	10.6

Note. Guidelines for interpreting η²: negligible effect size < .01; small effect size = .01 to .05; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen’s |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen’s d value indicates that ratings of males resulted in higher scores than ratings of females.


Table 10.6. Group Differences by Gender (Male vs. Female): Conners 4 Teacher

Scale			Male (N = 644)	Female (N = 644)	Cohen’s d	F (1, 1286)	p	η2
Content Scales	Inattention/Executive Dysfunction	M	51.8	47.9	0.40	52.53	< .001	.04
		SD	10.5	8.9
	Hyperactivity	M	51.8	47.9	0.40	50.76	< .001	.04
		SD	10.7	8.8
	Impulsivity	M	51.4	48.5	0.29	26.70	< .001	.02
		SD	10.5	9.3
	Emotional Dysregulation	M	50.7	49.1	0.16	7.79	.005	.01
		SD	10.8	9.2
	Depressed Mood	M	50.5	49.5	0.10	3.14	.077	.00
		SD	10.6	9.6
	Anxious Thoughts	M	49.8	50.2	−0.04	0.59	.443	.00
		SD	10.0	10.0
Impairment & Functional Outcome Scales	Schoolwork	M	51.9	47.9	0.40	51.75	< .001	.04
		SD	10.7	8.9
	Peer Interactions	M	50.9	48.7	0.22	15.47	< .001	.01
		SD	10.6	9.2
DSM Symptom Scales	Oppositional Defiant Disorder Symptoms	M	51.2	48.6	0.26	22.21	< .001	.02
		SD	11.0	9.0
	Conduct Disorder Symptoms	M	51.0	49.1	0.18	10.42	.001	.01
		SD	11.6	9.0


Table 10.7. Group Differences by Gender (Male vs. Female): Conners 4 Self-Report

Scale			Male (N = 322)	Female (N = 322)	Cohen’s d	F (1, 642)	p	η2
Content Scales	Inattention/Executive Dysfunction	M	50.9	49.5	0.15	3.46	.063	.01
		SD	10.6	9.3
	Hyperactivity	M	51.2	49.8	0.14	3.13	.077	.00
		SD	10.6	9.5
	Impulsivity	M	50.7	50.0	0.06	0.60	.438	.00
		SD	10.9	9.9
	Emotional Dysregulation	M	50.5	49.9	0.05	0.48	.488	.00
		SD	10.4	9.6
	Depressed Mood	M	49.6	50.2	−0.06	0.64	.426	.00
		SD	9.4	10.4
	Anxious Thoughts	M	49.0	51.1	−0.21	6.98	.008	.01
		SD	9.1	10.2
Impairment & Functional Outcome Scales	Schoolwork	M	50.9	49.2	0.17	4.80	.029	.01
		SD	10.5	9.2
	Peer Interactions	M	50.5	49.9	0.06	0.62	.430	.00
		SD	9.9	10.0
	Family Life	M	50.2	50.2	0.00	0.00	.974	.00
		SD	10.7	10.3
DSM Symptom Scales	Oppositional Defiant Disorder Symptoms	M	51.1	49.5	0.16	4.07	.044	.01
		SD	11.0	9.3
	Conduct Disorder Symptoms	M	51.2	49.5	0.16	4.02	.045	.01
		SD	12.2	9.5
