
Conners 4 Manual

Chapter 10: Gender


Gender


Gender, for the purposes of fairness-related analyses, is defined as the rated youth’s gender identity. Analyses were conducted to compare males and females on the Conners 4 in terms of MI, DTF, and mean group differences. The sample size for youth who are non-binary or indicated “Other” for gender (N = 6 for Parent, N = 1 for Teacher, and N = 3 for Self-Report) did not allow for meaningful testing. Therefore, when assessing invariance by gender, only males (N = 1,705 for Parent; N = 1,473 for Teacher; and N = 788 for Self-Report) and females (N = 1,539 for Parent; N = 1,404 for Teacher; and N = 796 for Self-Report) from the Total Sample were included.

Invariance between males and females for the Conners 4 was first explored via MI analyses (see Tables 10.1 to 10.3). There were no meaningful differences in the fit of progressively more stringent models in the Parent form. While the Satorra-Bentler χ² was statistically significant for some comparisons (e.g., within the Content Scales, as seen in Table 10.1), the other fit statistics, such as CFI and SRMR, did not show a meaningful decline in model fit for any of the comparisons, so the assumption of invariance was not clearly violated. Similar results were found for the Teacher and Self-Report forms. Some models had a significant Satorra-Bentler χ² test (e.g., the intercept models tested within the Content Scales); however, the absence of a meaningful decline in the other fit statistics suggests invariance was upheld. As more constraints were added throughout the process of testing MI, model fit did not change in a meaningful way, indicating that the factor structure, loadings, thresholds, and intercepts are invariant between males and females.
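The change in CFI reported in Tables 10.1 to 10.3 can be read as a simple step-by-step check across nested models. The following Python sketch illustrates the idea using the rounded Parent Content Scales CFI values from Table 10.1. The .01 cutoff is an assumption drawn from common practice in the invariance literature, not a criterion stated in this chapter, and all function and variable names are hypothetical. Because the published ∆CFI values may have been computed from unrounded fit statistics, differences of the rounded CFI values can disagree with the tables in the third decimal place.

```python
# CFI values for the Parent Content Scales, copied from Table 10.1.
PARENT_CONTENT_CFI = [
    ("configural", 0.969),
    ("threshold", 0.969),
    ("loading", 0.969),
    ("intercept", 0.969),
]

# ASSUMPTION (not stated in this chapter): a CFI drop larger than .01
# between nested models is treated as evidence against invariance.
ASSUMED_MAX_CFI_DROP = 0.01


def cfi_changes(models):
    """Return (comparison label, absolute change in CFI) for each adjacent pair."""
    changes = []
    for (prev_name, prev_cfi), (name, cfi) in zip(models, models[1:]):
        delta = round(abs(cfi - prev_cfi), 3)
        changes.append((f"{prev_name} v. {name}", delta))
    return changes


def invariance_supported(models, max_drop=ASSUMED_MAX_CFI_DROP):
    """True when no added set of constraints lowers CFI by more than max_drop."""
    return all(
        round(prev_cfi - cfi, 3) <= max_drop
        for (_, prev_cfi), (_, cfi) in zip(models, models[1:])
    )


if __name__ == "__main__":
    for label, delta in cfi_changes(PARENT_CONTENT_CFI):
        print(f"{label}: change in CFI = {delta:.3f}")
    print("Invariance supported:", invariance_supported(PARENT_CONTENT_CFI))
```

The check looks only at CFI. The chapter's conclusion also weighs SRMR and the other fit statistics, so this is one piece of the judgment rather than a full decision rule.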



Table 10.1. Measurement Invariance by Gender: Conners 4 Parent

| Scales | Model | χ² | df | RMSEA | CFI | TLI | SRMR | Comparison | Satorra-Bentler χ² | df | ∆CFI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Content Scales | Configural | 12632.05*** | 3274 | .042 | .969 | .967 | .040 | | | | |
| | Threshold | 12720.44*** | 3333 | .042 | .969 | .968 | .040 | configural v. threshold | 78.53* | 59 | .000 |
| | Loading | 12699.78*** | 3386 | .041 | .969 | .969 | .040 | threshold v. loading | 85.00** | 53 | .000 |
| | Intercept | 12831.59*** | 3439 | .041 | .969 | .969 | .040 | loading v. intercept | 281.95*** | 53 | .000 |
| Impairment & Functional Outcome Scales | Configural | 2364.94*** | 298 | .065 | .980 | .977 | .044 | | | | |
| | Threshold | 2417.30*** | 317 | .064 | .980 | .978 | .044 | configural v. threshold | 11.54 | 19 | .000 |
| | Loading | 2379.56*** | 333 | .062 | .981 | .980 | .044 | threshold v. loading | 20.28 | 16 | .001 |
| | Intercept | 2351.25*** | 349 | .059 | .981 | .981 | .044 | loading v. intercept | 33.19** | 16 | .000 |
| DSM Oppositional Defiant Disorder Symptoms Scale | Configural | 949.39*** | 70 | .088 | .980 | .974 | .038 | | | | |
| | Threshold | 985.38*** | 80 | .084 | .980 | .977 | .038 | configural v. threshold | 5.34 | 10 | .000 |
| | Loading | 942.70*** | 89 | .077 | .981 | .981 | .038 | threshold v. loading | 8.05 | 9 | .001 |
| | Intercept | 884.31*** | 98 | .070 | .982 | .984 | .038 | loading v. intercept | 12.07 | 9 | .001 |
| DSM Conduct Disorder Symptoms Scale | Configural | 1219.26*** | 180 | .060 | .964 | .958 | .080 | | | | |
| | Threshold | 1259.88*** | 195 | .058 | .963 | .960 | .080 | configural v. threshold | 15.67 | 15 | .001 |
| | Loading | 1245.62*** | 209 | .055 | .964 | .964 | .080 | threshold v. loading | 16.17 | 14 | .001 |
| | Intercept | 1112.03*** | 223 | .050 | .969 | .971 | .080 | loading v. intercept | 21.14 | 14 | .005 |

Note. N = 1,705 males; N = 1,539 females. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001.



Table 10.2. Measurement Invariance by Gender: Conners 4 Teacher

| Scales | Model | χ² | df | RMSEA | CFI | TLI | SRMR | Comparison | Satorra-Bentler χ² | df | ∆CFI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Content Scales | Configural | 14556.81*** | 3274 | .049 | .964 | .963 | .053 | | | | |
| | Threshold | 14643.71*** | 3333 | .049 | .964 | .963 | .053 | configural v. threshold | 62.02 | 59 | .000 |
| | Loading | 14578.55*** | 3386 | .048 | .964 | .964 | .053 | threshold v. loading | 67.82 | 53 | .000 |
| | Intercept | 14514.18*** | 3439 | .047 | .965 | .965 | .053 | loading v. intercept | 216.76*** | 53 | .001 |
| Impairment & Functional Outcome Scales | Configural | 1457.23*** | 106 | .094 | .979 | .974 | .055 | | | | |
| | Threshold | 1537.77*** | 118 | .092 | .978 | .975 | .055 | configural v. threshold | 34.24** | 12 | .001 |
| | Loading | 1544.13*** | 128 | .088 | .978 | .977 | .055 | threshold v. loading | 15.51 | 10 | .000 |
| | Intercept | 1607.12*** | 138 | .086 | .977 | .978 | .055 | loading v. intercept | 71.53*** | 10 | .001 |
| DSM Oppositional Defiant Disorder Symptoms Scale | Configural | 1047.94*** | 70 | .099 | .985 | .980 | .037 | | | | |
| | Threshold | 1092.69*** | 80 | .094 | .984 | .982 | .037 | configural v. threshold | 12.76 | 10 | .001 |
| | Loading | 1064.38*** | 89 | .087 | .985 | .985 | .037 | threshold v. loading | 5.69 | 9 | .001 |
| | Intercept | 1067.34*** | 98 | .083 | .985 | .986 | .037 | loading v. intercept | 53.28*** | 9 | .000 |
| DSM Conduct Disorder Symptoms Scale | Configural | 916.88*** | 130 | .065 | .963 | .955 | .111 | | | | |
| | Threshold | 949.97*** | 143 | .063 | .962 | .958 | .111 | configural v. threshold | 8.51 | 13 | .001 |
| | Loading | 921.39*** | 155 | .059 | .964 | .964 | .111 | threshold v. loading | 8.88 | 12 | .002 |
| | Intercept | 806.74*** | 167 | .052 | .970 | .972 | .112 | loading v. intercept | 16.66 | 12 | .006 |

Note. N = 1,473 males; N = 1,404 females. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001.



Table 10.3. Measurement Invariance by Gender: Conners 4 Self-Report

| Scales | Model | χ² | df | RMSEA | CFI | TLI | SRMR | Comparison | Satorra-Bentler χ² | df | ∆CFI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Content Scales | Configural | 7142.83*** | 3390 | .037 | .956 | .954 | .051 | | | | |
| | Threshold | 7213.38*** | 3450 | .037 | .956 | .955 | .051 | configural v. threshold | 69.33 | 60 | .000 |
| | Loading | 7189.31*** | 3504 | .036 | .957 | .956 | .051 | threshold v. loading | 60.12 | 54 | .001 |
| | Intercept | 7266.83*** | 3558 | .036 | .956 | .957 | .051 | loading v. intercept | 165.33*** | 54 | .001 |
| Impairment & Functional Outcome Scales | Configural | 1301.77*** | 298 | .065 | .945 | .937 | .065 | | | | |
| | Threshold | 1342.24*** | 317 | .064 | .944 | .940 | .065 | configural v. threshold | 22.48 | 19 | .001 |
| | Loading | 1315.20*** | 333 | .061 | .946 | .945 | .065 | threshold v. loading | 30.58* | 16 | .002 |
| | Intercept | 1404.73*** | 349 | .062 | .942 | .944 | .066 | loading v. intercept | 95.42*** | 16 | .004 |
| DSM Oppositional Defiant Disorder Symptom Scale | Configural | 615.73*** | 70 | .099 | .942 | .926 | .066 | | | | |
| | Threshold | 645.42*** | 80 | .095 | .940 | .933 | .066 | configural v. threshold | 5.22 | 10 | .002 |
| | Loading | 601.67*** | 89 | .085 | .946 | .945 | .067 | threshold v. loading | 11.09 | 9 | .006 |
| | Intercept | 597.72*** | 98 | .080 | .947 | .951 | .067 | loading v. intercept | 26.87** | 9 | .001 |
| DSM Conduct Disorder Symptom Scale | Configural | 595.10*** | 180 | .054 | .946 | .937 | .091 | | | | |
| | Threshold | 616.51*** | 194 | .052 | .945 | .940 | .091 | configural v. threshold | 13.70 | 14 | .001 |
| | Loading | 601.42*** | 208 | .049 | .949 | .948 | .091 | threshold v. loading | 9.94 | 14 | .004 |
| | Intercept | 558.75*** | 222 | .044 | .956 | .958 | .094 | loading v. intercept | 21.34 | 14 | .007 |

Note. N = 788 males; N = 796 females. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001.


In addition to MI results, DTF analyses were conducted to explore the invariance of the Conners 4 for males and females through a different framework. An example of a DTF graph is provided in Figure 10.1, which depicts the test functioning curves for males and females, each with a shaded band showing a 95% confidence interval. The two groups’ curves overlap almost completely, demonstrating a lack of difference for the Inattention/Executive Dysfunction scale. Similar results were found for all scales across all forms in terms of gender.

The effect sizes of the DTF analyses for all scales, as measured by the ETSSD, are summarized in Table 10.4. There was a small effect of gender on the DSM Conduct Disorder Symptoms scale for Teacher (ETSSD = −.23). The value is negative, indicating females would score slightly higher than males when females and males actually had an equal standing in terms of Conduct Disorder symptoms as a construct. The test-level effect appears to result from an accumulation of negligible to small effects on the test items; given this and the small size of the effect, there is little support for a concerning lack of invariance on this scale. All other differences were trivial in nature across the Conners 4 scales and across all forms, demonstrating invariance by gender.


Figure 10.1. Differential Test Functioning by Gender: Inattention/Executive Dysfunction

a) Parent

b) Teacher

c) Self-Report



Table 10.4. Differential Test Functioning Effect Sizes by Gender

| Scale | Parent | Teacher | Self-Report |
|---|---|---|---|
| Content Scales | | | |
| Inattention/Executive Dysfunction | .00 | .01 | −.02 |
| Hyperactivity | .02 | .06 | .01 |
| Impulsivity | .01 | .00 | −.03 |
| Emotional Dysregulation | .01 | .01 | −.08 |
| Depressed Mood | −.01 | .01 | −.02 |
| Anxious Thoughts | −.02 | −.03 | −.03 |
| Impairment & Functional Outcome Scales | | | |
| Schoolwork | −.03 | −.05 | −.03 |
| Peer Interactions | .01 | −.08 | .05 |
| Family Life | −.03 | | −.12 |
| DSM Symptom Scales | | | |
| Oppositional Defiant Disorder Symptoms | −.01 | .00 | .10 |
| Conduct Disorder Symptoms | −.01 | −.23 | .01 |

Note. Values presented are expected test score standardized differences (ETSSD); guidelines for interpreting |ETSSD|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Positive ETSSD values indicate that female youth received higher scores than male youth who had the same level of the construct being measured.
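The interpretation guidelines in the note to Table 10.4 can be applied mechanically. The following Python sketch labels an ETSSD value by the magnitude bands given in that note and applies the labels to a subset of the Teacher values from Table 10.4. The function and variable names are hypothetical, and the sketch classifies size only; it does not interpret the sign of the effect.

```python
def etssd_label(etssd):
    """Label an ETSSD value using the |ETSSD| bands in the Table 10.4 note."""
    size = abs(etssd)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


# Teacher ETSSD values copied from Table 10.4 (subset of scales).
TEACHER_ETSSD = {
    "Inattention/Executive Dysfunction": 0.01,
    "Hyperactivity": 0.06,
    "Impulsivity": 0.00,
    "Emotional Dysregulation": 0.01,
    "Depressed Mood": 0.01,
    "Anxious Thoughts": -0.03,
    "Schoolwork": -0.05,
    "Peer Interactions": -0.08,
    "Oppositional Defiant Disorder Symptoms": 0.00,
    "Conduct Disorder Symptoms": -0.23,
}

# Scales whose effect is larger than negligible.
flagged = [
    scale for scale, value in TEACHER_ETSSD.items()
    if etssd_label(value) != "negligible"
]

if __name__ == "__main__":
    for scale, value in TEACHER_ETSSD.items():
        print(f"{scale}: {value:+.2f} ({etssd_label(value)})")
    print("Non-negligible:", flagged)
```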


Taken together, results from both MI and DTF analyses indicate psychometric equivalence between males and females for the Conners 4 scales, as there was no strong evidence for meaningful differences in latent structure or in test functioning between the two gender groups. Although the DTF analyses showed an effect for scores from the DSM Conduct Disorder Symptoms scale on the Teacher form, the effect was small and likely stemmed from accumulated trivial effects at the item level. The effect was not corroborated by the MI analyses, which investigated a similar question using a slightly different method.

To examine observed group differences between genders, a subsample of male youth was selected at random to match a sample of females from the Normative Samples. Youth were matched by PEL (for Parent and Self-Report only), language(s) spoken, clinical status, race/ethnicity, and age (see Table F.36 in appendix F for the demographic characteristics of the youth being rated and Table F.37 for demographic characteristics of the parent and teacher raters).
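As a rough illustration of this kind of matched selection, the Python sketch below pairs each female record with a randomly chosen male record that has identical values on the matching variables. It assumes exact matching on every variable and one-to-one pairing without replacement; the chapter does not state how the matching was actually implemented, and the field names and records are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical field names standing in for the matching variables listed above.
MATCH_KEYS = ("pel", "language", "clinical", "race_ethnicity", "age")


def match_males_to_females(females, males, keys=MATCH_KEYS, seed=0):
    """Pair each female with a random, not-yet-used male sharing all key values.

    ASSUMPTION: exact matching without replacement; females with no
    available match are left out of the result.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for male in males:
        pool[tuple(male[k] for k in keys)].append(male)
    for candidates in pool.values():
        rng.shuffle(candidates)  # random order within each matching cell

    pairs = []
    for female in females:
        candidates = pool.get(tuple(female[k] for k in keys))
        if candidates:
            pairs.append((female, candidates.pop()))
    return pairs


if __name__ == "__main__":
    base = {"pel": 3, "language": "English", "clinical": False,
            "race_ethnicity": "A", "age": 10}
    females = [dict(base, id="F1"), dict(base, id="F2", age=12)]
    males = [dict(base, id="M1"), dict(base, id="M2"), dict(base, id="M3", age=12)]
    for female, male in match_males_to_females(females, males):
        print(female["id"], "matched with", male["id"])
```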

The paired samples of males and females were then compared for significant differences in mean scores. Results of the ANOVAs and descriptive statistics for each scale are presented in Tables 10.5 to 10.7. When comparing ratings of male and female youth, the Parent results showed no statistically significant effects on any scale. Cohen’s d effect sizes, which capture the size of the difference between group means, were negligible (absolute values of Cohen’s d ranging from 0.00 to 0.13). For the Teacher results, statistically significant effects were observed for all scales except Depressed Mood and Anxious Thoughts. For scales with statistically significant effects, ratings of males resulted in slightly higher scores than ratings of females, and the effect sizes were negligible to small (absolute values of Cohen’s d ranging from 0.04 to 0.40), yielding scores up to approximately 4 points higher for male students. For the Self-Report results, the only statistically significant effect observed was for the Anxious Thoughts scale, wherein females yielded slightly higher scores than males; however, the effect size was small (Cohen’s d = −0.21).
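To make the link between Cohen’s d and score points concrete, the following Python sketch computes d from two groups of scores and converts a d value to points on a standardized score metric. Several things here are assumptions rather than statements from the chapter: the pooled-standard-deviation form of d, the male-minus-female sign convention, and a score metric with a standard deviation of 10 (as for T-scores). The scores in the example are invented for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d as (mean_a - mean_b) / pooled SD (ASSUMED formula variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# ASSUMPTION: scores are on a metric with SD = 10, as for T-scores.
ASSUMED_SCORE_SD = 10.0


def d_to_score_points(d, sd=ASSUMED_SCORE_SD):
    """Convert a standardized mean difference to points on the score metric."""
    return d * sd


if __name__ == "__main__":
    # Hypothetical scores, not data from the Conners 4 samples.
    males = [52, 58, 47, 61, 55]
    females = [50, 49, 53, 46, 52]
    d = cohens_d(males, females)  # positive when males score higher
    print(f"Cohen's d = {d:.2f}")
    # An effect of 0.40 SD units on an SD-10 metric is about 4 points.
    print(f"d = 0.40 is about {d_to_score_points(0.40):.1f} points")
```

Under these assumptions, the largest Teacher effect (an absolute d of 0.40) corresponds to the difference of approximately 4 points described above.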

Overall, these results support the absence of meaningful gender differences, and together with the MI and DTF results, there is evidence for equivalent measurement for males and females when using the Conners 4. Additionally, for Parent and Self-Report, scores for males and females were not meaningfully different; for Teacher, some differences were observed that might reflect differences in teachers’ perceptions of students that are independent of the test. Assessors may wish to make note of these differences when interpreting scores from teacher raters. Note that Gender Specific and Combined Gender normative scoring options are available; please see chapter 3, Scoring and Reports, for details.




