Gender
Gender, for the purposes of fairness-related analyses, is defined as the rated youth’s gender identity. Analyses were conducted to compare males and females on the Conners 4 in terms of MI, DTF, and mean group differences. The sample size for youth who are non-binary or indicated “Other” for gender (N = 6 for Parent, N = 1 for Teacher, and N = 3 for Self-Report) did not allow for meaningful testing. Therefore, when assessing invariance by gender, only males (N = 1,705 for Parent; N = 1,473 for Teacher; and N = 788 for Self-Report) and females (N = 1,539 for Parent; N = 1,404 for Teacher; and N = 796 for Self-Report) from the Total Sample were included.
Invariance between males and females for the Conners 4 was first explored via MI analyses (see Tables 10.1 to 10.3). There were no meaningful differences in the fit of progressively stringent models in the Parent form. While the Satorra-Bentler χ² was statistically significant for some comparisons (e.g., within the Content Scales, as seen in Table 10.1), the other fit statistics, such as CFI and SRMR, did not show a meaningful decline in model fit for any of the comparisons, so the assumption of invariance was not clearly violated. Similar results were found for the Teacher and Self-Report forms: some models had a significant Satorra-Bentler χ² test (e.g., the intercept models tested within the Content Scales); however, the absence of a meaningful decline in the other fit statistics suggests invariance was upheld. As more constraints were added throughout the process of testing MI, model fit did not change in a meaningful way, indicating that the factor structure, loadings, thresholds, and intercepts are invariant between males and females.
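The comparison of progressively constrained models can be illustrated with a minimal sketch. It uses the Parent Content Scales values from Table 10.1 and checks two things: that the difference in model df matches the df reported for each comparison, and that the change in CFI stays small. The cutoff of .01 for ∆CFI is an assumption (a commonly used convention), not a value stated in this section, and the function and variable names are hypothetical.

```python
# Fit statistics transcribed from Table 10.1 (Parent form, Content Scales).
# Each entry: (model name, model df, CFI).
PARENT_CONTENT_MODELS = [
    ("configural", 3274, 0.969),
    ("threshold", 3333, 0.969),
    ("loading", 3386, 0.969),
    ("intercept", 3439, 0.969),
]

# ASSUMPTION: a change in CFI of .01 or less is treated as negligible.
# This cutoff is a common convention, not a value taken from the manual.
DELTA_CFI_CUTOFF = 0.01


def compare_nested_models(models, cutoff=DELTA_CFI_CUTOFF):
    """Compare each model with the next, more constrained model in the list."""
    results = []
    for (name_a, df_a, cfi_a), (name_b, df_b, cfi_b) in zip(models, models[1:]):
        delta_cfi = round(abs(cfi_b - cfi_a), 3)
        results.append({
            "comparison": f"{name_a} v. {name_b}",
            "df_difference": df_b - df_a,  # should equal the comparison df in the table
            "delta_cfi": delta_cfi,
            "within_cutoff": delta_cfi <= cutoff,
        })
    return results


for row in compare_nested_models(PARENT_CONTENT_MODELS):
    print(row)
```

The df differences (59, 53, and 53) agree with the comparison df reported in Table 10.1, and every ∆CFI is .000, which is why the significant Satorra-Bentler tests alone are not read as evidence against invariance.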
Table 10.1. Measurement Invariance by Gender: Conners 4 Parent
Scales | Model | χ² | df | RMSEA | CFI | TLI | SRMR | Comparison | Satorra-Bentler χ² | df | ∆CFI
Content Scales | Configural | 12632.05*** | 3274 | .042 | .969 | .967 | .040 | — | | |
Content Scales | Threshold | 12720.44*** | 3333 | .042 | .969 | .968 | .040 | configural v. threshold | 78.53* | 59 | .000
Content Scales | Loading | 12699.78*** | 3386 | .041 | .969 | .969 | .040 | threshold v. loading | 85.00** | 53 | .000
Content Scales | Intercept | 12831.59*** | 3439 | .041 | .969 | .969 | .040 | loading v. intercept | 281.95*** | 53 | .000
Impairment & Functional Outcome Scales | Configural | 2364.94*** | 298 | .065 | .980 | .977 | .044 | — | | |
Impairment & Functional Outcome Scales | Threshold | 2417.30*** | 317 | .064 | .980 | .978 | .044 | configural v. threshold | 11.54 | 19 | .000
Impairment & Functional Outcome Scales | Loading | 2379.56*** | 333 | .062 | .981 | .980 | .044 | threshold v. loading | 20.28 | 16 | .001
Impairment & Functional Outcome Scales | Intercept | 2351.25*** | 349 | .059 | .981 | .981 | .044 | loading v. intercept | 33.19** | 16 | .000
DSM Oppositional Defiant Disorder Symptoms Scale | Configural | 949.39*** | 70 | .088 | .980 | .974 | .038 | — | | |
DSM Oppositional Defiant Disorder Symptoms Scale | Threshold | 985.38*** | 80 | .084 | .980 | .977 | .038 | configural v. threshold | 5.34 | 10 | .000
DSM Oppositional Defiant Disorder Symptoms Scale | Loading | 942.70*** | 89 | .077 | .981 | .981 | .038 | threshold v. loading | 8.05 | 9 | .001
DSM Oppositional Defiant Disorder Symptoms Scale | Intercept | 884.31*** | 98 | .070 | .982 | .984 | .038 | loading v. intercept | 12.07 | 9 | .001
DSM Conduct Disorder Symptoms Scale | Configural | 1219.26*** | 180 | .060 | .964 | .958 | .080 | — | | |
DSM Conduct Disorder Symptoms Scale | Threshold | 1259.88*** | 195 | .058 | .963 | .960 | .080 | configural v. threshold | 15.67 | 15 | .001
DSM Conduct Disorder Symptoms Scale | Loading | 1245.62*** | 209 | .055 | .964 | .964 | .080 | threshold v. loading | 16.17 | 14 | .001
DSM Conduct Disorder Symptoms Scale | Intercept | 1112.03*** | 223 | .050 | .969 | .971 | .080 | loading v. intercept | 21.14 | 14 | .005
Note. N = 1,705 males; N = 1,539 females. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001.
Table 10.2. Measurement Invariance by Gender: Conners 4 Teacher
Scales | Model | χ² | df | RMSEA | CFI | TLI | SRMR | Comparison | Satorra-Bentler χ² | df | ∆CFI
Content Scales | Configural | 14556.81*** | 3274 | .049 | .964 | .963 | .053 | — | | |
Content Scales | Threshold | 14643.71*** | 3333 | .049 | .964 | .963 | .053 | configural v. threshold | 62.02 | 59 | .000
Content Scales | Loading | 14578.55*** | 3386 | .048 | .964 | .964 | .053 | threshold v. loading | 67.82 | 53 | .000
Content Scales | Intercept | 14514.18*** | 3439 | .047 | .965 | .965 | .053 | loading v. intercept | 216.76*** | 53 | .001
Impairment & Functional Outcome Scales | Configural | 1457.23*** | 106 | .094 | .979 | .974 | .055 | — | | |
Impairment & Functional Outcome Scales | Threshold | 1537.77*** | 118 | .092 | .978 | .975 | .055 | configural v. threshold | 34.24** | 12 | .001
Impairment & Functional Outcome Scales | Loading | 1544.13*** | 128 | .088 | .978 | .977 | .055 | threshold v. loading | 15.51 | 10 | .000
Impairment & Functional Outcome Scales | Intercept | 1607.12*** | 138 | .086 | .977 | .978 | .055 | loading v. intercept | 71.53*** | 10 | .001
DSM Oppositional Defiant Disorder Symptoms Scale | Configural | 1047.94*** | 70 | .099 | .985 | .980 | .037 | — | | |
DSM Oppositional Defiant Disorder Symptoms Scale | Threshold | 1092.69*** | 80 | .094 | .984 | .982 | .037 | configural v. threshold | 12.76 | 10 | .001
DSM Oppositional Defiant Disorder Symptoms Scale | Loading | 1064.38*** | 89 | .087 | .985 | .985 | .037 | threshold v. loading | 5.69 | 9 | .001
DSM Oppositional Defiant Disorder Symptoms Scale | Intercept | 1067.34*** | 98 | .083 | .985 | .986 | .037 | loading v. intercept | 53.28*** | 9 | .000
DSM Conduct Disorder Symptoms Scale | Configural | 916.88*** | 130 | .065 | .963 | .955 | .111 | — | | |
DSM Conduct Disorder Symptoms Scale | Threshold | 949.97*** | 143 | .063 | .962 | .958 | .111 | configural v. threshold | 8.51 | 13 | .001
DSM Conduct Disorder Symptoms Scale | Loading | 921.39*** | 155 | .059 | .964 | .964 | .111 | threshold v. loading | 8.88 | 12 | .002
DSM Conduct Disorder Symptoms Scale | Intercept | 806.74*** | 167 | .052 | .970 | .972 | .112 | loading v. intercept | 16.66 | 12 | .006
Note. N = 1,473 males; N = 1,404 females. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001.
Table 10.3. Measurement Invariance by Gender: Conners 4 Self-Report
Scales | Model | χ² | df | RMSEA | CFI | TLI | SRMR | Comparison | Satorra-Bentler χ² | df | ∆CFI
Content Scales | Configural | 7142.83*** | 3390 | .037 | .956 | .954 | .051 | — | | |
Content Scales | Threshold | 7213.38*** | 3450 | .037 | .956 | .955 | .051 | configural v. threshold | 69.33 | 60 | .000
Content Scales | Loading | 7189.31*** | 3504 | .036 | .957 | .956 | .051 | threshold v. loading | 60.12 | 54 | .001
Content Scales | Intercept | 7266.83*** | 3558 | .036 | .956 | .957 | .051 | loading v. intercept | 165.33*** | 54 | .001
Impairment & Functional Outcome Scales | Configural | 1301.77*** | 298 | .065 | .945 | .937 | .065 | — | | |
Impairment & Functional Outcome Scales | Threshold | 1342.24*** | 317 | .064 | .944 | .940 | .065 | configural v. threshold | 22.48 | 19 | .001
Impairment & Functional Outcome Scales | Loading | 1315.20*** | 333 | .061 | .946 | .945 | .065 | threshold v. loading | 30.58* | 16 | .002
Impairment & Functional Outcome Scales | Intercept | 1404.73*** | 349 | .062 | .942 | .944 | .066 | loading v. intercept | 95.42*** | 16 | .004
DSM Oppositional Defiant Disorder Symptoms Scale | Configural | 615.73*** | 70 | .099 | .942 | .926 | .066 | — | | |
DSM Oppositional Defiant Disorder Symptoms Scale | Threshold | 645.42*** | 80 | .095 | .940 | .933 | .066 | configural v. threshold | 5.22 | 10 | .002
DSM Oppositional Defiant Disorder Symptoms Scale | Loading | 601.67*** | 89 | .085 | .946 | .945 | .067 | threshold v. loading | 11.09 | 9 | .006
DSM Oppositional Defiant Disorder Symptoms Scale | Intercept | 597.72*** | 98 | .080 | .947 | .951 | .067 | loading v. intercept | 26.87** | 9 | .001
DSM Conduct Disorder Symptoms Scale | Configural | 595.10*** | 180 | .054 | .946 | .937 | .091 | — | | |
DSM Conduct Disorder Symptoms Scale | Threshold | 616.51*** | 194 | .052 | .945 | .940 | .091 | configural v. threshold | 13.70 | 14 | .001
DSM Conduct Disorder Symptoms Scale | Loading | 601.42*** | 208 | .049 | .949 | .948 | .091 | threshold v. loading | 9.94 | 14 | .004
DSM Conduct Disorder Symptoms Scale | Intercept | 558.75*** | 222 | .044 | .956 | .958 | .094 | loading v. intercept | 21.34 | 14 | .007
Note. N = 788 males; N = 796 females. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001.
In addition to MI results, DTF analyses were conducted to explore the invariance of the Conners 4 for males and females through a different framework. An example of a DTF graph is provided in Figure 10.1. Test functioning curves for males and females are depicted, along with a shaded band that displays a 95% confidence interval. The two groups’ curves overlap almost completely, demonstrating a lack of difference for the Inattention/Executive Dysfunction scale. Similar results were found by gender for all scales across all forms.
The effect sizes of the DTF analyses for all scales, as measured by the ETSSD, are summarized in Table 10.4. There was a small effect of gender on the DSM Conduct Disorder Symptoms scale for Teacher (ETSSD = −.23). The value is negative, indicating females would score slightly higher than males when females and males actually had an equal standing in terms of Conduct Disorder symptoms as a construct. The test-level effect appears to result from an accumulation of negligible to small effects on the test items; together with the small size of this effect, this gives little support for a concerning lack of invariance on this scale. All other differences were trivial in nature across the Conners 4 scales and across all forms, demonstrating invariance by gender.
Figure 10.1. Differential Test Functioning by Gender: Inattention/Executive Dysfunction
Table 10.4. Differential Test Functioning Effect Sizes by Gender
Scale | Parent | Teacher | Self-Report
Content Scales
Inattention/Executive Dysfunction | .00 | .01 | −.02
Hyperactivity | .02 | .06 | .01
Impulsivity | .01 | .00 | −.03
Emotional Dysregulation | .01 | .01 | −.08
Depressed Mood | −.01 | .01 | −.02
Anxious Thoughts | −.02 | −.03 | −.03
Impairment & Functional Outcome Scales
Schoolwork | −.03 | −.05 | −.03
Peer Interactions | .01 | −.08 | .05
Family Life | −.03 | — | −.12
DSM Symptom Scales
Oppositional Defiant Disorder Symptoms | −.01 | .00 | .10
Conduct Disorder Symptoms | −.01 | −.23 | .01
Note. Values presented are expected test score standardized differences (ETSSD); guidelines for interpreting |ETSSD|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. Positive ETSSD values indicate that female youth received higher scores than male youth who had the same level of the construct being measured.
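The interpretation guidelines in the note above can be applied mechanically. The following is a minimal sketch that labels the magnitude of each Teacher ETSSD from Table 10.4 and lists the scales that are not negligible. The function name and the dictionary are hypothetical helpers for illustration. Only the magnitude is classified here, not the direction of the effect.

```python
def classify_etssd(etssd):
    """Label the magnitude of an ETSSD using the guidelines in the Table 10.4 note."""
    size = abs(etssd)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


# Teacher-form values transcribed from Table 10.4.
# Family Life is omitted because no Teacher value is reported for it.
TEACHER_ETSSD = {
    "Inattention/Executive Dysfunction": 0.01,
    "Hyperactivity": 0.06,
    "Impulsivity": 0.00,
    "Emotional Dysregulation": 0.01,
    "Depressed Mood": 0.01,
    "Anxious Thoughts": -0.03,
    "Schoolwork": -0.05,
    "Peer Interactions": -0.08,
    "Oppositional Defiant Disorder Symptoms": 0.00,
    "Conduct Disorder Symptoms": -0.23,
}

# Keep only the scales whose effect size is larger than negligible.
flagged = {
    scale: classify_etssd(value)
    for scale, value in TEACHER_ETSSD.items()
    if classify_etssd(value) != "negligible"
}
print(flagged)  # {'Conduct Disorder Symptoms': 'small'}
```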
Taken together, results from both MI and DTF analyses indicate psychometric equivalence between males and females for the Conners 4 scales, as there was no strong evidence for meaningful differences in latent structure or in test functioning between the two gender groups. Although DTF analyses flagged scores from the DSM Conduct Disorder Symptoms scale on the Teacher form, the effect was small and likely stems from the accumulation of trivial effects at the item level. The effect was not corroborated by the MI analyses, which investigated a similar question using a slightly different method.
To examine observed group differences by gender, a subsample of male youth was selected at random to match a sample of females from the Normative Samples. Youth were matched by PEL (for Parent and Self-Report only), language(s) spoken, clinical status, race/ethnicity, and age (see Table F.36 in appendix F for the demographic characteristics of the youth being rated and Table F.37 for demographic characteristics of the parent and teacher raters).
The paired samples of males and females were then compared for significant differences in mean scores. Results of the ANOVAs and descriptive statistics for each scale are presented in Tables 10.5 to 10.7. When comparing ratings of male and female youth, the Parent results showed no statistically significant effects on any scale, and Cohen’s d effect sizes, which capture the size of the difference between group means, were negligible (|d| ranging from 0.00 to 0.13). For the Teacher results, statistically significant effects were observed for all scales except Depressed Mood and Anxious Thoughts. For scales with statistically significant effects, ratings of males resulted in slightly higher scores than ratings of females, yielding scores up to approximately 4 points higher for male students; effect sizes across the Teacher scales were negligible to small (|d| ranging from 0.04 to 0.40). For the Self-Report results, the only statistically significant effect observed was for the Anxious Thoughts scale, wherein females yielded slightly higher scores than males; however, the effect size was small (Cohen’s d = −0.21).
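As an illustration of how a Cohen’s d value relates to the reported means and standard deviations, here is a minimal sketch. It assumes the two groups are equal in size (plausible for matched samples, but not stated as the formula used), so the pooled variance is the simple average of the two group variances. The function name is hypothetical, and the inputs are the Teacher Inattention/Executive Dysfunction values from Table 10.6.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference using a pooled standard deviation.

    ASSUMPTION: both groups have the same size, so the pooled variance
    is the unweighted average of the two group variances.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Teacher form, Inattention/Executive Dysfunction (Table 10.6):
# males M = 51.8, SD = 10.5; females M = 47.9, SD = 8.9.
d = cohens_d(51.8, 10.5, 47.9, 8.9)
print(round(d, 2))  # 0.4
```

A positive result means the first group (here, males) has the higher mean. Because the published means and standard deviations are rounded to one decimal place, values recomputed this way can differ from the tabled d in the last digit for some scales.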
Overall, these results support the absence of meaningful gender differences, and together with the MI and DTF results, there is evidence for equivalent measurement for males and females when using the Conners 4. Additionally, for Parent and Self-Report, scores for males and females were not meaningfully different; for Teacher, some differences were observed that might reflect differences in teachers’ perceptions of students that are independent of the test. Assessors may wish to make note of these differences when interpreting scores from teacher raters. Note that Gender Specific and Combined Gender normative scoring options are available; please see chapter 3, Scoring and Reports, for details.
Table 10.5. Group Differences by Gender (Male vs. Female): Conners 4 Parent
Scales | Statistic | Male | Female | Cohen’s d | F | p | η²
Content Scales
Inattention/Executive Dysfunction | M | 49.9 | 49.3 | 0.06 | 0.97 | .326 | .00
Inattention/Executive Dysfunction | SD | 10.1 | 9.9 | | | |
Hyperactivity | M | 49.7 | 49.6 | 0.00 | 0.00 | .950 | .00
Hyperactivity | SD | 10.0 | 9.7 | | | |
Impulsivity | M | 49.4 | 49.8 | −0.05 | 0.51 | .477 | .00
Impulsivity | SD | 10.0 | 10.1 | | | |
Emotional Dysregulation | M | 49.8 | 49.7 | 0.00 | 0.01 | .943 | .00
Emotional Dysregulation | SD | 10.4 | 9.6 | | | |
Depressed Mood | M | 49.6 | 49.8 | −0.02 | 0.10 | .753 | .00
Depressed Mood | SD | 9.8 | 9.8 | | | |
Anxious Thoughts | M | 49.1 | 50.1 | −0.11 | 2.87 | .090 | .00
Anxious Thoughts | SD | 9.2 | 9.8 | | | |
Impairment & Functional Outcome Scales
Schoolwork | M | 50.5 | 49.2 | 0.13 | 4.17 | .041 | .00
Schoolwork | SD | 10.6 | 9.8 | | | |
Peer Interactions | M | 50.3 | 49.5 | 0.08 | 1.43 | .231 | .00
Peer Interactions | SD | 10.5 | 9.4 | | | |
Family Life | M | 49.8 | 49.8 | 0.00 | 0.01 | .940 | .00
Family Life | SD | 10.1 | 9.9 | | | |
DSM Symptom Scales
Oppositional Defiant Disorder Symptoms | M | 50.2 | 49.4 | 0.08 | 1.63 | .202 | .00
Oppositional Defiant Disorder Symptoms | SD | 10.6 | 9.5 | | | |
Conduct Disorder Symptoms | M | 49.9 | 50.0 | −0.01 | 0.03 | .862 | .00
Conduct Disorder Symptoms | SD | 10.2 | 10.6 | | | |
Note. Guidelines for interpreting η²: negligible effect size < .01; small effect size = .01 to .05; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen’s |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen’s d value indicates that ratings of males resulted in higher scores than ratings of females.
Table 10.6. Group Differences by Gender (Male vs. Female): Conners 4 Teacher
Scale | Statistic | Male | Female | Cohen’s d | F | p | η²
Content Scales
Inattention/Executive Dysfunction | M | 51.8 | 47.9 | 0.40 | 52.53 | < .001 | .04
Inattention/Executive Dysfunction | SD | 10.5 | 8.9 | | | |
Hyperactivity | M | 51.8 | 47.9 | 0.40 | 50.76 | < .001 | .04
Hyperactivity | SD | 10.7 | 8.8 | | | |
Impulsivity | M | 51.4 | 48.5 | 0.29 | 26.70 | < .001 | .02
Impulsivity | SD | 10.5 | 9.3 | | | |
Emotional Dysregulation | M | 50.7 | 49.1 | 0.16 | 7.79 | .005 | .01
Emotional Dysregulation | SD | 10.8 | 9.2 | | | |
Depressed Mood | M | 50.5 | 49.5 | 0.10 | 3.14 | .077 | .00
Depressed Mood | SD | 10.6 | 9.6 | | | |
Anxious Thoughts | M | 49.8 | 50.2 | −0.04 | 0.59 | .443 | .00
Anxious Thoughts | SD | 10.0 | 10.0 | | | |
Impairment & Functional Outcome Scales
Schoolwork | M | 51.9 | 47.9 | 0.40 | 51.75 | < .001 | .04
Schoolwork | SD | 10.7 | 8.9 | | | |
Peer Interactions | M | 50.9 | 48.7 | 0.22 | 15.47 | < .001 | .01
Peer Interactions | SD | 10.6 | 9.2 | | | |
DSM Symptom Scales
Oppositional Defiant Disorder Symptoms | M | 51.2 | 48.6 | 0.26 | 22.21 | < .001 | .02
Oppositional Defiant Disorder Symptoms | SD | 11.0 | 9.0 | | | |
Conduct Disorder Symptoms | M | 51.0 | 49.1 | 0.18 | 10.42 | .001 | .01
Conduct Disorder Symptoms | SD | 11.6 | 9.0 | | | |
Note. Guidelines for interpreting η²: negligible effect size < .01; small effect size = .01 to .05; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen’s |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen’s d value indicates that ratings of males resulted in higher scores than ratings of females.
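The two effect size columns in the table above are closely related. As a hedged sketch, assume a one-way ANOVA with two equal-sized groups of n youth each (the matched-sample size n is not given in this section, so it is left symbolic) and Cohen’s d computed with a pooled standard deviation. Under those assumptions, η² follows from d:

```latex
\begin{align*}
F &= t^2 = \frac{n\,d^2}{2}, \qquad df_{\text{error}} = 2n - 2 \\
\eta^2 &= \frac{F}{F + df_{\text{error}}}
        = \frac{d^2}{d^2 + 4\,(n-1)/n}
        \approx \frac{d^2}{d^2 + 4} \quad \text{for large } n \\[4pt]
d = 0.40:&\quad \eta^2 \approx \frac{0.16}{4.16} \approx .04 \\
d = 0.29:&\quad \eta^2 \approx \frac{0.0841}{4.0841} \approx .02 \\
d = 0.22:&\quad \eta^2 \approx \frac{0.0484}{4.0484} \approx .01
\end{align*}
```

These approximations agree with the η² values reported for the Teacher scales with d = 0.40, 0.29, and 0.22, which is consistent with both statistics describing the same small group differences.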
Table 10.7. Group Differences by Gender (Male vs. Female): Conners 4 Self-Report
Scale | Statistic | Male | Female | Cohen’s d | F | p | η²
Content Scales
Inattention/Executive Dysfunction | M | 50.9 | 49.5 | 0.15 | 3.46 | .063 | .01
Inattention/Executive Dysfunction | SD | 10.6 | 9.3 | | | |
Hyperactivity | M | 51.2 | 49.8 | 0.14 | 3.13 | .077 | .00
Hyperactivity | SD | 10.6 | 9.5 | | | |
Impulsivity | M | 50.7 | 50.0 | 0.06 | 0.60 | .438 | .00
Impulsivity | SD | 10.9 | 9.9 | | | |
Emotional Dysregulation | M | 50.5 | 49.9 | 0.05 | 0.48 | .488 | .00
Emotional Dysregulation | SD | 10.4 | 9.6 | | | |
Depressed Mood | M | 49.6 | 50.2 | −0.06 | 0.64 | .426 | .00
Depressed Mood | SD | 9.4 | 10.4 | | | |
Anxious Thoughts | M | 49.0 | 51.1 | −0.21 | 6.98 | .008 | .01
Anxious Thoughts | SD | 9.1 | 10.2 | | | |
Impairment & Functional Outcome Scales
Schoolwork | M | 50.9 | 49.2 | 0.17 | 4.80 | .029 | .01
Schoolwork | SD | 10.5 | 9.2 | | | |
Peer Interactions | M | 50.5 | 49.9 | 0.06 | 0.62 | .430 | .00
Peer Interactions | SD | 9.9 | 10.0 | | | |
Family Life | M | 50.2 | 50.2 | 0.00 | 0.00 | .974 | .00
Family Life | SD | 10.7 | 10.3 | | | |
DSM Symptom Scales
Oppositional Defiant Disorder Symptoms | M | 51.1 | 49.5 | 0.16 | 4.07 | .044 | .01
Oppositional Defiant Disorder Symptoms | SD | 11.0 | 9.3 | | | |
Conduct Disorder Symptoms | M | 51.2 | 49.5 | 0.16 | 4.02 | .045 | .01
Conduct Disorder Symptoms | SD | 12.2 | 9.5 | | | |
Note. Guidelines for interpreting η²: negligible effect size < .01; small effect size = .01 to .05; medium effect size = .06 to .13; large effect size ≥ .14. Guidelines for interpreting Cohen’s |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen’s d value indicates that males had higher scores than females.