CAARS 2 Manual Appendix N: Between-Subjects Measurement Invariance — Translation Study Analyses
In addition to the within-subject invariance approach outlined in Content and DSM Symptom Scales in chapter 13,
Translations, a between-subjects approach was taken to examine whether ratings from the French (Canada) or Spanish
(North American) language versions of the Conners Adult ADHD Rating Scale 2nd Edition (CAARS™ 2) differed from those
of a demographically matched sample drawn from the English Normative Sample (individuals were matched on gender, age,
race/ethnicity, and education level). Two analyses were conducted on the between-subjects samples: a multiple-group
confirmatory factor analysis (CFA) and an item response theory (IRT) analysis, specifically differential test
functioning (DTF). Details on the methodology for these analyses are covered in appendix M, Methods of Evaluating
Measurement Bias.
Results
Results of the between-subjects measurement invariance (MI) analyses for the French and Spanish translations are
presented in Table N.1 to Table N.4. Overall, the models were found to be invariant between the French and English
versions and between the Spanish and English versions of the forms, as evidenced by non-decreasing ΔCFI values and
nonsignificant Satorra-Bentler chi-square tests. As part of the modeling procedure, some steps required partial
invariance adjustments to yield nonsignificant chi-square tests, but the adjustments were infrequent and did not
compromise the overall comparability of the scales between the language versions (Dimitrov, 2010). The results are
consistent with the findings from the within-subject MI analysis reported in chapter 13, Translations, and provide
additional evidence for the validity of the CAARS 2 French (Canada) and Spanish (North American) translations as
parallel measures to the English version.
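As a rough illustration of the decision rule described above, the following Python sketch refers a chi-square difference statistic to a chi-square distribution with the corresponding degrees of freedom and checks whether it is nonsignificant. It assumes that the Satorra-Bentler values reported in the tables are already scaled difference statistics and that α = .05 is the cutoff; the function names `chi2_sf` and `is_invariant_step` are hypothetical and are not part of the CAARS 2 analyses. The two example steps use the weak and strong rows for the Self-Report Inattention/Executive Dysfunction scale in Table N.1.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting point of the regularized upper incomplete gamma Q(a, y):
    # Q(1, y) = exp(-y) for even df, Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def is_invariant_step(delta_chi2: float, delta_df: int, alpha: float = 0.05) -> bool:
    """True when the difference test is nonsignificant (assumed cutoff: alpha)."""
    return chi2_sf(delta_chi2, delta_df) > alpha


# Weak and strong steps, Self-Report Inattention/Executive Dysfunction (Table N.1).
steps = {"Weak": (22.72, 29), "Strong": (36.25, 29)}
for name, (stat, ddf) in steps.items():
    print(name, round(chi2_sf(stat, ddf), 3), is_invariant_step(stat, ddf))
```

The survival function is written with the standard library only, so the sketch does not depend on a statistics package; a dedicated SEM package would normally report these p-values directly.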
Table N.1. Between-Subjects Measurement Invariance by Language Version (French vs. English): CAARS 2 Self-Report
Scale | Invariance Model | χ2 | df | RMSEA | CFI | TLI | SRMR | Satorra-Bentler χ2 | Satorra-Bentler df | ΔCFI
Inattention/Executive Dysfunction | Configural | 1733.34*** | 810 | .066 | .954 | .951 | .070 | -- | -- | --
Inattention/Executive Dysfunction | Weak | 1756.82*** | 839 | .065 | .955 | .953 | .070 | 22.72 | 29 | .002
Inattention/Executive Dysfunction | Strong | 1755.77*** | 868 | .063 | .956 | .956 | .071 | 36.25 | 29 | .003
Inattention/Executive Dysfunction | Strict | 1760.44*** | 891 | .061 | .957 | .958 | .071 | 33.90 | 23 | .002
Hyperactivity | Configural | 568.80*** | 130 | .114 | .937 | .925 | .087 | -- | -- | --
Hyperactivity | Weak | 583.69*** | 143 | .109 | .937 | .931 | .087 | 9.01 | 13 | .006
Hyperactivity | Strong | 570.87*** | 155 | .101 | .941 | .940 | .087 | 15.73 | 12 | .009
Hyperactivity | Strict | 563.76*** | 162 | .098 | .943 | .945 | .087 | 11.10 | 7 | .005
Impulsivity | Configural | 454.06*** | 130 | .098 | .924 | .909 | .079 | -- | -- | --
Impulsivity | Weak | 475.31*** | 143 | .094 | .922 | .915 | .079 | 18.18 | 13 | .006
Impulsivity | Strong | 465.68*** | 155 | .088 | .927 | .927 | .079 | 17.98 | 12 | .012
Impulsivity | Strict | 468.31*** | 167 | .083 | .930 | .934 | .081 | 21.00 | 12 | .012
Emotional Dysregulation | Configural | 174.254*** | 54 | .092 | .986 | .981 | .045 | -- | -- | --
Emotional Dysregulation | Weak | 180.742*** | 63 | .085 | .986 | .984 | .045 | 5.37 | 9 | .003
Emotional Dysregulation | Strong | 178.013*** | 71 | .076 | .987 | .987 | .045 | 8.29 | 8 | .003
Emotional Dysregulation | Strict | 186.31*** | 77 | .074 | .987 | .988 | .045 | 11.51 | 6 | .001
Negative Self-Concept | Configural | 141.43*** | 28 | .125 | .981 | .971 | .044 | -- | -- | --
Negative Self-Concept | Weak | 154.51*** | 35 | .114 | .980 | .976 | .044 | 7.87 | 7 | .005
Negative Self-Concept | Strong | 145.50*** | 41 | .099 | .982 | .982 | .045 | 3.44 | 6 | .006
Negative Self-Concept | Strict | 150.33*** | 44 | .096 | .982 | .983 | .045 | 6.12 | 3 | .001
Note. N = 274 French version; N = 274 English version. RMSEA = Root mean square error of approximation; CFI =
Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ΔCFI = change in CFI.
*p < .05, **p < .01, ***p < .001. Exploration of partial invariance models for the Inattention/Executive Dysfunction,
Hyperactivity, Emotional Dysregulation, and Negative Self-Concept scales revealed that six, five, three, and two
intercepts, respectively, had to be released for the strict invariance hypothesis to hold.
Table N.2. Between-Subjects Measurement Invariance by Language Version (French vs. English): CAARS 2 Observer
Scale | Invariance Model | χ2 | df | RMSEA | CFI | TLI | SRMR | Satorra-Bentler χ2 | Satorra-Bentler df | ΔCFI
Inattention/Executive Dysfunction | Configural | 1255.13*** | 810 | .054 | .975 | .974 | .065 | -- | -- | --
Inattention/Executive Dysfunction | Weak | 1279.74*** | 839 | .052 | .976 | .975 | .065 | 29.14 | 29 | .001
Inattention/Executive Dysfunction | Strong | 1281.81*** | 868 | .050 | .977 | .977 | .065 | 22.98 | 29 | .001
Inattention/Executive Dysfunction | Strict | 1301.93*** | 893 | .049 | .977 | .978 | .065 | 37.15 | 25 | .000
Hyperactivity | Configural | 510.17*** | 130 | .124 | .933 | .920 | .101 | -- | -- | --
Hyperactivity | Weak | 528.25*** | 142 | .119 | .932 | .925 | .101 | 10.45 | 12 | -.001
Hyperactivity | Strong | 532.32*** | 153 | .114 | .933 | .932 | .101 | 14.44 | 11 | .001
Hyperactivity | Strict | 523.63*** | 165 | .107 | .937 | .940 | .101 | 17.12 | 12 | .004
Impulsivity | Configural | 393.54*** | 130 | .103 | .955 | .946 | .074 | -- | -- | --
Impulsivity | Weak | 414.26*** | 143 | .100 | .954 | .949 | .074 | 18.93 | 13 | -.001
Impulsivity | Strong | 410.87*** | 155 | .093 | .956 | .956 | .075 | 13.13 | 12 | .002
Impulsivity | Strict | 412.59*** | 165 | .089 | .958 | .960 | .075 | 14.50 | 10 | .002
Emotional Dysregulation | Configural | 255.60*** | 54 | .140 | .972 | .963 | .062 | -- | -- | --
Emotional Dysregulation | Weak | 265.98*** | 63 | .130 | .972 | .968 | .062 | 6.77 | 9 | .000
Emotional Dysregulation | Strong | 261.91*** | 71 | .119 | .974 | .973 | .062 | 8.62 | 8 | .002
Emotional Dysregulation | Strict | 262.41*** | 78 | .111 | .974 | .976 | .062 | 10.74 | 7 | .000
Negative Self-Concept | Configural | 68.69*** | 28 | .087 | .986 | .978 | .048 | -- | -- | --
Negative Self-Concept | Weak | 74.54*** | 35 | .077 | .986 | .983 | .048 | 4.97 | 7 | .000
Negative Self-Concept | Strong | 83.49*** | 40 | .075 | .985 | .984 | .048 | 9.73 | 5 | -.001
Negative Self-Concept | Strict | 91.69*** | 46 | .072 | .984 | .985 | .048 | 10.67 | 6 | -.001
Note. N = 195 French version; N = 195 English version. RMSEA = Root mean square error of approximation; CFI =
Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ΔCFI = change in CFI.
*p < .05, **p < .01, ***p < .001. Exploration of partial invariance models for the Hyperactivity and Negative
Self-Concept scales revealed that one factor loading in each scale had to be released to meet the strong invariance
hypothesis. To meet the strict invariance hypothesis, four item intercepts had to be released for the
Inattention/Executive Dysfunction scale, two for the Impulsivity scale, and one for the Emotional Dysregulation
scale.
Table N.3. Between-Subjects Measurement Invariance by Language Version (Spanish vs. English): CAARS 2 Self-Report
Scale | Invariance Model | χ2 | df | RMSEA | CFI | TLI | SRMR | Satorra-Bentler χ2 | Satorra-Bentler df | ΔCFI
Inattention/Executive Dysfunction | Configural | 1441.51*** | 810 | .053 | .974 | .972 | .056 | -- | -- | --
Inattention/Executive Dysfunction | Weak | 1465.66*** | 840 | .051 | .974 | .973 | .056 | 30.15 | 30 | .000
Inattention/Executive Dysfunction | Strong | 1468.67*** | 869 | .050 | .975 | .975 | .056 | 33.77 | 29 | .001
Inattention/Executive Dysfunction | Strict | 1475.00*** | 896 | .048 | .976 | .977 | .056 | 36.47 | 27 | .001
Hyperactivity | Configural | 501.77*** | 130 | .101 | .944 | .933 | .077 | -- | -- | --
Hyperactivity | Weak | 514.84*** | 141 | .097 | .944 | .938 | .077 | 11.87 | 11 | .000
Hyperactivity | Strong | 510.05*** | 153 | .091 | .946 | .945 | .077 | 19.20 | 12 | .002
Hyperactivity | Strict | 499.11*** | 163 | .086 | .949 | .951 | .078 | 14.93 | 10 | .003
Impulsivity | Configural | 363.97*** | 130 | .080 | .956 | .947 | .066 | -- | -- | --
Impulsivity | Weak | 374.31*** | 143 | .076 | .956 | .952 | .066 | 6.97 | 13 | .000
Impulsivity | Strong | 370.15*** | 155 | .070 | .959 | .959 | .067 | 15.56 | 12 | .003
Impulsivity | Strict | 368.81*** | 167 | .066 | .962 | .964 | .068 | 16.43 | 12 | .003
Emotional Dysregulation | Configural | 218.84*** | 54 | .104 | .969 | .959 | .054 | -- | -- | --
Emotional Dysregulation | Weak | 229.38*** | 63 | .097 | .969 | .965 | .054 | 6.90 | 9 | .000
Emotional Dysregulation | Strong | 223.60*** | 71 | .087 | .972 | .971 | .054 | 10.49 | 8 | .003
Emotional Dysregulation | Strict | 223.39*** | 79 | .081 | .973 | .975 | .055 | 12.17 | 8 | .001
Negative Self-Concept | Configural | 94.63*** | 28 | .092 | .986 | .979 | .046 | -- | -- | --
Negative Self-Concept | Weak | 103.54*** | 35 | .083 | .986 | .983 | .046 | 6.76 | 7 | .000
Negative Self-Concept | Strong | 95.02*** | 41 | .068 | .989 | .988 | .047 | 5.00 | 6 | .003
Negative Self-Concept | Strict | 95.84*** | 45 | .063 | .989 | .990 | .047 | 4.64 | 4 | .000
Note. N = 283 Spanish version; N = 283 English version. RMSEA = Root mean square error of approximation; CFI =
Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ΔCFI = change in CFI.
*p < .05, **p < .01, ***p < .001. Exploration of partial invariance models found that two item intercepts had to be
released for the Inattention/Executive Dysfunction, Hyperactivity, and Negative Self-Concept scales for the strict
invariance model to hold.
Table N.4. Between-Subjects Measurement Invariance by Language Version (Spanish vs. English): CAARS 2 Observer
Scale | Invariance Model | χ2 | df | RMSEA | CFI | TLI | SRMR | Satorra-Bentler χ2 | Satorra-Bentler df | ΔCFI
Inattention/Executive Dysfunction | Configural | 1366.89*** | 810 | .055 | .980 | .979 | .056 | -- | -- | --
Inattention/Executive Dysfunction | Weak | 1394.48*** | 840 | .054 | .980 | .980 | .056 | 31.37 | 30 | .000
Inattention/Executive Dysfunction | Strong | 1395.46*** | 869 | .052 | .982 | .981 | .056 | 22.75 | 29 | .002
Inattention/Executive Dysfunction | Strict | 1407.05*** | 896 | .050 | .982 | .983 | .056 | 37.69 | 27 | .000
Hyperactivity | Configural | 445.78*** | 130 | .103 | .965 | .958 | .073 | -- | -- | --
Hyperactivity | Weak | 461.34*** | 143 | .099 | .964 | .961 | .073 | 8.59 | 13 | -.001
Hyperactivity | Strong | 455.62*** | 154 | .093 | .966 | .966 | .074 | 7.18 | 11 | .002
Hyperactivity | Strict | 450.05*** | 165 | .087 | .968 | .970 | .074 | 16.51 | 11 | .002
Impulsivity | Configural | 225.72*** | 130 | .057 | .989 | .987 | .047 | -- | -- | --
Impulsivity | Weak | 239.64*** | 143 | .055 | .989 | .988 | .047 | 12.88 | 13 | .000
Impulsivity | Strong | 246.99*** | 154 | .052 | .990 | .989 | .047 | 12.40 | 11 | .001
Impulsivity | Strict | 258.85*** | 165 | .050 | .989 | .990 | .048 | 16.46 | 11 | -.001
Emotional Dysregulation | Configural | 219.12*** | 54 | .116 | .982 | .977 | .049 | -- | -- | --
Emotional Dysregulation | Weak | 231.52*** | 63 | .109 | .982 | .979 | .049 | 9.30 | 9 | .000
Emotional Dysregulation | Strong | 233.83*** | 71 | .101 | .983 | .982 | .049 | 9.28 | 8 | .001
Emotional Dysregulation | Strict | 239.49*** | 77 | .096 | .983 | .984 | .049 | 10.95 | 6 | .000
Negative Self-Concept | Configural | 87.98*** | 28 | .097 | .972 | .959 | .059 | -- | -- | --
Negative Self-Concept | Weak | 94.01*** | 35 | .086 | .973 | .967 | .059 | 3.34 | 7 | .001
Negative Self-Concept | Strong | 96.59*** | 38 | .082 | .973 | .970 | .059 | 4.33 | 3 | .000
Negative Self-Concept | Strict | 105.44*** | 43 | .080 | .971 | .972 | .060 | 10.21 | 5 | -.002
Note. N = 230 Spanish version; N = 230 English version. RMSEA = Root mean square error of approximation; CFI =
Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ΔCFI = change in CFI.
*p < .05, **p < .01, ***p < .001. Exploration of partial invariance models found that one loading had to be released
for the Hyperactivity and Impulsivity scales, and three loadings had to be released for the Negative Self-Concept
scale, for the strong invariance model to hold. Further, one item intercept had to be released for the
Hyperactivity, Impulsivity, and Negative Self-Concept scales, and two item intercepts for the
Inattention/Executive Dysfunction and Emotional Dysregulation scales, for strict invariance to hold.
In addition to the MI analyses, DTF was examined for the Content Scales. Across both the French and English and the
Spanish and English comparisons, test characteristic curves for the Content Scales were found to be statistically
equivalent, as evidenced by overlapping 95% confidence intervals (see Figure N.1 and Figure N.2 for examples
featuring the Inattention/Executive Dysfunction scale). More specifically, considerable overlap occurred in the area
approaching and exceeding 1.5 standard deviations above mean theta (see Test Information in chapter 8, Reliability,
for more information on how these graphs are interpreted). This pattern of results indicates that the Content Scales
function equivalently across the French and English and the Spanish and English language versions, a finding further
supported by the negligible effect sizes presented in Table N.5 and Table N.6.
Table N.5. Differential Test Functioning by Language Version (French vs. English)
Scale | Self-Report | Observer
Inattention/Executive Dysfunction | -0.03 | -0.05
Hyperactivity | -0.11 | -0.02
Impulsivity | -0.03 | -0.03
Emotional Dysregulation | 0.00 | -0.11
Negative Self-Concept | 0.02 | 0.10
Note. Values presented are expected test score standardized differences (ETSSD). Guidelines for interpretation: small
effect size, |ETSSD| ≥ 0.20; medium effect size, |ETSSD| ≥ 0.50; large effect size, |ETSSD| ≥ 0.80. Positive ETSSD
values indicate that, among individuals with equal levels of the construct being measured, those who took the French
translation as part of the translation study sample scored higher than those who took the English version.
Table N.6. Differential Test Functioning by Language Version (Spanish vs. English)
Scale | Self-Report | Observer
Inattention/Executive Dysfunction | -0.02 | 0.00
Hyperactivity | -0.01 | -0.03
Impulsivity | -0.07 | 0.01
Emotional Dysregulation | -0.11 | 0.00
Negative Self-Concept | -0.05 | 0.11
Note. Values presented are expected test score standardized differences (ETSSD). Guidelines for interpretation: small
effect size, |ETSSD| ≥ 0.20; medium effect size, |ETSSD| ≥ 0.50; large effect size, |ETSSD| ≥ 0.80. Positive ETSSD
values indicate that, among individuals with equal levels of the construct being measured, those who took the Spanish
translation as part of the translation study sample scored higher than those who took the English version.
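The interpretation guidelines in the notes to Table N.5 and Table N.6 can be applied mechanically, as in the minimal Python sketch below. The thresholds (0.20, 0.50, 0.80) and the ETSSD values are taken from those tables; the function name `classify_etssd`, the dictionary layout, and the label "negligible" for absolute values below 0.20 are assumptions made for illustration. Values in each list follow the table row order (Inattention/Executive Dysfunction, Hyperactivity, Impulsivity, Emotional Dysregulation, Negative Self-Concept).

```python
def classify_etssd(value: float) -> str:
    """Label an ETSSD value using the absolute-value guidelines from the table notes."""
    size = abs(value)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    # Assumed label: the guidelines do not name the range below 0.20.
    return "negligible"


# ETSSD values transcribed from Table N.5 (French) and Table N.6 (Spanish).
etssd = {
    ("French", "Self-Report"): [-0.03, -0.11, -0.03, 0.00, 0.02],
    ("French", "Observer"): [-0.05, -0.02, -0.03, -0.11, 0.10],
    ("Spanish", "Self-Report"): [-0.02, -0.01, -0.07, -0.11, -0.05],
    ("Spanish", "Observer"): [0.00, -0.03, 0.01, 0.00, 0.11],
}

labels = {key: [classify_etssd(v) for v in values] for key, values in etssd.items()}
largest = max(abs(v) for values in etssd.values() for v in values)

for key, row in labels.items():
    print(key, row)
print("largest |ETSSD|:", largest)
```

Under these guidelines, every value in both tables falls below the threshold for a small effect, which is the sense in which the effect sizes are described as negligible.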
Both the multiple-group CFA and DTF approaches support the invariance of the factor structure for the Content Scales
on the CAARS 2 French and Spanish versions. This between-subjects approach adds support to the within-subject
approach presented in chapter 13, Translations.