Chapter 13: Translations

Manual

CAARS 2 Manual

Chapter 13: French (Canada) Translation Study

Sample

A total of 307 Canadians completed the CAARS 2 Self-Report, and 215 Canadians completed the CAARS 2 Observer. Data were cleaned prior to analysis, based on data quality metrics, including indicators of careless responding (e.g., random responding) and response acquiescence (e.g., an irregular number of consecutive item responses). Data for 33 individuals were removed from the Self-Report sample and those for 20 individuals were removed from the Observer sample. The final samples included 274 individuals for Self-Report and 195 for Observer. Refer to appendix J for the demographic characteristics for the individuals being rated and for the Observers.

Reliability

The reliability of the French version of the CAARS 2 was assessed via internal consistency estimates and test information functions (see Internal Consistency and Standard Error of Measurement and Test Information in chapter 8, Reliability, for more details). Coefficients alpha and omega were used as estimates of internal consistency, and information functions were generated using the mirt package in R (Chalmers, 2012).

Internal Consistency

Table 13.1 presents alpha and omega coefficients for the French version of the CAARS 2. Internal consistency estimates were excellent for both the Self-Report and Observer forms; the median coefficient omega value across scales was .94 for the Self-Report (ranging from .92 to .97) and .96 for the Observer (ranging from .89 to .98). These estimates are comparable to those found for the English version from the Normative Samples (see Tables 8.1a, 8.1b, 8.2a, and 8.2b in chapter 8, Reliability), as well as to values derived from the English version completed by individuals in the current sample (estimates are not reported as all scales were within .01 of each other across language versions). Overall, results show that the French version of the CAARS 2 is internally consistent and comparable to the English version.

Table 13.1. Internal Consistency: French Translation of the CAARS 2

Scale		Self-Report		Observer
Scale		Alpha	Omega	Alpha	Omega
Content Scales	Inattention/Executive Dysfunction	.97	.97	.98	.98
	Hyperactivity	.94	.94	.96	.96
	Impulsivity	.92	.92	.95	.95
	Emotional Dysregulation	.94	.94	.95	.96
	Negative Self-Concept	.91	.92	.88	.89
DSM Symptom Scales	ADHD Inattentive Symptoms	.94	.94	.95	.95
	ADHD Hyperactive/Impulsive Symptoms	.92	.93	.95	.95
	ADHD Total Symptoms	.96	.96	.97	.97

Note. Self-Report N = 274 and Observer N = 195.
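
For readers who want to see how a coefficient alpha of the kind reported in Table 13.1 is obtained, the following is a minimal Python sketch of the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The study itself used R; the function name and the toy ratings are hypothetical and are not CAARS 2 data. Coefficient omega requires a fitted factor model and is not shown.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 0-3 ratings from five respondents on four items (not CAARS 2 data)
toy = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
print(round(cronbach_alpha(toy), 2))
```

Alpha rises toward 1 as items covary more strongly relative to their individual variances, which is why long scales with homogeneous content tend to show values in the .90s.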

Test Information

Figure 13.1 shows the test information functions for the CAARS 2 by Content and DSM Symptom Scale. As can be seen in the figure, the French versions of the scales display high information across the relevant range of the ability scale (approaching and exceeding 1.5 standard deviations above the mean). Further, the peaks of the information curves are broad with a wide area beneath them, implying the precision of measurement remains consistent across the relevant range of the scales. The peaks of the information functions are also equal to or greater than a value of 10, indicating very high precision and excellent reliability for the French version of the CAARS 2. These information functions are similar in overall shape and magnitude to the English version (see Test Information in chapter 8, Reliability).

Figure 13.1. Test Information Functions by Scale: CAARS 2 French Translation

Panels: (a) Inattention/Executive Dysfunction; (b) Hyperactivity; (c) Impulsivity; (d) Emotional Dysregulation; (e) Negative Self-Concept; (f) DSM ADHD Inattentive Symptoms; (g) DSM Hyperactive/Impulsive Symptoms; (h) DSM ADHD Total Symptoms.
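
The link between the height of an information curve and measurement precision can be made concrete with a small calculation. In item response theory, the conditional standard error of measurement is 1/sqrt(information); when the latent trait is scaled to unit variance, 1 - 1/information is one commonly used approximation of reliability at that trait level. The Python sketch below applies both to an information value of 10, the benchmark mentioned in the Test Information text. The function names are illustrative, and the reliability conversion is an assumption about scaling rather than something stated in this manual.

```python
import math

def conditional_sem(information):
    """Standard error of the trait estimate at a given level of test information."""
    return 1.0 / math.sqrt(information)

def approx_reliability(information):
    """Approximate reliability, assuming the latent trait has unit variance."""
    return 1.0 - 1.0 / information

for info in (5, 10, 20):
    print(info, round(conditional_sem(info), 3), round(approx_reliability(info), 2))
```

Under these assumptions, an information value of 10 corresponds to a standard error of about 0.32 trait units and a reliability of about .90.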

Validity

An important goal of this study was to ensure that the CAARS 2 item content was parallel across the English and French versions of the forms. As described in Creation of Translated Forms, a cultural translation was used to ensure the meaning of each item was captured in the translation, rather than just the literal wording. The expectation was that the French translated items should perform similarly to their English counterparts. Evidence of validity was explored for the Content Scales in terms of consistency in the factor structure across languages (tested via measurement invariance [MI] methods). Given the within-subjects design (in which individuals completed the CAARS 2 in both languages), a within-subjects MI approach was taken (Liu et al., 2017); results from a between-subjects approach involving the Normative Samples are also available (see appendix N). After invariance was established, correlations were computed and mean scale scores were compared for both the Content and DSM Symptom Scales to examine whether scores were consistent between the French and English versions. The CAARS 2–ADHD Index, Associated Clinical Concern Items, Impairment & Functional Outcome Items, and Validity Scales were also analyzed for differences between the French and English versions. Results for all analyses are reported in the following sections.

Content and DSM Symptom Scales

The factor structure of the Content Scales was compared via a within-subjects MI approach (note that this differs from the methodology described in appendix M), in accordance with recommendations from Liu et al. (2017). Conducting MI with this approach involves testing the following four steps, in order:

Configural Invariance: Language versions have the same factor structure.
Weak Invariance: Language versions have the same factor structure and factor loadings.
Strong Invariance: Language versions have the same factor structure, factor loadings, and item thresholds.
Strict Invariance: Language versions have the same factor structure, factor loadings, item thresholds, and item residuals.

Results of the MI investigations are presented in Table 13.2 and Table 13.3. Overall, the CAARS 2 Content Scales were found to be invariant across the French and English versions, as evidenced by non-decreasing CFI values and nonsignificant Satorra-Bentler chi-square tests (Satorra & Bentler, 2001). As part of the modeling procedure, some steps required partial invariance adjustments to result in nonsignificant chi-square tests, but the adjustments were infrequent and did not compromise the overall comparability of the scales between the two languages (Dimitrov, 2010). The results provide strong evidence for the validity of the CAARS 2 French translation as a parallel measure to the English version.
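
The decision rule described above can be sketched in a few lines of Python. The example takes the Hyperactivity rows of the Self-Report results (Table 13.2) and checks two things at each step: that CFI does not decrease relative to the previous model, and that the Satorra-Bentler difference test is nonsignificant. The chi-square p-value is computed with a closed-form expression that is valid for integer degrees of freedom. The .05 threshold, the function names, and the data layout are assumptions made for illustration; the manual does not state the alpha level used for these difference tests.

```python
import math

def chi2_sf(x, df):
    """Upper-tail p-value of a chi-square statistic for integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum over i < df/2 of h^i / i!
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(h)) plus a finite series of half-integer gamma terms
    series = sum(h ** (j - 0.5) / math.gamma(j + 0.5)
                 for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series

# (model, CFI, Satorra-Bentler chi-square, difference df): Table 13.2, Hyperactivity
steps = [
    ("Configural", 0.965, None, None),
    ("Weak", 0.966, 7.81, 12),
    ("Strong", 0.968, 31.82, 25),
    ("Strict", 0.970, 20.12, 12),
]

def check_invariance(steps, alpha=0.05):
    """Return (model, p, holds) for each step after the configural model."""
    results = []
    previous_cfi = steps[0][1]
    for name, cfi, sb_chi2, sb_df in steps[1:]:
        p = chi2_sf(sb_chi2, sb_df)
        # Invariance is retained if fit does not worsen and the test is nonsignificant
        holds = cfi >= previous_cfi and p > alpha
        results.append((name, round(p, 3), holds))
        previous_cfi = cfi
    return results

for row in check_invariance(steps):
    print(row)
```

Each model is compared only with the one immediately before it, which reflects the nested, ordered nature of the four invariance steps.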

Table 13.2. Within-Subjects Measurement Invariance by Language Version (French vs. English): CAARS 2 Self-Report

Scale	Invariance Model	χ²	df	RMSEA	CFI	TLI	SRMR	Satorra-Bentler χ²	Δdf	ΔCFI
Inattention/Executive Dysfunction	Configural	2666.97***	1679	.046	.960	.957	.070	--
	Weak	2689.54***	1708	.046	.961	.958	.070	41.63	29	.000
	Strong	2733.17***	1766	.045	.961	.960	.070	69.16	58	.000
	Strict	2622.17***	1794	.041	.966	.967	.072	35.25	28	.005
Hyperactivity	Configural	633.34***	285	.067	.965	.958	.070	--
	Weak	632.81***	297	.064	.966	.961	.070	7.81	12	.001
	Strong	642.28***	322	.060	.968	.966	.070	31.82	25	.002
	Strict	623.12***	334	.056	.970	.970	.073	20.12	12	.002
Impulsivity	Configural	688.05***	285	.072	.948	.938	.073	--
	Weak	686.96***	296	.070	.950	.942	.073	7.66	11	.002
	Strong	698.19***	322	.065	.952	.949	.073	29.12	26	.002
	Strict	676.44***	335	.061	.956	.956	.077	20.73	13	.004
Emotional Dysregulation	Configural	283.10***	125	.068	.983	.978	.046	--
	Weak	288.52***	133	.065	.984	.980	.046	10.10	8	.001
	Strong	291.61***	150	.059	.985	.984	.047	14.06	17	.001
	Strict	270.69***	159	.051	.988	.988	.049	9.18	9	.003
Negative Self-Concept	Configural	247.35***	69	.097	.981	.972	.050	--
	Weak	259.48***	75	.095	.980	.973	.050	12.04	6	.000
	Strong	270.82***	88	.087	.980	.977	.050	11.03	13	.000
	Strict	254.37***	94	.079	.981	.981	.052	10.17	6	.001

Note. N = 274 French version; N = 274 English version. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001. Exploration of partial invariance models at the Strict invariance step for the Inattention/Executive Dysfunction, Hyperactivity, and Negative Self-Concept scales showed that partial invariance was achieved by releasing one (Hyperactivity and Negative Self-Concept) or two (Inattention/Executive Dysfunction) item residuals.

Table 13.3. Within-Subjects Measurement Invariance by Language Version (French vs. English): CAARS 2 Observer

Scale	Invariance Model	χ²	df	RMSEA	CFI	TLI	SRMR	Satorra-Bentler χ²	Δdf	ΔCFI
Inattention/Executive Dysfunction	Configural	2325.64***	1679	.045	.975	.973	.065	--
	Weak	2347.82***	1708	.044	.975	.973	.065	30.09	29	.000
	Strong	2389.34***	1766	.043	.976	.975	.065	58.90	58	.001
	Strict	2312.57***	1796	.039	.980	.980	.070	39.41	30	.004
Hyperactivity	Configural	531.95***	285	.067	.972	.967	.068	--
	Weak	539.93***	297	.065	.973	.969	.068	9.86	12	.000
	Strong	558.69***	321	.062	.973	.972	.069	25.66	24	.001
	Strict	548.94***	333	.058	.976	.975	.073	16.52	12	.003
Impulsivity	Configural	610.89***	285	.077	.966	.960	.071	--
	Weak	623.59***	297	.075	.966	.961	.071	18.22	12	.000
	Strong	640.95***	321	.072	.967	.965	.072	24.81	24	.001
	Strict	628.81***	334	.067	.970	.969	.074	22.36	13	.003
Emotional Dysregulation	Configural	301.95***	125	.085	.986	.981	.048	--
	Weak	305.21***	133	.082	.986	.983	.048	6.21	8	.000
	Strong	320.59***	150	.077	.986	.985	.048	24.89	17	.000
	Strict	287.61***	159	.065	.990	.989	.052	10.78	9	.003
Negative Self-Concept	Configural	107.18***	69	.053	.992	.989	.046	--
	Weak	114.93***	75	.052	.992	.990	.046	9.58	6	.000
	Strong	127.57***	88	.048	.992	.991	.048	16.37	13	.000
	Strict	139.46***	95	.049	.991	.991	.055	13.84	7	.001

Note. N = 195 French version; N = 195 English version. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001. Exploration of partial invariance models for the Hyperactivity scale showed that one item residual had to be released to meet the strict invariance hypothesis.

Results from the correlation and mean group difference analyses are presented in Table 13.4 and Table 13.5. Corrected correlations (Sackett et al., 2000) revealed statistically significant and very strong relationships across scales between the French and English versions of the CAARS 2, with a median correlation of .91 for the Self-Report (range .90 to .95) and .89 for the Observer (range .86 to .91). Further, Welch’s paired t-tests (Welch, 1947) showed no statistically significant differences between scales when comparing obtained scores on the two language versions (all p > .01); Cohen’s d effect sizes were also negligible (maximum Cohen’s d = 0.14). Taken together, the strong correlations between scales across language versions and the lack of statistical and practical differences in mean scale scores indicate that similar scores can be expected between the French and English versions. These results provide evidence for the validity of the French version of the CAARS 2.

Table 13.4. Correlations and Mean Differences by Language Version (French vs. English): CAARS 2 Self-Report

Scale	Correlations		French		English		Paired t-tests
Scale	Obtained r	Corrected r	M	SD	M	SD	Cohen's d	t (273)	p
Inattention/Executive Dysfunction	.91	.95	49.5	8.2	49.3	8.2	0.05	0.88	.378
Hyperactivity	.84	.90	47.9	8.6	48.5	8.7	0.11	-1.87	.063
Impulsivity	.87	.90	48.5	9.0	48.8	9.4	0.06	-0.99	.323
Emotional Dysregulation	.86	.90	48.1	9.1	48.4	9.1	0.05	-0.90	.368
Negative Self-Concept	.88	.91	50.4	9.2	50.1	8.9	0.07	1.13	.258
DSM ADHD Inattentive Symptoms	.89	.95	49.5	8.1	49.1	7.9	0.11	1.84	.067
DSM ADHD Hyperactive/Impulsive Symptoms	.84	.91	48.1	8.5	48.7	8.7	0.13	-2.18	.030
DSM Total ADHD Symptoms	.89	.95	48.7	8.1	48.8	8.0	0.03	-0.43	.668

Note. N = 274. All r significant, p < .001. Guidelines for interpreting |r|: very weak < .20; weak = .20 to .39; moderate = .40 to .59; strong = .60 to .79; very strong ≥ .80. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates higher scores for the French version than the English version.

Table 13.5. Correlations and Mean Differences by Language Version (French vs. English): CAARS 2 Observer

Scale	Correlations		French		English		Paired t-tests
Scale	Obtained r	Corrected r	M	SD	M	SD	Cohen's d	t (193)	p
Inattention/Executive Dysfunction	.90	.91	49.5	9.2	49.3	10.2	0.03	0.48	.635
Hyperactivity	.88	.87	48.9	9.9	49.3	10.5	0.08	-1.13	.262
Impulsivity	.88	.88	49.6	9.8	49.3	10.7	0.05	0.69	.494
Emotional Dysregulation	.91	.91	50.8	9.9	50.2	10.2	0.14	1.98	.049
Negative Self-Concept	.84	.91	50.9	8.2	51.3	8.5	0.08	-1.11	.267
DSM ADHD Inattentive Symptoms	.87	.89	50.3	9.3	49.9	10.0	0.09	1.20	.232
DSM ADHD Hyperactive/Impulsive Symptoms	.87	.86	49.6	10.0	49.6	10.6	0.00	-0.04	.971
DSM Total ADHD Symptoms	.89	.89	50.1	9.6	49.8	10.4	0.05	0.67	.503

Note. N = 195. All r significant, p < .001. Guidelines for interpreting |r|: very weak < .20; weak = .20 to .39; moderate = .40 to .59; strong = .60 to .79; very strong ≥ .80. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates higher scores for the French version than the English version.
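
As an illustration of the mean-difference statistics in Tables 13.4 and 13.5, the Python sketch below computes a standard paired t statistic and a paired-samples effect size (the mean difference divided by the standard deviation of the differences, which equals t/sqrt(N)). The tabled Cohen's d values are numerically consistent with |t|/sqrt(N), but the manual does not spell out the formula or the exact test variant, so both should be read as assumptions. The scores in the example are hypothetical T-scores, not study data.

```python
import math
from statistics import mean, stdev

def paired_t_and_d(french, english):
    """Paired t statistic, its df, and d_z = mean(diff) / sd(diff)."""
    diffs = [f - e for f, e in zip(french, english)]
    n = len(diffs)
    d_z = mean(diffs) / stdev(diffs)
    t = d_z * math.sqrt(n)  # same as mean(diff) / (sd(diff) / sqrt(n))
    return t, n - 1, d_z

# Hypothetical T-scores for six respondents on both language versions
french = [48, 52, 61, 45, 50, 57]
english = [47, 53, 60, 46, 49, 55]
t, df, d_z = paired_t_and_d(french, english)
print(round(t, 2), df, round(d_z, 2))

# Consistency check against one tabled row
# (Self-Report Hyperactivity: t = -1.87, N = 274, tabled d = 0.11)
print(round(abs(-1.87) / math.sqrt(274), 2))
```

Because the effect size shrinks with sqrt(N) for a fixed t, a difference can approach conventional significance (as in the rows with p = .030 and p = .049) while remaining negligible in practical terms.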

CAARS 2–ADHD Index

The CAARS 2–ADHD Index was examined to confirm that it performed similarly across the French and English versions (see chapter 12, CAARS 2–ADHD Index, for more information on the development, scores, and psychometric properties of the CAARS 2–ADHD Index). The probability scores for the French version of the CAARS 2–ADHD Index were compared to those for the English version using the Wilcoxon Signed Rank Test (Wilcoxon, 1945); this non-parametric approach was favored because the probability score does not meet the assumption of normality. An effect size, r, is also provided, which can be interpreted using the correlation guidelines provided in this chapter (Rosenthal, 1991).

The difference in probability scores between the French and English versions was not statistically significant, and effect sizes were very weak (Self-Report: V = 7029, p = .107, r = -.10; Observer: V = 3252, p = .259, r = -.08). Thus, the CAARS 2–ADHD Index operates similarly in English as it does in French, adding to the validity evidence for the French translation.
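
To make these statistics concrete, here is a minimal Python sketch of the Wilcoxon signed-rank statistic V (the sum of the ranks of the positive differences), its large-sample normal approximation, and the effect size r = Z/sqrt(N) described by Rosenthal (1991). It omits the tie and continuity corrections that statistical software may apply, and it assumes N is the number of pairs. The function name and the example scores (probabilities expressed as whole percentages) are hypothetical.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Return (V, z, r) for paired samples using the normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences, giving tied values their average rank
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    v = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (v - mu) / sigma
    return v, z, z / math.sqrt(len(x))

# Hypothetical probability scores (in percent) for eight respondents
french = [12, 40, 75, 8, 55, 31, 66, 20]
english = [15, 38, 82, 9, 50, 37, 70, 30]
v, z, r = wilcoxon_signed_rank(french, english)
print(v, round(z, 2), round(r, 2))
```

The reported values are consistent with this convention: a two-sided p of .107 corresponds to |Z| of about 1.61, and 1.61/sqrt(274) is about .10, matching the Self-Report effect size.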

Associated Clinical Concern Items and Impairment & Functional Outcome Items

The Associated Clinical Concern Items and Impairment & Functional Outcome Items of the CAARS 2 were also examined to ensure language versions operated similarly. To gauge this, the proportion of individuals with concordant item elevations or endorsements across language versions was calculated (see Associated Clinical Concerns: Item Selection and Scoring and Impairment & Functional Outcome Items: Item Selection and Scoring in chapter 6, Development, for more information on how endorsed and elevated responses were determined for items in these scales). Item responses were considered concordant across language versions if an individual’s ratings for a given item were either elevated/endorsed in both languages or not elevated/endorsed in both languages. Conversely, if an item was elevated/endorsed in one language version but not the other, the person was considered to have discordant item elevations/endorsements. McNemar’s test (McNemar, 1947) with a continuity correction, which yields a chi-square test statistic, was used to ensure item elevations/endorsements were not more frequent on one language version than the other (analyses were conducted with the stats package in R).

Results of the item-level analyses are presented in Table 13.6. For the Associated Clinical Concern Items and the Impairment & Functional Outcome Items, the percentage of individuals with concordant item endorsements or elevations was high, with agreement of at least 85% for every item and above 90% for most items on both forms. McNemar’s tests also showed that item elevations/endorsements were not significantly more frequent on one language version than the other (p > .01), with the exception of a single Self-Report item (which nonetheless displayed 89.4% concordance). Taken together, the results demonstrate that elevations and endorsements on the Associated Clinical Concern Items and the Impairment & Functional Outcome Items are highly similar between the French and English versions, supporting the validity of the French translation.

Table 13.6. Concordance of Item Elevations/Endorsements by Language (French vs. English)

Item Set	Item Stem	Self-Report			Observer
Item Set	Item Stem	% Concordant	χ²	p	% Concordant	χ²	p
Associated Clinical Concern Items	Suicidal thoughts/attempts	97.8	1.50	.221	92.3	0.00	1.00
	Self-Injury	96.7	1.78	.182	94.4	0.36	.546
	Sadness/Emptiness^*	90.5	0.04	.845	93.3	0.00	1.00
	Anxiety/Worry	85.4	0.03	.874	87.7	0.04	.838
Impairment & Functional Outcome Items	Bothered by things endorsed on the CAARS 2	91.2	0.04	.838	90.3	0.21	.646
	Things endorsed on the CAARS 2 interfere with life	89.4	13.79	< .001	88.7	3.68	.055
	Problems in romantic/marital relationship(s)	92.3	0.76	.383	91.8	0.56	.453
	Problems in relationships with family members	93.8	0.00	1.00	90.8	0.06	.814
	Problems in relationships with friends, coworkers, or neighbors	96.0	0.00	1.00	91.3	0.24	.628
	Problems at work and/or school	91.2	1.04	.307	92.8	0.07	.789
	Has a harder time with things than other people do	90.9	0.16	.689	86.7	3.12	.078
	Underachiever	89.1	0.00	1.00	93.3	0.00	1.00
	Sleep problems	93.8	0.00	1.00	87.2	0.00	1.00
	Problems with money management	92.3	1.71	.190	93.3	0.31	.579
	Neglects family or household responsibilities	90.1	1.33	.248	93.3	0.00	1.00
	Risky driving	92.0	2.23	.136	94.4	1.45	.228
	Problems due to time spent online	92.0	2.23	.136	93.3	0.31	.228

Note. The chi-square test statistic and its associated p value are for the McNemar's tests (df = 1).
^* The item stem for this Screening Item is Sadness/Emptiness for Self-Report and Sadness for Observer.
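
The concordance percentages and McNemar statistics in Table 13.6 can be reproduced from a 2 × 2 table of paired classifications. The Python sketch below computes percent concordance and McNemar's chi-square with a continuity correction; the correction is applied only when the two discordant counts differ, which is also how mcnemar.test in R's stats package behaves. The cell counts are hypothetical: the discordant counts were chosen to be consistent with the Self-Report row for Suicidal thoughts/attempts, but the manual does not report the underlying cells, and the function and argument names are illustrative.

```python
import math

def mcnemar_concordance(both, french_only, english_only, neither):
    """Percent concordance plus McNemar chi-square (df = 1) and its p-value."""
    n = both + french_only + english_only + neither
    pct_concordant = 100.0 * (both + neither) / n
    b, c = french_only, english_only
    if b + c == 0:
        # No discordant pairs: treated here as no evidence of a difference
        return pct_concordant, 0.0, 1.0
    correction = 1 if b != c else 0
    chi2 = (abs(b - c) - correction) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail for df = 1
    return pct_concordant, chi2, p

# Hypothetical cells: 274 respondents, 6 discordant (5 French-only, 1 English-only)
pct, chi2, p = mcnemar_concordance(both=3, french_only=5, english_only=1, neither=265)
print(round(pct, 1), round(chi2, 2), round(p, 3))
```

Only the discordant cells enter the test statistic, so an item can show high concordance and still yield a significant result when nearly all disagreements fall in the same direction.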

Validity Scales

The CAARS 2 Validity Scales were examined to ensure that they operated similarly in the French and English versions. For both the Negative Impression Index and the Inconsistency Index, the proportion of individuals with concordance across the French and English versions for scale elevations (that is, raw scores that exceeded the cut-off) was compared (details provided in Response Style Analysis: Item Selection and Score Creation in chapter 6, Development).

As can be seen in Table 13.7, the proportion of individuals with concordant elevations was very high (above 90%) for both the Negative Impression Index and the Inconsistency Index. Further, McNemar’s tests indicated that scoring above the cut-off on one language version was not statistically significantly more likely than scoring above the cut-off on the other (p > .01). Taken together, both Validity Scales operated similarly in the French and English versions of the CAARS 2, contributing supporting evidence for the validity of the French language version.

Table 13.7. Concordance of the Validity Scales by Language (French vs. English)

Scale	Self-Report			Observer
Scale	% Concordant	χ²	p	% Concordant	χ²	p
Negative Impression Index	92.7	0.45	.502	97.9	0.00	1.00
Inconsistency Index	93.1	0.00	1.00	91.8	0.00	1.00

Note. The chi-square test statistic and its associated p value are for the McNemar's tests (df = 1).

Summary

The reliability and validity of the French (Canada) version of the CAARS 2 were examined in a translation study in which individuals completed both the French and the English versions consecutively (with order counterbalanced across individuals). Both the Self-Report and Observer forms displayed excellent internal consistency and high levels of measurement precision for all Content and DSM Symptom Scales, with coefficients and information functions comparable to those of the English version in both the current sample and the Normative Samples. These results provide strong evidence for the reliability of the French version of the CAARS 2.

Further, the Content Scales were shown to be invariant across the French and English versions, indicating that the measurement models of the two language versions are statistically similar. Examination of obtained scores showed very strong correlations and no statistically significant mean differences between language versions for the Content Scales and DSM Symptom Scales. Probability scores on the CAARS 2–ADHD Index did not differ significantly between languages, and the Associated Clinical Concern Items, Impairment & Functional Outcome Items, and Validity Scales showed high concordance in item endorsements and scale elevations across language versions. Taken together, these findings provide strong evidence for the validity of the French version of the CAARS 2 and justify expectations that scores generated from the French and English forms should be highly similar.
