CAARS 2 Manual

Chapter 13: Spanish (North America) Translation Study


Sample

A total of 328 Americans completed the CAARS 2 Self-Report, and 316 Americans completed the CAARS 2 Observer. Data were cleaned prior to analysis based on data quality metrics, including indicators of careless responding (e.g., random responding) and response acquiescence (e.g., an irregular number of consecutive item responses). As a result, 45 individuals were removed from the Self-Report sample and 86 individuals were removed from the Observer sample, leaving final samples of 283 individuals for Self-Report and 230 for Observer. Refer to appendix J for the demographic characteristics of the individuals being rated and of the Observers.

Reliability

Reliability for the Spanish version of the CAARS 2 was examined with internal consistency estimates and test information functions (see Internal Consistency and Standard Error of Measurement and Test Information in chapter 8, Reliability, for more details). Coefficients alpha and omega were used as estimates of internal consistency, and test information functions were generated using the mirt package in R (Chalmers, 2012).

Internal Consistency

Table 13.8 presents alpha and omega coefficients for the Spanish version of the CAARS 2. Internal consistency estimates were excellent for both the Self-Report and Observer forms; the median coefficient omega value across scales was .93 for Self-Report (ranging from .90 to .98) and .97 for Observer (ranging from .90 to .98). These estimates are comparable to those found for the English version from the Normative Samples (see Tables 8.1a, 8.1b, 8.2a, and 8.2b in chapter 8, Reliability), as well as to values derived from the English version completed by individuals in the current sample (estimates are not reported as all scales were within .03 of each other across language versions). Overall, results show that the Spanish version of the CAARS 2 is internally consistent and comparable to the English version.
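
As a rough illustration of how coefficient alpha is obtained from item-level responses, the following minimal Python sketch applies the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores). The function name cronbach_alpha and the toy responses are hypothetical and are not CAARS 2 data; the sketch is not a description of how the estimates in Table 13.8 were produced.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha for a list of items, each a list of scores
    (one score per respondent, same respondent order in every item)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent, summing across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Hypothetical responses: 3 items answered by 5 respondents (0-3 ratings).
toy_items = [
    [0, 1, 2, 3, 3],
    [0, 1, 2, 2, 3],
    [1, 1, 2, 3, 3],
]
alpha = cronbach_alpha(toy_items)
print(round(alpha, 3))
```

Alpha rises toward 1 as items covary more strongly relative to their individual variances, which is why highly homogeneous scales produce values in the .90s.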

Test Information

Figure 13.2 shows the test information functions for the CAARS 2 by Content and DSM Symptom Scale. As can be seen in the figure, the Spanish versions of the scales display high information across the relevant range of the ability scale (approaching and exceeding 1.5 standard deviations above the mean). Further, the peaks of the information curves are broad with a wide area beneath them, implying the precision of measurement remains consistent across the relevant range of the scales. The peaks of the information functions are also equal to or greater than a value of 10, indicating very high precision and excellent reliability for the Spanish version of the CAARS 2. These information functions meet similar criteria to the English version (see Test Information in chapter 8, Reliability).
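
To make the link between information and precision concrete, the sketch below converts an information value into a conditional standard error (SE = 1 / sqrt(information)) and an approximate conditional reliability (1 - 1 / information). The reliability conversion assumes the latent trait is scaled to unit variance, which is an assumption of this sketch rather than something stated in this chapter; the function names are hypothetical.

```python
import math


def se_from_information(info):
    """Conditional standard error of measurement at a given trait level."""
    return 1.0 / math.sqrt(info)


def reliability_from_information(info):
    """Approximate conditional reliability, assuming the latent trait
    is scaled to unit variance (an assumption of this sketch)."""
    return 1.0 - 1.0 / info


for info in (5, 10, 20):
    se = se_from_information(info)
    rel = reliability_from_information(info)
    print(f"information={info:>2}  SE={se:.3f}  reliability~{rel:.2f}")
```

Under these assumptions, an information value of 10 corresponds to a standard error of roughly 0.32 on the latent scale and a reliability of about .90.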

Validity

Evidence of validity was first explored for the Content Scales in terms of consistency in the factor structure across languages (explored using a within-subjects measurement invariance [MI] approach as per Liu et al., 2017; see also appendix N for an alternative approach that compares this study’s sample to the English version from the Normative Samples). After MI was established, correlations were computed, and mean scale scores were compared for both the Content and DSM Symptom Scales to examine whether scores were consistent between the Spanish and English versions. The CAARS 2–ADHD Index, the Associated Clinical Concern Items and Impairment & Functional Outcome Items, and the Validity Scales were also analyzed for differences between the Spanish and English versions. Results for all analyses are reported in the following sections.

Content and DSM Symptom Scales

First in the series of analyses, results of the MI investigations are presented in Tables 13.9 and 13.10. Overall, the CAARS 2 Content Scales were found to be invariant across the Spanish and English versions, as evidenced by non-decreasing CFI values and nonsignificant Satorra-Bentler chi-square tests (Satorra & Bentler, 2001). As part of the modeling procedure, some steps required partial invariance adjustments to result in nonsignificant chi-square tests, but the adjustments were infrequent and did not compromise the overall comparability of the scales between the two languages (Dimitrov, 2010). The results provide strong evidence for the validity of the CAARS 2 Spanish translation as a parallel measure to the English version.

Table 13.9. Within-Subjects Measurement Invariance by Language Version (Spanish vs. English): CAARS 2 Self-Report

Scale Invariance Model χ2 df RMSEA CFI TLI SRMR Satorra-Bentler χ2 df Δ CFI
Inattention/​Executive Dysfunction Configural 2345.78*** 1679 .038 .978 .977 .051 --
Weak 2366.51*** 1708 .037 .978 .977 .051 30.55 29 .000
Strong 2400.41*** 1766 .036 .979 .979 .051 48.86 58 .001
Strict 2307.65*** 1795 .032 .983 .983 .055 37.48 29 .004
Hyperactivity Configural 791.76*** 285 .079 .946 .936 .083 --
Weak 793.09*** 296 .077 .947 .940 .083 16.76 11 .001
Strong 793.29*** 318 .073 .949 .947 .083 31.99 22 .002
Strict 741.85*** 330 .067 .956 .955 .086 17.76 12 .007
Impulsivity Configural 601.47*** 285 .063 .962 .955 .071 --
Weak 596.58*** 297 .060 .964 .960 .071 6.37 12 .002
Strong 602.07*** 322 .056 .967 .965 .071 22.13 25 .003
Strict 583.63*** 335 .051 .970 .970 .073 18.20 13 .003
Emotional Dysregulation Configural 362.57*** 125 .082 .973 .964 .055 --
Weak 358.39*** 133 .078 .974 .968 .055 4.13 8 .001
Strong 361.49*** 150 .071 .976 .974 .055 22.19 17 .002
Strict 340.47*** 157 .064 .979 .978 .057 11.10 7 .003
Negative Self-Concept Configural 181.50*** 69 .076 .983 .976 .050 --
Weak 183.17*** 75 .072 .984 .979 .050 4.79 6 .001
Strong 183.07*** 88 .062 .986 .984 .050 11.09 13 .002
Strict 166.42*** 95 .052 .989 .989 .052 5.70 7 .003
Note. N = 283 Spanish version; N = 283 English version. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001. Exploration of partial invariance models at the weak invariance step showed that one loading had to be released for the Hyperactivity scale to become partially invariant. Further, one threshold had to be released for both the Hyperactivity and Inattention/Executive Dysfunction scales to satisfy strong invariance. Finally, one residual for the Hyperactivity and Inattention/Executive Dysfunction scales and two residuals for the Emotional Dysregulation scale had to be released to satisfy strict invariance.

Table 13.10. Within-Subjects Measurement Invariance by Language Version (Spanish vs. English): CAARS 2 Observer

Scale Invariance Model χ2 df RMSEA CFI TLI SRMR Satorra-Bentler χ2 df Δ CFI
Inattention/​Executive Dysfunction Configural 2126.20*** 1679 .034 .987 .986 .044 --
Weak 2146.41*** 1708 .033 .987 .986 .044 20.06 29 .000
Strong 2201.57*** 1767 .033 .987 .987 .044 71.47 59 .000
Strict 2169.47*** 1797 .030 .989 .989 .049 41.42 30 .002
Hyperactivity Configural 486.59*** 285 .056 .984 .981 .046 --
Weak 498.49*** 297 .054 .984 .982 .046 18.66 12 .000
Strong 516.27*** 321 .052 .985 .984 .046 26.74 24 .001
Strict 503.03*** 333 .047 .987 .986 .050 18.59 12 .002
Impulsivity Configural 410.63*** 285 .044 .990 .988 .038 --
Weak 419.40*** 297 .042 .990 .989 .038 12.20 12 .000
Strong 440.65*** 321 .040 .991 .990 .038 29.76 24 .001
Strict 439.76*** 333 .037 .992 .991 .043 18.17 12 .001
Emotional Dysregulation Configural 259.31*** 125 .068 .991 .988 .032 --
Weak 263.26*** 133 .065 .991 .989 .032 6.59 8 .000
Strong 276.76*** 150 .061 .992 .991 .032 20.03 17 .001
Strict 238.87*** 159 .047 .995 .995 .033 4.65 9 .003
Negative Self-Concept Configural 150.02*** 69 .072 .986 .980 .053 --
Weak 143.25*** 75 .063 .988 .984 .053 2.20 6 .002
Strong 142.95*** 88 .052 .990 .989 .054 11.80 13 .002
Strict 140.16*** 93 .047 .992 .991 .055 5.36 5 .002
Note. N = 230 Spanish version; N = 230 English version. RMSEA = Root mean square error of approximation; CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized root mean square residual; ∆CFI = change in CFI. *p < .05, **p < .01, ***p < .001. Exploration of partial invariance models found that one item threshold had to be released for the Impulsivity and Hyperactivity scales to satisfy strong invariance. One residual had to be released for the Impulsivity and Hyperactivity scales, and two residuals for the Negative Self-Concept scale, to satisfy strict invariance.
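
The "non-decreasing CFI" criterion can be checked directly from the tabled fit indices. The short Python sketch below recomputes the change in CFI between successive invariance models using the Self-Report CFI values transcribed from Table 13.9; the function name cfi_changes is hypothetical, and the sketch only re-derives the ΔCFI column rather than refitting any model.

```python
def cfi_changes(cfi_by_model):
    """Change in CFI between each model and the one before it,
    rounded to three decimals to match the reporting precision."""
    return [round(later - earlier, 3)
            for earlier, later in zip(cfi_by_model, cfi_by_model[1:])]


# CFI values transcribed from Table 13.9 (Self-Report), in the order
# configural, weak, strong, strict.
self_report_cfi = {
    "Inattention/Executive Dysfunction": [.978, .978, .979, .983],
    "Hyperactivity": [.946, .947, .949, .956],
    "Impulsivity": [.962, .964, .967, .970],
    "Emotional Dysregulation": [.973, .974, .976, .979],
    "Negative Self-Concept": [.983, .984, .986, .989],
}

for scale, cfis in self_report_cfi.items():
    deltas = cfi_changes(cfis)
    non_decreasing = all(d >= 0 for d in deltas)
    print(f"{scale}: delta CFI = {deltas}, non-decreasing = {non_decreasing}")
```

Every recomputed change is zero or positive, so model fit did not deteriorate as equality constraints were added across language versions.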

Results from the correlation and mean difference analyses are presented in Table 13.11 and Table 13.12. Corrected correlations (Sackett et al., 2000) revealed statistically significant and strong or very strong relationships across scales between the Spanish and English versions of the CAARS 2, with a median correlation of .93 for the Self-Report (range = .88 to .96) and .84 for the Observer (range = .79 to .84). Further, Welch’s paired t-tests (Welch, 1947) revealed almost no statistically significant differences between the scores obtained on the two language versions at the p < .01 level; the only exception was the Self-Report Hyperactivity scale (p = .001). Cohen’s d effect sizes were negligible for all scales (maximum Cohen’s d = 0.19). Taken together, the strong correlations between scales across language versions and the lack of statistically significant differences for almost all scale scores (with all differences being negligible in practical terms) indicate that similar scores can be expected between the Spanish and English versions. These results provide evidence for the validity of the Spanish version of the CAARS 2.

Table 13.11. Correlations and Mean Differences by Language Version (Spanish vs. English): CAARS 2 Self-Report

Scale Correlations English Spanish Paired t-tests
Obtained r Corrected r M SD M SD Cohen's d t (281) p
Inattention/​Executive Dysfunction .92 .96 47.1 8.6 47.4 8.3 0.10 1.71 .088
Hyperactivity .84 .90 47.7 9.0 46.7 8.1 0.19 -3.23 .001
Impulsivity .85 .88 46.9 9.4 47.0 9.3 0.03 0.44 .663
Emotional Dysregulation .87 .92 46.8 8.8 46.8 8.4 0.00 0.04 .965
Negative Self-Concept .84 .94 46.2 7.6 46.0 7.4 0.05 -0.82 .414
DSM ADHD Inattentive Symptoms .89 .95 47.2 8.3 47.3 8.1 0.04 0.68 .498
DSM ADHD Hyperactive/​Impulsive Symptoms .82 .88 47.9 9.0 47.3 8.4 0.10 -1.62 .106
DSM Total ADHD Symptoms .89 .94 47.3 8.5 47.1 8.2 0.05 -0.78 .434
Note. N = 283. All r significant, p < .001. Guidelines for interpreting |r|: very weak < .20; weak = .20 to .39; moderate = .40 to .59; strong = .60 to .79; very strong ≥ .80. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates higher scores for the Spanish version than the English version.

Table 13.12. Correlations and Mean Differences by Form Language (Spanish vs. English): CAARS 2 Observer

Scale Correlations English Spanish Paired t-tests
Obtained r Corrected r M SD M SD Cohen's d t (228) p
Inattention/​Executive Dysfunction .89 .84 49.0 11.5 48.5 10.9 0.08 1.18 .240
Hyperactivity .89 .83 50.0 12.0 49.2 11.1 0.13 2.03 .044
Impulsivity .86 .79 49.0 11.7 49.1 11.4 0.01 -0.19 .852
Emotional Dysregulation .87 .84 49.7 11.0 49.4 10.7 0.06 0.98 .329
Negative Self-Concept .79 .84 48.8 9.6 49.3 8.8 0.08 -1.28 .202
DSM ADHD Inattentive Symptoms .88 .84 49.5 11.0 48.8 10.5 0.12 1.82 .070
DSM ADHD Hyperactive/​Impulsive Symptoms .87 .81 50.3 11.8 49.7 11.0 0.11 1.59 .113
DSM Total ADHD Symptoms .89 .84 49.9 11.7 49.3 11.0 0.13 1.91 .058
Note. N = 230. All r significant, p < .001. Guidelines for interpreting |r|: very weak < .20; weak = .20 to .39; moderate = .40 to .59; strong = .60 to .79; very strong ≥ .80. Guidelines for interpreting Cohen's |d|: negligible effect size < 0.20; small effect size = 0.20 to 0.49; medium effect size = 0.50 to 0.79; large effect size ≥ 0.80. A positive Cohen's d value indicates higher scores for the Spanish version than the English version.
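
The manual does not state which formula produced the reported Cohen's d values. As an observation rather than a documented method, the tabled values are consistent, within rounding, with the paired-samples effect size d = |t| / sqrt(N). The Python sketch below checks this for the Content Scales in Table 13.11; the function name paired_d_from_t is hypothetical, and the formula is an assumption of this sketch.

```python
import math


def paired_d_from_t(t_value, n_pairs):
    """Paired-samples effect size recovered from a paired t statistic,
    assuming d = |t| / sqrt(number of pairs)."""
    return abs(t_value) / math.sqrt(n_pairs)


# (scale, t, reported d) transcribed from Table 13.11; N = 283 pairs.
self_report_rows = [
    ("Inattention/Executive Dysfunction", 1.71, 0.10),
    ("Hyperactivity", -3.23, 0.19),
    ("Impulsivity", 0.44, 0.03),
    ("Emotional Dysregulation", 0.04, 0.00),
    ("Negative Self-Concept", -0.82, 0.05),
]

for scale, t_value, reported_d in self_report_rows:
    d_z = paired_d_from_t(t_value, 283)
    print(f"{scale}: recomputed d = {d_z:.2f} (reported {reported_d:.2f})")
```

Because this effect size depends only on the t statistic and the sample size, a statistically significant difference such as the one for Self-Report Hyperactivity can still correspond to a negligible effect when N is large.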

CAARS 2–ADHD Index

The CAARS 2–ADHD Index was examined to confirm it performed similarly across the Spanish and English versions (see chapter 12, CAARS 2–ADHD Index, for more information on the development, scores, and psychometric properties of the ADHD Index). The probability scores for the Spanish version of the CAARS 2–ADHD Index were compared to the English version using the Wilcoxon Signed Rank Test (Wilcoxon, 1945). An effect size, r, is also provided (Rosenthal, 1991); it can be interpreted using the correlation guidelines provided in this chapter.

The difference in probability scores between the Spanish and English versions was not statistically significant, and effect sizes were very weak (Self-Report: V = 3681, p = .723, r = -0.01; Observer: V = 3003, p = .856, r = -0.02). Thus, the CAARS 2–ADHD Index operates similarly in English as it does in Spanish, adding to the validity evidence for the Spanish translation.
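
For readers unfamiliar with the statistics reported here, the following Python sketch shows one common way to compute the Wilcoxon signed-rank statistic V, its normal-approximation z value, and an effect size r = z / sqrt(n). Several details are assumptions of this sketch and are not stated in the manual: zero differences are dropped, no tie correction is applied to the variance, and n is taken as the number of non-zero pairs. The function name and the paired scores are hypothetical, not CAARS 2 data.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank test for paired scores using the normal
    approximation. Returns (V, z, r), where V is the sum of ranks of the
    positive differences and r = z / sqrt(n) with n = non-zero pairs."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    v = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean_v = n * (n + 1) / 4
    sd_v = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction
    z = (v - mean_v) / sd_v
    return v, z, z / math.sqrt(n)


# Hypothetical paired scores on a percent scale, not CAARS 2 data.
spanish = [10, 40, 35, 80, 55, 20]
english = [12, 38, 35, 75, 60, 21]
v, z, r = wilcoxon_signed_rank(spanish, english)
print(f"V = {v}, z = {z:.3f}, r = {r:.3f}")
```

A V close to its expected value of n(n + 1) / 4 yields a z near zero and therefore a very weak r, which is the pattern described above.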

Associated Clinical Concern Items and Impairment & Functional Outcome Items

The Associated Clinical Concern Items and Impairment & Functional Outcome Items of the CAARS 2 were also examined to ensure language versions operated similarly. To gauge this, the proportion of individuals with concordant item elevations or endorsements across language versions was calculated and McNemar’s tests with a continuity correction were performed (see Associated Clinical Concern Items and Impairment & Functional Outcome Items in the French Translation section for more information on these statistics). The Associated Clinical Concerns: Item Selection and Scoring and Impairment & Functional Outcome Items: Item Selection and Scoring sections in chapter 6, Development, also provide more information on how endorsed and elevated responses were determined for items in these scales.

Results of the item-level analyses are presented in Table 13.13. For the Associated Clinical Concern Items and the Impairment & Functional Outcome Items, the percentage of individuals with concordant item endorsements or elevations was high: agreement exceeded 90% for every Self-Report item and ranged from 86.1% to 93.5% across Observer items. McNemar’s tests also showed that item elevations/endorsements were not significantly more frequent on one language version than the other (p > .01), with the exception of a single Self-Report item on the Impairment & Functional Outcome Items (Problems due to time spent online, p = .005), which nonetheless displays 91.2% concordance. Taken together, the results demonstrate that elevations and endorsements on the Associated Clinical Concern Items and the Impairment & Functional Outcome Items are highly similar between the Spanish and English versions, supporting the validity of the Spanish translation.

Table 13.13. Concordance of Item Elevations/Endorsements by Language (Spanish vs. English)

Item Set Item Stem Self-Report Observer
% Concordant χ2 p % Concordant χ2 p
Associated Clinical Concern Items Suicidal thoughts/​attempts 94.7 0.00 1.000 93.0 0.06 .803
Self-Injury 94.7 0.27 .606 93.5 0.27 .606
Sadness/emptiness* 90.5 1.33 .248 93.0 0.56 .453
Anxiety/​worry 92.9 0.00 1.000 87.4 0.14 .710
Impairment & Functional Outcome Items Bothered by things endorsed on the CAARS 2 92.6 0.00 1.000 90.4 2.23 .136
Things endorsed on the CAARS 2 interfere with life 91.2 0.00 1.000 90.0 0.00 1.000
Problems in romantic/​marital relationship(s) 91.9 0.17 .677 87.4 0.00 1.000
Problems in relationships with family members 94.0 0.24 .628 89.6 0.00 1.000
Problems in relationships with friends, coworkers, or neighbors 92.2 0.05 .831 89.6 0.04 .838
Problems at work and/​or school 93.3 0.21 .646 86.1 0.03 .860
Has a harder time with things than other people do 91.5 3.38 .066 89.6 2.04 .153
Underachiever 90.5 0.59 .441 92.2 0.06 .814
Sleep problems 97.5 0.00 1.000 92.2 1.39 .239
Problems with money management 95.1 0.64 .423 93.5 0.00 1.000
Neglects family or household responsibilities 93.6 1.39 .239 93.5 0.27 .606
Risky driving 94.3 0.56 .453 91.3 0.00 1.000
Problems due to time spent online 91.2 7.84 .005 89.6 2.04 .153
Note. The chi-square test statistic and its associated p value are for the McNemar's tests (df = 1).
* The item stem for this Screening Item is Sadness/Emptiness for Self-Report and Sadness for Observer.
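
The two statistics in Table 13.13 can be sketched in a few lines of Python. Percent concordance is the share of individuals classified the same way on both language versions, and McNemar's chi-square with a continuity correction is (|b - c| - 1)² / (b + c), where b and c are the two discordant counts. The cell counts below are hypothetical: they were chosen only so that they reproduce a chi-square of 7.84 and 91.2% concordance with N = 283, and the actual cell counts are not reported in this chapter. The function names are likewise hypothetical.

```python
import math


def mcnemar_continuity(b, c):
    """McNemar's chi-square (df = 1) with continuity correction.
    b and c are the two discordant counts, e.g. elevated on the Spanish
    version only and elevated on the English version only."""
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, nothing to test
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper-tail probability of a chi-square variable with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def percent_concordant(a, b, c, d):
    """Share of pairs classified the same way on both versions, where a
    and d are the concordant cells of the 2x2 table."""
    return 100 * (a + d) / (a + b + c + d)


# Hypothetical 2x2 cell counts (N = 283), not reported CAARS 2 data.
a, b, c, d = 40, 20, 5, 218
chi2, p = mcnemar_continuity(b, c)
concordance = percent_concordant(a, b, c, d)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, concordant = {concordance:.1f}%")
```

The example shows how an item can reach statistical significance on McNemar's test while still showing concordance above 90%: the test depends only on the imbalance between the two discordant cells, not on how many pairs agree.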

Validity Scales

The CAARS 2 Validity Scales were examined to ensure that they operated similarly in the Spanish and English versions. For both the Negative Impression Index and the Inconsistency Index, the proportion of individuals with concordance across the Spanish and English versions for scale elevations (that is, raw scores that exceed the cut-off) was compared (details provided in the Response Style Analysis: Item Selection and Score Creation section in chapter 6, Development).

As can be seen in Table 13.14, the proportion of individuals with concordant scale elevations was very high (above 90%) for both the Negative Impression Index and Inconsistency Index. Further, McNemar’s tests indicated that scoring above the cut-off on one language version was not statistically significantly more likely than scoring above the cut-off on the other (p > .01). Taken together, both Validity Scales operated similarly in the Spanish and English versions of the CAARS 2 Self-Report and Observer, contributing supporting evidence for the validity of the Spanish language version.

Table 13.14. Concordance of the Validity Scales by Language (Spanish vs. English)

Scales Self-Report Observer
% Concordant χ2 p % Concordant χ2 p
Negative Impression Index 93.3 0.00 1.000 92.6 0.00 1.000
Inconsistency Index 92.9 0.00 1.000 93.0 0.06 .803
Note. The chi-square test statistic and its associated p value are for the McNemar's tests (df = 1).

Summary

The reliability and validity of the Spanish (North America) version of the CAARS 2 were examined in a translation study where individuals completed both the Spanish (North America) and the English version consecutively (with order counterbalanced across individuals). Both the Self-Report and Observer forms displayed excellent internal consistency and high levels of measurement precision for all Content and DSM Symptom Scales, with coefficients and information functions comparable to those of the English version for both the current sample and the Normative Samples. This provides strong evidence of the reliability of the Spanish version of the CAARS 2.

Further, the Spanish versions of the Content Scales were shown to be invariant with the English versions, indicating that the measurement models of the Spanish and English Content Scales are statistically similar. Examination of obtained scores also showed high scale correlations and no practically meaningful mean differences between language versions for the Content Scales and DSM Symptom Scales. Analyses for the ADHD Index, Associated Clinical Concern Items and Impairment & Functional Outcome Items, and Validity Scales also showed high concordance on scale/item-level endorsements and elevations across language versions. Taken together, these findings provide strong evidence for the validity of the Spanish version of the CAARS 2 and justify expectations that scores generated from both the Spanish and English forms should be highly similar.
