CAARS 2 Manual

Appendix I: Calculating Statistical Differences in Scores

This appendix describes how to determine the statistical significance of differences in scores between raters and differences in scores across time. To compare scores between raters, critical values for the Conners Adult ADHD Rating Scales 2nd Edition (CAARS™ 2) and the CAARS™ 2–Short are provided. To compare scores (from the same rater) across time, Reliable Change Index (RCI) values for both T-scores and raw scores of the CAARS 2 and the CAARS 2–Short are provided.

Differences Between Raters

Critical values can be used to establish statistical significance when comparing CAARS 2 T-scores obtained from different raters within the same time period. These critical values can be calculated using the formula provided by Anastasi and Urbina (1997):

This formula takes into account the standard error of measurement (SEM) for each of the scales (see chapter 8, Reliability, for the SEM values for the CAARS 2 scales).
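Written out, the standard error of the difference between two scores described by Anastasi and Urbina (1997) is commonly given in the form below. This is a sketch under stated assumptions, not a confirmed reproduction of the manual's formula: the symbols $SEM_1$ and $SEM_2$ (the SEM of the same scale for each of the two raters' forms), the multiplier $z = 1.96$ for two-tailed p < .05, and any rounding applied to the tabled values are assumptions.

```latex
SE_{\text{diff}} = \sqrt{SEM_{1}^{2} + SEM_{2}^{2}},
\qquad
\text{critical value} = z \times SE_{\text{diff}},
\quad z = 1.96 \ \text{for}\ p < .05
```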

The critical values needed to determine statistical significance between a pair of T-scores at the p < .05 level of significance are provided in Tables I.1a through I.4c. Tables I.1a to I.2c present critical values for the full-length CAARS 2; Tables I.3a to I.4c present critical values for the CAARS 2–Short.

To determine whether the difference between two raters’ scores is statistically significant:

  1. Find the appropriate table based on the length of the form (full-length or short), the relevant pair of raters (e.g., Self-Report to Observer, Observer to Observer), and the type of Normative Sample selected for scoring (Combined Gender, Male, or Female).

  2. Find the column for the age group of the individual being rated.

  3. Find the row containing the scale being compared and identify the critical value.

  4. Compare the absolute difference between the two raters’ T-scores with the critical value. The difference is significant (p < .05) when it is equal to or greater than the critical value in the table.
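The four steps above can be sketched in code. This is a minimal illustration, not part of the CAARS 2 scoring software: the function and variable names are hypothetical, the critical values are transcribed from Table I.3c (CAARS 2–Short, Self-Report versus Observer, Female normative sample), and the T-scores in the usage lines are invented for demonstration.

```python
# Step 1 (choosing the table) is represented by passing in the right dict.
# Values transcribed from Table I.3c; one entry per age-group column.
AGE_GROUPS = ["18-24", "25-29", "30-39", "40-49", "50-59", "60-69", "70+"]

TABLE_I3C = {
    "Inattention/Executive Dysfunction": [4, 4, 4, 4, 4, 4, 5],
    "Hyperactivity": [6, 5, 6, 6, 6, 6, 6],
    "Impulsivity": [6, 6, 6, 6, 6, 6, 7],
    "Emotional Dysregulation": [5, 6, 5, 5, 6, 5, 6],
    "Negative Self-Concept": [6, 5, 5, 6, 6, 6, 7],
}

# Lower bound of each age group, in the same order as AGE_GROUPS.
_LOWER_BOUNDS = [18, 25, 30, 40, 50, 60, 70]


def age_group(age: int) -> str:
    """Step 2: map an age in years to the table's age-group column."""
    if age < 18:
        raise ValueError("the table covers ages 18 and older")
    index = max(i for i, low in enumerate(_LOWER_BOUNDS) if age >= low)
    return AGE_GROUPS[index]


def critical_value(table: dict, scale: str, age: int) -> int:
    """Step 3: read the critical value for a scale and age group."""
    return table[scale][AGE_GROUPS.index(age_group(age))]


def raters_differ(t_score_a: int, t_score_b: int, critical: int) -> bool:
    """Step 4: significant (p < .05) when |difference| >= critical value."""
    return abs(t_score_a - t_score_b) >= critical


# Hypothetical case: a 72-year-old woman, Self-Report T = 66, Observer T = 58.
cv = critical_value(TABLE_I3C, "Impulsivity", 72)
print(cv, raters_differ(66, 58, cv))
```

Keeping the comparison rule separate from the lookup lets the same `raters_differ` check be reused with a critical value read from any of the other tables.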

For example, a 65-year-old woman was rated on the CAARS 2 Observer (full-length) by both her husband and her best friend. Her Impulsivity T-score (Combined Gender normative sample) was 63 when rated by her husband and 59 when rated by her best friend, an absolute difference of 4 points. Because the ratings came from the full-length CAARS 2 and the comparison is Observer to Observer using Combined Gender norms, Table I.2a is the appropriate look-up table. The person being rated is 65 years old, so the clinician used the “60 to 69” column for age. The clinician then found the “Impulsivity” row and noted the critical value of 7. The clinician compared the absolute difference between the two Observer T-scores (|63 − 59| = 4) with the critical value of 7. Because the absolute difference of 4 is less than 7, the difference between these two scores is not statistically significant at the p < .05 level. In other words, the Impulsivity scores produced by her husband’s and best friend’s ratings are not statistically different. Even though these two scores fall into different interpretation guideline categories (“Slightly Elevated” versus “Not Elevated”), the difference is not meaningful. Return to Interpretation Step 5 (see Step 5: Integrate and compare CAARS 2 results in chapter 4, Interpretation) to consider how this non-significant finding relates to other information from the CAARS 2, as well as other sources of information.

In contrast, suppose the same 65-year-old woman was also rated by her supervisor at work, with an Impulsivity T-score of 72. This score is 9 points higher than the husband-rated T-score of 63. Because this is also an Observer-to-Observer comparison, the same critical value of 7 from Table I.2a applies, but this time the absolute difference of 9 exceeds it. In this instance, it is appropriate to report that the supervisor’s rating is significantly higher than the husband’s rating (at the p < .05 level). It would be reasonable to hypothesize that this woman’s difficulties are more evident or more impairing in the workplace than in her home setting (or perhaps that her home setting is optimized to reduce the impact of impulsivity) and to review other information that could support or reject that hypothesis.

CAARS 2 (Full-Length)

Comparisons between Self-Report and Observer T-scores
Comparisons between Two Observers’ T-scores

CAARS 2–Short

Comparisons between Self-Report and Observer T-scores

Table I.3c. CAARS 2–Short Critical Values Denoting Statistically Significant Differences Between Self-Report and Observer Raters: Normative Sample Gender Specific–Females

CAARS 2–Short Scale                Age Group
                                   18–24  25–29  30–39  40–49  50–59  60–69  70+
Inattention/Executive Dysfunction  4      4      4      4      4      4      5
Hyperactivity                      6      5      6      6      6      6      6
Impulsivity                        6      6      6      6      6      6      7
Emotional Dysregulation            5      6      5      5      6      5      6
Negative Self-Concept              6      5      5      6      6      6      7

Comparisons between Two Observers’ T-scores

Differences Across Time

When the same rater completes the CAARS 2 more than once, the scores from those different points in time can be compared statistically. The Reliable Change Index (RCI; see Jacobson & Truax, 1991) is a commonly used method to determine whether a change in scores between test administrations is statistically significant.

A statistically significant result means that the measured change can be attributed to reliable differences between the scores, rather than random fluctuations in behavior or error in measurement. If the CAARS 2 score has increased significantly from the pre-test to the post-test, then the individual’s ratings reflect a significant increase in concerns or worsening of symptoms. If the CAARS 2 score has decreased significantly, then the individual’s ratings reflect a significant decrease in concerns, that is, improvement.
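For reference, the RCI as defined by Jacobson and Truax (1991) can be written as shown below. This is a sketch of the general method, not a confirmed account of how the tabled values in this appendix were derived: the symbols ($x_1$ and $x_2$ for the Time 1 and Time 2 scores, $s_1$ for the standard deviation, $r_{xx}$ for the reliability coefficient) are assumptions, as is the reading that the tabled minimum values correspond to $1.96 \times S_{\text{diff}}$ after rounding.

```latex
RCI = \frac{x_{2} - x_{1}}{S_{\text{diff}}},
\qquad
S_{\text{diff}} = \sqrt{2\,SEM^{2}},
\qquad
SEM = s_{1}\sqrt{1 - r_{xx}}

|RCI| > 1.96
\;\Longleftrightarrow\;
|x_{2} - x_{1}| > 1.96 \times S_{\text{diff}}
\quad (p < .05)
```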

The following tables provide RCI values (p < .05) for comparing CAARS 2 ratings across two points in time. When comparing T-scores, use Table I.5 (full-length CAARS 2) or Table I.7 (CAARS 2–Short). In some instances, it may be necessary to compare raw scores across two points in time, such as when enough time passes that the individual changes normative samples (e.g., a 24-year-old turns 25, moving from the 18–24 years group into the 25–29 years group). When comparing raw scores, use Table I.6 (full-length CAARS 2) or Table I.8 (CAARS 2–Short).

Remember that the scores being compared must be obtained from the same rater about the same person, and the time interval between administrations should be more than four weeks. Note that age and type of normative sample (Combined Gender, Female, or Male) are not considerations for this comparison.
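These preconditions can be checked before any scores are compared. The sketch below is illustrative only: the `Administration` record and its field names are hypothetical, and it treats the "more than four weeks" guidance as a strict cutoff, which is an assumption about how the recommendation would be applied.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Administration:
    """One completed rating form; all field names are hypothetical."""
    rater_id: str
    rated_person_id: str
    completed_on: date


def comparable_across_time(first: Administration, second: Administration) -> bool:
    """True when both forms come from the same rater, describe the same
    person, and were completed more than four weeks apart."""
    same_rater = first.rater_id == second.rater_id
    same_person = first.rated_person_id == second.rated_person_id
    interval = abs(second.completed_on - first.completed_on)
    return same_rater and same_person and interval > timedelta(weeks=4)


baseline = Administration("spouse", "client-01", date(2024, 1, 10))
follow_up = Administration("spouse", "client-01", date(2024, 7, 10))
print(comparable_across_time(baseline, follow_up))
```

Age and normative sample are deliberately absent from the check, since they are not considerations for this comparison.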

To determine whether the difference between scores over time is statistically significant:

  • Find the appropriate table based on the form length (full-length or short form) and the type of score being compared (T-score or raw score).

  • Find the column for the rater type (Self-Report or Observer).

  • Find the row for the scale that is being compared and identify the RCI.

  • Compare the absolute difference between the two scores with the RCI. The difference is significant (p < .05) when it is equal to or greater than the RCI.
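The steps above can be sketched as a small lookup. This is a minimal illustration with hypothetical function and variable names, using the values from Tables I.7 and I.8 (CAARS 2–Short); the direction labels follow the interpretation that an increase reflects more concerns and a decrease reflects fewer.

```python
# Minimum differences (p < .05) transcribed from Table I.7 (T-scores) and
# Table I.8 (raw scores). Each entry is (Self-Report, Observer).
RCI_SHORT = {
    "t_score": {
        "Inattention/Executive Dysfunction": (7, 9),
        "Hyperactivity": (9, 10),
        "Impulsivity": (8, 13),
        "Emotional Dysregulation": (10, 12),
        "Negative Self-Concept": (13, 14),
    },
    "raw": {
        "Inattention/Executive Dysfunction": (6, 7),
        "Hyperactivity": (5, 4),
        "Impulsivity": (4, 6),
        "Emotional Dysregulation": (4, 5),
        "Negative Self-Concept": (5, 5),
    },
}

RATER_COLUMN = {"Self-Report": 0, "Observer": 1}


def reliable_change(time1: int, time2: int, scale: str,
                    rater: str, score_type: str = "t_score") -> str:
    """Classify the change from Time 1 to Time 2 against the tabled RCI."""
    rci = RCI_SHORT[score_type][scale][RATER_COLUMN[rater]]
    change = time2 - time1
    if abs(change) < rci:
        return "no significant change"
    # A significant drop means fewer reported concerns; a rise means more.
    return "significant decrease" if change < 0 else "significant increase"


# Hypothetical short-form T-scores from the same Observer at two time points.
print(reliable_change(78, 66, "Hyperactivity", "Observer"))
```

The function reports only statistical significance and direction; whether a change is clinically meaningful remains a matter of clinical judgment.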

For example, a spouse’s rating on the full-length CAARS 2 of a 34-year-old man resulted in a baseline Hyperactivity T-score of 85 and a post-treatment T-score of 70 after 6 months in treatment. The absolute difference between these scores is 15. Because the comparison involves T-scores from the full-length CAARS 2, the clinician identified Table I.5 as the correct look-up table. Using the Observer column and the Hyperactivity row, the clinician identified the RCI value of 12. Because the absolute difference of 15 is greater than the RCI of 12, this change is statistically significant at the p < .05 level. The score dropped 15 points between Time 1 and Time 2, indicating a decreased report of concerns; this difference suggests improvement in the area of Hyperactivity. The score remains in the Very Elevated range, which means this man likely continues to have challenges related to his restlessness and activity levels, but he is making progress. In addition to stating that the change is statistically significant, the clinician might also determine that the change is clinically meaningful, given observable improvement.

Conversely, the spouse’s ratings for the Inattention/Executive Dysfunction scale were T-score = 74 at baseline and T-score = 70 at post-treatment, an absolute difference of 4 points. Using the same table (Table I.5 for full-length, T-score comparison) and column (Observer), the Inattention/Executive Dysfunction RCI is 9. Because the absolute difference of 4 is smaller than the RCI of 9, this difference is not statistically significant. Clinical judgment is needed to determine whether this result indicates a need to extend the treatment plan and/or revise interventions (e.g., changing medication type/dosage, increasing the frequency of therapy, revising therapy modalities).

CAARS 2 (Full-Length)

CAARS 2–Short

Table I.7. CAARS 2–Short Minimum Values Needed for Significance When Comparing Time 1 to Time 2 T-scores

Scale                              Self-Report  Observer
Inattention/Executive Dysfunction  7            9
Hyperactivity                      9            10
Impulsivity                        8            13
Emotional Dysregulation            10           12
Negative Self-Concept              13           14

Table I.8. CAARS 2–Short Minimum Values Needed for Significance When Comparing Time 1 to Time 2 Raw Scores

Scale                              Self-Report  Observer
Inattention/Executive Dysfunction  6            7
Hyperactivity                      5            4
Impulsivity                        4            6
Emotional Dysregulation            4            5
Negative Self-Concept              5            5