Appendix E: CAARS 2 Scores Reference

This appendix provides additional information about various score types involved in the interpretation of the Conners Adult ADHD Rating Scales 2nd Edition (CAARS™ 2). See chapter 4, Interpretation, for additional guidance.

Raw Scores

Item-level Raw Scores

Most of the items on the CAARS 2 have an item-level raw score ranging from 0 to 3¹, while the Critical Items (Suicidal Thoughts/Attempts, Self-Injury) have item-level raw scores ranging from 0 to 4². Some items include a “Not Applicable” option, and some items on the Observer form also include a “Don’t Know” option. However, these two response options are not assigned a numeric score. Finally, any text responses to the Additional Questions are reviewed qualitatively.

Elevated Items. When an item is endorsed at a level that is relatively rare for a given reference sample, it is flagged as an elevation in the report. Item responses that fall in the upper quartile of the response frequency distribution (i.e., approximately the top 25%) are considered elevated. This designation is particularly meaningful for item-level data (such as Impairment & Functional Outcome Items) when evaluating whether an individual’s item-level response is extreme relative to how others in the selected reference group responded. See chapter 4, Interpretation, particularly Step 4 for information about interpreting item-level elevations; see appendix F for detailed information about item-level elevations on the Associated Clinical Concern Items and Impairment & Functional Outcome Items.

Response Style Analysis Raw Scores

Different methods are used to calculate raw scores for each of the Response Style Analysis metrics. These methods are summarized below. See also chapter 4, Interpretation, Step 1, for information about interpreting Response Style Analysis metrics.

Negative Impression Index: The sum of item scores for items included in this index with responses greater than 1.
Inconsistency Index: The sum of differences between item-pairs included in this index with differences greater than 1 point.
Omitted Items: The number of items without responses (excluding the Additional Questions, which are optional), across the entire CAARS 2.
Pace: The average number of items completed per minute, calculated as the total number of items administered on the CAARS 2 divided by the sum of each item’s response time.
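The four calculations above can be sketched as follows. This is only an illustration under stated assumptions: the items and item-pairs that make up each index are not listed in this appendix, so the inputs are hypothetical; pair differences are assumed to be absolute differences; response times are assumed to be recorded in seconds; and omitted responses are represented as None.

```python
from typing import Optional, Sequence, Tuple


def negative_impression_raw(item_scores: Sequence[Optional[int]]) -> int:
    """Sum the scores of index items whose response is greater than 1."""
    return sum(s for s in item_scores if s is not None and s > 1)


def inconsistency_raw(pairs: Sequence[Tuple[int, int]]) -> int:
    """Sum the item-pair differences that exceed 1 point.

    Assumption: the difference is taken as an absolute value.
    """
    diffs = (abs(a - b) for a, b in pairs)
    return sum(d for d in diffs if d > 1)


def omitted_items(responses: Sequence[Optional[int]]) -> int:
    """Count items without a response (Additional Questions excluded upstream)."""
    return sum(1 for r in responses if r is None)


def pace(n_items: int, response_times_sec: Sequence[float]) -> float:
    """Items per minute: items administered divided by total response time.

    Assumption: response times are stored in seconds.
    """
    total_minutes = sum(response_times_sec) / 60.0
    return n_items / total_minutes


# Hypothetical inputs, for illustration only.
print(negative_impression_raw([0, 1, 2, 3]))      # 2 + 3 = 5
print(inconsistency_raw([(0, 3), (2, 1), (3, 1)]))  # 3 + 2 = 5
print(omitted_items([0, None, 2, None]))           # 2
print(pace(10, [6.0] * 10))                        # 10 items in 1 minute = 10.0
```

Note that responses of 0 or 1, and pair differences of 0 or 1, contribute nothing to the first two sums; only the more extreme values are accumulated.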

Raw Scale Scores

Content Scale and DSM Symptom Scale raw scores are based on the sum of item scores for each scale. Because each scale has a different number of items, the possible raw score range varies (see appendix A for item count per scale). For all scales on the CAARS 2, higher raw scores indicate higher levels of concern in a given area. For example, a Hyperactivity raw score of 28 indicates a considerably higher report of hyperactive behaviors or severity than a Hyperactivity raw score of 4. For the same reason that the ranges differ, it is not meaningful to compare raw scores across scales (e.g., a raw score of 20 is relatively high for Negative Self-Concept, but on the lower side for Inattention/Executive Dysfunction).
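As a rough sketch of the summation described above, the following assumes 0 to 3 item scoring, complete responses, and the usual reverse-scoring convention (maximum score minus the response) for reverse-keyed items. The item identifiers and the helper name are hypothetical; the actual item-to-scale assignments are given elsewhere in the manual.

```python
from typing import AbstractSet, Mapping


def raw_scale_score(
    responses: Mapping[str, int],
    reverse_keyed: AbstractSet[str] = frozenset(),
    max_item_score: int = 3,
) -> int:
    """Sum item scores for one scale, reversing reverse-keyed items.

    Assumptions: every item has a numeric response, and a reverse-keyed
    item is scored as (max_item_score - response).
    """
    total = 0
    for item_id, score in responses.items():
        if not 0 <= score <= max_item_score:
            raise ValueError(f"score out of range for {item_id}: {score}")
        total += (max_item_score - score) if item_id in reverse_keyed else score
    return total


# Hypothetical three-item scale; "item_b" is reverse keyed.
example = {"item_a": 2, "item_b": 0, "item_c": 3}
print(raw_scale_score(example, reverse_keyed={"item_b"}))  # 2 + 3 + 3 = 8
```

The maximum possible raw score under these assumptions is simply the number of items multiplied by 3, which is why the range differs from scale to scale.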

In some circumstances, such as monitoring treatment or intervention progress, it may be relevant to examine whether a person’s raw score for a given CAARS 2 scale has changed (see Differences Across Time in appendix I); however, it is essential to remember that raw scores do not take into account important differences between scales, rater types, or the individual’s demographic characteristics. For example, a raw score of 35 on the Inattention/Executive Dysfunction scale is very unusual at 70+ years of age for the Observer Combined Gender Normative Sample, but more common for 18- to 24-year-olds. Hence, in most cases, it is more meaningful and appropriate to use standardized scores where they are available, such as for the Content Scales and the DSM Symptom Scales.

Standardized Scores

Raw scores on the CAARS 2 are transformed into standardized scores to ease interpretation. T-scores and percentiles are used to compare an individual’s reported symptoms and associated features with what is typical of other adults in the selected reference group (for more detail, see Understanding Reference Samples in chapter 3, Administration and Scoring). This comparison with the reference sample guides the interpretation of results from a single CAARS 2 administration by informing judgments as to how typical or atypical the results are for members of that selected group. Standardized scores are also helpful when comparing results across different scales, different raters, and/or different points in time. Standardized scores on the CAARS 2 are scaled in the same direction; higher scores indicate higher levels of concern or atypicality based on the comparison of that set of ratings with the selected reference sample.

T-scores and Confidence Intervals (CI)

A T-score is a standardized transformation of a raw score with a mean of 50 and a standard deviation of 10 (see Standardization Procedures in chapter 7, Standardization, for more information about how T-scores on the CAARS 2 were derived). T-scores are provided for CAARS 2 Content Scales and DSM Symptom Scales. Some assessors may interpret a score one standard deviation above the mean as elevated (i.e., T-score ≥ 60), while others set a stricter threshold of 1.5 or even 2 standard deviations above the mean (i.e., T-score ≥ 65 or ≥ 70). Interpretation guidelines for CAARS 2 T-scores are provided in Table 4.5 in chapter 4, Interpretation.
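The relationship between standard deviations and T-score thresholds can be illustrated with the generic T-score definition (mean 50, standard deviation 10). This sketch is not the CAARS 2 norming procedure, which is described in chapter 7; it only shows the arithmetic behind the thresholds, along with the report minimum of 10 noted in this appendix. The function name is hypothetical.

```python
def t_from_z(z: float, floor: float = 10.0) -> float:
    """Convert a z-score to the T-score metric (mean 50, SD 10).

    The floor mirrors the report minimum of 10; no upper cap is applied.
    """
    return max(floor, 50.0 + 10.0 * z)


# Thresholds corresponding to 1, 1.5, and 2 standard deviations above the mean.
for n_sd in (1.0, 1.5, 2.0):
    print(n_sd, t_from_z(n_sd))  # 60.0, 65.0, 70.0

# A value 5 SDs below the mean would be reported at the floor.
print(t_from_z(-5.0))  # 10.0
```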

T-scores in the CAARS 2 report have a minimum of 10 (i.e., 4 standard deviations below the mean, as scores this low are exceedingly rare) and can go above 100 (i.e., there is no upper cap). Note that CAARS 2 items were designed to capture the upper end of the spectrum of behavior or symptoms, not both upper and lower extremes. Low scores represent the absence of high endorsement and should not be interpreted as the opposite of any construct. For example, while a high score for Inattention/Executive Dysfunction represents difficulty with this construct, a low score on Inattention/Executive Dysfunction indicates the absence of dysfunction and does not imply a particular skill or ability in this domain. An extremely low score, however, is a rare occurrence. When a CAARS 2 profile includes several extremely low scores, consider whether there may be reasons for reporting below-average levels of concern. See chapter 4, Interpretation, particularly Steps 3a and 3b, for guidance regarding the interpretation of the CAARS 2 Content Scales and DSM Symptom Scales.

All measurements contain error (see Naglieri & Chambers, 2009; see also Internal Consistency and Standard Error of Measurement in chapter 8, Reliability); possible measurement error should always be acknowledged whenever scores are presented and compared. Interpreting results while ignoring measurement error may produce misleading conclusions for the user (Oosterwijk et al., 2019; Wilkinson & Task Force on Statistical Inference, 1999). Measurement error in CAARS 2 T-scores is represented by the confidence interval (CI). A CI is provided for each CAARS 2 T-score; the default is 90% with the option of 95%. The CI provides a range of standard scores, at a particular level of probability, within which the individual’s true score could be expected to occur (Harvill, 1991). That is, if an individual were evaluated 100 times and a 90% confidence interval were created each time, then 90 of those 100 confidence intervals would be expected to contain their true score. The width of the confidence interval indicates the precision of the estimate; narrower intervals indicate a greater likelihood that the observed score is very close to the true score (Morris & Lobsenz, 2000). A less reliable scale score (one with greater error in measurement) will have a wider confidence interval than a more reliable one. The level of confidence selected is based on the user’s comfort level. The most commonly used confidence levels, 90% and 95%, are available for CAARS 2 T-scores based on a Normative Sample. These CIs are calculated based on each scale’s standard error of measurement (SEM). See appendix C for information about obtaining CIs for T-scores based on the ADHD Reference Sample.
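A minimal sketch of how a CI can be built from an SEM, assuming the conventional symmetric interval (observed T-score ± z × SEM) and the classical formula SEM = SD × √(1 − reliability). The CAARS 2 scoring software may compute its intervals differently; the function names and the example values (a T-score of 65 and a reliability of .91) are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def sem_from_reliability(reliability: float, sd: float = 10.0) -> float:
    """Classical SEM: SD * sqrt(1 - reliability). SD is 10 on the T-score metric."""
    return sd * math.sqrt(1.0 - reliability)


def t_score_ci(t_score: float, sem: float, level: float = 0.90) -> Tuple[float, float]:
    """Symmetric confidence interval around an observed T-score.

    Assumption: interval = observed score +/- z * SEM, where z is the
    two-sided critical value of the standard normal distribution.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sem
    return (t_score - half_width, t_score + half_width)


sem = sem_from_reliability(0.91)   # approximately 3.0 T-score points
low90, high90 = t_score_ci(65.0, sem, level=0.90)
low95, high95 = t_score_ci(65.0, sem, level=0.95)
print(round(low90, 2), round(high90, 2))  # about 60.07 to 69.93
print(round(low95, 2), round(high95, 2))  # about 59.12 to 70.88
```

In this sketch the 95% interval is wider than the 90% interval for the same SEM, and a lower reliability (hence a larger SEM) widens both, consistent with the description above.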

Percentiles

These standardized scores indicate the proportion of individuals in the selected reference group who obtained a raw score the same as, or lower than, that obtained by the individual who was rated. That is, these scores describe the rank of a person compared to others in the selected reference group. For example, if a score is at the 65th percentile, this value means that 65% of the adults in the selected reference group earned a raw score of the same value or lower. Percentiles range from 1st to 99th and are determined using the cumulative frequency distribution of the selected reference group. Percentiles are provided for CAARS 2 Content Scales and DSM Symptom Scales.
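The "same as or lower than" definition can be sketched with a small cumulative-frequency calculation. The reference scores below are invented for illustration and the function name is hypothetical; the actual CAARS 2 percentiles come from the published reference samples, and any smoothing or rounding rules used there are not reproduced here. The result is limited to the 1st to 99th range stated above.

```python
from typing import Sequence


def percentile_rank(raw_score: float, reference_scores: Sequence[float]) -> int:
    """Percentage of the reference group scoring at or below raw_score.

    The result is clamped to 1..99 to match the reported percentile range.
    """
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    at_or_below = sum(1 for s in reference_scores if s <= raw_score)
    pct = round(100 * at_or_below / len(reference_scores))
    return min(99, max(1, pct))


# Invented reference group of 10 raw scores, for illustration only.
reference = [2, 4, 4, 5, 7, 8, 10, 12, 15, 20]
print(percentile_rank(8, reference))   # 6 of 10 at or below -> 60
print(percentile_rank(25, reference))  # everyone at or below -> clamped to 99
print(percentile_rank(0, reference))   # no one at or below -> clamped to 1
```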

Probability Score

The CAARS™ 2–ADHD Index provides a probability that evaluates the degree to which the obtained raw score is more similar to ratings of individuals with ADHD rather than ratings from the general population. The probability score communicates the likelihood (e.g., 90% likely) that a given pattern of responses resembles responses from individuals with an ADHD diagnosis or classification. The CAARS 2–ADHD Index probability score is based on age and does not differ by gender. It ranges from 1% to 99%. For more details about the development of this score, please see chapter 12, CAARS 2–ADHD Index; for guidance regarding interpretation, see chapter 4, Interpretation, particularly Table 4.6.

T-score versus Percentile?

Although both T-scores and percentiles are provided on the CAARS 2, some users may choose to report a single score type, especially when sharing results with individuals who are not well-versed in statistics. T-scores are typically recommended for the interpretation of CAARS 2 results. The consistency of the scores facilitates comparisons across scales, raters, and time points. No matter where a score falls on the bell curve, 10 T-score points mean the same thing (e.g., 10 T-score points is always 1 Standard Deviation [SD]).

However, it can be difficult to explain T-scores to individuals unfamiliar with statistical concepts such as mean and standard deviation. Some users may find it helpful to report percentiles when explaining CAARS 2 results. The general idea of percentile ranks is more accessible to a wider population, although it may be helpful to clarify that a percentile rank is different from a percentage. One possible explanation is that a high percentile, such as the 95th percentile, means this score is as high as or higher than the scores of 95% of people of similar age (and gender, if Gender-Specific reference samples are selected). Remember, however, that percentiles are less helpful for making comparisons between scores when interpreting data. A 10-point difference means the same thing for T-score comparisons whether the scores are high or low, but a 10-point difference in percentiles means different things depending on where the score falls on the distribution.
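The unequal spacing of percentiles can be illustrated by assuming, purely for demonstration, that T-scores follow a normal distribution with a mean of 50 and a standard deviation of 10. CAARS 2 percentiles are based on the cumulative frequency distribution of the selected reference group, not on this idealized curve, so the values below will not match the report exactly. The names in the sketch are hypothetical.

```python
from statistics import NormalDist

# Idealized T-score distribution (assumption for illustration only).
T_DIST = NormalDist(mu=50.0, sigma=10.0)


def normal_percentile(t_score: float) -> float:
    """Percentage of an idealized normal T-score distribution at or below t_score."""
    return 100.0 * T_DIST.cdf(t_score)


# Each step is the same 10 T-score points (1 SD), but the percentile
# gain shrinks the further the score is from the mean.
for t in (50, 60, 70, 80):
    print(t, round(normal_percentile(t), 1))
# 50 -> 50.0, 60 -> about 84.1, 70 -> about 97.7, 80 -> about 99.9
```

Under this assumption, moving from T = 50 to T = 60 spans roughly 34 percentile points, whereas moving from T = 70 to T = 80 spans only about 2.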

Symptom Counts

Each of the 18 DSM-5-TR symptoms of ADHD (9 Inattentive and 9 Hyperactive/Impulsive) is represented in some form on the CAARS 2. See appendix G for additional information about how symptoms are counted on the CAARS 2 and a list of DSM criteria and corresponding CAARS 2 items.

The sum of counted symptoms for each CAARS 2 DSM Symptom Scale (i.e., ADHD Inattentive Symptoms and ADHD Hyperactive/Impulsive Symptoms) is the Symptom Count, ranging from 0 to 9. These Symptom Counts can be considered when examining the DSM-5-TR requirement that adults must present with at least 5 of 9 symptoms in one or both domains to meet Criterion A for a diagnosis of ADHD. Additional interpretive guidance is provided in chapter 4, Interpretation, Step 3b.
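The counting step can be sketched as follows, assuming each of the 9 symptoms in a domain has already been marked as counted or not counted according to the rules in appendix G (those rules are not reproduced here). The function names are hypothetical, and the check covers only the symptom-count portion of Criterion A; it is not a diagnostic decision.

```python
from typing import Sequence


def symptom_count(counted: Sequence[bool]) -> int:
    """Number of counted symptoms in one DSM Symptom Scale (0 to 9)."""
    if len(counted) != 9:
        raise ValueError("each domain has exactly 9 symptoms")
    return sum(1 for flag in counted if flag)


def meets_adult_count_threshold(
    inattentive_count: int,
    hyperactive_impulsive_count: int,
    threshold: int = 5,
) -> bool:
    """True if at least `threshold` symptoms are counted in one or both domains."""
    return inattentive_count >= threshold or hyperactive_impulsive_count >= threshold


# Invented flags, for illustration only.
inattentive = [True, True, False, True, True, False, True, False, False]
hyper_imp = [False, True, False, False, True, False, False, False, False]

ia = symptom_count(inattentive)   # 5
hi = symptom_count(hyper_imp)     # 2
print(ia, hi, meets_adult_count_threshold(ia, hi))  # 5 2 True
```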

¹ A response of “Not true at all; Never/Rarely” is scored 0, “Just a little true; Occasionally” is scored 1, “Pretty much true; Often/Quite a bit” is scored 2, and “Completely true; Very often/Always” is scored 3. Items are reverse scored when “Not true at all; Never/Rarely” indicates more concern than “Completely true; Very often/Always” (e.g., “I believe in myself.”).

² A response of “Never” is scored 0, “Rarely” is scored 1, “Just a little/Occasionally” is scored 2, “Often/Quite a bit” is scored 3, and “Very often/Always” is scored 4.
