Chapter 4: Interpretation

CAARS 2 Manual

Chapter 4: Step-by-Step Interpretation Guidelines

Step 1: Examine the Response Style Analysis
Step 2: Examine the Associated Clinical Concern Items
Step 3: Interpret the CAARS 2 Scales
Step 4: Review Item-level Responses on the CAARS 2 Scales, Impairment & Functional Outcome Items, and Additional Questions
Step 5: Integrate and Compare CAARS 2 Results (Across Raters and Across Time)
- Within a Single Rater
- Comparing Results
  - Comparing Results Across Multiple Raters
  - Comparing Results Across Different Points in Time

Step 1: Examine the Response Style Analysis

The first step in the interpretation process is to consider how the rater approached the CAARS 2, as results will only be valid if the information provided by the rater is an accurate reflection of the individual who was rated. The CAARS 2 offers several ways to evaluate the rater’s response style, including two Validity Scales (Negative Impression Index, Inconsistency Index) and two additional metrics (Omitted Items, Pace).

When analyzing the Response Style, determining the validity of ratings is typically not a binary “valid” versus “invalid” decision; rather, it is a matter of how much caution to apply when interpreting results. Increased levels of interpretive caution are warranted by extreme or atypical scores on any of the Response Style metrics or when there are multiple issues flagged across these metrics. Conversely, the absence of flags on the Response Style Analysis indicates only that the CAARS 2 was completed in a way consistent with expectations on these specific metrics; in that case, the findings can be interpreted with a greater degree of confidence.

Negative Impression Index. The 6-item Negative Impression Index, embedded in the full-length CAARS 2 and the CAARS 2–Short, identifies unrealistically negative ratings or exaggerated description of problems. The raw score, which ranges from 0 to 18, is compared with a recommended threshold (see Table 4.1). When the Negative Impression Index score exceeds the threshold, it is flagged in the Response Style Analysis section within the Overview of the report to prompt further review.

Items on the Negative Impression Index were chosen because they were endorsed at very low levels in both the General Population Sample and the ADHD Reference Sample (see details on the creation of this index in chapter 6, Development). This low level of endorsement by individuals encouraged to respond honestly means that a high score on the Negative Impression Index likely indicates an extremely negative response style. It is possible, but uncommon, for the Negative Impression Index to be elevated simply due to severe ADHD. When the Negative Impression Index is elevated, consider the following:

Might the rater be exaggerating their concerns out of frustration because they feel unheard or dissatisfied with their functioning or progress?
Might the rater have secondary motivations for obtaining an ADHD identification (e.g., accessing services, medications, and/or accommodations)?

Although an elevated Negative Impression Index score does not immediately invalidate results from the CAARS 2, it should prompt a review of the constituent items (using the Items by Scale section of the report). In some cases, a discussion with the rater to better understand their responses and how they approached the CAARS 2 will help. Additional sources of data (e.g., additional raters, observation data) can also be helpful in understanding an elevated Negative Impression Index. Proceed with interpretation, remembering that the scores may over-represent areas of concern. Please see Case 4, “David,” in chapter 5, Case Studies, for an illustration of interpreting elevated Negative Impression Index scores.

Table 4.1. Interpretation Guidelines for Negative Impression Index

Self-Report Raw Score	Observer Raw Score	Interpretation Guidelines
≥ 6	≥ 7	Warrants follow-up, as it may reflect an attempt to present an unfavorable impression. Review the Items by Scale of the report and other sources of information to determine if this score reflects unrealistically negative ratings, exaggerated description of problems, and/or accurate ratings of problems that rarely occur at the level endorsed.
0–5	0–6	Within the expected range and does not suggest unrealistically negative ratings or an exaggerated description of problems.
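
The cut-offs in Table 4.1 can be expressed as a simple lookup. The following is a minimal sketch, not the scoring software's implementation; the function name, dictionary name, and form labels are hypothetical, and only the thresholds and the 0 to 18 score range come from the text above.

```python
# Lowest raw score in the "warrants follow-up" range, per Table 4.1.
# The keys "self_report" and "observer" are illustrative labels only.
NEGATIVE_IMPRESSION_FLAG_AT = {"self_report": 6, "observer": 7}


def negative_impression_flagged(raw_score: int, form: str) -> bool:
    """Return True when the raw score warrants follow-up for the given form."""
    if form not in NEGATIVE_IMPRESSION_FLAG_AT:
        raise ValueError(f"unknown form: {form!r}")
    if not 0 <= raw_score <= 18:
        raise ValueError("Negative Impression Index raw scores range from 0 to 18")
    return raw_score >= NEGATIVE_IMPRESSION_FLAG_AT[form]


if __name__ == "__main__":
    # The same raw score can be flagged on one form and not the other.
    print(negative_impression_flagged(6, "self_report"))
    print(negative_impression_flagged(6, "observer"))
```

A flag from a helper like this is only a prompt for the item review and follow-up described above; it does not by itself invalidate the ratings.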

Inconsistency Index. The Inconsistency Index, available on the full-length CAARS 2, captures differing responses to seven item pairs expected to yield the same or very similar ratings. Raw scores range from 0 to 21. The raw score is compared with a recommended threshold (see Table 4.2). When the Inconsistency Index score exceeds that threshold, it is flagged in the Overview—Response Style Analysis section of the report to prompt further review.

The pairs of items used for the Inconsistency Index were chosen because they were rated very similarly by individuals in both the normative and clinical samples (for a detailed description of the creation of this index, see chapter 6, Development). Although slight inconsistencies within item pairs are to be expected, it is unusual for there to be numerous or extreme inconsistencies. An elevated score on this index suggests the rater may have been inconsistent in their responses. When the Inconsistency Index is elevated, consider the following:

Might there have been factors that reduced the rater’s consistency (e.g., fatigue, disinterest, careless responding, reading impairment, translation issues if the scale was not administered in their primary language)?
Could the observed inconsistencies reflect actual differences in the behavior of the person being rated? For instance, perhaps slight differences in the items are capturing variations in behavior due to medication status, substance use, transient states (e.g., mood, fatigue, hunger, illness), or settings (e.g., work, school, home).
Is it possible that the rater misunderstood the instructions or misinterpreted some items?
Did the rater genuinely arrive at different interpretations for the paired items?
Could there be motivation for deliberate non-compliance? Did the rater knowingly choose to respond randomly?

For an example of interpreting an elevated Inconsistency Index, see Case 6, “Gloria,” in chapter 5, Case Studies.

The Inconsistency Index is designed to provide the examiner with some insight into whether the responses and scores can be considered valid. As with the Negative Impression Index, an elevated Inconsistency Index does not necessarily mean the CAARS 2 results should be discarded. Review Items by Scale in the report and consider possible interpretations in the context of what you know about the rater and the person being described. Consider contacting the rater to learn more about their responses to items on the Inconsistency Index. Continue with the interpretation, keeping in mind that inconsistent responding may produce spurious results. Additional sources of data (e.g., additional raters, observation data) may be particularly helpful in this situation.

Table 4.2. Interpretation Guidelines for Inconsistency Index

Self-Report Raw Score	Observer Raw Score	Interpretation Guidelines
≥ 4	≥ 4	Warrants follow-up; review the Items by Scale and other sources of information to determine if this score reflects inconsistent, careless, or random responding; comprehension difficulties; or the rater’s interpretation of subtle wording differences within an item pair.
0–3	0–3	Within the expected range; does not suggest inconsistent, careless, or random responding.
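
To make the Table 4.2 cut-off concrete, here is a hedged sketch. The text states only that there are seven item pairs, that raw scores range from 0 to 21, and that a score of 4 or more warrants follow-up. The scoring rule used below (summing the absolute difference within each pair, with items rated 0 to 3) is an assumption chosen because it is consistent with that range; it is not confirmed by this section. All names are hypothetical.

```python
from typing import Iterable, Tuple

INCONSISTENCY_FLAG_AT = 4  # Table 4.2: same cut-off for Self-Report and Observer


def inconsistency_raw_score(pairs: Iterable[Tuple[int, int]]) -> int:
    """ASSUMED rule: sum of absolute rating differences across seven item pairs."""
    pairs = list(pairs)
    if len(pairs) != 7:
        raise ValueError("the Inconsistency Index uses seven item pairs")
    for first, second in pairs:
        if not (0 <= first <= 3 and 0 <= second <= 3):
            raise ValueError("item ratings are assumed to range from 0 to 3")
    return sum(abs(first - second) for first, second in pairs)


def inconsistency_flagged(raw_score: int) -> bool:
    """Return True when the raw score warrants follow-up per Table 4.2."""
    return raw_score >= INCONSISTENCY_FLAG_AT


if __name__ == "__main__":
    example = [(1, 2), (2, 2), (0, 1), (3, 3), (1, 1), (2, 0), (0, 0)]
    score = inconsistency_raw_score(example)
    print(score, inconsistency_flagged(score))
```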

Omitted Items. The impact of missing data on the CAARS 2 varies, depending on which items and how many items are omitted. Digital administration reduces the chance of careless omissions, as the rater must click an “Are you sure you want to skip this item?” message for each omitted item. Paper administrations do not have the same protections, particularly when completed remotely; when completed on-site, administrators are encouraged to skim forms to ensure completion before the rater departs. The Overview—Response Style Analysis section provides an overall count of omitted items. Omitted Items is flagged when any of the CAARS 2 scale scores cannot be scored or must be prorated. Users can easily see which CAARS 2 scales are prorated or cannot be scored by examining the CAARS 2 Scales section of the report. Table B.1 in appendix B provides information about allowances for the total number of omitted items on each scale of the CAARS 2. When there are missing responses, review the Items by Scale section of the report to see which items were omitted (see Report Options section in chapter 3, Administration & Scoring for more details), and consider the following possibilities:

Might there be a deliberate choice to not provide a response (e.g., unsure of consequences, worried about judgment, discomfort answering)?
Is it possible that the rater was careless in responding? (Note that this is less likely to occur in digital administrations compared with paper administrations.)
Were the ratings completed in a setting in which it was challenging to sustain attention?
Is it possible that the rater misunderstood directions instructing them to respond to every item?
Might the item have been left blank because an observer was not familiar with this aspect of the person’s functioning?
Might the rater have intended to return to a skipped item, then forgot to do so? (Note that this is less likely to occur in digital administrations compared with paper administrations.)
Is it possible that the rater accidentally clicked the “Next” button and did not attend to warnings about skipping the item?

With care, portions of the CAARS 2 may still be interpretable even with some omitted items. A flag for Omitted Items does not necessarily mean all results must be discarded but can reflect the rater’s avoidance of particular content. Consider whether the omitted items are randomly distributed or reflect one or more common themes. It may be helpful to examine available information and/or talk with the rater to learn more about the items they skipped and why they were skipped so you can better consider the possible impact on Response Style. Note that any Omitted Items will be revisited in Step 4 in terms of their impact on specific CAARS 2 scales (namely, when they result in prorated or un-scorable scales).

Interpretation Guidelines for Omitted Items

If a CAARS 2 scale cannot be scored due to omitted items, the report will flag this as an area to review. In some cases, a score may be prorated based on completed items; a prorated score is the best estimate of a scale that has omitted items, but it may overestimate or underestimate the actual score.

Pace (online administration only). Pace, available for both full-length and short forms, provides the average number of items the rater completed per minute. In the Normative Sample, the typical pace ranged from approximately 7 to 14 items per minute, although some people spent more or less time on their ratings. A pace slower than 1 item per minute is considered to be unusually slow, whereas a pace of 15 items or more per minute is considered to be unusually fast; both of these extremes will be flagged in the Overview—Response Style Analysis section of the CAARS 2 report (see chapter 6, Development, for information on how these cut-offs were determined). See Table 4.3 for additional interpretation guidelines.

It is possible for a rater to work quickly or slowly and still produce accurate data; however, an unusually fast or unusually slow pace often indicates a response style that could impact interpretation. Consider the following possibilities when the Pace metric is flagged in the report:

Unusually fast pace
- Is it possible the rater is a very fast reader?
- Might the rater have been rushing (e.g., completing the form in the parking lot before an appointment, trying to “get it over with”)?
- Did the rater respond impulsively (such as answering without considering the meaning of each item)?
Unusually slow pace
- Is it possible that the rater was working very carefully to avoid any mistakes (e.g., perfectionistic tendencies, anxious about errors)?
- Was the rater’s reading speed impacted by other factors (e.g., dyslexia, CAARS 2 not being administered in the rater’s preferred language, intellectual disorder, slow processing speed)?
- Was the rater multitasking, distracted, fatigued, working in a chaotic environment with many interruptions, or taking frequent breaks (i.e., they were not on task the whole time that the CAARS 2 administration window was open)?

When Pace is flagged on the report, take time to consider these possible explanations and whether they impact your confidence in the CAARS 2 scores. Consider following up with the rater to learn more about how they completed the CAARS 2 and whether there might be external factors affecting how quickly they moved through the items. When Pace is flagged for Self-Report, consider possible follow-up in other areas (e.g., additional evaluation of cognitive abilities, processing speed, or reading skills).

Table 4.3. Interpretation Guidelines for Pace (For Online Administration Only)

Pace (Average Number of Items Completed per Minute; Self-Report and Observer)	Interpretation Guidelines
≥ 15.0 items per minute	Warrants follow-up; this is an unusually fast pace, which could result from a variety of factors (e.g., reading items quickly, giving little consideration to responses, rushing).
1.0 to 14.9 items per minute	Within the expected range; this is a typical pace.
< 1.0 item per minute	Warrants follow-up; this is an unusually slow pace, which could result from a variety of factors (e.g., interruptions, comprehension difficulties, fatigue, extreme deliberation).

Note. Those using paper-and-pencil administration forms are not advised to hand-calculate Pace, as different response modalities can affect expectations regarding the typical speed of response.
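
The Pace ranges in Table 4.3 can be sketched as follows. This is an illustration under stated assumptions: the report computes Pace itself, and how it measures elapsed time or rounds the result is not described here, so the simple division below is an assumption. Function names and category labels are hypothetical; only the cut-offs come from Table 4.3.

```python
def pace_items_per_minute(items_completed: int, minutes_elapsed: float) -> float:
    """ASSUMED calculation: items completed divided by elapsed minutes."""
    if items_completed < 0 or minutes_elapsed <= 0:
        raise ValueError("need a non-negative item count and a positive duration")
    return items_completed / minutes_elapsed


def classify_pace(pace: float) -> str:
    """Map an items-per-minute value onto the Table 4.3 ranges."""
    if pace >= 15.0:
        return "unusually fast"  # warrants follow-up
    if pace < 1.0:
        return "unusually slow"  # warrants follow-up
    return "expected"            # 1.0 to 14.9 items per minute


if __name__ == "__main__":
    # Hypothetical administration: 90 items answered in 10 minutes.
    pace = pace_items_per_minute(90, 10)
    print(pace, classify_pace(pace))
```

As the note above indicates, these ranges apply to online administration only and should not be applied to hand-timed paper forms.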

Step 2: Examine the Associated Clinical Concern Items

The next interpretation step is to examine ratings provided on the four Associated Clinical Concern Items. The full-length CAARS 2 includes four items that correspond to important clinical symptoms that can co-occur with ADHD.

The two Critical Items refer to the lifetime occurrence of suicidal thoughts/attempts and self-injury. These two items have a different response scale than other CAARS 2 items (see note in Table 4.4). They also include a “Don’t Know” option for observers who lack knowledge about these symptoms. When a Critical Item is endorsed as present at any level (see note under Table 4.4), an alert appears on the Overview—Associated Clinical Concern Items section, prompting the clinician to investigate these symptoms further.
The two Screening Items refer to possible anxiety (viz., anxiety/worry) and depression (viz., sadness/emptiness for the Self-Report form, sadness for the Observer form). The Screening Items are flagged on the Overview—Associated Clinical Concern Items section when endorsed at a level higher than what is typical for the selected Normative Sample. For most of the age groups, an endorsement of 2 (“Pretty much true; Often/Quite a bit”) or 3 (“Completely true; Very Often/Always”) warrants follow-up. Note that any level of endorsement for the sadness item is unusual in adults who are 70 years or older; thus, a response of 1 (“Just a little true; Occasionally”) or higher triggers an alert in this age group.

These Critical Items and Screening Items are included to prompt clinicians to consider potential suicidality, non-suicidal self-harm, anxiety, and depression, even when they are not part of a referral. These single items should not be viewed as sufficient or adequate assessments for any of these complex clinical issues. Manifestations of anxiety and depression can mimic ADHD symptoms (for an example of how the CAARS 2 can inform differential diagnosis in such cases, see Case 2, “Maria,” in chapter 5, Case Studies).

Investigate further, through interview, observation, and other assessment tools, any indication of possible suicidality, non-suicidal self-harm, anxiety, and/or depression (i.e., whenever any of these items is endorsed to a higher degree than “Never” for Critical Items or “Not true at all; Never/Rarely” for Screening Items). Some of these items were commonly endorsed as being present (at low levels) in the normative samples; this does not alter the need for clinical follow-up, but those following up should be aware that varying levels of these concerns occur in particular populations. Users who are interested in additional details can view response frequencies for the Associated Clinical Concern Items in appendix F. Endorsement of these items at any level warrants follow-up; please see Table 4.4 for interpretation guidelines regarding what levels of endorsement are considered unusual for each of these items.

Table 4.4. Interpretation Guidelines for Associated Clinical Concern Items

Item Type	Item Stem	Rating	Interpretation Guidelines
Critical Items¹	Suicidal thoughts/attempts	≥ 1	Immediate follow-up is strongly recommended.
Critical Items¹	Self-injury	≥ 1	Immediate follow-up is strongly recommended.
Screening Items²	Anxiety/worry	≥ 2	Follow-up is strongly recommended.
Screening Items²	Sadness/emptiness³	≥ 2 (ages 18–69); ≥ 1 (ages 70+)	Follow-up is strongly recommended.

¹ Critical Items ask about entire life and have a different set of response options: 0 ("Never"), 1 ("Rarely"), 2 ("Just a little/Occasionally"), 3 ("Often/Quite a bit"), and 4 ("Very often/Always"). The Observer form also includes a "Don't Know" option because some observers may not have knowledge of suicidality or self-injury.

² Response options for the Screening Items are consistent with other CAARS 2 items.

³ The Self-Report form asks about sadness or emptiness; the Observer form asks about sadness.
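
The flagging rules in Table 4.4 can be sketched as two small functions. This is an illustration only; the function names and item labels are hypothetical. How the report treats a “Don't Know” response is not stated in this section, so returning no alert for it is an assumption, and such a response may still merit a direct question to the observer.

```python
from typing import Optional


def critical_item_alert(rating: Optional[int]) -> bool:
    """Critical Items use a 0-4 scale; None stands for the Observer 'Don't Know'."""
    if rating is None:
        # ASSUMPTION: "Don't Know" raises no automatic alert in this sketch.
        return False
    if not 0 <= rating <= 4:
        raise ValueError("Critical Item ratings range from 0 to 4")
    return rating >= 1  # any endorsement above "Never"


def screening_item_alert(item: str, rating: int, age: int) -> bool:
    """Screening Items use the standard 0-3 scale; the sadness cut-off depends on age."""
    if item not in ("anxiety_worry", "sadness"):
        raise ValueError(f"unknown screening item: {item!r}")
    if not 0 <= rating <= 3:
        raise ValueError("Screening Item ratings range from 0 to 3")
    if age < 18:
        raise ValueError("Table 4.4 covers ages 18 and older")
    threshold = 1 if (item == "sadness" and age >= 70) else 2
    return rating >= threshold


if __name__ == "__main__":
    print(critical_item_alert(1))
    print(screening_item_alert("sadness", 1, age=45))
    print(screening_item_alert("sadness", 1, age=72))
```

Note that these alerts mark responses that are unusual for the reference group; per the text above, endorsement at any level still warrants clinical follow-up.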

Step 3: Interpret the CAARS 2 Scales

This step shifts the focus to components of the CAARS 2 that measure key features of ADHD. First, an analysis of the Content Scale T-scores is conducted. Second, the DSM Symptom Scale results are considered from both the relative perspective of T-scores and the absolute perspective of Symptom Counts. Third, the ADHD Index probability score is examined. The final part of Step 3 focuses on the relationships among these different scores, including identifying and interpreting areas of both corroboration and discrepancy. Note that some users may prefer to begin Step 3 with an overview of the general CAARS 2 Scale profile. In addition to the summary of CAARS 2 Scales in the report Overview, the CAARS 2 Scales section outlines all scores.

3a: Examine the Content Scales

The CAARS 2 Content Scales capture information about core problem domains associated with ADHD in adults, including Inattention/Executive Dysfunction, Hyperactivity, Impulsivity, Emotional Dysregulation, and Negative Self-Concept. Coverage of each full-scale CAARS 2 Content Scale is summarized below; note that the short form has fewer items so does not include all domains. It is important to recognize that these general descriptions may not apply to every person who has a scale elevation. Reviewing the rater responses detailed in the Items by Scale section of the CAARS 2 report will help determine which aspects of a given Content Scale are relevant to address.

Inattention/Executive Dysfunction. This scale includes items about difficulties with paying attention to details, concentrating, staying focused, remembering tasks, planning, time management, prioritizing, and organizing.
Hyperactivity. This scale includes items about feeling restless, having difficulty sitting still, tapping hands or feet, talking too much, distracting others, and having trouble doing activities quietly.
Impulsivity. This scale includes items about feeling impatient, rushing through things, interrupting others, blurting out answers, acting before thinking, and having trouble waiting.
Emotional Dysregulation. This scale includes items about difficulty controlling emotions, such as getting easily irritated or frustrated, overreacting, and having angry outbursts.
Negative Self-Concept. This scale includes items about low self-confidence, feeling like a failure, and self-criticism.

CAARS 2 Content Scale scores are reported as T-scores (with confidence intervals of 90% or 95%) and percentiles. An elevated T-score indicates higher ratings in that area than are expected for the average person in the selected reference group. The higher the T-score, the greater the difference between the person being described and what is typical for the reference sample. For example, an Inattention/Executive Dysfunction T-score of 80 indicates that the rater described more issues with things like being easily distracted, concentrating on tasks, and sustaining attention than other similarly aged individuals from the selected reference sample. Conversely, an Impulsivity T-score of 45 indicates that the rater described typical levels of behaviors like speaking out of turn or acting impulsively (compared with the selected reference group). A Content Scale T-score below 60 indicates that the rater described fewer behaviors in this content area (compared with the selected reference group). Interpretation guidelines for these T-scores are displayed in Table 4.5. Note that scores bordering two categories warrant special consideration. For example, a T-score of 69 is in the “Elevated” range, according to Table 4.5, and a T-score of 70 is in the “Very Elevated” range, but this 1-point difference is not a meaningful distinction from a statistical or clinical perspective. Users may choose to describe a T-score of 69 or 70, for example, as “on the border of the Elevated and Very Elevated,” or “in the Elevated to Very Elevated range.” As T-scores approach 60 (i.e., the border between “Not Elevated” and “Slightly Elevated”), consider context and confidence intervals to help determine whether the score represents the high end of the average versus the low end of the clinical range. The guidelines shown in Table 4.5 are approximations and should not be used as absolute rules.

Table 4.5. Interpretation Guidelines for T-scores

T-score Range	Interpretation Guidelines
≥ 70	Very Elevated
65 to 69	Elevated
60 to 64	Slightly Elevated
< 60	Not Elevated

Note. These T-score Interpretation Guidelines are used for all CAARS 2 T-scores, including Content Scales and DSM Symptom Scales.
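
The ranges in Table 4.5 translate directly into a classification helper. The sketch below also includes a hypothetical `near_boundary` check reflecting the advice that scores bordering two categories warrant special consideration; the 1-point margin is an assumption drawn from the 69 versus 70 example, not a rule from the manual.

```python
T_SCORE_CUTS = (60, 65, 70)  # lower bounds of the three elevated ranges in Table 4.5


def classify_t_score(t_score: int) -> str:
    """Return the Table 4.5 guideline label for a T-score."""
    if t_score >= 70:
        return "Very Elevated"
    if t_score >= 65:
        return "Elevated"
    if t_score >= 60:
        return "Slightly Elevated"
    return "Not Elevated"


def near_boundary(t_score: int, margin: int = 1) -> bool:
    """ASSUMED helper: True when the score sits at, or within `margin` below, a cut."""
    return any(cut - margin <= t_score <= cut for cut in T_SCORE_CUTS)


if __name__ == "__main__":
    for t in (57, 66, 69, 70):
        print(t, classify_t_score(t), near_boundary(t))
```

Because the guidelines are approximations, a label from a helper like this is best read alongside the confidence interval and context rather than as an absolute rule.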

3b: Examine the DSM Symptom Scales

On the full-length CAARS 2, the DSM Symptom Scales reflect symptom-level information about ADHD from the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision (DSM-5-TR; American Psychiatric Association [APA], 2022). These scales are not included on the CAARS 2–Short or CAARS 2–ADHD Index forms.

DSM ADHD Inattentive Symptoms. This scale includes items representing the nine DSM-5-TR symptoms of Inattention. Scores for this scale include T-scores with confidence intervals, percentiles, and Symptom Counts.
DSM ADHD Hyperactive/Impulsive Symptoms. This scale includes items representing the nine DSM-5-TR symptoms of Hyperactivity and Impulsivity. Scores for this scale include T-scores with confidence intervals, percentiles, and Symptom Counts.
DSM Total ADHD Symptoms. Although the DSM-5-TR does not refer to a “total” score, this CAARS 2 scale is included to provide a dimensional look at overall symptoms of ADHD without regard for specific DSM presentations¹. As such, this scale is based on the total of all items from the CAARS 2 DSM ADHD Inattentive and Hyperactive/Impulsive Symptom Scales. Scores for this scale include T-scores with confidence intervals and percentiles. Given there is no DSM-5-TR total ADHD symptom score to parallel, no Total Symptom Count is calculated.

DSM Symptom Scale T-scores and percentiles on the CAARS 2 are relative scores, comparing an individual to the selected reference group. As with the Content Scales, an elevated DSM Symptom Scale T-score indicates higher ratings in that area than expected for the selected reference group (see interpretation guidelines in Table 4.5). These DSM Symptom Scale T-scores help users evaluate the first part of DSM Criterion A: that symptoms must be “inconsistent with developmental level.” Although most clinicians are more conscious of developmental levels when evaluating children and adolescents, our data indicate they are a very real consideration when assessing ADHD in adults. Clear age-related differences in the frequency and severity of symptoms emerged among the thousands of people ranging in age from 18 to 70+ years old who comprised the Normative and ADHD Reference samples for the development of the CAARS 2. For example, whereas a raw score of 14 on the DSM Hyperactive/Impulsive Symptom Scale converts to a Not Elevated T-score of 57 for an 18-year-old, it translates to an Elevated T-score of 66 for a 75-year-old. (For additional information about age-related differences, please see the Standardization Procedures section of chapter 7, Standardization.)

In contrast, Symptom Counts on the CAARS 2 are absolute scores, based on an algorithm that is not compared with a reference group (i.e., not adjusted for age). These Symptom Counts suggest which of the 18 DSM-5-TR symptomatic criteria may be relevant to consider towards the DSM threshold of 5 or more symptoms within the Predominantly inattentive and Predominantly hyperactive/impulsive presentations. When a CAARS 2 DSM item is endorsed at a high enough level to meet the DSM-5-TR specification of “often,” a point is added to the Symptom Count. For the correspondence between DSM-5-TR Diagnostic Criterion A symptoms (APA, 2022) and CAARS 2 items, as well as additional technical information about calculating the Symptom Counts, please see appendix G. Each of the 18 symptoms in DSM ADHD Criterion A is represented by at least one item on the full-length CAARS 2 (see appendix G and chapter 6, Development, for additional information). The Symptom Counts for DSM ADHD Inattentive Symptoms and DSM ADHD Hyperactive/Impulsive Symptoms are displayed in the report Overview section, just above the DSM Symptom Scales graph.

Reference for CAARS 2 Symptom Counts

For adults, the DSM-5-TR requires at least 5 of the 9 inattentive symptoms to consider a diagnosis of ADHD Predominantly inattentive presentation, and at least 5 of the 9 hyperactive/impulsive symptoms to consider a diagnosis of ADHD Predominantly hyperactive/impulsive presentation. Individuals who meet these thresholds in both the inattentive and hyperactive/impulsive domains should be considered for a diagnosis of ADHD, Combined presentation.

The CAARS 2 Symptom Counts are not intended to be definitive; rather, they are indicators for clinicians that a diagnosis of ADHD may be relevant. Clinical judgment is necessary to confirm that the symptoms are present at sufficient levels and that other diagnostic criteria are met. For example, although a Symptom Count may fall above the DSM adult threshold of 5 or more symptoms per category, other CAARS 2 scales and additional information may suggest an ADHD diagnosis is unlikely (e.g., when symptoms are not persistent, present in at least two settings, impairing, or developmentally inappropriate). It is appropriate to describe symptoms of a particular ADHD presentation as prominent (or not prominent) after reviewing a CAARS 2 Symptom Count, but a diagnosis should never be made or rejected on the basis of a rating scale alone.
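
The adult threshold described above (at least 5 of 9 symptoms per domain) can be sketched as a small helper that reports which presentation, if any, may be relevant to consider. This is an illustration with hypothetical names; it takes Symptom Counts as given and does not reproduce the item-level algorithm in appendix G. Its output is a prompt for clinical review, never a diagnosis.

```python
from typing import Optional

ADULT_SYMPTOM_THRESHOLD = 5   # DSM-5-TR adult threshold per domain
SYMPTOMS_PER_DOMAIN = 9       # nine inattentive, nine hyperactive/impulsive


def presentation_to_consider(inattentive_count: int,
                             hyperactive_impulsive_count: int) -> Optional[str]:
    """Return the ADHD presentation worth considering, or None if neither count reaches 5."""
    for count in (inattentive_count, hyperactive_impulsive_count):
        if not 0 <= count <= SYMPTOMS_PER_DOMAIN:
            raise ValueError("Symptom Counts range from 0 to 9 per domain")
    inattentive_met = inattentive_count >= ADULT_SYMPTOM_THRESHOLD
    hyperactive_met = hyperactive_impulsive_count >= ADULT_SYMPTOM_THRESHOLD
    if inattentive_met and hyperactive_met:
        return "Combined presentation"
    if inattentive_met:
        return "Predominantly inattentive presentation"
    if hyperactive_met:
        return "Predominantly hyperactive/impulsive presentation"
    return None


if __name__ == "__main__":
    print(presentation_to_consider(6, 3))
    print(presentation_to_consider(4, 4))
```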

Given that relative T-scores and absolute Symptom Counts are based on different approaches, it is possible to have discrepant results within the CAARS 2 DSM Symptom Scales. These potential discrepancies are addressed in Step 3d.

There are several caveats regarding DSM content representation on the CAARS 2. These items are intended to represent key clinical constructs in ways that a layperson can understand. Rewording the professional language of the DSM may elicit more accurate responses but may also result in some aspects of the DSM criteria being incompletely represented. Furthermore, these items focus on DSM Diagnostic Criterion A only (i.e., inattentive and hyperactive-impulsive symptoms); evaluation of additional criteria (e.g., course, level of impairment, pervasiveness, differential diagnosis) must be completed before a DSM diagnosis can be assigned. In other words, critical components of the DSM diagnostic criteria cannot be covered by a rating scale and must be independently determined by the assessor.

Understanding DSM-Related Content on the CAARS 2

Rating scale data must be interpreted within the context of the full DSM criteria.

The art of diagnosis requires years of training and supervised practice, for which no rating scale can substitute. It is tempting to count symptoms from DSM Criterion A and move forward, but this neglects other essential elements contained in DSM guidelines (such as age of onset, presence across settings, persistence over time, impairment, and ruling out competing explanations for symptoms). Any diagnostic tool must be interpreted within the context of these required elements. When responsibly used, results from the CAARS 2 can provide a valuable source of information to inform diagnosis, guide intervention, and assess treatment progress. In addition to the Criterion A symptoms, the CAARS 2 can inform assessments of symptom presence across settings (through the use of observer forms), persistence over time (comparing serial administrations of the CAARS 2), and ADHD-related impairment (with the Impairment & Functional Outcome Items).

3c: Examine the CAARS 2–ADHD Index

The CAARS 2–ADHD Index suggests the probability of an ADHD classification by identifying whether an individual’s ratings are more similar to individuals who have an ADHD diagnosis or to individuals from the general population. The ADHD Index uses an age-based, combined-gender reference group; it does not differ by gender. This empirically derived 12-item index is available on all forms of the CAARS 2 (full-length, short, and the stand-alone ADHD Index), and is reported as a probability score along a continuum from 1% to 99%. In general, higher probability scores indicate more similarity with age-matched peers who have been diagnosed with ADHD (and less similarity with the General Population Sample). Conversely, lower ADHD Index scores indicate less similarity with the ADHD comparison group (and more similarity with the general population). Scores that are in the borderline range do not show a clear pattern of similarity with either group. Interpretation guidelines for the probability score ranges are displayed in Table 4.6. When an ADHD Index probability score is elevated, examine other CAARS 2 metrics to determine whether there is additional support for a possible diagnosis of ADHD. When an ADHD Index score is borderline or low, the person may still qualify for a diagnosis of ADHD, but with the recognition that their presentation differs from what is typically reported for individuals with ADHD on these 12 items. Most of the individuals in the ADHD reference sample had an ADHD Index probability score of 60% or higher; however, a small portion of this sample scored below 60%. Do not rely on one score to determine a diagnosis; consider how the ADHD Index relates to other scores within the CAARS 2 as well as gathering data from other raters and from other methods of assessment. For examples of CAARS 2–ADHD Index interpretation, please see chapter 5, Case Studies (particularly Case 2, “Maria” and Case 4, “David”).

Table 4.6. Interpretation Guidelines for ADHD Index Probability Score

Probability Score	Guideline	Interpretation
90% to 99%	Very High	Scores in this range have very high similarities to scores from individuals who have ADHD and are very dissimilar to scores from individuals in the general population.
60% to 89%	High	Scores in this range have high similarity to scores from individuals who have ADHD and are dissimilar to scores from individuals in the general population.
40% to 59%	Borderline	Scores in this range do not have clear similarities to scores from one group over the other (i.e., individuals who have ADHD versus individuals in the general population).
10% to 39%	Low	Scores in this range have low similarity to scores from individuals who have ADHD and are more similar to scores from individuals in the general population.
1% to 9%	Very Low	Scores in this range have very low similarity to scores from individuals who have ADHD and are much more similar to scores from individuals in the general population.
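
The ranges in Table 4.6 can be expressed as a simple lookup. The Python sketch below is illustrative only and is not part of the CAARS 2 scoring software; the function name `adhd_index_guideline` is hypothetical, and the cut points are copied from the table.

```python
def adhd_index_guideline(probability: int) -> str:
    """Return the Table 4.6 guideline label for an ADHD Index probability score.

    `probability` is the reported probability score as a whole-number
    percentage (1 to 99). The function name is hypothetical.
    """
    if not 1 <= probability <= 99:
        raise ValueError("ADHD Index probability scores range from 1 to 99")
    if probability >= 90:
        return "Very High"
    if probability >= 60:
        return "High"
    if probability >= 40:
        return "Borderline"
    if probability >= 10:
        return "Low"
    return "Very Low"


for score in (95, 72, 45, 20, 3):
    print(score, adhd_index_guideline(score))
```

The label is only a starting point; as noted above, a single score should not be used to determine a diagnosis.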

3d: Review the Profile of CAARS 2 Scales

After reviewing each individual score from the Content Scales, DSM Symptom Scales, and ADHD Index, examine how they relate to each other, paying particular attention to when their results are consistent and when they are discrepant. Note that this interpretive step focuses on similarities and differences within a single rater’s CAARS 2 Scales profile. See Step 5: Integrate and Compare CAARS 2 Results (Across Raters and Across Time) for considering discrepancies across raters or time.

Interpretation is straightforward when various metrics are aligned across Content Scales, DSM Symptom Scales, and the ADHD Index. For example, perhaps all scores related to Hyperactivity and Impulsivity are high, including corresponding Content Scales (Hyperactivity and Impulsivity), DSM Hyperactive/Impulsive Symptoms T-score and Symptom Count, and ADHD Index probability score. In the absence of elevated scores related to inattention, this set of CAARS 2 results would provide strong support for considering a diagnosis of ADHD Predominantly hyperactive/impulsive presentation, as long as other DSM-5-TR criteria were also met.

When faced with a CAARS 2 profile with seemingly inconsistent scores, a discrepancy does not necessarily reflect an error or invalid administration (although it is always wise to double-check data entry if the CAARS 2 was administered through a paper form, and to consider possible response style concerns). The CAARS 2 entails multiple perspectives to enrich a user’s understanding of each person being rated, and these different metrics provide different types of information. When multiple CAARS 2 metrics associated with ADHD are flagged, responsible clinicians will still confirm symptoms along with other DSM criteria, but with greater confidence that a diagnosis of ADHD is likely. When ADHD-related metrics on the CAARS 2 are not aligned, a diagnosis of ADHD is still possible, but clinicians will likely need to dig deeper into the data, follow up with raters, and possibly collect additional information in order to understand the discrepancies and determine whether a preponderance of the data still suggest that an ADHD diagnosis should be considered.

It is important to understand how a single rater’s results on the different CAARS 2 scales relate to each other, including similarities and discrepancies among (a) Content versus DSM Symptom Scales; (b) DSM T-score versus Symptom Count; and (c) DSM Total ADHD Symptoms, DSM ADHD Inattentive Symptoms, DSM ADHD Hyperactive/Impulsive Symptoms, and ADHD Index.

Relationships Between Content Scales and DSM Symptom Scale T-scores

On the full-length CAARS 2, the DSM Symptom Scales are subsets of three Content Scales (Inattention/Executive Dysfunction, Hyperactivity, and Impulsivity; the Items by Scale section of the report indicates which Content Scale items are included in the DSM Symptom Scales). Because the Content Scales include additional items not covered by the DSM scales, it is reasonable to expect that there will be times when Content Scales produce different scores from the DSM Symptom Scales.

When corresponding Content and DSM Symptom Scales are aligned (i.e., both elevated or both not elevated), interpretation is straightforward as the same broad interpretive guideline applies to all. For example, if the Hyperactivity Content Scale T-score is 67, the Impulsivity Content Scale T-score is 88, and the DSM Hyperactive/Impulsive Symptoms Scale T-score is 81, all are elevated (although at different levels) and interpretation is fairly straightforward. In some instances, it would be sufficient to report that this person had more symptoms of hyperactivity and impulsivity than typically seen at their age, with the option of elaborating that impulsivity was more prominent than hyperactivity (followed by item-level details as described in Step 4: Review Item-level Responses on the CAARS 2 Scales, Impairment & Functional Outcome Items, and Additional Questions).

Although less common, it is mathematically possible to have Content and DSM Symptom Scales that fall into different interpretive categories. Consider an example where the T-score on the Inattention/Executive Dysfunction Content Scale is 66 (“Elevated”) and the T-score on the DSM Inattentive Symptoms Scale is 52 (“Not Elevated”). The most likely explanation for this discrepancy is that the non-DSM items were rated more highly than the DSM items; taking a quick look at the optional Items by Scale section of the report will help clarify if this is the case. Conversely, if the Inattention/Executive Dysfunction Content Scale is notably lower than the DSM Inattentive Symptoms Scale (e.g., T-scores of 49 and 68, respectively), it is likely that the DSM items were all rated at high levels and the non-DSM items were rated much lower.
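
The comparison described above can be sketched in a few lines of Python. This is an illustration rather than CAARS 2 software: the function names and return labels are hypothetical, and the only rule assumed is the one used throughout this chapter, namely that a T-score of 60 or higher is elevated.

```python
ELEVATED_CUTOFF = 60  # T-scores of 60 or higher are treated as elevated


def is_elevated(t_score: int) -> bool:
    return t_score >= ELEVATED_CUTOFF


def compare_content_and_dsm(content_t: int, dsm_t: int) -> str:
    """Label how a Content Scale T-score relates to its DSM Symptom Scale T-score.

    The labels are hypothetical shorthand for the scenarios in the text.
    """
    content_high, dsm_high = is_elevated(content_t), is_elevated(dsm_t)
    if content_high == dsm_high:
        return "aligned"
    if content_high:
        # Likely explanation: non-DSM items were rated higher than DSM items.
        return "content-only"
    # Likely explanation: DSM items were rated high, non-DSM items much lower.
    return "dsm-only"


print(compare_content_and_dsm(66, 52))  # content-only
print(compare_content_and_dsm(49, 68))  # dsm-only
print(compare_content_and_dsm(67, 81))  # aligned
```

In either discrepant case, the Items by Scale section of the report is the place to confirm which items drove the difference.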

Relationships Between DSM T-scores and Symptom Counts

Even though T-scores and Symptom Counts are different types of scores, they are calculated from the same DSM Symptom Scale items. When a person presents with classic symptoms of ADHD (including many of the Criterion A features) that are endorsed at higher levels than expected for the selected reference group, their DSM Symptom Scale T-scores and Symptom Counts are both likely to be elevated and aligned within a given ADHD presentation type (viz., Predominantly inattentive or Predominantly hyperactive/impulsive, or Combined if features of both are prominent). In contrast, for a person in the general population (specifically, someone who does not present with many features of ADHD), neither T-scores nor Symptom Counts will be elevated for the DSM Symptom Scales. In fact, for most of the General Population Sample, DSM Symptom Scale T-scores and Symptom Counts are consistent with each other. While many in the ADHD Reference Sample had similar patterns of elevation when comparing T-scores and Symptom Counts, this was not always the case.

The DSM Symptom Scale scores will not always be aligned, however. Given that DSM T-scores and Symptom Counts are different types of metrics (relative vs. absolute) that convey different types of information, they can diverge. The DSM Symptom Scale T-scores help the assessor determine if the person’s reported level of symptoms is higher than expected for that comparison group. In contrast, the CAARS 2 Symptom Counts provide information about how many symptoms were rated at high levels, based on language consistent with the DSM Criterion A. The Symptom Counts are not compared with any of the reference groups (viz., age, gender, or diagnostic group).

Keeping in mind the importance of interpreting rating scale data within a broader context, the pattern of DSM Symptom Scale scores may suggest which presentation of ADHD (if any) is most prominent. Several common patterns are summarized here (see also Table 4.7). Bear in mind that these descriptions refer to DSM Symptom Scale T-score and Symptom Count within a single ADHD category (viz., Inattentive, Hyperactive/Impulsive). When T-scores and Symptom Counts for both DSM ADHD Inattentive and DSM ADHD Hyperactive/Impulsive Symptom Scales are high, consider the possibility of ADHD Combined presentation.

High-high pattern. When the DSM Symptom Scale T-score is elevated (60 or higher) and the corresponding Symptom Count is above the DSM threshold (5 or more) within an ADHD category, symptoms of that particular presentation appear to be more prominent.
High-low pattern. Sometimes an atypically high T-score is not accompanied by a high Symptom Count within an ADHD category. Although an elevated DSM Symptom Scale T-score indicates the person has more features of a particular ADHD presentation than expected for the selected reference group (that is, inconsistent with developmental level), this set of ratings falls below the DSM-5-TR requirement of “at least five symptoms.” If other data support a diagnosis of ADHD, and other disorders and contextual factors have been eliminated as explanations for the person’s presentation, it may be relevant to consider a diagnosis of Other Specified ADHD, noting “with insufficient symptoms” for a specific presentation type.
Borderline-high pattern. Borderline data can be difficult to interpret, such as when the DSM Symptom Scale T-score for a given ADHD presentation is “Slightly Elevated” (60‒64), but the corresponding Symptom Count is 5 or higher. This pattern describes a person who has slightly more features of that presentation than expected for the selected reference group, and at least 5 of the corresponding DSM symptoms occur often. Although the evidence is not as convincing in terms of displaying “more symptoms than expected for developmental level,” a possible diagnosis of ADHD may be appropriate, once all the data emerging from the CAARS 2 and the other components of the evaluation are considered.
Low-high pattern. An even more challenging situation arises when the T-score for a given ADHD category is not elevated (< 60), but the corresponding Symptom Count meets or exceeds the DSM threshold of 5. This combination does not meet DSM-5-TR criteria that symptoms are atypical for developmental level, but many DSM features were endorsed as occurring at least “often.” Although many clinicians disregard the developmental caution in the DSM, it would be prudent to proceed cautiously. If other data indicate persistent and impairing features of ADHD that are present in at least two settings, consider using Other Specified ADHD and indicating that the symptoms do not appear to be developmentally atypical.
Low-low pattern. When the DSM Symptom Scale T-score is not elevated (< 60) and the corresponding Symptom Count is below the DSM-5-TR threshold of 5, then that particular presentation of ADHD appears unlikely.

The DSM was created from a general perspective to facilitate use across the population as a whole. Although this approach makes it easier to use the DSM, it results in a loss of some sensitivity and specificity in different age and gender groups (e.g., Hart et al., 1995; Waschbusch & King, 2006). An advantage of DSM Symptom Scale T-scores on the CAARS 2 is that they take age into account (and gender, if that option is selected); this means they can be more sensitive to atypicality, even when symptoms do not meet the absolute symptom count threshold. For an example of a mismatch between DSM T-scores and Symptom Counts, see Observer data for Case 4, “David” in chapter 5, Case Studies.

Table 4.7. Interpretation Guidelines for Integrating DSM Symptom Scale Scores

T-score	Symptom Count	Interpretation Guidelines
≥ 65	≥ 5	Based on this pattern of results, clinically significant DSM symptoms of ADHD are prominent. A diagnosis of ADHD merits further consideration.¹
≥ 60	< 5	This pattern of results is inconclusive; however, a diagnosis of ADHD cannot be ruled out given that ratings on the DSM Symptom Scales exceeded what is typically reported by similarly-aged individuals (despite the number of endorsed symptoms being lower than the DSM threshold for adults). If other sources of information suggest the possibility of ADHD, a classification of Other Specified ADHD may merit further consideration provided that other disorders or contextual factors do not better account for the client’s presentation.¹
60–64	≥ 5	This pattern of results suggests the presence of DSM symptoms of ADHD. A possible ADHD diagnosis merits further consideration.¹
< 60	≥ 5	A DSM ADHD diagnosis appears unlikely; however, a diagnosis of ADHD cannot be ruled out given the endorsement of numerous ADHD symptoms (despite these symptoms not exceeding what is typically reported by similarly-aged individuals).¹
< 60	< 5	This pattern of results suggests symptoms of ADHD are not prominent. A DSM ADHD diagnosis appears unlikely. However, it is important to consider additional sources of information before ruling out a diagnosis.²

¹ It is essential to consider additional sources of information, to examine whether symptoms are present to an atypical degree, and to determine if both the symptomatic and the additional criteria specified in the DSM are met before assigning a diagnosis.

² In the special subthreshold case where the T-score falls just below the "Elevated" range (T-score = 55–59) and the Symptom Count is 4 (falling just below the DSM threshold for adults), a DSM ADHD diagnosis continues to appear unlikely; however, the endorsement of numerous symptoms of ADHD may warrant further investigation.
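
The rows of Table 4.7 can be read as a small decision rule. The Python sketch below is illustrative only: the function name `dsm_pattern` and the returned labels are hypothetical, and the cut points follow the table (which lists T-score ≥ 65 for the first row, whereas the High-high description in the text refers to 60 or higher). It applies to one ADHD category at a time (Inattentive or Hyperactive/Impulsive).

```python
DSM_SYMPTOM_THRESHOLD = 5  # DSM threshold for adults, as stated in the text


def dsm_pattern(t_score: int, symptom_count: int) -> str:
    """Return a label for the T-score / Symptom Count pattern per Table 4.7.

    Labels reuse the pattern names from the text; the function itself is a
    hypothetical illustration, not CAARS 2 scoring logic.
    """
    if symptom_count >= DSM_SYMPTOM_THRESHOLD:
        if t_score >= 65:
            return "high-high"
        if t_score >= 60:
            return "borderline-high"
        return "low-high"
    if t_score >= 60:
        return "high-low"
    # Footnote 2: T-score of 55 to 59 with a Symptom Count of 4.
    if 55 <= t_score <= 59 and symptom_count == 4:
        return "low-low (subthreshold, see footnote 2)"
    return "low-low"


for t, count in [(70, 6), (62, 5), (55, 7), (68, 3), (57, 4), (50, 2)]:
    print(t, count, dsm_pattern(t, count))
```

Whatever label results, footnote 1 still applies: the additional DSM criteria and other sources of information must be considered before assigning a diagnosis.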

Relationships Among DSM Total ADHD Symptoms, DSM ADHD Inattentive Symptoms, DSM ADHD Hyperactive/Impulsive Symptoms, and ADHD Index

A person with a prototypical presentation of ADHD is likely to have elevated scores for all of these metrics, including the DSM Total ADHD Symptoms T-score, DSM Symptom Scale scores (T-score and Symptom Count, in one or both presentation types), and the ADHD Index probability score. A person who does not have ADHD is likely to have scores that are not elevated across all of these metrics.

It is possible for scores on these metrics to be discrepant since they assess ADHD from different perspectives. The DSM Total ADHD Symptoms T-score is calculated from the sum of item-level scores for the two DSM-based Symptom Scales, taking into account age-based expectations (and gender, when a gender-specific reference group is selected). The DSM Total ADHD Symptoms score helps users describe whether symptoms of ADHD are being reported at higher-than-expected levels. The DSM ADHD Inattentive Symptoms Scale and the DSM ADHD Hyperactive/Impulsive Symptoms Scale are subsets of the DSM Total ADHD Symptoms Scale. Whereas these three scales are based on the DSM-5-TR ADHD criteria, the ADHD Index was built from statistical analyses of which items best differentiated between individuals with and without ADHD. There is very little overlap (viz., only two shared items) between the ADHD Index items and the DSM Symptom Scales. The ADHD Index probability score captures whether an individual’s ratings are more similar to people in the ADHD Reference Sample or in the selected General Population Sample. Interpretive considerations are offered below.

DSM Total ADHD Symptoms versus DSM ADHD Inattentive and DSM ADHD Hyperactive/Impulsive Symptom Scales. The DSM Total ADHD Symptoms T-score is based on responses to the same items that are used to determine the DSM Symptom Scale scores (T-scores and Symptom Counts).
- When the DSM Total ADHD Symptoms T-score is not elevated (T-score < 60) but one or both Symptom Counts (for ADHD Inattentive and/or ADHD Hyperactive/Impulsive) are at or above the DSM threshold (5 or higher), review each of the DSM Symptom Scales in the Items by Scale section of the report. You will likely notice a handful of items rated at a 2 or 3 (“Pretty much true; Often/Quite a bit” or “Completely true; Very Often/Always”) and most of the others rated at a 0 or 1 (“Not true at all; Never/Rarely” or “Just a little true; Occasionally”). This pattern suggests that the person has some features of ADHD, although not more than expected for their age (and gender, if a gender-specific reference group was selected). Consider additional sources of information to better understand why features of ADHD are not more prominent on the CAARS 2, and to determine whether there is sufficient support to consider a DSM diagnosis of ADHD.
- When the DSM Total ADHD Symptoms T-score is elevated (T-score ≥ 60) with Symptom Count(s) below the DSM threshold of five, item-level analysis of the DSM Symptom Scales will likely reveal many low-level endorsements of 1 (“Just a little true; Occasionally”) with fewer than five items rated at a 2 or 3 (“Pretty much true; Often/Quite a bit” or “Completely true; Very Often/Always”). This pattern describes a person with more features of ADHD than typically reported for their age group (and gender, if a gender-specific reference group was selected), but with fewer than five items endorsed at the DSM-specified frequency level of “often.” If all available data suggest the possibility of ADHD, a classification of Other Specified ADHD may be appropriate to consider (noting more features than expected for this age but failure to meet strict DSM guidelines for Criterion A).
- It is mathematically possible to have an elevated DSM Total ADHD Symptoms T-score (i.e., equal to or above 60) even when both T-scores for DSM ADHD Inattentive and DSM ADHD Hyperactive/Impulsive Symptom Scales are not elevated (T-scores < 60). However, in these rare cases, the T-scores are never significantly discrepant. In other words, this scenario only happens when both DSM Symptom Scales have T-scores in the upper end of the not-elevated range (such as 58 or 59) and the DSM Total ADHD Symptoms T-score is very close to 60. Even though these scores technically fall into different interpretive categories, the difference between 58 and 60 is not meaningful. It is appropriate to consider these cases as “borderline” (meaning the border between “Not Elevated” and “Elevated”) and refer to other sources of information to guide interpretation. If other sources of information suggest the possibility of ADHD, a classification of Other Specified ADHD may merit further consideration provided that other disorders or contextual factors do not better account for the client’s presentation.
- Note that the DSM Total ADHD Symptoms T-score will always be elevated (60 or higher) when both DSM ADHD Inattentive and DSM ADHD Hyperactive/Impulsive Symptom Scales have elevated T-scores.
DSM Total ADHD Symptoms versus ADHD Index. Although these metrics were developed through different methodologies and only have two items in common, they were highly correlated in the normative samples. In most cases, these two scores will be aligned. However, it is possible to obtain discrepancies as group data do not always describe individuals accurately.
- When the DSM Total ADHD Symptoms T-score is elevated (T-score ≥ 60) and the ADHD Index probability score is 59% or lower, the person being described has many textbook features of ADHD but does not show a similar pattern to what is often seen for individuals diagnosed with ADHD in a similar age group. (Note that the ADHD Index does not differentiate by gender.) Consider reviewing the Response Style Analysis to examine hypotheses such as a strong preconceived notion that the person has ADHD or possible diagnosis-seeking for secondary gain. It is possible to have ADHD even when the ADHD Index probability score is 59% or lower. This pattern simply means the person’s presentation is different from the ADHD sample used in the development of the CAARS 2. Because the ADHD Index is based on different items than the DSM Total ADHD Symptoms Scale, it is not surprising to find discrepancies between the ADHD Index and the DSM Total ADHD Symptoms T-score, particularly in the general population.
- Conversely, if the DSM Total ADHD Symptoms T-score is not elevated (T-score < 60) but the ADHD Index probability score is High or Very High (60% or higher), the person has similarities to the ADHD Reference Sample, although DSM symptoms of ADHD were not endorsed at very high levels. Consider other diagnostic possibilities that could account for this mismatch in ratings. The ADHD Index was designed to discriminate between an ADHD Reference Sample and a General Population Sample, not to differentiate among various clinical groups. Other scenarios that can produce this mismatch include benefits from accommodated settings and/or effective interventions that improve core DSM features of ADHD but do not have as much impact on behaviors captured by the ADHD Index.
DSM ADHD Inattentive and DSM ADHD Hyperactive/Impulsive Symptom Scales versus ADHD Index. Because the DSM Total ADHD Symptoms scale contains all of the items from the DSM ADHD Inattentive and DSM ADHD Hyperactive/Impulsive Symptom Scales, much of the information provided in the DSM Total ADHD Symptoms versus ADHD Index is relevant to this comparison. Specific considerations are described below.
- When one of the DSM ADHD Inattentive or DSM ADHD Hyperactive/Impulsive Symptom Scale T-scores is elevated (T-score ≥ 60) and/or one of the Symptom Counts meets or exceeds the DSM threshold of five, but the ADHD Index probability score is 59% or lower, review the DSM Symptom Scale items and the ADHD Index items in the Items by Scale section of the report. A common reason for this pattern is that the person being described has many classic symptoms of that particular presentation of ADHD, but they are not closely aligned with the CAARS 2 ADHD Reference Sample.
- Similarly, when both the DSM ADHD Inattentive and DSM ADHD Hyperactive/Impulsive Symptom Scale T-scores are elevated (T-score ≥ 60) or both of the Symptom Counts meet or exceed the DSM threshold of five with an ADHD Index probability score that is 59% or lower, the person being described has many textbook symptoms of ADHD Combined presentation, but they do not have high similarity with the CAARS 2 ADHD Reference Sample.
- Conversely, if the DSM ADHD Inattentive and DSM ADHD Hyperactive/Impulsive Symptom Scale T-scores are not elevated and the Symptom Counts are below the DSM threshold but the ADHD Index probability score is 60% or higher, the person has similarities to the ADHD Reference Sample, although specific DSM symptoms of ADHD were not endorsed at very high levels.
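
The comparison between the DSM Total ADHD Symptoms T-score and the ADHD Index described above can be summarized as a four-way check. The Python sketch below is a hypothetical illustration (the function name, labels, and follow-up wording are not part of the CAARS 2 report); it uses only the cut points given in the text: T-score ≥ 60 for "elevated" and a probability score of 60% or higher for "High" or "Very High."

```python
FOLLOW_UP = {
    "aligned-elevated": "Confirm symptoms and the other DSM criteria.",
    "aligned-not-elevated": "ADHD-related metrics are not flagged.",
    "dsm-elevated-index-not-high": (
        "Review the Response Style Analysis; presentation differs from "
        "the ADHD Reference Sample."
    ),
    "index-high-dsm-not-elevated": (
        "Consider other diagnostic possibilities, accommodated settings, "
        "or effective interventions."
    ),
}


def total_vs_index(total_t: int, index_probability: int) -> str:
    """Label the DSM Total ADHD Symptoms T-score versus ADHD Index pattern."""
    total_elevated = total_t >= 60
    index_high = index_probability >= 60
    if total_elevated and index_high:
        return "aligned-elevated"
    if not total_elevated and not index_high:
        return "aligned-not-elevated"
    if total_elevated:
        return "dsm-elevated-index-not-high"
    return "index-high-dsm-not-elevated"


pattern = total_vs_index(total_t=66, index_probability=45)
print(pattern, "->", FOLLOW_UP[pattern])
```

The follow-up strings paraphrase the bullets above and are prompts for further review, not conclusions.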

Step 4: Review Item-level Responses on the CAARS 2 Scales, Impairment & Functional Outcome Items, and Additional Questions

Having built a scaffold with results from Response Style Analysis, Associated Clinical Concern Items, and CAARS 2 Scale scores, Step 4 addresses the item-level analysis of CAARS 2 data, including specific items on the Content Scales, DSM Symptom Scales, and ADHD Index, as well as the Impairment & Functional Outcome Items and the Additional Questions.

Items Within Scales

Consider individual items that contributed to the Content Scale, DSM Symptom Scale, and ADHD Index scores. Note that some users may choose to do this within Step 3 as they are reviewing scores from each of the scales. Item-level review is especially relevant when elevated scores are obtained, as individual item elevations can help identify specific behaviors to target or track in treatment. For example, it would make sense to address motoric impulsivity rather than verbal impulsivity if motoric impulsivity items led to an elevated Impulsivity T-score. Even when there are no elevations for T-scores, Symptom Counts, or the probability score, an item-level review can help identify specific items that are rated at high levels. For instance, if a few items on a scale are rated at 3 (“Completely true; Very Often/Always”), and the rest are rated mostly at 0 (“Not true at all; Never/Rarely”), some of the highly rated items might enhance one’s understanding of the person being rated and be meaningful treatment targets despite the scale as a whole not being elevated.

Within the Items by Scale section of the report, individual items for the Content and DSM Symptom scales are flagged as “Elevated” if they were endorsed at higher levels than expected for individuals in that age group (and gender, when the gender-specific reference group is selected). Note that it is possible to have an elevated scale score without any individual items on the scale being rated at a high-enough level to be considered “Elevated.” This pattern may reflect a broad but low or moderate magnitude array of symptoms in the relevant problem domain, which typically occurs when multiple items are endorsed just below the threshold for item-level elevation (i.e., not triggering an item-level flag), but the sum of the items is higher than expected relative to the selected reference group.

While reviewing items within the CAARS 2 scales, also consider omitted items. In addition to informing Response Style, omitted items can impact scales (both scores and content). When a CAARS 2 scale score has been prorated, it is essential to review the items that contributed to that estimated score. Although prorating scores when a limited number of items are omitted is justified, prorated scores can sometimes be misleading. For example, if a rater marked many items on the Hyperactivity scale at a 0 (“Not true at all; Never/Rarely”) or 1 (“Just a little true; Occasionally”) but skipped a few items that they would have rated a 2 (“Pretty much true; Often/Quite a bit”) or 3 (“Completely true; Very Often/Always”), the prorated score would underestimate the level of hyperactivity. Conversely, if a rater endorsed several items on the Impulsivity scale as a 3 but skipped an item that would have been a 0, the prorated score might overestimate their level of impulsivity.
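
A short worked example may help show why prorated scores deserve a second look. The Python sketch below ASSUMES a common prorating approach (the mean of the answered items multiplied by the number of items on the scale); this section does not specify the formula the CAARS 2 actually uses, and the scale lengths, ratings, and function name are hypothetical.

```python
from typing import Optional, Sequence


def prorated_raw_score(ratings: Sequence[Optional[int]]) -> float:
    """Estimate a raw scale score when some items are omitted (None).

    ASSUMPTION: prorating = mean of answered items x total number of items.
    This is a generic approach, not necessarily the CAARS 2 formula.
    """
    answered = [r for r in ratings if r is not None]
    if not answered:
        raise ValueError("cannot prorate a scale with no answered items")
    return sum(answered) / len(answered) * len(ratings)


# Hypothetical 10-item scale: seven items rated 0 or 1, three items skipped.
with_omissions = [0, 1, 0, 1, 1, 0, 1, None, None, None]
estimate = prorated_raw_score(with_omissions)

# If the skipped items would each have been rated 3, the complete raw score
# would be 4 + 9 = 13, so the prorated estimate (about 5.7) is too low.
complete = sum([0, 1, 0, 1, 1, 0, 1, 3, 3, 3])
print(round(estimate, 1), complete)
```

The reverse case works the same way: three items rated 3 plus one skipped item that would have been a 0 yields a prorated estimate of 12 under this assumption, against a complete score of 9.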

Interpretation Guidelines for Item-level Elevations

The Items by Scale section of each CAARS 2 report indicates a rater’s response to items in the Content and DSM Symptom scales, as well as whether their endorsement is “Elevated” or “Not Elevated.” An item is flagged as elevated when it was endorsed at higher levels than expected for individuals in the selected Principal Reference Sample.

Impairment & Functional Outcome Items

The full-length CAARS 2 includes 13 Impairment & Functional Outcome items to help users capture important information about the level of impairment related to problems described on the CAARS 2 and key aspects of functioning that are often affected by symptoms of ADHD. Please see appendix A for a listing of these items. The Impairment & Functional Outcome items are grouped together toward the end of the CAARS 2 and include three types of questions:

Global questions ask the degree to which issues endorsed on the CAARS 2 bother the person or interfere with their life. This information can contribute toward the assessment of distress and impairment.
Items concerning problems in social functioning (including relationships with romantic/marital partners, family members, friends/co-workers/neighbors) and occupational/academic functioning help identify areas of impairment to explore in more depth.
Finally, specific examples of functional outcomes describe difficulties in areas that are often impacted by ADHD, such as achievement, sleep, financial management, domestic responsibilities, driving, and time spent online.

The report’s Overview—Impairment & Functional Outcome Items section lists all items and indicates whether any of the items are endorsed at a level higher than expected for the selected reference group (indicated as “Elevated”; see Figure 4.1). Information about the response frequencies of the Impairment & Functional Outcome Items can be obtained in appendix F.

Some of the Impairment & Functional Outcome Items allow a “Not Applicable” response (easily identified through the report Overview section) to accommodate individuals who have no experience with a particular domain, such as driving, certain types of relationships, and work/school. Follow-up questions regarding why raters marked “Not Applicable” may illuminate possible diagnostic or treatment issues (e.g., past driving trauma, limited social connections, history of unemployment). It is important to recognize that it is possible to have a diagnosis of ADHD even when there are no elevations in the Impairment & Functional Outcome Items, particularly for an individual who is in an optimized environment with many supports and few demands. However, the DSM-5-TR (APA, 2022) requires evidence of clinically significant impairment in social, academic, and/or occupational functioning, as well as the presence of symptoms in at least two settings (e.g., home, work) for a diagnosis of ADHD. The absence of impairment in an otherwise convincing presentation of ADHD may suggest a diagnosis of Other Specified ADHD.

Responses to the Impairment & Functional Outcome Items can provide catalysts for conversation between a clinician and a client, which, in turn, may inform additional assessment, differential diagnosis, and treatment targets. For example, if a rater reports problems in the key functional domains of relationship, home/family, or work/school, follow-up might include an investigation of the connections between these domains and specific concerns reported on the CAARS 2. Endorsement of sleep problems might suggest the use of a sleep-specific assessment tool such as the PROMIS (see appendix H for additional information). Understanding how symptoms impact functioning is an essential part of the diagnostic and treatment planning process.

Interpretation Guidelines for Impairment & Functional Outcome Items

In addition to the qualitative review, two quantitative data points are provided for each of these items. When an Impairment & Functional Outcome Item is rated in the top quartile for the selected reference group, it is identified as “Elevated” in the report’s Overview section. Appendix F provides response frequencies for Impairment & Functional Outcome items to describe how often each item was rated this high (or higher) in the selected Principal Reference Sample (based on age, or age and gender, depending on selection).

Additional Questions

The full-length CAARS 2 and CAARS 2–Short end with two open-ended questions asking about other issues/problems and main strengths/skills. Rater responses are provided in the Overview—Additional Questions section at the end of the report. These unscored qualitative items often provide valuable information. Many raters use the “issues/problems” item to emphasize difficulties already reflected in their ratings, which can suggest high levels of concern about the issues, a strong desire to be heard, or past experiences of being ignored. Sometimes new content will be mentioned here, which can guide additional interview questions or suggest other areas for assessment.

The “strengths/skills” item gives raters a chance to describe positive qualities that can impact prognosis and be incorporated into treatment planning. When an Observer cannot think of any strengths or skills to describe, this absence may indicate extreme frustration or a negative response bias; for Self-Report, the absence of strengths/skills can suggest low self-esteem (an issue that may also be reflected in the Negative Self-Concept content scale). The placement of the strengths/skills item as the last on the scale was intentional, providing raters a chance to end on a positive note to counter the prior focus on problem areas. Identifying strengths and skills is critical for treatment planning as they can be leveraged to improve functioning, compensate for limitations, and bolster self-esteem.

Step 5: Integrate and Compare CAARS 2 Results (Across Raters and Across Time)

The final step of the CAARS 2 interpretation sequence involves integrating all of the rich data obtained through the previous steps. After summarizing findings for each individual rater, CAARS 2 results can be compared across multiple raters and/or multiple points in time. The interpretive context of a multi-informant, multi-modal evaluation is helpful in evaluating different possibilities generated when reviewing data from a single rating scale. Although this manual focuses on the interpretation of the CAARS 2 specifically, users are strongly encouraged to continue this integration process by combining these scores with information gained through other methods (e.g., clinician observations, interview data, record review).

Within a Single Rater

Steps 1 through 4 provide all the necessary data to reach this final step of interpreting a single CAARS 2 form. Begin Step 5 by reviewing your notes from Steps 1 through 4, including responses to the Additional Questions. Then return to the report’s Overview section (see Figure 4.1), and note any flagged areas. Examine the overall profile of T-scores and consider the following interpretive suggestions:

- A few elevated T-scores (60 or higher) in the context of many T-scores that are not elevated (less than 60) can suggest areas that may be important for diagnosis and/or treatment planning.
- A profile with many elevated T-scores (60 or higher) may indicate pervasive difficulties across all aspects of ADHD and related concerns, particularly when many Impairment & Functional Outcome Items are elevated. This profile may also reflect an exaggeration of symptoms, particularly if the Negative Impression Index is also flagged; this pattern is often seen when a person has not felt heard in the past, when they are motivated for a diagnosis to be assigned, or when they perceive that many aspects of their life are not going well.
- A profile where most of the T-scores are between 40 and 59 can indicate that the person’s presentation is typical for the selected reference sample, particularly when accompanied by a Negative Impression Index that is within the expected range (i.e., not flagged in the Response Style Analysis).
- A profile with many T-scores below 40 indicates fewer concerns than typical for the general population, which is uncommon for a clinical referral. In addition to the possibility that the person being rated has fewer difficulties than others their age (and gender, if selected), it may be that a rater has limited insight or awareness of symptoms, that they are in denial, or that they are purposefully minimizing symptoms. Ratings from observers who are not very familiar with a person’s daily functioning can also produce a low profile.
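
The banding used in the suggestions above can be sketched in a few lines of Python. This is an illustrative aid only, not part of the CAARS 2 scoring software: the function name and the example T-scores are hypothetical, and only the cutoffs (60 or higher, 40 to 59, below 40) come from the text. What counts as "a few" or "many" scores in a band is left to clinical judgment.

```python
from typing import Dict, List

ELEVATED_CUTOFF = 60  # from the text: T-scores of 60 or higher are elevated
LOW_CUTOFF = 40       # from the text: T-scores below 40 are lower than typical


def summarize_profile(t_scores: Dict[str, int]) -> Dict[str, List[str]]:
    """Group scale names into the three T-score bands described in the text."""
    bands: Dict[str, List[str]] = {"elevated": [], "typical": [], "low": []}
    for scale, t in t_scores.items():
        if t >= ELEVATED_CUTOFF:
            bands["elevated"].append(scale)
        elif t >= LOW_CUTOFF:
            bands["typical"].append(scale)
        else:
            bands["low"].append(scale)
    return bands


# Hypothetical T-scores for one rater, for illustration only.
example_scores = {
    "Impulsivity": 72,
    "Negative Self-Concept": 48,
    "DSM Total ADHD Symptoms": 38,
}
profile = summarize_profile(example_scores)
print(profile)
```

A summary like this only organizes the scores; it does not replace the review of Response Style metrics and item-level responses described in Steps 1 through 4.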

When results from the different components of the CAARS 2 are aligned and there are no concerns related to Response Style, Step 5 is quite straightforward. Summarize areas of concern, supplementing with item-level information. When there are discrepancies amongst various scores, look for additional information to guide the interpretation of those results. Examine items contributing to each composite score, especially elevated scores; the Items by Scale section of the report is valuable here (or see appendix A for a list of Items by Scale).

Whether there are discrepancies or not, it is possible that some elevated T-scores (or percentiles) may be based on a handful of very high ratings, whereas others may reflect numerous lower-level responses. Understanding the source of an elevated CAARS 2 scale score will improve diagnostic accuracy as well as help identify meaningful treatment targets. A review of individual items, although always indicated, is especially important for scores that were prorated or could not be calculated due to omitted items, as it can be enlightening to notice any patterns among skipped items, as well as how other items were rated. Extreme ratings (high or low) on items within incomplete scales can skew a prorated score; make note of any extreme ratings. Glance over the entire CAARS 2 and attend to items with the highest ratings as these may suggest areas to prioritize in treatment.

Integrate results from the CAARS 2 with other sources of information, such as background history, clinical observations, observer data, past evaluations, available records, and test data. No rating scale can be responsibly interpreted without this essential context and a diagnosis cannot be assigned on the basis of DSM Criterion A (i.e., symptoms) alone. Use the CAARS 2 as a way to gather relevant data to inform diagnosis, treatment planning, and intervention. Avoid the temptation to rely on a single rating scale or a single score as a source of information.

Comparing Results

After each individual CAARS 2 form is interpreted, results can be compared across raters and across points in time. In a clinical setting, CAARS 2 multi-rater comparisons can be used to corroborate findings across raters and to assess the pervasiveness of symptoms across settings, whereas repeated assessments can be used to describe the persistence of symptoms over time and/or response to intervention. In all these scenarios, there are three key questions:

- Do the results differ across raters or time?
- Are any observed differences statistically significant?
- Are the differences clinically meaningful?

Eyeballing the data is not sufficient, because two scores that look different at a casual glance could actually be statistically similar. Evaluating whether differences between raters or across time are meaningful requires statistical finesse, to increase the certainty that the change is real and not simply the result of random fluctuations or measurement error. Mathematical calculations that take the standard error of measurement into account (see chapter 8, Reliability) help identify whether the difference between two scores exceeds what would be expected to occur by chance (see appendix I for steps to determine statistical significance when comparing CAARS 2 scores between two raters or across two points in time). Be cautious about assuming that two scores are different; take the time to compare them statistically before interpreting the difference.

Not all statistically significant differences are clinically meaningful. First, consider a comparison between two observers who both endorsed extremely high levels of impulsivity for a 20-year-old person (T-scores of 85 and 70, full-length CAARS 2, Normative Sample–Combined Gender). The difference between Impulsivity T-scores of 85 and 70 is 15, much higher than the critical value of 7 (appendix I, Table I.2a), and therefore statistically significant. However, both scores are considered “Very Elevated” on the CAARS 2 and represent high levels of impulsivity as observed by two different raters. The clinician must consider these results in the context of all available information to determine if the statistically significant difference between observers/settings is clinically meaningful. It would be reasonable to report that features of impulsivity are prominent in both settings. Conversely, consider the comparison of results obtained at two different points in time, such as a Time 1 Negative Self-Concept T-score of 71 and a Time 2 T-score of 64 (both full-length CAARS 2 Observer forms). Although the difference between these two scores is not statistically significant (as 7 is less than the look-up value of 12 in Table I.5), the interval change suggests a trend toward clinically meaningful improvement, particularly if accompanied by narrative examples of improved self-concept in daily interactions and observed progress within therapy.
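
The two worked examples above reduce to the same arithmetic: take the absolute difference between two T-scores and compare it with the value looked up in appendix I. The following Python sketch illustrates that check under stated assumptions. The function name is hypothetical, the critical values (7 and 12) are the ones quoted in the examples, and treating a difference exactly equal to the critical value as significant is an assumption; consult appendix I for the actual decision rule and the values that apply to a given scale, form, and comparison.

```python
def compare_t_scores(score_a: int, score_b: int, critical_value: int) -> dict:
    """Compare two T-scores against a critical value looked up in appendix I.

    Assumption: a difference equal to the critical value counts as
    significant; appendix I should be checked for the boundary rule.
    """
    difference = abs(score_a - score_b)
    return {
        "difference": difference,
        "statistically_significant": difference >= critical_value,
    }


# Two observers, Impulsivity T-scores of 85 and 70, critical value 7 (Table I.2a).
across_raters = compare_t_scores(85, 70, critical_value=7)

# Same observer at two time points, Negative Self-Concept T-scores of 71 and 64,
# look-up value 12 (Table I.5).
across_time = compare_t_scores(71, 64, critical_value=12)

print(across_raters)
print(across_time)
```

The check answers only the second key question. Whether a significant difference matters clinically, or whether a non-significant one still signals a trend, remains a judgment for the clinician.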

When comparing scores, whether across multiple raters or across different points in time, always start with examining statistical significance and follow up with consideration of whether the differences are clinically meaningful. Both evaluative perspectives on these comparisons are essential.

Comparing Results Across Multiple Raters

When completing a clinical evaluation of possible adult ADHD, try to obtain information from more than one set of CAARS 2 ratings, ideally including a self-report and at least one observer. Supplementing self-report ratings with ratings from one or more observers can help establish that symptoms are present in more than one setting (e.g., home and work), as required in DSM Criterion C. The CAARS 2 can help a clinician gather information from multiple sources without traveling to different locations or interviewing multiple people. When Self-Report and Observer scores are consistent with each other, this pattern can suggest reasonable insight and awareness on the part of the person being rated. Differences between Self-Report and Observer scores identify areas that require additional consideration by the examiner. In some cases, inter-rater discrepancies occur due to over-reporting of problems (e.g., for secondary gain, or due to high levels of distress) or under-reporting (e.g., due to limited awareness, embarrassment, or denial). Sometimes, observers under-report problems because they do not have the opportunity to see the individual in certain settings (e.g., parents rating a young adult who lives elsewhere while attending school) or because symptoms are masked by supports (e.g., a personal assistant who corrects errors in the workplace). Sometimes discrepancies are not due to either over- or under-reporting of problems, but rather reflect actual differences in the individual’s functioning in different contexts. For example, an individual may be able to control their emotions in a workplace setting, but they may lose control in less structured settings like home. Similarly, an individual may lose the ability to concentrate in noisy environments (e.g., when colleagues are talking or there is construction in the vicinity), but they may find it easy to work effectively in a quiet setting (e.g., a private office). 
Understanding the basis for discrepancies can provide insight into protective factors that are associated with improved functioning, as well as risk factors that limit success. This knowledge can be essential when teaching a person to use situational engineering and self-advocacy to optimize their environment.

Comparing Results Across Different Points in Time

When the same rater completes the CAARS 2 more than once, the scores from those different points in time can be compared. For example, CAARS 2 Self-Report scores from an initial evaluation might be contrasted with self-report data after engaging in executive coaching, or a spouse might complete the CAARS 2 Observer several times during a medication trial where the dosage is varied. When comparing results over time, it is essential to obtain ratings from the same individual (e.g., it would not be appropriate to statistically compare baseline data from a roommate with follow-up data from a new spouse).

A significant decrease in CAARS 2 scores from Time 1 to Time 2 suggests improvement whereas a significant increase in scores may indicate a worsening of symptoms and/or an increased awareness of them. When the difference between assessment points is not statistically significant, the scores should not be interpreted as showing meaningful change. However, change is often gradual; scores that are not statistically discrepant might still indicate a trend toward clinically meaningful change. In such cases, it would be important to confirm the trend by reassessing after another month of treatment. For example, consider results from someone who completed the CAARS 2 prior to treatment (Time 1), 1 month into treatment (Time 2), and 2 months into treatment (Time 3). The difference between Time 1 and Time 2 might not be significant, and the difference between Time 2 and Time 3 might not be significant, but comparing Time 1 and Time 3 may capture statistically and clinically significant change that has occurred.
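
The three-time-point scenario can be illustrated with a short Python sketch. The scores, the look-up value of 12, and the function name are all hypothetical and chosen only to show the pattern; the actual look-up values come from appendix I, and the rule that a difference equal to the look-up value counts as significant is an assumption.

```python
from itertools import combinations
from typing import Dict, Tuple


def pairwise_changes(
    scores: Dict[str, int], critical_value: int
) -> Dict[Tuple[str, str], Tuple[int, bool]]:
    """Return (change, significant) for every pair of time points.

    A negative change means the later score is lower than the earlier one.
    """
    results = {}
    for (label_a, a), (label_b, b) in combinations(scores.items(), 2):
        change = b - a
        results[(label_a, label_b)] = (change, abs(change) >= critical_value)
    return results


# Hypothetical T-scores from the same rater at three administrations.
scores_over_time = {"Time 1": 71, "Time 2": 64, "Time 3": 57}
changes = pairwise_changes(scores_over_time, critical_value=12)

for pair, (change, significant) in changes.items():
    print(pair, change, "significant" if significant else "not significant")
```

With these illustrative numbers, neither month-to-month drop of 7 points reaches the look-up value, but the 14-point drop from Time 1 to Time 3 does, which matches the gradual-change pattern described above.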

Knowing the context for why a rater completed the CAARS 2 more than once can guide the interpretation of stable or changing scores. For example, sometimes the CAARS 2 is given twice before starting treatment to establish the persistence or course of symptoms, and other times it is given before, during, and after treatment to assess interval change. The CAARS 2 might also be re-administered to assess adjustment after a change in life circumstances, or before a particular event to document a need for accommodations. (Please see Repeated Administrations of the CAARS 2 in chapter 3, Administration and Scoring, for additional information.)

Similar scores from two different points in time suggest that the person’s presentation has not shifted significantly, or that the amount of change is consistent with age-related expectations for their selected reference group. Changes between CAARS 2 scores over two administrations suggest variation in the person’s presentation. Increased elevations in CAARS 2 scores over time indicate worsening of symptoms (i.e., symptoms have increased in prominence, frequency, and/or severity) and/or an increased awareness of symptoms; decreased scores suggest changes that have positively impacted ADHD symptoms and/or associated features.

¹ A single score does not rule in or rule out a diagnosis. Although many individuals in the ADHD Reference Sample had an Elevated DSM Total ADHD Symptoms T-score, there were some individuals with a clinical diagnosis of ADHD who scored in the Not Elevated range.
