CAARS 2 Manual Chapter 12: Development
The creation of the CAARS 2–ADHD Index began with the identification of items from the full-length CAARS 2 that best
differentiated between individuals from the general population (i.e., without a clinical diagnosis) and those with
ADHD. The ADHD Index was designed to comprise a limited number of the most discriminating items, so as to provide a
brief and efficient tool that could be used independently (e.g., for screening purposes) or in conjunction with
other CAARS 2 scores. Development of the CAARS 2–ADHD Index began upon the completion of the CAARS 2 standardization
phase (see Item Development in chapter 6, Development) and relied on the same samples and item pool to identify the
best-performing items.
Samples
To begin the item selection phase, the Total Samples of the CAARS 2 (see Standardization Phase in chapter 6, Development)
were divided into two subsets: individuals without a clinical diagnosis (non-clinical General Population samples)
and the ADHD Reference Samples. Note that individuals diagnosed with ADHD-Hyperactive/Impulsive Presentation were
excluded from the analyses due to their underrepresentation in the ADHD Reference Samples (N = 10 for Self-Report;
N = 9 for Observer). The General Population samples were substantially larger than the ADHD samples, and
unbalanced samples can cause problems when classifying groups (as prediction will naturally favor the larger
group). Therefore, samples were balanced by matching individuals in the ADHD Reference sample with a corresponding
individual from the General Population in terms of age, gender, race/ethnicity, and education level (EL). Coarsened
matching was used to facilitate the creation of comparable samples (e.g., EL groups were collapsed such that No High
School Diploma [EL 1] and High School Diploma [EL 2] were combined when seeking matching samples). The matched pairs
of individuals from the General Population and individuals diagnosed with ADHD were then split into a training
sample and a validation sample, at a 70/30 ratio, respectively. The demographic characteristics for the rated
individuals and the raters are found in Tables 12.1 to 12.3.
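The matching and splitting procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the CAARS 2: the field names (`age`, `gender`, `race`, `el`), the 10-year age bands, the random choice within a stratum, and the helper names `coarsen`, `match_pairs`, and `split_pairs` are all hypothetical. Only the collapsing of EL 1 with EL 2 and the 70/30 split of matched pairs come from the text.

```python
import random
from collections import defaultdict


def coarsen(person):
    """Return a coarsened matching key (the binning rules here are assumptions)."""
    # From the text: EL 1 and EL 2 are collapsed into one education group.
    el = 2 if person["el"] in (1, 2) else person["el"]
    age_band = person["age"] // 10  # assumed: 10-year age bands
    return (age_band, person["gender"], person["race"], el)


def match_pairs(adhd, general, seed=0):
    """Pair each ADHD case with one unused General Population case with the same key."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for person in general:
        pool[coarsen(person)].append(person)
    for candidates in pool.values():
        rng.shuffle(candidates)  # pick matches at random within a stratum
    pairs = []
    for case in adhd:
        candidates = pool.get(coarsen(case))
        if candidates:  # cases with no available match are left unpaired
            pairs.append((case, candidates.pop()))
    return pairs


def split_pairs(pairs, train_frac=0.7, seed=0):
    """Split matched pairs (not individuals) into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records, for illustration only.
adhd = [{"id": f"A{i}", "age": 30 + i % 5, "gender": "F", "race": "White", "el": 1 + i % 2}
        for i in range(10)]
general = [{"id": f"G{i}", "age": 30 + i % 9, "gender": "F", "race": "White", "el": 2}
           for i in range(30)]

pairs = match_pairs(adhd, general)
train, validation = split_pairs(pairs)
print(len(pairs), len(train), len(validation))
```

Splitting at the level of pairs, as in this sketch, keeps both members of a matched pair in the same subset, so the training and validation samples each remain balanced between groups.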
Table 12.1. Demographic Characteristics of the Rated Individuals: CAARS 2–ADHD Index Self-Report Training and Validation Samples
Demographic | Category | Training: General Population N | Training: General Population % | Training: ADHD N | Training: ADHD % | Validation: General Population N | Validation: General Population % | Validation: ADHD N | Validation: ADHD %
Gender | Female | 95 | 54.0 | 94 | 53.4 | 24 | 35.3 | 24 | 35.3
Gender | Male | 81 | 46.0 | 81 | 46.0 | 44 | 64.7 | 43 | 63.2
Gender | Other | 0 | 0.0 | 1 | 0.6 | 0 | 0.0 | 1 | 1.5
U.S. Race/Ethnicity | Asian | 3 | 1.7 | 1 | 0.6 | 1 | 1.5 | 1 | 1.5
U.S. Race/Ethnicity | Black | 9 | 5.1 | 4 | 2.3 | 1 | 1.5 | 0 | 0.0
U.S. Race/Ethnicity | Hispanic | 9 | 5.1 | 7 | 4.0 | 6 | 8.8 | 7 | 10.3
U.S. Race/Ethnicity | White | 122 | 69.3 | 130 | 73.9 | 51 | 75.0 | 45 | 66.2
U.S. Race/Ethnicity | Other | 4 | 2.3 | 7 | 4.0 | 2 | 2.9 | 2 | 2.9
Canadian Race/Ethnicity | Not a visible minority | 29 | 16.5 | 21 | 11.9 | 6 | 8.8 | 12 | 17.6
Canadian Race/Ethnicity | Visible minority | 0 | 0.0 | 6 | 3.4 | 1 | 1.5 | 1 | 1.5
U.S. Region | Northeast | 28 | 15.9 | 30 | 17.0 | 12 | 17.6 | 16 | 23.5
U.S. Region | Midwest | 37 | 21.0 | 36 | 20.5 | 8 | 11.8 | 13 | 19.1
U.S. Region | South | 48 | 27.3 | 48 | 27.3 | 31 | 45.6 | 15 | 22.1
U.S. Region | West | 34 | 19.3 | 34 | 19.3 | 10 | 14.7 | 11 | 16.2
Canadian Region | Central | 17 | 9.7 | 13 | 7.4 | 2 | 2.9 | 7 | 10.3
Canadian Region | East | 2 | 1.1 | 3 | 1.7 | 1 | 1.5 | 2 | 2.9
Canadian Region | West | 10 | 5.7 | 9 | 5.1 | 4 | 5.9 | 4 | 5.9
Education Level | No high school diploma | 2 | 1.1 | 2 | 1.1 | 0 | 0.0 | 0 | 0.0
Education Level | High school diploma/GED | 22 | 12.5 | 22 | 12.5 | 10 | 14.7 | 10 | 14.7
Education Level | Some college or associate degree | 70 | 39.8 | 70 | 39.8 | 24 | 35.3 | 24 | 35.3
Education Level | Bachelor’s degree | 51 | 29.0 | 45 | 25.6 | 21 | 30.9 | 20 | 29.4
Education Level | Graduate or professional degree | 31 | 17.6 | 37 | 21.0 | 13 | 19.1 | 14 | 20.6
Diagnosis | ADHD Inattentive | 0 | 0.0 | 81 | 46.0 | 0 | 0.0 | 31 | 45.6
Diagnosis | ADHD Hyperactive/Impulsive | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0
Diagnosis | ADHD Combined | 0 | 0.0 | 94 | 53.4 | 0 | 0.0 | 37 | 54.4
Diagnosis | Anxiety | 0 | 0.0 | 47 | 26.7 | 0 | 0.0 | 17 | 25.0
Diagnosis | Depression | 0 | 0.0 | 51 | 29.0 | 0 | 0.0 | 17 | 25.0
Diagnosis | Other Diagnosis | 0 | 0.0 | 39 | 22.2 | 0 | 0.0 | 13 | 19.1
Diagnosis | No Diagnosis | 176 | 100.0 | 0 | 0.0 | 68 | 100.0 | 0 | 0.0
Age in years M (SD) | | 36.7 (13.3) | | 36.5 (13.0) | | 35.5 (12.3) | | 35.5 (12.4) |
Total | | 176 | 100.0 | 176 | 100.0 | 68 | 100.0 | 68 | 100.0
Note. Anxiety includes Generalized Anxiety Disorder, Panic Disorder, and Social Anxiety Disorder. Depression includes
Major Depressive Disorder, Major Depressive Episode, and Persistent Depressive Disorder. Other diagnoses include
less frequently reported co-occurring diagnoses, such as Autism Spectrum Disorder and Substance-Related and
Addictive Disorders. The sum of diagnoses is greater than the total N because individuals with co-occurring
diagnoses counted towards more than one diagnostic group.
Table 12.2. Demographic Characteristics of the Rated Individuals: CAARS 2–ADHD Index Observer Training and Validation Samples
Demographic | Category | Training: General Population N | Training: General Population % | Training: ADHD N | Training: ADHD % | Validation: General Population N | Validation: General Population % | Validation: ADHD N | Validation: ADHD %
Gender | Female | 59 | 49.2 | 57 | 47.5 | 20 | 48.8 | 20 | 48.8
Gender | Male | 61 | 50.8 | 61 | 50.8 | 21 | 51.2 | 21 | 51.2
Gender | Other | 0 | 0.0 | 2 | 1.7 | 0 | 0.0 | 0 | 0.0
U.S. Race/Ethnicity | Asian | 2 | 1.7 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0
U.S. Race/Ethnicity | Black | 1 | 0.8 | 2 | 1.7 | 1 | 2.4 | 1 | 2.4
U.S. Race/Ethnicity | Hispanic | 9 | 7.5 | 9 | 7.5 | 5 | 12.2 | 3 | 7.3
U.S. Race/Ethnicity | White | 81 | 67.5 | 90 | 75.0 | 27 | 65.9 | 29 | 70.7
U.S. Race/Ethnicity | Other | 4 | 3.3 | 2 | 1.7 | 0 | 0.0 | 2 | 4.9
Canadian Race/Ethnicity | Not a visible minority | 21 | 17.5 | 12 | 10.0 | 7 | 17.1 | 5 | 12.2
Canadian Race/Ethnicity | Visible minority | 2 | 1.7 | 5 | 4.2 | 1 | 2.4 | 1 | 2.4
U.S. Region | Northeast | 17 | 14.2 | 28 | 23.3 | 10 | 24.4 | 9 | 22.0
U.S. Region | Midwest | 25 | 20.8 | 30 | 25.0 | 11 | 26.8 | 7 | 17.1
U.S. Region | South | 36 | 30.0 | 28 | 23.3 | 6 | 14.6 | 11 | 26.8
U.S. Region | West | 19 | 15.8 | 17 | 14.2 | 6 | 14.6 | 8 | 19.5
Canadian Region | Central | 9 | 7.5 | 12 | 10.0 | 6 | 14.6 | 2 | 4.9
Canadian Region | East | 3 | 2.5 | 3 | 2.5 | 1 | 2.4 | 1 | 2.4
Canadian Region | West | 11 | 9.2 | 2 | 1.7 | 1 | 2.4 | 3 | 7.3
Education Level | No high school diploma | 0 | 0.0 | 0 | 0.0 | 1 | 2.4 | 1 | 2.4
Education Level | High school diploma/GED | 15 | 12.5 | 15 | 12.5 | 4 | 9.8 | 4 | 9.8
Education Level | Some college or associate degree | 52 | 43.3 | 52 | 43.3 | 18 | 43.9 | 18 | 43.9
Education Level | Bachelor’s degree | 29 | 24.2 | 23 | 19.2 | 12 | 29.3 | 12 | 29.3
Education Level | Graduate or professional degree | 24 | 20.0 | 30 | 25.0 | 6 | 14.6 | 6 | 14.6
Diagnosis | ADHD Inattentive | 0 | 0.0 | 50 | 41.7 | 0 | 0.0 | 15 | 36.6
Diagnosis | ADHD Hyperactive/Impulsive | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0
Diagnosis | ADHD Combined | 0 | 0.0 | 70 | 58.3 | 0 | 0.0 | 26 | 63.4
Diagnosis | Anxiety | 0 | 0.0 | 28 | 23.3 | 0 | 0.0 | 17 | 41.5
Diagnosis | Depression | 0 | 0.0 | 35 | 29.2 | 0 | 0.0 | 14 | 34.1
Diagnosis | Other Diagnosis | 0 | 0.0 | 27 | 22.5 | 0 | 0.0 | 10 | 24.4
Diagnosis | No Diagnosis | 20 | 16.7 | 0 | 0.0 | 41 | 100.0 | 0 | 0.0
Age in years M (SD) | | 35.8 (13.4) | | 35.7 (13.2) | | 32.7 (10.4) | | 32.5 (10.3) |
Total | | 120 | 100.0 | 120 | 100.0 | 41 | 100.0 | 41 | 100.0
Note. Anxiety includes Generalized Anxiety Disorder, Panic Disorder, and Social Anxiety Disorder. Depression includes
Major Depressive Disorder, Major Depressive Episode, and Persistent Depressive Disorder. Other diagnoses include
less frequently reported co-occurring diagnoses, such as Autism Spectrum Disorder and Substance-Related and
Addictive Disorders. The sum of diagnoses is greater than the total N because individuals with co-occurring
diagnoses count towards more than one diagnostic group.
Table 12.3. Demographic Characteristics of Raters: CAARS 2–ADHD Index Observer Training and Validation Samples
Rater Demographic | Category | Training: General Population N | Training: General Population % | Training: ADHD N | Training: ADHD % | Validation: General Population N | Validation: General Population % | Validation: ADHD N | Validation: ADHD %
Gender | Female | 68 | 56.7 | 79 | 65.8 | 28 | 68.3 | 24 | 58.5
Gender | Male | 52 | 43.3 | 40 | 33.3 | 13 | 31.7 | 17 | 41.5
Gender | Other | 0 | 0.0 | 1 | 0.8 | 0 | 0.0 | 0 | 0.0
U.S. Race/Ethnicity | Asian | 1 | 0.8 | 1 | 0.8 | 1 | 2.4 | 0 | 0.0
U.S. Race/Ethnicity | Black | 4 | 3.3 | 2 | 1.7 | 3 | 7.3 | 0 | 0.0
U.S. Race/Ethnicity | Hispanic | 16 | 13.3 | 9 | 7.5 | 7 | 17.1 | 5 | 12.2
U.S. Race/Ethnicity | White | 71 | 59.2 | 91 | 75.8 | 22 | 53.7 | 28 | 68.3
U.S. Race/Ethnicity | Other | 2 | 1.7 | 1 | 0.8 | 0 | 0.0 | 1 | 2.4
Canadian Race/Ethnicity | Not a visible minority | 22 | 18.3 | 13 | 10.8 | 7 | 17.1 | 6 | 14.6
Canadian Race/Ethnicity | Visible minority | 4 | 3.3 | 3 | 2.5 | 1 | 2.4 | 1 | 2.4
U.S. Region | Northeast | 16 | 13.3 | 26 | 21.7 | 8 | 19.5 | 8 | 19.5
U.S. Region | Midwest | 24 | 20.0 | 32 | 26.7 | 11 | 26.8 | 8 | 19.5
U.S. Region | South | 38 | 31.7 | 30 | 25.0 | 8 | 19.5 | 10 | 24.4
U.S. Region | West | 16 | 13.3 | 16 | 13.3 | 6 | 14.6 | 8 | 19.5
Canadian Region | Central | 9 | 7.5 | 12 | 10.0 | 6 | 14.6 | 4 | 9.8
Canadian Region | East | 3 | 2.5 | 3 | 2.5 | 1 | 2.4 | 1 | 2.4
Canadian Region | West | 14 | 11.7 | 1 | 0.8 | 1 | 2.4 | 2 | 4.9
Education Level | No high school diploma | 0 | 0.0 | 2 | 1.7 | 0 | 0.0 | 0 | 0.0
Education Level | High school diploma/GED | 22 | 18.3 | 16 | 13.3 | 9 | 22.0 | 5 | 12.2
Education Level | Some college or associate degree | 52 | 43.3 | 31 | 25.8 | 10 | 24.4 | 14 | 34.1
Education Level | Bachelor’s degree | 33 | 27.5 | 39 | 32.5 | 19 | 46.3 | 14 | 34.1
Education Level | Graduate or professional degree | 13 | 10.8 | 32 | 26.7 | 3 | 7.3 | 8 | 19.5
Relation to Individual Being Rated | Spouse | 25 | 20.8 | 66 | 55.0 | 11 | 26.8 | 23 | 56.1
Relation to Individual Being Rated | Friend | 46 | 38.3 | 15 | 12.5 | 14 | 34.1 | 8 | 19.5
Relation to Individual Being Rated | Other Family Member | 48 | 40.0 | 35 | 29.2 | 16 | 39.0 | 9 | 22.0
Relation to Individual Being Rated | Other | 1 | 0.8 | 4 | 3.3 | 0 | 0.0 | 1 | 2.4
Length of Relationship | 1–5 months | 2 | 1.7 | 0 | 0.0 | 1 | 2.4 | 1 | 2.4
Length of Relationship | 6–11 months | 0 | 0.0 | 2 | 1.7 | 0 | 0.0 | 0 | 0.0
Length of Relationship | 1–3 years | 11 | 9.2 | 14 | 11.7 | 1 | 2.4 | 5 | 12.2
Length of Relationship | More than 3 years | 107 | 89.2 | 104 | 86.7 | 39 | 95.1 | 35 | 85.4
How well does the rater know the individual being rated? | Moderately well | 13 | 10.8 | 5 | 4.2 | 4 | 9.8 | 3 | 7.3
How well does the rater know the individual being rated? | Very well | 107 | 89.2 | 115 | 95.8 | 37 | 90.2 | 38 | 92.7
How often does the rater interact with the individual being rated? | Monthly | 15 | 12.5 | 2 | 1.7 | 3 | 7.3 | 1 | 2.4
How often does the rater interact with the individual being rated? | Weekly | 44 | 36.7 | 20 | 16.7 | 12 | 29.3 | 5 | 12.2
How often does the rater interact with the individual being rated? | Daily | 61 | 50.8 | 98 | 81.7 | 26 | 63.4 | 35 | 85.4
Age in years M (SD) | | 39.2 (15.5) | | 37.9 (14.2) | | 39.5 (13.4) | | 34.5 (11.7) |
Total | | 120 | 100.0 | 120 | 100.0 | 41 | 100.0 | 41 | 100.0
Analyses and Results
A gradient boosting machine learning model (GBM; Friedman, 2001) was employed to select items from the CAARS 2
Content Scales and Impairment & Functional Outcome Items that were best able to classify individuals who belong to
the General Population versus ADHD group. Machine learning models are increasingly used to develop and test
psychological assessment tools (Bleidorn & Hopwood, 2019; Dwyer et al., 2018). GBM creates a series of decision
trees using the variables in the model. A decision tree organizes the variables (i.e., items) into a series of steps
that best classify individuals in the sample. In models created by GBM, trees are built sequentially so that each
new tree learns from errors in the previous tree to improve classification accuracy. GBM models have some advantages
over regression models and other more traditional approaches to classification and prediction, including the ability
to handle a large number of items.
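The idea that each new tree learns from the errors of the previous ones can be illustrated with a toy boosting loop. The sketch below is written in Python rather than R and is not the gbm implementation used for the CAARS 2. It assumes one-split trees ("stumps"), a squared-error loss on 0/1 group labels, and a fixed learning rate, none of which are stated in this chapter, and it uses hypothetical item responses. It also accumulates each item's squared-error reduction and rescales the totals to sum to 100, which is one common way to obtain a relative importance statistic in the spirit of Friedman (2001).

```python
def fit_stump(rows, residuals):
    """Find the one-split tree (item j, cut-off t) that best fits the residuals.

    Assumes at least one item takes more than one value in `rows`.
    """
    mean = sum(residuals) / len(residuals)
    parent_sse = sum((r - mean) ** 2 for r in residuals)
    best = None
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows})[:-1]:
            left = [r for row, r in zip(rows, residuals) if row[j] <= t]
            right = [r for row, r in zip(rows, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    sse, j, t, lm, rm = best
    return j, t, lm, rm, parent_sse - sse  # last value: error reduction from this split


def boost(rows, labels, n_trees=20, learning_rate=0.3):
    """Build stumps sequentially, each one fitted to the current residuals."""
    base = sum(labels) / len(labels)
    preds = [base] * len(rows)
    trees, gains = [], [0.0] * len(rows[0])
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(labels, preds)]  # errors of the model so far
        j, t, lm, rm, gain = fit_stump(rows, residuals)
        trees.append((j, t, lm, rm))
        gains[j] += gain
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for p, row in zip(preds, rows)]
    total = sum(gains)
    importance = [100 * g / total for g in gains]  # rescaled to sum to 100
    return (base, learning_rate, trees), importance


def predict(model, row):
    """Sum the base rate and the shrunken contribution of every stump."""
    base, learning_rate, trees = model
    score = base
    for j, t, lm, rm in trees:
        score += learning_rate * (lm if row[j] <= t else rm)
    return score


# Hypothetical responses to three items (0-3 scale); 1 = ADHD, 0 = General Population.
rows = [(3, 1, 0), (2, 2, 1), (3, 0, 2), (2, 3, 0),
        (0, 1, 1), (1, 2, 0), (0, 0, 2), (1, 3, 1)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model, importance = boost(rows, labels)
predicted = [1 if predict(model, row) >= 0.5 else 0 for row in rows]
print(predicted, [round(v, 1) for v in importance])
```

In this toy data set only the first item separates the groups, so it receives essentially all of the importance. With real item pools the importance is spread across many items, which is what makes the statistic useful for ranking them.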
Once the item pool was selected and the samples were created, the first step in the analysis was to tune the model
in the training sample. Parameters that maximized the performance of the model while striking a balance between
complexity and accuracy were selected. A model that is overly complex will have good prediction in the sample but
will not generalize well to other samples; alternatively, a model that is too simple will have poor accuracy (Miller
et al., 2016). K-fold cross-validation was employed in the model-tuning step to select the model with the optimal
complexity. K-fold cross-validation was conducted in a series of five steps: (a) the training sample was split into
five subsets; (b) the model was trained on all but one subset; (c) the model was evaluated in the held-out subset and
the prediction error was calculated; (d) steps (b) and (c) were repeated so that each of the five subsets (k = 5)
served once as the held-out subset; and (e) the prediction error across the five iterations was averaged. Following
these steps, the model with the lowest cross-validation prediction error was selected as the final model. Model
tuning was completed via the caret package v.6.0-86 in R (Kuhn, 2008), and the final tuning parameters were used for
the GBM, via the gbm package v.2.1.7 in R (Ridgeway, 2006).
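The five cross-validation steps can be sketched as follows. This is a Python illustration, not the caret code used for the CAARS 2. The candidate "models" are hypothetical fixed cut-offs on a made-up summed score, standing in for GBM configurations with different tuning parameters, and the function names are invented for this example.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Step (a): shuffle the case indices and deal them into k subsets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_error(xs, ys, fit, predict, k=5, seed=0):
    """Steps (b) to (e): average held-out misclassification rate over k rounds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):  # (d) rotate the held-out subset
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])    # (b) train
        wrong = sum(predict(model, xs[i]) != ys[i] for i in held_out)  # (c) evaluate
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / k                                        # (e) average


def make_candidate(cutoff):
    """Toy stand-in for one tuning configuration: classify as ADHD if score >= cutoff."""
    def fit(xs, ys):
        return cutoff  # nothing to estimate in this toy model

    def predict(model, x):
        return int(x >= model)

    return fit, predict


# Hypothetical data: scores 0-9 belong to group 0, scores 10-19 to group 1.
scores = list(range(20))
labels = [0] * 10 + [1] * 10

cv_errors = {c: cross_val_error(scores, labels, *make_candidate(c)) for c in (5, 10, 15)}
best_cutoff = min(cv_errors, key=cv_errors.get)
print(cv_errors, best_cutoff)
```

The candidate with the lowest averaged held-out error is kept, mirroring the selection rule described in the text.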
The results from GBM analysis with the training sample were used to select variables to be considered for the ADHD
Index. The following classification accuracy statistics were used to help decide between models (see chapter 6,
Development, for a description of each statistic): overall correct classification, sensitivity, specificity,
positive predictive value, negative predictive value, and kappa. As a global construct, classification accuracy
assesses the ability of the model to accurately determine whether the individual’s scores more closely resemble
those from the ADHD or General Population group. Results from the model were compared to the criterion, which, in
this case, was an ADHD diagnosis.
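For readers who want to see how these statistics relate to one another, the sketch below computes them from a 2 x 2 classification table in Python. The cell counts are hypothetical: the manual does not report them. They were chosen only because, with 176 individuals per group, they happen to be consistent with the percentages reported for the 12-item Self-Report solution in Table 12.6. The function name and dictionary keys are likewise invented for this example.

```python
def classification_stats(tp, fn, tn, fp):
    """Classification accuracy statistics from a 2 x 2 table.

    tp/fn: ADHD cases classified as ADHD / as General Population.
    tn/fp: General Population cases classified as General Population / as ADHD.
    """
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column margins (Cohen's kappa).
    expected = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    return {
        "overall": 100 * accuracy,
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
        "kappa": (accuracy - expected) / (1 - expected),
    }


# Hypothetical counts (not from the manual): 176 ADHD and 176 General Population cases.
stats = classification_stats(tp=170, fn=6, tn=169, fp=7)
for name, value in stats.items():
    print(name, round(value, 2 if name == "kappa" else 1))
```

Because the matched groups are the same size, chance agreement is .50 here, so kappa reduces to twice the overall accuracy (as a proportion) minus one.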
The 10 best-discriminating items were selected using the relative importance statistic from the GBM. The
relative importance statistic indicates how important variables are, relative to others in the model (note that
there are no absolute guidelines when interpreting this statistic; instead, it is used to compare items to one
another). A 12-item solution was also tested by expanding the 10-item solution. In instances where the 10-item
solution did not include content from Hyperactivity or Impulsivity, an item from those Content Scales with the
highest relative importance statistic was selected to extend content coverage. Selected items for the 10- and
12-item versions are presented in Tables 12.4 and 12.5, respectively. Note that because the analyses were conducted
independently for Self-Report and Observer ratings, the items selected are not identical (although they share
overlapping content).
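One way to read the selection rule is sketched below in Python. This is an interpretation, not the documented procedure: it assumes that the 12-item solution first adds the highest-ranked item from any of Hyperactivity or Impulsivity that the 10-item solution lacks, and then fills any remaining slots with the next-ranked items. The item labels and importance values are hypothetical.

```python
def select_items(importance, scale_of, n_top=10, n_total=12,
                 required=("Hyperactivity", "Impulsivity")):
    """Return (short, extended) item lists under the assumed selection rule."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    short = ranked[:n_top]
    extended = list(short)
    # Add the best item from any required scale that is not yet represented.
    for scale in required:
        if not any(scale_of[item] == scale for item in extended):
            candidates = [item for item in ranked if scale_of[item] == scale]
            if candidates:
                extended.append(candidates[0])
    # Fill remaining slots with the next-ranked items (an assumption).
    for item in ranked:
        if len(extended) >= n_total:
            break
        if item not in extended:
            extended.append(item)
    return short, extended


# Hypothetical relative importance values; labels encode the originating scale.
importance = {"IN1": 14.0, "IN2": 12.5, "IN3": 11.0, "IN4": 10.0, "IN5": 9.0,
              "IN6": 8.0, "IN7": 7.0, "IN8": 6.0, "HY1": 5.5, "HY2": 5.0,
              "IN9": 4.0, "NS1": 3.0, "IM1": 2.5, "IM2": 2.0}
prefix_to_scale = {"IN": "Inattention/Executive Dysfunction", "HY": "Hyperactivity",
                   "IM": "Impulsivity", "NS": "Negative Self-Concept"}
scale_of = {item: prefix_to_scale[item[:2]] for item in importance}

short, extended = select_items(importance, scale_of)
print(short)
print(extended)
```

In this made-up example the 10-item list already covers Hyperactivity but not Impulsivity, so the extension adds the top Impulsivity item plus the next-ranked remaining item.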
Table 12.4. Candidate Items: CAARS 2–ADHD Index Self-Report
Originating Section | Item Stem | 10-item Solution | 12-item Solution
Inattention/Executive Dysfunction | Difficulty completing tasks | X | X
Inattention/Executive Dysfunction | Needs reminders | X | X
Inattention/Executive Dysfunction | Concentrates only on interesting things | X | X
Inattention/Executive Dysfunction | Loses things | X | X
Inattention/Executive Dysfunction | Overfocused or distracted | X | X
Inattention/Executive Dysfunction | Distracted easily | X | X
Inattention/Executive Dysfunction | Difficulty paying attention | X | X
Inattention/Executive Dysfunction | Difficulty staying focused | X | X
Inattention/Executive Dysfunction | Requires deadlines | | X
Hyperactivity | Talks too much | X | X
Hyperactivity | Fidgets | X | X
Impulsivity | Interrupts others | | X
Table 12.5. Candidate Items: CAARS 2–ADHD Index Observer
Originating Section | Item Stem | 10-item Solution | 12-item Solution
Inattention/Executive Dysfunction | Overfocused or distracted | X | X
Inattention/Executive Dysfunction | Distracted easily | X | X
Inattention/Executive Dysfunction | Difficulty paying attention | X | X
Inattention/Executive Dysfunction | Difficulty staying focused | X | X
Inattention/Executive Dysfunction | Difficulty prioritizing | X | X
Inattention/Executive Dysfunction | Procrastinates | X | X
Inattention/Executive Dysfunction | Needs reminders | | X
Hyperactivity | Talks when they should be quiet | | X
Impulsivity | Impulsive | X | X
Negative Self-Concept | Lacks confidence | X | X
Negative Self-Concept | Self-critical | X | X
Impairment & Functional Outcome Item | Finds things harder than other people | X | X
Model tuning and the GBM analyses were repeated for the 10-item and 12-item solutions with the training sample.
Classification accuracy results from the potential item subsets were compared to each other and were compared with
results from a version that included all candidate items. Results (as presented in Table 12.6) revealed that the
item subsets performed well relative to all items. The 12-item solutions for both rater forms were preferred over
the 10-item solutions as they had slightly better classification statistics.
Table 12.6. Classification Accuracy: CAARS 2–ADHD Index Training Sample
Form | Items Tested | Overall Accuracy (%) | Sensitivity (%) | Specificity (%) | Positive Predictive Value (%) | Negative Predictive Value (%) | Kappa
Self-Report | All items | 99.1 | 99.4 | 98.9 | 98.9 | 99.4 | .98
Self-Report | 12-item | 96.3 | 96.6 | 96.0 | 96.0 | 96.6 | .93
Self-Report | 10-item | 95.2 | 96.6 | 93.8 | 93.9 | 96.5 | .90
Observer | All items | 98.8 | 97.5 | 100.0 | 100.0 | 97.6 | .98
Observer | 12-item | 93.3 | 92.5 | 94.2 | 94.1 | 92.6 | .87
Observer | 10-item | 92.9 | 92.5 | 93.3 | 93.3 | 92.6 | .86
Note. N = 352 for Self-Report; N = 240 for Observer.
The next step in the development process was to test the 10-item and 12-item solutions using the validation samples.
Results of the classification accuracy for this analysis are presented in Table 12.7. It is notable that many of the
items that emerged as top candidates for the ADHD Index (9 of 12 for Self-Report; 7 of 12 for Observer) originated
from the Inattention/Executive Dysfunction scale, and that some of the same items were candidates across the two
rater forms. The 12-item solution included an item from either the Hyperactivity scale (for Observer) or the
Impulsivity scale (for Self-Report) that was not found in the 10-item version. Considering that most items
identified by the GBM were Inattention/Executive Dysfunction items, this additional breadth of content was deemed
important, and the 12-item solution was retained.
Table 12.7. Classification Accuracy: CAARS 2–ADHD Index Validation Sample
Form | Item Subset | Overall Accuracy (%) | Sensitivity (%) | Specificity (%) | Positive Predictive Value (%) | Negative Predictive Value (%) | Kappa
Self-Report | 12-item | 91.0 | 92.6 | 89.7 | 90.0 | 92.4 | .82
Self-Report | 10-item | 90.0 | 91.2 | 89.7 | 89.9 | 91.0 | .81
Observer | 12-item | 88.0 | 90.2 | 85.4 | 86.0 | 89.7 | .76
Observer | 10-item | 88.0 | 90.2 | 85.4 | 86.0 | 89.7 | .76
Note. N = 136 for Self-Report; N = 82 for Observer.
Once the final CAARS 2–ADHD Index items were selected, the performance of the new ADHD Index was compared to the
performance of the original CAARS ADHD Index to ensure that the new Index performed as well as or better than the
previous iteration. The raw summed scores of each Index were compared using Receiver Operating Characteristic (ROC)
curves and area under the ROC curve (AUC). A ROC curve plots the performance of a measure at each possible cut-off
score, or threshold. AUC is a widely used measurement of accuracy. It represents the total area under the ROC curve
and is interpreted as the probability that an individual diagnosed with ADHD selected at random will have a higher
score than a randomly selected individual from the general population. Values from .50 to .69 reflect poor accuracy,
values from .70 to .90 reflect moderate accuracy, and values above .90 reflect high accuracy (Fischer et al.,
2003). The ROC curves are plotted in Figures 12.1 and 12.2 below. Visual inspection of the ROC curves shows that the
CAARS 2–ADHD Index covers a greater area under the curve than the original ADHD Index, thereby improving the
likelihood of correct classification. Results between the two Indexes were contrasted using DeLong’s test (DeLong et
al., 1988) with 5,000 bootstrapped samples using the validation sample. Results show the CAARS 2–ADHD Index
performed statistically significantly better than the original CAARS ADHD Index for both Self-Report (Z = -3.55,
p < .01; original ADHD Index AUC = .89, CAARS 2–ADHD Index AUC = .95) and Observer (Z = -2.86, p < .01; original
ADHD Index AUC = .84, CAARS 2–ADHD Index AUC = .94). While both the original and new ADHD Index
exceed guidelines for acceptable accuracy, these results show that the CAARS 2–ADHD Index has high classification
accuracy, validate that the items selected are performing well, and demonstrate that the current Index represents an
improvement over the original.
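The probability interpretation of AUC can be checked numerically. The Python sketch below computes AUC in two ways on hypothetical raw scores: as the share of ADHD and General Population pairs in which the ADHD score is higher, and as the trapezoidal area under the ROC curve built from every possible cut-off. Counting tied pairs as half a win is a common convention and an assumption here, and the function names are invented for this example. It does not reproduce DeLong's test.

```python
def auc_pairwise(adhd_scores, general_scores):
    """Share of (ADHD, General Population) pairs where the ADHD score is higher.

    Tied pairs count as half a win (assumed convention).
    """
    wins = 0.0
    for a in adhd_scores:
        for g in general_scores:
            if a > g:
                wins += 1.0
            elif a == g:
                wins += 0.5
    return wins / (len(adhd_scores) * len(general_scores))


def roc_points(adhd_scores, general_scores):
    """(false positive rate, sensitivity) at every possible cut-off, highest first."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(adhd_scores) | set(general_scores), reverse=True):
        tpr = sum(a >= cutoff for a in adhd_scores) / len(adhd_scores)
        fpr = sum(g >= cutoff for g in general_scores) / len(general_scores)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical raw summed scores, for illustration only.
adhd = [30, 25, 28, 18, 22]
general = [10, 15, 20, 12, 22]

pairwise = auc_pairwise(adhd, general)
area = auc_trapezoid(roc_points(adhd, general))
print(round(pairwise, 3), round(area, 3))
```

Both calculations give the same value for these scores, which is why the area under the curve can be read as the probability described in the text.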