CAARS 2 Manual

Chapter 12: Development


Development

The creation of the CAARS 2–ADHD Index began with the identification of items from the full-length CAARS 2 that best differentiated between individuals from the general population (i.e., without a clinical diagnosis) and those with ADHD. The ADHD Index was designed to comprise a limited number of the most discriminating items, so as to provide a brief and efficient tool that could be used independently (e.g., for screening purposes) or in conjunction with other CAARS 2 scores. Development of the CAARS 2–ADHD Index began upon the completion of the CAARS 2 standardization phase (see Item Development in chapter 6, Development) and relied on the same samples and item pool to identify the best-performing items.

Samples

To begin the item selection phase, the Total Samples of the CAARS 2 (see Standardization Phase in chapter 6, Development) were divided into two subsets: individuals without a clinical diagnosis (non-clinical General Population samples) and the ADHD Reference Samples. Individuals diagnosed with ADHD-Hyperactive/Impulsive Presentation were excluded from the analyses because they were underrepresented in the ADHD Reference Samples (N = 10 for Self-Report; N = 9 for Observer). The General Population samples were substantially larger than the ADHD samples, and unbalanced samples can cause problems when classifying groups because prediction will naturally favor the larger group. Therefore, the samples were balanced by matching each individual in the ADHD Reference Samples with a corresponding individual from the General Population in terms of age, gender, race/ethnicity, and education level (EL). Coarsened matching was used to facilitate the creation of comparable samples (e.g., EL groups were collapsed such that No High School Diploma [EL 1] and High School Diploma [EL 2] were combined when seeking matching samples). The matched pairs of individuals from the General Population and individuals diagnosed with ADHD were then split into a training sample and a validation sample at a 70/30 ratio. The demographic characteristics of the rated individuals and the raters are presented in Tables 12.1 to 12.3.
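The matching and splitting steps described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used for the CAARS 2: the 10-year age bands, the field names (`age`, `gender`, `race`, `el`), the function names, and the decision to drop unmatched cases are all assumptions made for the example. Only the collapsing of EL 1 into EL 2 and the 70/30 split come from the text.

```python
import random
from collections import defaultdict


def coarsen(person):
    """Map a person to a coarse matching key.

    The 10-year age bands are a hypothetical choice; the source only states
    that EL 1 and EL 2 were combined when seeking matches.
    """
    age_band = person["age"] // 10
    el = max(person["el"], 2)  # collapse EL 1 into EL 2
    return (age_band, person["gender"], person["race"], el)


def match_pairs(adhd, general, seed=0):
    """Pair each ADHD case with one unused General Population case
    that shares the same coarsened key."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for g in general:
        pool[coarsen(g)].append(g)
    for candidates in pool.values():
        rng.shuffle(candidates)
    pairs = []
    for a in adhd:
        candidates = pool[coarsen(a)]
        if candidates:  # assumption: cases without a match are dropped
            pairs.append((a, candidates.pop()))
    return pairs


def split_pairs(pairs, train_fraction=0.7, seed=0):
    """Split matched pairs (not individuals) into training and validation sets."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Splitting at the level of pairs keeps each ADHD case and its matched General Population case in the same sample, so both the training and the validation sample stay balanced.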

Table 12.1. Demographic Characteristics of the Rated Individuals: CAARS 2–ADHD Index Self-Report Training and Validation Samples

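| Demographic | Category | Training General Population N | Training General Population % | Training ADHD N | Training ADHD % | Validation General Population N | Validation General Population % | Validation ADHD N | Validation ADHD % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gender | Female | 95 | 54.0 | 94 | 53.4 | 24 | 35.3 | 24 | 35.3 |
| Gender | Male | 81 | 46.0 | 81 | 46.0 | 44 | 64.7 | 43 | 63.2 |
| Gender | Other | 0 | 0.0 | 1 | 0.6 | 0 | 0.0 | 1 | 1.5 |
| U.S. Race/Ethnicity | Asian | 3 | 1.7 | 1 | 0.6 | 1 | 1.5 | 1 | 1.5 |
| U.S. Race/Ethnicity | Black | 9 | 5.1 | 4 | 2.3 | 1 | 1.5 | 0 | 0.0 |
| U.S. Race/Ethnicity | Hispanic | 9 | 5.1 | 7 | 4.0 | 6 | 8.8 | 7 | 10.3 |
| U.S. Race/Ethnicity | White | 122 | 69.3 | 130 | 73.9 | 51 | 75.0 | 45 | 66.2 |
| U.S. Race/Ethnicity | Other | 4 | 2.3 | 7 | 4.0 | 2 | 2.9 | 2 | 2.9 |
| Canadian Race/Ethnicity | Not a visible minority | 29 | 16.5 | 21 | 11.9 | 6 | 8.8 | 12 | 17.6 |
| Canadian Race/Ethnicity | Visible minority | 0 | 0.0 | 6 | 3.4 | 1 | 1.5 | 1 | 1.5 |
| U.S. Region | Northeast | 28 | 15.9 | 30 | 17.0 | 12 | 17.6 | 16 | 23.5 |
| U.S. Region | Midwest | 37 | 21.0 | 36 | 20.5 | 8 | 11.8 | 13 | 19.1 |
| U.S. Region | South | 48 | 27.3 | 48 | 27.3 | 31 | 45.6 | 15 | 22.1 |
| U.S. Region | West | 34 | 19.3 | 34 | 19.3 | 10 | 14.7 | 11 | 16.2 |
| Canadian Region | Central | 17 | 9.7 | 13 | 7.4 | 2 | 2.9 | 7 | 10.3 |
| Canadian Region | East | 2 | 1.1 | 3 | 1.7 | 1 | 1.5 | 2 | 2.9 |
| Canadian Region | West | 10 | 5.7 | 9 | 5.1 | 4 | 5.9 | 4 | 5.9 |
| Education Level | No high school diploma | 2 | 1.1 | 2 | 1.1 | 0 | 0.0 | 0 | 0.0 |
| Education Level | High school diploma/GED | 22 | 12.5 | 22 | 12.5 | 10 | 14.7 | 10 | 14.7 |
| Education Level | Some college or associate degree | 70 | 39.8 | 70 | 39.8 | 24 | 35.3 | 24 | 35.3 |
| Education Level | Bachelor’s degree | 51 | 29.0 | 45 | 25.6 | 21 | 30.9 | 20 | 29.4 |
| Education Level | Graduate or professional degree | 31 | 17.6 | 37 | 21.0 | 13 | 19.1 | 14 | 20.6 |
| Diagnosis | ADHD Inattentive | 0 | 0.0 | 81 | 46.0 | 0 | 0.0 | 31 | 45.6 |
| Diagnosis | ADHD Hyperactive/Impulsive | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 |
| Diagnosis | ADHD Combined | 0 | 0.0 | 94 | 53.4 | 0 | 0.0 | 37 | 54.4 |
| Diagnosis | Anxiety | 0 | 0.0 | 47 | 26.7 | 0 | 0.0 | 17 | 25.0 |
| Diagnosis | Depression | 0 | 0.0 | 51 | 29.0 | 0 | 0.0 | 17 | 25.0 |
| Diagnosis | Other Diagnosis | 0 | 0.0 | 39 | 22.2 | 0 | 0.0 | 13 | 19.1 |
| Diagnosis | No Diagnosis | 176 | 100.0 | 0 | 0.0 | 68 | 100.0 | 0 | 0.0 |
| Age in years | M (SD) | 36.7 (13.3) | | 36.5 (13.0) | | 35.5 (12.3) | | 35.5 (12.4) | |
| Total | | 176 | 100.0 | 176 | 100.0 | 68 | 100.0 | 68 | 100.0 |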
Note. Anxiety includes Generalized Anxiety Disorder, Panic Disorder, and Social Anxiety Disorder. Depression includes Major Depressive Disorder, Major Depressive Episode, and Persistent Depressive Disorder. Other diagnoses include less frequently reported co-occurring diagnoses, such as Autism Spectrum Disorder and Substance-Related and Addictive Disorders. The sum of diagnoses is greater than the total N because individuals with co-occurring diagnoses counted towards more than one diagnostic group.

Table 12.2. Demographic Characteristics of the Rated Individuals: CAARS 2–ADHD Index Observer Training and Validation Samples

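| Demographic | Category | Training General Population N | Training General Population % | Training ADHD N | Training ADHD % | Validation General Population N | Validation General Population % | Validation ADHD N | Validation ADHD % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gender | Female | 59 | 49.2 | 57 | 47.5 | 20 | 48.8 | 20 | 48.8 |
| Gender | Male | 61 | 50.8 | 61 | 50.8 | 21 | 51.2 | 21 | 51.2 |
| Gender | Other | 0 | 0.0 | 2 | 1.7 | 0 | 0.0 | 0 | 0.0 |
| U.S. Race/Ethnicity | Asian | 2 | 1.7 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 |
| U.S. Race/Ethnicity | Black | 1 | 0.8 | 2 | 1.7 | 1 | 2.4 | 1 | 2.4 |
| U.S. Race/Ethnicity | Hispanic | 9 | 7.5 | 9 | 7.5 | 5 | 12.2 | 3 | 7.3 |
| U.S. Race/Ethnicity | White | 81 | 67.5 | 90 | 75.0 | 27 | 65.9 | 29 | 70.7 |
| U.S. Race/Ethnicity | Other | 4 | 3.3 | 2 | 1.7 | 0 | 0.0 | 2 | 4.9 |
| Canadian Race/Ethnicity | Not a visible minority | 21 | 17.5 | 12 | 10.0 | 7 | 17.1 | 5 | 12.2 |
| Canadian Race/Ethnicity | Visible minority | 2 | 1.7 | 5 | 4.2 | 1 | 2.4 | 1 | 2.4 |
| U.S. Region | Northeast | 17 | 14.2 | 28 | 23.3 | 10 | 24.4 | 9 | 22.0 |
| U.S. Region | Midwest | 25 | 20.8 | 30 | 25.0 | 11 | 26.8 | 7 | 17.1 |
| U.S. Region | South | 36 | 30.0 | 28 | 23.3 | 6 | 14.6 | 11 | 26.8 |
| U.S. Region | West | 19 | 15.8 | 17 | 14.2 | 6 | 14.6 | 8 | 19.5 |
| Canadian Region | Central | 9 | 7.5 | 12 | 10.0 | 6 | 14.6 | 2 | 4.9 |
| Canadian Region | East | 3 | 2.5 | 3 | 2.5 | 1 | 2.4 | 1 | 2.4 |
| Canadian Region | West | 11 | 9.2 | 2 | 1.7 | 1 | 2.4 | 3 | 7.3 |
| Education Level | No high school diploma | 0 | 0.0 | 0 | 0.0 | 1 | 2.4 | 1 | 2.4 |
| Education Level | High school diploma/GED | 15 | 12.5 | 15 | 12.5 | 4 | 9.8 | 4 | 9.8 |
| Education Level | Some college or associate degree | 52 | 43.3 | 52 | 43.3 | 18 | 43.9 | 18 | 43.9 |
| Education Level | Bachelor’s degree | 29 | 24.2 | 23 | 19.2 | 12 | 29.3 | 12 | 29.3 |
| Education Level | Graduate or professional degree | 24 | 20.0 | 30 | 25.0 | 6 | 14.6 | 6 | 14.6 |
| Diagnosis | ADHD Inattentive | 0 | 0.0 | 50 | 41.7 | 0 | 0.0 | 15 | 36.6 |
| Diagnosis | ADHD Hyperactive/Impulsive | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 |
| Diagnosis | ADHD Combined | 0 | 0.0 | 70 | 58.3 | 0 | 0.0 | 26 | 63.4 |
| Diagnosis | Anxiety | 0 | 0.0 | 28 | 23.3 | 0 | 0.0 | 17 | 41.5 |
| Diagnosis | Depression | 0 | 0.0 | 35 | 29.2 | 0 | 0.0 | 14 | 34.1 |
| Diagnosis | Other Diagnosis | 0 | 0.0 | 27 | 22.5 | 0 | 0.0 | 10 | 24.4 |
| Diagnosis | No Diagnosis | 20 | 16.7 | 0 | 0.0 | 41 | 100.0 | 0 | 0.0 |
| Age in years | M (SD) | 35.8 (13.4) | | 35.7 (13.2) | | 32.7 (10.4) | | 32.5 (10.3) | |
| Total | | 120 | 100.0 | 120 | 100.0 | 41 | 100.0 | 41 | 100.0 |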
Note. Anxiety includes Generalized Anxiety Disorder, Panic Disorder, and Social Anxiety Disorder. Depression includes Major Depressive Disorder, Major Depressive Episode, and Persistent Depressive Disorder. Other diagnoses include less frequently reported co-occurring diagnoses, such as Autism Spectrum Disorder and Substance-Related and Addictive Disorders. The sum of diagnoses is greater than the total N because individuals with co-occurring diagnoses count towards more than one diagnostic group.

Table 12.3. Demographic Characteristics of Raters: CAARS 2–ADHD Index Observer Training and Validation Samples

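| Rater Demographic | Category | Training General Population N | Training General Population % | Training ADHD N | Training ADHD % | Validation General Population N | Validation General Population % | Validation ADHD N | Validation ADHD % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gender | Female | 68 | 56.7 | 79 | 65.8 | 28 | 68.3 | 24 | 58.5 |
| Gender | Male | 52 | 43.3 | 40 | 33.3 | 13 | 31.7 | 17 | 41.5 |
| Gender | Other | 0 | 0.0 | 1 | 0.8 | 0 | 0.0 | 0 | 0.0 |
| U.S. Race/Ethnicity | Asian | 1 | 0.8 | 1 | 0.8 | 1 | 2.4 | 0 | 0.0 |
| U.S. Race/Ethnicity | Black | 4 | 3.3 | 2 | 1.7 | 3 | 7.3 | 0 | 0.0 |
| U.S. Race/Ethnicity | Hispanic | 16 | 13.3 | 9 | 7.5 | 7 | 17.1 | 5 | 12.2 |
| U.S. Race/Ethnicity | White | 71 | 59.2 | 91 | 75.8 | 22 | 53.7 | 28 | 68.3 |
| U.S. Race/Ethnicity | Other | 2 | 1.7 | 1 | 0.8 | 0 | 0.0 | 1 | 2.4 |
| Canadian Race/Ethnicity | Not a visible minority | 22 | 18.3 | 13 | 10.8 | 7 | 17.1 | 6 | 14.6 |
| Canadian Race/Ethnicity | Visible minority | 4 | 3.3 | 3 | 2.5 | 1 | 2.4 | 1 | 2.4 |
| U.S. Region | Northeast | 16 | 13.3 | 26 | 21.7 | 8 | 19.5 | 8 | 19.5 |
| U.S. Region | Midwest | 24 | 20.0 | 32 | 26.7 | 11 | 26.8 | 8 | 19.5 |
| U.S. Region | South | 38 | 31.7 | 30 | 25.0 | 8 | 19.5 | 10 | 24.4 |
| U.S. Region | West | 16 | 13.3 | 16 | 13.3 | 6 | 14.6 | 8 | 19.5 |
| Canadian Region | Central | 9 | 7.5 | 12 | 10.0 | 6 | 14.6 | 4 | 9.8 |
| Canadian Region | East | 3 | 2.5 | 3 | 2.5 | 1 | 2.4 | 1 | 2.4 |
| Canadian Region | West | 14 | 11.7 | 1 | 0.8 | 1 | 2.4 | 2 | 4.9 |
| Education Level | No high school diploma | 0 | 0.0 | 2 | 1.7 | 0 | 0.0 | 0 | 0.0 |
| Education Level | High school diploma/GED | 22 | 18.3 | 16 | 13.3 | 9 | 22.0 | 5 | 12.2 |
| Education Level | Some college or associate degree | 52 | 43.3 | 31 | 25.8 | 10 | 24.4 | 14 | 34.1 |
| Education Level | Bachelor’s degree | 33 | 27.5 | 39 | 32.5 | 19 | 46.3 | 14 | 34.1 |
| Education Level | Graduate or professional degree | 13 | 10.8 | 32 | 26.7 | 3 | 7.3 | 8 | 19.5 |
| Relation to Individual Being Rated | Spouse | 25 | 20.8 | 66 | 55.0 | 11 | 26.8 | 23 | 56.1 |
| Relation to Individual Being Rated | Friend | 46 | 38.3 | 15 | 12.5 | 14 | 34.1 | 8 | 19.5 |
| Relation to Individual Being Rated | Other Family Member | 48 | 40.0 | 35 | 29.2 | 16 | 39.0 | 9 | 22.0 |
| Relation to Individual Being Rated | Other | 1 | 0.8 | 4 | 3.3 | 0 | 0.0 | 1 | 2.4 |
| Length of Relationship | 1–5 months | 2 | 1.7 | 0 | 0.0 | 1 | 2.4 | 1 | 2.4 |
| Length of Relationship | 6–11 months | 0 | 0.0 | 2 | 1.7 | 0 | 0.0 | 0 | 0.0 |
| Length of Relationship | 1–3 years | 11 | 9.2 | 14 | 11.7 | 1 | 2.4 | 5 | 12.2 |
| Length of Relationship | More than 3 years | 107 | 89.2 | 104 | 86.7 | 39 | 95.1 | 35 | 85.4 |
| How well does the rater know the individual being rated? | Moderately well | 13 | 10.8 | 5 | 4.2 | 4 | 9.8 | 3 | 7.3 |
| How well does the rater know the individual being rated? | Very well | 107 | 89.2 | 115 | 95.8 | 37 | 90.2 | 38 | 92.7 |
| How often does the rater interact with the individual being rated? | Monthly | 15 | 12.5 | 2 | 1.7 | 3 | 7.3 | 1 | 2.4 |
| How often does the rater interact with the individual being rated? | Weekly | 44 | 36.7 | 20 | 16.7 | 12 | 29.3 | 5 | 12.2 |
| How often does the rater interact with the individual being rated? | Daily | 61 | 50.8 | 98 | 81.7 | 26 | 63.4 | 35 | 85.4 |
| Age in years | M (SD) | 39.2 (15.5) | | 37.9 (14.2) | | 39.5 (13.4) | | 34.5 (11.7) | |
| Total | | 120 | 100.0 | 120 | 100.0 | 41 | 100.0 | 41 | 100.0 |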

Analyses and Results

A gradient boosting machine learning model (GBM; Friedman, 2001) was employed to select the items from the CAARS 2 Content Scales and Impairment & Functional Outcome Items that best classified individuals as belonging to either the General Population group or the ADHD group. Machine learning models are increasingly used to develop and test psychological assessment tools (Bleidorn & Hopwood, 2019; Dwyer et al., 2018). GBM creates a series of decision trees using the variables in the model. A decision tree organizes the variables (i.e., items) into a series of steps that best classify individuals in the sample. In models created by GBM, trees are built sequentially so that each new tree learns from the errors that remain after the previous trees, improving classification accuracy. GBM models have some advantages over regression models and other more traditional approaches to classification and prediction, including the ability to handle a large number of items.
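To make the idea of sequentially built trees concrete, the following Python sketch boosts one-split trees (stumps), each fitted to the errors left by the trees before it. It is a teaching illustration under stated assumptions, not the gbm package used for the CAARS 2: the squared-error loss, the learning rate, the number of trees, the gain-based importance measure, and all function names are choices made for this example.

```python
def fit_stump(X, residuals):
    """Find the single item/threshold split that best fits the residuals."""
    mean_all = sum(residuals) / len(residuals)
    sse_before = sum((r - mean_all) ** 2 for r in residuals)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        raise ValueError("no item varies across respondents; cannot split")
    sse, j, t, lm, rm = best
    return {"item": j, "threshold": t, "left": lm, "right": rm,
            "gain": sse_before - sse}


def stump_value(stump, row):
    """Return the stump's output for one respondent's item scores."""
    return stump["left"] if row[stump["item"]] <= stump["threshold"] else stump["right"]


def fit_gbm(X, y, n_trees=20, learning_rate=0.3):
    """Boost stumps: each new stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        importance[stump["item"]] += stump["gain"]  # error reduction credited to the item
        preds = [p + learning_rate * stump_value(stump, row)
                 for p, row in zip(preds, X)]
    total = sum(importance)
    if total > 0:
        importance = [100 * g / total for g in importance]  # scale to sum to 100
    return {"base": base, "rate": learning_rate, "stumps": stumps,
            "importance": importance}


def predict_gbm(model, row):
    """Predicted score: values above 0.5 are read as the ADHD group (coded 1)."""
    return model["base"] + model["rate"] * sum(stump_value(s, row) for s in model["stumps"])
```

Here `X` is a list of respondents' item scores and `y` codes group membership as 0 (General Population) or 1 (ADHD). The `importance` list credits each item with the share of error reduction its splits achieved, which is one common way of ranking items relative to one another.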

Once the item pool was selected and the samples were created, the first step in the analysis was to tune the model in the training sample. Parameters that maximized the performance of the model while striking a balance between complexity and accuracy were selected. A model that is overly complex will predict well in the sample on which it was built but will not generalize well to other samples; alternatively, a model that is too simple will have poor accuracy (Miller et al., 2016). K-fold cross-validation was employed in the model-tuning step to select the model with the optimal complexity. K-fold cross-validation was conducted in a series of five steps: (a) the training sample was split into five subsets; (b) the model was trained on all but one subset; (c) the model was evaluated in the held-out subset and the prediction error was calculated; (d) steps (b) and (c) were repeated until each of the five subsets (k = 5) had served once as the held-out subset; and (e) the prediction error across the five iterations was averaged. Following these steps, the model with the lowest cross-validation prediction error value was selected as the final model. Model tuning was completed via the caret package v.6.0-86 in R (Kuhn, 2008), and the final tuning parameters were used for the GBM, via the gbm package v.2.1.7 in R (Ridgeway, 2006).
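The five cross-validation steps above can be sketched generically in Python. This is not the caret implementation named in the text. The function names, the use of misclassification rate as the prediction error, and the toy threshold "model" are assumptions for illustration only.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Step (a): shuffle the row indices and deal them into k subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(X, y, fit, predict, k=5, seed=0):
    """Steps (b) to (e): train on k-1 subsets, score the held-out one, average."""
    errors = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(model, X[i]) != y[i] for i in held_out)
        errors.append(wrong / len(held_out))  # misclassification rate in this fold
    return sum(errors) / k


def tune(X, y, candidates, k=5):
    """Return the candidate setting with the lowest mean cross-validation error."""
    scores = {name: cv_error(X, y, fit, predict, k)
              for name, (fit, predict) in candidates.items()}
    return min(scores, key=scores.get), scores


def threshold_candidate(t):
    """Toy model for the example: classify as ADHD when the score is >= t."""
    return (lambda X, y: t), (lambda model, x: x >= model)


# Usage with hypothetical scores: three candidate settings are compared.
scores_X = list(range(20))
labels_y = [x >= 10 for x in scores_X]
best, cv_scores = tune(scores_X, labels_y,
                       {f"t={t}": threshold_candidate(t) for t in (5, 10, 15)})
```

In a real tuning run, each candidate would be a set of GBM tuning parameters rather than a fixed threshold; the selection rule (lowest averaged held-out error) stays the same.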

The results from GBM analysis with the training sample were used to select variables to be considered for the ADHD Index. The following classification accuracy statistics were used to help decide between models (see chapter 6, Development, for a description of each statistic): overall correct classification, sensitivity, specificity, positive predictive value, negative predictive value, and kappa. As a global construct, classification accuracy assesses the ability of the model to accurately determine whether the individual’s scores more closely resemble those from the ADHD or General Population group. Results from the model are compared to the criterion, which, in this case, was an ADHD diagnosis.
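For readers who want to see how these statistics relate to one another, the sketch below computes them from the four cells of a 2 × 2 classification table using their standard textbook definitions (chapter 6 remains the authoritative description for the CAARS 2). The function name and the example counts are hypothetical and are not taken from any CAARS 2 table.

```python
def classification_stats(tp, fp, tn, fn):
    """Compute accuracy statistics from a 2 x 2 classification table.

    tp: ADHD cases classified as ADHD          fn: ADHD cases classified as General Population
    tn: General Population classified as such  fp: General Population classified as ADHD
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n  # overall correct classification
    # Chance agreement for Cohen's kappa, from the row and column marginals
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    return {
        "overall_correct": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }


# Hypothetical counts for a balanced sample of 68 ADHD and 68 General Population cases
stats = classification_stats(tp=60, fp=10, tn=58, fn=8)
```

With balanced groups such as the matched samples described earlier, chance agreement is .50, so kappa reduces to twice the overall correct classification rate minus one.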

The 10 most discriminating items were selected using the relative importance statistic from the GBM. The relative importance statistic indicates how important each variable is relative to the others in the model (note that there are no absolute guidelines for interpreting this statistic; instead, it is used to compare items to one another). A 12-item solution was also tested by expanding the 10-item solution. Where the 10-item solution did not include content from Hyperactivity or Impulsivity, the item from those Content Scales with the highest relative importance statistic was selected to extend content coverage. Selected items for the 10- and 12-item versions are presented in Tables 12.4 and 12.5, respectively. Note that because the analyses were conducted independently for Self-Report and Observer ratings, the items selected are not identical (although they share overlapping content).
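One possible reading of this selection rule is sketched below in Python. The text does not spell out exactly how the two additional items were chosen, so the order of operations here (first cover any missing Hyperactivity or Impulsivity content, then fill remaining slots by importance) is an assumption, as are the function name, item labels, and importance values.

```python
def extend_selection(importance, scale_of, base_n=10, final_n=12,
                     coverage_scales=("Hyperactivity", "Impulsivity")):
    """Return (base solution, extended solution) as lists of item labels.

    importance: dict mapping item label -> relative importance
    scale_of:   dict mapping item label -> Content Scale name
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    base = ranked[:base_n]
    selected = list(base)
    # Assumed rule: add the top-ranked item from each coverage scale
    # that is not yet represented in the base solution.
    for scale in coverage_scales:
        if len(selected) >= final_n:
            break
        if not any(scale_of[item] == scale for item in selected):
            for item in ranked:
                if scale_of[item] == scale and item not in selected:
                    selected.append(item)
                    break
    # Fill any remaining slots with the next most important items.
    for item in ranked:
        if len(selected) >= final_n:
            break
        if item not in selected:
            selected.append(item)
    return base, selected
```

Because relative importance has no absolute interpretation, the sketch only ever uses it to rank items against one another, never against a fixed cut-off.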

Model tuning and the GBM analyses were repeated for the 10-item and 12-item solutions with the training sample. Classification accuracy results from the potential item subsets were compared to each other and were compared with results from a version that included all candidate items. Results (as presented in Table 12.6) revealed that the item subsets performed well relative to all items. The 12-item solutions for both rater forms were preferred over the 10-item solutions as they had slightly better classification statistics.

The next step in the development process was to test the 10-item and 12-item solutions using the validation samples. Results of the classification accuracy for this analysis are presented in Table 12.7. Notably, many of the items that emerged as top candidates for the ADHD Index (9 of 12 for Self-Report; 7 of 12 for Observer) originated from the Inattention/Executive Dysfunction scale, and some of the same items were candidates across the two rater forms. The 12-item solution included an item from either the Hyperactivity scale (for Observer) or the Impulsivity scale (for Self-Report) that was not found in the 10-item version. Because most items identified by the GBM were Inattention/Executive Dysfunction items, this additional breadth of content was deemed important, and the 12-item solution was retained.

Once the final CAARS 2–ADHD Index items were selected, the performance of the new ADHD Index was compared to the performance of the original CAARS ADHD Index to ensure that the new Index performed as well as or better than the previous iteration. The raw summed scores of each Index were compared using Receiver Operating Characteristic (ROC) curves and the area under the ROC curve (AUC). A ROC curve plots the performance of a measure at each possible cut-off score, or threshold. AUC is a widely used measurement of accuracy. It represents the total area under the ROC curve and is interpreted as the probability that an individual diagnosed with ADHD selected at random will have a higher score than a randomly selected individual from the general population. Values from .50 to .69 reflect poor accuracy, values from .70 to .90 reflect moderate accuracy, and values above .90 reflect high accuracy (Fischer et al., 2003). The ROC curves are plotted in Figures 12.1 and 12.2 below. Visual inspection of the ROC curves shows that the CAARS 2–ADHD Index covers a greater area under the curve than the original ADHD Index, thereby improving the likelihood of correct classification. The two Indexes were compared in the validation sample using DeLong’s test (DeLong et al., 1988) with 5,000 bootstrapped samples. Results show that the CAARS 2–ADHD Index performed statistically significantly better than the original CAARS ADHD Index for both Self-Report (Z = -3.55, p < .01; original ADHD Index AUC = .89, CAARS 2–ADHD Index AUC = .95) and Observer (Z = -2.86, p < .01; original ADHD Index AUC = .84, CAARS 2–ADHD Index AUC = .94). While both the original and new ADHD Index exceed guidelines for acceptable accuracy, these results show that the CAARS 2–ADHD Index has high classification accuracy, confirm that the selected items are performing well, and demonstrate that the current Index represents an improvement over the original.
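The probability interpretation of AUC can be computed directly by comparing every ADHD score with every General Population score, as in the Python sketch below. This illustrates the definition only; it is not the software used for the analyses reported here, and it does not implement DeLong's test. The function names and example scores are hypothetical, and treating values below .70 as "poor" (including the small gap between .69 and .70 in the published bands) is an assumption.

```python
def auc(adhd_scores, general_scores):
    """Probability that a random ADHD score exceeds a random General
    Population score, counting ties as half a win."""
    wins = 0.0
    for a in adhd_scores:
        for g in general_scores:
            if a > g:
                wins += 1.0
            elif a == g:
                wins += 0.5
    return wins / (len(adhd_scores) * len(general_scores))


def interpret_auc(value):
    """Label an AUC using the bands cited in the text (Fischer et al., 2003)."""
    if value > 0.90:
        return "high"
    if value >= 0.70:
        return "moderate"
    return "poor"  # assumption: anything below .70 is labeled poor


# Hypothetical raw summed scores
example = auc([3, 4, 5], [1, 2, 3])  # 8.5 of 9 pairwise comparisons favor ADHD
```

Applied to the values reported above, these bands place the original Self-Report Index (AUC = .89) in the moderate range and the CAARS 2–ADHD Index (AUC = .95) in the high range.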

Figure 12.1. ROC Curve: Comparison Between Original ADHD Index and CAARS 2–ADHD Index, Self-Report

Figure 12.2. ROC Curve: Comparison Between Original ADHD Index and CAARS 2–ADHD Index, Observer
