
CAARS 2 Manual

Chapter 6: Pilot Phase


Pilot Phase

The psychometric properties of the proposed items were tested with a demographically representative general population sample and a clinical sample of individuals diagnosed with ADHD. The pilot study also explored how well the proposed items contributed to the various goals of the CAARS revision described earlier in the chapter. The study was conducted digitally: participants were recruited online and completed the administration via an emailed link, although in-person support was available for those with limited literacy or limited comfort with technology. All participants provided consent and received a small monetary compensation. Items were presented five at a time on the participant's screen in a grid format. In addition to the item responses themselves, data collection included item-level and total-test response times. Psychometric analyses (described below) were then conducted to select the final items for the CAARS 2.

Samples

The pilot version of the CAARS 2 was administered to large Self-Report and Observer samples drawn from individuals in the general population with no current mental health diagnoses (described herein as the General Population Samples). The pilot version was also administered to Self-Report and Observer samples in which the individual being rated reported the presence of a current mental health diagnosis that was confirmed by a licensed psychologist. These samples (described herein as the Clinical samples) contained ratings of the same individual in approximately 65% of cases, due to the way data were collected (that is, an individual rated themselves and an Observer also provided a rating of that individual).

The General Population pilot study samples consisted of roughly equal representation across age groups and genders of the individual being rated, with representation of broad categories of race/ethnicity, geographic region, and education level that closely matched U.S. census proportions at the time of recruitment (an average deviation of 3.0% from proportions derived from the 2010 American Community Survey; U.S. Census Bureau, 2017). Note that the Self-Report and Observer samples of the General Population were independent (i.e., the Observers in these samples did not rate the same individuals who rated themselves on the Self-Report). Descriptions of the Self-Report (N = 555) and Observer (N = 506) General Population Samples are presented in Tables 6.1 and 6.2. In the Observer sample, 32.6% were spouses, 44.8% were relatives/family members (primarily parents or siblings), 21.8% were close friends, and 0.8% identified themselves as another relation. Observers were required to have known the individual being rated for at least one month (92.8% of this sample reported knowing the individual for more than three years) and to have known them well (89.3% of this sample reported knowing the individual "Very well").


The Clinical pilot study samples (N = 150 for Self-Report; N = 123 for Observer) comprised ratings of North American individuals with a confirmed ADHD diagnosis (Combined Presentation: N = 56 for Self-Report, N = 46 for Observer; Inattentive Presentation: N = 62 for Self-Report, N = 48 for Observer) and ratings of individuals with confirmed diagnoses other than ADHD (e.g., mood or anxiety-related disorders; N = 32 for Self-Report; N = 29 for Observer). An attempt was made to recruit individuals diagnosed with ADHD Predominantly Hyperactive/Impulsive Presentation; however, given the lower prevalence of this presentation in adults (Anastopoulos & Shelton, 2001), it was not surprising that this group was difficult to recruit. In the end, a decision was made to focus on the Inattentive and Combined Presentations. Detailed information about the Clinical samples used during the pilot phase is presented in Table 6.3. In addition to the information in that table, it is noteworthy that more than half of the individuals with ADHD reported taking a psychoactive medication for their disorder(s), and approximately 21% of rated individuals in the Self-Report and Observer samples reported having more than one mental health diagnosis.


Table 6.3. Demographic Characteristics of the CAARS 2 Pilot Study Clinical Samples

| Demographic | Category | Self-Report N | Self-Report % | Observer: Rated Individual N | Observer: Rated Individual % | Observer: Rater N | Observer: Rater % |
|---|---|---|---|---|---|---|---|
| Age (in years) | 18–24 | 53 | 35.3 | 50 | 40.7 | 16 | 13.0 |
| | 25–34 | 37 | 24.7 | 30 | 24.4 | 32 | 26.0 |
| | 35–44 | 31 | 20.7 | 26 | 21.1 | 24 | 19.5 |
| | 45–54 | 20 | 13.3 | 13 | 10.6 | 30 | 24.4 |
| | 55–64 | 8 | 5.3 | 4 | 3.3 | 16 | 13.0 |
| | 65+ | 1 | 0.7 | 0 | 0.0 | 5 | 4.1 |
| Gender | Male | 63 | 42.0 | 54 | 43.9 | 34 | 27.6 |
| | Female | 87 | 58.0 | 69 | 56.1 | 89 | 72.4 |
| Race/Ethnicity | Hispanic | 12 | 8.0 | 5 | 4.1 | 8 | 6.5 |
| | Black | 5 | 3.3 | 5 | 4.1 | 3 | 2.4 |
| | White | 129 | 86.0 | 108 | 87.8 | 106 | 86.2 |
| | Other | 4 | 2.7 | 5 | 4.1 | 6 | 4.9 |
| Region | U.S. Northeast | 37 | 24.7 | 36 | 29.3 | 36 | 29.3 |
| | U.S. Midwest | 23 | 15.3 | 19 | 15.4 | 19 | 15.4 |
| | U.S. South | 56 | 37.3 | 43 | 35.0 | 41 | 33.3 |
| | U.S. West | 12 | 8.0 | 12 | 9.8 | 14 | 11.4 |
| | Canada | 22 | 14.7 | 13 | 10.6 | 13 | 10.6 |
| Education Level | High school diploma or lower | 32 | 21.3 | 25 | 20.3 | 24 | 19.5 |
| | Some college or associate degree | 51 | 34.0 | 44 | 35.8 | 24 | 19.5 |
| | Bachelor's degree or higher | 67 | 44.7 | 53 | 43.1 | 75 | 61.0 |
| Diagnosis | ADHD Inattentive | 62 | 41.3 | 48 | 39.0 | -- | -- |
| | ADHD Combined | 56 | 37.3 | 46 | 37.4 | -- | -- |
| | Other clinical diagnoses | 32 | 21.3 | 29 | 23.6 | -- | -- |
| Relation to individual being rated | Spouse | -- | -- | -- | -- | 57 | 46.3 |
| | Friend | -- | -- | -- | -- | 14 | 11.4 |
| | Other Family Member | -- | -- | -- | -- | 48 | 39.0 |
| | Other | -- | -- | -- | -- | 4 | 3.3 |
| Length of relationship | 1–3 years | -- | -- | -- | -- | 21 | 17.1 |
| | More than 3 years | -- | -- | -- | -- | 101 | 82.1 |
| How well does the rater know the individual being rated? | Moderately Well | -- | -- | -- | -- | 10 | 8.1 |
| | Very Well | -- | -- | -- | -- | 113 | 91.9 |
| How often does the rater interact with the individual being rated? | Monthly | -- | -- | -- | -- | 0 | 0.0 |
| | Weekly | -- | -- | -- | -- | 21 | 17.1 |
| | Daily | -- | -- | -- | -- | 102 | 82.9 |
| Total | | 150 | 100.0 | 123 | 100.0 | 123 | 100.0 |

Analyses and Results

The psychometric properties of the items from the Content Scales and DSM Symptom Scales were investigated with both classical test theory (CTT) and item-response theory (IRT) methodologies. First, items were reviewed for general quality via (a) response frequencies by clinical and demographic groups to ensure that all options were endorsed, (b) inter-item correlations to confirm the relatedness of content, and (c) group differences to establish that items performed as predicted (e.g., individuals with ADHD Combined Presentation were expected to endorse the Hyperactivity and Impulsivity items more highly than individuals without any diagnosis and more highly than individuals with other clinical diagnoses, including ADHD Inattentive). Second, items were subjected to exploratory factor analyses (EFA) to understand the structure of the item pool. The EFA results revealed a preliminary empirical structure with a large primary factor, encompassing items from the Inattention, Memory, and Planning/Organization/Time Management domains. Additional separate factors that also emerged from the EFA included a blended Hyperactivity/Impulsivity factor (which also contained a few items originally hypothesized as content for Emotional Lability), an Emotional Lability factor, and a Problems with Self-Concept factor. These factors were incorporated into further analyses, inspecting internal consistency, inter-scale correlations, and item-to-scale correlations.

Statistical findings, including the latent structure of the scales and item-level analyses, were weighed alongside clinical expertise from psychologists on the development team to determine the set of items that would be carried through the next phase of development. Items were considered for removal if

  • non-negligible demographic differences were observed (e.g., meaningful differences between racial/ethnic groups),

  • items did not relate well to any factors or to their intended factor (e.g., item factor loading < .30),

  • expected differences between clinical groups were not notable or were inconsistent with hypotheses (e.g., an item written to measure inattentive behavior was rated similarly by individuals with ADHD and those with no clinical diagnosis), or

  • item-to-scale correlations and/or item communalities were low (both are indicators of an individual item's contribution to its scale; low values were defined as < .20; Kline, 1986).

Additionally, IRT analyses identified items with ideal performance, specifically those that offered relatively greater precision of measurement and those positioned at appropriate levels of the traits being measured. The pilot study analyses resulted in the removal, revision, or replacement of 88 of 274 items for Self-Report and 85 of 271 items for Observer.
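One of the removal criteria above, a low item-to-scale correlation, can be sketched as a corrected item-total correlation screened against Kline's (1986) cutoff of .20. The four-item response matrix below is hypothetical, with item 3 deliberately written as unrelated noise so that the screen flags it.

```python
import numpy as np

# Hypothetical 0-3 responses for a four-item pilot scale
# (rows = respondents); item 3 is unrelated noise.
X = np.array([
    [0, 1, 0, 3],
    [1, 1, 1, 0],
    [2, 2, 1, 3],
    [3, 3, 2, 0],
    [1, 0, 1, 2],
    [3, 2, 3, 1],
])

def corrected_item_total(X):
    """Correlate each item with the sum of the REMAINING items,
    so the item does not inflate its own correlation."""
    out = []
    for j in range(X.shape[1]):
        rest = X.sum(axis=1) - X[:, j]
        out.append(np.corrcoef(X[:, j], rest)[0, 1])
    return out

r_it = corrected_item_total(X)
flagged = [j for j, r in enumerate(r_it) if r < 0.20]  # Kline's cutoff
print(flagged)  # → [3]
```

Subtracting the item from the total before correlating is the standard correction; without it, each item is partly correlated with itself through the scale score.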

In addition to these analyses of items written for the Content Scales and DSM Symptom Scales, frequency distributions of responses to the items designed to assess impairments, functional outcomes, and associated clinical concerns were compared among individuals with no clinical diagnosis (the General Population Sample), individuals diagnosed with ADHD Combined Presentation, individuals diagnosed with ADHD Inattentive Presentation, and individuals diagnosed with a clinical disorder other than ADHD (primarily anxiety and/or depression in this sample). Items that had very low endorsement overall (i.e., an exceptionally low base rate of response frequency), showed minimal response differences between the clinical and general population groups, or did not contribute unique information (e.g., if another item captured similar content and performed better) were flagged for review. Upon careful consideration of practical utility and statistical properties, the development team removed an additional 10 items from the Self-Report item pool and 10 items from the Observer item pool.

The pool of proposed validity items was tested with two additional independent samples. Participants were recruited online and completed the assessment via an emailed link; after providing consent and completing the procedures described below, they were debriefed on the purpose of the study and compensated for their participation. One sample, "Fake Bad Responders," was instructed to create an overly negative impression of themselves (Self-Report, N = 64) or of the person they were rating (Observer, N = 63). Instructions for Fake Bad Responders in the Self-Report study were "Your task is to respond to the statements in a way that creates an overly bad impression of yourself. This means that you present yourself in the worst possible light. You can respond in whatever way you think this may be achieved." The other sample, "Fake ADHD Responders," was instructed to respond as if they were seeking an ADHD diagnosis, based on a brief description of symptoms of ADHD that one might find from a cursory internet search (N = 69 for Self-Report; N = 53 for Observer). For the Fake ADHD Responders, instructions for the Self-Report study were "Your task is to respond to the statements in a way that will make people think you have Attention-Deficit/Hyperactivity Disorder (ADHD). You might have heard about ADHD in the media and know some of the symptoms. People with ADHD have trouble paying attention, are hyperactive, and impulsive." Note that the instructions for Observers closely matched the instructions sampled here; minimal changes were made to convert the context to rating another individual, as opposed to rating oneself.

Analyses of the Fake Bad and Fake ADHD samples included examining frequency distributions, with emphasis on responses at the extreme ends of the rating scale (that is, an item response of 3, "Completely true; Very often/Always," or an item response of 0, "Not true at all; Never/Rarely"), and calculating Cliff's delta for pairwise group comparisons. Items with the most notable differences between "Honest Responders" (i.e., individuals in the General Population and ADHD Reference Samples who were instructed to respond to all items honestly) and the faking samples were selected as candidates, and their responses were summed to create a raw score. The distribution of raw scores was compared across the same groups to determine whether there was a raw score at which a small proportion (ideally less than 5%) of Honest Responders, and as large a proportion as possible of raters in the faking samples, would be identified. Items written to detect threats to symptom validity, such as negative impression, infrequent responses (though some of these can reflect real-but-rare ADHD symptoms), and exaggerated or feigned ADHD-like symptoms, were evaluated both individually, as possible elements of the Negative Impression and ADHD Symptom Validity Test indices, and collectively, as one large item pool serving as a general indicator of negative response bias. Results demonstrated that honest and feigned responses could be distinguished with relative accuracy, and accuracy improved when all item types were combined into a single index. The best-performing items were retained for the next phase of development of the CAARS 2 to form a validity index that blends the various negative response style items.
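The group comparison and cutoff search described above can be sketched as follows. Cliff's delta is a standard nonparametric effect size (the proportion of cross-group pairs in which one group scores higher, minus the proportion in which it scores lower), and the short score vectors here are invented purely for illustration.

```python
def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    gt = sum(1 for a in x for b in y if a > b)
    lt = sum(1 for a in x for b in y if a < b)
    return (gt - lt) / (len(x) * len(y))

# Hypothetical raw scores on a candidate validity index.
honest = [0, 1, 1, 2, 2, 3, 4]
faking = [2, 4, 5, 5, 6, 7]

delta = cliffs_delta(faking, honest)

def pick_cutoff(honest, faking):
    """Lowest raw score that flags fewer than 5% of honest raters;
    also return the proportion of fakers caught at that cutoff."""
    for c in range(max(honest + faking) + 1):
        false_pos = sum(s >= c for s in honest) / len(honest)
        if false_pos < 0.05:
            return c, sum(s >= c for s in faking) / len(faking)

cutoff, sensitivity = pick_cutoff(honest, faking)
print(round(delta, 2), cutoff, round(sensitivity, 2))
```

The trade-off is visible even in this toy example: holding the honest-responder false-positive rate under 5% determines the cutoff, and whatever proportion of the faking group exceeds that score is the sensitivity achieved.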

At the completion of the pilot phase, there were 172 Self-Report items and 167 Observer items retained for the next phase of development. These included Associated Clinical Concern Items, Impairment & Functional Outcome Items, and items for the validity scales, Content Scales, and DSM Symptom Scales. The items from the original CAARS ADHD Index were also retained for the next phase to allow for the comparison with a new CAARS 2 ADHD Index.
