CAARS 2 Manual, Chapter 8
Test Information
Another way to understand reliability involves the use of item response theory (IRT) to evaluate item and test information (Ayearst & Bagby, 2010). Higher values for information on items and tests (i.e., scales, in this context) indicate greater measurement precision and lower measurement error. Item information was a contributing factor in the item selection phase of developing the CAARS 2 (see chapter 6, Development, for more information). One of the benefits of item and test information is the recognition that the degree of precision in measurement can vary across the trait level (i.e., an assessment can be more precise at measuring individuals who are at certain levels of the construct, and less precise in other ranges). In contrast, classical test theory (CTT) reliability assumes the test works equally well across all levels of the trait (Ayearst & Bagby, 2010). For the CAARS 2, test information at higher levels of the trait or construct being measured, specifically approaching and exceeding 1.5 SD above the mean, was prioritized. This priority was set because the purpose of the CAARS 2 is to assess clinically significant symptoms, associated features, and functional impairments or outcomes related to having ADHD—as represented by T-scores that fall 1.5 SD above the mean—rather than to measure the full spectrum of behaviors associated with the constructs (i.e., the CAARS 2 was developed to capture problematically high levels of the included constructs, not low levels).
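The idea that precision varies with trait level can be made concrete with a small sketch. Assuming a two-parameter logistic (2PL) model for a single hypothetical item (real rating-scale items are typically fit with polytomous IRT models, so this binary item is purely illustrative, not a CAARS 2 item), item information is a²P(1 − P), which peaks where the item is located on the trait and falls off on either side:

```python
import math

def p_2pl(theta, a, b):
    """Probability of item endorsement under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information for a 2PL item: I(theta) = a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# A hypothetical item with discrimination a = 1.5 located at b = 1.5
# (i.e., 1.5 SD above the mean) measures most precisely near theta = 1.5;
# at theta = 0 or theta = 3 it contributes far less information.
for theta in (0.0, 1.5, 3.0):
    print(theta, round(item_information(theta, 1.5, 1.5), 3))
```

This is why a scale can be highly reliable for elevated trait levels while being less informative about respondents near or below the mean.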
Test information is assessed using a test information function (TIF), which plots the precision of the test across all levels of the construct being measured. The conditional SEM is the reciprocal of the square root of the information function. Conditional SEM is similar in concept to SEM (for a detailed explanation of traditional SEM, see Standard Error of Measurement in this chapter), with the distinction that the amount of error varies conditionally based on one’s standing on the construct being measured. That is, there may be some ranges of scores for which the test is more precise or less precise, based on the precision and reliability of the items used. Therefore, the least amount of error in a test occurs at the peak of the information function (Ayearst & Bagby, 2010; Embretson & Reise, 2000). Guidelines for interpreting test information suggest values greater than 10 are highly precise, values between 5 and 10 are moderately precise, and values close to 5 are adequately precise. These recommended guidelines (above 10, between 5 and 10, and around 5) correspond approximately to standard errors of .32, .39, and .44, and reliability coefficients of .90, .85, and .80, respectively (Flannery et al., 1995; Reeve & Fayers, 2005).
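The conversions among information, conditional SEM, and reliability can be sketched directly. Assuming the latent trait is scaled to variance 1 (the usual IRT convention), SEM = 1/√(information) and reliability = 1 − SEM²; the benchmark information values then reproduce the cited standard errors and reliability coefficients to rounding:

```python
import math

def conditional_sem(information):
    """Conditional SEM is the reciprocal square root of test information."""
    return 1.0 / math.sqrt(information)

def reliability(information):
    """Reliability at a given information level, assuming the latent
    trait has variance 1: r = 1 - SEM^2 = 1 - 1/information."""
    return 1.0 - 1.0 / information

# Information values near the guideline thresholds (10, ~6.7, and 5)
# map onto the SEM and reliability benchmarks from the text.
for info in (10.0, 6.67, 5.0):
    print(info, round(conditional_sem(info), 2), round(reliability(info), 2))
```

Note that an information value of exactly 5 gives an SEM of about .447, which rounds to .45; the .44 cited in the guidelines reflects a slightly higher information value.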
An analysis of test information was conducted on the Total Sample (i.e., all general population and all clinical cases combined; see chapter 6, Development, for a description of the samples), to provide maximum information in estimating these functions via the mirt package in R (Chalmers, 2012). As seen in Figure 8.1, the Self-Report and Observer forms of the CAARS 2 demonstrate high precision across the range of the trait being measured for the Inattention/Executive Dysfunction scale (note that this scale is presented as an example, and figures for all other scales are provided in appendix L). The peak of the curve for most scales is at approximately 2 SD above the mean, and the area under the test information functions (or curves) is wide, such that precision remains consistent from average levels of the construct (i.e., when theta, on the x-axis, is at 0) to 2 to 3 SD above the mean, depending on the scale. In other words, the scale scores are likely to be the most precise for individuals with T-scores ranging from 50 to about 75. Additionally, the peak of the curve extends well beyond information values of 10 for Self-Report and Observer forms, which indicates very high precision of measurement, or reliability, for each scale. The high degree of precision and small degree of error at the target ranges, as illustrated by the example seen in Figure 8.1, provide further evidence for the reliability of the CAARS 2 scales.