CAARS 2 Manual

Chapter 7: Standardization Procedures


Standardization Procedures

Trends in CAARS 2 raw scores for the Normative Samples were examined to determine age- and gender-based normative groupings. Consistent with previous research on gender differences in the symptomatology of ADHD (e.g., Conners, Erhardt, & Sparrow, 1999; Robinson et al., 2008; Williamson & Johnston, 2015), separate normative groups were created for Males, Females, and Combined Gender (comprising male, female, and other genders). Gender differences are explored in more detail in chapter 10, Fairness.

Analyses were conducted to determine whether statistically and practically significant differences between age groups within each gender were observed, using Kruskal-Wallis chi-square tests, Kendall’s tau (τ) correlation coefficients, and comparisons of standardized differences (Cohen’s d effect sizes). There was a statistically significant effect of age group (categorized as 5-year age bands) on raw scores (p < .01 for all scales on Self-Report except Emotional Dysregulation; p < .001 for Inattention/Executive Dysfunction, Negative Self-Concept, and DSM ADHD Inattentive for Observer; see Tables 7.16 and 7.17 for full results for Self-Report and Observer, respectively). There were also significant correlations between raw scale scores and age (treated as a continuous variable; see Table 7.18 for Self-Report, p < .01 for all scales; see Table 7.19 for Observer, p < .01 for all scales except Hyperactivity, DSM ADHD Inattentive Symptoms Scale, and DSM ADHD Hyperactive/Impulsive Symptoms Scale). (Note that the terms age and age group are used to distinguish when age is treated continuously or categorically, respectively.)

Effect sizes between all age groups were then examined. For Self-Report, effect sizes ranged from 0.04 to 0.71, with scores typically decreasing with age (although it is notable that scores for Inattention/Executive Dysfunction decrease from young adults to middle-aged adults, but then increase in older ages, showing a curvilinear relationship; see Figure 7.1). For Observer, effect sizes ranged from -0.46 to 0.58, showing an overall decrease with age for all scales. Through this analysis, it was determined that age-based normative groups are meaningful, but adjacent age groups with negligible differences (i.e., Cohen’s |d| < .20) could be collapsed into a single age group. As a result, seven normative age groups were created for both CAARS 2 Self-Report and Observer Normative Samples: 18–24, 25–29, 30–39, 40–49, 50–59, 60–69, and 70+.
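The collapsing rule described above can be illustrated with a minimal sketch. The scores, the band labels, and the function names are hypothetical, and the pooled-standard-deviation form of Cohen's d is an assumption; the manual does not state which variant of d was used or the exact order in which groups were compared.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def collapse_adjacent(groups, threshold=0.20):
    """Merge an age band into the preceding (possibly merged) band when |d| < threshold.

    `groups` is an ordered list of (label, scores) pairs, youngest band first.
    """
    merged = [(groups[0][0], list(groups[0][1]))]
    for label, scores in groups[1:]:
        prev_label, prev_scores = merged[-1]
        if abs(cohens_d(prev_scores, scores)) < threshold:
            # Negligible difference: pool the scores and widen the label.
            new_label = prev_label.split("-")[0] + "-" + label.split("-")[-1]
            merged[-1] = (new_label, prev_scores + list(scores))
        else:
            merged.append((label, list(scores)))
    return merged


# Hypothetical raw scores for four 5-year (or similar) age bands.
bands = [
    ("18-24", [20, 22, 25, 27, 30]),
    ("25-29", [14, 16, 18, 20, 22]),
    ("30-34", [10, 12, 14, 16, 18]),
    ("35-39", [10, 12, 14, 16, 19]),
]
merged = collapse_adjacent(bands)
print([label for label, _ in merged])
```

In this made-up data only the last two bands differ by less than 0.20 standard deviations, so they are pooled into a single 30-39 group while the younger bands stay separate.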

Figure 7.1. Inattention/Executive Dysfunction Raw Score by Age of the Rated Individuals: CAARS 2 Self-Report Combined Gender Normative Sample

To best capture the important relationship between age and raw scores on the CAARS 2, continuous norming was selected as the method for creating standardized scores. Through this regression-based method (Roid, 1983; Zachary & Gorsuch, 1985; Zhu & Chen, 2011), the means and standard deviations of the normative age groups were statistically smoothed to mitigate the effects of sampling variability and to better model the progression of symptoms across ages. By establishing a line or curve that best fits the data and theory, continuous norming makes efficient use of information from the whole sample rather than drawing upon one age group at a time (Angoff & Robertson, 1987). All analyses were conducted in RStudio, using the stats package (version 3.6.2; R Core Team, 2013).

For each scale on the CAARS 2, standard assumptions for general linear models were checked (e.g., inspecting normality, absence of outliers, heteroskedasticity; Tabachnick & Fidell, 2007). Due to the nature of the constructs (symptomatology at a level that is not present in much of the general population), the raw scores for most scales violated assumptions of normality. The positive skew was corrected by applying a square root transformation, where necessary, prior to entering the scores into regression models (Lenhard et al., 2016). Each regression model included age (in years) and age squared as predictors, testing both a linear and curvilinear relationship, with the scale score as the outcome variable. If the curvilinear (quadratic) term was not statistically significant at the p < .05 level, this term was dropped from the regression and a linear model was used instead. The resultant parameter estimates (i.e., unstandardized beta weights) from the regression model were extracted and used to derive a smoothed predicted mean for each age group.
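As a rough sketch of the regression step, the following fits square-root-transformed scores on age and age squared by ordinary least squares and reads off a smoothed predicted mean at a chosen age. All data and function names are hypothetical. The significance test on the quadratic term is omitted; passing `degree=1` corresponds to the linear model used when that term is dropped. How an age group maps to a single age value (for example, its midpoint) is not stated in the manual, so the age of 45 below is only an illustration.

```python
from math import sqrt


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(ages, values, degree=2):
    """Least-squares coefficients for value ~ 1 + age + ... + age**degree."""
    k = degree + 1
    design = [[age ** p for p in range(k)] for age in ages]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(design, values)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, age):
    """Smoothed predicted value at a given age (intercept first)."""
    return sum(c * age ** p for p, c in enumerate(coefs))


# Hypothetical raw scores built to follow a curvilinear trend with age.
ages = [20, 30, 40, 50, 60, 70, 80]
raw_scores = [(6 - 0.1 * a + 0.001 * a ** 2) ** 2 for a in ages]

transformed = [sqrt(r) for r in raw_scores]   # square root transformation
coefs = fit_polynomial(ages, transformed)     # intercept, age, age squared
smoothed_mean_at_45 = predict(coefs, 45)      # on the square-root scale
print(round(smoothed_mean_at_45, 3))
```

Because every observation contributes to the fitted curve, the predicted value at any one age does not depend solely on the respondents who happen to fall in that age band.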

The same process was applied to the standard deviations. Standard deviations for each scale score were calculated for each age group, serving as the outcome variable, and age (and age-squared, where statistically significant) was entered as a continuous predictor. Again, the parameter estimates of these regression models were extracted and used in the computation of smoothed predicted standard-deviation values for each age group. The smoothed means and standard deviations were then used to calculate T-scores; the resulting standard scores showed reduced noise from sampling variability and no discontinuity between adjacent age groups. The CAARS 2 Combined Gender and Gender Specific Normative Samples, for both Self-Report and Observer, were standardized as T-scores, with a mean of 50 and standard deviation of 10.
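The conversion from a score to a T-score can be sketched as follows. The regression coefficients are invented for illustration, the function names are hypothetical, and the sketch assumes the score has already been placed on the same scale as the smoothed mean and standard deviation (for example, after any square root transformation); rounding and truncation rules are not specified in this chapter and are not modeled.

```python
def smoothed_value(coefs, age):
    """Evaluate intercept + slope * age + quad * age**2 (quad may be 0)."""
    intercept, slope, quad = coefs
    return intercept + slope * age + quad * age ** 2


def t_score(score, smoothed_mean, smoothed_sd):
    """Linear T-score with mean 50 and standard deviation 10."""
    return 50 + 10 * (score - smoothed_mean) / smoothed_sd


# Hypothetical coefficients from the mean and SD regressions on age.
mean_coefs = (20.0, -0.2, 0.0)
sd_coefs = (6.0, -0.02, 0.0)

age = 40
m_hat = smoothed_value(mean_coefs, age)   # about 12.0
sd_hat = smoothed_value(sd_coefs, age)    # about 5.2
t = t_score(17.2, m_hat, sd_hat)          # one smoothed SD above the smoothed mean
print(round(t, 1))
```

A score exactly one smoothed standard deviation above the smoothed mean lands at a T-score of about 60, regardless of the age at which the norms are evaluated.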

Empirical percentiles were also calculated within each group in the Normative Samples. Empirical percentiles are generated using the frequency distribution of the actual scores. Therefore, if 90% of the scores are at, or below, a given raw score, that raw score is assigned the 90th percentile. In contrast, theoretical percentiles could be calculated from the T-scores by assuming that the scores follow a normal distribution, such that a T-score of 50 is equivalent to a percentile rank of 50. However, due to the skewness of many of the scales in the CAARS 2, empirical percentiles were selected instead to better reflect the shape of the distributions when communicated via percentiles for the Normative Samples.
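The distinction between the two kinds of percentile can be sketched in a few lines. The sample below is hypothetical and deliberately skewed, the function names are illustrative, and tie handling follows the "at, or below" definition given above; any further conventions used in the actual norms are not modeled.

```python
from statistics import NormalDist


def empirical_percentile(raw, sample):
    """Percentage of scores in the sample that are at or below `raw`."""
    return 100 * sum(s <= raw for s in sample) / len(sample)


def theoretical_percentile(t):
    """Percentile implied by a T-score under a normal distribution (M = 50, SD = 10)."""
    return 100 * NormalDist(mu=50, sigma=10).cdf(t)


# Hypothetical, positively skewed raw scores for one normative group.
sample = [0, 0, 1, 1, 1, 2, 2, 3, 5, 12]

print(empirical_percentile(5, sample))    # 9 of 10 scores are at or below 5
print(theoretical_percentile(50))         # a T-score of 50 maps to the 50th percentile
print(round(theoretical_percentile(60), 1))
```

With a skewed distribution the two approaches can diverge noticeably in the tails, which is the motivation given for preferring empirical percentiles in the Normative Samples.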

This continuous norming process was also applied to the ADHD Reference Sample to create standardized scores. For consistency with the Normative Samples, the same age groups were used where possible; however, due to the sparse data for older age ranges, 60–69 years old and 70+ years old were collapsed into a single group (60+ years old) for the ADHD Reference Sample (see Table 7.11). For each scale, a smoothed predicted mean and standard deviation were calculated from a regression to predict each scale’s raw score using age (and age-squared, where statistically significant) as a continuous variable. By using this method, scores for all ages can be interpolated (based on the resultant regression line), even for age groups with a small sample size (e.g., individuals aged 50–59 years in the Observer sample). The entire sample is employed in calculating the shape of the regression line, and therefore provides a stable estimate by reducing dependence on small samples. This method, in turn, reduces the noise from sampling variability, and yields scores that can be used to effectively describe all age groups without discontinuous jumps between groups. Drawing upon the full sample provides more robust estimates of moments of the distribution (e.g., M and SD) than would be afforded by a small sample size alone. Using the same process as for the general population samples, the Combined Gender and Gender Specific Reference Samples, for both Self-Report and Observer, were standardized to T-scores (M = 50, SD = 10) for each scale on the CAARS 2. To calculate percentiles for the ADHD Reference Sample, theoretical (rather than empirical) percentiles were chosen to better capture the shape of the distribution of responses. The raw scores of the ADHD Reference Samples were more normally distributed than those of the Normative Samples, and therefore, theoretical percentiles were most appropriate.

Standardized scores for the ADHD Index are described in detail in chapter 12, CAARS 2–ADHD Index.