Conners 4 Manual

Chapter 12: Reliability

Reliability

Multiple indicators of reliability are provided for scores from the Conners 4–ADHD Index, including internal consistency, test information, test-retest reliability, and inter-rater reliability. A thorough explanation of reliability, as well as the reliability evidence for the full-length Conners 4, can be found in chapter 8, Reliability. Note that the Conners 4–ADHD Index raw scores are used primarily in this section to permit correlational analyses.

Internal Consistency

Internal consistency estimates for the Conners 4–ADHD Index are presented in Tables 12.15 and 12.16 for the Normative and ADHD Reference Samples (see chapter 7, Standardization, for a description of these samples; see chapter 8, Reliability, for an in-depth description of internal consistency and the coefficients reported in this section).

The reliability coefficients presented in Tables 12.15 and 12.16 exceed guidelines for high reliability and indicate that the Conners 4–ADHD Index has excellent internal consistency on all forms across all age groups. Across all age groups and genders in the Normative Samples, the median omega of the Conners 4–ADHD Index for Parent was .93 (ranging from .91 to .94), .90 for Teacher (ranging from .86 to .92), and .87 for Self-Report (ranging from .85 to .90). For the ADHD Reference Samples, the median omega for Parent was .89 (ranging from .87 to .91), .87 for Teacher (ranging from .85 to .89), and .83 for Self-Report (ranging from .81 to .85). In summary, multiple metrics indicate that the Conners 4–ADHD Index provides cohesive, consistent, and reliable estimates of key predictive ADHD symptoms.
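As an illustration of how two of the tabled statistics are conventionally obtained, the following is a minimal sketch of coefficient alpha and the mean inter-item correlation (MIC) computed from an item-response matrix. The `ratings` data and all function names are hypothetical and are not drawn from the Conners 4 samples. Omega is omitted because it requires fitting a factor model, and the formulas shown are the textbook ones, which may differ in detail from the computations described in chapter 8.

```python
from itertools import combinations
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Coefficient alpha; each row holds one respondent's item scores."""
    k = len(rows[0])
    items = list(zip(*rows))  # transpose to one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def mean_inter_item_correlation(rows):
    """Average Pearson correlation over all distinct item pairs."""
    items = list(zip(*rows))
    return mean(pearson(a, b) for a, b in combinations(items, 2))


def standardized_alpha(k, mic):
    """Alpha implied by k items with a given mean inter-item correlation."""
    return k * mic / (1 + (k - 1) * mic)


# Hypothetical ratings: 6 respondents x 4 items scored 0 to 3.
ratings = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 1, 2],
    [3, 2, 3, 3],
    [1, 0, 1, 0],
    [2, 3, 2, 3],
]

alpha = cronbach_alpha(ratings)
mic = mean_inter_item_correlation(ratings)
print(round(alpha, 2), round(mic, 2))
```

The `standardized_alpha` helper shows why a moderate MIC can coexist with a high alpha: for a fixed MIC, the coefficient rises as the number of items increases.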

Table 12.15. Internal Consistency: Conners 4–ADHD Index Normative Sample

Form	Age Group	Combined Gender				Males				Females
Form	Age Group	N	α	ω	MIC	N	α	ω	MIC	N	α	ω	MIC
Parent	6 to 7	260	.92	.92	.51	130	.92	.93	.48	130	.92	.92	.53
	8 to 9	260	.93	.94	.55	130	.94	.94	.57	130	.92	.92	.51
	10 to 11	260	.93	.93	.53	130	.92	.92	.52	130	.93	.93	.53
	12 to 13	260	.94	.94	.47	130	.94	.94	.48	130	.93	.93	.46
	14 to 15	260	.93	.93	.49	130	.93	.93	.50	130	.93	.94	.49
	16 to 18	260	.92	.92	.54	130	.92	.92	.58	130	.91	.91	.48
Teacher	6 to 7	260	.91	.92	.43	130	.92	.92	.43	130	.90	.91	.39
	8 to 9	260	.90	.90	.40	130	.90	.91	.37	130	.88	.88	.43
	10 to 11	260	.90	.91	.39	130	.90	.91	.39	130	.89	.89	.36
	12 to 13	260	.89	.90	.38	130	.88	.89	.40	130	.90	.91	.33
	14 to 15	260	.88	.89	.47	130	.89	.90	.48	130	.87	.88	.44
	16 to 18	260	.88	.89	.42	130	.89	.90	.43	130	.86	.86	.37
Self-Report	8 to 9	220	.87	.87	.32	110	.88	.88	.32	110	.86	.86	.33
	10 to 11	220	.85	.86	.37	110	.85	.85	.35	110	.86	.86	.39
	12 to 13	220	.88	.88	.41	110	.87	.87	.42	110	.88	.89	.40
	14 to 15	220	.89	.89	.35	110	.90	.90	.35	110	.89	.89	.36
	16 to 18	220	.87	.87	.36	110	.86	.87	.37	110	.87	.87	.33

Note. α = alpha, ω = omega, and MIC = mean inter-item correlation.

Table 12.16. Internal Consistency: Conners 4–ADHD Index ADHD Reference Samples

Form	Age Group	Combined Gender				Males				Females
Form	Age Group	N	α	ω	MIC	N	α	ω	MIC	N	α	ω	MIC
Parent	6 to 12	358	.88	.88	.43	237	.88	.88	.43	121	.87	.87	.45
Parent	13 to 18	202	.90	.90	.37	131	.90	.90	.38	71	.91	.91	.36
Teacher	6 to 12	188	.85	.86	.37	134	.85	.86	.36	54	.84	.85	.39
Teacher	13 to 18	133	.87	.88	.32	93	.87	.88	.33	40	.88	.89	.30
Self-Report	8 to 12	117	.83	.83	.28	72	.82	.82	.26	45	.84	.85	.28
Self-Report	13 to 18	112	.82	.83	.28	71	.81	.81	.27	40	.83	.84	.31

Note. α = alpha, ω = omega, and MIC = mean inter-item correlation.

Standard Error of Measurement

The standard error of measurement (SEM) is a statistic that quantifies the amount of error in the obtained raw scores (more details are provided in chapter 8, Reliability). Overall, the median SEM was 2.67 for Parent, 3.20 for Teacher, and 3.57 for Self-Report in the Conners 4 Normative Samples, and 3.28 for Parent, 3.61 for Teacher, and 4.12 for Self-Report in the ADHD Reference Samples (see Table 12.17 for SEM values for the Normative and ADHD Reference Samples). These low SEM values indicate very little error in the estimated true scores for the Conners 4–ADHD Index, and therefore a high degree of reliability.
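The relationship between reliability and the SEM can be sketched with the conventional classical test theory formula, SEM = SD × √(1 − reliability). The function names and the input values below (a raw-score SD of 10 and a reliability of .91) are hypothetical and chosen only for illustration; the exact computation used for Table 12.17 is described in chapter 8 and is not reproduced here.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sem, z=1.96):
    """Approximate band around an observed score (z = 1.96 for about 95%)."""
    return observed - z * sem, observed + z * sem


# Hypothetical inputs, not taken from the Conners 4 samples.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
low, high = score_band(observed=20.0, sem=sem)
print(round(sem, 2), round(low, 2), round(high, 2))
```

Under these assumed inputs, a higher reliability or a smaller SD both shrink the band, which is why a low SEM is read as evidence of precise measurement.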

Table 12.17. Standard Error of Measurement for Raw Scores: Conners 4–ADHD Index

Sample	Age Group	Parent			Teacher			Self-Report
Sample	Age Group	Combined Gender	Males	Females	Combined Gender	Males	Females	Combined Gender	Males	Females
Normative Sample	6 to 7	2.75	2.73	2.76	2.91	2.86	3.04	—	—	—
	8 to 9	2.53	2.35	2.80	3.15	3.08	3.43	3.59	3.48	3.73
	10 to 11	2.70	2.80	2.59	3.08	3.04	3.33	3.80	3.83	3.76
	12 to 13	2.48	2.35	2.65	3.20	3.36	3.05	3.48	3.54	3.37
	14 to 15	2.58	2.62	2.55	3.27	3.22	3.45	3.25	3.18	3.31
	16 to 18	2.88	2.84	2.93	3.32	3.20	3.68	3.60	3.61	3.57
ADHD Reference Sample	6 to 12*	3.47	3.43	3.57	3.77	3.71	3.89	4.11	4.23	3.83
ADHD Reference Sample	13 to 18	3.10	3.13	2.98	3.46	3.50	3.33	4.13	4.32	4.05

Note. * = Age Group is 8 to 12 for Self-Report.

Test Information

Test information was explored in the Total Samples for Parent, Teacher, and Self-Report (for details about the Total Samples, see Tables 6.5 and 6.6 in chapter 6, Development; for details about test information, see chapter 8, Reliability). As seen in Figure 12.1, precision of measurement for the Conners 4–ADHD Index is excellent for both the Parent and Teacher forms. Test information values exceeded 10 from average levels of the trait (denoted as 0 along the x-axis) to more than 2 standard deviations (SD) above the mean, a range that reflects clinical levels of the trait. For Self-Report, values remained at about 10 from average levels of the trait to 2 SD above the mean. Values of this magnitude meet guidelines for test information, suggesting sufficient precision of measurement for the Conners 4–ADHD Index across all rater forms and supporting the reliability of this index.
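To give a sense of what a test information value of 10 implies, the sketch below applies two standard item response theory relationships: the conditional standard error of the trait estimate is 1/√(information), and, if the trait is scaled to a variance of 1, the corresponding reliability is approximately 1 − 1/information. The function names are hypothetical, and the unit-variance scaling is an assumption rather than something stated in this chapter.

```python
import math


def conditional_standard_error(information):
    """Standard error of the trait estimate at a given information value."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)


def approximate_reliability(information, trait_variance=1.0):
    """Reliability implied by an information value.

    Assumes the trait is scaled to the given variance (1.0 by default).
    """
    error_variance = 1.0 / information
    return 1.0 - error_variance / trait_variance


for info in (5.0, 10.0, 20.0):
    se = conditional_standard_error(info)
    rel = approximate_reliability(info)
    print(info, round(se, 3), round(rel, 2))
```

Under these assumptions, an information value of 10 corresponds to a reliability of about .90 at that trait level.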

Figure 12.1. Test Information: Conners 4–ADHD Index

Test-Retest Reliability

The test-retest reliability of the Conners 4–ADHD Index was assessed by computing the correlation of raw scores from individuals who completed the test on two separate occasions. It was expected that the index would have stable scores that do not vary significantly upon re-administration. Raters completed the Conners 4–ADHD Index for youth from the general population (i.e., youth who did not have a clinical diagnosis) on two occasions, with a 2- to 4-week interval between administrations (N = 81 for Parent; N = 61 for Teacher; N = 68 for Self-Report; refer to Table F.1 and Table F.2 in appendix F for demographic characteristics of the test-retest samples).

The correlation coefficients and descriptive statistics of the raw scores from the test-retest samples are provided in Table 12.18. Results demonstrated that the Conners 4–ADHD Index has strong (r = .66 for Self-Report, p < .001) to very strong (r = .91 for Parent, r = .89 for Teacher, p < .001) evidence of test-retest reliability. As further evidence of score stability over the retest period, mean scores from each administration were closely aligned, as also seen in Table 12.18. The stability of the Conners 4–ADHD Index probability score was further evaluated by calculating the difference between the percentage of the samples within each probability score category range from Time 1 to Time 2 (see Table 12.19). Overall, fewer than 10%, 7%, and 11% of the Parent, Teacher, and Self-Report samples, respectively, showed a shift in probability score category; that is, scores that were classified as “Low” at Time 1 were overwhelmingly likely to also be classified as “Low” at Time 2. The stable nature of the scores, as demonstrated by the test-retest reliability coefficients, provides confidence that any change observed in scores over time is more likely due to a true change than to imprecise measurement.

Table 12.18. Test-Retest Reliability: Conners 4–ADHD Index

Form	r	Time 1			Time 2
Form	r	M	Mdn	SD	M	Mdn	SD
Parent	.91	6.8	5	6.1	6.0	4	6.4
Teacher	.89	11.7	9	9.7	11.3	8	10.0
Self-Report	.66	10.1	9	6.8	9.7	8	7.4

Note. N = 81 Parent; N = 61 Teacher; N = 68 Self-Report. All correlations significant, p < .001. Guidelines for interpreting |r|: very weak < .20, weak = .20 to .39, moderate = .40 to .59, strong = .60 to .79, very strong ≥ .80.
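The test-retest coefficient and the interpretive labels in the note above can be sketched as follows. The `time1` and `time2` scores are hypothetical and are not data from the test-retest samples; the function names are likewise illustrative. Only the cutoffs in `interpret_r` come from the table note.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between paired scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def interpret_r(r):
    """Label |r| using the guidelines given in the table note."""
    size = abs(r)
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"


# Hypothetical raw scores for eight youth at two administrations.
time1 = [5, 9, 2, 14, 7, 11, 3, 8]
time2 = [4, 10, 3, 12, 8, 9, 2, 9]

r = pearson_r(time1, time2)
print(round(r, 2), interpret_r(r))

# Labels for the coefficients reported in Table 12.18.
for form, value in (("Parent", 0.91), ("Teacher", 0.89), ("Self-Report", 0.66)):
    print(form, interpret_r(value))
```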

Table 12.19. Percentage of Change in Probability Score Range: Conners 4–ADHD Index Test-Retest

Form	Administration	Percentage of Sample Within Each Score Range						Cliff’s d
Form	Administration	Very Low	Low	Borderline	Moderate	High	Very High	Cliff’s d
Parent	Time 1	77.5	12.7	2.8	0.0	1.4	5.6	.09
Parent	Time 2	87.3	2.8	4.2	0.0	0.0	5.6	.09
Teacher	Time 1	33.3	31.3	6.3	12.5	8.3	8.3	.02
Teacher	Time 2	37.5	27.1	10.4	6.3	6.3	12.5	.02
Self-Report	Time 1	21.8	54.5	3.6	7.3	12.7	0.0	.09
Self-Report	Time 2	30.9	45.5	10.9	7.3	1.8	3.6	.09

Note. N = 81 Parent; N = 61 Teacher; N = 68 Self-Report. Guidelines for interpreting Cliff’s |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff’s d value indicates that Time 1 scores were higher overall than Time 2 scores.
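For readers unfamiliar with Cliff’s d, the sketch below computes it for two ordered category distributions as the probability that a score from the first distribution falls in a higher category than a score from the second, minus the probability of the reverse. It uses the rounded Parent percentages from Table 12.19 and treats the two administrations as independent groups; this is an assumption made for illustration, since the manual's exact procedure is not described in this section. Function names are hypothetical.

```python
def cliffs_delta_from_percentages(first, second):
    """Cliff's d from two distributions over the same ordered categories.

    Each argument lists the percentage of a sample in each category,
    ordered from lowest to highest category.
    """
    total = sum(first) * sum(second)  # normalizes for rounding in the inputs
    greater = sum(p * q for i, p in enumerate(first)
                  for j, q in enumerate(second) if i > j)
    less = sum(p * q for i, p in enumerate(first)
               for j, q in enumerate(second) if i < j)
    return (greater - less) / total


def interpret_cliffs_d(d):
    """Label |d| using the guidelines given in the table note."""
    size = abs(d)
    if size < 0.15:
        return "negligible"
    if size < 0.33:
        return "small"
    if size < 0.47:
        return "medium"
    return "large"


# Parent rows of Table 12.19, Very Low through Very High.
parent_time1 = [77.5, 12.7, 2.8, 0.0, 1.4, 5.6]
parent_time2 = [87.3, 2.8, 4.2, 0.0, 0.0, 5.6]

d = cliffs_delta_from_percentages(parent_time1, parent_time2)
print(round(d, 2), interpret_cliffs_d(d))
```

A value near zero means that neither administration tends to place youth in higher categories than the other, which is the pattern the table note describes as negligible.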

Inter-Rater Reliability

Inter-rater reliability refers to the degree of agreement between two raters who are rating the same youth. Estimates of inter-rater reliability help describe levels of consistency between raters, typically indexed by a correlation coefficient (in this case, Pearson’s r; LeBreton & Senter, 2008). Consistency between raw scores from various raters was explored for the Conners 4–ADHD Index Parent, Teacher, and Self-Report. Two inter-rater studies were conducted with the Conners 4–ADHD Index: (a) two raters of the same rater type rated the same youth (i.e., two parents or two teachers), and (b) two raters of different rater types rated the same youth (i.e., parent/teacher, parent/self-report, or teacher/self-report).

Inter-Rater Reliability Study 1. In the first inter-rater study for the Conners 4–ADHD Index, the dyads consisted of two Parents (N = 68) or two Teachers (N = 34) rating the same youth (refer to Table F.3 and Table F.4 in appendix F for the demographic characteristics of the inter-rater samples).

The obtained inter-rater reliability coefficients are provided in Table 12.20. Strong (Teacher r = .62, p < .001) to very strong (Parent r = .83, p < .001) inter-rater agreement was found for the Conners 4–ADHD Index. Table 12.20 also shows the descriptive statistics for each rater; the very small differences between mean raw scores highlight the similarity of average scores between the two raters. The consistency of the Conners 4–ADHD Index probability score was further evaluated by calculating the difference between the percentage of the sample within each probability score category range from Parent 1 to Parent 2 or Teacher 1 to Teacher 2. There was a negligible effect size of the difference between raters (Cliff’s d = −.01 for Parent and −.03 for Teacher; see Table 12.21), indicating that different raters typically provided ratings that led to scores within the same category, as seen through the similar percentages of each sample with scores in each category. These findings support the reliability of the Conners 4–ADHD Index.

Table 12.20. Inter-Rater Reliability Study 1: Conners 4–ADHD Index

Form	r	Rater 1			Rater 2
Form	r	M	Mdn	SD	M	Mdn	SD
Parent	.83	10.9	10	7.8	11.4	9	8.7
Teacher	.62	22.6	22	9.8	22.2	22	9.3

Note. N = 68 for Parent rater pairs; N = 34 for Teacher rater pairs. All correlations significant, p < .001. Guidelines for interpreting |r|: very weak < .20, weak = .20 to .39, moderate = .40 to .59, strong = .60 to .79, very strong ≥ .80.

Table 12.21. Percentage of Change in Probability Score Range: Conners 4–ADHD Index Inter-Rater Study 1

Form	Rater	Percentage of Sample Within Each Score Range						Cliff’s d
Form	Rater	Very Low	Low	Borderline	Moderate	High	Very High	Cliff’s d
Parent	Parent 1	62.3	14.8	3.3	4.9	1.6	13.1	−.01
Parent	Parent 2	62.3	14.8	3.3	1.6	1.6	16.4	−.01
Teacher	Teacher 1	0.0	25.8	6.5	19.4	19.4	29.0	−.03
Teacher	Teacher 2	3.2	19.4	9.7	12.9	25.8	29.0	−.03

Note. N = 68 for Parent rater pairs and N = 34 for Teacher rater pairs. Guidelines for interpreting Cliff’s |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff’s d value indicates that Parent 1/Teacher 1 scores were higher overall than Parent 2/Teacher 2 scores.

Inter-Rater Reliability Study 2. In the second inter-rater study, comparisons were made across different types of raters. As the Conners 4–ADHD Index Parent, Teacher, and Self-Report all measure similar constructs, similarity in scores across different types of raters, as well as between observer ratings and self-reported ratings, would provide evidence of the reliability of the test scores. Some degree of similarity was expected between informants, but so was a certain degree of incongruence (i.e., the correlations should be small to moderate in size), because informants may have different opinions about, or different experiences with, the youth’s behavior, and because they see the youth in different contexts.

For the Conners 4–ADHD Index, correlation coefficients were calculated between the following pairs of raters: (a) parent and teacher, (b) parent and self-report, and (c) teacher and self-report. To examine these relationships, parents, teachers, and youth provided ratings of the same youth (N = 62; all ratings completed within a 30-day period). All youth in this sample had a clinical diagnosis (refer to Table F.5 in appendix F for demographic characteristics of the rated youth and raters). The obtained correlation coefficients between different rater types are provided in Table 12.22.

Results were similar to those found on the full-length Conners 4 and the Conners 4–Short. The correlations between the Conners 4–ADHD Index raw probability scores ranged from very weak (Teacher/Self-Report r = .11, p = .413) to weak (Parent/Self-Report r = .21, p = .104) to moderate (Parent/Teacher r = .48, p < .001). In addition, the effect size of the difference between raters was negligible to small (Cliff’s d ranged from −.10 to .24; see Table 12.23). These results indicate that although raters largely provided ratings that led to similar categories of the probability score, some differences may arise, as evidenced by the slight differences seen in the proportions of each rater sample in each score category in Table 12.23. This range in agreement between raters may be due to a variety of reasons, one of which is setting differences. Youth can report on their own behaviors across multiple settings, whereas parents and teachers can only report on those behaviors observed in specific environments, such as home and school. These findings underscore the value of obtaining ratings from multiple sources when administering the Conners 4–ADHD Index.

Table 12.22. Inter-Rater Reliability Study 2: Conners 4–ADHD Index

Form	r	p	Rater 1			Rater 2
Form	r	p	M	Mdn	SD	M	Mdn	SD
Parent/Teacher	.48	< .001	17.5	18	7.2	20.2	21	9.8
Parent/Self-Report	.21	.104	17.5	18	7.2	20.1	20	7.3
Teacher/Self-Report	.11	.413	20.2	21	9.8	20.1	20	7.3

Note. N = 62 for all forms. Guidelines for interpreting |r|: very weak < .20, weak = .20 to .39, moderate = .40 to .59, strong = .60 to .79, very strong ≥ .80.
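One way to see why only the Parent/Teacher coefficient in Table 12.22 is statistically significant is to place an approximate 95% confidence interval around each r using the Fisher z transformation. This is a common textbook approach and is offered here only as a sketch; it is an assumption that it approximates the significance tests behind the tabled p values, which are not described in this section. The function name is hypothetical.

```python
import math


def fisher_confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r from n pairs."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)               # Fisher z transformation
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Coefficients from Table 12.22 (N = 62 for all pairs).
pairs = {
    "Parent/Teacher": 0.48,
    "Parent/Self-Report": 0.21,
    "Teacher/Self-Report": 0.11,
}

intervals = {name: fisher_confidence_interval(r, 62) for name, r in pairs.items()}
for name, (lower, upper) in intervals.items():
    print(name, round(lower, 2), round(upper, 2))
```

An interval that spans zero is consistent with a nonsignificant correlation at the .05 level, which matches the pattern of p values in the table.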

Table 12.23. Percentage of Change in Probability Score Range: Conners 4–ADHD Index Inter-Rater Study 2

Form	Rater	Percentage of Sample Within Each Score Range						Cliff’s d
Form	Rater	Very Low	Low	Borderline	Moderate	High	Very High	Cliff’s d
Parent/Teacher	Parent	14.5	8.1	6.5	12.9	3.2	54.8	.24
Parent/Teacher	Teacher	3.2	27.4	3.2	19.4	25.8	21.0	.24
Parent/Self-Report	Parent	14.5	8.1	6.5	12.9	3.2	54.8	.18
Parent/Self-Report	Self-Report	1.6	16.1	11.3	17.7	25.8	27.4	.18
Teacher/Self-Report	Teacher	3.2	27.4	3.2	19.4	25.8	21.0	−.10
Teacher/Self-Report	Self-Report	1.6	16.1	11.3	17.7	25.8	27.4	−.10

Note. N = 62 for all forms. Guidelines for interpreting Cliff’s |d|: negligible effect size < .15; small effect size = .15 to .32; medium effect size = .33 to .46; large effect size ≥ .47. A positive Cliff’s d value indicates that scores for the first form listed in each pair were higher overall than scores for the second form listed.