The main findings in this validation of the CORE-OM in a mid-adolescent sample were a new factor solution and a higher cut-off score than reported in adult samples. The EFA resulted in a five-factor solution, and the factors were interpreted as general problems, positive resources, risk to self, risk to others, and problems with others. The CFA fit for this model was good. The measurement invariance analysis for gender indicated that means for boys and girls should not be compared without modification of the scale. The clinical cut-off score based on the all-item total was higher than in an adult sample. Both the all-item total and the general problems cut-off scores showed gender differences.
Factor analysis and reliability
From the exploratory factor analysis, based on the training part of the sample, a five-factor model was judged to be the best candidate for model evaluation. In the EFA, this model fit better than solutions with fewer factors, and its factors were interpreted as General problems, Positive resources, Risk to self, Risk to others and Problems with others. In the subsequent confirmatory factor analysis, performed on the testing part of the sample, the fit of this model can be characterized as good.
The developers of the CORE-OM describe the instrument as a four-dimensional measure with dimensions of Subjective well-being, Problems, Functioning and Risk [6]. There may be many reasons why data from the present youth sample yielded a different factor structure. We believe that one main reason is that eight of the 34 items are positively keyed. Lyne et al. (2006) [9] showed that, for the CORE-OM in an adult sample, method factors related to positive and negative wording of the items played a role in achieving acceptable model fit. In our sample, all eight positively keyed items loaded on the same factor in the five-factor EFA. It seems that when adolescents answer these items, the positive resources in their lives are prompted, rather than just negative aspects. This highlights that it is problematic to assume that a low score on a positively keyed item reflects the same thing as a high score on a negatively keyed item. According to the tripartite theory of anxiety and depression [39], negative affect and lack of positive affect may represent separate dimensions of internalizing problems. The current factor solution supports the view that negative affect and lack of positive affect are not two sides of the same coin.
Combining all the positively keyed items into a separate subscale not only solves the problem of reversed items, but also produces a subscale that is substantially easier to interpret than the theoretically derived Well-being scale, since it reflects resources, well-being and self-efficacy.
Incidentally, one of the positively keyed items had a much lower factor loading in the CFA than the other items. Although item 19 (“I have felt warmth and affection for someone”) is positively keyed, it differs in content from the rest of the positive items. This item may measure traits like empathy or affection directed towards other people, and not necessarily positive feelings about oneself. Also, removing this item from the scale improves the Omega reliability by nearly 0.02, and the item had an item-rest correlation below 0.30. Another possible reason for the low factor loading of item 19 is that the Norwegian translation of the word “affection” is a word that is probably not used among Norwegian adolescents nowadays. Thus, a revision of the Norwegian translation is recommended.
The risk items split into two distinct factors. The Risk to others items (items 6 and 22) correlate highly with each other but little with the other items in the questionnaire. This is also reflected in the low factor correlations between the latent Risk to others variable and the other four latent variables in the CFA. In the EFA, a Risk to others factor emerges early, although it is questionable whether these two items cover a large enough range of such a dimension.
The Risk to self dimension seems more robust, with high internal consistency. The factor correlation between Risk to self and General problems is very high (> 0.90), which seems natural, as having many symptoms may contribute to self-harm and suicidal ideation. Cross-loadings between Risk to self and General problems were evident for the items “I have felt despairing or hopeless” and “I have felt unhappy”. Although such items may indirectly indicate risk of self-harm, we believe that these items are more direct indicators of the severity of emotional problems.
The reliability analysis revealed that the Omega would increase slightly if item 34 (“I have hurt myself physically or taken dangerous risks with my health”) was removed from the scale. This item may or may not be related to intentions of self-mutilation or suicide. The other items within the Risk to self scale are more directly associated with such intentions, while taking dangerous risks may be sensation-seeking behavior not directly associated with self-harm intentions.
For the General problems scale, half (17) of the CORE-OM items loaded highest on this factor in the EFA. For this scale, the Omega reliability would improve slightly if items 8 and 29 were removed, and these two items had the lowest item-rest correlations within the scale. The complaints described in item 8 (“I have been troubled by aches, pains or other physical problems”) may be caused by mental health issues but can also result from injuries, physical disease and other issues not related to emotional problems. The increase in reliability when this item is removed from the General problems scale may be an indication of this. Item 29 (“I have been irritable when with other people”) was probably the most difficult item to place. It loaded moderately on the General problems latent variable and cross-loaded on the Problems with others latent variable. Being irritable with others can indicate problems in functioning with others, but can also indicate emotional problems, since irritability may be associated with several traits or conditions [40].
Finally, items 25, 26 and 33 loaded on the Problems with others factor. These items concern relationships with others. Lyne et al. [9] identified the same three items as belonging to a common factor. After accounting for a general distress factor, these three items were the only items that had meaningful loadings on their residualized Functioning factor. This highlights that feelings of humiliation or criticism from others and having no friends may form a separate factor in the CORE-OM instrument.
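The item-rest correlations referred to in this section can be illustrated with a minimal sketch. It assumes the usual definition, namely the Pearson correlation between one item and the sum of the remaining items on the same scale. The function names and the toy data are hypothetical and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def item_rest_correlations(rows):
    """rows holds one list of item scores per respondent (same scale).

    Returns, for each item, its correlation with the sum of the
    remaining items on the scale.
    """
    n_items = len(rows[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson(item, rest))
    return result

# Hypothetical toy data: four respondents, three items (not study data).
toy_rows = [
    [0, 0, 1],
    [1, 1, 0],
    [2, 2, 1],
    [3, 3, 0],
]
correlations = item_rest_correlations(toy_rows)
# Items below the 0.30 threshold mentioned in the text are flagged (1-based).
flagged = [j + 1 for j, r in enumerate(correlations) if r < 0.30]
print(correlations, flagged)
```

In the toy data the third item runs against the other two, so it is the only one flagged. A low value of this kind is what prompted the closer look at items 19, 8 and 29.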
Measurement invariance for gender
We conducted a measurement invariance analysis for gender to evaluate whether it is reasonable to compare means between girls and boys using the CORE-OM.
Comparing the configural model and the full scalar model, we found that the scalar model fit significantly worse than the configural model, which indicates that means for boys and girls cannot be compared without modifications to the scales. After 4–5 steps of relaxing constraints in the scalar model, we found a partial scalar model that did not fit significantly worse than the configural model. When comparing means for boys and girls on the CORE-OM scales, one should probably be careful in using items 14, 29, 4, 31 and 19.
Different researchers rely on different fit statistics when evaluating measurement invariance. Putnick and Bornstein [41] show that many consider that a small change in CFI or RMSEA when going from a configural to a scalar model could indicate scalar invariance. The change in CFI and RMSEA found for gender invariance in the non-clinical sample in the present study is very small, and within the limits for full scalar invariance mentioned by Putnick and Bornstein [41]. However, it is problematic to choose the change in χ2 as the criterion for invariance when it is non-significant and other criteria when it is significant. We instead used a data-driven method (modification indices) to establish partial scalar invariance. Partial scalar invariance can be concluded when a large majority of the items on the factors are invariant [42]. The use of modification indices is also controversial [41], but can be helpful in identifying problematic items. For example, our analysis showed that item 14 in the CORE-OM (“I have felt like crying”) may be problematic to include when symptoms of depression or anxiety are to be compared between the genders. Boys and girls report very differently on this item, and this difference cannot be attributed only to the adolescents' level of emotional problems on the latent scale, but also to some gender-specific traits.
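The change-in-fit criterion discussed above can be written as a simple check. This is a minimal sketch under stated assumptions: the default limits (a CFI drop of at most 0.01 and an RMSEA rise of at most 0.015) are commonly cited conventions in the invariance literature and are assumed here, not taken from the present analysis. The class name, function name and fit values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Fit:
    """Fit indices for one fitted multi-group model."""
    cfi: float
    rmsea: float

def within_invariance_limits(configural, scalar,
                             max_cfi_drop=0.01, max_rmsea_rise=0.015):
    """Return True if the scalar model's fit is not meaningfully worse.

    The default limits are assumed conventions, not values from the study.
    """
    cfi_drop = configural.cfi - scalar.cfi        # positive means worse fit
    rmsea_rise = scalar.rmsea - configural.rmsea  # positive means worse fit
    return cfi_drop <= max_cfi_drop and rmsea_rise <= max_rmsea_rise

# Hypothetical fit values, for illustration only.
small_change = within_invariance_limits(Fit(0.950, 0.045), Fit(0.947, 0.046))
large_change = within_invariance_limits(Fit(0.950, 0.045), Fit(0.930, 0.046))
print(small_change, large_change)
```

A check of this kind can pass even when the χ2 difference test is significant, as happened in the present study. This is why the criterion should be chosen before the results are inspected.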
Gender differences in the non-clinical sample
We compared factor means for male and female adolescents in the non-clinical group using the final partial scalar model. Boys and girls differed on four of the five latent variables. A non-significant difference between the genders was found for the risk to self variable. In the non-clinical group, few adolescents had thoughts of self-harm. Girls scored higher than boys on the general factor, which has also been shown in other studies [5, 13]. Boys scored significantly higher on the risk to others factor. This is consistent with other validations of the scale [7, 14].
For the positive resources latent variable, girls scored significantly lower than boys. Finally, for the Problems with others factor, girls scored higher than boys. The items in this scale concern feelings of having been criticized, humiliated or shamed, or of having no friends, and as such are about emotional relations with others. Girls tend to use emotional coping skills and help from others more often than boys, while boys tend to devalue such emotional expressions [43], which may result in stronger feelings related to emotional relationships among girls. In the Japanese version of the CORE-OM, the female participants showed lower scores on the “close relationships” subscales [14].
Factor correlations between the latent variables in the chosen factor solution were high, except for those involving risk to others. The similar gender differences for general emotional problems, positive resources and problems with others may be a sign that these factors tap related concepts.
Mixing positive and negative items in a questionnaire
Lyne et al. [9] concluded, in their study of 2140 adult patients, that the most useful scoring method for the CORE-OM would be to compute a general total score based on the 28 non-risk items and a risk total based on the remaining six items. The main difference between the 17-item general problems scale from the present study and the 28-item non-risk scale is the exclusion of the positively keyed items from the 17-item version.
One of the reasons for including both positively and negatively keyed items in a questionnaire is to reduce acquiescence bias (a response style bias in which respondents tend to agree with statements) [44]. However, positively and negatively keyed items may involve different cognitive processes [45, 46], and this is one of the reasons that a positive item latent variable showed up in the EFA. It is a paradox that including some positively keyed items in a questionnaire consisting mostly of negatively keyed items, in order to mitigate acquiescence bias, seems to confuse the respondents and therefore makes the instrument less valid and the scales less reliable.
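The scoring assumption questioned here, that a low score on a positively keyed item is equivalent to a high score on a negatively keyed item, is built into conventional reverse scoring. The following is a minimal sketch of that convention. It assumes a response scale from 0 to 4 and a mean item score. The function name, the item numbers and the responses are hypothetical and do not reproduce the CORE-OM scoring key.

```python
def mean_item_score(responses, positively_keyed, scale_max=4):
    """Mean item score after reversing positively keyed items.

    responses maps item number to the raw response (0..scale_max).
    positively_keyed is the set of item numbers to reverse.
    The 0..4 scale is an assumption stated in the accompanying text.
    """
    total = 0
    for item, raw in responses.items():
        if not 0 <= raw <= scale_max:
            raise ValueError(f"item {item}: response {raw} out of range")
        # A reversed item contributes scale_max - raw, so a low raw score
        # on a positive item is treated as a high problem score.
        total += (scale_max - raw) if item in positively_keyed else raw
    return total / len(responses)

# Hypothetical four-item questionnaire in which item 2 is positively keyed.
toy_responses = {1: 3, 2: 4, 3: 0, 4: 1}
toy_score = mean_item_score(toy_responses, positively_keyed={2})
print(toy_score)
```

Scoring the positively keyed items as a separate subscale, as suggested by the factor solution, avoids relying on this equivalence.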
Clinical cut-off score
The original validation of the CORE suggested a clinical cut-off of 1.2 [5], and later validations have suggested a cut-off point as low as 1.0 [47] to define clinical caseness. However, in the present adolescent samples, the cut-off score on the All-items CORE-OM was 1.31 (genders combined), 1.44 (girls) and 1.02 (boys). This finding needs to be replicated, but it corresponds well with the finding that youths also score higher than adults on the BDI [22, 23]. Consistent with the results from the present study, we also recommend the 17-item factor as a measure of general problems. The positively keyed items do not interfere with this factor, and the problems with others items are also excluded. In this way we obtain a more reliable measure of emotional problems, and the cut-off scores for this factor are suggested as an alternative to the established All items minus Risk score. The rationale for this is that the 28-item All items minus Risk score includes all reversed items, and may thus actually underestimate the level of emotional distress experienced by patients. The cut-off scores for both the All-items total and the 17-item general distress factor show gender differences, with girls scoring higher than boys, and are higher than in adult samples [7]. We suggest that the cut-off scores either be gender specific or that the cut-off for the genders combined be set lower to accommodate the boys' lower scores, as suggested by Connell et al. [47].
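For readers unfamiliar with how such cut-offs are derived, the following minimal sketch shows the Jacobson and Truax approach, assuming the commonly used criterion c, in which each group's mean is weighted by the other group's standard deviation. The function names and all input values are hypothetical and are not the sample statistics from this study.

```python
def jacobson_truax_cutoff(mean_clin, sd_clin, mean_nonclin, sd_nonclin):
    """Cut-off between a clinical and a non-clinical distribution.

    Assumes criterion c: each mean is weighted by the other group's SD,
    so the cut-off lies closer to the mean of the group with less spread.
    """
    return (sd_clin * mean_nonclin + sd_nonclin * mean_clin) / (sd_clin + sd_nonclin)

def is_clinical(score, cutoff):
    """Classify a mean item score as being at or above the cut-off."""
    return score >= cutoff

# Hypothetical group statistics, for illustration only.
toy_cutoff = jacobson_truax_cutoff(mean_clin=2.0, sd_clin=0.6,
                                   mean_nonclin=1.0, sd_nonclin=0.4)
print(round(toy_cutoff, 2), is_clinical(1.6, toy_cutoff))
```

Because the cut-off depends on the means and standard deviations of both groups, separate cut-offs for girls and boys follow directly from computing the formula within each gender.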
Limitations
The clinical sample may not be representative of the entire CAMHS population due to the sample being preselected based on symptoms of emotional problems. Furthermore, patients evaluated as suicidal were excluded from the sample because they could not be subjected to the 6-week waiting condition. However, since the CORE-OM was mainly developed to monitor outpatient treatment and is not the outcome measure of choice for psychosis or conduct disorder, the present clinical sample probably has a high density of the phenomena that the CORE-OM was designed to monitor.
The age range in the non-clinical sample was 14–18, while the age range in the clinical sample was 14–17. The reason for this is that Norwegian CAMHS accepts only those younger than 18 as patients, while youths aged 18 and older are referred to mental health services for the adult population. However, because enrolment in high school grades is based on year of birth, we decided not to exclude the 18-year-olds from the non-clinical sample. Furthermore, the mean age in the two samples is similar.
Due to the low proportion of males in the clinical sample, the mean and standard deviation of the male clinical sample used in the Jacobson and Truax formula have large standard errors. Therefore, the clinical cut-offs for boys are subject to considerable uncertainty.