Empirical research in clinical supervision: a systematic review and suggestions for future studies

Background Although clinical supervision is considered to be a major component of the development and maintenance of psychotherapeutic competencies, and despite an increase in supervision research, the empirical evidence on the topic remains sparse. Methods Because most previous reviews lack methodological rigor, we aimed to review the status and quality of the empirical literature on clinical supervision, and to provide suggestions for future research. MEDLINE, PsycInfo and the Web of Science Core Collection were searched and the review was conducted according to current guidelines. From the review results, we derived suggestions for future research on clinical supervision. Results The systematic literature search identified 19 publications from 15 empirical studies. Taking into account the review results, the following suggestions for further research emerged: Supervision research would benefit from proper descriptions of how studies are conducted according to current guidelines, more methodologically rigorous empirical studies, the investigation of active supervision interventions, from taking diverse outcome domains into account, and from investigating supervision from a meta-theoretical perspective. Conclusions In all, the systematic review supported the notion that supervision research often lags behind psychotherapy research in general. Still, the results offer detailed starting points for further supervision research. Trial registration PROSPERO; CRD42017072606, registered on June 20, 2017.


Background
Although clinical supervision is regarded, in psychotherapy training and in profession-long learning, as one of the major components for change in psychotherapeutic competencies and expertise, its evidence base is still considered weak [1][2][3]. Clinical supervision is currently considered a distinct competency in need of professional training and systematic evaluation; however, theoretical developments and experience-driven practice still seem to diverge, and "significant gaps in the research base" are evident ([1], p. 88).
Definitions of supervision emphasize different aspects, and the lack of consensus seems to impede research [1]. Falender and Shafranske [4,5] stress the development of testable psychotherapeutic competencies in the learners, i.e., their knowledge, skills and values/attitudes, through supervision; on the other hand, supervisors need to develop competence to deliver supervision. Milne and Watkins [6] describe clinical supervision as "the formal provision, by approved supervisors, of a relationship-based education and training that is work-focused and which manages, supports, develops and evaluates the work of colleague/s" (p. 4). In contrast, Bernard and Goodyear [7] emphasize supervision's hierarchical approach, inasmuch as it is provided by more senior to more junior members of a profession. The goals of supervision may thus range between the poles of being normative (i.e., ensuring quality and case management), restorative (i.e., providing emotional and coping support) and formative (i.e., promoting therapeutic competence), and, thus, may ultimately lead to effective and safe psychotherapy [6]. Hence, it is pivotal for supervisors to reflect upon their own knowledge or skills gaps, and to engage in further qualification [8]. Clinical supervision may involve different therapeutic approaches and thus addresses therapists from varying mental health backgrounds [8], which is the stance taken in the current review.
Besides providing a definition of clinical supervision, it is relevant to delineate related terms. One is feedback, a supervision technique that "refers to the 'timely and specific' process of explicitly communicating information about performance" ( [8], p. 28). Contrary to supervision, coaching strives to enhance well-being and performance in personal and work domains [9], and is therefore clearly distinct from supervision and psychotherapy with mental health patients provided by licensed therapists.
In the supervision literature, there is no paucity of narrative reviews, commentaries or concept papers. Previous reviews have revealed positive effects of supervision, for example on supervisees' satisfaction, autonomy, awareness or self-efficacy [10][11][12][13]. However, results on the impact of supervision on patient outcomes are still considered mixed [10]. Importantly, there is a knowledge gap regarding the active components of supervision, i.e., the effects of supervision or supervisor interventions on supervisees and their patients [10].
Past reviews, however, suffer from several limitations (for details, see [14]). First of all, strategies used for literature search and screening have not always been described or implemented rigorously, that is, implemented in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA [15]) reporting guidelines (e.g. [10][11][12][16][17][18][19]). Further, several reviews focus specifically on the positive effects of supervision [19] or specifically on learning disabilities [11], emphasize the authors' point of view [20,21], or concentrate on the supervisory relationship only [14]. While the majority of the above-mentioned reviews are narrative, Alfonsson and colleagues conducted a systematic review [14], pre-registered and published a review protocol [22] and implemented a thorough literature search and methodological appraisal. However, since they focused exclusively on cognitive behavioral supervision and on experimental designs, only five studies fit their inclusion criteria. Additionally, interrater agreement was only moderate during screening. Likewise, in our previous scoping review [23], we concentrated on cognitive behavioral supervision. Furthermore, like other supervision reviews [20,21], it was published in German only, limiting its scope.
Thus, the current systematic review aimed to complement previous reviews by using a comprehensive methodology and concise reporting. First, we aimed to review the current status of supervision interventions (e.g., setting, session frequency, therapeutic background) and of the methodological quality of the empirical literature on clinical supervision. Second, we aimed to provide suggestions for future supervision research.

Materials and methods
We conducted a systematic review by referring to the PRISMA reporting guidelines [15]. The review protocol was registered and published with the International Prospective Register of Systematic Reviews (PROSPERO; CRD42017072606).

Inclusion and exclusion criteria
We included studies referring to clinical supervision as defined above by Milne and Watkins [6]. Both supervision conducted on its own and supervision conducted as part of a larger intervention (as in psychotherapy training) were included. Treatment studies in which supervision was conducted solely to foster treatment delivery were excluded because they mainly address study adherence and are already covered in other reviews [24,25]. Furthermore, clinical supervision had to refer to psychotherapy, whereas supportive interventions accompanying other treatments (e.g., clinical management) were excluded. Thus, we included studies referring to mental health patients, and studies with patients with physical diseases were considered only if the reason for treatment was patients' mental health. Studies with another population (e.g., simulated patients or pseudo-clients) were excluded. In order to focus the review in the heterogeneous field of clinical supervision, we limited it to adult patients. Studies on family therapy were included if they focused on adults. Studies with mixed adult and child/adolescent populations were included if the results were reported for the adult population separately. No prerequisites were predefined for supervisor qualification. Any empirical study published within a peer-reviewed process (i.e., without commentaries or reviews) and any outcome measures were included. As such, any supervision outcome (e.g., supervisees' satisfaction or competence), including negative or unexpected outcomes (e.g., nondisclosure), was allowed. In line with Hill & Knox [10], we did not focus on studies exclusively examining the supervision process because, firstly, such studies do not provide knowledge on the effectiveness of supervision, and secondly, relationship variables are already covered by other reviews [11].
Thus, the review focused on supervision interventions, and studies exclusively focusing on the effects of relationship variables or attitudes between the supervisee and supervisor (i.e., as independent variables) were excluded. However, relationship variables were included if they were treated as dependent variables in the primary studies.

Study search
The bibliographic database search was conducted during February and March 2017 in key electronic mental health databases (Fig. 1). To include the current evidence, we focused our search on studies published from 1996 onwards. There were no language restrictions. The following search strategy was used: supervis* AND (psychotherap* OR cognitive-behav* OR behav* therapy OR CBT OR psychodynamic OR psychoanaly* OR occupational therapy OR family therapy OR marital therapy) NOT (management OR employ* OR child* OR adolesc*). Then, we inspected the reference lists of the included studies (backward search) and conducted a cited reference search (forward search). We finished our search in July 2017.

Screening and extraction
Referring to Perepletchikova, Treat and Kazdin [26], one reviewer (FK) introduced two Master's psychology students (JM, SW) to the review methods, and the group discussed the review process in weekly one-hour sessions. First, titles and abstracts were screened for inclusion (JM, SW). The first 10% (n = 671) of all titles and abstracts were screened by both raters independently. Inter-rater agreement regarding title/abstract screening amounted to κ = .83 [CI = .73-.93], which is considered high [27].
Next, full texts of eligible and unclear studies were retrieved and then screened again independently by both raters (JM, SW). Disagreements were resolved through discussion or through the inclusion of a third reviewer (FK). If publications were not available through inter-library loans, a copy was requested from the corresponding author. For nine authors, contact details were not retrievable, and out of the 15 authors that were contacted, five replied. Inter-rater agreement concerning full text screenings for inclusion/exclusion was κ = .87 [CI = .77-.97].
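The agreement statistic reported for both screening stages can be illustrated with a minimal sketch of Cohen's κ for two raters. This is not the authors' analysis code; the function name and the include/exclude labels are hypothetical, and the confidence intervals reported above are not computed here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    Illustrative helper (hypothetical name), not taken from the review.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / n**2
    if expected == 1:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch returns 1.0 by convention (an assumption).
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for four abstracts.
decisions_a = ["include", "include", "exclude", "exclude"]
decisions_b = ["include", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(decisions_a, decisions_b)
```

In this toy example the raters agree on three of four abstracts (observed agreement .75) while chance agreement is .50, so κ corrects the raw agreement downward to .50.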
For data extraction, we used a structured form that was piloted by three reviewers (FK, JM, SW) on five studies. It comprised information on supervision characteristics (e.g., setting, implementation and competence) and study characteristics (e.g., design, main outcome). Data were extracted independently by two raters, the results were then compared, and disagreements were resolved again by mutual inspection of the original data.
Fig. 1 Flowchart on study selection. Adapted from Moher and colleagues [15]; SV: supervision

Methodological quality
Since we included various study designs, we could not refer to one common tool for the assessment of methodological quality. We therefore developed a comprehensive tool applicable to various study designs to allow for comparability between studies. For the development, we followed prominent recommendations [27][28][29]. The items were as follows: a) an appropriate design regarding the study question; b) the selection of participants; c) measurement of variables/data collection; d) control/consideration of confounding variables; and e) other sources of bias (such as allegiance bias or conflicts of interest). Every item was rated according to whether low (1), medium (2) or high (3) threats to the methodological quality were assumed. The resulting sum score ranges from 5 to 15, with higher values indicating the possibility of greater threats to the methodological quality. The methodological quality was rated by two review authors independently (JM or SW and FK). Inter-rater reliability for the sum scores reached ICC (1,2) = .88 [CI = .70-.95], which is considered high [30]. Disagreements in ratings were again resolved through discussion within the review group.
Due to the heterogeneity of the study designs and outcomes, we present the review results narratively and in clearly arranged evidence tables.
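The scoring arithmetic of the quality tool can be sketched as follows. The item keys and function names are hypothetical shorthand for items a) to e); the risk bands follow the Fig. 2 caption (9-11 lower, 12-13 medium, 14-15 higher), and treating sum scores below 9 as lower risk is an assumption, since the caption does not cover them.

```python
# Hypothetical shorthand keys for items a) to e) of the quality tool.
ITEMS = ("design", "selection", "measurement", "confounding", "other_bias")


def quality_sum_score(ratings):
    """Sum the five item ratings (1 = low, 2 = medium, 3 = high threat)."""
    missing = set(ITEMS) - set(ratings)
    if missing:
        raise ValueError(f"missing item ratings: {sorted(missing)}")
    for item in ITEMS:
        if ratings[item] not in (1, 2, 3):
            raise ValueError(f"rating for {item!r} must be 1, 2 or 3")
    return sum(ratings[item] for item in ITEMS)


def risk_band(score):
    """Map a sum score (5-15) to the risk bands named in the Fig. 2 caption."""
    if not 5 <= score <= 15:
        raise ValueError("sum score must lie between 5 and 15")
    if score <= 11:
        # Assumption: scores of 5-8 are grouped with the 9-11 'lower' band.
        return "lower"
    if score <= 13:
        return "medium"
    return "higher"


# Hypothetical ratings for a single publication.
example = {"design": 2, "selection": 3, "measurement": 2,
           "confounding": 3, "other_bias": 2}
example_score = quality_sum_score(example)
example_band = risk_band(example_score)
```

With five items rated 1 to 3, the minimum of 5 and the maximum of 15 follow directly, which is why higher sums indicate greater possible threats.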

Supervisions
Only a minority of studies described any form of supervision manual used or any prior training of supervisors [32, 37-39, 42, 43]. In most cases, supervisees were postgraduates or had a PhD degree. Regarding the frequency of supervision sessions, most studies reported weekly sessions [31,32,34,35,37,41,42], and the total number varied considerably from 3 [35] to 78 sessions [31].
Three studies did not describe the supervision frequency [33,36,45], and one singled out one supervision session only [44] (recommendation to "Describe how the study is conducted").

Interventions
Whereas different forms of feedback or multiple-component supervision interventions were commonly studied, active interventions such as role play were seldom used [37,39,40]. Three studies did not describe the interventions used within supervision [35,44,45] (recommendation to "Investigate active supervision methods"). Four supervisions used a form of live intervention [36, 41-43], and the remainder conducted supervision face-to-face. All but five studies [32-34, 44, 45] investigated some form of technological support.

Methodological quality
The assessments of the methodological quality are presented in Table 2. The total methodological quality score was between 9 and 11 in six publications [32, 38, 41-43, 46, 49], between 12 and 13 in eight publications [31, 33-36, 45, 49], and between 14 and 15 in five of the 19 publications [37,39,40,44,47], with a lower score indicating a lower risk of a threat to the methodological quality. On an item level, most problems referred to the selection of participants, the control of confounders, and other bias such as allegiance bias (Fig. 2; recommendation to "Conduct methodologically stringent empirical studies").

Discussion
The aim of the present study was to systematically review the status and quality of the current empirical literature on clinical supervision and, based on the review findings, to draw conclusions for future studies. The current review identified 19 publications referring to 15 empirical studies on the status of clinical supervision. It is remarkable that, despite wide inclusion criteria, only such a small number of studies could be included. In contrast to former reviews, our study was conducted systematically according to current guidelines, using a reproducible methodology and concise reporting. Compared to previous reviews, it was not limited to specific psychotherapeutic approaches or study designs.
Regarding the psychotherapeutic approaches of the supervisees, most interventions had a CBT background, which still documents a research gap in studies on clinical supervision between CBT and other therapeutic approaches.
Fig. 2 Methodological quality of the included studies. Lower risk … lower possible threats to methodological quality, sum score of 9-11 (range 5-15); medium risk … 12-13; higher risk … 14-15; e.g., 16 studies with higher risk of threats regarding selection of participant issues
Fig. 3 Supervision outcomes and methodological quality of the respective studies. In relation to the methodological quality; e.g., 2 studies with medium and 1 study with higher risk of possible threats to methodological quality investigated the supervisory relationship
Aside from psychotherapy approaches, the meta-theoretical perspective of competency-based supervision, as proposed by the American Psychological Association [8], provides a more integrative and broader view. Their supervision guidelines involve seven key domains central to good-quality supervision, from supervisor competencies to diversity or ethical issues. Importantly, they describe supervision as science-informed, which again underlines how important it is for supervisors and supervisees to keep their evidence-based knowledge and skills up-to-date during profession-long learning.
Regarding how supervision was conducted, face-to-face supervision was prevalent, but technological support was common as well, at least in published empirical studies. A variety of interventions was used, including less active ones such as case discussions and coaching, as well as more active ones such as feedback on patient outcomes or supervisee performance. It is clearly positive that active interventions (such as coaching and feedback) were implemented and evaluated because they have proven useful in active learning and therapist training [50]. Nevertheless, even more active methods, such as exercise or role play, were an exception [23]. Furthermore, it remains unclear which interventions are helpful in profession-long learning and maintenance of expertise [21,23]. We found that central supervision characteristics, such as the training of supervisors or the manual used for supervision, were not described consistently. Although a detailed description of how studies were conducted seems intuitive, it is surprising that reporting guidelines are not referred to consistently.
Concerning design characteristics, most studies were uncontrolled or used small samples. Further constraints were associated with the lack of follow-up data and major inconsistencies in the evaluation of negative effects. Although external observers, who were only sometimes independent, were used, almost half of the studies relied exclusively on self-reported questionnaires. Another problem was that the heterogeneity in the designs and instruments hampered the quantitative summary of results. Methodological quality has been criticized in supervision research for years (e.g., [16,17]), and inconclusive findings or relevant alternative explanations additionally impeded firm conclusions on supervision effects. Regarding the effects of clinical supervision, the review documents that supervision research clearly lags behind psychotherapy research in general; that is, we still have limited evidence on supervision effects, especially those regarding patient benefits [10], and we continue to search for active supervision ingredients [51].
Acceptance and satisfaction are crucial prerequisites for supervision effects, and they were the variables most frequently investigated. Although positive results in these domains may be considered stable [13], satisfaction must not be confused with effectiveness. According to health care-related conceptualizations [52], subjective satisfaction may depend on a number of variables, such as mutual expectations, communication, the supervisory relationship, the access to supervision or financial strains. In this sense, satisfaction is distinct from learning and competence development. Other important outcomes of supervision, such as the therapeutic relationship and competencies, treatment integrity, patient symptoms or unwanted effects, clearly need further investigation [10,21]. Other ideas include considering not only the supervisory relationship but also supervisory expectations as important process variables across psychotherapeutic approaches [13].

Limitations
We constructed a short tool for rating methodological quality, which enabled comparisons between the diverse designs of the studies included. Although inter-rater reliability was high, the tool lacks comparability with other reviews. Due to a stricter operationalization of the inclusion criteria, six studies that were included in our previous scoping review [23] and three that were included in another current review [14] were not part of the current systematic review. More specifically, one study was not located via our search strategy, and the other publications did not describe explicitly whether the patients were adults. As the excluded publications mainly referred to CBT supervision, this generally reflects the stronger evidence base of CBT, which has its roots in basic research. Since the review aimed to illustrate the status and quality of supervision research, we did not restrict it to specific designs, but mapped the status quo. This necessarily increased heterogeneity and, especially regarding supervision effects, limited the possibility to draw clear-cut conclusions or to combine the results statistically. Differences in the results of reviews may result not only from methodological aspects but also from diversity in the primary studies, which may be addressed only by better supervision research [14].

Conclusions
The review provides a variety of starting points for future research. The recommendations derived mainly refer to the replicability of research (i.e., to conduct methodologically stringent empirical studies, and to include positive and negative supervision outcomes). Taking a competency-based view, the following are examples of significant foci of both future practice and supervision research [23,53,54]: Define, review and continuously develop supervisor competencies. Include active methods, live feedback and video-based supervision. Enhance the deliberate commitment to ethical standards to protect patients. Positively value and include scientific knowledge and progress. Foster profession-long learning of supervisees and supervisors.
Logistics may be an important issue in supervision research. Therefore, if large-scale quantitative studies are difficult to conduct or fund, methodologically sound pragmatic trials [3] and experimental studies may be feasible alternatives. Most of the results still speak to the lack of scientific rigor in supervision research. Thus, we consider competency-based supervision and research investigating the essential components of supervision as the major goals for future supervision research and practice.