Using the GAF to assess functional impairment and change in Shift Work Disorder (SWD)
Background: The Global Assessment of Functioning (GAF) addresses social, psychological, and occupational functioning and has been used as an outcome measure. Although some have questioned the GAF’s utility because of low inter-rater reliability estimates (e.g., Bates et al., 2002), these problems have been mitigated in research settings by rater training and monitoring (Hilsenroth et al., 2000; Vatnaland et al., 2007). We sought to determine whether the GAF is a reliable and valid way to detect functional change in a group of sleep-disordered patients. Our secondary aim was to contribute to existing research on the GAF by assessing whether the format of the training material (e.g., video vs. written vignette) affects reliability and rater accuracy.
Methods: Raters received training and then assigned GAF scores after reviewing videotaped interviews and written vignettes of sleep-disordered patients. T-tests were used to examine whether GAF ratings were sensitive to functional changes at different stages of illness and to determine concordance with gold-standard scores.
Results: Mean GAF ratings differed significantly between the videos (t(42) = -10.69, p < .001), and concordance with expert raters was achieved. Ratings also differed significantly between vignette 1 and vignette 2 (t(96) = -15.02, p < .001) and between vignette 2 and vignette 3 (t(57) = 17.47, p < .001). However, concordance with expert raters was only moderate for the vignettes.
Conclusions: Results suggest that the GAF can assess functional changes in SWD patients, and that this population exhibits social, occupational, and psychological problems that are observable when using broad-based psychiatric scales. Consistent with prior research (e.g., Hilsenroth et al., 2000), results also indicate that raters are able to score the GAF accurately following training that addresses rating strategy and consistent conceptualization of GAF anchors. Video training materials were superior to vignettes for reaching concordance with expert ratings. Additionally, raters showed higher agreement with gold-standard scores when ratings changed in the expected direction (improvement at the second visit) than when they changed in an unexpected direction (worsening after baseline). Therefore, GAF training should address expectancy issues in addition to rater error and other inter-rater reliability problems. This approach will help to improve the GAF’s construct validity when applied to ratings of broader populations, as well as to this SWD cohort.
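Illustrative note: The mean comparisons reported above take the form t(df) = value. The abstract does not state which variant of the t-test was used, so the following is only a minimal sketch that assumes an independent two-sample Student's t-test with pooled variance. The function name and the ratings are hypothetical and are not study data.

```python
import math
from statistics import mean, variance


def pooled_t_test(group_a, group_b):
    """Return (t, df) for a two-sample Student's t-test with pooled variance.

    Assumption: independent groups with equal variances. The abstract
    does not say which t-test variant the authors actually used.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two group means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(group_a) - mean(group_b)) / std_err
    return t_stat, df


# Hypothetical GAF ratings (not study data) for two recorded interviews.
ratings_first = [55, 58, 60, 52, 57]
ratings_second = [68, 70, 65, 72, 66]

t_stat, df = pooled_t_test(ratings_first, ratings_second)
print(f"t({df}) = {t_stat:.2f}")
```

A negative t here means that the first set of ratings has the lower mean, which matches the sign convention of the video comparison in the Results. A p-value would additionally require the t distribution, which the Python standard library does not provide.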