Article

Reliability and Validity of Six Selected Observational Methods for Risk Assessment of Hand Intensive and Repetitive Work

1 Department of Medical Sciences, Occupational and Environmental Medicine, Uppsala University, SE-751 85 Uppsala, Sweden
2 Department of Occupational and Environmental Medicine, Uppsala University Hospital, SE-751 85 Uppsala, Sweden
3 School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, SE-141 57 Huddinge, Sweden
4 Centre for Occupational and Environmental Medicine, Stockholm County Council, SE-113 65 Stockholm, Sweden
5 Unit of Occupational Medicine, Institute of Environmental Medicine (IMM), Karolinska Institutet, SE-171 77 Stockholm, Sweden
6 Department of Occupational Health Science and Psychology, Faculty of Health and Occupational Studies, University of Gävle, SE-801 76 Gävle, Sweden
* Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2023, 20(8), 5505; https://doi.org/10.3390/ijerph20085505
Submission received: 8 March 2023 / Revised: 3 April 2023 / Accepted: 4 April 2023 / Published: 13 April 2023
(This article belongs to the Section Occupational Safety and Health)

Abstract

Risk assessments of hand-intensive and repetitive work are commonly done using observational methods, and it is important that the methods are reliable and valid. However, comparisons of the reliability and validity of methods are hampered by differences between studies, e.g., regarding the background and competence of the observers, the complexity of the observed work tasks and the statistical methodology. The purpose of the present study was to evaluate six risk assessment methods, concerning inter- and intra-observer reliability and concurrent validity, using the same methodological design and statistical parameters in the analyses. Twelve experienced ergonomists were recruited to perform risk assessments of ten video-recorded work tasks twice, and consensus assessments for the concurrent validity were carried out by three experts. All methods’ total-risk linearly weighted kappa values for inter-observer reliability (when all tasks were set to the same duration) were lower than 0.5 (0.15–0.45). Moreover, the concurrent validity values were in the same range with regard to total-risk linearly weighted kappa (0.31–0.54). Although these levels are often considered fair to substantial, they denote agreements lower than 50% once the agreement expected by chance has been compensated for. Hence, the risk of misclassification is substantial. The intra-observer reliability was only somewhat higher (0.16–0.58). Regarding the methods ART (Assessment of repetitive tasks of the upper limbs) and HARM (Hand Arm Risk Assessment Method), it is worth noting that the work task duration has a high impact on the calculated risk level, which needs to be taken into account in studies of reliability. This study indicates that even when experienced ergonomists use systematic methods, the reliability is low. As seen in other studies, assessments of hand/wrist postures were especially difficult to rate. In light of these results, complementing observational risk assessments with technical methods should be considered, especially when evaluating the effects of ergonomic interventions.

1. Introduction

Work-related musculoskeletal disorders (WRMSDs) are still a major concern in working life [1]. These disorders can, besides pain and suffering for the individual, cause the employer economic consequences due to sick leave costs and reduced productivity [1,2,3,4,5]. Physical factors in the work environment, such as forceful exertions, awkward postures and repetitive work, as well as psychosocial and organisational factors, are associated with WRMSDs in the neck, shoulders and arms [6,7,8,9,10,11,12]. Hence, risk assessments of physical factors are important for identifying potentially harmful work tasks and for prioritizing and designing workplace interventions, regarding the physical design of the workplace as well as work technique and work organization [13,14]. For evaluation purposes, it is also recommended to perform renewed risk assessments after an intervention. Furthermore, governmental bodies, such as national or regional work environment authorities, stipulate that risk assessments should be conducted as part of systematic occupational health management [15]. It is therefore highly important that methods for risk assessments are valid and reliable.
Through ergonomics research in combination with technical developments, it has become less complicated and less expensive to perform risk assessments through direct measurements using different types of technical equipment, and several of these methods are now becoming available also for ergonomists within the occupational health services (OHS), or equivalent [16,17]. However, the most common way for practitioners to identify and quantify physical exposures at a workplace is still to use observations. Previous research indicates that ergonomists often assess risks in the work environment solely by means of observation, based on their knowledge and experience, without the use of any systematic methodology or explicit method [18,19,20]. Further, Eliasson et al. concluded that the reliability of such assessments is low [21]. To improve the systematics of observations, different observation-based methods can be used. These methods are described as being useful due to their low cost and their ability to present the results of the risk assessment in a way that is easy for stakeholders to understand [18,22].
Different observational methods are designed to target and assess different exposures, such as manual handling (heavy lifting, push–pull actions), awkward postures or repetitive/hand-intensive work. The ergonomist often needs to combine several observational methods when performing a comprehensive risk assessment of a workplace. In a review article by Takala et al., 2010, 30 eligible observational methods were identified, and it was concluded that for many methods, studies of reliability and validity were lacking [22]. Since then, a number of reliability studies of different observational methods have been conducted, but the studies show mixed results (see Appendix A; Table A1 and Table A2). A recent systematic review by Graben et al. concluded that comparisons of reliability and validity between different methods are hampered by differences in study design between existing studies, e.g., regarding the background and competence of the observers, the complexity of the observed work tasks, whether the observations are done directly or digitally, as well as differences in the chosen statistical methodology [23].
For the assessment of hand intensive and repetitive work, some of the more well-known and more commonly used methods are the Assessment of repetitive tasks of the upper limbs (ART) [24], the Hand Arm Risk Assessment Method (HARM) [25], the Occupational Repetitive Actions Checklist (OCRA) [26], the Quick Exposure Check (QEC) [27] and the Strain Index (SI) [28]. These methods have all been shown to have a reasonably good predictive ability, either in direct studies, or indirectly, for the methods that are based on another method or on epidemiologically documented risk factors [22,25,26,27,28,29].
Furthermore, the methods have all been evaluated with regards to their inter-observer reliability, with the reliability commonly interpreted as fair, but rarely above a moderate level [27,30,31,32,33,34,35,36,37,38,39]. Moreover, these methods (with the exception of HARM) have been evaluated concerning intra-observer reliability, which usually has been proven to be higher [27,30,31,34,35,37,40]. For further information regarding these studies, see Appendix A (Table A1 and Table A2).
To conclude, there are large methodological differences between studies. The occupations and work tasks in which the methods have been evaluated differ, as do the statistical methods used for evaluating the reliability. Furthermore, the observers’ background regarding ergonomics expertise varies between studies, from workers with no prior education in ergonomics, to occupational health students, to experienced ergonomists within occupational health services. All these factors can influence the results in the various studies and make comparisons between the studies and the evaluated methods hard to perform. It is therefore of interest to include several methods in the same study and assess the reliability and validity of these methods using the same study design, taking all the aforesaid factors into consideration.
The purpose of this project was to evaluate the above-mentioned five risk assessment methods, concerning both inter- and intra-observer reliability as well as concurrent validity, using the same observers, the same occupations and work tasks and the same statistical methods.
Additionally, since this project was performed in a Swedish context, a sixth method, the most commonly used among ergonomists within the OHS in Sweden, the Repetitive work and work posture models by the Swedish Work Environment Authority (SWEA) was included [18].

2. Materials and Methods

Inter- and intra-observer reliability of six different observational methods were assessed by letting twelve experienced ergonomists perform risk assessments of ten video-recorded work tasks twice. Concurrent validity was assessed by comparing the ergonomists’ assessments with consensus assessments carried out by three experts within ergonomics.

2.1. Included Methods

All methods except OCRA had already been translated into Swedish prior to the present project and were, prior to and at the time of the project, publicly available, either through the Swedish Work Environment Authority’s website or through the website hosted by the Department of Occupational and Environmental Medicine, Uppsala University Hospital. The OCRA method was translated into Swedish by the authors of the present study with support from Professor Daniela Colombini [34]. Several of the methods are well known by Swedish ergonomists in occupational health services [18], and training in these methods is often a part of the continuing education courses and university-level programmes in ergonomics in Sweden. The six methods are described below. For details on previous reliability studies regarding the six methods, see Appendix A (Table A1 and Table A2).
1. Assessment of repetitive tasks of the upper limbs (ART) was developed by the British Health and Safety Executive (HSE) [24]. When using the method, an assessment is made in four different areas: frequency/repetition of movements, force demands, work postures (neck, back, arm/shoulder, hand) and other factors (including work pace and task duration). The calculated score is translated into one of three risk levels: green, yellow or red (corresponding to low, moderate and high risk); a generic sketch of this score-to-level mapping is given after this list. Previous studies of the inter-observer reliability of the final risk score have shown intraclass correlations (ICCs) ranging from 0.73 to 0.87, which can be interpreted as moderate to good reliability. Intra-observer reliability studies have shown ICCs for the final risk score ranging from 0.84 to 0.99 and an ICC for the overall risk level of 0.90, indicating good to excellent reliability [41,42].
2. Hand Arm Risk Assessment Method (HARM) was developed by the Netherlands Organisation for Applied Scientific Research (TNO) [25]. When using the method, an assessment is made in five different areas: task duration, force demands (including frequency/repetition of grasp), work postures (neck, arm/shoulder and forearm/wrist), exposure to vibrating tools and other factors (such as precision demands, adverse climate and pauses). The calculated score is translated into one of three risk levels: green, yellow or red (corresponding to low, moderate and high risk). The only previous study found of the inter-observer reliability indicated a moderate to good reliability, with an ICC for the final risk score of 0.73. No studies of the intra-observer reliability were found.
3. Occupational Repetitive Actions of the Upper Limbs checklist (OCRA) was developed in Italy and is a simplified version of the OCRA index [26,43,44]. When using the checklist, assessments are made in six different areas: work postures, frequency/repetition of movements (arm/shoulder, elbow, wrist and hand), force demands, task duration, lack of recovery time and other factors (physio-mechanical as well as socio-organisational). A calculated risk score is translated into one of five risk levels: green, yellow, light red, dark red and purple (corresponding to acceptable, very low, medium-low, medium and high risk). Previous studies have shown ICCs for the overall risk level ranging from 0.62 to 0.80 for the inter-observer reliability [31,33], and 0.85 for the intra-observer reliability, indicating moderate to good reliability and good reliability, respectively [31].
4. Quick Exposure Check (QEC) was developed in the UK [27,45]. When using the method, an assessment is made in six different areas: work postures, frequency/repetition of movements (back, neck, arm/shoulder, wrist/hand), force demands, task duration, exposure to vibrating tools and other factors (such as visual demands, work pace and stress). The calculated score is translated into one of four risk levels: low, moderate, high and very high risk. The risk levels are presented separately for the back, neck, arm/shoulder and wrist/hand. Additionally, a total score has been suggested by Brown and Li and is also used in the present study [46]. Previous studies of the inter-observer reliability for the total score have revealed ICCs ranging from 0.71 to 0.97 [35,47], considered moderate to excellent reliability, and ICCs for the intra-observer reliability between 0.4 and 0.89 [35,37], indicating poor to good reliability.
5. Strain Index (SI) was developed in the United States as a method for analysing jobs with a risk of distal upper extremity disorders [28]. When using the method, an assessment is made in six different areas: intensity of exertion, duration of exertion per work cycle, efforts per minute, wrist posture, speed of exertion and task duration. The calculated score is translated into one of three risk levels: low, moderate and high risk [39]. Since the development of SI, an updated version has been published [48]. However, at the time of data collection in the present study, this updated version was not yet available. Previous inter-observer reliability studies have shown that the ICC for the risk level was 0.54 and the ICC for the risk score was between 0.43 and 0.64 [31,33,38], indicating moderate and poor to moderate reliability, respectively, while studies of the intra-observer reliability have shown ICCs for the risk level ranging from 0.56 to 0.82 and an ICC for the risk score of 0.76, indicating moderate to good and moderate reliability, respectively [31,40].
6. Repetitive work and work posture models (SWEA) is a two-part checklist included in the Swedish Work Environment Authority’s (SWEA) provisions on physical ergonomics, AFS 2012:2, and was originally developed in a pan-Nordic project in 1994 [49]. When using the checklist for repetitive work, an assessment is made in four different areas: work cycle, postures and movements, scope for action and work content. The classification for each of the areas is made in three levels: green, yellow or red (corresponding to low, average and high). Taking into account a number of aggravating factors, the time when the work is performed and how it is distributed over the day, the assessor makes a summary assessment of the included parameters, with the work cycle considered the overriding factor. In the work postures checklist, an assessment is made separately for the neck, shoulder/arm, back and legs. The classification for each of the areas is made in three levels: green, yellow or red (corresponding to low, moderate and high risk). No previous studies of the inter- and intra-observer reliability have been found.
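All six methods share the same basic logic: item ratings are combined into a score, which is then mapped onto a small number of risk levels. The Python sketch below only illustrates this score-to-level mapping in a generic way; the threshold values are hypothetical placeholders and do not correspond to the cut-offs published in any of the six methods’ manuals.

```python
# Purely illustrative traffic-light mapping of a calculated score to a risk level.
# The thresholds below are hypothetical placeholders, not cut-offs from ART, HARM,
# OCRA, QEC, SI or SWEA.
def risk_level(score: float, yellow_from: float = 12.0, red_from: float = 22.0) -> str:
    if score >= red_from:
        return "red (high risk)"
    if score >= yellow_from:
        return "yellow (moderate risk)"
    return "green (low risk)"

print(risk_level(8.5))   # green (low risk)
print(risk_level(25.0))  # red (high risk)
```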

2.2. Recruitment

Twelve ergonomists, all registered physiotherapists (RPT) (a common combination in Sweden), were recruited through contacts with different OHS companies or through social media posts to members of the Swedish Ergonomist and Human Factors Society (EHSS). To be included in the study, the ergonomists should be employed by an OHS company (or equivalent) and have at least one year of work experience with risk assessments. All twelve ergonomists were women, mean age 49 years (range 39–55 years). All had extensive work experience within physical ergonomics, mean 13 years (range 4–26 years), and ten out of twelve ergonomists performed more extensive risk assessment assignments four times per year or more often. Prior to the study, all ergonomists had used the SWEA method, six of the ergonomists had used HARM and five ergonomists had used QEC. Only one ergonomist had used SI and none of the ergonomists had used ART or OCRA.

2.3. Training of Ergonomists

Initially, the ergonomists individually learned and trained on the six selected methods during a three-week period using an e-learning platform [50]. The training encompassed: (1) a pre-recorded lecture on general aspects of risk assessment, (2) pre-recorded instruction videos with “walk-through” examples where the six selected methods were applied on different work tasks and (3) self-supported training using a video library with film clips (two to six minutes long) of different work tasks. The films were accompanied by written information on the work task, e.g., task duration, pause and rests, weights of handled goods, and employees’ ratings of force exertion, discomfort and work demands. The manuals and the protocols for all six methods were available for download on the e-learning platform.

2.4. Risk Assessments

Ten video-recorded work tasks from different work sectors were included (See Table 1). For all chosen work tasks, the work postures and movements were largely of a repetitive character (Table 1).
The video recordings were made using two to four video cameras from different angles to enable the best possible conditions for the risk assessments. For each work task, the different views were synchronized into one video with multiple windows to show the different views of the worker with a close-up on hand and wrist movements. Each of the finalized video recordings was two to six minutes long.
During the performance of the risk assessments the ergonomists, seated in the same room, watched the video recordings on individual laptop computers and were allowed to pause or repeat the playback as needed. The ergonomists were requested to perform the assessments individually, without conferring with each other.
Not all items included in the methods could be rated by solely watching the video. Consequently, additional information was provided to the ergonomists in advance in a supplementary document. This document covered information such as the duration of the work tasks during the work day, pause and rest schedules, weight of handled goods, visual demands, the worker’s ratings of discomfort on Borg’s CR10 scale [51] as well as the level of work demands and control (Table 1). To enable intra-observer reliability analyses, the risk assessments were repeated in a second session, with at least four weeks between the occasions [52].

2.5. Analyses of Reliability and Validity

Calculations of inter-observer reliability were based on the ergonomists’ assessments in the first session. Calculation of intra-observer reliability was based on the first and second session for the ergonomists that repeated their assessments. Statistics for each of the ten work tasks regarding the individual items rated in the methods, as well as for the total risk scores of each method, were calculated. The risk scores were then transposed into risk levels according to the specific instructions for each of the methods.
For each method except SWEA, the duration of the work task is rated and given a value that influences the final score. In the instructions to the ergonomists, information regarding the duration of each work task was provided in advance in the supplementary document (Table 1). To decrease the variability between the work tasks, the inter-observer reliability was also calculated with a standardized work task duration of 3 h 45 min imputed for all ten work tasks.
Further, the six methods differed somewhat with regards to their coverage of exposure variables/body regions, see Table 2.
In the present study, the methods’ concurrent validity was evaluated. Consensus assessments were carried out by three experts, each expert with more than 20 years’ experience in both performing risk assessments of physical exposures, as well as university level teaching within this field. The experts also had extensive research experience, as well as governmental experience concerning work-environment legislation [53].
In the first step, the three experts made individual assessments in accordance with the manual of each method. In the second step, the three experts compared and discussed their individual assessments until consensus was reached. In the third and final step, three months later, the experts jointly repeated the assessments with the methods in reversed order. The assessments from this occasion were very similar to those of the first occasion. The experts decided upon the few discrepancies and agreed upon their final consensus assessments, which were used as the gold standard in the computation of the concurrent validity of the ergonomists’ ratings.

2.6. Statistical Methods

The proportional agreement in percent and the linearly weighted Kappa coefficient (Klw, see below) were chosen as the primary parameters for the analyses of both inter- and intra-observer reliability as well as for concurrent validity. However, to enable comparisons with other studies, several additional parameters were computed.
Proportional agreement (%) was calculated as the number of rating pairs in agreement divided by the total number of rating pairs. However, to take agreement due to chance into account, proportional agreement should be presented together with other parameters, for example, kappa statistics [52,54]. Cohen’s kappa was calculated for both inter-observer and intra-observer reliability; for intra-observer reliability, the kappa value was calculated for each of the observers, and the mean of these kappa values was then calculated [54]. Similarly, for the concurrent validity, the kappa value was calculated for each of the ergonomists paired with the experts’ consensus assessments, and the mean of these kappa values was then calculated. For the inter-observer reliability, pairwise kappa values were first computed and then averaged over all pairs, since Cohen’s kappa is only applicable when two observers are used or when test–retest reliability is evaluated. This averaging was conducted in the way suggested by Davies and Fleiss (1982), where the expected agreement, Pe, in Cohen’s kappa formula for each pairwise comparison, K = (Po − Pe)/(1 − Pe), is substituted with the average Pe of all pairs [55]. Po is the proportional agreement.
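As a minimal illustration of this averaging (a Python sketch, not the authors’ MATLAB scripts), the code below computes the pairwise observed (Po) and chance-expected (Pe) agreements and applies the Davies and Fleiss substitution of the average Pe; the array layout (work tasks × observers) and the demo data are assumptions made only for the example.

```python
from itertools import combinations
import numpy as np

def pairwise_po_pe(a, b, categories):
    """Observed (Po) and chance-expected (Pe) agreement for one observer pair."""
    po = np.mean(a == b)
    pe = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return po, pe

def multi_observer_kappa(ratings):
    """Average pairwise Cohen's kappa with the Davies & Fleiss substitution:
    each pair's Pe is replaced by the mean Pe over all observer pairs."""
    ratings = np.asarray(ratings)                 # shape: (n_tasks, n_observers)
    categories = np.unique(ratings)
    pairs = combinations(range(ratings.shape[1]), 2)
    po_pe = [pairwise_po_pe(ratings[:, i], ratings[:, j], categories) for i, j in pairs]
    po_mean = np.mean([po for po, _ in po_pe])
    pe_mean = np.mean([pe for _, pe in po_pe])
    return (po_mean - pe_mean) / (1 - pe_mean)

# Example: 10 tasks rated on a 3-level scale (0 = low, 1 = moderate, 2 = high) by 4 observers
rng = np.random.default_rng(0)
demo = rng.integers(0, 3, size=(10, 4))
print(multi_observer_kappa(demo))
```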
However, Cohen’s unweighted kappa does not distinguish minor from major discrepancies in ratings, and since the risk ratings for the different included methods represent ordinal data (low, moderate, high risk, very high risk, etc.), linearly weighted kappa [56,57] was computed and averaged in the same way as the unweighted kappa [55,58,59].
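A corresponding sketch of the linearly weighted kappa for a single rater pair is given below; disagreement weights grow linearly with the distance between the ordinal categories, and the pairwise values can then be averaged over observer pairs in the same way as above. Coding the categories 0 to n_levels − 1 is an assumption made only for the example.

```python
import numpy as np

def linearly_weighted_kappa(a, b, n_levels):
    """Linearly weighted kappa for one rater pair, with ordinal categories
    coded 0..n_levels-1."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    obs = np.zeros((n_levels, n_levels))                 # observed joint distribution
    for x, y in zip(a, b):
        obs[x, y] += 1.0 / n
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))     # chance-expected distribution
    idx = np.arange(n_levels)
    w = np.abs(idx[:, None] - idx[None, :]) / (n_levels - 1)  # 0 on diagonal, 1 for max gap
    return 1.0 - (w * obs).sum() / (w * exp).sum()

# Example: two observers' risk levels (0 = low, 1 = moderate, 2 = high) for ten tasks
a = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
b = [0, 2, 2, 1, 1, 2, 0, 1, 0, 1]
print(linearly_weighted_kappa(a, b, n_levels=3))
```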
The intraclass correlation coefficient (ICC), two-way absolute agreement, single rater (ICC(2,1)), was computed in accordance with Shrout and Fleiss [60] to facilitate comparisons with other studies [27,33,35,39,40]. The ICC is mostly applicable to continuous data but has been used in previous reliability studies on ordinal data. Kendall’s coefficient of concordance (KCC) was also computed; KCC is a non-parametric relative of the ICC that is applicable to ordinal data [61].
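For completeness, a sketch of these two additional parameters is shown below: ICC(2,1) in the two-way random effects, absolute agreement, single-rater form of Shrout and Fleiss, and Kendall’s W without tie correction. The (tasks × observers) array layout is an assumption for the example, and scipy.stats.rankdata is used only for average ranking.

```python
import numpy as np
from scipy.stats import rankdata

def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single rater
    (Shrout & Fleiss); x is an (n_subjects x k_raters) array."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    msr = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between-subject mean square
    msc = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between-rater mean square
    sse = ((x - x.mean(axis=1, keepdims=True)
              - x.mean(axis=0, keepdims=True) + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                              # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def kendalls_w(x):
    """Kendall's coefficient of concordance (no tie correction);
    x is an (n_items x k_raters) array of ordinal scores."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    ranks = np.column_stack([rankdata(x[:, j]) for j in range(k)])  # rank within each rater
    rank_sums = ranks.sum(axis=1)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (k ** 2 * (n ** 3 - n))

demo = [[1, 1, 2], [2, 3, 3], [1, 2, 1], [3, 3, 3], [2, 2, 1]]
print(icc_2_1(demo), kendalls_w(demo))
```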
For the interpretation of kappa values, the recommendations by Landis and Koch (fair 0.21–0.40, moderate 0.41–0.60, substantial 0.61–0.80, almost perfect 0.81–1.00 and perfect 1.00) were used [62].
In cases where an observer had omitted a rating for an item, the missing value was replaced by the median category of the other observers’ ratings. See Table 3 for the percentages of these occurrences.
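A minimal sketch of this imputation step, assuming missing ratings are coded as NaN in an (items × observers) array, could look as follows; rounding a non-integer median to the nearest category is an added assumption not specified above.

```python
import numpy as np

def impute_missing_with_group_median(ratings):
    """Replace each missing item rating (NaN) with the median of the other
    observers' ratings for that item (rounded to the nearest category)."""
    ratings = np.asarray(ratings, dtype=float)   # shape: (n_items, n_observers)
    for i, j in zip(*np.where(np.isnan(ratings))):
        others = ratings[i, ~np.isnan(ratings[i])]
        ratings[i, j] = np.rint(np.median(others))
    return ratings
```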
The statistical computations were carried out using scripts written in MATLAB version 8.5 (MathWorks Inc., Natick, MA, USA). For small samples, the output parameters were compared with, and found to agree with, the corresponding parameters from the statistical software R version 4.1.3 (R Foundation for Statistical Computing, Vienna, Austria, 2022) and SPSS version 27 (IBM, Armonk, NY, USA, 2022). MATLAB was used in order to obtain time-effective analyses, since there were no functions for multi-observer linearly weighted kappa in R or SPSS.

2.7. Ethical Considerations

The Regional Ethical Review Board in Stockholm (Dnr 2013/308–31/3) gave ethical approval for the present study. Informed oral and written consent was obtained both from the ergonomists performing the risk assessments and from the workers featured in the video-recorded work tasks. Ethical considerations of relevance include the video recordings, which could be considered an intrusion of privacy for the participating workers, and, for the ergonomists, the results of their individual risk assessments, which may constitute a transparency risk. Hence, both the workers and the ergonomists were informed that access to the data was restricted to the project group members, and that any results from the study would be presented only on group level.

3. Results

Twelve ergonomists were recruited for this project. However, not all of the ergonomists completed all risk assessments with all methods.
In the first session, at least 10 out of 12 recruited ergonomists completed the assessments of all items in all methods (Table 3). In total, 9168 item ratings were performed, and missing ratings (1.4%) were replaced with the group median value. Based on the item ratings, altogether, 1230 overall risk level assessments were made.
In the second session, at least 6 out of 12 recruited ergonomists completed the assessments using all methods (Table 3). In total, 7093 item ratings were performed, and missing ratings (0.9%) were replaced with the group median value. Based on the item ratings, altogether, 900 overall risk level assessments were made.
The expert group performed in total 2520 item ratings, resulting in 90 overall risk level assessments. The expert group did not perform any risk assessments for SWEA neck posture, SWEA shoulder/arm posture or SWEA back posture.
The distribution of the risk level assessments from the ergonomists’ first and second sessions, as well as the expert group’s ratings (ranging from low to high risk level as stipulated in each method), is shown in Table 3.

3.1. Reliability

3.1.1. Inter-Observer Reliability

The averaged linearly weighted kappa (Klw) of the overall risk level assessments differed between the six methods, showing the highest inter-observer reliability for HARM and ART (0.65), and the lowest for SWEA overall postures and movements (0.21) (Table 4). When the standardized work task duration was imputed for all work tasks, the linearly weighted kappa (Klw) decreased for all methods except OCRA; the decrease was largest for HARM, from 0.65 to 0.26 (Table 5).
Further, it was of particular interest to analyse in more detail the items that the ergonomists rated by actually observing the worker in the video-recorded work tasks, such as postures and movement/repetition (Table 2). The linearly weighted kappa (Klw) results show that the inter-observer reliability, especially for hand/wrist posture, was very low for several of the methods, and that items relating to movements, such as repetition, showed the highest inter-observer reliability (Table 6). Complete results for all rated items per method are shown in Appendix B (Table A3).
The degree to which the ergonomists arrived at the same risk level varied between the ten work tasks. In Figure 1, an example is given from the first and second sessions using the risk level assessments for QEC. The figure shows that, concerning the inter-observer agreement, the risk level assessment was the same for all 10 ergonomists in some of the work tasks, whilst for other work tasks the risk levels differed.

3.1.2. Intra-Observer Reliability

The intra-observer reliability of the risk levels, averaged over the ten work tasks, is presented in Table 7. However, the level of intra-observer agreement differed between work tasks for all methods. In Figure 1, an example is given from the first and second sessions using the risk level assessments for QEC.
The linearly weighted kappa (Klw) for the overall risk level assessments differed between the methods, ranging from 0.30 to 0.70, showing the highest agreement for HARM and the lowest for SWEA postures and movements. When the standardized work task duration was imputed for all work tasks, the linearly weighted kappa (Klw) decreased for all methods; the decrease was largest for HARM (from 0.70 to 0.47) and ART (left arm from 0.65 to 0.36, right arm from 0.68 to 0.43) (Table 8).

3.2. Validity

The concurrent validity of the overall risk level assessments was assessed by comparing the ergonomists’ ratings to the ratings made by the expert group (Table 9). In these computations, the standardized work task duration was imputed for all work tasks. The validity differed between the methods, with the highest averaged linearly weighted kappa (Klw) for HARM (0.42) and ART (0.54), and the lowest Klw for SWEA overall repetition (0.31). Regarding QEC, besides the total risk level given by the method (Klw = 0.47), risk level assessments were also computed for individual body parts (neck, shoulders, wrist and back), showing the highest Klw for the neck (0.88) and the lowest for the back (0.31).

4. Discussion

The objective of the present study was to assess the inter-observer reliability, the intra-observer reliability, as well as the concurrent validity, of six well-known methods that previously have been presented as suitable for assessing risk in work tasks that are physically demanding for the upper extremities. The work tasks chosen for the risk assessments displayed various levels of exposure to hand-intensive and repetitive work tasks, as well as different degrees of complexity. Previous studies on the reliability of observational risk assessment methods have been performed on different materials and have used different reliability measures in the evaluations (See Appendix A, Table A1 and Table A2). Therefore, a second objective in the present study was to include several methods in the same study and assess the reliability and validity of these methods using the same study protocol.

4.1. Inter-Observer Reliability

The results show that for all methods (overall risk levels), the inter-observer percentage agreement ranged from 39% to 83%, showing the highest agreement for SI and the lowest for OCRA. The linearly weighted kappa (Klw) for the inter-observer reliability differed between the methods, ranging from 0.18 (SI) to 0.65 (HARM and ART). The results for the concurrent validity were comparable to those for the inter-observer reliability. According to the criteria suggested by Landis and Koch [62], a kappa value between 0.60 and 0.80 can be considered to represent “substantial agreement”, whilst a kappa value between 0.21 and 0.40 equals a “fair agreement”. The majority of the kappa values were found in the range between 0.41 and 0.60, which represents a “moderate agreement”. However, these thresholds are indicative and arbitrary, and they usually refer to the unweighted kappa, which is lower than the linearly weighted kappa used here. Therefore, the thresholds should be interpreted with caution. If the outcome and consequences of the assessments had been more critical, stricter criteria should have been used. It is also possible to argue that stricter criteria should be applied to intra-observer reliability than to inter-observer reliability, because the between-observer variance is not present in intra-observer reliability. However, for simplicity, we chose the same criteria for both intra- and inter-observer reliability.
Compared to other reliability studies of the six included methods in this study, our results showed both higher and lower inter-observer reliability.
For example, regarding the inter-observer reliability of the QEC method, the ICCs for the total score reported in other studies (0.71–0.87 [47], 0.86 [35] and 0.93 [37]) have been higher than what we found (0.69). Others, using different kappa statistics, have reported values for each body part and item assessed that were either close to ours (Klw = 0.61–0.85 [36], compared to our Klw = 0.35–0.85) or considerably lower (K = 0–0.47 [27], compared to our K = 0.24–0.82).
Regarding the SI method and the inter-observer reliability of the total risk score, a study of cheese production tasks [33], similar in design to our study, showed a substantially higher ICC (0.59) compared to ours (0.18). This was also the case when comparing with other reliability studies of the SI, such as studies of tasks within poultry slaughtering (ICC = 0.54 [31]) as well as in health care and manufacturing, the latter using linearly weighted kappa statistics (Klw = 0.41 [39], compared to our Klw = 0.18). Higher ICCs compared to ours have also been shown in a study of manufacturing and material handling [38], where the ICC for individual assessments was lower (0.43) than for assessments made in teams (0.64).
Our results for the ART method showed ICCs for the total risk level (0.70–0.75) that were lower than previously found (0.75 and 0.87 [30], and 0.77 [31]). Likewise, our ICCs for the total risk level of the OCRA method were lower than those described by others (0.80 [33] and 0.72 [31]).
The only study found of the inter-observer reliability of the HARM method [32] included assessment tasks similar to those in our study. That study showed an ICC for the total risk level (0.73) that was lower than, but close to, our result (0.77).
The inter-observer reliability of the SWEA method in our study was found to be one of the lowest among the included methods, with kappa values indicating a fair agreement. No other studies, known to us, have previously assessed the inter-observer reliability of this method.

4.2. Intra-Observer Reliability

As expected, a somewhat higher reliability was found within observers (intra-observer reliability) than between observers (inter-observer reliability), with an overall percentage agreement ranging from 45% for OCRA to 79% for ART. The corresponding Klw ranged from 0.13 for SI and SWEA (shoulder/arm posture) to 0.88 for QEC (neck). Compared to other studies of the six methods included in our study, our results for the intra-observer reliability were generally lower.
Previous studies of the QEC method have shown both lower and higher intra-observer reliability compared to what we found. Regarding the ICC for the total risk score, our ICC (0.79) was higher than in a study of five observers and 13 tasks (0.41–0.60 [35]) but lower than that shown in a study of one observer and one task (0.89 [37]). In studies where reliability statistics have only been reported for each body part and item, the ICC values for hospital cleaning tasks (0.61–1.00 [63]) and the kappa values for industrial tasks (0.45–0.53 [27]) have been higher than what we found (ICC = 0.1–0.72 and K = 0.32–0.59, respectively).
Our findings for the ART method showed ICCs for the total risk level (0.74, 0.78) that were lower than previously reported (0.84 and 0.99 [30], and 0.9 [31]). Our ICC for the risk level of the OCRA method (0.72) was also lower than what has been reported by others (0.85) [31].
Concerning the SI method, our study showed a substantially lower ICC for the total risk level (0.10) compared to a study of manufacturing tasks [40], both for individual observers (0.56) and when the same tasks were assessed in teams (0.82).
In the present study, we found that the intra-observer reliability of HARM was the highest among the methods with a kappa value that could be interpreted as substantial agreement, while SI and SWEA showed the lowest reliability and kappa values that indicate a fair to moderate agreement. No previous studies, as far as we have found, have examined the intra-observer reliability of either the HARM or the SWEA method.

4.3. Further Discussions of the Results in the Present Study

A tentative explanation as to why the HARM and ART methods showed the highest inter-observer agreement (Klw) was at first thought to be the layout design of the scoring sheets, which for both methods include comprehensible photographs/drawings showing neutral and awkward postures for different body regions such as the wrist, elbow, arm and neck. However, when studying the separately computed Klw for each rated item, all Klw values for items concerning postures and repetition/movements were below 0.44, which is comparable to the findings in a previous study by Eliasson et al. (2017), where ergonomists performed risk assessments without the use of any specific method or photographs/drawings [21]. Hence, the well-defined illustrations did not seem to facilitate the ratings of postures and movements. The relatively high inter-observer reliability of HARM and ART instead seems to be explained by the supplementary information concerning each work task that was provided to the ergonomists in advance (see the Methods section). Task duration, especially, has a high impact on the estimated risk level in these two methods; therefore, the agreement in risk levels was high when different durations were assigned to the different work tasks (Table 4), but low when a standardised duration was used (Table 5). This raises the question of how the results would have turned out if the ergonomists had not had any information provided in advance, but instead had had to obtain it through individual interviews with the workers, which is normally the case. Throughout the methods, the Klw was generally lowest for the item ratings of postures of small body regions such as the wrist and hand. This is in line with previous research, which has identified the challenges of visually assessing postures and movements of small body regions [22,64,65,66].
In the present study, we chose to use both the proportional agreement (%) and the linearly weighted kappa (Klw). These two parameters do not always correspond, and the reason why it is important to include the proportional agreement (%) in the results is that there are situations where the linearly weighted kappa (Klw) may be low whilst the proportional agreement (%) is high. As an example, if the observers were to choose between standing and sitting and almost everyone chose standing, there would be a high agreement (%), but also a high expected agreement, Pe, and a low Klw (see the formula in Section 2.6) [67]. This explains why the Klw for SI regarding intra-observer reliability is only 0.13 while the proportional agreement is 77%. This may also be an explanation for the intra-observer reliability results regarding SWEA (shoulder posture).
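As a purely hypothetical numerical illustration of this effect: if two observers each rate 90% of the tasks as standing and agree on 90% of the tasks, then Po = 0.90 and Pe = 0.90 × 0.90 + 0.10 × 0.10 = 0.82, giving K = (0.90 − 0.82)/(1 − 0.82) ≈ 0.44 (with only two categories, Klw equals the unweighted K); the raw agreement is high, but the chance-corrected agreement is only moderate.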
Regarding the validity, the Klw were in line with the inter-observer reliability for each method, which can be expected since the validity parameter calculated in the present study was concurrent validity. Where there is a large inter-observer variation, as for instance in wrist posture ratings, there is also a large variation around the gold standard ratings. Similarly, for a method with a large inter-observer variation in the total score, there are also large variations when the observer’s scores are compared to the gold standard total score.
As shown above, and in agreement with previous findings, there is considerable variation between ergonomists’ assessments of risk levels for MSDs when using the observational methods. However, since observation without the use of any specific method has a lower and non-acceptable reliability [21], it is recommended to use one or more systematic observation-based risk assessment methods. Another approach would be to combine an observational method with validated methods of direct measurement, where the items with the lowest reliability in the observational method of choice are replaced by technical measurements, especially when an intervention is to be evaluated.
Regardless of the choice of method used for quantification of exposure levels, it is important to consider that additional information regarding, e.g., psychosocial and organizational factors, as well as results from occupational health surveillance regarding musculoskeletal disorders, is needed for a more comprehensive risk assessment [68,69]. Furthermore, observational risk assessment methods capture many different aspects in a compact way, and several methods also involve the experiences and opinions of the worker in the assessment. The methods may thereby increase interest in the work environment and ergonomics, improve knowledge of different risk factors, and provide a basis for a participatory approach in the risk assessment.
The inter- and intra-observer reliability in the present study may have been affected by several factors. Although the video recordings were short, the observers may still have concentrated on different parts of the recordings, which may lead to differences in the assessments. Further, the previous knowledge and experience of risk assessment in the different work sectors may have differed between the observers.
A further affecting factor could be difficulties in assessing postures and movements from video recordings compared to live observations. However, Mathiassen et al. (2013) found that posture assessments based on video recordings are beneficial, since they provide the possibility to conduct repeated ratings of the same work sequence [70]. In the present study, the video recordings were composed of synchronized video windows that displayed the work sequence from several different angles, which may have contributed to providing a more comprehensive picture [40,71].
Regarding the intra-observer reliability, an often-stated problem is possible changes between the test and the re-test occasions [72]. In the present study, this problem was addressed by a design in which the assessments were made using video recordings, meaning that potential sources of variation (such as alterations in job performance or a change of the worker performing the job) were eliminated.

4.4. Methodological Considerations

The data in the present study consist to a large extent of variables with more than two scale steps. Hence, linearly weighted kappa was considered to be the most suitable choice for the analyses of both inter- and intra-observer reliability as well as concurrent validity. Linearly weighted kappa can distinguish between one- and two-step differences (a two-step difference is given double the weight of a one-step difference), which is not the case with Cohen’s unweighted kappa or quadratically weighted kappa [56,57,73]. While the unweighted kappa and the agreement percentage become lower with an increased number of scale steps, the linearly weighted kappa is not dependent on the number of steps and facilitates comparisons of assessment methods with different numbers of steps in their scales [56].
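For the standard weighting schemes, with k ordinal categories indexed 1, …, k, the linear disagreement weight between categories i and j is |i − j|/(k − 1), while the quadratic weight is (i − j)²/(k − 1)². For example, on a three-level scale (k = 3), a one-step disagreement carries the weight 0.5 and a two-step disagreement the weight 1.0 under linear weighting, whereas quadratic weighting gives 0.25 and 1.0, so a two-step disagreement is then no longer exactly double a one-step one.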
However, to be able to compare the findings in the present study with those from earlier studies on the reliability and validity of observational methods (see Appendix A; Table A1 and Table A2), the choice was made to also compute several of the most commonly used statistical parameters, such as Cohen’s unweighted kappa, the intraclass correlation coefficient (ICC) and Kendall’s coefficient of concordance (KCC) [60,61]. The ICC is often recommended for multi-observer comparisons, and KCC is a non-parametric relative of the ICC. In theory, even if all ergonomists perform the assessments with perfect agreement, the validity may still be low (i.e., if all observers make the same but incorrect assessments). Conversely, when the reliability is low (low agreement), there cannot be high validity, given that high reliability is a necessary but not sufficient condition for high validity [74].
The present study investigated the concurrent validity, with the expert group as gold standard. Another approach would have been to use technical measurements of postures and movements as gold standard. However, this was not within the scope of the present study.
As for other types of validity, such as predictive validity, observational methods have been sparsely evaluated. However, the SI method has previously been found to be associated with disorders in the distal part of the upper extremities [39]. In the present study it was not possible to investigate the predictive validity, for which longitudinal data and large cohorts would have been required [75].
In the present study, all ergonomists were used to performing risk assessments with different observational methods in their work, and they underwent the same training in the six included methods. The training was aimed at ensuring that the ergonomists had an equal minimum level of knowledge of each of the methods, and they also had access to the training material during the whole period. However, it cannot be ruled out that differences in their prior knowledge of and experience with the six methods might have influenced the reliability scores. The SWEA method was familiar to all ergonomists beforehand, whilst OCRA and ART had not been used by any of the ergonomists prior to the present study. However, if previous familiarity with a method had affected the results, it is plausible to assume that the reliability scores for SWEA would have been higher, especially with regard to intra-observer reliability, which was not the case in the present study.

4.5. Strengths and Limitations

The heterogeneity of the studied phenomenon (in the present study, the work tasks) is an important parameter in reliability studies [76]. Even though the selection of the ten work tasks represented varying degrees of repetitiveness and risk levels, a higher number of diverse work tasks would have improved the study, and there are still many occupations and work tasks that were not included. Nevertheless, because of the large variation in work sectors and job characteristics between the work tasks in the study, the results can be considered to represent repetitive jobs rather well. Further, asking the participating ergonomists to perform even more ratings than what was demanded in the present study was considered unjustifiable. Still, since the ergonomists agreed to different degrees on the various work tasks, the results may have been somewhat different if other or more tasks had been included.
A limitation in the present study is that the number of ergonomists that completed all assessments decreased, for various reasons, during the data collection period. However, it can be seen as a strength that the inter-observer reliability computations, as well as the validity computations, are based on at least eleven ergonomists, and that the intra-observer reliability computations are based on at least six ergonomists (for most of the computations, eight to ten ergonomists). It is not uncommon that reliability computations are based on fewer observers [27,33,35,39]. Further, all ergonomists were experienced within their occupation, had previous experience of risk assessment assignments and performed risk assessments regularly. A possibility in the present study would have been to also ask the ergonomists to provide subjective ratings of the different methods with regards to usability and feasibility, in relation to observation of different work tasks (postures, movements and body parts). However, this was outside of the scope of the present study.
The group of observers in the present study consisted solely of women, and any systematic gender difference in the assessments could not be investigated; if there were a gender dependence whereby women consistently rate items in the methods higher or lower, it could have influenced the reliability parameters in a negative way. However, the observers represent the population of ergonomists in Sweden rather well, where approximately 80% of the ergonomists with a professional background in physiotherapy are women [34].
Normally, in real life, the ergonomist meets the individual worker at the workplace and hence has the opportunity to investigate the workplace in more detail and to ask the worker for additional information that adds to the risk assessment. Moreover, the ergonomist is then often able to observe the worker from different angles and for a longer period of time, which gives a more comprehensive representation of the work postures and movements. In the present study, the ergonomists performed their risk assessments by watching videos of workers performing different work tasks, and the rather short length of the videos may have limited the possibilities to observe postures and movements. Moreover, no interaction with the worker/workplace was possible. To manage this limitation, the ergonomists were supplied with written work environment-related information concerning the exposures in the different work tasks. Hence, all ergonomists received the very same information; if they, as in real life, had collected this information themselves and then used the methods, the reliability would presumably have been lower.

4.6. Future Research

The scope of the present study was to investigate the reliability of several risk assessment methods using the same material and calculating the same statistical parameters. Few studies have compared the extent to which different methods agree on the calculated risk levels [46,77,78]. Further, a review article by Joshi et al., 2019, comparing different observational methods, showed that the result of a risk assessment depends on which method is used, and that the correlation in outcome between different methods is weak [79]. There is still a need for more such studies, with several methods applied to the same material.
Further, based on the results in the present study concerning the low reliability of assessments of work postures and movements, especially regarding the wrist and hand, future research should focus on developing methods that can combine the accuracy of direct measurements with existing observational methods.

5. Conclusions

All methods’ total-risk linearly weighted kappa values (when all tasks were set to the same duration) were lower than 0.5 (0.15–0.45). Moreover, the concurrent validity values were in the same range with regard to total-risk linearly weighted kappa (0.31–0.54). Although these values are often considered fair to substantial, they mean agreements lower than 50% once the agreement expected by chance has been compensated for. Hence, the risk of misclassification is substantial. The intra-observer reliability was only somewhat higher (0.16–0.58). Regarding the ART and HARM methods, it is worth noting that the work task duration has a high impact on the risk level calculation, which should be considered in studies of reliability. This study indicates that even when experienced ergonomists use observational risk assessment methods, the reliability is low. As seen in other studies, assessments of hand/wrist postures were especially difficult to rate. In light of these results, complementing observational risk assessments with technical methods may be needed, especially when evaluating the effects of ergonomics interventions.

Author Contributions

Conceptualization, T.N., P.J.J., K.E., K.K., P.L. and M.F.; methodology, T.N., P.J.J., K.E., K.K. and M.F.; validation, M.F.; formal analysis, T.N., X.F. and M.F.; investigation, T.N., I.-M.R., P.J.J., K.E., K.K., P.L. and M.F.; resources, T.N., K.E., K.K., P.L., X.F. and M.F.; data curation, T.N., I.-M.R., X.F. and M.F.; writing—original draft preparation, T.N., I.-M.R. and M.F.; writing—review and editing, T.N., I.-M.R., P.J.J., K.E., K.K., X.F., P.L. and M.F.; visualization, T.N. and M.F.; supervision, T.N. and M.F.; project administration, T.N., I.-M.R., K.E. and M.F.; funding acquisition, T.N., K.K. and M.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The Swedish Research Council for Health, Working Life and Welfare (Forte), Grant Number 1212-1202 and AFA Insurance, Grant Numbers 180098 and 180254.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and was approved by the Regional Ethical Review Board in Stockholm, Sweden (project reference number 2013/308–31/3).

Informed Consent Statement

Informed oral and written consent was obtained from all subjects involved in the study.

Data Availability Statement

Data available on request due to ethical restrictions. The data presented in this study are available on request from the corresponding author. The data are not publicly available since the informants in the study have been guaranteed anonymity.

Acknowledgments

The authors wish to acknowledge the participating ergonomists for taking part in the study and performing such a vast number of risk assessments.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Table A1. Examples of studies that have assessed the inter-observer reliability of one of five risk assessment methods included in the study: Assessment of repetitive tasks of the upper limbs (ART); Hand Arm Risk Assessment Method (HARM); Occupational Repetitive Actions of the Upper Limbs checklist (OCRA); Quick Exposure Check (QEC); Strain Index (SI). (No reliability studies were found for SWEA.) ICC = Intraclass Correlation, K = Cohen’s kappa, % = Proportional Agreement, Kqw = Quadratically Weighted Kappa, Klw = Linearly Weighted Kappa.
Method | Observers | Work Tasks Assessed | Inter-Observer Reliability | Reference
ARTN = 2
Occupational health students
N = 14
Wood marquetry work tasks
Risk score: Occasion 1: ICC = 0.87; Occasion 2: ICC = 0.75
Roodbandi, Choobineh et al., 2015 [30]
ARTN = 9
Ergonomists > 2 years of ergonomic risk assessment
N = 30
Work tasks within poultry slaughtering, assembling and manufacturing aluminium containers
Risk Level: Kqw = 0.72; ICC = 0.77 (95%CI 0.70–0.82)
Risk score: Kqw = 0.67; ICC = 0.73 (95%CI 0.65–0.78)
Frequency/Repetition movement: Kqw = 0.79; ICC = 0.84 (95%CI 0.76–0.88)
Force: Kqw = 0.71; ICC = 0.76 (95%CI 0.71–0.83)
Awkward postures: Kqw = 0.64; ICC = 0.70 (95%CI 0.65–0.74)
Additional factors: Kqw = 0.47; ICC = 0.58 (95%CI 0.51–0.65)
Motamedzade, Mohammadian et al., 2019 [31]
HARMN = 11
Occupational health practitioners
N = 5
Cutting human tissue, processing electric cord, cashier work, meat packing, microscope work
Risk score: ICC = 0.73
Force exertions: ICC = 0.47
Neck/shoulder postures: ICC = 0.36
Forearm/wrist postures: ICC = 0.12
Other factors: ICC = 0.55
Douwes and de Kraker 2014 [32]
OCRAN = 7
Occupational health researchers/graduate students
N = 21
Cheese production work tasks
Risk level: ICC = 0.80 (95%CI: 0.70–0.89)
Risk score: ICC = 0.68 (95%CI: 0.56–0.80)
Frequency of technical actions: ICC = 0.68 (95%CI: 0.54–0.81)
Force exertion: ICC = 0.42 (95%CI: 0.28–0.59)
Awkward posture/movement: ICC = 0.54 (95%CI: 0.39–0.69)
Additional factors: ICC = 0.21 (95%CI: 0.10–0.37)
Paulsen, Gallu et al., 2015 [33]
OCRA
N = 9
Ergonomists > 2 years of ergonomic risk assessment
N = 30
Work tasks within poultry slaughtering, assembling and manufacturing aluminium containers
Risk level: Kqw = 0.68; ICC = 0.72 (95%CI 0.69–0.79)
Risk score: Kqw = 0.62; ICC = 0.66 (95%CI 0.50–0.71)
Frequency of technical actions: Kqw = 0.70; ICC= 0.73 (95%CI 0.58–0.80)
Force exertion: Kqw = 0.60; ICC = 0.63 (95%CI 0.53–0.69)
Awkward postures/movements: Kqw = 0.52; ICC = 0.56 (95%CI 0.47–0.61)
Additional factors: Kqw = 0.65; ICC = 0.68 (95%CI 0.63–0.74)
Motamedzade, Mohammadian et al., 2019 [31]
QEC
Phase 1
N = 18
Practitioners

Phase 2
N = 6
Practitioners
Phase 1
N = 18
Industrial static and dynamic work tasks (combinations of high repetition and low force, and low repetition with high force for both seated and standing postures were observed)

Phase 2
N = 3
Work tasks; cleaning a floor using a buffing machine, pipetting whilst standing at a laboratory bench, word processing
Phase 1
Back posture: % = 73; K = 0.33
Back movement: % = 71; K = 0.17
Shoulder/arm posture: % = 80; K = 0.47
Shoulder/arm movement: % = 79; K = 0.38
Wrist/hand posture: % = 79; K = missing value
Wrist/hand movement: % = 76; K = 0.42
Neck posture: % = 65; K = 0.20
Phase 2
Back posture: % = 83
Back movement: % = 83
Shoulder/arm posture: % = 94
Shoulder/arm movement: % = 61
Wrist/hand posture: % = 100
Wrist/hand movement: % = 67
Neck posture: % = 67
David, Woods et al., 2008 [27]
QEC
N = 5
Physical therapists with previous experience in occupational health and safety

N = 107
Workers
N = 13
Textile sewing work tasks in a manufacturing plant
Total score: ICC = 0.86 (95%CI 0.82–0.90)
Back score: ICC = 0.70 (95%CI 0.63–0.77)
Shoulder/arm score: ICC = 0.73 (95%CI 0.66–0.80)
Wrist/hand score: ICC = 0.82 (95%CI 0.77–0.86)
Neck score: ICC = 0.62 (95%CI 0.54–0.70)
Comper, Costa et al., 2012 [35]
QEC
N = 7
Ergonomists within occupational health services
N = 51
Work tasks in automotive warehouse, hospital laundry, hospital kitchen, automotive assembly, real estate caretaker, hospital janitor
Back score: % = 78; Klw = 0.79; ICC = 0.94 (95%CI 0.80–0.93)
Shoulder/arm score: % = 71; Klw = 0.61; ICC = 0.83 (95%CI 0.70–0.91)
Wrist/hand score: % = 88; Klw = 0.83; ICC = 0.93 (95%CI 0.87–0.96)
Neck score: % = 86; Klw = 0.85; ICC = 0.95 (95%CI 0.91–0.97)
Oliv, Gustafsson et al., 2019 [36]
QEC
N = 2
Occupational health experts

N = 50
Workers
N = 1
Construction work task
Back score: ICC = 0.93 (95%CI 0.88–0.96)
Shoulder/arm score: ICC = 0.88 (95%CI 0.81–0.93)
Wrist/hand score: ICC = 0.88 (95%CI 0.81–0.93)
Neck score: ICC = 0.79 (95%CI 0.66–0.88)
Total score: ICC = 0.93 (95%CI 0.88–0.96)
Mokhtarinia, Abazarpour et al., 2020 [37]
QEC
N = 4
Occupational therapy students
N = 15
Static and dynamic work tasks in different occupations including healthcare professions, assistants and manual workers.
Total score: ICC = 0.71–0.97
Cheng and So 2014 [47]
SI
N = 9
Ergonomists > 2 years of ergonomic risk assessment
N = 30
Work tasks within poultry slaughtering, assembling and manufacturing aluminium containers
Risk level: Kqw = 0.52; ICC = 0.54 (95%CI 0.49–0.61)
Risk score: Kqw = 0.44; ICC = 0.46 (95%CI 0.33–0.59)
Intensity of exertion: Kqw = 0.39; ICC = 0.44 (95%CI 0.34–0.48)
Duration of exertion: Kqw = 0.50; ICC = 0.53 (95%CI 0.47–0.60)
Efforts per minute: Kqw = 0.51; ICC = 0.55 (95%CI 0.46–0.62)
Hand/wrist posture: Kqw = 0.37; ICC = 0.42 (95%CI 0.37–0.45)
Speed of work: Kqw = 0.46; ICC = 0.50 (95%CI 0.45–0.54)
Motamedzade, Mohammadian et al. 2019 [31]
SI
N = 15
9 ergonomists, 6 students

N teams = 5
3 raters per team
N = 73 for item ratings
N = 12 for total score ratings

Work tasks within manufacturing (e.g., product assembly), meat/poultry processing (e.g., meat, fat and skin cutting, trimming and ripping) and distal upper extremity-intensive material handling (e.g., manipulation of small- to medium-sized products/boxes)
Individual raters
Risk score: ICC = 0.43 (95%CI 0.25–0.70)
Intensity of exertion: ICC = 0.77 (95%CI 0.63–0.90)
Duration of exertion: ICC = 0.80 (95%CI 0.67–0.91)
Efforts per minute: ICC = 0.81 (95%CI 0.68–0.92)
Hand/wrist posture: ICC = 0.66 (95%CI 0.45–0.88)
Speed of work: ICC = 0.81 (95%CI 0.64–0.94)
Team ratings
Risk score: ICC = 0.64 (95%CI 0.40–0.85)
Intensity of exertion: ICC = 0.81 (95%CI 0.65–0.92)
Duration of exertion: ICC = 0.87 (95%CI 0.76–0.95)
Efforts per minute: ICC = 0.88 (95%CI 0.76–0.95)
Hand/wrist posture: ICC = 0.48 (95%CI 0.18–0.81)
Speed of work: ICC = 0.93 (95%CI 0.83–0.98)
Stevens, Vos et al., 2004 [38]
SI
N = 4
3 expert ergonomists
1 novice student
N = 125
Cyclic work tasks within manufacturing (e.g., equipment assembly, sawmill work and product testing), and health care (e.g., housekeeping, laundry work and office work).
Overall
Risk score: Klw = 0.41; Spearman r = 0.57
Intensity of exertion: Klw = 0.22; Spearman r = 0.28
Duration of exertion: Klw = 0.27; Spearman r = 0.37
Efforts per minute: Klw = 0.26; Spearman r = 0.40
Hand/wrist posture: Klw = 0.34; Spearman r = 0.49
Speed of work: Klw = 0.44; Spearman r = 0.62

Expert-Expert
Risk score: Klw = 0.49; Spearman r = 0.68
Intensity of exertion: Klw = 0.31; Spearman r = 0.38
Duration of exertion: Klw = 0.34; Spearman r = 0.50
Efforts per minute: Klw = 0.35; Spearman r = 0.49
Hand/wrist posture: Klw = 0.42; Spearman r = 0.64
Speed of work: Klw = 0.41; Spearman r = 0.57

Expert-Novice
Risk score: Klw = 0.27; Spearman r = 0.41
Intensity of exertion: Klw = 0.19; Spearman r = 0.46
Duration of exertion: Klw = 0.21; Spearman r = 0.34
Efforts per minute: Klw = 0.16; Spearman r = 0.34
Hand/wrist posture: Klw = 0.26; Spearman r = 0.26
Speed of work: Klw = 0.48; Spearman r = 0.64
Spielholz, Bao et al., 2008 [39]
SI
N = 7
Occupational health researchers/graduate students
N = 21
Cheese production work tasks
Risk level: ICC = 0.54 (95%CI 0.40–0.70)
Risk score: ICC = 0.59 (95%CI 0.45–0.73)
Intensity of exertion: ICC = 0.39 (95%CI 0.24–0.56)
Duration of exertion: ICC = 0.40 (95%CI 0.25–0.57)
Efforts per minute: ICC = 0.60 (95%CI 0.46–0.74)
Hand/wrist posture: ICC = 0.16 (95%CI 0.06–0.31)
Speed of work: ICC = 0.30 (95%CI 0.18–0.47)
Paulsen, Gallu et al., 2015 [33]
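
The proportional agreement (%) reported alongside the kappa values in Tables A1 and A2 (and in Tables 4–9 below) can be read as the share of rater pairs giving identical categories. The following is a minimal sketch, not the authors' analysis code, of one common way to compute that statistic; the items × raters array layout and the function name are assumptions made for illustration.

```python
# Sketch: mean pairwise proportional agreement over all rater pairs.
# The items x raters layout is an assumption for illustration only.
from itertools import combinations

import numpy as np

def proportional_agreement(ratings: np.ndarray) -> float:
    """Mean pairwise agreement over all rater pairs (0-1 scale)."""
    n_raters = ratings.shape[1]
    pair_agreement = [
        float(np.mean(ratings[:, i] == ratings[:, j]))
        for i, j in combinations(range(n_raters), 2)
    ]
    return float(np.mean(pair_agreement))

# Example: 4 work tasks rated by 3 observers on a 1-3 ordinal risk scale
example = np.array([[1, 1, 2],
                    [2, 2, 2],
                    [3, 2, 3],
                    [1, 1, 1]])
print(round(proportional_agreement(example), 2))  # 0.67
```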
Table A2. Examples of studies that have assessed the intra-observer reliability of one of five risk assessment methods included in the study; Assessment of repetitive tasks of the upper limbs (ART); Hand Arm Risk Assessment Method (HARM); Occupational Repetitive Actions of the Upper Limbs checklist (OCRA); Quick exposure check (QEC); Strain Index (SI). (No reliability studies were found for SWEA.) ICC = Intraclass Correlation, K = Cohen's kappa, % = Proportional Agreement, Kqw = Quadratically Weighted Kappa, Klw = Linearly Weighted Kappa.
Method | Observers | Work Tasks Assessed | Intra-Observer Reliability | Reference
ART
N = 2
Occupational health students
N = 14
Wood marquetry work tasks
Risk score: Rater 1: ICC = 0.84; Rater 2: ICC = 0.99
Roodbandi, Choobineh et al., 2015 [30]
ART
N = 9
Ergonomists > 2 years of ergonomic risk assessment
N = 30
Work tasks within poultry slaughtering, assembling and manufacturing aluminium containers
Risk Level: Kqw = 0.82; ICC = 0.90 (95%CI 0.85–0.94)
Risk score: Kqw = 0.76; ICC = 0.81 (95%CI 0.77–0.85)
Frequency/Repetition movement: Kqw = 0.85; ICC = 0.92 (95%CI 0.86–0.95)
Force: Kqw = 0.81; ICC = 0.86 (95%CI 0.81–0.89)
Awkward postures: Kqw = 0.77; ICC = 0.80 (95%CI 0.75–0.86)
Additional factors: Kqw = 0.75; ICC = 0.78 (95%CI 0.71–0.85)
Motamedzade, Mohammadian et al., 2019 [31]
OCRA
N = 9
Ergonomists > 2 years of ergonomic risk assessment
N = 30
Work tasks within poultry slaughtering, assembling and manufacturing aluminium containers
Risk level: Kqw = 0.79; ICC = 0.85 (95%CI 0.79–0.89)
Risk score: Kqw = 0.68; ICC = 0.76 (95%CI 0.67–0.83)
Frequency of technical actions: Kqw = 0.84; ICC = 0.90 (95%CI 0.86–0.93)
Force exertion: Kqw = 0.70; ICC = 0.74 (95%CI 0.69–0.78)
Awkward postures/movements: Kqw = 0.74; ICC = 0.80 (95%CI 0.75–0.82)
Additional factors: Kqw = 0.82; ICC = 0.88 (95%CI 0.82–0.91)
Motamedzade, Mohammadian et al., 2019 [31]
QEC
Phase 1
N = 8
Practitioners
Phase 1
N = 18
Industrial static and dynamic work tasks (combinations of high repetition and low force, and low repetition with high force for both seated and standing postures were observed)
Phase 1
Back posture: PA % = 73; K = 0.52; Spearman r = 0.66
Back movement: PA % = 76; K = 0.50; Spearman r = 0.66
Shoulder/arm posture: PA % = 70; K = 0.50; Spearman r = 0.62
Shoulder/arm movement: PA % = 74; K = 0.53; Spearman r = 0.64
Wrist/hand posture: PA % = 77; K = 0.45; Spearman r = 0.45
Wrist/hand movement: PA % = 68; K = 0.50; Spearman r = 0.69
Neck posture: PA % = 67; K = 0.48; Spearman r = 0.58
David, Woods et al., 2008 [27]
QEC
N = 5
Physical therapists with previous experience in occupational
health and safety
N = 107 workers
N = 13
Textile sewing work tasks in a manufacturing plant
Total score: ICC = 0.41–0.60
Back score: ICC = 0.40–0.57
Shoulder/arm score: ICC = 0.19–0.61
Wrist/hand score: ICC = 0.35–0.49
Neck score: ICC = 0.16–0.58
Comper, Costa et al., 2012 [35]
QEC
N = 1
Occupational health expert
N = 30
Workers
N = 1
Construction work task
Total score: ICC = 0.89 (95%CI 0.79–0.95)
Back score: ICC = 0.87 (95%CI 0.74–0.93)
Shoulder/arm score: ICC = 0.79 (95%CI 0.61–0.89)
Wrist/hand score: ICC = 0.86 (95%CI 0.72–0.93)
Neck score: ICC = 0.74 (95%CI 0.52–0.86)
Mokhtarinia, Abazarpour et al., 2020 [37]
QEC
N = 1
Physician
N = 20
Workers
N = 3
Hospital cleaning work tasks: window cleaning, floor cleaning, cleaning floor with buffing machine
Back score: ICC = 0.806
Shoulder/arm score: ICC = 0.767
Wrist/hand score: ICC = 0.845
Neck score: ICC = 0.600
Back posture score: ICC = 0.902
Back movement score: ICC = 0.668
Shoulder/arm posture score: ICC = 0.768
Shoulder/arm movement score: ICC = 0.791
Wrist/hand posture score: ICC = 1
Wrist/hand movement score: ICC = 0.877
Neck posture score: ICC = 0.608
Ozcan, Kesiktaş et al., 2008 [63]
SI
N = 9
Ergonomists > 2 years of ergonomic risk assessment
N = 30
Work tasks within poultry slaughtering, assembling and manufacturing aluminium containers
Risk level: Kqw = 0.76; ICC = 0.82 (95%CI 0.77–0.87)
Risk score: Kqw = 0.72; ICC = 0.65 (95%CI 0.58–0.71)
Intensity of exertion: Kqw = 0.71; ICC = 0.83 (95%CI 0.76–0.88)
Duration of exertion: Kqw = 0.80; ICC = 0.86 (95%CI 0.81–0.90)
Efforts per minute: Kqw = 0.78; ICC = 0.85 (95%CI 0.80–0.89)
Hand/wrist posture: Kqw = 0.77; ICC = 0.72 (95%CI 0.67–0.78)
Speed of work: Kqw = 0.82; ICC = 0.86 (95%CI 0.78–0.90)
Motamedzade, Mohammadian et al., 2019 [31]
SI
N = 14
9 ergonomists
6 students
(one drop-out, only 14 in test-retest analyses)
N teams = 5
3 raters per team
N = 73 for item ratings
N = 12 for total score ratings

Work tasks within manufacturing (e.g., product assembly), meat/poultry processing (e.g., meat, fat and skin cutting, trimming and ripping) and distal upper extremity-intensive material handling (e.g., manipulation of small- to medium-sized products/boxes)
Individual raters
Risk score: ICC = 0.56 (95%CI 0.45–0.67)
Intensity of exertion: ICC = 0.90 (95%CI 0.87–0.92)
Duration of exertion: ICC = 0.90 (95%CI 0.87–0.93)
Efforts per minute: ICC = 0.92 (95%CI 0.90–0.94)
Hand/wrist posture: ICC = 0.82 (95%CI 0.76–0.87)
Speed of work: ICC = 0.90 (95%CI 0.85–0.93)

Team ratings
Risk score: ICC = 0.82 (95%CI 0.72–0.89)
Intensity of exertion: ICC = 0.93 (95%CI 0.88–0.95)
Duration of exertion: ICC = 0.87 (95%CI 0.80–0.92)
Efforts per minute: ICC = 0.90 (95%CI 0.84–0.94)
Hand/wrist posture: ICC = 0.66 (95%CI 0.46–0.80)
Speed of work: ICC = 0.92 (95%CI 0.87–0.96)
Stephens, Vos et al., 2006 [40]

Appendix B

Table A3. The inter-observer reliability for the items (posture and movements/repetition) in each method that were rated by the ergonomists (in the first session) from the video-recorded work tasks. Number of ergonomists (n), proportional agreement (%), Cohen’s kappa (K), linearly weighted kappa averaged over pairs (Klw), intraclass correlation (ICC) and Kendall’s coefficient of concordance (KCC).
Inter-Observer Reliability
Method | N | Assessment | % | K | Klw | ICC | KCC
ART LEFT | 11 | Arm movements | 0.62 | 0.33 | 0.39 | 0.46 | 0.52
ART LEFT | 11 | Arm/hand repetition | 0.53 | 0.29 | 0.37 | 0.47 | 0.59
ART LEFT | 11 | Head/neck posture | 0.52 | 0.25 | 0.34 | 0.45 | 0.59
ART LEFT | 11 | Back posture | 0.56 | 0.19 | 0.21 | 0.30 | 0.40
ART LEFT | 11 | Arm posture | 0.66 | 0.33 | 0.34 | 0.31 | 0.43
ART LEFT | 11 | Wrist posture | 0.59 | 0.18 | 0.17 | 0.16 | 0.27
ART LEFT | 11 | Hand/finger grip | 0.46 | 0.12 | 0.16 | 0.22 | 0.37
ART RIGHT | 11 | Arm movements | 0.69 | 0.23 | 0.25 | 0.37 | 0.47
ART RIGHT | 11 | Arm/hand repetition | 0.64 | 0.36 | 0.44 | 0.57 | 0.69
ART RIGHT | 11 | Head/neck posture | 0.51 | 0.25 | 0.33 | 0.44 | 0.56
ART RIGHT | 11 | Back posture | 0.55 | 0.17 | 0.21 | 0.31 | 0.41
ART RIGHT | 11 | Arm posture | 0.60 | 0.27 | 0.32 | 0.35 | 0.39
ART RIGHT | 11 | Wrist posture | 0.60 | 0.19 | 0.21 | 0.25 | 0.42
ART RIGHT | 11 | Hand/finger grip | 0.55 | 0.27 | 0.30 | 0.40 | 0.50
HARM | 12 | Force exertions | 0.45 | 0.32 | 0.30 | 0.38 | 0.46
HARM | 12 | Neck/shoulder posture | 0.43 | 0.15 | 0.25 | 0.26 | 0.41
HARM | 12 | Forearm/wrist posture | 0.33 | 0.09 | 0.14 | 0.18 | 0.30
OCRA | 11 | Movement repetition | 0.23 | 0.14 | 0.35 | 0.63 | 0.75
OCRA | 11 | Force | 0.42 | 0.23 | 0.40 | 0.51 | 0.66
OCRA | 11 | Shoulder posture | 0.54 | 0.22 | 0.28 | 0.35 | 0.50
OCRA | 11 | Elbow movement | 0.37 | 0.05 | 0.03 | 0.01 | 0.14
OCRA | 11 | Wrist posture | 0.49 | 0.13 | 0.14 | 0.15 | 0.30
OCRA | 11 | Grip | 0.43 | 0.18 | 0.25 | 0.37 | 0.43
OCRA | 11 | Repetitiveness | 0.61 | 0.42 | 0.53 | 0.66 | 0.77
QEC | 12 | Back posture | 0.54 | 0.29 | 0.34 | 0.44 | 0.49
QEC | 12 | Back movements | 0.35 | 0.17 | 0.26 | 0.34 | 0.48
QEC | 12 | Shoulder/arm posture | 0.61 | 0.36 | 0.39 | 0.45 | 0.55
QEC | 12 | Shoulder/arm movements | 0.57 | 0.19 | 0.21 | 0.24 | 0.37
QEC | 12 | Hand/wrist posture | 0.70 | 0.17 | 0.17 | 0.19 | 0.31
QEC | 12 | Hand/wrist movements | 0.59 | 0.30 | 0.44 | 0.56 | 0.61
QEC | 12 | Neck posture | 0.74 | 0.37 | 0.39 | 0.45 | 0.50
SI 1 | 12 | Force % work cycle | 0.48 | 0.23 | 0.26 | 0.38 | 0.54
SI 1 | 12 | Efforts per minute | 0.57 | 0.28 | 0.42 | 0.52 | 0.66
SI 1 | 12 | Hand/wrist posture | 0.47 | 0.15 | 0.17 | 0.20 | 0.32
SWEA | 10 | Neck posture | 0.51 | 0.17 | 0.22 | 0.32 | 0.45
SWEA | 10 | Shoulder/arm posture | 0.56 | 0.16 | 0.21 | 0.29 | 0.40
SWEA | 10 | Back posture | 0.60 | 0.12 | 0.16 | 0.25 | 0.34
1 Although instructed to rate both hands, the ergonomists sometimes did not rate the less active hand; therefore, in each video the hand that the highest number of ergonomists had rated (i.e., the right hand in all tasks but meat netting) was used for the reliability computations for the rated items.
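
Table A3 (and Tables 4–9 below) report a linearly weighted kappa "averaged over pairs" (Klw). The following is a minimal sketch of one way such a statistic can be computed, assuming the ratings are stored as an items × raters array with NaN for missing ratings; it is an illustration, not the authors' analysis code.

```python
# Sketch: linearly weighted Cohen's kappa computed for every rater pair and
# then averaged. Assumes an items x raters array; NaN marks a missing rating.
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score

def pairwise_linear_kappa(ratings: np.ndarray) -> float:
    """Average linearly weighted Cohen's kappa over all rater pairs."""
    n_raters = ratings.shape[1]
    kappas = []
    for i, j in combinations(range(n_raters), 2):
        # keep only items that both raters actually assessed
        mask = ~np.isnan(ratings[:, i]) & ~np.isnan(ratings[:, j])
        if mask.sum() < 2:
            continue
        kappas.append(
            cohen_kappa_score(ratings[mask, i], ratings[mask, j], weights="linear")
        )
    return float(np.mean(kappas))

# Example: 10 work tasks rated by 3 observers on a 1-3 ordinal risk-level scale
example = np.array([
    [1, 1, 2], [2, 2, 2], [3, 2, 3], [1, 2, 1], [2, 3, 2],
    [3, 3, 3], [2, 2, 1], [1, 1, 1], [3, 2, 2], [2, 2, 3],
], dtype=float)
print(round(pairwise_linear_kappa(example), 2))
```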

References

  1. De Kok, J.; Vroonhof, P.; Snijders, J.; Roullis, G.; Clarke, M.; Peereboom, K.; van Dorst, P.; Isusi, I. Work-related musculoskeletal disorders: Prevalence, costs and demographics in the EU. In European Risk Observatory; European Agency for Safety and Health at Work—EU-OSHA: Luxembourg, 2019. [Google Scholar]
  2. Lotters, F.; Meerding, W.J.; Burdorf, A. Reduced productivity after sickness absence due to musculoskeletal disorders and its relation to health outcomes. Scand. J. Work Environ. Health 2005, 31, 367–374. [Google Scholar] [CrossRef] [PubMed]
  3. Nyman, T.; Grooten, W.J.; Wiktorin, C.; Liwing, J.; Norrman, L. Sickness absence and concurrent low back and neck-shoulder pain: Results from the MUSIC-Norrtalje study. Eur. Spine J. 2007, 16, 631–638. [Google Scholar] [CrossRef] [PubMed]
  4. Bevan, S. Economic impact of musculoskeletal disorders (MSDs) on work in Europe. Best Pract. Res. Clin. Rheumatol. 2015, 29, 356–373. [Google Scholar] [CrossRef] [PubMed]
  5. Summers, K.; Jinnett, K.; Bevan, S. Musculoskeletal Disorders, Workforce Health and Productivity in the United States; The Center for Workforced Health and Performance, Lancaster University: London, UK, 2015. [Google Scholar]
  6. Van Rijn, R.M.; Huisstede, B.M.; Koes, B.W.; Burdorf, A. Associations between work-related factors and specific disorders of the shoulder—A systematic review of the literature. Scand. J. Work Environ. Health 2010, 36, 189–201. [Google Scholar] [CrossRef]
  7. Van Rijn, R.M.; Huisstede, B.M.; Koes, B.W.; Burdorf, A. Associations between work-related factors and specific disorders at the elbow: A systematic literature review. Rheumatology 2009, 48, 528–536. [Google Scholar] [CrossRef]
  8. Lang, J.; Ochsmann, E.; Kraus, T.; Lang, J.W. Psychosocial work stressors as antecedents of musculoskeletal problems: A systematic review and meta-analysis of stability-adjusted longitudinal studies. Soc. Sci. Med. 2012, 75, 1163–1174. [Google Scholar] [CrossRef]
  9. Palmer, K.T.; Harris, E.C.; Coggon, D. Carpal tunnel syndrome and its relation to occupation: A systematic literature review. Occup. Med. 2007, 57, 57–66. [Google Scholar] [CrossRef]
  10. Bongers, P.M.; Ijmker, S.; van den Heuvel, S.; Blatter, B.M. Epidemiology of work related neck and upper limb problems: Psychosocial and personal risk factors (part I) and effective interventions from a bio behavioural perspective (part II). J. Occup. Rehabil. 2006, 16, 279–302. [Google Scholar] [CrossRef]
  11. Nordander, C.; Ohlsson, K.; Akesson, I.; Arvidsson, I.; Balogh, I.; Hansson, G.A.; Stromberg, U.; Rittner, R.; Skerfving, S. Risk of musculoskeletal disorders among females and males in repetitive/constrained work. Ergonomics 2009, 52, 1226–1239. [Google Scholar] [CrossRef]
  12. Nordander, C.; Ohlsson, K.; Akesson, I.; Arvidsson, I.; Balogh, I.; Hansson, G.A.; Stromberg, U.; Rittner, R.; Skerfving, S. Exposure-response relationships in work-related musculoskeletal disorders in elbows and hands—A synthesis of group-level data on exposure and response obtained using uniform methods of data collection. Appl. Ergon. 2013, 44, 241–253. [Google Scholar] [CrossRef]
  13. Tompa, E.; Dolinschi, R.; de Oliveira, C.; Amick, B.C., III; Irvin, E. A systematic review of workplace ergonomic interventions with economic analyses. J. Occup. Rehabil. 2010, 20, 220–234. [Google Scholar] [CrossRef] [PubMed]
  14. Driessen, M.T.; Proper, K.I.; van Tulder, M.W.; Anema, J.R.; Bongers, P.M.; van der Beek, A.J. The effectiveness of physical and organisational ergonomic interventions on low back pain and neck pain: A systematic review. Occup. Environ. Med. 2010, 67, 277–285. [Google Scholar] [CrossRef] [PubMed]
  15. European Council. Council Directive 89/391/EEC of 12 June 1989 on the Introduction of Measures to Encourage Improvements in the Safety and Health of Workers at Work; European Agency for Safety and Health at Work: Bilbao, Spain, 1989. [Google Scholar]
  16. Dahlqvist, C.; Hansson, G.A.; Forsman, M. Validity of a small low-cost triaxial accelerometer with integrated logger for uncomplicated measurements of postures and movements of head, upper back and upper arms. Appl. Ergon. 2016, 55, 108–116. [Google Scholar] [CrossRef]
  17. Yang, L.; Grooten, W.J.A.; Forsman, M. An iPhone application for upper arm posture and movement measurements. Appl. Ergon. 2017, 65, 492–500. [Google Scholar] [CrossRef]
  18. Eliasson, K.; Lind, C.M.; Nyman, T. Factors influencing ergonomists’ use of observation-based risk-assessment tools. Work 2019, 64, 93–106. [Google Scholar] [CrossRef]
  19. Wells, R.P.; Neumann, W.P.; Nagdee, T.; Theberge, N. Solution Building Versus Problem Convincing: Ergonomists Report on Conducting Workplace Assessments. IIE Trans. Occup. Ergon. Hum. Factors 2013, 1, 50–65. [Google Scholar] [CrossRef]
  20. Whysall, Z.J.; Haslam, R.A.; Haslam, C. Processes, barriers, and outcomes described by ergonomics consultants in preventing work-related musculoskeletal disorders. Appl. Ergon. 2004, 35, 343–351. [Google Scholar] [CrossRef] [PubMed]
  21. Eliasson, K.; Palm, P.; Nyman, T.; Forsman, M. Inter- and intra- observer reliability of risk assessment of repetitive work without an explicit method. Appl. Ergon. 2017, 62, 1–8. [Google Scholar] [CrossRef]
  22. Takala, E.P.; Pehkonen, I.; Forsman, M.; Hansson, G.A.; Mathiassen, S.E.; Neumann, W.P.; Sjogaard, G.; Veiersted, K.B.; Westgaard, R.H.; Winkel, J. Systematic evaluation of observational methods assessing biomechanical exposures at work. Scand. J. Work Environ. Health 2010, 36, 3–24. [Google Scholar] [CrossRef]
  23. Graben, P.R.; Schall, M.C., Jr.; Gallagher, S.; Sesek, R.; Acosta-Sojo, Y. Reliability Analysis of Observation-Based Exposure Assessment Tools for the Upper Extremities: A Systematic Review. Int. J. Environ. Res. Public Health 2022, 19, 10595. [Google Scholar] [CrossRef]
  24. Health and Safety Executive. Upper Limb Disorders in the Workplace, 2nd ed.; HSE Books: Sudbury, UK, 2002. [Google Scholar]
  25. Douwes, M.; de Kraker, H. HARM overview and its application: Some practical examples. Work 2012, 41 (Suppl. S1), 4004–4009. [Google Scholar] [CrossRef] [PubMed]
  26. Occhipinti, E.; Colombini, D. A Checklist for Evaluating Exposure to Repetitive Movements of the Upper Limbs Based on the OCRA Index. In International Encyclopedia of Ergonomics and Human Factors, 2nd ed.; Karwowski, W., Ed.; CRC Press: Boca Raton, FL, USA, 2006. [Google Scholar]
  27. David, G.; Woods, V.; Li, G.; Buckle, P. The development of the Quick Exposure Check (QEC) for assessing exposure to risk factors for work-related musculoskeletal disorders. Appl. Ergon. 2008, 39, 57–69. [Google Scholar] [CrossRef] [PubMed]
  28. Moore, J.S.; Garg, A. The Strain Index: A proposed method to analyze jobs for risk of distal upper extremity disorders. Am. Ind. Hyg. Assoc. J. 1995, 56, 443–458. [Google Scholar] [CrossRef] [PubMed]
  29. Ferreira, J.; Gray, M.; Hunter, L.; Birtles, M.; Riley, D. Development of an Assessment Tool for Repetitive Tasks of the Upper Limbs (ART). In Research Report RR707; Health and Safety Executive: Derbyshire, UK, 2009. [Google Scholar]
  30. Roodbandi, A.J.; Choobineh, A.; Feyzi, V. The Investigation of Intrarater and Inter-rater Agreement in Assessment of Repetitive Task (ART) as an Ergonomic Method. Occup. Med. Health Aff. 2015, 3, 1–5. [Google Scholar] [CrossRef]
  31. Motamedzade, M.; Mohammadian, M.; Faradmal, J. Investigating Intra-Rater and Inter-Rater Reliability of Three Upper-Limb Risk Assessment Methods. Iran. J. Health Saf. Environ. 2019, 6, 1267–1271. [Google Scholar]
  32. Douwes, M.; de Kraker, H. Development of a non-expert risk assessment method for hand-arm related tasks (HARM). Int. J. Ind. Ergon. 2014, 44, 316–327. [Google Scholar] [CrossRef]
  33. Paulsen, R.; Gallu, T.; Gilkey, D.; Reiser, R., II; Murgia, L.; Rosecrance, J. The inter-rater reliability of Strain Index and OCRA Checklist task assessments in cheese processing. Appl. Ergon. 2015, 51, 199–204. [Google Scholar] [CrossRef]
  34. Rhén, I.-M.; Forsman, M. Inter- and intra-rater reliability of the OCRA checklist method in video-recorded manual work tasks. Appl. Ergon. 2020, 84, 103025. [Google Scholar] [CrossRef]
  35. Comper, M.L.; Costa, L.O.; Padula, R.S. Clinimetric properties of the Brazilian-Portuguese version of the Quick Exposure Check (QEC). Rev. Bras. Fisioter. 2012, 16, 487–494. [Google Scholar] [CrossRef]
  36. Oliv, S.; Gustafsson, E.; Baloch, A.N.; Hagberg, M.; Sandén, H. The Quick Exposure Check (QEC)—Inter-rater reliability in total score and individual items. Appl. Ergon. 2019, 76, 32–37. [Google Scholar] [CrossRef]
  37. Mokhtarinia, H.R.; Abazarpour, S.; Gabel, C.P. Validity and reliability of the Persian version of the Quick Exposure Check (QEC) in Iranian construction workers. Work 2020, 67, 387–394. [Google Scholar] [CrossRef] [PubMed]
  38. Stevens, E.M., Jr.; Vos, G.A.; Stephens, J.P.; Moore, J.S. Inter-rater reliability of the strain index. J. Occup. Environ. Hyg. 2004, 1, 745–751. [Google Scholar] [CrossRef]
  39. Spielholz, P.; Bao, S.; Howard, N.; Silverstein, B.; Fan, J.; Smith, C.; Salazar, C. Reliability and validity assessment of the hand activity level threshold limit value and strain index using expert ratings of mono-task jobs. J. Occup. Environ. Hyg. 2008, 5, 250–257. [Google Scholar] [CrossRef] [PubMed]
  40. Stephens, J.-P.; Vos, G.A.; Stevens, E.M.; Steven Moore, J. Test–retest repeatability of the Strain Index. Appl. Ergon. 2006, 37, 275–281. [Google Scholar] [CrossRef]
  41. Cicchetti, D.V. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol. Assess. 1994, 6, 284. [Google Scholar] [CrossRef]
  42. Koo, T.K.; Li, M.Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropr. Med. 2016, 15, 155–163. [Google Scholar] [CrossRef]
  43. Occhipinti, E.; Colombini, D. Updating reference values and predictive models of the OCRA method in the risk assessment of work-related musculoskeletal disorders of the upper limbs. Ergonomics 2007, 50, 1727–1739. [Google Scholar] [CrossRef]
  44. Occhipinti, E. OCRA: A concise index for the assessment of exposure to repetitive movements of the upper limbs. Ergonomics 1998, 41, 1290–1311. [Google Scholar] [CrossRef] [PubMed]
  45. David, G.; Woods, V.; Buckle, P. Further Development of the Usability and Validity of the Quick Exposure Check; HSE Research Report 211; HSE Books: Sudbury, UK, 2005. [Google Scholar]
  46. Brown, R.; Li, G. The development of action levels for the ‘Quick Exposure Check’ (QEC) system. In Contemporary Ergonomics 2003; McCabe, P.T., Ed.; Taylor & Francis: London, UK, 2003; pp. 41–46. [Google Scholar]
  47. Cheng, A.S.; So, P.C. Development of the Chinese version of the Quick Exposure Check (CQEC). Work 2014, 48, 503–510. [Google Scholar] [CrossRef]
  48. Garg, A.; Moore, J.S.; Kapellusch, J.M. The Revised Strain Index: An improved upper extremity exposure assessment model. Ergonomics 2017, 60, 912–922. [Google Scholar] [CrossRef]
  49. Nordiska Ministerrådet. Vägar Till Färre Arbetsskador—Utveckling av Nordisk Ergonomitillsyn, Modeller för Ergonomisk Riskvärdering TemaNord 1994:514; Nordiska Ministerrådet: Copenhagen, Denmark, 1994. [Google Scholar]
  50. Eliasson, K.; Forsman, M.; Nyman, T. Exploring ergonomists experiences after participation in a theoretical and practical research project in observational risk assessment tools. Int. J. Occup. Saf. Ergon. 2021, 28, 1136–1144. [Google Scholar] [CrossRef]
  51. Borg, G. Borg’s Perceived Exertion and Pain Scales; Human Kinetics: Champaign, IL, USA, 1998. [Google Scholar]
  52. Streiner, D.L.; Norman, G.R.; Cairney, J. Health Measurement Scales: A Practical Guide to Their Development and Use; Oxford University Press: Oxford, UK, 2015. [Google Scholar] [CrossRef]
  53. Kjellberg, K.; Lindberg, P.; Nyman, T.; Palm, P.; Rhen, I.-M.; Eliasson, K.; Carlsson, R.; Balliu, N.; Forsman, M. Comparisons of six observational methods for risk assessment of repetitive work—Results from a consensus assessment. In Proceedings of the 19th Triennial Congress of the International Ergonomics Association, Melbourne, VIC, Australia, 9–14 August 2015. [Google Scholar]
  54. Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20, 37–46. [Google Scholar] [CrossRef]
  55. Davies, M.; Fleiss, J.L. Measuring Agreement for Multinomial Data. Biometrics 1982, 38, 1047–1051. [Google Scholar] [CrossRef]
  56. Warrens, M.J. Conditional inequalities between Cohen’s kappa and weighted kappas. Stat. Methodol. 2013, 10, 14–22. [Google Scholar] [CrossRef]
  57. Cohen, J. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol. Bull. 1968, 70, 213–220. [Google Scholar] [CrossRef] [PubMed]
  58. Hallgren, K.A. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial. Tutor. Quant. Methods Psychol. 2012, 8, 23–34. [Google Scholar] [CrossRef]
  59. Sawa, J.; Morikawa, T. Interrater Reliability for Multiple Raters in Clinical Trials of Ordinal Scale. Drug Inf. J. 2007, 41, 595–605. [Google Scholar] [CrossRef]
  60. Shrout, P.E.; Fleiss, J.L. Intraclass Correlations : Uses in Assessing Rater Reliability. Psychol. Bull. 1979, 86, 420–428. [Google Scholar] [CrossRef]
  61. McDowell, I. Measuring Health: A Guide to Rating Scales and Questionnaires; Oxford University Press: Oxford, UK, 2006. [Google Scholar]
  62. Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics 1977, 33, 159–174. [Google Scholar] [CrossRef]
  63. Ozcan, E.E.; Kesiktaş, N.; Alptekin, K.; Ozcan, E.E. The reliability of Turkish translation of quick exposure check (QEC) for risk assessment of work related musculoskeletal disorders. J. Back Musculoskelet. Rehabil. 2008, 21, 51–56. [Google Scholar] [CrossRef]
  64. Palm, P.; Josephson, M.; Mathiassen, S.E.; Kjellberg, K. Reliability and criterion validity of an observation protocol for working technique assessments in cash register work. Ergonomics 2016, 59, 829–839. [Google Scholar] [CrossRef]
  65. Dartt, A.; Rosecrance, J.; Gerr, F.; Chen, P.; Anton, D.; Merlino, L. Reliability of assessing upper limb postures among workers performing manufacturing tasks. Appl. Ergon. 2009, 40, 371–378. [Google Scholar] [CrossRef]
  66. Bao, S.; Howard, N.; Spielholz, P.; Silverstein, B.; Polissar, N. Interrater Reliability of Posture Observations. Hum. Factors 2009, 51, 292–309. [Google Scholar] [CrossRef]
  67. Sim, J.; Wright, C.C. The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Phys. Ther. 2005, 85, 257–268. [Google Scholar] [CrossRef]
  68. Eliasson, K.; Palm, P.; Nordander, C.; Dahlgren, G.; Lewis, C.; Hellman, T.; Svartengren, M.; Nyman, T. Study Protocol for a Qualitative Research Project Exploring an Occupational Health Surveillance Model for Workers Exposed to Hand-Intensive Work. Int. J. Environ. Res. Public Health 2020, 17, 6400. [Google Scholar] [CrossRef] [PubMed]
  69. European Union Information Agency for Occupational Safety and Health (EU-OSHA). Identifying Ill Health through Health Surveillance. 2021. Available online: https://osha.europa.eu/en/themes/work-related-diseases/health-surveillance (accessed on 2 April 2023).
  70. Mathiassen, S.E.; Liv, P.; Wahlström, J. Cost-efficient measurement strategies for posture observations based on video recordings. Appl. Ergon. 2013, 44, 609–617. [Google Scholar] [CrossRef] [PubMed]
  71. Denis, D.; Lortie, M.; Rossignol, M. Observation Procedures Characterizing Occupational Physical Activities: Critical Review. Int. J. Occup. Saf. Ergon. 2000, 6, 463–491. [Google Scholar] [CrossRef]
  72. Fagarasanu, M.; Kumar, S. Measurement instruments and data collection: A consideration of constructs and biases in ergonomics research. Int. J. Ind. Ergon. 2002, 30, 355–369. [Google Scholar] [CrossRef]
  73. Brenner, H.; Kliebsch, U. Dependence of weighted kappa coefficients on the number of categories. Epidemiology 1996, 7, 199–202. [Google Scholar] [CrossRef] [PubMed]
  74. American Educational Research Association. Standards for Educational and Psychological Testing; American Educational Research Association: Washington, DC, USA, 2014. [Google Scholar]
  75. Mokkink, L.B.; Terwee, C.B.; Patrick, D.L.; Alonso, J.; Stratford, P.W.; Knol, D.L.; Bouter, L.M.; de Vet, H.C.W. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J. Clin. Epidemiol. 2010, 63, 737–745. [Google Scholar] [CrossRef]
  76. De Vet, H.C.; Terwee, C.B.; Knol, D.L.; Bouter, L.M. When to use agreement versus reliability measures. J. Clin. Epidemiol. 2006, 59, 1033–1039. [Google Scholar] [CrossRef]
  77. Chiasson, M.-È.; Imbeau, D.; Aubry, K.; Delisle, A. Comparing the results of eight methods used to evaluate risk factors associated with musculoskeletal disorders. Int. J. Ind. Ergon. 2012, 42, 478–488. [Google Scholar] [CrossRef]
  78. Garg, A.; Kapellusch, J.; Hegmann, K.; Wertsch, J.; Merryweather, A.; Deckow-Schaefer, G.; Malloy, E.J.; Wistah Hand Study Research, T. The Strain Index (SI) and Threshold Limit Value (TLV) for Hand Activity Level (HAL): Risk of carpal tunnel syndrome (CTS) in a prospective cohort. Ergonomics 2012, 55, 396–414. [Google Scholar] [CrossRef] [PubMed]
  79. Joshi, M.; Deshpande, V. A systematic review of comparative studies on ergonomic assessment techniques. Int. J. Ind. Ergon. 2019, 74, 102865. [Google Scholar] [CrossRef]
Figure 1. The results of the ten ergonomists (Erg A–J) who performed the risk level assessments using QEC (four levels) of the ten work tasks (WT) in both the first and second session. The colours refer to the different risk levels given in QEC: Green = Low, Yellow = Moderate, Orange = High, and Red = Very high. Bold and underlined numbers in the second session denote a difference from the first session result.
Table 1. Descriptions of the ten different video-recorded work tasks in the study.
Work Task | Task Activity | Hours 1 per Workday | Handled Goods (kg) | Environment, Physical Factors | Discomfort (CR-10) | Work Demands and Control 2
1 | Unpacking groceries to shelves in a supermarket store | just above 4 | 2 | Good | 3 | Partly autonomy
2 | Putting nets around roasts at a slaughterhouse | just above 4 | 2.5–4.5 | Cold, wet, noisy | 4 | Group autonomy
3 | Throwing small boxes into containers (post sorting) | just above 2 | 3 | Cold during winter, warm during summer, noisy, difficulty concentrating | 3–4 | Controlled
4 | Putting bundles of letters into boxes (post sorting) | approx. 6 | 2 | Cold during winter, warm during summer, noisy, difficulty concentrating | 3–4 | Controlled
5 | Deboning meat at a slaughterhouse | approx. 7 | 3–4 | Cold, wet, noisy, sharp knives | 3–4 | Group autonomy
6 | Assembling engines | just under 3 | 2 | Good | 2.5 | Controlled
7 | Cutting hair | just above 4 | 1 | Good | 3 | Autonomy
8 | Cleaning lavatories | approx. 5 | 1 | Good | 2 | Partly autonomy
9 | Supermarket cashier work | approx. 7 | 1–5 | Good | 3 | Controlled
10 | Cleaning stairs | just under 4 | 1 | Usually good, sometimes cold | 3 | Partly autonomy
1 Pre-set task duration. 2 Autonomy: The worker controls the work himself/herself as if self-employed. Partly autonomy: The worker controls the work task but is limited in time and by obligations of other work tasks included in the work. Group autonomy: a group of employees control and divide work tasks within the group. Controlled: The work task is completely time-controlled by work instructions and space-controlled by the physical design of the workplace.
Table 2. Description of body regions where posture and movements/repetitions were assessed from the video and rated by the ergonomists, for each of the six methods.
Columns per method: Posture (Back, Neck, Shoulder/Arm, Elbow, Wrist/Hand); Movement/Repetition (Back, Neck, Shoulder/Arm, Elbow, Wrist/Hand)
ARTXXX X X X
HARM XX X XXXX
OCRA XXX XXX
QECXXX XX X X
SI X X
SWEAXXX X X X
Table 3. Number of ergonomists that completed the risk assessments in the first and the second session (in brackets), number of items rated in each method, total number of performed item ratings, total number of risk level assessments, and the distribution, in percent, of the assessments across the risk levels stipulated in each of the methods, from “1” = lowest risk to “5” = highest risk. The risk level distribution for the expert group (shown in bold print in the original) is given as the last value in each distribution cell.
Method | Number of Ergonomists | Items Rated | Performed Ratings | Risk Level Assessments | Distribution, in Percent, of Risk Levels 1 From Low (1) to High (5): 1 | 2 | 3 | 4 | 5
(In each cell, values refer to the first session, with the second session in brackets; the final value in each distribution cell is the expert group.)
ART left arm | 11 (9) | 12 | 1320 (1080) | 110 (90) | 17 (17) 20 | 44 (39) 50 | 39 (44) 30 | - | -
ART right arm | 11 (9) | 12 | 1320 (1080) | 110 (90) | 8 (12) 10 | 35 (32) 40 | 56 (56) 50 | - | -
HARM | 12 (8) | 27 | 3240 (2160) | 120 (80) | 27 (25) 20 | 48 (54) 70 | 26 (21) 10 | - | -
OCRA | 11 (10) | 12 | 1320 (1200) | 110 (100) | 26 (28) 20 | 15 (15) 20 | 16 (19) 30 | 33 (27) 20 | 9 (11) 10
QEC total | 12 (10) | 7 | 840 (700) | 120 (100) | 2 (2) 10 | 15 (11) 0 | 57 (59) 50 | 27 (28) 40 | -
QEC Neck | 12 (10) | 2 | 240 (200) | 120 (100) | 0 (0) 0 | 13 (14) 10 | 54 (50) 60 | 33 (36) 30 | -
QEC Shoulder | 12 (10) | 5 | 600 (500) | 120 (100) | 2 (2) 10 | 38 (39) 20 | 60 (59) 70 | 0 (0) 0 | -
QEC Wrist | 12 (10) | 5 | 600 (500) | 120 (100) | 0 (0) 0 | 33 (35) 30 | 68 (65) 70 | 0 (0) 0 | -
QEC Back | 12 (10) | 6 | 720 (600) | 120 (100) | 17 (14) 10 | 38 (39) 30 | 41 (44) 60 | 5 (3) 0 | -
SI highest score | 12 (10) | 6 | 720 (600) | 120 (100) | 3 (6) 20 | 15 (6) 0 | 83 (88) 80 | - | -
SWEA Overall repetition | 12 (8) | 1 | 120 (80) | 120 (80) | 8 (11) 10 | 42 (41) 30 | 51 (48) 60 | - | -
SWEA Overall postures and movements | 12 (8) | 1 | 120 (80) | 120 (80) | 20 (15) 20 | 73 (70) 80 | 8 (15) 0 | - | -
SWEA Neck posture 2 | 10 (6) | 1 | 100 (60) | 100 (60) | 31 (13) - | 55 (60) - | 14 (27) - | - | -
SWEA Shoulder/arm posture 2 | 10 (6) | 1 | 100 (60) | 100 (60) | 26 (23) - | 69 (72) - | 5 (5) - | - | -
SWEA Back posture 2 | 10 (6) | 1 | 100 (60) | 100 (60) | 22 (10) - | 64 (82) - | 14 (8) - | - | -
1 ART, HARM, SI, SWEA (three levels, 1–3); QEC (four levels 1–4); OCRA (five levels, 1–5). 2 No ratings were performed by the expert group.
Table 4. Inter-observer reliability of risk levels, with the pre-given, task-specific work durations for the ten work tasks (see Table 1). The item ratings were first used to compute the risk scores for each method and were then transposed into risk levels according to the instructions for each of the methods. Number of ergonomists (N), proportional agreement (%), Cohen’s kappa (K), linearly weighted kappa averaged over pairs (Klw), intraclass correlation (ICC) and Kendall’s coefficient of concordance (KCC) for the results of the ergonomists’ ratings of the ten different video-recorded work tasks.
Inter-Observer Reliability
Method | Assessment | N | % | K | Klw | ICC | KCC
ART | Left Arm (3 levels) | 11 | 68 | 0.50 | 0.58 | 0.70 | 0.77
ART | Right Arm (3 levels) | 11 | 78 | 0.59 | 0.65 | 0.75 | 0.72
HARM | Total (3 levels) | 12 | 73 | 0.58 | 0.65 | 0.77 | 0.79
OCRA | Total (5 levels) | 11 | 39 | 0.21 | 0.43 | 0.62 | 0.65
QEC | Total (4 levels) | 12 | 68 | 0.46 | 0.55 | 0.69 | 0.72
QEC | Neck (4 levels) | 12 | 91 | 0.85 | 0.87 | 0.90 | 0.92
QEC | Shoulder (4 levels) | 12 | 71 | 0.42 | 0.44 | 0.51 | 0.57
QEC | Wrist (4 levels) | 12 | 86 | 0.67 | 0.67 | 0.70 | 0.73
QEC | Back (4 levels) | 12 | 57 | 0.35 | 0.49 | 0.67 | 0.70
SI 1 | Highest score (3 levels) | 12 | 83 | 0.20 | 0.18 | 0.18 | 0.33
SWEA | Overall repetition (3 levels) | 12 | 58 | 0.26 | 0.30 | 0.39 | 0.48
SWEA | Overall postures and movements (3 levels) | 12 | 65 | 0.18 | 0.21 | 0.28 | 0.35
SWEA | Neck posture (3 levels) | 10 | 51 | 0.17 | 0.22 | 0.32 | 0.45
SWEA | Shoulder/arm posture (3 levels) | 10 | 56 | 0.16 | 0.21 | 0.29 | 0.40
SWEA | Back posture (3 levels) | 10 | 60 | 0.12 | 0.16 | 0.25 | 0.34
1 Although instructed to rate both hands, the ergonomists sometimes did not rate the less active hand; therefore, in each video the hand that yielded the highest score was used for the inter-observer reliability computations.
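
As stated in the captions of Tables 4 and 5, item ratings were first combined into a risk score and the score was then transposed into a risk level according to each method's instructions. The sketch below illustrates that banding step only; the cut-off values are placeholders, not the published limits of ART, HARM, OCRA, QEC, SI or SWEA, which must be taken from each method's own manual.

```python
# Illustrative sketch of mapping a method's total risk score to an ordinal
# risk level. The band limits below are placeholders, NOT the published
# cut-offs of any of the six methods studied here.
def score_to_level(score: float, band_limits: list[float]) -> int:
    """Return a 1-based risk level: 1 below the first limit, and so on."""
    level = 1
    for limit in band_limits:
        if score >= limit:
            level += 1
    return level

# Hypothetical three-band method: <12 = low (1), 12-21 = medium (2), >=22 = high (3)
print(score_to_level(8.0, [12, 22]))   # -> 1
print(score_to_level(17.5, [12, 22]))  # -> 2
print(score_to_level(30.0, [12, 22]))  # -> 3
```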
Table 5. Inter-observer reliability of risk levels, with standardized work task duration, using 3 h 45 min for all ten work tasks. The item ratings were first used to compute the risk scores for each method and were then transposed into risk levels according to the instructions for each of the methods. Number of ergonomists (n), proportional agreement (%), Cohen’s kappa (K), linearly weighted kappa averaged over pairs (Klw), intraclass correlation (ICC) and Kendall’s coefficient of concordance (KCC) for the results of the ergonomists’ ratings of the ten different video-recorded work tasks.
Inter-Observer Reliability
Method | Assessment | N | % | K | Klw | ICC | KCC
ART | Left Arm (3 levels) | 11 | 60 | 0.32 | 0.42 | 0.57 | 0.66
ART | Right Arm (3 levels) | 11 | 62 | 0.34 | 0.41 | 0.55 | 0.56
HARM | Total (3 levels) | 12 | 75 | 0.26 | 0.26 | 0.30 | 0.36
OCRA | Total (5 levels) | 11 | 42 | 0.25 | 0.45 | 0.64 | 0.71
QEC | Total (4 levels) | 12 | 76 | 0.39 | 0.42 | 0.51 | 0.56
QEC | Neck (4 levels) | 12 | 91 | 0.82 | 0.82 | 0.84 | 0.86
QEC | Shoulder (4 levels) | 12 | 63 | 0.29 | 0.33 | 0.44 | 0.52
QEC | Wrist (4 levels) | 12 | 78 | 0.48 | 0.48 | 0.51 | 0.55
QEC | Back (4 levels) | 12 | 59 | 0.24 | 0.30 | 0.40 | 0.50
SI 1 | Highest score (3 levels) | 12 | 81 | 0.16 | 0.15 | 0.14 | 0.29
SWEA 2 | Overall repetition (3 levels) | 12 | 58 | 0.26 | 0.30 | 0.39 | 0.48
SWEA 2 | Overall postures and movements (3 levels) | 12 | 65 | 0.18 | 0.21 | 0.28 | 0.35
SWEA 2 | Neck posture (3 levels) | 10 | 51 | 0.17 | 0.22 | 0.32 | 0.45
SWEA 2 | Shoulder/arm posture (3 levels) | 10 | 56 | 0.16 | 0.21 | 0.29 | 0.40
SWEA 2 | Back posture (3 levels) | 10 | 60 | 0.12 | 0.16 | 0.25 | 0.34
1 Although instructed to rate both hands, the ergonomists sometimes did not rate the less active hand; therefore, in each video the hand that yielded the highest score was used for the inter-observer reliability computations. 2 For SWEA, work task duration is not part of the method; hence, the results with and without standardised work task duration are the same.
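
Because methods such as ART and HARM weight the risk score by task duration, Tables 5 and 8 recompute the risk levels with all tasks set to 3 h 45 min. The sketch below shows the general shape of such a duration-weighting step; the multiplier breakpoints are illustrative assumptions, not the published ART or HARM values.

```python
# Sketch of a duration-weighting step of the kind that makes task duration
# affect ART/HARM-style risk scores. The breakpoints and multipliers below
# are illustrative assumptions only, not the published method values.
def duration_multiplier(hours: float) -> float:
    if hours < 2:
        return 0.5
    if hours < 4:
        return 0.75
    if hours <= 8:
        return 1.0
    return 1.5

def weighted_score(raw_score: float, hours: float) -> float:
    return raw_score * duration_multiplier(hours)

# Standardizing every task to 3.75 h removes duration as a source of disagreement:
print(weighted_score(24.0, 3.75))  # 24.0 * 0.75 = 18.0
```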
Table 6. The inter-observer reliability, in the linearly weighted Kappa (Klw), for the items in each method that were rated by the ergonomists (in the first session) from the video-recorded work tasks. The item with the lowest Klw (Min item), and the item with the highest Klw (Max item) refer to the min and max columns.
Method | Klw Min–Max | Min Item | Max Item
ART | 0.17–0.44 | Wrist posture | Arm/hand repetition
HARM | 0.14–0.30 | Forearm/wrist posture | Force exertions
OCRA | 0.03–0.53 | Elbow movement | Repetitiveness
QEC | 0.17–0.44 | Hand/wrist posture | Hand/wrist movements
SI 1 | 0.17–0.42 | Hand/wrist posture | Efforts per minute
SWEA | 0.16–0.22 | Back posture | Neck posture
1 Although instructed to rate both hands, the ergonomists sometimes did not rate the less active hand; therefore, in each video the hand that the highest number of ergonomists had rated (i.e., the right hand in all tasks but meat netting) was used for the reliability computations for the rated items.
Table 7. Intra-observer reliability of risk levels, with the pre-given, task-specific work durations for the ten work tasks (see Table 1). Number of ergonomists (n), proportional agreement (%), mean Cohen’s kappa (K), linearly weighted kappa averaged over pairs (Klw), intraclass correlation (ICC) and Kendall’s coefficient of concordance (KCC) for the results of the ergonomists’ ratings of the ten different video-recorded work tasks. The ratings were first used to compute the score for each method and then converted into risk levels.
Intra-Observer Reliability
Method | Assessment | n | % | K | Klw | ICC | KCC
ART | Left Arm (3 levels) | 9 | 74 | 0.59 | 0.65 | 0.74 | 0.88
ART | Right Arm (3 levels) | 9 | 79 | 0.62 | 0.68 | 0.78 | 0.86
HARM | Total (3 levels) | 10 | 78 | 0.64 | 0.70 | 0.79 | 0.89
OCRA | Total (5 levels) | 10 | 45 | 0.29 | 0.52 | 0.72 | 0.85
QEC | Total (4 levels) | 10 | 77 | 0.60 | 0.68 | 0.79 | 0.88
QEC | Neck (4 levels) | 10 | 92 | 0.87 | 0.88 | 0.92 | 0.96
QEC | Shoulder (4 levels) | 10 | 78 | 0.57 | 0.58 | 0.62 | 0.83
QEC | Wrist (4 levels) | 10 | 89 | 0.76 | 0.76 | 0.77 | 0.89
QEC | Back (4 levels) | 10 | 67 | 0.49 | 0.60 | 0.74 | 0.87
SI 1 | Highest score (3 levels) | 10 | 77 | 0.15 | 0.13 | 0.10 | 0.56
SWEA | Overall repetition (3 levels) | 8 | 68 | 0.41 | 0.47 | 0.56 | 0.80
SWEA | Overall postures and movements (3 levels) | 8 | 71 | 0.27 | 0.30 | 0.36 | 0.68
SWEA | Neck posture (3 levels) | 6 | 62 | 0.24 | 0.32 | 0.47 | 0.76
SWEA | Shoulder/arm posture (3 levels) | 6 | 67 | 0.09 | 0.13 | 0.20 | 0.60
SWEA | Back posture (3 levels) | 6 | 72 | 0.41 | 0.44 | 0.51 | 0.75
1 Although instructed to rate both hands, the ergonomists sometimes did not rate the less active hand; therefore, in each video the hand that yielded the highest score was used for the intra-observer reliability computations.
Table 8. Intra-observer reliability of risk levels, with standardized work task duration, using 3 h 45 min for all ten work tasks. Number of ergonomists (n), proportional agreement (%), mean Cohen’s kappa (K), linearly weighted kappa averaged over pairs (Klw), intraclass correlation (ICC) and Kendall’s coefficient of concordance (KCC) for the results of the ergonomists’ ratings of the ten different video-recorded work tasks. The ratings were first used to compute the score for each method and then converted into risk levels.
Intra-Observer Reliability
Method | Assessment | n | % | K | Klw | ICC | KCC
ART | Left Arm (3 levels) | 9 | 72 | 0.50 | 0.56 | 0.66 | 0.84
ART | Right Arm (3 levels) | 9 | 71 | 0.49 | 0.56 | 0.66 | 0.82
HARM | Total (3 levels) | 10 | 81 | 0.46 | 0.47 | 0.21 | 0.71
OCRA | Total (5 levels) | 10 | 48 | 0.31 | 0.50 | 0.68 | 0.83
QEC | Total (4 levels) | 10 | 81 | 0.56 | 0.58 | 0.62 | 0.81
QEC | Neck (4 levels) | 10 | 92 | 0.84 | 0.84 | 0.85 | 0.93
QEC | Shoulder (4 levels) | 10 | 69 | 0.39 | 0.42 | 0.49 | 0.75
QEC | Wrist (4 levels) | 10 | 84 | 0.63 | 0.64 | 0.67 | 0.83
QEC | Back (4 levels) | 10 | 73 | 0.44 | 0.48 | 0.57 | 0.80
SI 1 | Highest score (3 levels) | 10 | 73 | 0.22 | 0.16 | 0.12 | 0.58
SWEA 2 | Overall repetition (3 levels) | 8 | 68 | 0.41 | 0.47 | 0.56 | 0.80
SWEA 2 | Overall postures and movements (3 levels) | 8 | 71 | 0.27 | 0.30 | 0.36 | 0.68
SWEA 2 | Neck posture (3 levels) | 6 | 62 | 0.24 | 0.32 | 0.47 | 0.76
SWEA 2 | Shoulder/arm posture (3 levels) | 6 | 67 | 0.09 | 0.13 | 0.20 | 0.60
SWEA 2 | Back posture (3 levels) | 6 | 72 | 0.41 | 0.44 | 0.51 | 0.75
1 Although instructed to rate both hands, the ergonomists sometimes did not rate the less active hand; therefore, in each video the hand that yielded the highest score was used for the intra-observer reliability computations. 2 For SWEA, work task duration is not part of the method; hence, the results with and without standardised work task duration are the same.
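
One common form of the intraclass correlation (ICC) reported in these tables is the two-way model described by Shrout and Fleiss [60]; the exact ICC model used in the present study is specified in its Methods section, not in this appendix. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measures) from an items × raters array; the function name and array layout are assumptions.

```python
# Sketch: ICC(2,1), two-way random effects, absolute agreement, single rater.
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    n, k = ratings.shape                      # n targets (work tasks), k raters
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_total = ((ratings - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                   # between-targets mean square
    msc = ss_cols / (k - 1)                   # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))        # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Example: 5 tasks rated by 3 observers on an ordinal 1-3 risk-level scale
r = np.array([[1, 1, 2],
              [2, 2, 3],
              [3, 3, 3],
              [1, 2, 1],
              [2, 2, 2]], dtype=float)
print(round(icc_2_1(r), 2))
```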
Table 9. Concurrent validity of risk levels with standardized work task duration, using 3 h 45 min for all ten work tasks. Number of ergonomists (n), proportional agreement (%), Cohen’s kappa (K), linearly weighted kappa averaged over pairs (Klw), intraclass correlation (ICC) and Kendall’s coefficient of concordance (KCC) for the results of the ergonomists’ ratings of the ten different video-recorded work tasks. The ratings were first used to compute the score for each method and then converted into risk levels. To assess the validity, the ergonomists’ ratings were compared to those of a group of three experts.
Concurrent validity
Method | Assessment | n | % | K | Klw | ICC | KCC
ART | Left Arm (3 levels) | 11 | 66 | 0.46 | 0.54 | 0.65 | 0.84
ART | Right Arm (3 levels) | 11 | 65 | 0.41 | 0.48 | 0.60 | 0.78
HARM | Total (3 levels) | 12 | 74 | 0.41 | 0.42 | 0.44 | 0.73
OCRA | Total (5 levels) | 11 | 44 | 0.28 | 0.44 | 0.59 | 0.79
QEC | Total (4 levels) | 12 | 75 | 0.35 | 0.47 | 0.63 | 0.81
QEC | Neck (4 levels) | 12 | 94 | 0.88 | 0.88 | 0.89 | 0.95
QEC | Shoulder (4 levels) | 12 | 67 | 0.43 | 0.48 | 0.58 | 0.81
QEC | Wrist (4 levels) | 12 | 84 | 0.58 | 0.58 | 0.61 | 0.81
QEC | Back (4 levels) | 12 | 68 | 0.34 | 0.31 | 0.29 | 0.66
SI 1 | Highest score (3 levels) | 12 | 83 | 0.34 | 0.35 | 0.36 | 0.69
SWEA | Overall repetition (3 levels) | 12 | 59 | 0.27 | 0.31 | 0.38 | 0.69
SWEA | Overall postures and movements (3 levels) | 12 | 76 | 0.36 | 0.38 | 0.44 | 0.73
1 Although instructed to rate both hands, the ergonomists sometimes did not rate the less active hand; therefore, in each video the hand that yielded the highest score was used for the concurrent validity computations.
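
The KCC columns in Tables 4–9 and A3 refer to Kendall’s coefficient of concordance. The sketch below computes the uncorrected statistic (no tie correction, which slightly understates W when ordinal ratings produce many tied ranks); it is an illustration with an assumed items × raters layout, not the authors’ analysis code.

```python
# Sketch: Kendall's coefficient of concordance (W), without tie correction.
import numpy as np
from scipy.stats import rankdata

def kendalls_w(ratings: np.ndarray) -> float:
    n, m = ratings.shape                                # n items (tasks), m raters
    ranks = np.apply_along_axis(rankdata, 0, ratings)   # rank tasks within each rater
    rank_sums = ranks.sum(axis=1)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Example: 5 tasks rated by 3 observers on a 1-3 ordinal risk-level scale
r = np.array([[1, 2, 1],
              [2, 2, 2],
              [3, 3, 3],
              [1, 1, 2],
              [2, 3, 2]], dtype=float)
print(round(kendalls_w(r), 2))
```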
