A Literature Review of Tools and Methods for Evaluating Physical Activity Programs in the Workplace

Background: The workplace is an important setting for health promotion through physical activity (PA) programs. Establishing PA interventions in the workplace is believed to have a more substantial impact than in many other community settings. However, the effectiveness of PA interventions in occupational settings has been questioned. Physical activity is a complex and variable behavior, and the ability to measure the association between physical activity and chronic diseases depends strongly on the validity of the measurement tools. Objective: The goal of this study is to review the types of measurement, and the reliability and validity of instruments and methods for measuring occupational PA. Method: Relevant peer-reviewed articles were collected from two electronic databases, Medline and PubMed, using an advanced search strategy and eligibility criteria. Results: The search strategy generated 413 articles in total, and the eligibility criteria narrowed these to 12 relevant articles. The BRFSS, the IPAQ-L, the MOSPA-Q, the OSPAQ and the OPAQ have been shown to have good reliability. However, these questionnaires showed poor to moderate criterion validity; thus, objective measures of occupational PA, such as the accelerometer, remain the best option.


INTRODUCTION
Physical activity (PA) is believed to be an important modifiable risk factor for several chronic diseases 1-5 . Studies have shown that a lack of physical activity contributes to several chronic diseases, such as diabetes, heart disease, stroke, and cancer 6-8 . Regular PA is also believed to provide a number of health benefits, including a reduced risk of morbidity and of premature mortality.
Most adults spend a large part of their time at the workplace, largely sedentary and without significant physical activity. Only recently has the workplace been widely recognized as an important setting for health promotion through physical activity programs. Establishing PA interventions in the workplace is believed to have a more substantial impact than in many other community settings 7 . According to the World Health Organization 6 , the workplace offers several advantages: a considerable share of the working population can be reached, and multiple levels of influence on behavior can be targeted. However, the effectiveness of PA interventions in occupational settings has been questioned.
There is limited information on the effects of occupational PA on health, as only a few studies have adequately examined its health outcomes. Available findings are inconsistent: some studies suggest that occupational PA protects against, for example, cardiovascular disease 9-11 , while others observe no relationship or a negative one 12-14 . Some studies have suggested that the health benefits of PA might vary across different domains of PA 11,13,15 . Holtermann, Mortensen 16 have also described contrasting cardiovascular effects of PA carried out in different domains, such as during work and during leisure time.
There are different forms of monitoring and evaluating a PA program, depending on the objectives of the program, and the results of this evaluation should be communicated clearly to management and public health workers. Most intervention and epidemiological studies use questionnaires rather than objective measures to monitor and measure PA, mainly for reasons of practicality and feasibility. However, physical activity is a complex and variable behavior 17 , and the ability to measure the association between physical activity and chronic diseases depends strongly on the validity of the tools. Therefore, a reliable measurement instrument is essential as the basis for drawing conclusions about the impact of a program. This study focuses on the measurement tools used to monitor and evaluate PA intervention programs in the workplace setting. Its goal is to review the types of measurement and the reliability and validity of these instruments and methods, and to identify optimal methods of measuring occupational PA.

Literature Search
Information was collected from two electronic databases: Medline and PubMed. I searched PubMed for relevant peer-reviewed articles using the following advanced search strategy: ( . A similar strategy was applied in Medline: (physical activity.mp. or Motor Activity/) AND (workplace or worksite or occupational space).mp. AND (Reliability or validity).mp. or accuracy/).

Eligibility Criteria
The search strategy generated 413 articles in total (122 from Medline and 291 from PubMed). All articles were sorted by relevance to the keywords and then screened for possible inclusion based on title and abstract. The inclusion criteria were: 1) the article focused on physical activity in a workplace setting, 2) the study was a reliability and/or validity study containing information on measurement properties, 3) the article was published in English, and 4) it was published between 1995 and 2019. Studies that focused on specific populations, such as pregnant women or the elderly, were excluded, as were studies related to a specific disease or symptom. These criteria narrowed the result to 12 relevant articles (Appendix 1&2).
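The inclusion and exclusion criteria above amount to a filter over the screened records. The sketch below is a hypothetical illustration only: the `Record` fields and the example studies are assumptions made for the example, not the data model used in this review, where title and abstract screening was a manual judgment.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical screening fields; not the review's actual data model.
    title: str
    year: int
    language: str
    workplace_pa: bool        # inclusion 1: PA in a workplace setting
    measurement_study: bool   # inclusion 2: reliability and/or validity study
    special_population: bool  # exclusion: e.g. pregnant women, elderly
    disease_specific: bool    # exclusion: specific disease or symptom


def is_eligible(r: Record) -> bool:
    """Return True if a record meets every inclusion criterion and no exclusion."""
    return (
        r.workplace_pa
        and r.measurement_study
        and r.language == "English"      # inclusion 3
        and 1995 <= r.year <= 2019       # inclusion 4
        and not r.special_population
        and not r.disease_specific
    )


def screen(records: list[Record]) -> list[Record]:
    # Keep only the records that pass all criteria.
    return [r for r in records if is_eligible(r)]


if __name__ == "__main__":
    candidates = [
        Record("Study A", 2010, "English", True, True, False, False),
        Record("Study B", 1990, "English", True, True, False, False),  # too old
        Record("Study C", 2015, "English", True, True, True, False),   # special population
    ]
    print([r.title for r in screen(candidates)])  # ['Study A']
```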

RESULTS AND DISCUSSION
Most of the identified studies examined the reliability and validity of tools used to measure behavior during work, such as walking, lifting, and sitting. Other variables, such as spatial configuration, perceived physical-social environmental factors, and travel behavior, were each represented by only one article. In this review, the tools used in studies of PA behavior (sitting, walking, standing, lifting, etc.) are classified as objective measures, subjective measures, and criterion standards. Objective measures such as motion sensors (accelerometers and pedometers) are frequently used in physical activity studies, especially as comparison tools for assessing criterion validity. In their review, Castillo-Retamal and Hinckson 18 reported that accelerometers can capture light-intensity PA that is rarely detected by self-report measures; however, accelerometers have difficulty detecting upper-body movement while the wearer is sitting or standing. Generally, objective measures offer good reliability and validity (ICC = 0.80-0.96 over a 14-day test-retest period; r = 0.92-0.96, p < 0.001). However, these objective measures are too expensive for population-based studies.
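Test-retest reliability in these studies is reported as an intraclass correlation coefficient (ICC). The review does not state which ICC form each study used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in the Shrout and Fleiss notation, computed from a subjects × occasions table; the consistency form ICC(3,1) is returned alongside for comparison. The example scores are invented.

```python
def icc_two_way(data: list[list[float]]) -> tuple[float, float]:
    """Return (ICC(2,1), ICC(3,1)) for an n-subjects x k-occasions table.

    ICC(2,1): two-way random effects, absolute agreement, single measure.
    ICC(3,1): two-way mixed effects, consistency, single measure.
    """
    n = len(data)
    k = len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    icc2 = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    icc3 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc2, icc3


if __name__ == "__main__":
    # Invented scores for three workers measured on two occasions.
    # Every score rises by 1 on the retest: consistency is perfect,
    # but absolute agreement is penalized for the systematic shift.
    icc2, icc3 = icc_two_way([[1, 2], [2, 3], [3, 4]])
    print(round(icc2, 3), round(icc3, 3))  # 0.667 1.0
```

The distinction matters when comparing published values: a test-retest ICC reported for consistency can be noticeably higher than one reported for absolute agreement on the same data.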
Subjective measures are the most common tools used to measure factors associated with physical activity. In the workplace setting, questionnaires are the first preference, followed by motion sensors and indirect calorimetry 18 . Subjective techniques are preferred because they allow a large sample to be assessed in a short period of time. In addition, they are cheaper and have been shown not to disturb work tasks significantly. Kwak, Proper 19 conducted a systematic review of 31 articles that assessed the reliability and/or validity of 30 questionnaires with regard to work index/activity score, energy expenditure, and duration of activity. Four questionnaires stood out for repeatability: the BRFSS, the IPAQ-L, the MOSPA-Q, and the OPAQ (ICC 0.76-0.83). The BRFSS showed good repeatability for the work index; the IPAQ-L and the MOSPA-Q showed satisfactory repeatability for energy expenditure and duration of activity; and the OPAQ showed good repeatability for duration of activity. All of these results are based on a strong level of evidence. All four questionnaires are considered feasible in the workplace setting, as they are short and take less than five minutes to complete. The BRFSS is a single-item measure that is preferable for rapid assessment of occupational PA level in surveillance studies 20 . The MOSPA-Q and the OPAQ offer a more comprehensive assessment of time spent 21,22 ; both are multiple-item measures covering various occupational categories. Like the MOSPA-Q, the IPAQ-L is reliable for measuring both energy expenditure and duration of activity, but it focuses more on moderate- and vigorous-intensity PA and is therefore more suitable for assessing health-enhancing PA. However, in terms of validity, none of the questionnaires showed good validity against the accelerometer.
The TCQ was the only questionnaire that showed moderate objective criterion validity for energy expenditure (r = 0.5), but no adequate study of its reliability was available. The TOQ showed moderate-to-high subjective criterion validity for energy expenditure and duration of activity (r = 0.57-0.92) 19 . Jancey, Tye 25 specifically studied the measurement properties of the OSPAQ and found moderate to strong reliability (ICC 0.66-0.83) and a moderate correlation with the accelerometer. A comparison by Chau, Van Der Ploeg 26 of the OSPAQ and the MOSPA-Q showed that the OSPAQ performed better than the MOSPA-Q in terms of reliability and validity, especially in estimating time spent sitting and standing at work. The OSPAQ was also considered a suitable tool for measuring multiple health behaviors in epidemiological studies, as it takes less time to complete and imposes a lower burden.
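Criterion validity against an accelerometer is typically summarized as a correlation between questionnaire estimates and device-measured values for the same workers. A minimal sketch of Pearson's r follows; the paired values in the example are invented and are not data from any reviewed study, and some studies report a rank correlation (Spearman's ρ) instead.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between self-reported and device-measured values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented example: questionnaire score vs accelerometer score for 4 workers.
    questionnaire = [1, 2, 3, 4]
    accelerometer = [1, 3, 2, 4]
    print(round(pearson_r(questionnaire, accelerometer), 2))  # 0.8
```

Note that a high correlation shows that workers are ranked consistently, not that the questionnaire reproduces the accelerometer's absolute values.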
Criterion standards are a third type of measurement tool, besides objective and subjective measures. They include indirect calorimetry, direct observation, and doubly labelled water (DLW), and offer excellent reliability and validity for energy expenditure. Indirect calorimetry estimates energy expenditure from oxygen consumption (VO2) and carbon dioxide production (VCO2) 18 . Its benefit is that it can detect small changes in the variable studied. However, it is not suitable for population-based studies, since it can only be used in small samples over short periods of time 18 . Its other shortcomings are its cost and the difficulty of applying it over entire workdays. Nevertheless, Pernold, Tornqvist 27 suggested that indirect calorimetry can be employed as a validation tool in epidemiological studies. An alternative tool for measuring energy consumption was developed by Bernmark, Forsman 28 , who estimated oxygen consumption from heart rate. However, this method showed poor precision, so this alternative criterion standard is not reliable for PA measurement.
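The calculation behind indirect calorimetry can be illustrated with the abbreviated Weir equation, EE (kcal/min) = 3.941 × VO2 + 1.106 × VCO2 with gas volumes in L/min, which omits the urinary-nitrogen term. The reviewed studies are not described as using this particular equation, so treat it as an assumption. The second function sketches a heart-rate approach under the assumption of an individual linear HR-VO2 calibration; it is a generic illustration, not necessarily the method of Bernmark, Forsman 28 . The calibration data and the respiratory exchange ratio (RER) of 0.85 in the example are invented.

```python
def weir_kcal_per_min(vo2_l_min: float, vco2_l_min: float) -> float:
    """Abbreviated Weir equation (urinary nitrogen term omitted)."""
    return 3.941 * vo2_l_min + 1.106 * vco2_l_min


def fit_hr_vo2(hr: list[float], vo2: list[float]) -> tuple[float, float]:
    """Least-squares line VO2 = slope * HR + intercept from calibration data.

    Assumes a linear HR-VO2 relation within the calibration range
    (a hypothetical individual calibration, e.g. from a step test).
    """
    n = len(hr)
    mh = sum(hr) / n
    mv = sum(vo2) / n
    slope = sum((h - mh) * (v - mv) for h, v in zip(hr, vo2)) / sum(
        (h - mh) ** 2 for h in hr
    )
    return slope, mv - slope * mh


if __name__ == "__main__":
    # Hypothetical calibration: VO2 (L/min) measured at three heart rates.
    slope, intercept = fit_hr_vo2([80, 100, 120], [0.5, 1.0, 1.5])
    vo2_at_110 = slope * 110 + intercept  # predicted VO2 at 110 beats/min
    # Assumed RER of 0.85 to derive VCO2, since HR alone does not give it.
    print(round(weir_kcal_per_min(vo2_at_110, 0.85 * vo2_at_110), 2))
```

The heart-rate route adds two sources of error on top of the equation itself (the individual calibration line and the assumed RER), which is consistent with the poor precision reported for this kind of alternative.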
Non-behavioral variables, such as the spatial configuration of the workplace and perceived physical-environmental factors, are believed to provide insight into occupant movement. To measure this type of variable, Duncan, Rashid 29 developed a self-report instrument, the Office Environmental and Sitting Scale (OFFESS), which measures the influence of spatial configuration on sitting behavior. The OFFESS scales are reported to have good internal consistency (α = 0.70-0.86) and good test-retest reliability (ICC = 0.70-0.87 over a 3-day test-retest period). The questionnaire is suitable for epidemiological studies because its overall length is relatively short. The work-site Supportive Environment for Active Living Survey (SEALS) was designed by Blunt and Hallam 30 to measure the influence of perceived physical-social environmental factors at the workplace on physical activity. This survey has shown good internal consistency and reliability (α = 0.79-0.86; Pearson correlation = 0.73-0.96, P < 0.005). Finally, Petrunoff, Xu 31 measured travel mode and travel time to predict moderate-to-vigorous physical activity during travel to the workplace, using an online survey (www.activetravel.net.au/professionals/tools). This survey has moderate criterion validity (ρ = 0.75; Kw = 0.62, P < 0.0001) and good reliability (ρ = 0.83; Kw = 0.82; P < 0.0001).
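Two statistics recur in this paragraph: Cronbach's α for internal consistency and weighted kappa (Kw) for agreement on ordered categories. A minimal sketch of both follows. The weighting scheme used by Petrunoff, Xu 31 is not stated here, so linear weights are an assumption (quadratic weights are also common), and the example data are invented.

```python
def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are scale items."""
    k = len(items[0])

    def var(xs: list[float]) -> float:  # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in items]) for j in range(k)]
    total_var = var([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def weighted_kappa(a: list[int], b: list[int], n_cat: int) -> float:
    """Linearly weighted Cohen's kappa for two ratings on categories 0..n_cat-1."""
    n = len(a)
    obs = [[0.0] * n_cat for _ in range(n_cat)]  # observed joint proportions
    for i, j in zip(a, b):
        obs[i][j] += 1 / n
    row = [sum(obs[i]) for i in range(n_cat)]
    col = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]
    num = den = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            w = abs(i - j) / (n_cat - 1)  # disagreement weight: 0 on the diagonal
            num += w * obs[i][j]           # weighted observed disagreement
            den += w * row[i] * col[j]     # weighted disagreement expected by chance
    return 1 - num / den


if __name__ == "__main__":
    # Invented 3-respondent, 2-item scale.
    print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
    # Invented travel-mode categories (0, 1, 2) from survey vs a criterion.
    print(weighted_kappa([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], 3))  # 1.0
```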

CONCLUSION
Work postures are the most frequently measured variables among the predictors of PA level in occupational PA studies. These variables are commonly measured by subjective measures such as questionnaires, which are considered the most efficient and feasible instruments for estimating various levels of physical activity in the workplace setting. They offer practicality, and some of them have good measurement properties. The BRFSS, the IPAQ-L, the MOSPA-Q, the OSPAQ and the OPAQ have been shown to have good reliability. However, these questionnaires showed poor to moderate criterion validity; thus, objective measures of occupational PA, such as the accelerometer, remain the best option.

Limitations
There are three main limitations to this study. The first is possible selection bias: because the search strategy focused only on reliability and/or validity studies, it may not have identified all studies that measure physical activity at the workplace. Only 12 articles were included, 10 of which were single studies, each focusing on a different measurement tool. Because so few articles met the criteria, the findings of this study may not be generalizable across workplaces, occupations, and PA programs. In addition, differences in the object measured, the type of instrument, and the statistical methods used make it difficult to compare the findings of these 12 studies.
The second limitation of this review is the scoring method used to classify the reliability and validity of the instruments. Others might prefer different cut-off points, but this review used the scoring of Castillo-Retamal and Hinckson 18 to assess reliability and validity data (good: > 0.75, moderate: 0.50-0.75, poor: < 0.50). A third potential limitation lies in the judgment of the feasibility of implementing the instruments for occupational PA evaluation, especially in large epidemiological studies. Only 5 of the 12 included studies mentioned the feasibility of applying the instruments in large epidemiological studies; thus, the feasibility assessment is based mostly on these 5 articles and on my own judgment.
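The cut-offs adopted in this review can be applied mechanically to any reported coefficient. A minimal sketch follows; how values falling exactly on 0.50 or 0.75 are assigned is an assumption here (0.75 is treated as moderate because "good" is defined as > 0.75, and 0.50 as moderate because "poor" is defined as < 0.50).

```python
def classify_coefficient(value: float) -> str:
    """Rate an ICC or correlation using the cut-offs adopted in this review.

    good: > 0.75, moderate: 0.50-0.75, poor: < 0.50
    """
    if value > 0.75:
        return "good"
    if value >= 0.5:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    # Coefficients quoted earlier in the Results and Discussion.
    for label, value in [
        ("TCQ energy expenditure r", 0.50),
        ("OSPAQ ICC lower bound", 0.66),
        ("OSPAQ ICC upper bound", 0.83),
    ]:
        print(label, classify_coefficient(value))
```

Shifting either boundary (for example, treating 0.70 rather than 0.75 as the threshold for "good") would change some ratings, which is why the choice of scoring scheme is listed as a limitation.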