Objectives We aimed to develop and evaluate statistical models predicting objectively measured occupational time spent sedentary or in physical activity from self-reported information available in large epidemiological studies and surveys.

Methods Two-hundred-and-fourteen blue-collar workers responded to a questionnaire containing information about personal and work-related variables that are available in most large epidemiological studies and surveys. Workers also wore accelerometers for 1–4 days to measure time spent sedentary and in physical activity, the latter defined as non-sedentary time. Least-squares linear regression models were developed, predicting objectively measured exposures from selected predictors in the questionnaire.

Results A full prediction model based on age, gender, body mass index, job group, self-reported occupational physical activity (OPA), and self-reported occupational sedentary time (OST) explained 63% (R2 adjusted) of the variance of both objectively measured time spent sedentary and in physical activity since these two exposures were complementary. Single-predictor models based only on self-reported information about either OPA or OST explained 21% and 38%, respectively, of the variance of the objectively measured exposures. Internal validation using bootstrapping suggested that the full and single-predictor models would show almost the same performance in new datasets as in that used for modelling.

Conclusions Both full and single-predictor models based on self-reported information typically available in most large epidemiological studies and surveys were able to predict objectively measured occupational time spent sedentary or in physical activity, with explained variances ranging from 21–63%.

Even in modern information societies, a considerable proportion of the working population is exposed to physical activity at work (1, 2). In a national survey in 2012, 39% of the Danish workforce reported having a job in which ≥75% of the time required some physical activity, such as standing and walking (3). More self-reported time spent in physical activity during work has been associated with increased risk of long-term sickness absence (2, 4), premature drop-out from the workforce (5), and cardiovascular and all-cause mortality (1, 6, 7). On the other hand, other workers spend a large proportion of their time at work being sedentary (8–10), which has been suggested to be associated with increased all-cause mortality (11), musculoskeletal pain (12), and obesity (13).

Occupational time spent sedentary and in physical activity has so far mainly been determined using questionnaires, which are feasible to administer in large populations, such as in national surveys (14–16). However, questionnaires have been criticized for giving biased and imprecise results compared to objective measurements (17). Systematic and random measurement errors may lead to misleading results, both when documenting time spent sedentary and in physical activity and when determining associations with relevant outcomes such as health and well-being. As an alternative, objective measurements using accelerometers offer accurate information on time spent sedentary and in physical activity (18, 19). Thus, accelerometer recordings have been used as the gold standard for validating questionnaire-based data on time spent sedentary and in physical activity (20, 21). However, accelerometers demand more resources than questionnaires (22), which precludes their use in most large-scale studies.

An attractive compromise would be to predict objectively measured occupational time spent sedentary and in physical activity from self-reported information that would generally be available in most large epidemiological studies and surveys. Explicit prediction models have been proposed before to predict time spent sedentary and in physical activity (23–25), but these studies did not develop models for exposures at work, which may be associated with self-reported predictors differently than leisure-time exposures are. A few previous studies have, indeed, developed prediction models for time spent sedentary and in physical activity specifically at work (26–30). However, they have mainly focused on predicting answers to some self-reported variables from another type of self-reported information. This approach increases the risk of correlated error or common-method bias (31).

Another limitation of previous prediction models addressing time spent sedentary and in physical activity at work is that
the predictors included in the models, such as cognitive (32) and psychosocial variables – including social norms, self-efficacy and advantages of sitting less (26) – are not normally available in large epidemiological studies and surveys. Developing models based on predictors that typically
appear in large epidemiological studies and surveys would increase the utility of the models in the context of, for instance,
public health surveys and cohort studies of occupational health.

As a general endeavor in exposure modelling, examination of simple models based on few predictors is of interest, since parsimonious
models may be easier to use and more stable than models based on many predictors. In the present context, this would call
for assessments of the performance of models based only on selected questionnaire variables that can be expected to be particularly
predictive of sedentary behavior and physical activity at work. Thus, this study aimed at developing and evaluating statistical
models predicting objectively measured time spent sedentary and in physical activity at work from self-reported variables
which would generally be available in large epidemiological studies and surveys. A secondary aim was to examine the extent
to which single-predictor models based on questions regarding occupational sedentary time (OST) or occupational physical activity
(OPA) can predict the result of objective measurements of corresponding variables.

Methods

Study design and population

Recruitment flow of the study population is shown in Appendix A (www.sjweh.fi/data_repository.php). Workers were recruited from seven blue-collar occupations (manufacturing, assembling, construction, cleaning, garbage collection,
mobile plant operation, and health services) in the cross-sectional New Method for Objective Measurements of Physical Activity
in Daily Living (NOMAD) study (33) to obtain a wide range of exposures while maintaining homogeneity among workers with respect to socioeconomic status.

The Ethics Committee for the Capital Region in Denmark approved the study (journal number H-2-2011-047), which was conducted
in accordance with the Helsinki declaration.

Procedure

At each workplace, data were collected continuously during a four-day period, with research staff being present at the workplace
on days one and four (33, 34). On day one, workers interested in participating in the study underwent anthropometric measurements and completed questionnaires
addressing variables related to demographics, health, lifestyle, work and psychosocial factors. Also on day one, objective
measurements of time spent sedentary and in physical activity at work were initiated by equipping the workers with two accelerometers
(Actigraph GT3X, ActiGraph LLC, Florida, USA) and a diary for noting working hours. On day four, workers returned the measurement
equipment and the diary. Approximately 80% of the workers declared that the objective measurements were collected during typical
working days.

Occupational time spent sedentary and in physical activity

Sedentary behavior and physical activity were analyzed using the custom-made Acti4 software according to established procedures
(34, 35). The software identifies a number of different activity types, as well as the gross body posture. For the present study,
we merged periods of sitting and lying into “sedentary time” and collapsed periods with any type of physical activity into
one category, ie, “physical activity”. Thus, physical activity is defined to occur whenever the worker is not sitting or lying.
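The merging step can be illustrated with a minimal Python sketch. It assumes that the activity classification has already produced one label per fixed-length epoch; the label names used here ("sit", "lie", "stand", etc.) are hypothetical and are not the actual Acti4 output format.

```python
from collections import Counter

# Hypothetical label names; the real Acti4 classification uses its own categories.
SEDENTARY_LABELS = {"sit", "lie"}


def sedentary_and_active_pct(epoch_labels):
    """Return (% sedentary, % physical activity) for one working period.

    Each label is the activity type assigned to one fixed-length epoch.
    Sitting and lying are merged into sedentary time; every other activity
    type (standing, walking, stair climbing, ...) counts as physical activity.
    """
    if not epoch_labels:
        raise ValueError("working period contains no epochs")
    counts = Counter(epoch_labels)
    sedentary = sum(counts[label] for label in SEDENTARY_LABELS)
    pct_sedentary = 100.0 * sedentary / len(epoch_labels)
    return pct_sedentary, 100.0 - pct_sedentary


labels = ["sit", "sit", "stand", "walk", "lie", "stairs", "sit", "stand"]
print(sedentary_and_active_pct(labels))
```

By construction the two percentages always sum to 100, which is why the models for the two exposures are complementary.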

Time spent sedentary and in physical activity was averaged for each specific worker across all working periods with valid
measurement data. A working period was considered valid if it comprised ≥4 hours of work, and corresponded to ≥75% of that
individual’s self-reported average working time per day. Workers with at least one valid work day were included in further
analyses.
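A minimal sketch of the validity rule and the per-worker averaging is given below, assuming each working period is summarized by its duration and its percentage of sedentary time. The class and function names are hypothetical, and averaging the period-level percentages without weighting by duration is an assumption, since the text does not state how periods were weighted.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class WorkPeriod:
    hours_worked: float   # duration of one measured working period
    pct_sedentary: float  # percentage of that period spent sitting or lying


def is_valid(period: WorkPeriod, usual_daily_hours: float) -> bool:
    """>= 4 h of work AND >= 75% of the self-reported average daily working time."""
    return (period.hours_worked >= 4.0
            and period.hours_worked >= 0.75 * usual_daily_hours)


def worker_mean_sedentary(periods: List[WorkPeriod],
                          usual_daily_hours: float) -> Optional[float]:
    """Unweighted mean % sedentary over valid periods; None if no period is valid
    (such a worker would be excluded from further analyses)."""
    valid = [p.pct_sedentary for p in periods if is_valid(p, usual_daily_hours)]
    return mean(valid) if valid else None


periods = [WorkPeriod(7.5, 40.0), WorkPeriod(3.5, 80.0), WorkPeriod(6.0, 50.0)]
print(worker_mean_sedentary(periods, usual_daily_hours=7.5))
```

For example, with a self-reported working day of 7.5 hours the effective threshold is 5.625 hours (75% of 7.5, which exceeds 4 hours), so the 3.5-hour period is discarded and the worker's mean is taken over the two remaining periods.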

Predictors

The predictors used for modelling in this study were selected a priori from the questionnaire based on (i) whether they would
likely predict time spent sedentary or in physical activity according to previous studies (26, 29, 30, 32, 36–38), (ii) whether they are commonly available in large epidemiological studies and surveys, and (iii) whether they showed a
large relative dispersion between workers in our material. Based on these criteria, we included self-reported
information on age, gender, body mass index (BMI), job type, OST, and OPA. These predictors are described in detail in Appendix
B (www.sjweh.fi/data_repository.php). Selecting predictors a priori without knowing their relationship to the outcomes is a recommended approach in modern statistical
literature as it reduces the risk of capitalizing on chance and arriving at spurious relationships between predictors and
outcomes (39–41).

Statistical analyses

All predictors were treated as continuous variables except gender, job type, and OPA, which were treated as categorical
variables. Statistical analyses were performed using the R package “rms” (42).

The six predictors (cf. Appendix B) were modeled together against each objectively measured exposure using least-squares linear
regression to develop a full prediction model. The available degrees of freedom for statistical analysis in this
study were sufficient to allow inclusion of all six variables.
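To make the degrees-of-freedom argument concrete: counting one term per continuous predictor and one dummy term per non-reference level of each categorical predictor, and assuming OST enters as a single continuous term, the full model estimates 1 (age) + 1 (BMI) + 1 (OST) + 1 (gender) + 6 (seven job groups) + 3 (four OPA categories) = 13 regression terms plus an intercept, or roughly 16 workers per term for N=214. The Python sketch below builds such a design matrix with treatment (dummy) coding and fits it by least squares with NumPy. It is not the authors' rms code; the level names, the choice of mobile plant operation and OPA category 1 as reference levels, and the simulated data are illustrative assumptions.

```python
import numpy as np

# Hypothetical level labels; the first entry of each list is the reference level.
JOB_LEVELS = ["mobile plant operation", "manufacturing", "assembling", "construction",
              "cleaning", "garbage collection", "health services"]
OPA_LEVELS = [1, 2, 3, 4]  # category 1 = mostly sedentary work (assumed reference)


def dummy_code(values, levels):
    """Treatment coding: one 0/1 column per level except levels[0]."""
    values = np.asarray(values)
    return np.column_stack([(values == lvl).astype(float) for lvl in levels[1:]])


def design_matrix(age, bmi, ost, female, job, opa):
    """Intercept, continuous terms (age, BMI, OST), gender, then job and OPA dummies."""
    base = np.column_stack([np.ones(len(age)), age, bmi, ost, female]).astype(float)
    return np.hstack([base, dummy_code(job, JOB_LEVELS), dummy_code(opa, OPA_LEVELS)])


def fit_ols(X, y):
    """Least-squares coefficients minimising ||y - X @ beta||^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Simulated example (not study data): 214 workers, outcome built from known coefficients.
rng = np.random.default_rng(0)
n = 214
X = design_matrix(
    age=rng.uniform(20, 65, n),
    bmi=rng.uniform(18, 40, n),
    ost=rng.integers(1, 6, n),
    female=rng.integers(0, 2, n),
    job=rng.choice(JOB_LEVELS, n),
    opa=rng.choice(OPA_LEVELS, n),
)
true_beta = rng.normal(size=X.shape[1])
y = X @ true_beta
beta = fit_ols(X, y)
print(X.shape)  # (214, 14): 1 intercept + 13 regression terms
```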

The sensitivity of the full model to the selection of predictors was analyzed by refitting it with one predictor removed at a time
and observing the resulting change in explained variance.
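The drop-one sensitivity analysis can be sketched as follows in Python with NumPy (function names are hypothetical). Each predictor is represented by the columns it occupies in the design matrix, so a categorical predictor such as job group or OPA is removed en bloc, as noted for table 3; the model is then refitted and its adjusted R2 recorded.

```python
import numpy as np


def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit; X must include the intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape  # k = intercept + p regression terms
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


def drop_one(X, y, groups):
    """Adjusted R^2 after removing each predictor (column group) in turn.

    `groups` maps a predictor name to the column indices it occupies in X,
    so all dummy columns of a categorical predictor are dropped together.
    """
    results = {}
    for name, cols in groups.items():
        drop = set(cols)
        keep = [j for j in range(X.shape[1]) if j not in drop]
        results[name] = adjusted_r2(X[:, keep], y)
    return results


# Simulated example: x1 is a strong predictor, x2 a weak one.
rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3.0 * x1 + 0.5 * x2 + 0.5 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
full = adjusted_r2(X, y)
reduced = drop_one(X, y, {"x1": [1], "x2": [2]})
for name, r2a in reduced.items():
    print(f"without {name}: R2adj {r2a:.3f} (full model {full:.3f})")
```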

In addition, single-predictor models were developed based only on those predictors that directly addressed sedentary behavior
and/or physical activity. Thus, two models were developed using least-squares linear regression, one based on self-reported
information on OST (variable #5 in Appendix B) and another based on self-reported OPA (variable #6 in Appendix B). The performance
of the resulting full and single-predictor models was evaluated by the R2 adjusted and the mean squared error (MSE) of estimation. The residuals of the models were examined for normal distribution
and homoscedasticity.
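For reference, the conventional definitions of these performance measures are given below, with n observations, p regression terms excluding the intercept, observed values y_i and fitted values ŷ_i. We assume here that MSE denotes the mean of the squared residuals; some software divides by n − p − 1 instead, so this is an assumption rather than a statement of how the values in table 3 were computed.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1},
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

Because the penalty in R2adj grows with p, it matters more for the full model than for the single-predictor models (p = 1 for OST treated as continuous, p = 3 for the four-category OPA question).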

The expected ability of the full and single-predictor models in predicting objectively measured exposures in new datasets
was estimated using a bootstrap resampling procedure (43). Five hundred virtual datasets, each of the same size as the source dataset, were drawn with replacement from the source dataset. For each virtual dataset, we fitted a model including the same predictors as in the original model. This refitted model
was then applied to the source dataset, and the model fit parameters were compared to the parameters obtained in the original
fit. The differences, ie, the “optimism” of the original model, were averaged across all bootstrap repeats and used as an
overall measure of optimism, reflecting the extent to which the original model capitalized on chance (44).
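A minimal Python/NumPy sketch of an optimism bootstrap for R2 is shown below. It uses the commonly cited Efron-type formulation, in which each bootstrap model's apparent performance in its own resample is compared with its performance on the original data; this may differ in detail from the procedure described above and from the implementation in rms (42). The function names, defaults, and simulated data are hypothetical.

```python
import numpy as np


def fit(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def r2(X, y, beta):
    """R^2 of predictions X @ beta against observed y."""
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def bootstrap_optimism(X, y, n_boot=500, seed=0):
    """Optimism-corrected R^2 of an OLS model via bootstrap resampling.

    Each resample has the same size as the data and is drawn with replacement.
    Optimism per resample = R^2 of the refitted model in its own resample minus
    R^2 of that same model applied to the original data.
    Returns (apparent R^2, mean optimism, optimism-corrected R^2).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = r2(X, y, fit(X, y))
    optimism = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        beta_b = fit(X[idx], y[idx])
        optimism[b] = r2(X[idx], y[idx], beta_b) - r2(X, y, beta_b)
    mean_opt = float(optimism.mean())
    return apparent, mean_opt, apparent - mean_opt


# Simulated example: one real predictor plus nine noise predictors, n = 60.
rng = np.random.default_rng(2)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 10))])
y = X[:, 1] + rng.normal(size=n)
apparent, optimism, corrected = bootstrap_optimism(X, y, n_boot=200)
print(f"apparent R2 {apparent:.3f}, optimism {optimism:.3f}, corrected {corrected:.3f}")
```

With many weak or irrelevant predictors relative to the sample size, the optimism is large; with few strong predictors it shrinks towards zero, which is consistent with the very small optimism reported below for the single-predictor models.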

Results

Descriptives of the participating workers are shown in table 1. Of the 214 workers, the largest group was engaged in manufacturing (27%) and the smallest in mobile plant operation (5%).

Table 1

Descriptive statistics of the predictors considered for inclusion in the models estimating objectively measured time spent
sedentary and in physical activity at work among blue-collar workers (N=214). [OPA=occupational physical activity; OST=occupational
sedentary time; Min=minimum; Max=maximum; SD=standard deviation; BMI=body mass index]

Table 3 shows the resulting coefficients of both the full and the single-predictor models estimating objectively measured sedentary
time at work. Since time spent sedentary and in physical activity as defined in the present study are complementary variables
(ie, they add up to 100% of the working time) we focus on presenting, in detail, only the results of modelling sedentary time
at work. Results pertaining to the prediction model for time spent in physical activity are presented in Appendix C (www.sjweh.fi/data_repository.php). The full model explained 63% of the variance. Older age and higher BMI were found to be significant predictors of less
objectively measured sedentary time at work. Additionally, workers who reported more OPA (answer categories 2–4 of the single
four-graded question on OPA, variable #6, cf. Appendix B) were exposed to less objectively measured sedentary time at work
compared to those reporting that their work was mostly sedentary and did not require strenuous physical activity (category
1 of the OPA question). Assemblers, garbage collectors and manufacturing workers had markedly less sedentary time at work
than mobile plant operators. When predictors were removed from the model one at a time, explained variance was reduced by between
1% (gender) and 18% (job group; cf. table 3).

Table 3

Models predicting objectively measured sedentary time at work on the basis of answers to questions normally available in large
epidemiological studies and surveys. [B=regression coefficient; 95% CI=95% confidence interval; R2adj=coefficient of determination (explained variance) adjusted for the number of terms in the model; OPA=occupational physical
activity; OST=occupational sedentary time; MSE=mean squared error].

a R2adj=explained variance in a model comprising all predictor variables but the one represented by the corresponding row. Note that
OPA categories and job groups were removed en bloc.

b Mostly work while standing or walking but not requiring strenuous physical activity.

c Work while standing or walking with some lifting and carrying.

d Heavy or fast moving work that is physically strenuous.

The developed full model (table 3) can be used to predict sedentary time at work for any worker characterized by some specific combination of the predictor
variables in the model. For example, a 53-year-old female assembler, with a BMI of 29.1 kg/m2, who has reported “mostly sedentary work that does not require strenuous physical activity” (response category 1 of the OPA
question), and stated her sedentary time to be “almost all the time”, is predicted by the model to be sedentary 67.8% of her
time during working hours according to objective measurement.
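Applying the model amounts to evaluating its linear predictor. The Python sketch below assumes that categorical predictors enter as 0/1 dummy terms, so a worker in a reference category (here OPA category 1, assumed to be the reference) contributes nothing for that predictor. The coefficient values are placeholders invented for illustration, NOT the estimates in table 3, and the numeric OST code is likewise hypothetical; the printed value therefore has no substantive meaning, and the table 3 coefficients and Appendix B codings would be needed to reproduce the 67.8% prediction.

```python
# Placeholder coefficients for illustration only; NOT the table 3 estimates.
PLACEHOLDER_COEFS = {
    "intercept": 50.0,
    "age": -0.5,            # per year
    "bmi": -1.0,            # per kg/m2
    "ost": 5.0,             # per step of a hypothetical numeric OST code
    "female": 1.0,
    "job_assembly": -10.0,  # dummy term vs. the reference job group
    "opa_2": -8.0,          # dummy terms vs. OPA category 1 (assumed reference)
    "opa_3": -12.0,
    "opa_4": -15.0,
}


def predict_sedentary_pct(coefs, worker):
    """Linear predictor: intercept + sum(B * value) over the model terms.

    Terms missing from `worker` are treated as 0, which is how a worker in a
    reference category (eg, OPA category 1) is represented.
    """
    pred = coefs["intercept"]
    for term, b in coefs.items():
        if term != "intercept":
            pred += b * worker.get(term, 0.0)
    return pred


# Hypothetical worker profile loosely mirroring the example in the text.
worker = {"age": 53, "bmi": 29.1, "female": 1, "job_assembly": 1, "ost": 5}
print(round(predict_sedentary_pct(PLACEHOLDER_COEFS, worker), 1))
```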

Figure 1

Objectively measured versus predicted occupational time spent sedentary (A; illustrating the full model in table 3) and in physical activity (B; illustrating the complementary full model in Appendix C). The line of identity is included in both diagrams.

The single-predictor model predicting objectively measured sedentary time at work only from self-reported OPA showed an explained
variance (R2 adjusted) of 21% (MSE=260.0; table 3). The other single-predictor model, based on the worker’s self-reported OST, explained 38% of the variance of objectively measured
sedentary time at work (MSE=212.4; table 3).

Internal validation using bootstrapping

The bootstrap validation of the full model revealed an optimism of 5% in explained variance, R2. When models developed from
the virtual bootstrap datasets were used on the source data, MSE was 136.1, as compared to 117.0 in the original fit (cf.
table 3). The bootstrap validation of the single-predictor models based on self-reported answers on OPA or OST revealed an optimism
of 2% and 0%, respectively, in explained variance when predicting objectively measured sedentary time at work. The corresponding
MSE values were 268.5 (original fit 260.0) and 215.2 (original fit 212.4).
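Because MSE is expressed in squared units, its square root (RMSE) is easier to interpret. Assuming that sedentary time is expressed as a percentage of working time, as in the worked prediction example above, the reported MSE values translate into typical prediction errors in percentage points. The short Python sketch below only re-expresses numbers reported in this section.

```python
import math

# MSE values reported in this section: (original fit, bootstrap-validated).
REPORTED_MSE = {
    "full model": (117.0, 136.1),
    "OPA only": (260.0, 268.5),
    "OST only": (212.4, 215.2),
}


def rmse(mse: float) -> float:
    """Root mean squared error, in the units of the outcome."""
    return math.sqrt(mse)


for model, (original, validated) in REPORTED_MSE.items():
    print(f"{model}: RMSE {rmse(original):.1f} -> {rmse(validated):.1f} percentage points")
```

For the full model this gives roughly 10.8 percentage points in the original fit and 11.7 after bootstrap validation, compared with roughly 16.1 to 16.4 for the OPA-only model and 14.6 to 14.7 for the OST-only model.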

Discussion

This study aimed at developing and evaluating prediction models for estimating time spent sedentary or in physical activity
during working hours by self-reported variables that would normally be available in large epidemiological studies and surveys.
Among blue-collar workers, a full model explained 63% (adjusted R2) of the variance of objectively measured time spent sedentary, and since we defined physical activity to occur whenever the
worker was not sedentary, our model for predicting physical activity had the same performance. Bootstrap validation suggested
that this performance was somewhat optimistic; the expected adjusted R2 when using the model in new datasets would be 57%.
Single-predictor models based only on self-reported OPA or OST explained 21% and 38%, respectively, of the variance in objectively
measured exposures. The single-predictor models were very stable according to the bootstrap validation.

This study is novel in predicting objectively measured time spent sedentary or in physical activity at work from self-reported
information that would normally be available in large epidemiological studies and surveys not specifically designed to predict
these exposures. The performance of the full model based on age, gender, BMI, job group, and self-reported OST and OPA was similar
to the best performances of previous questionnaires on occupational sedentary and physical activity (45–47). The performance of our model was even better than that of previously developed models using a customized set of variables to predict
self-reported OST or OPA (26, 29, 30). Also, these previous models produce estimates of self-reported OST or OPA, and thus do not adjust for the bias present
in these self-reports, relative to objectively measured data. When used in investigations on the effects of sedentary behavior
or physical activity on different health outcomes, these models may therefore lead to biased associations and misleading estimates,
eg, of the health effects of being sedentary for a particular proportion of the working day. In producing estimates of objectively
measured exposure our models avoid this fallacy.

In our full model, age was observed to be negatively associated with sedentary time at work, which corroborates many previous
studies (26, 30, 32). BMI also tended to be negatively associated with time spent sedentary, which contradicts previous studies (26, 29). However, these studies were conducted in the general working population, and their models predicted self-reported OST, not objectively measured time as in our study. More studies are needed to verify the observed negative association of BMI with objectively measured sedentary time at
work, and to disclose whether this association is specific to blue-collar workers.

The predictor “job group” contributed substantially to explaining variance in objective exposure, as shown by the considerable
decrease in R2 when removing this particular predictor from the full model (table 3). Job group has previously been shown to be an important predictor of sedentary behavior and physical activity (26). However, this previous study categorized participants mainly as white- or blue-collar workers and did not use a detailed
occupational classification. Thus, our finding that even job group within the segment of blue-collar workers appears to be
an important predictor of sedentary time and physical activity is novel, and encourages further research into the performance
of this predictor even for other blue-collar jobs.

As expected, workers reporting more OST were also, on average, more sedentary according to objective measurements, as predicted
by our full model. Similarly, workers reporting more OPA (categories 2–4 of the OPA question) generally had less objectively
measured sedentary time at work, compared to those reporting that their work was mostly sedentary and did not require strenuous
physical activity (category 1 of the OPA question). These results indicate that self-reported OST and OPA do have some potential
to predict objective exposures. We did, indeed, find that a fair prediction of objectively measured sedentary time at work
could be obtained even using single-predictor models based on a single question about either OST or OPA (cf. table 3). These results point to a similar or better predictive ability of these self-reported data than found in previous studies
addressing sedentary behavior and physical activity (48, 49). Still, answers to these questions cannot explain the major part of the variance of the corresponding objective measures.
For the single item measuring OST, one explanation may be that response categories are defined using ambiguous terms such
as “almost” and “rarely/very little”, which workers may find difficult to interpret. Answers to the OPA question could only
explain 21% of the variance in objectively measured exposures. One reason may be that the four response categories of the
OPA question do not fully cover the range of physical activities occurring during work. For example, some workers may have
a job comprising considerable sedentary time, but also occasional periods of high physical load; mixed exposures are not clearly
captured by any specific response alternative to the OPA question. Notably, this ambiguity may be present even though we slightly
modified the answer categories for the OPA question compared to its original form (50) to improve clarity. For example, response category 1 in the original version (“predominantly sitting”) uses the
term “sitting” without further specification. This may lead to confusion among workers responding to this question, and the
modified OPA answer attempts to be more specific in saying, “Mostly sedentary work that does not require strenuous physical
activity”. In response category 4, the term “heavy manual work” used for the original OPA question was replaced with “heavy
or fast moving work that is physically strenuous”, which we believe is more explanatory. Due to these slight discrepancies,
our prediction models may perform differently if they are applied to answers on the original OPA question.

For a prediction model to be useful, it should not only perform well for the dataset on which it was developed, but also for
new datasets. We could not perform a genuine external validation of our prediction models since a new similar dataset was
not available. In the absence of new data for testing the models’ performance, we used internal bootstrap validation, an alternative
technique that has been recommended in the statistical literature on modelling (40, 44, 51). The bootstrap validation suggested a 5% optimism in variance explained by the full models predicting time spent sedentary
and in physical activity at work. Thus, a considerable proportion of exposure variance could be explained by the full models
even after adjusting for optimism. We therefore consider our models to be useful in other datasets, even when taking into
consideration that bootstrap validation may result in a too-positive impression of model performance since it inherently reflects
the structure of the source data. Also, as our objective measurements were collected for a limited number of days, which leads
to uncertainty in the resulting “true” mean exposure estimates, we would expect our models to perform even better in predicting
mean exposures for longer periods of time, for which sampling variance will be less pronounced.

Our definition and operationalization of sedentary time is consistent with the majority of previous research on sedentary
behavior (52, 53). However, defining physical activity as being strictly complementary to sedentary time (ie, equivalent to “non-sedentary
time”) deviates from some previous studies (54, 55). This reflects an inconsistency in the literature regarding how to define physical activity. For example, Smith et al (55) defined physical activity on the basis of a step count, while Buman et al (54) used thresholds based on accelerometer counts. Due to this discrepancy in the operational definition of physical activity,
we emphasize that our models are valid only when adopting equivalent definitions of physical activity.

Practical implication of the findings and future research

Our full prediction models will be particularly useful in future studies in which data can be collected on all predictors.
The models may also, with some caution, be used to obtain post-hoc estimates of objectively measured time spent sedentary
and in physical activity at work in existing studies if data on the predictor variables included in our models are available.
Danish national surveys of working life regularly collect self-reported information on all predictor variables used in our
model. In the Danish national survey on work and health conducted in 2010 (56), data were collected on age, BMI, gender, OST, OPA, and job group using similar questions as in the present study. Thus,
we believe that our full models have the potential to be used in a retrospective reanalysis of data on exposures and occupational
risks, at least in Danish national surveys. We emphasize that the prediction models developed in the present study are specific
to the source population of blue-collar workers, and we recommend future studies to test our models in other occupations and,
if necessary, develop adjusted models that fit those occupations better. We also encourage studies testing our models for
use in the general population.

Additional predictors that we did not include in our models may have the ability to predict sedentary time and physical activity,
for instance psychosocial variables (26). Thus, we encourage investigations into the contents of other large-scale survey data materials to disclose whether they
do, indeed, include the predictors used in our models (or subsets thereof) and/or additional potential predictors, for instance
psychosocial variables. To this end, variables that may strongly depend on the character of the job, psychosocial factors
being one example, should be included with caution, due to their potentially strong correlation with occupation; and a possible
predictive ability should be interpreted with this correlation in mind. We encourage efforts to develop new, short, yet reliable
questionnaires assessing occupational sedentary behavior and physical activity, and to verify whether they can predict “true” exposures
better than the self-reported OPA or OST used in our study. If so, they could be attractive for future large-scale population
surveys. In this line of development, we encourage paying consideration to the trade-off between resources required to collect
predictors and the eventual performance of the model. We did not in the present study compare costs associated with collecting
exposure data directly by objective measurements versus costs of developing prediction models using questionnaire data, but
we emphasize that a major rationale for modeling exposures in future studies would be that they deliver more exposure data
at a lower cost than objective measurements, and thus represent a favorable trade-off between cost and performance (57, 58). We also emphasize the need for validating the performance of any future model in new datasets, either using a genuinely
new sample or by internal bootstrap validation techniques; this has very rarely been attempted in previous modelling studies
(41).

Concluding remarks

This study showed that full prediction models based on self-reported information that would normally be available in large
epidemiological studies and surveys could predict objectively measured time spent sedentary and in physical activity at work
with reasonably good accuracy. Internal validation of the prediction models indicated that performance would decrease slightly
if they were used in new datasets from the studied occupations, but that they may still be useful for other populations of
blue-collar workers, with due caution. Single-predictor models using information specifically addressing sedentary behavior
and physical activity showed lower performance than full models, but still offered an attractive opportunity to predict objectively
measured exposures. We suggest that our models may be used to revisit previous studies based only on self-reports, as well
as an inspiration for designing future large epidemiological studies and surveys addressing sedentary behavior and physical
activity on the basis of self-reported information.

Saltin, B, & Grimby, G. (1968). Physiological analysis of middle-aged and old former athletes. Comparison with still active athletes of the same ages. Circulation, 38(6), 1104-15, http://dx.doi.org/10.1161/01.CIR.38.6.1104.