Health services researchers often use administrative data for
characterizing length of stay (LOS) to address a range of objectives.
For example, they may examine how LOS (as a dependent variable) varies
as a function of patient characteristics (e.g., age, race, insurance
status, presence of comorbidity), processes of care (e.g., speed of
emergency department response, types of medications administered or
interventions applied, discharge protocols), or institutional
characteristics (e.g., teaching hospital, mental health facility)
[1-7]. Alternatively, they may examine LOS as a potential explanatory
variable for predicting other outcomes [8] or they may restrict their
cohort to patients meeting specific LOS criteria [9]. Furthermore,
accurate identification of intervals of inpatient care is required for
studies using an episodes-of-care approach [10].

The concept of LOS is simple: time from admission to discharge.
However, a number of methodological considerations arise when Veterans
Health Administration (VHA) data are used for calculating LOS. First,
the goals of the project must be carefully considered, because they will
influence the algorithm selected. Is the focus on acute or long-term
care, on medical-surgical or mental health stays? Is the objective to
examine total LOS across multiple years or LOS during a particular
interval of study? Second, the algorithm must account for technical,
data-quality issues. These include duplicate records, overlapping or
sequential inpatient stays, transfers between different inpatient units,
and inpatient stays that are recorded in a subsequent year.

Although numerous studies focus on LOS, these subtleties of LOS
calculation have received little attention. This oversight could have
serious implications: algorithm choice can influence conclusions in
health services studies [11-13], although to our knowledge this
possibility has not been studied in the specific case of LOS. As VHA
leadership increasingly seeks to obtain accurate estimates of healthcare
costs and use evidence to guide strategic planning decisions, it is
critical that the evidence base supporting those decisions be as
accurate as possible.

One example of a clinical scenario wherein LOS algorithm choice
could influence conclusions is mental health condition (MHC)-related
differences in inpatient care use. Prior studies both within and outside
the VHA have documented that, compared with patients without MHCs,
patients with MHCs tend to use more inpatient care [6,14-19]. Thus,
patients with MHCs represent a particularly high-intensity, high-cost
group likely to merit special attention by VHA policy makers. However,
some characteristics of the way patients with MHC receive inpatient care
may make their VHA records disproportionately susceptible to variation
in algorithm choice. For example, patients with MHC might be more likely
to experience more complex patterns of inpatient care (e.g.,
transferring between a medical unit and a psychiatric unit during the
course of a single hospitalization episode), or to receive care in
extended-care settings, where stays can be long and can span multiple
fiscal years (FYs). Such factors could potentially influence LOS
calculations differently for patients with MHC versus those without MHC.

We used VHA administrative data to examine how application of
incrementally more refined algorithms for calculating LOS during 1 year
of care affected conclusions about mean LOS in a national cohort of VHA
patients with diabetes. Then, as an illustrative example of the
practical implications of such methodological decisions, we examined
whether the magnitude of observed mental illness-related disparities in
mean LOS varied as a function of LOS algorithm applied.

METHODS

Study Context

This work is part of a larger study examining the effect of MHC on
processes of outpatient diabetes care in FY03. Because the focus of that
study is on outpatient care, we wished to identify (and ultimately
exclude from the larger study) patients who were institutionalized
(i.e., on inpatient status) for the majority of FY03. Therefore, our
goal was to identify, for each patient in our cohort, all days in FY03
during which the patient was on inpatient status (acute care or extended
care). We were not seeking to characterize total LOS for the patients in
our cohort (which could have spanned multiple years), but only those
inpatient days that occurred during FY03. The process of creating our
LOS variable and the effect of algorithm choice on conclusions about
MHC-related differences in LOS are the focus of the present study.

Subjects

The cohort was drawn from the FY02 Diabetes Epidemiology Cohort
(DEpiC), a census of patients with diabetes in VHA nationally. DEpiC is
used extensively for VHA epidemiological and health services research
[20]. DEpiC identifies patients with diabetes based on the presence of
at least one instance of an antiglycemic prescription or at least two
instances of a diabetes International Classification of Diseases-9th
Revision (ICD-9) code in inpatient or outpatient records. Among the
911,451 FY02 DEpiC members who were veterans, used VHA outpatient care
at least once in FY02, and were alive as of the first day of FY03, we
selected the 784,321 whose MHC status could be verified, as described
next (in subsidiary analyses, we included the full 911,451 subjects,
including those with "MHC Possible" status).

Steps to Assemble Raw Record-Level Database of Inpatient Stays

We started by creating a record-level file containing every
instance of inpatient care recorded in any inpatient database available
in centralized VHA files. We selected only records that contained at
least 1 day of inpatient care in FY03. We also deleted duplicate
records. In 10 sequential "steps," we pulled all nonduplicate
inpatient records containing any FY03 inpatient care for patients in our
cohort from the following FY03 files:

Step 1. Bedsection file, which represents acute care hospital
stays.

Step 2. OBS (Observation) file, which represents short (e.g.,
overnight) acute care stays during which the patient is observed
regarding the potential need for admission to an acute care bed.

Step 4. Census file (for Bedsection, OBS, and EXT), which includes
records for all patients who still held inpatient status on the last day
of the FY, and thus for whom a discharge date was not available when the
files for that FY were created.

Step 5. Non-VHA file.

Step 6. Fee basis file.

(These latter two files reflect care received outside of VHA but
with funding for the care provided by VHA.)
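
The sequential pooling of source files can be illustrated with a minimal sketch. It is not the code used in this study: the `Stay` record layout, the `add_source` helper, and the toy records are hypothetical, and the FY03 boundaries assume the federal fiscal year (October 1, 2002, through September 30, 2003). Census records, which lack a discharge date, would need an imputed end date before they could pass through a filter like this one.

```python
from datetime import date
from typing import NamedTuple

# Assumption: FY03 follows the federal fiscal-year calendar.
FY03_START = date(2002, 10, 1)
FY03_END = date(2003, 9, 30)


class Stay(NamedTuple):
    """Hypothetical record layout: one row per inpatient record."""
    patient_id: str
    admit: date
    discharge: date


def touches_fy03(stay):
    """True if the record contains at least 1 day of FY03 care."""
    return stay.admit <= FY03_END and stay.discharge >= FY03_START


def add_source(pool, new_records):
    """Add one source file to the pool, skipping exact duplicates
    and records with no FY03 care."""
    seen = set(pool)
    out = list(pool)
    for rec in new_records:
        if touches_fy03(rec) and rec not in seen:
            seen.add(rec)
            out.append(rec)
    return out


# Toy data: the first source contains a pure duplicate.
bedsection = [
    Stay("p1", date(2003, 1, 5), date(2003, 1, 9)),
    Stay("p1", date(2003, 1, 5), date(2003, 1, 9)),
]
# The second source repeats that stay, adds a new one, and has a pre-FY03 stay.
obs = [
    Stay("p1", date(2003, 1, 5), date(2003, 1, 9)),
    Stay("p2", date(2003, 3, 1), date(2003, 3, 2)),
    Stay("p3", date(2002, 8, 1), date(2002, 8, 3)),
]

pool = add_source([], bedsection)   # step 1
pool = add_source(pool, obs)        # step 2
```

Each additional source would be passed through the same helper, so that the pool at any step contains only nonduplicate records with some FY03 care.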

We then searched FY04 and FY05 files for any records that included
some FY03 care:

Next, we processed this raw database in sequential
"stages." Stage A represented the raw file at any given step.
In stage B, we deleted pre-FY03 and post-FY03 care. Specifically, for
records with an admission date earlier than the first day of FY03, we
deleted any days preceding FY03 (i.e., we modified the record to begin
on the first day of FY03), because we were interested in days of care
during FY03, not total LOS for the patient across multiple years.
Similarly, for records with a discharge date later than the last day of
FY03, we modified the record to end on the last day of FY03.
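
The stage B truncation amounts to clipping each record to the fiscal-year window. The sketch below is illustrative only; the function name `truncate_to_fy` is hypothetical, and the FY03 boundaries assume the federal fiscal year (October 1, 2002, through September 30, 2003).

```python
from datetime import date

# Assumption: FY03 follows the federal fiscal-year calendar.
FY03_START = date(2002, 10, 1)
FY03_END = date(2003, 9, 30)


def truncate_to_fy(admit, discharge, fy_start=FY03_START, fy_end=FY03_END):
    """Clip a stay to the fiscal-year window.

    Returns the (start, end) dates of the portion inside the window,
    or None if the stay has no day inside it.
    """
    if discharge < fy_start or admit > fy_end:
        return None
    return max(admit, fy_start), min(discharge, fy_end)


# A stay that begins before FY03 is modified to begin on the first day of FY03.
clipped = truncate_to_fy(date(2002, 9, 20), date(2002, 10, 5))
```

Because the record is modified rather than dropped, the number of records is unchanged by this stage; only the dates move.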

In stage C, we addressed overlapping stays. Several types of
overlap were observed, as illustrated in Figure 1. In some cases, the
entire stay (admission date through discharge date) was contained within
the time interval of another record. This might happen, for example, if
a patient in a rehabilitation unit was temporarily transferred to an
acute care observation bed for an intercurrent illness like pneumonia.
If the patient was not formally discharged from the rehabilitation
facility prior to the transfer, then the time interval of the short-term
stay (appearing in the OBS file) could be bracketed by the interval of
the long-term stay (appearing in the EXT file). In other cases an
overlap occurred (e.g., the admission date of one record fell between
the admission and discharge dates of a subsequent record, or the
discharge date of a record fell between the admission and discharge date
of a subsequent record). In other cases, contiguous admissions occurred
(i.e., the discharge date of one record was the same as the admission
date of a subsequent record). For all these overlap cases (which could
involve a pair of records or even three or more records), we created a
single contiguous episode of FY03 inpatient care by assigning the
admission date to be the first admission date in FY03 among the
overlapping records and the discharge date to be the last discharge date
in FY03 among the overlapping records. The resulting file at step 10,
stage C, was our final record-level file of inpatient stays.
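
One way to sketch the stage C consolidation is a sort-and-sweep interval merge. This is a substitute illustration, not the procedure used in this study, although it should reach the same result as merging overlapping records until none remain. The function name `merge_stays` and the toy dates are hypothetical, and the input is assumed to be one patient's records after truncation to FY03.

```python
from datetime import date


def merge_stays(stays):
    """Merge one patient's (admit, discharge) records into episodes.

    Nested, partially overlapping, and contiguous records (discharge
    date equal to the next admission date) are combined into a single
    episode running from the earliest admission to the latest discharge.
    """
    merged = []
    for admit, discharge in sorted(stays):
        if merged and admit <= merged[-1][1]:
            # This record starts on or before the current episode ends:
            # extend the episode if the record ends later.
            prev_admit, prev_discharge = merged[-1]
            merged[-1] = (prev_admit, max(prev_discharge, discharge))
        else:
            merged.append((admit, discharge))
    return merged


stays = [
    (date(2003, 1, 1), date(2003, 1, 10)),   # long stay
    (date(2003, 1, 3), date(2003, 1, 5)),    # nested inside the long stay
    (date(2003, 1, 10), date(2003, 1, 15)),  # contiguous with the long stay
    (date(2003, 2, 1), date(2003, 2, 3)),    # separate episode
]
episodes = merge_stays(stays)
```

Sorting by admission date first means each record needs to be compared only with the most recent episode, which also handles chains of three or more overlapping records.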

Variables

We calculated LOS for each record as the number of days from its
start through end dates. At each step/stage, we calculated a cumulative
LOS for each patient by adding the record-level LOS for all records
included in that step/stage.
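
The record-level and cumulative LOS calculations can be sketched as follows. The day-counting convention is an assumption: the text does not state whether the discharge day is counted, so the hypothetical `count_discharge_day` flag exposes both options. The function names and toy records are likewise illustrative.

```python
from collections import defaultdict
from datetime import date


def record_los(admit, discharge, count_discharge_day=False):
    """LOS in days for one record.

    Assumption: by default the discharge day is not counted, so a
    same-day stay has LOS 0; set the flag to count it instead.
    """
    days = (discharge - admit).days
    return days + 1 if count_discharge_day else days


def cumulative_los(records):
    """Sum record-level LOS per patient.

    records: iterable of (patient_id, admit, discharge) tuples.
    """
    totals = defaultdict(int)
    for patient_id, admit, discharge in records:
        totals[patient_id] += record_los(admit, discharge)
    return dict(totals)


records = [
    ("p1", date(2003, 1, 1), date(2003, 1, 11)),
    ("p1", date(2003, 2, 1), date(2003, 2, 4)),
    ("p2", date(2003, 5, 1), date(2003, 5, 2)),
]
totals = cumulative_los(records)
```

Whichever convention is chosen, summing over records that have not yet been consolidated counts shared days more than once, which is why the cumulative value depends on the stage at which it is computed.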

To identify patients with MHC, we used the Agency for Healthcare
Research and Quality's Clinical Classifications Software (with
minor modifications) to generate a list of ICD-9 codes indicating the
presence of MHC [21]. A patient was assigned a "Yes" for MHC
status if he/she had at least one instance of an MHC ICD-9 code in any
inpatient record or outpatient face-to-face clinic visit at baseline
(FY01-02) and at least one confirmatory ICD-9 in the study period
(FY03). If he/she had no instance of an MHC ICD-9 in FY01 through 03,
then he/she was assigned MHC status "No." Otherwise, MHC
status was considered "Possible." That is, the MHC Possible
group represents those patients who had an MHC diagnosis in the baseline
period or in the study period, but not both. Cases with MHC Possible
status were excluded from main analyses; this allowed us to compare LOS
in two more sharply defined groups (MHC Yes vs MHC No).
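
The three-level MHC classification reduces to a small decision rule. In this sketch the function name `mhc_status` and its boolean inputs are hypothetical; each input stands for whether at least one qualifying MHC ICD-9 code was found in the corresponding period.

```python
def mhc_status(baseline_dx: bool, study_dx: bool) -> str:
    """Classify MHC status from diagnosis evidence in two periods.

    baseline_dx: an MHC ICD-9 code was present at baseline (FY01-02).
    study_dx: a confirmatory MHC ICD-9 code was present in FY03.
    """
    if baseline_dx and study_dx:
        return "Yes"
    if not baseline_dx and not study_dx:
        return "No"
    # A diagnosis in one period but not the other.
    return "Possible"


statuses = [mhc_status(b, s) for b in (True, False) for s in (True, False)]
```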

[FIGURE 1 OMITTED]

Analysis

We tabulated the number of records and calculated mean LOS within
each cell of a 10 x 3 matrix representing the steps and stages of
database development. Next, in each cell, we calculated mean LOS as a
function of MHC status. We then calculated the difference ([DELTA]) in mean
LOS among patients with MHC versus those without MHC and compared mean
LOS for the MHC Yes versus MHC No groups using a two-sample t-test. We
applied Bonferroni correction for compounding of Type I error across
multiple comparisons. Results of hypothesis tests are declared
statistically significant for p < 0.05 after Bonferroni correction.
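
A minimal sketch of the group comparison follows, under several stated assumptions. The text specifies a two-sample t-test but not the variant; this sketch uses the unequal-variance (Welch) statistic. The p-value uses a normal approximation to the t distribution, which is reasonable only for very large samples such as a national cohort. The number of comparisons (30, one per cell of the matrix) is also an assumption, and the LOS values are toy data, far too few for the approximation and included only to exercise the code.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Unequal-variance two-sample t statistic for mean(x) - mean(y)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def two_sided_p_normal(t):
    """Two-sided p-value using a normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def bonferroni_significant(p, n_comparisons, alpha=0.05):
    """Compare p with the Bonferroni-adjusted threshold alpha / n_comparisons."""
    return p < alpha / n_comparisons


# Toy LOS values (days) for one cell of the step-by-stage matrix.
los_mhc_yes = [30, 45, 28, 50, 41, 38, 47, 33]
los_mhc_no = [12, 15, 9, 20, 14, 11, 18, 13]

delta = mean(los_mhc_yes) - mean(los_mhc_no)  # difference in mean LOS
t_stat = welch_t(los_mhc_yes, los_mhc_no)
p_value = two_sided_p_normal(t_stat)
significant = bonferroni_significant(p_value, n_comparisons=30)
```

Dividing the significance threshold by the number of comparisons is equivalent to multiplying each p-value by that number and comparing it with 0.05.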

RESULTS

Among the 784,321 patients with diabetes in the full cohort,
152,591 were identified as having evidence of an MHC diagnosis (MHC
Yes). Among the subset of 92,255 patients who received any inpatient
care in FY03 (based on step 10, stage C), 39,452 had MHC Yes. Table 1
presents the age, sex, Physical Comorbidity Index score (a count from
0-35, developed for case mix adjustment in VHA patients [22-23]), and
primary care use in the full cohort and in the subset who used inpatient
care, by MHC status.

Table 2 catalogs the number of records and LOS at each step/stage
in the database assembly process. The cumulative number of patients who
are identified as having received inpatient care in FY03 (based on stage
C) increases progressively from step 1 to step 10 (as does the number of
records). For example, when the OBS file was added to the Bedsection
file, an additional 10,660 records were added for stays that did not
perfectly duplicate a Bedsection file stay for that patient. This is
expected, because additional evidence of inpatient care is added at each
step. More noteworthy is that some steps contribute more records than
others.

The number of records does not change at stage B (compared with
stage A), because this processing step truncates records (to include
only inpatient days during FY03) but does not delete records. However,
at stage C (record consolidation), the number of records drops
substantially, because overlapping stays are merged into a single,
longer stay.

Consistent with these observations, mean LOS at stage C increased
progressively with sequential steps (i.e., as more sources of data were
added), except at step 2 (where patients with short OBS stays were
added) and at step 5 (where patients with non-VHA stays were added).
Conversely, mean LOS decreased progressively with sequential stages. That
is, mean LOS decreased from stage A to stage B as non-FY03 days were
deleted (which would be relevant to a study like ours that focuses on
care received in a single FY). Mean LOS also decreased from stage B to
stage C as overlapping days were deleted (which would be relevant to the
accuracy of the LOS estimate in any study design). Across the 10 x 3
matrix, mean LOS ranged from 13.8 to 74.9 days.

Table 3 presents LOS by MHC status at every step/stage in the
database assembly process. The calculated difference ([DELTA]) in mean LOS
between the MHC Yes and the MHC No groups varied markedly by algorithm
and was statistically significant (p < 0.001) at every step/stage.
Correction for multiple comparisons did not change the statistical
significance of any finding. As illustrated in Figure 2, at step 1,
[DELTA] = 4.1 at stage A and 3.8 at stage C. In contrast, at step 10, [DELTA] =
57.8 at stage A and 15.5 at stage C (p < 0.01 for both
between-algorithm comparisons of the values of [DELTA]).

To obtain the LOS in stage C, for each pair of overlapping records,
we generated a single record by setting the FY03 admission date as the
earliest of the two admission dates and the FY03 discharge date as the
latest of the two discharge dates. We repeated this process iteratively
until all pairwise overlaps were addressed. This data processing stage
was the most involved, because it needed to account for multiple
potential overlap patterns, as illustrated schematically in Figure 1.
The most common overlap pattern (pattern I) was contiguous records,
i.e., where the discharge date of one record was the admission date of
the following record. This pattern would happen, for example, if a
patient were admitted to one bed section (e.g., to the Psychiatry
Department for suicidal ideation) and then transferred to another bed
section (e.g., to General Medicine for a hospital-acquired infection).
Of note, we used the Bedsection files for these analyses. VA Bedsection
files create a new record each time a patient transfers to a different
clinical service ("bedsection") during a hospital stay. This
is in contrast to the VA Main files, which create a new record for each
stay; all contiguous bedsection stays are combined in a single record.
Had we used the Main file instead of the Bedsection file, we expect that
we would not have encountered this particular form of overlap. Other
overlap patterns were also observed, as Figure 1 shows. Of note, step
10, stage B, yielded LOSs of more than 365 days for 3.2 percent of the
MHC Yes group and 1.4 percent of the MHC No group, clearly representing
a residual problem with the algorithm; in contrast, no patient had LOS
greater than 365 days at stage C. This finding supports the importance
of the stage C processing.
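
A plausibility check of this kind is simple to automate. The sketch below is illustrative only: the function name `patients_over_limit` and the toy totals are hypothetical, and the 365-day limit assumes a fiscal year that does not include a leap day.

```python
def patients_over_limit(total_days_by_patient, days_in_fy=365):
    """Return IDs of patients whose cumulative LOS exceeds the days in the FY."""
    return sorted(
        patient_id
        for patient_id, days in total_days_by_patient.items()
        if days > days_in_fy
    )


# Toy cumulative LOS values (days) per patient.
totals = {"p1": 120, "p2": 402, "p3": 365}
flagged = patients_over_limit(totals)
share_flagged = len(flagged) / len(totals)
```

A nonempty result signals that some days are still being counted more than once and that further record consolidation is needed.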

[FIGURE 2 OMITTED]

In a subsidiary analysis, we found that both the admission and
discharge dates fell within FY03 for 91 percent of records at step 10,
stage A. In those instances, the full LOS for that episode of care was
captured and no truncation was required.

Our main analyses excluded patients who had MHC Possible status
(i.e., those patients who had an MHC diagnosis in the baseline period or
in the study period, but not both). In another subsidiary analysis (see
online Appendix), we repeated the main analysis in the initial cohort (n
= 911,451), calculating mean LOS as a function of MHC as a three-way
variable (MHC Yes, MHC Possible, MHC No). Mean LOS for the MHC Possible
group was consistently intermediate between that for the MHC Yes and MHC
No groups. For example, for the MHC Possible group, mean LOS was 16.2 at
step 1, stage A; 15.1 at step 1, stage C; 90.3 at step 10, stage A; and
34.4 at step 10, stage C.

DISCUSSION

Choices about what algorithm to use when identifying episodes of
inpatient care substantially alter conclusions about the overall
intensity of inpatient use and about MHC-related disparities in LOS. Not
searching across all appropriate sources of data can lead to failure to
capture a substantial amount of inpatient care, thus leading to
underestimates of LOS. Decisions about how to process records can
likewise influence calculated LOS. While other studies have documented
that algorithm choice can influence conclusions drawn from VHA data
[11-13], we are not aware of this result having been previously
documented for LOS.

Researchers have access to many sources of data about VHA
patients' nonambulatory care. Indeed, the large number of sources
can bewilder investigators new to VHA administrative data, who may be
unsure which files to select. Fortunately, the technical manuals
developed by the Department of Veterans Affairs Information Resource
Center (available at http://www.virec.research.va.gov/) and the
Department of Veterans Affairs Health Economics Resource Center
(available at http://www.herc.research.va.gov/) explain these files in
detail. Our data provide further empiric information to help guide these
decisions. First, our results confirm that adding more data sources
identifies more inpatient days. Second, our results indicate that the
EXT and Census files are especially important sources of incremental
days of inpatient care. Third, our results indicate that adding more
data sources also changes conclusions about the magnitude of effect
(though not the direction of effect) of MHC on LOS. The step at which
this has a particularly pronounced effect is the addition of EXT files,
indicating that, compared with patients with no MHC, patients with MHC
have disproportionately more frequent or prolonged stays in the
long-term care setting.

Investigators using any VHA database need to examine data closely
to determine whether data processing steps are necessary. In the case of
inpatient files, our data indicate that in addition to the standard
procedure of deleting pure duplicate records, investigators must account
for overlapping stays (wherein a single day can be counted twice) and,
for studies such as ours that focus on a single year of care, must
truncate days falling before or after the FY of interest. Such pitfalls
could, in some cases, reflect data quality problems, such as a
data-entry error in admission or discharge date. However, in many cases,
they may not represent deficits in the quality of VHA administrative
data, but instead may reflect VHA clinical/administrative record-keeping
practices. For example, a single stay could legitimately be recorded in
more than one file if these files are used differently. Similarly, a fee
basis stay (with the correct admission and discharge dates) could be
filed in a subsequent year's records if a delay occurred in receipt
of the bill from the outside vendor. Regardless of whether some of these
factors represent data quality problems, investigators need to account
for them; if not, some patients will have inflated estimates of LOS.
Indeed, without such corrections, some patients will appear to be on
inpatient status for more than 365 days in a single FY.

While the focus of this study is on the issue of algorithm choice
for calculation of LOS, we use MHC-related disparities in LOS as a case
study to illustrate what can happen if such issues are not considered.
Health services researchers frequently examine disparities in processes
and outcomes of care. Historically, interest in disparities related to
characteristics like race, sex, and age has been great, but emerging
evidence suggests that disparities related to MHC status are also common
[9,24]. We demonstrated that the magnitude of MHC-related differences in
LOS varied markedly as a function of LOS algorithm. Thus, the
methodological issues raised here are not just theoretical: algorithm
choice can have marked effects on conclusions in healthcare disparities
research.

A subsidiary benefit of conducting analyses for this illustrative example
was that informative findings about associations between MHC status and
LOS emerged. Patients with MHC spent more of FY03
on inpatient status than did patients with no MHC; this was a consistent
and robust finding across every algorithm examined. This finding is
consistent with other studies that have shown heavier use of inpatient
services by patients with MHC [6,14-19]. Our study also shows that some
types of care (e.g., EXT) are associated with a disproportionately
greater MHC effect. Another strength of our approach is that we
distinguished between patients with stronger evidence of MHC (i.e., at
least one MHC diagnosis at baseline in FY01-02 and at least one
confirmatory MHC diagnosis in the study period, FY03) and patients with
less certain (Possible) MHC status (i.e., presence of an MHC diagnosis
either at baseline or in the study period, but not both). Our subsidiary
analyses provide information about MHC Possible patients, a group that
has not been well characterized in prior work. The MHC Possible group is
likely heterogeneous and includes patients with an erroneous MHC
diagnosis, with transient or resolved MHC, or with less severe MHC, as
well as patients who receive part of their care outside the VHA system.
Mean LOS for the MHC Possible group consistently fell between the mean
LOS observed for the MHC Yes and the MHC No groups.

Interpretation of our findings is subject to several caveats.
First, our aim was to calculate total number of days spent on inpatient
status during FY03; values should not be interpreted as indicating total
LOS across years. However, for 91 percent of records, the patient's
complete stay was contained within FY03. Second, we did not use the VHA
Decision Support System (DSS) Outpatient (OPAT) file as a data assembly
step. In the OPAT file, Stay Type 42, Bedsection 80 refers to nursing
home care reimbursed by VHA in any particular month. However, dates of
admission and discharge could not be accurately generated from that
source. Third, our focus was on VHA use. Depending on an
investigator's study question, capturing inpatient days spent in
other settings might also be important, such as days identified from
Medicare claims data, which can be linked to VHA administrative data
[25]. Fourth, because the purpose of our study was to identify periods
during which the patient was on nonoutpatient status, our LOS
calculations included both acute care and long-term care days. Studies
focusing on one or the other setting might need to consider other
methodological issues. For example, a patient's stay in a skilled
nursing facility could have short gaps (e.g., for a brief acute care
stay), which might not be captured with the databases used. Fifth, our
main analyses excluded patients whose MHC status could not be
ascertained with certainty (MHC Possible), so LOS estimates cannot be
generalized to all VHA patients. Subsidiary analyses suggested that
these excluded patients had intermediate LOS and that algorithm choice
similarly affected LOS calculations for them. Sixth, MHC diagnoses came
from ICD-9 diagnosis codes in VHA administrative data rather than from
direct assessment of patients' MHC. Given the known problem of
underdiagnosis of MHC [26-27], some patients with MHC are likely
included in the MHC No group. This would be expected to bias results
toward the null.

This study examines methods that should be considered when an
algorithm is developed that uses VHA data to calculate LOS. The specific
algorithm selected will depend on the research question, such as the following:

* What types of inpatient care are of interest? For example, is the
focus on acute care, extended care, care received on a fee basis outside
of VHA or some combination of these sources? If rehabilitative/extended
care is the focus, will additional sources (e.g., VHA EXT, fee basis,
non-VHA and DSS OPAT files, as well as Medicare or Medicaid files) be
queried, and how will multiyear stays be addressed?

* Is the focus on care received in a particular time interval (such
as one FY) or on a full episode of inpatient care? If the former, will
subsequent years' files be searched for stays recorded in a
subsequent FY, and what is the expected incremental benefit versus cost
of pulling data from multiple years? If the latter, how many years of
data will be searched to identify the complete LOS, which could
potentially span many years?

* Is the objective to characterize private sector inpatient care
received as well, and if so, should other sources (such as Medicare
claims data) be queried?

Careful consideration of these study design issues should yield an
algorithm tailored to a particular study's objectives.

CONCLUSIONS

Accounting for the methodological issues raised here should help
VHA health services researchers avoid pitfalls in calculation of VHA
LOS, such as failure to capture care recorded in more obscure data
sources (leading to underestimates of LOS) or duplicate counting of some
days of care (leading to overestimates of LOS). This result is expected
to support more robust estimates for economic analyses, since inpatient
costs contribute disproportionately to total cost of VHA care. This
result is also expected to enhance the accuracy of data VHA uses in its
evidence-based efforts to redesign its healthcare delivery systems,
which aim to improve the quality of care provided to veterans.

Financial Disclosures: The authors have indicated that no competing
interests exist.

Funding/Support: This material was based on work supported in part
by the National Institutes of Health (grant NIDDK 1 R01 DK07120201) and
by VA Health Services Research and Development Service (grant RCS
90-001).

Additional Contributions: The authors are grateful to Ciaran
Phibbs, PhD; Todd Wagner, PhD; and Susan Schmitt, PhD, for their advice
and conceptual input during the process of developing the LOS algorithm.
The views expressed in this article are those of the authors and do not
necessarily represent the views of the VA.