We strongly believe that it would be extremely useful, to say the least, to have
standardization of functional status measures at least in post-acute care so that if
similar patients are treated in different post-acute settings, or if patients are
treated in successive post-acute care settings, that we would have a means of
measuring them. ... It would expand the utility of regularly collected
information. (National Committee on Vital and Health Statistics, 2003).

Potential Measurement Solution: Linking Instruments

There is more than one way to coordinate rehabilitation services along the

continuum of care. For example, one might institute the use of the same instrument to

measure functional ability in all rehabilitation settings so that the results would be

directly comparable. In this manner, the patient's progress can be easily tracked across

settings and services. Unfortunately, attempts to use the FIM across settings have met

with limited success as there are sizeable obstacles to implementing such a plan.

Already, there has been a huge monetary investment in our current attempts to measure

rehabilitation outcomes. To figuratively "throw out" these current instruments and

replace them with one new and improved universal measure would likely be cost-prohibitive

(Cohen & Marino, 2000). There would also likely be significant resistance to

implementing such a plan because, in our capitalistic economy, private fortunes are often

tied up in maintaining the status quo. Furthermore, significant costs would be incurred in

training staff in a new system, as well as in the implementation of a new database.

The predicament now facing rehabilitation services, that of having multiple yet

incompatible instruments attempting to quantify a single construct, is not a new one.

It has been successfully overcome in other areas, for example, within the basic

sciences. Major scientific advances have been possible in part because the instruments

used to measure a construct were standardized and, in some cases, linked to other similar

instruments. An example is the history of attempts to measure the construct of

temperature. What began as a human sensation of "hot" and "cold" evolved into the field

of thermometrics, the measurement of temperature (Bond & Fox, 2001). In A.D. 180,

Galen mixed equal quantities of ice and boiling water to establish a "neutral" point for a

seven-point scale having three levels of warmth and three levels of coldness (Bond &

Fox). Then, in the 17th century, Santorio of Padua used a tube of air inverted in a

container of water so that the water level rose and fell with temperature changes. He

calibrated the scale by marking the water levels at the temperature of flame and ice

scores used in CTT. Scores that have interval properties can be analyzed appropriately

using parametric statistics, while such analyses may be inappropriate on ordinal data.

Additionally, logit measures may remove measurement bias at the extreme ends of the

measured construct, while extreme raw scores are biased by nature and may

underestimate the magnitude of a difference or change score at the extremes (Cella &

Chang, 2000). Another drawback of CTT is that tests developed in this manner are

sample dependent. This means that items may look difficult when they are administered

to examinees at the low end of the score continuum and, conversely, the same items look

easy to examinees at the high end of the score continuum. Thus, the item statistics

are dependent upon the ability level of the subject sample and have little value when

measuring subjects of a different ability level. Similarly, the problem of test dependency

can be defined as one where the person statistics are dependent upon the difficulty of the

test. If one changes the difficulty of the items in the tests, the two scores are no longer

comparable.

There is some indication that IRT estimates of health outcomes are more

responsive to changes in health status over time. McHorney et al. (1997) found that the

sensitivity of the SF-36 physical functioning scale to differences in disease severity was

greater for a Rasch model-based scoring than it was for simple summated scoring. Fisher

(1997) states,

as it becomes increasingly clear that the accountability of educators,
psychologists, health care providers, and other professionals cannot remain tied to
scale-dependent indicators of unknown or low statistical sufficiency, the
practicality, scientific rigor, and mathematical beauty of scale-free measurement
will become more widely appreciated. (p. 93)

Hays et al. (2000) predict IRT methods will be used in health outcome measurement on a

rapidly increasing basis in the 21st century.

Two mathematical models that are appropriate for linking functional outcome

measures are (a) the one-parameter IRT model (the Rasch model), which solves for

person ability through a single item parameter, item difficulty, and (b) the two-parameter

model, which solves for person ability through two item parameters, item difficulty and

item discrimination. There is fervent debate over which model should be employed for

psychometric analysis and linking instruments. The debate concerns whether a

scientific model should be made to fit the data (two-parameter model) or the data to fit

the model (one-parameter model) (Wright, 1997). There is also the issue of whether item

discrimination should be held constant across items (Rasch model) or allowed to vary

between items (two-parameter model) (McHorney, 2002).
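
The difference between the two models can be made concrete with their standard logistic forms. The following is a minimal sketch, not the estimation procedure used in this study. The function names and the symbols theta (person ability), b (item difficulty), and a (item discrimination) are illustrative assumptions, with all quantities expressed in logits.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """One-parameter (Rasch) model: the probability of success depends only on
    the difference between person ability (theta) and item difficulty (b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def two_parameter_probability(theta: float, b: float, a: float) -> float:
    """Two-parameter model: item discrimination (a) scales the slope of the
    item's response curve, so the slope may differ from item to item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical values, for illustration only.
ability = 1.0
difficulty = 0.0
print(rasch_probability(ability, difficulty))               # slope fixed across items
print(two_parameter_probability(ability, difficulty, 2.0))  # a more discriminating item
```

When a is set to 1 for every item, the two-parameter function reduces to the Rasch form, which is the sense in which the Rasch model holds discrimination constant across items.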

While several studies indicate item discrimination is not constant across

Rasch model has been shown to produce stable linking with sample sizes of 300-400

(Kolen & Brennan, 1995; Skaggs & Lissitz, 1986).

Rasch analysis can be used to link healthcare inventories that measure the same

construct. By linking inventories in this manner, one can improve the usefulness of both

measures through

* Refining the rating scale.
* Identifying the items that form a unidimensional construct.
* Verifying the expected difficulty hierarchy of the items.
* Providing for a means of converting scores between the two measures.
* Matching the ADL measures to specific descriptions provided by the scale.

The Rasch theory stipulates that a respondent's probability of answering an item

correctly is dependent only on two factors: the respondent's ability and the

* Bathing in the MDS has a separate rating scale: 0-Independent, 1-Supervision, 2-
Physical help limited to transfer only, 3-Physical help in part of bathing activity, 4-
Total dependence, 8-Activity itself did not occur during entire 7 days.
** Bladder and Bowel Continence in the MDS also has a separate rating scale: 0-Usually
Continent, 2-Occasionally Continent, 3-Frequently Incontinent, 4-Incontinent

exclude one's face and hands. Yet, the MDS also incorporates bathtub or shower

transfers as part of bathing, while the FIM has a separate item for tub and shower

transfers. The MDS has one item for dressing, while the FIM divides the task into

dressing, upper body and dressing, lower body. In this study, the FIM item for dressing,

lower body was matched with the MDS item for dressing since, based on the lab's

clinical judgment, dressing the lower body would be considered a more difficult task than

dressing the upper body. This more difficult aspect of the ADL is incorporated in the one

MDS item for dressing. The FIM item for toileting is matched with the MDS item for

toilet use, as they have similar definitions, although the MDS includes toilet transfer in

the task while the FIM has a separate item for transfer. The MDS item for transfers is

then matched with the FIM item for transfers: bed, chair, and wheelchair. The bowel and

bladder control items on the FIM are matched with the bowel and bladder continence

items on the MDS. The FIM item for walk/wheelchair addresses one's ability to walk or

use a wheelchair safely on a level surface, while the MDS has four items for walking:

walk in room, walk in corridor, locomotion on unit, and locomotion off unit.

Although not included in the definition of the FIM item for walk/wheelchair, "150 feet is

specified as the performance criterion in the clarification of the rating scale" (Rogers,

to the MDS item for locomotion off unit. Furthermore, the FIM incorporates safety into

the definition of many of its items, such as grooming, bathing, dressing, transfers,

toileting skills, walking and wheelchair mobility, while the MDS does not (Rogers,

Gwinn & Holm, 2001).

The FIM and MDS have different response scales on which the physical

functioning items are scored. A clear distinction in the administration of these two

assessments is that the items of the FIM are scored at the time of the assessment, while

ratings on the MDS are based on observed performance over a 7-day period.

Furthermore, the FIM items have seven response levels, while the MDS has a range of

five. The FIM scoring criteria are shown in Table 3-2. The MDS scoring criteria are

shown in Table 3-3.

While the FIM motor items assess the percent of effort that is provided to the

patient to accomplish a task, the MDS measures the number of times during a 7-day time

period a patient required a certain level of assistance to perform a task.

Table 3-2. FIM scoring criteria
7 Complete independence     All of the tasks described as making up the activity are
                            typically performed safely, without modification,
                            assistive devices, or aid and within a reasonable
                            amount of time.
6 Modified independence     One or more of the following may be true: the activity
                            requires an assistive device; the activity takes more
                            than reasonable time; or there are safety (risk)
                            considerations.
5 Supervision               Supervision or setup: subject requires no more help
  (Standby prompting)       than standby, cuing, or coaxing, without physical
                            contact, or helper sets up needed items or applies
                            orthoses.
4 Minimal assist            Subject requires no more help than touching, and
  (Minimal prompting)       expends 75% or more of the effort.
3 Moderate assistance       Subject requires more help than touching, or expends
  (Moderate prompting)      half (50%) or more (up to 75%) of the effort.
2 Maximal assistance        Subject expends less than 50% of the effort, but at
  (Maximal prompting)       least 25%.
1 Total assistance          Subject expends less than 25% of the effort.
(Evans, 2002)
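
The effort-based portion of the FIM criteria (levels 1 through 4) can be expressed as a simple threshold rule. The sketch below is an illustration only: the function name is hypothetical, it considers only the percentage of effort expended by the subject, and it leaves out levels 5 through 7 as well as the "no more help than touching" qualifier, which depend on clinical observation and not on a percentage.

```python
def fim_assist_level(percent_effort_by_subject: float) -> int:
    """Return the FIM level (1-4) implied by the share of effort the subject
    expends, using the 75%, 50%, and 25% cut points in the scoring criteria.

    Simplifying assumption: the subject needs hands-on help, so levels 5-7
    (supervision, modified and complete independence) are out of scope here.
    """
    if not 0 <= percent_effort_by_subject <= 100:
        raise ValueError("percent effort must be between 0 and 100")
    if percent_effort_by_subject >= 75:
        return 4  # minimal assist
    if percent_effort_by_subject >= 50:
        return 3  # moderate assistance
    if percent_effort_by_subject >= 25:
        return 2  # maximal assistance
    return 1      # total assistance


# Hypothetical example: a subject who performs 60% of the effort in a transfer.
print(fim_assist_level(60))
```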

Table 3-3. MDS scoring criteria
0 Independence No help or staff oversight OR Staff help/oversight
provided only one or two times during the last seven
days.
1 Supervision Oversight, encouragement, or cueing provided three or
more times during last 7 days -OR- Supervision (3 or
more times) plus physical assistance provided, but
only one or two times during the last 7 days.
2 Limited Assistance Resident highly involved in activity, received physical
help in guided maneuvering of limbs or other
nonweight-bearing assistance on three or more occasions
-OR- limited assistance (3 or more times), plus one
weight-bearing support provided, but for only one or
two times during the last 7 days.
3 Extensive Assistance While the resident performed part of activity over last
seven days, help of following type(s) was performed
three or more times:
--Weight-bearing support provided three or more times;
--Full staff performance of activity (3 or more times)
during part (but not all) of last 7 days.
4 Total Dependence Full staff performance of the activity during the entire
7-day period. There is complete nonparticipation by
the resident in all aspects of the ADL definition task.
If staff performed the activity for the resident during
the entire observation period, but the resident
performed part of the activity himself/herself, it would
not be coded as a "4" (Total Dependence).
(CMS, 2003).

Procedures Involved in the Creation of the FIM/MDS Conversion Table

For the purposes of creating a FIM-MDS conversion table, Velozo (2004)

obtained linked FIM and MDS scores from the records of 254 subjects. The linking of

instruments using IRT methodologies is generally dependent on item calibrations, which

are the "difficulty" measures of the items. In essence, item calibrations serve as the

markings on the conversion ruler. Rasch analysis of the FIM and MDS converts a

patient's responses on the instrument items to a measure of ADL/motoric function.
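
The "conversion ruler" idea can be sketched in code under strong simplifying assumptions. The example below treats every item as dichotomous, whereas the FIM and MDS items are rated on multi-level scales, and the item calibrations are invented for illustration. It is not the procedure used to build the FIM-MDS conversion table. All function names are hypothetical.

```python
import math


def expected_raw_score(measure, item_calibrations):
    """Expected raw score at a given person measure (logits) for dichotomous
    Rasch items with the given difficulty calibrations."""
    return sum(1.0 / (1.0 + math.exp(-(measure - b))) for b in item_calibrations)


def measure_for_raw_score(raw_score, item_calibrations, low=-10.0, high=10.0):
    """Find the measure whose expected raw score equals raw_score.

    Bisection works because the expected raw score rises monotonically with
    the measure. Extreme scores (0 or the maximum) have no finite measure.
    """
    if not 0 < raw_score < len(item_calibrations):
        raise ValueError("raw score must lie strictly between 0 and the maximum")
    for _ in range(100):
        middle = (low + high) / 2.0
        if expected_raw_score(middle, item_calibrations) < raw_score:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


def conversion_table(calibrations_a, calibrations_b):
    """Map each non-extreme raw score on instrument A to the raw score on
    instrument B that is expected at the same measure on the shared ruler."""
    table = {}
    for raw_a in range(1, len(calibrations_a)):
        measure = measure_for_raw_score(raw_a, calibrations_a)
        table[raw_a] = round(expected_raw_score(measure, calibrations_b))
    return table


# Hypothetical co-calibrated item difficulties (logits), for illustration only.
instrument_a = [-1.5, -0.5, 0.5, 1.5]
instrument_b = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(conversion_table(instrument_a, instrument_b))
```

The key design point is that both sets of calibrations must already sit on one common scale. The raw scores are never compared directly; each is first translated into a measure, and the measure is then translated into the other instrument's score.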

Prior to performing the Rasch analysis, several steps were taken so that the FIM

and MDS rating scales were conceptually consistent. One inconsistency between the

FIM and MDS is that the MDS includes a rating for "activity did not occur." Using a

procedure adapted by Jette, Haley, and Ni (2003), this MDS rating was recoded as part of

the "total dependence" rating. The rationale underlying this decision was that a likely

explanation for an activity not occurring was that the activity could not be performed

DeJong, G. (2001). Open letter from ACRM to HCFA on proposed Medicare PPS. A
letter prepared for the President of the American Congress of Rehabilitation
Medicine under the auspices of ACRM's Research Policy and Legislation
Committee and the Committee's PPS Workgroup. Archives of Physical Medicine
and Rehabilitation, 82, 567-569.

U.S. Department of Health and Human Services. (2002). A profile of older Americans.
Retrieved July 30, 2003, from
http://www.aoa.gov/prof/Statistics/profile/highlights.asp

U.S. Food and Drug Administration. (1997). Guidance for industry: In vivo
bioequivalence studies based in population and individual bioequivalence
approaches. Rockville, MD: U.S. Department of Health and Human Services.

U.S. Food and Drug Administration. (1999). Draft guidance on average, population, and
individual approaches to establishing bioequivalence. Rockville, MD: U.S.
Department of Health and Human Services.

Katherine L. Byers, MHS, CRC, CVE, is a doctoral candidate in the rehabilitation

science doctoral (RSD) program at the University of Florida, College of Public Health

and Health Professions. Ms. Byers received a Bachelor of Arts degree in behavioral

sciences from Rice University in Houston, Texas in 1989. She then completed a Master

of Health Science (MHS) degree in rehabilitation counseling at the University of Florida

in 1991 and subsequently obtained certifications as both a rehabilitation counselor and as

a vocational evaluator. Over a period of 9 years, Ms. Byers worked in positions of

increasing responsibility in the field of rehabilitation before entering the University of

Florida's rehabilitation science doctoral program in January of 2000. While completing

the requirements of the degree, Ms. Byers was employed as a research assistant and then

as a program coordinator for Dr. Craig Velozo, an assistant professor in the Department

of Occupational Therapy at the University of Florida. Accomplishments during Ms.

Byers' doctoral career include winning the 2002 John Muthard Research Award from the

University of Florida's College of Health Professions, Department of Rehabilitation

Counseling. She also was selected to make a poster presentation at the Third National

Rehabilitation Research and Development Meeting in Washington, DC, in 2002, and at

the 2004 ACRM-ASNR Joint Conference.

TESTING THE ACCURACY OF LINKING HEALTHCARE DATA ACROSS THE CONTINUUM OF CARE

By

KATHERINE L. BYERS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2004

ACKNOWLEDGMENTS

I would like to thank the VA Office of Academic Affairs, Washington, DC, PreDoctoral Associated Health Rehabilitation Research Fellowship Program, and the VA HSR&D/RR&D Rehabilitation Outcomes Research Center (RORC) of Excellence, Gainesville, Florida, for funding this study. Within the VA, I would like to especially thank Dr. Maude Rittman for acting as my fellowship program director and for her continual support and encouragement throughout the process. Additional thanks go to Dr. Christa Hojlo, Chief of Nursing Home Care, for her assistance in obtaining MDS data and to Mr. Clifford Marshall, Rehabilitation Planning Specialist, for his assistance in obtaining FIM data.

I extend my gratitude to my doctoral advisor and VA Fellowship Preceptor, Dr. Craig Velozo, who also acted as my mentor and the chairperson of my supervisory committee. His guidance and support throughout this process have been invaluable. Furthermore, I am appreciative of the support provided by the other members of my committee, including the cochair, Dr. Ronald Spitznagel, Dr. Elizabeth Swett, and Dr. Anne Seraphine. Special thanks go to Dr. Richard Smith, an expert in Rasch analysis, who analyzed the original data.

Other student members of Dr. Velozo's research team have been invaluable in the completion of this dissertation, and I owe them a debt of gratitude. This is especially true of Ms. Inga Wang, who has worked closely on this project. And finally, my family has been a source of continual support throughout this process, and I would like to thank them for their tireless encouragement.

ABSTRACT

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

TESTING THE ACCURACY OF LINKING HEALTHCARE DATA ACROSS THE CONTINUUM OF CARE

By

Katherine L. Byers

August 2004

Chair: Craig A. Velozo
Major Department: Rehabilitation Science

The purpose of this project was to test the accuracy of a conversion table designed to transform a score on the physical ability component of the Functional Independence Measure™ (FIM) to its corresponding score on the Minimum Data Set (MDS) and vice versa. The records of 2,297 VA patients with scores on both the FIM and MDS, which were completed within 7 days of one another between July 2002 and June 2003, were obtained from the VA's Austin Automation Center (AAC). The FIM-MDS conversion table, generated from an independent sample using Rasch measurement techniques, was then used to transform actual scores on the FIM and MDS to their corresponding converted scores. The equivalence of the actual and converted score distributions was examined by comparing their means and variances. It was hypothesized that 75% of the actual and converted scores on the FIM and the MDS would be within five points of one another. Effect size was determined, as was the percent of subjects having actual and converted FIM and MDS scores within five points of one another.

Twenty-four percent of the FIM and 37% of the MDS actual and converted scores were within five points of one another; both results fell short of the 75% standard set for the conversion to be considered accurate. Yet, the effect size for the conversion of both FIM and MDS scores was .2, demonstrating an 85.3% overlap between the two score distributions. The correlation between the FIM actual and converted scores was .724, while the correlation between the MDS actual and converted scores was .745. While the development of a FIM-MDS translation table appears promising, the results of this study do not provide strong enough evidence that this first attempt at creating a FIM-MDS conversion table has produced an instrument that would accurately convert scores within a clinical setting.
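
The two accuracy summaries reported above can be illustrated with a short sketch. It rests on two assumptions that are not stated in the abstract: that "within five points" is inclusive of a five-point difference, and that the overlap figure corresponds to Cohen's U1 nonoverlap statistic for two normal distributions with equal variances. The function names and the sample scores are hypothetical.

```python
import math


def percent_within(actual, converted, tolerance=5):
    """Percent of subjects whose actual and converted scores differ by no
    more than `tolerance` points (inclusive, by assumption)."""
    if len(actual) != len(converted) or not actual:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(1 for a, c in zip(actual, converted) if abs(a - c) <= tolerance)
    return 100.0 * hits / len(actual)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def distribution_overlap(effect_size):
    """Overlap of two equal-variance normal distributions, computed as
    1 minus Cohen's U1, for a standardized mean difference `effect_size`."""
    u2 = normal_cdf(abs(effect_size) / 2.0)
    u1 = (2.0 * u2 - 1.0) / u2
    return 1.0 - u1


# Invented scores for four subjects, for illustration only.
actual_scores = [10, 20, 30, 40]
converted_scores = [12, 27, 30, 34]
print(percent_within(actual_scores, converted_scores))
print(distribution_overlap(0.2))
```

Under these assumptions an effect size of .2 yields an overlap of roughly 85%, in line with the figure reported in the abstract. The example also shows why the two summaries can disagree: the overlap describes how similar the two score distributions are as a whole, while the five-point criterion describes agreement for each individual subject.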

CHAPTER 1
INTRODUCTION

Over the past few decades, the total number of people receiving rehabilitation services in the United States has grown while the provision of such services has extended into settings outside of acute rehabilitation settings. This growth in demand has been fueled in part by changes in population demographics, as the number of individuals over the age of 65 has increased, and this trend is expected to continue for many years into the future (Cornman & Kingson, 1996; Gutheil, 1996). According to the U.S. Department of Health and Human Services (2002), 35.6 million people, or 12.3% of the U.S. population, were over the age of 65 in 2002. Since 1900, this percentage had more than tripled (4.1% in 1900 to 12.3% in 2002). By the year 2030, the older population will more than double again to about 70 million people, or 20% of the total population. Similarly, the proportion of U.S. veterans over age 65 is projected to increase from 26% in 1990 to 46% in 2020 (Veterans Administration, 2001).

A greater number of elderly people in our society is associated with an increased demand for rehabilitation services. Dillingham, Pezzin, and MacKenzie (2003) reported that aging is accompanied by an increased risk of diminished health status and a greater likelihood of requiring rehabilitation services. As noted in the Centers for Disease Control (CDC) Health, United States, 2001 (2002), the prevalence of both chronic conditions and activity limitations increases with age, with health-related limitation in mobility or self-care increasing fourfold between ages 65 to 74 and age 85 or older. In 1997, more than half of the older population (54.5%) reported having at least one disability, with more than a third (37.7%) reporting at least one severe disability (U.S. Department of Health and Human Services, 2002).

There has also been an increase in the number of patients being admitted to postacute care (PAC) settings after discharge from acute care hospitals (Iwanenko, Fiedler, & Granger, 1999; Johnson, Kramer, Lin, Kowalsky, & Steiner, 2000). Iwanenko et al. note that between 1991 and 1997, the number of patients admitted annually to PAC settings rose from 12,468 to 49,844. Stineman (2001) noted that the most significant challenge to current medicine was likely the care of people with chronic, incurable diseases and injuries.

The U.S. healthcare system provides a variety of routes to recovery from physical injuries, ailments, or impediments. PAC settings, also called subacute care or transitional care settings, are a type of short-term care program provided by many long-term care facilities and hospitals. Treatment in such settings may include rehabilitation services, specialized care for certain conditions (such as stroke and diabetes), and postsurgical care and other services associated with the transition between the hospital and home. Residents on these units often have been hospitalized recently and typically have complicated medical needs. The goal of subacute care is to discharge residents to their homes or to a lower level of care. Current PAC settings include comprehensive inpatient rehabilitation units attached to acute care or freestanding hospitals, skilled nursing facilities (SNFs), outpatient rehabilitation facilities, freestanding outpatient clinics, and home health care services. Each person with a potentially disabling impairment has a unique care trajectory that may include sequential admission to more than one PAC setting.
An example of the possible variations is that a person diagnosed with stroke, after being discharged from an acute care hospital, is then admitted to an inpatient rehabilitation program before being released to home with outpatient services. Yet, another individual with the same diagnosis is discharged from the acute hospital setting directly into a skilled nursing facility.

SNFs were established under the 1965 Medicare legislation and are certified by Medicare to provide 24-hour nursing care and rehabilitation services in addition to other medical services. SNF-based rehabilitation units have become a rapidly growing segment of the rehabilitation continuum over the past decade, as policy makers have searched for less costly delivery systems for rehabilitation. While inpatient treatment provides a full complement of professionals practicing in a hospital setting, it is one of the most costly of the rehabilitation services (Keith, Wilson, & Gutierrez, 1995). SNFs, on the other hand, have lower costs, mainly because construction, regulatory, and staff requirements are less stringent than they are in hospitals (Keith et al.). As a result, SNF-based rehabilitation has been used increasingly as a substitute for traditional inpatient care (Keith et al.). Additionally, many older patients do not meet the Medicare requirements to receive inpatient rehabilitation services, which include being able to tolerate three hours of therapy on a daily basis, fitting within one of the required diagnostic mixes, and being able to make significant progress over a fairly short length of time (Keith et al.). In such cases, SNF settings have become an appropriate alternative.

In rehabilitation programs, a patient's functional enhancement is the primary goal (DeJong, 2001). The ability to evaluate a patient's status is central to rehabilitation efforts, for example, to track a patient's recovery, to determine the effectiveness of treatment, or to estimate resource use (Penta, 2004).
It is well documented that one's ability to function physically is an important component of a patient's self-report of health status (Haley, McHorney, & Ware, 1994; Hart, 2000; McHorney, Haley, & Ware, 1997; Raczek et al., 1998; Segal, Heinemann, Schall, & Wright, 1997). Since the 1950s, functional status measures have served as a means to monitor outcomes within medical centers. Yet to this day, there is no clear and commonly accepted definition of function or a clear delineation between instruments that assess functional outcomes and those that evaluate other health concepts. As a result, the ability to compare one instrument measuring functional status to another can be fraught with complications.

Currently in the U.S., two distinct instruments are used to monitor functional outcomes in inpatient rehabilitation settings and SNFs. Traditional rehabilitation facilities have almost uniformly adopted the Functional Independence Measure (FIM™)¹ as a means of monitoring patients' functional ability. The FIM instrument provides a measure of disability and was put into operation beginning in 1989 (Granger, 1998). Today, it is one of the most widely used instruments that assess the quality of daily living activities in persons with disabilities (Granger, Hamilton, & Sherwin, 1986). The Veterans Health Administration (VHA) rehabilitation services include the FIM in their Functional Status and Outcomes Database (FSOD), which has been operational since 1997 (Veterans Health Administration [VHA], 2000). Its use is mandated by VHA Directive 2000-016, Medical Rehabilitation Outcomes for Stroke, Traumatic Brain Injury, and Lower Extremity Amputation Patients, which requires every VHA medical center to assess functional status and enter these data into the FSOD in order to measure and track rehabilitation outcomes on all new stroke, lower extremity amputee, and traumatic brain injury (TBI) patients (VHA, 2000). Presently, the FSOD has not been linked with other data sources that would allow patients to be monitored as they progress across the continuum of care (e.g., from rehabilitation facilities to skilled nursing facilities or from rehabilitation facilities to home health care).

¹ FIM™ is a trademark of the Uniform Data System for Medical Rehabilitation, a division of U. B. Foundation Activities, Inc.

While the FIM is the gold standard for measuring functional outcomes in rehabilitation settings, the Minimum Data Set (MDS) of the nursing home Resident Assessment Instrument (RAI, 1991) is used universally for monitoring rehabilitation outcomes in SNFs. The MDS was developed in response to a 1986 Institute of Medicine study of the quality of care in nursing homes that called for improvements in nursing home quality and more patient-centered care (Morris et al., 1990). The federal Omnibus Budget Reconciliation Act of 1987 (OBRA 87) mandated that all U.S. nursing homes implement the Resident Assessment Instrument, whose core is the MDS (Rantz et al., 1999). The MDS consists of 284 items designed to assess the cognitive, behavioral, functional, and medical status of nursing home residents (Hawes et al., 1995; Teresi & Homes, 1992).

Nursing homes are a critical environment for tracking the health care status of elderly veterans. In fiscal year 2001, a total of 89,056 veterans were treated in nursing homes, with an average daily census of 33,670 (Catalogue of Federal Domestic Assistance, 2002). It was projected that by 2003, 111,953 patients would be treated in nursing homes, with an average daily census of 35,132 (Catalogue of Federal Domestic Assistance). In 1995, there were at least 1.5 million nursing home residents living in facilities participating in the Medicare or Medicaid programs (Hawes et al., 1995). Within the VHA, the reduction of acute rehabilitation beds from 1,150 five years ago to 617 in 2003 further increases the likelihood that veterans could receive their post-acute rehabilitation care in nursing homes (C. Johnson, personal communication, September 29, 2002).

A key to improving services for patients treated in PAC settings is to develop effective and efficient methods for tracking and evaluating functional status changes across rehabilitation and skilled nursing facilities. Through the use of a single instrument in these settings, a patient may progress from one to the other while maintaining a functional assessment score that could easily be tracked and compared between settings. Such a tool would benefit patients, as it would facilitate an increased continuity of care between settings. It would also allow for the direct comparison of rehabilitation outcomes between settings, along with resource utilization and costs. Lathem and Haley (2003) note that a clear need exists for an instrument that can accurately assess patients' functional ability as they move through the health care system. As stated in Buchanan, Andres, Haley, Paddock, and Zaslavsky (2003), "Providers, payers, and consumers would all benefit from comparable measures of functional status and rehabilitation outcomes across multiple care settings to facilitate equitable payment and to monitor the quality and efficiency of care delivery" (p. 45). To date, there has been only one published attempt, by Williams, Lee, Fries, and Warren (1997), to link the FIM to the MDS. Yet, other studies have linked other measures of global functioning (e.g., Fisher, Eubanks, & Marier, 1997; Fisher, Harvey, & Kilgore, 1995; Fisher, Harvey, Taylor, Kilgore, & Kelly, 1995; Segal, Heinemann, Schall & Wright, 1997; Smith & Taylor, 2004; Tennant & Young, 1997).

The dilemma of having multiple yet incompatible instruments measuring the same construct has been confronted and successfully overcome in the physical sciences. Take, for example, the manner in which we measure distance. Currently, in the United States, we have two competing systems of measuring length, namely the metric system and the standard system of measurement. Surprisingly, success has not come through attempts to convert entirely from one system to the other despite the obvious benefits in doing so. Instead, we continue to utilize simple strategies that allow us to convert a measure on one scale to its corresponding measure on the other.
Similarly, we routinely convert readings between Celsius and Fahrenheit with a simple conversion table when measuring temperature. Thus, one could say that a precedent has been set for the manner in which we have successfully reconciled competing systems of quantifying what are essentially abstract concepts.

An analogous attempt in health care would be to develop a system of converting scores between the physical functioning components of the FIM and the MDS so that a score on one instrument could be translated into its equivalent score on the other. The hypothesis is that the items included in these two instruments are subsets of items along an ADL/motor construct. Table 1-1 presents a comparison of the ADL/motor items of the FIM and the MDS.

Table 1-1. Comparison of FIM to MDS ADL/motor items
FIM Items                                   MDS Items
Eating                                      Eating
Grooming                                    Bed Mobility
Bathing                                     Personal Hygiene
Dressing-Upper Body                         Bathing
Dressing-Lower Body                         Dressing
Toileting                                   Toilet Use
Bladder Management                          Bladder Continence*
Bowel Management                            Bowel Continence*
Bed, Chair, Wheelchair (Transfer)           Transfer
Toilet (Transfer)                           Walk in Room
Tub, Shower (Transfer)                      Walk in Corridor
Walk/Wheelchair                             Locomotion on Unit
Stairs                                      Locomotion off Unit

FIM Rating Scale                            MDS Rating Scale (exceptions noted below)
7 Complete Independence (Timely, Safely)    0 Independent
6 Modified Independence (Device)            1 Supervision
5 Supervision                               2 Limited Assistance
4 Minimal Assist (Subject = 75%+)           3 Extensive Assistance
3 Moderate Assist (Subject = 50%+)          4 Total Dependence
2 Maximal Assist (Subject = 25%+)           8 Activity did not occur during the entire 7-day period
1 Total Assist (Subject = 0%+)
*Bladder and Bowel Continence in the MDS also has a separate rating scale: 0-Usually Continent, 2-Occasionally Continent, 3-Frequently Incontinent, 4-Incontinent


Similarities are immediately evident between the two instruments. Both instruments include items for eating, dressing, toileting, bowel and bladder functioning, as well as the ability to transfer and to walk. Differences include an item for climbing stairs on the FIM that is not part of the MDS.

A mathematical framework is needed in order to convert a score on one instrument to its corresponding score on the other. Item Response Theory (IRT) measurement models have been rapidly gaining popularity over classical test theory (CTT) for analyzing instruments used in healthcare and rehabilitation (Douglas, 1999; Hambleton, 2000; Hays, Morales, & Reise, 2000; Linacre, Heinemann, Wright, Granger, & Hamilton, 1994; McHorney, 1997; Prieto, Alonso, Lamarca, & Wright, 1998; Silverstein, Fisher, Kilgore, Harley, & Harvey, 1992; Velozo, Magalhaes, Pan, & Leiter, 1995). IRT comprises a set of generalized linear models and their associated statistical procedures that connect a subject's responses to test items to that subject's location on the latent trait being tested (Mellenbergh, 1994). In order to create a link between scores on the FIM and MDS, it is hypothesized that Rasch analysis can be used to place the items from both instruments on the same linear continuum (Fisher, Harvey, Taylor, et al., 1995). A precedent for linking instruments in such a manner has been established in the fields of education and psychological measurement.

It is the purpose of this dissertation to evaluate the accuracy of a FIM-MDS conversion table that has been created through the use of Rasch analysis. The technical procedures of test equating used in the educational applications of Rasch's probabilistic models were transferred to the cocalibration of these two functional assessment instruments. An accurate conversion table between the FIM and the MDS would allow


for studies to take place that are necessary to examine the outcomes for persons receiving rehabilitation services in different care settings. This would, in effect, eliminate the need to institute massive changes in measurement procedures across rehabilitation settings. Functional status information could then be used to track changes and follow a patient's progress across PAC settings and not only monitor but compare quality of care and rehabilitation outcomes in different settings (National Committee on Vital and Health Statistics, 2003).


CHAPTER 2
REVIEW OF THE LITERATURE

Measuring Outcomes in Rehabilitation

The effectiveness of rehabilitation services is gauged by the restoration and maximization of patient functioning. Functional status in this context has been defined as reflecting "an individual's ability to carry out activities of daily living (ADLs) and to participate in various life situations and in society" (Jette, Haley, & Ni, 2003, p. 1). Therefore, the assessment of functional status is a method for describing abilities and activities in order to measure an individual's use of the variety of skills included in performing the tasks necessary to daily living, vocational pursuits, social interactions, leisure activities, and other required behaviors (Granger, 1998). ADL measures have been used to determine a patient's level of disability, to establish whether one qualifies for certain types of healthcare services, and to document outcomes of rehabilitation services. The focus of the earliest standardized assessments of function, developed over 50 years ago, was on the basic ADLs, which consist of self-care activities, such as bathing, grooming, dressing, and walking. Two of the first functional status measures used in rehabilitation were the Katz and Barthel indexes, whose items consisted solely of basic ADL tasks (Cohen & Marino, 2000; Latham & Haley, 2003). Then, with changing societal expectations, the advent of brain injury rehabilitation, and the independent living movements, medical outcome research began to explore means of documenting social and cognitive-based behaviors as part of rehabilitation outcomes


(Latham & Haley, 2003, p. 85). The result of this was the development of instruments such as the Functional Independence Measure (FIM) and the Minimum Data Set (MDS).

The FIM Instrument Used in Inpatient Rehabilitation

In U.S. inpatient rehabilitation settings, the Uniform Data System for Medical Rehabilitation (UDSMR) is the most widely used clinical database for assessing rehabilitation outcomes (Fiedler & Granger, 1997; Granger & Hamilton, 1993). The FIM is the core functional status measure of the UDSMR and was developed to establish a uniform standard for the assessment of functional status during medical rehabilitation (Granger, Hamilton, Keith, Zielezny & Sherwin, 1986). The FIM incorporates concepts and items from previous functional assessment instruments, such as the Katz Index of ADL, the PULSES profile, the Kenny Self-Care Evaluation, and the Barthel Index (Hall et al., 1993). The FIM system was developed by a national task force, cosponsored by the American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation (the Task Force to Develop a National Uniform Data System for Medical Rehabilitation), to rate the severity of patient disability and the outcomes of medical rehabilitation (Hamilton et al., 1987). The original work of this task force was expanded by the Department of Rehabilitation Medicine at the State University of New York at Buffalo. Since 1987, it has been the mission of the UDSMR to measure medical rehabilitation outcomes across the continuum of care, both time and settings (Granger, 1999). The UDSMR maintains, for research purposes, a national data repository of three million case records from 1,400 facilities around the world (Granger, 1999). The FIM is administered in most inpatient rehabilitation facilities within three days of admission and prior to discharge (Granger, Hamilton & Sherwin, 1986).
The scale accounts for a patients level of independence, amount of assistance needed, use of


adaptive or assistive devices, and the percentage of a given task completed successfully. This instrument comprises 18 items with a seven-level response scale of independent performance in self-care, sphincter control, mobility, locomotion, communication and social cognition (Granger & Hamilton, 1993). As such, it contains items representing three constructs: ADL, mobility, and continence (Granger, Hamilton, & Sherwin). Thirteen of the 18 FIM items (related to functional ability) can be further divided into three more specific subscores rating activities of daily living (ADLs), sphincter management, and mobility (Stineman, Jette, Fiedler & Granger, 1997). A FIM score on each item ranges from 1 (Total Assistance) to 7 (Complete Independence). Thus, a total FIM score ranges from 18 to 126. Hamilton (1989) noted that, "Because each item is scaled on the basis of functional independence, it is expected that the total score (with each item appropriately weighted) will correlate with the burden of care for the disabled person" (p. 862).

Psychometric studies of the FIM instrument support its use for research purposes. One of the strengths of the FIM is that it has undergone many methodological evaluations, in which it has demonstrated good psychometric properties (Dodds, Martin, Stolov & Deyo, 1993). Dodds et al. noted a high internal consistency (Cronbach's coefficient of .93 at admission and .95 at discharge), which indicates that the FIM is a reliable instrument. Extensive investigations of the FIM's reliability and validity have provided evidence of its interrater and test-retest reliability (Ottenbacher, Hsu, Granger & Fiedler, 1996), internal consistency (Stineman et al., 1996; Stineman et al., 1997), concurrent validity (Granger, Cotter, Hamilton & Fiedler, 1993; Oczkowski & Barreco, 1993) and predictive validity (Heinemann, Linacre, Wright, Hamilton & Granger, 1994; Oczkowski & Barreco). Ottenbacher et al.
(1996) performed a meta-analysis of 11 studies that revealed the median interrater reliability for the total FIM was .95 and the test-retest and


equivalence reliability was .95 and .92, respectively. Stineman et al. (1996) showed in a study of 93,829 rehabilitation inpatients that a factor analysis of the FIM instrument supported the identification of ADL/motor and cognitive/communication dimensions across 20 impairment categories. Additionally, the stability of the FIM motor scores was demonstrated in several studies (Linacre et al., 1994; Wright, Linacre & Heinemann, 1993). Linacre (1998) confirmed the multidimensional structure of the FIM by means of Rasch analysis followed by factor analysis of standardized residuals and demonstrated the divergence of the five cognitively-oriented items from the 13 motor-oriented items.

The MDS Instrument in Skilled Nursing Facilities

After receiving inpatient rehabilitation services, patients may be transferred into a SNF, where the FIM instrument is no longer used as a measure of functional ability. Instead, the Minimum Data Set (MDS) is utilized in this role. Since 1990, HCFA (Health Care Financing Administration, now known as the Centers for Medicare and Medicaid Services [CMS]) has required that the MDS be administered to all SNF residents. The MDS is a comprehensive assessment of over 85 health status elements organized into measurement categories. An MDS score on each item ranges from 0 (Independent) to 4 (Total Dependence). The definitions of these ratings are provided on the rating form. Full MDS assessments are required at least annually and when there is a significant change in a patient's condition (such as a stroke, resulting in hemiparesis). A minimum of approximately 1.2 million residents are assessed annually with the full MDS, with 3.6 million briefer updates and an unknown additional number of complete MDS reassessments because of major changes in residents before an annual MDS is due (Casten, Lawton, Parmelee & Cleban, 1998).
While the MDS shares ADL content with the FIM, it is targeted exclusively to SNFs, making comparisons across rehabilitation settings difficult (Latham & Haley, 2003).


Although not as extensively studied as the FIM, research on the MDS suggests that it too has adequate reliability and validity for use in research studies. Early research studies by Hawes et al. (1995) showed that MDS items had interclass correlations of .7 or higher in key areas of functional status, such as ADL and cognition. Sixty-three percent of the items achieved reliability coefficients of .6 or higher and 89% achieved .4 or higher (Hawes et al., 1995). Morris et al. (1994) showed that the seven cognitive items (short-term memory, long-term memory, decision making, and four categories of memory recall) showed an internal reliability of .83-.88. While it has been suggested that the above psychometric findings on the MDS are inflated due to its being administered by research staff (Stineman & Maislin, 2000), Gruber-Baldini, Zimmerman, Mortimore & Magaziner (2000) showed that when the MDS is administered by clinical staff, cognitive items (MDS-COGS) and cognitive performance scale items (MDS-CPS) correlate moderately well with the Mini Mental Status Exam (MMSE) (R = -0.65 and -0.68, respectively) and the Psychogeriatric Dependency Rating Scale (PGRDS) Orientation scale (R = 0.63 and 0.66, respectively). The internal reliability of the MDS-COGS was .85 and that of the MDS-CPS (without the comatose items) was .80. Confirmatory factor analysis studies, derived from clinical and administrative databases of the MDS, confirmed all MDS domain clusters except social quality (Casten et al., 1998).

Administrative Solution: Adopting a Single Instrument for Measuring Outcomes in Post Acute Care

The existence of competing instruments in PAC settings (i.e., the FIM in rehabilitation facilities and the MDS in SNFs) has led to considerable debate over which of these instruments should be the basis for a post-acute care Prospective Payment


System (PPS) in the private sector (Centers for Medicare and Medicaid Services [CMS], 2001; DeJong, 2001). The demands for a PPS for rehabilitation facilities prompted the HCFA in 1998 to develop the Minimum Data Set for Post-Acute Care (MDS-PAC), an instrument similar in design to the MDS, intended to address the needs of subacute facilities, rehabilitation facilities, and long-term care hospital patients (CMS, 2002). One of the intended purposes of the MDS-PAC was to provide CMS with a tool by which it could monitor the quality of health care services across post-acute settings (Granger, Hamilton, Keith, et al., 1986). Originally, items similar to those found in the MDS for rehabilitation were intended to be used in this new instrument. Yet, upon its completion, it consisted of more than 400 items, and it lacked one-to-one correspondence with the FIM (DeJong, 2001, p. 567). Martin Grabois, MD, President of the American Congress of Rehabilitation Medicine (ACRM), stipulated in a letter in 2001 that the ACRM strongly opposed implementation of the MDS-PAC and felt it was premature to use it as a quality-monitoring tool in rehabilitation (DeJong, 2001). This challenge resulted in a change in CMS's decision from using the MDS-PAC to using the Inpatient Rehabilitation Facility Patient Assessment Instrument (IRF-PAI) as the basis for the post-acute PPS. The IRF-PAI includes the FIM ADL/motor and cognition/communication items. The IRF-PAI was mandated for implementation in rehabilitation facilities on January 1, 2002 (CMS, 2001).

While the final decision to use the IRF-PAI has tremendous economic benefits (e.g., rehabilitation facilities not having to convert from the FIM to an MDS-based instrument), it does not facilitate monitoring functional outcomes as patients cross from rehabilitation to skilled nursing facilities. That is, a FIM-based outcome instrument will continue to be used in rehabilitation facilities, while an MDS-based instrument will be


used in nursing home facilities. A drawback to having different functional outcome measures across these two health-care settings is test dependency. Data gathered on one instrument cannot be compared to similar data gathered on an alternate instrument. Wilkerson and Johnston (1997) noted that the absence of a single, standardized instrument used in both rehabilitation facilities and SNFs was a fundamental barrier for U.S. policymakers. Without such an instrument, these policymakers were hampered in their ability to fulfill the emerging health policy mandate to monitor the quality and outcomes of services for patients in PAC settings. As patients were transferred across these settings, researchers, managers, and clinicians were unable to easily and accurately track functional status changes.

The ability to measure a patient's functional status is not only important within a particular rehabilitation setting, but also along the continuum of care. It is difficult to accurately and precisely compare and contrast the magnitude of gains made through rehabilitation efforts in various therapeutic PAC settings when such improvements are evaluated on incongruous instruments based on divergent scales of measurement. As Granger (1998) contends, there has been unfortunately too little effort in addressing assessment and management of persons with disablement across the continuum of care. As a result, the outcomes of therapeutic interventions cannot easily and accurately be compared to determine their relative effectiveness.

This measurement of rehabilitation outcomes is an integral component of a major controversy in rehabilitation over the best PAC setting in which to provide care. Optimizing a patient's recovery and minimizing costs are the two goals most often considered in this debate. Inpatient rehabilitation units typically provide the highest level of services, although often at the greatest costs. On the other


hand, skilled nursing facilities generally do not provide the same extent and intensity of therapeutic interventions, although at a reduced cost. The ability to directly compare rehabilitation outcomes between, for example, an inpatient rehabilitation setting and a skilled nursing facility would enhance our understanding of where patients benefit most and from what interventions. In January of 2000, Sally Kaplan, Ph.D., of the Medicare Payment Advisory Commission (MedPAC) told the Subcommittee on Populations,

We strongly believe that it would be extremely useful, to say the least, to have standardization of functional status measures at least in post-acute care so that if similar patients are treated in different post-acute settings, or if patients are treated in successive post-acute care settings, that we would have a means of measuring them. ... It would expand the utility of regularly collected information. (National Committee on Vital and Health Statistics, 2003).

Potential Measurement Solution: Linking Instruments

There is more than one way to coordinate rehabilitation services along the continuum of care. For example, one might institute the use of the same instrument to measure functional ability in all rehabilitation settings so that the results would be directly comparable. In this manner, the patient's progress can be easily tracked across settings and services. Unfortunately, attempts to use the FIM across settings have met with limited success, as there are sizeable obstacles to implementing such a plan. Already, there has been a huge monetary investment in our current attempts to measure rehabilitation outcomes. To figuratively "throw out" these current instruments and replace them with one new and improved universal measure would likely be cost-prohibitive (Cohen & Marino, 2000). There would also likely be significant resistance to implementing such a plan.
In our capitalistic economy, private fortunes are often tied up in maintaining the status quo. Furthermore,


significant costs would be incurred in training staff in a new system, as well as in the implementation of a new database.

The predicament now facing rehabilitation services, that of having multiple yet incompatible instruments attempting to quantify a single construct, is not a new one. It has been successfully overcome in other areas, for example, within the basic sciences. Major scientific advances have been possible in part because the instruments used to measure a construct were standardized and, in some cases, linked to other similar instruments. An example of this is the historical attempts to measure the construct of temperature. What began as a human sensation of hot and cold evolved into the field of thermometrics, the measurement of temperature (Bond & Fox, 2001). In A.D. 180, Galen mixed equal quantities of ice and boiling water to establish a neutral point for a seven-point scale having three levels of warmth and three levels of coldness (Bond & Fox). Then, in the 17th century, Santorio of Padua used a tube of air inverted in a container of water so that the water level rose and fell with temperature changes. He calibrated the scale by marking the water levels at the temperature of flame and ice (Bond & Fox). Our current mode of measuring temperature uses mercury in a closed tube. Even with a single mechanism by which we measure temperature, we use two competing scales, namely, Celsius and Fahrenheit. Both of these scales merely set two known temperature points (ice and boiling points) and simply divide the scale into equal units. These two independently developed scales have been linked, so a score on one scale can easily be converted to a score on the other scale.

A problem in developing measures in the human sciences is that we are clearly dealing with abstractions (e.g., perceived social support, cognitive ability, and self-esteem), so we need to construct measures of abstractions, using equal units, so that we


can make inferences about constructs rather than remain at the level of describing our data (Bond & Fox, 2001, p. 4). Yet, it appears hopeless to construct models of human behavior since behavior seems to be so unpredictable. What we can do is estimate the probability of a behavior taking place. We need to build a model, more like models in modern physics, models which are indeterministic, where chance plays a decisive role (Rasch, 1960, p. 11). What is then being described is the possibility of a behavior occurring, that is, the relative frequency of an event occurring. We can say that the probability of something occurring is equal, for example, to 50 percent.

An alternative solution to this problem would be to use the same scale of measurement as the basis of each instrument. Linking can be referred to as the development "of a common metric in IRT by transforming a set of item parameter estimates from one metric onto another, base metric" (Hart & Wright, 2002, p. 2). McHorney (1997) pointed out, "The development of a shared language that goes beyond specific items to location on an ability scale would provide users tremendous flexibility in building and maintaining an outcomes capacity within and across different databases" (p. 749). The use of a shared language across rehabilitation settings would allow all services along the continuum of care to be interrelated and coordinated. In this manner, the outcomes, as well as the efficient utilization of resources, could be maximized. Doran and Holland (2000) write, "The comparability of measurement made in differing circumstances by different methods and investigators is a fundamental precondition for all of science" (p. 281).

The development of the same scale of measurement may be achieved through the utilization of Rasch analysis, a one-parameter Item Response Theory (IRT) model. Georg Rasch, a Danish mathematician who examined psychological measurement problems in the 1950s and 1960s, surmised that the


20 relationship between a persons ability and an items difficulty can be modeled as a probabilistic function. As a persons ability level increases, the probability of passing an item also increases (Fox & Jones, 1998). The Rasch model specifies exactly how to convert observed counts into linear (and ratio) measures (Wright & Linacre, 1989). IRT, then, is both a theoretical framework and a collection of quantitative techniques used to construct tests, scale responses, and equate scores. It consists of models, each designed to describe a functional relationship between an examinees ability and the characteristics (or parameters) of the items on a test (McHorney & Cohen, 2000). Performing a Rasch analysis on the FIM would address a problem that exists with the interpretation of the FIM scores. It has been noted that a change of 10 raw score points at the extremes of the FIM range is equivalent to four times as much change on the linear scale as a change of 10 raw score points at the center of the FIM range (Linacre et al., 1994). Thus, the improvement made by a person with moderate deficits, placing the patient in the center of the scale, will appear to be much greater than a person who is closer to the end of the scale even when the actually improvement might otherwise be seen as equivalent. Linacre et al. (1994) cite this as one example of why a conversion from raw scores to linear measures is so essential to quantifying changes in a patients status more appropriately. Performing a Ra sch analysis on the FIM would be one way to accomplish this. The Use of Linking Techniques Educators in the United States have been involved in the process of equating and linking instruments through a variety of statistical procedures for more than 40 years. Kolen (2001) notes that the first pages of the first issue of the Journal of Educational Measurement published in 1964 were on the subject of linking.


Linking is a scaling method used to achieve comparability of scores, to the extent possible, between tests of different frameworks and test specifications (Muraki, Hombo & Lee, 2000). Linking is distinguished from test equating, which involves making statistical adjustments to scores from alternative forms of an instrument to account for small differences in the difficulty of the test items on each form (Kolen, 2001). The term test equating is traditionally used to refer to the case of linking when two or more forms of a test have been constructed according to the exact specifications, such as equal difficulty, reliability, and validity, and constructed for the same purpose (Muraki et al., 2000). The most basic of the equating methods is linear equating, which assumes that the two tests to be equated differ only in means and standard deviations.

In IRT, linking is referred to as developing a common metric by transforming a set of item parameter estimates from one metric to another, base metric (Kim & Cohen, 2002). The process of linking consists of an anchoring design and a transformation method. The anchoring design ensures that there will be a basis for comparison between the item calibrations on the two instruments (Vale, 1986). The linking transformation refers to the equation used to put the item parameters on a common scale of measurement. These processes rest on two assumptions: (a) the different instruments being linked measure the same underlying construct; and (b) the linking sample represents the population for which the test is intended (McHorney, 2002, p. 386).

The cocalibration of instruments purported to measure a common construct is simply an extension of test equating, item banking, and partial credit principles that have been in use in education for decades (Choppin, 1968; Fisher, Harvey, Taylor, et al., 1995; Wright, 1984).
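Linear equating, the most basic method mentioned above, can be sketched in a few lines. The example assumes two forms administered to comparable groups and matches scores by their standardized position; the function name and the score lists are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean, pstdev


def linear_equate(score_x, scores_x, scores_y):
    """Place a Form X score on the Form Y scale by linear equating.

    The two forms are assumed to differ only in mean and standard
    deviation, so a score keeps the same z-score on both forms:
    y = mean_y + (sd_y / sd_x) * (x - mean_x).
    """
    mean_x, mean_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    if sd_x == 0:
        raise ValueError("Form X scores have no variance; cannot equate.")
    return mean_y + (sd_y / sd_x) * (score_x - mean_x)


if __name__ == "__main__":
    # Hypothetical score distributions for two forms of a test.
    form_x = [10, 20, 30]
    form_y = [20, 40, 60]
    print(linear_equate(30, form_x, form_y))
```

This approach operates on raw scores and depends on the samples used, which is part of the motivation for the IRT-based linking described above.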
Routine applications of Rasch measurement models in the development and equating of instruments are performed by companies such as the Psychological

PAGE 30

Corporation, school districts in Portland, Phoenix, Chicago, and New York, and medical school admissions and certification boards, including the National Board of Medical Examiners, the American Society of Clinical Pathologists, the American Dental Association, and the American Council of State Boards of Nursing (Fisher, Harvey, Taylor, et al., 1995).

Linking of Measures in Healthcare

Linking studies are in their infancy in health status assessment and functional health assessment in rehabilitation (McHorney, 2002, p. 389). It has only been in the last 14 years that linking has been used in health status and rehabilitation assessment (McHorney). In this setting, the use of linking techniques has been discussed (Fisher, Harvey & Kilgore, 1995; Fisher, Harvey, Taylor et al., 1995; McHorney) and linking studies have been conducted (Chang & Cella, 1997; Fisher, 1997; Fisher, Harvey, Taylor et al.; McHorney & Cohen, 2000). One of the first published applications of IRT to functional health assessments tested the unidimensionality and reproducibility of the 10-item Physical Functioning Scale (Ware, 2003).

Examples of using IRT, specifically Rasch analysis, to link health care instruments have appeared in the literature over the last six years. These include linking the FIM and the SF-36 (Segal et al., 1997), the FIM and the Barthel Index (Tennant & Young, 1997), the FIM and the Patient Evaluation Conference System (PECS), and the SF-36 and the LSU HIS Physical Functioning scale (Fisher, Eubanks & Marier, 1997). Additionally, Badia, Prieto, Roset, Diez-Perez, & Herdman (2002) attempted to develop a short Osteoporosis-Specific Quality of Life Questionnaire based on the assemblage (equating) of the items of two existing questionnaires through the Rasch mathematical model.


McHorney (2002) states, "Measurement specialists who serve rehabilitation medicine and other specialties are at the cusp of a paradigm shift away from sizable reliance on classical test methods to broader use of IRT methods" (p. 390). Rasch measurement is becoming the preferred method among rehabilitation professionals for developing functional assessments (Ottenbacher et al., 1996). Hays et al. (2000), McHorney (2002), Ware (2003), and Cella and Chang (2000), along with many others in the social and medical sciences in the last 30 years, have described IRT as a more promising framework for designing tests (Hambleton, 2000).

Features of IRT, such as sample independence and test independence, account for this growing popularity over classical test theory (CTT) (Douglas, 1999; Linacre et al., 1994; McHorney, 1997; Prieto et al., 1998; Hambleton, 2000; Silverstein et al., 1992; Velozo et al., 1995). The measurement units of IRT have interval properties, in contrast to the ordinal raw scores used in CTT. Scores that have interval properties can be analyzed appropriately using parametric statistics, while such analyses may be inappropriate on ordinal data. Additionally, logit measures may remove measurement bias at the extreme ends of the measured construct, while extreme raw scores are biased by nature and may underestimate the magnitude of a difference or change score at the extremes (Cella & Chang, 2000).

Another drawback of CTT is that tests developed in this manner are sample dependent. This means that items may look difficult when they are administered to examinees at the low end of the score continuum; and alternately, the same items look easy to those examinees at the high end of the score continuum. Thus, the item statistics are dependent upon the ability level of the subject sample and have little value when measuring subjects of a different ability level.
Similarly, the problem of test dependency can be defined as one where the person statistics are dependent upon the difficulty of the


test. If one changes the difficulty of the items in the tests, the two scores are no longer comparable.

There is some indication that IRT estimates of health outcomes are more responsive to changes in health status over time. McHorney et al. (1997) found that the sensitivity of the SF-36 physical functioning scale to differences in disease severity was greater for a Rasch model-based scoring than it was for simple summated scoring. Fisher (1997) states, "as it becomes increasingly clear that the accountability of educators, psychologists, health care providers, and other professionals cannot remain tied to scale-dependent indicators of unknown or low statistical sufficiency, the practicality, scientific rigor, and mathematical beauty of scale-free measurement will become more widely appreciated" (p. 93). Hays et al. (2000) predict IRT methods will be used in health outcome measurement on a rapidly increasing basis in the 21st century.

Two mathematical models that are appropriate for linking functional outcome measures are (a) the one-parameter IRT model (the Rasch model), which solves for person ability through a single item parameter, item difficulty, and (b) the two-parameter model, which solves for person ability through two item parameters, item difficulty and item discrimination. There is fervent debate over which model should be employed for psychometric analysis and linking instruments. The debate centers on whether a scientific model should be made to fit the data (two-parameter model) or the data to fit the model (one-parameter model) (Wright, 1997). There is also the issue of whether item discrimination should be held constant across items (Rasch model) or allowed to vary between items (two-parameter model) (McHorney, 2002). While several studies indicate item discrimination is not constant across functional status items (McHorney, 2002; McHorney & Cohen, 2000; Spector &


Fleishman, 1998), for pragmatic reasons, namely the availability of relatively small sample sizes of patients with linked FIM and MDS data (n = 450), we are choosing the Rasch model because of its simplicity and robustness under conditions of heterogeneous item discrimination and small samples (De Gruijter, 1986; Kolen & Brennan, 1995). The Rasch model has been shown to produce stable linking with sample sizes of 300-400 (Kolen & Brennan, 1995; Skaggs & Lissitz, 1986).

Rasch analysis can be used to link healthcare inventories that measure the same construct. By linking inventories in this manner, one can improve the usefulness of both measures through
• Refining the rating scale.
• Identifying the items that form a unidimensional construct.
• Verifying the expected difficulty hierarchy of the items.
• Providing for a means of converting scores between the two measures.
• Matching the ADL measures to specific descriptions provided by the scale.

The Rasch theory stipulates that a respondent's probability of answering an item correctly is dependent only on two factors: the respondent's ability and the characteristics of the item (Hambleton, Swaminathan & Rogers, 1991). Rasch analysis has the ability to uniquely link a person's ability to an item's difficulty level (Velozo & Peterson, 2001). Thus, a score on an instrument can be directly linked to the descriptive content of the instrument (Velozo & Peterson, 2001). The examiner is able to describe precisely a person's level of ability based on the score that person receives. In many other cases, a score on a test is uninterpretable in terms of a meaningful description of the level of ability it represents. You may still be able to say with a reasonable level of confidence that someone has more or less ability than someone else, but you still do not have a clear description of the precise level of ability that person possesses.
Furthermore, Rasch analysis allows for the ranking of items so that all items on a scale can be put on a continuum from least challenging to most challenging.
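
The relationship between person ability and item difficulty described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in this study: it shows the dichotomous form of the one-parameter (Rasch) and two-parameter models, whereas the FIM and MDS items are polytomous and were analyzed with a partial credit model. The item names and logit values are hypothetical and chosen only to show how items order themselves from least to most challenging.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model.

    Both arguments are expressed in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def two_parameter_probability(ability, difficulty, discrimination):
    """Two-parameter model: the discrimination rescales the logit difference."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical item difficulties in logits (illustrative values only).
hypothetical_items = {"eating": -1.0, "transfer": 0.0, "stairs": 1.5}
person_ability = 0.0

# Rank items from least to most challenging and show the success probability.
for name, difficulty in sorted(hypothetical_items.items(), key=lambda kv: kv[1]):
    probability = rasch_probability(person_ability, difficulty)
    print(f"{name}: difficulty={difficulty:+.1f}, P(success)={probability:.2f}")
```

When the discrimination is fixed at 1 for every item, the two-parameter function reduces to the Rasch function, which is the constraint at the center of the model debate noted earlier.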

There has been only one published study attempting to link the FIM to the MDS. Williams et al. (1997) compared scores on FIM and rescaled MDS ADL and cognitive items [referred to as the Pseudo-FIM(E)] for 173 rehabilitation patients admitted to six nursing homes. The matching and rescaling of the MDS was accomplished through an expert panel, with the panel judging that 8 out of 13 FIM items had a corresponding MDS item. Intraclass correlation between the FIM and rescaled MDS was .81, although the mean calibration of 6 of the 8 FIM items differed statistically from the rescaled MDS items. While this initial attempt at linking the FIM and the MDS was encouraging, the methodology and statistical approach of the study had considerable limitations. For example, expert-panel rescaling of the MDS can be challenged due to the lack of adequate empirical support (i.e., a different panel of experts could develop a different FIM-MDS matching and rescaling; Velozo, Kielhofner & Lai, 1999).

Fisher, Harvey, Taylor, et al. (1995) were the first to use common-sample equating to link two global measures of functional ability, the FIM and PECS (Patient Evaluation and Conference System). Using the methodology described above, they showed that the 13 FIM and 22 PECS ADL/motor items could be scaled together in a 35-item instrument. The authors found that separate FIM and PECS measures for 54 rehabilitation patients correlated .91 with each other and correlated .94 with the cocalibrated values produced by Rasch analysis. Furthermore, these authors demonstrated that either instrument's ratings were easily and quickly converted into the other's via a table that used a common unit of measurement, which they referred to as the rehabit. This common unit of measurement allows for the translation of scores from one instrument to another. Since the results of Rasch analysis are sample-free, these tables/algorithms can be used for all future and past instrument-to-instrument score conversions.

More recently, Fisher et al. (1997) replicated their previous study using common-sample equating to link two self-report instruments: the 10 physical function items of the Medical Outcomes Study (MOS) SF-36 (the PF-10) and the Louisiana State University Health Status Instrument. Difficulty estimates for a subset of similar items from the two instruments correlated at .95, again indicating that the items from the two scales were working together to measure the same construct. McHorney and Cohen (2000) applied a two-parameter IRT model to 206 physical functioning items (through 71 common items across samples) and, in a similar study, linked 39 physical functioning items (through 16 common items) from three modules of the Asset and Health Dynamics Among the Oldest Old (AHEAD) study. Both studies demonstrated successful linking of item banks through sets of common items, allowing placement of all items on a common metric. Then in 2003, Jette et al. conducted a one-parameter Rasch partial credit analysis for the entire item pool of the FIM, MDS, OASIS (Outcome and Assessment Information Set for Home Health Care), and PF-10 (the physical functioning scale of the SF-36) items to develop an overall functional ability scale. These authors noted that the MDS instrument covered content from the middle portion of the functional ability continuum, with less content coverage on the low and high ends, while the FIM instrument covered a relatively small portion along the middle to upper end of this continuum.

The above studies represent encouraging evidence that while physical function is presently measured with many different instruments, it need not be tied to any particular instrument. These studies support the idea that common units can be used universally for the measurement of functional ability. As a result, the creation of a translation table between the FIM and the MDS should be possible for the measurement of functional ability in PAC settings.
Such a table would help avoid unnecessary

duplication of efforts when patients transfer from one PAC setting to another where different instruments are used to measure functional ability. When patients arrive at a new setting, their scores on the new instrument can be readily determined. The creation of a universal measure of functional ability in rehabilitation would create the following conditions:

It allows one to accurately and precisely evaluate the effectiveness of treatment procedures.
It allows one to accurately and precisely evaluate the effectiveness of the program, thus increasing the program's efficiency.
Measurements of progress are used to justify reimbursement for services.
(Merbitz, Morris & Grip, 1989)

In an unpublished study, Velozo (2004) performed a Rasch analysis to develop a conversion table that linked scores on the FIM to analogous scores on the MDS. The Rasch partial credit model in the Winsteps program (Linacre & Wright, 2001) was used to calibrate item difficulties based on the linked FIM and MDS scores. The development of this conversion table was based on a sample of 254 VHA patients who had completed both the FIM and the MDS within 7 days of one another. The decision to restrict the number of days between the completion of the FIM and MDS was based on the need to minimize the impact any possible change in the patient's condition would have on the scores. The use of 7 days as the criterion was based on the research lab's clinical judgment. The physical ability items from both instruments were placed on the same linear continuum and from this, a FIM-MDS conversion table was produced.

The purpose of this current study is to test the accuracy of the conversion table developed by Velozo (2004). A new sample of records from 2,297 patients with linked FIM and MDS scores collected between July 2003 and June 2004 was obtained from the VA's Austin

Automation Center databases. Using the conversion table developed by Velozo, the new FIM scores are converted to MDS scores, and new MDS scores to FIM scores. The converted-MDS (MDSc) scores are then statistically compared to the actual MDS scores and the converted-FIM (FIMc) scores to the actual FIM scores. From these comparisons, a determination of the accuracy of the FIM-MDS conversion table will be made.
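
The conversion step just described amounts to a table lookup. The following Python sketch illustrates the idea under stated assumptions: the rows are a small subset copied from the FIM-MDS conversion table as printed in Table 3-5, the function names are hypothetical, and returning every matching FIM score for a given MDS score is an illustrative choice, since the source does not state how ties were resolved when several FIM scores share one MDS score.

```python
# (FIM raw score, logit, MDS raw score) rows copied from Table 3-5.
# Only a handful of rows are included, so this lookup is deliberately incomplete.
CONVERSION_ROWS = [
    (13, -3.80, 52),
    (14, -2.77, 51),
    (15, -2.26, 51),
    (50, -0.15, 28),
    (90, 3.38, 0),
    (91, 4.52, 0),
]

# Each FIM raw score maps to exactly one MDS raw score.
FIM_TO_MDS = {fim: mds for fim, _logit, mds in CONVERSION_ROWS}

# One MDS raw score can correspond to several FIM raw scores.
MDS_TO_FIM = {}
for fim, _logit, mds in CONVERSION_ROWS:
    MDS_TO_FIM.setdefault(mds, []).append(fim)


def convert_fim_to_mds(fim_score):
    """Return the converted MDS score (MDSc) for an actual FIM score."""
    return FIM_TO_MDS[fim_score]


def convert_mds_to_fim(mds_score):
    """Return all FIM scores listed against this MDS score (assumed tie handling)."""
    return list(MDS_TO_FIM.get(mds_score, []))


print(convert_fim_to_mds(50))   # MDS score listed beside FIM 50
print(convert_mds_to_fim(51))   # FIM scores listed beside MDS 51
```

A full implementation would load every row of the table and would need an explicit rule for choosing among tied FIM scores.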

CHAPTER 3
METHODOLOGY

Introduction

The purpose of this study was to test the accuracy of a FIM-MDS conversion table that was designed to transform a score on the physical component of the FIM to its corresponding score on the MDS and vice versa. The methods utilized to investigate the accuracy of the FIM-MDS conversion table are described in the following sections of this chapter:

Source of the Data
Sample
FIM and MDS Motoric Items
Procedures Involved in the Creation of the FIM-MDS Conversion Table
Statistically Testing the Accuracy of the FIM-MDS Conversion Table

This study was approved by the University of Florida's Institutional Review Board for the protection of human subjects, as well as the Veterans Administration's Subcommittee on Clinical Investigations. This study also obtained a HIPAA Waiver of Authorization.

Source of the Data

The FSOD and MDS data reside in two separate databases at the VA's Austin Automation Center (AAC) (Veterans Health Administration, 2004). According to Dr. Hojlo, Chief of the VA Nursing Home Care, the most accurate VA-MDS data were available starting in June of 2002. Therefore, the data extractions were based on data collected from June 2002 to May 2003. Data from both databases were downloaded and merged on the basis of social security numbers, using the database software Microsoft Access.

A single-group design was used, as both inventories were completed by the same group of subjects. Because the same population completed both inventories, population invariance and symmetry exist. This eliminates concerns that might otherwise arise over differences caused by variability in the composition of the sample population.

Inclusion Criteria of Subjects

Subjects included in this study were those who were part of the VA's FSOD and MDS databases, who had FIM and MDS scores completed no more than seven days apart between June 2002 and May 2003, and who had data on all items included in the development of the FIM-MDS conversion table. The decision to restrict the amount of time that elapsed between the administration of the FIM and MDS was based on the need to minimize the impact that possible change(s) in a patient's condition might have on the resulting FIM and MDS scores.

Inclusion of Women and Minorities

Inclusion criteria were for males and females and all ethnic groups, as they occurred in the VA's FSOD and MDS databases.

Sample

The records of 57,237 patients who underwent a FIM evaluation and 69,954 subjects who had MDS scores in a VA post-acute care setting between June 2002 and May 2003 were obtained from two separate databases housed at the VA's AAC. The linking of these records, based on patient social security number and no more than 7 days between FIM and MDS test dates, resulted in 2,521 matches. These data were then cleaned to exclude duplicate records of the same subject with more than one match of test dates, as well as those records that included missing or invalid scores (i.e., ratings other than acceptable) for items on either the FIM or the MDS. This last exclusion was made to ensure that total

scores were compared to total scores, which is the basis of the FIM-MDS conversion table. The result was 2,297 unique subjects with linked FIM and MDS scores.

The age of subjects ranged from 19 to 89+ years. Of those subjects between the ages of 19 and 89, 50.7% were under the age of 70, 31.2% were in their 70s, and 15.2% were in their 80s. Only 1.5% of the sample was over the age of 89. The majority of the sample was Caucasian at 73%, with 20% being African American and 5% Hispanic. Ninety-six percent of the sample was male and 44% were married. The days between the administration of the FIM and the MDS ranged from 0 to 7 days with a mean of 5 (1.9) days. Thirty-five percent (1,531) of the subjects had a diagnosis of stroke, 23% (525) had lower extremity orthopedic problems, and 12% (271) had lower extremity amputations. The remainder of the sample consisted of subjects with a variety of impairments.

FIM and MDS Motor Items

Table 3-1 is a list and comparison of the FIM and MDS motor items included in this analysis. There are nine pairs of items between the FIM and MDS that are considered to represent the same or nearly the same activity in this study. These pairs include eating, grooming/personal hygiene, bathing, dressing, toileting, bowel management, bladder management, transferring, and walking (Table 3-1). Items included in only one instrument and not the other are bed mobility and stair use.

While both the FIM and MDS instruments include an item for eating, the FIM requires a higher skill level in order to achieve the highest rating. This is because the FIM does not permit finger feeding, nor does it allow people eating through adaptive means to achieve the highest rating. There are also similar grooming and hygiene items on the two instruments. The term "bathing" for both instruments connotes a full body bath, to

lower body was matched with the MDS item for dressing since, based on the lab's clinical judgment, dressing the lower body would be considered a more difficult task than dressing the upper body. This more difficult aspect of the ADL is incorporated in the one MDS item for dressing. The FIM item for toileting is matched with the MDS item for toilet use, as they have similar definitions, although the MDS includes toilet transfer in the task while the FIM has a separate item for transfer. The MDS item for transfers is then matched with the FIM item for transfers: bed, chair, and wheelchair. The bowel and bladder control items on the FIM are matched with the bowel and bladder continence items on the MDS.

The FIM item for walk/wheelchair addresses one's ability to walk or use a wheelchair safely on a level surface, while the MDS has four items for walking: walk in room, walk in corridor, locomotion on unit, and locomotion off unit. Although not included in the definition of the FIM item for walk/wheelchair, 150 feet is specified as the performance criterion in the clarification of the rating scale (Rogers, Gwinn & Holm, 2001, p. 6). Therefore, the FIM item for walk/wheelchair was matched to the MDS item for locomotion off unit. Furthermore, the FIM incorporates safety into the definition of many of its items, such as grooming, bathing, dressing, transfers, toileting skills, walking, and wheelchair mobility, while the MDS does not (Rogers, Gwinn & Holm, 2001).

The FIM and MDS have different response scales on which the physical functioning items are scored. A clear distinction in the administration of these two assessments is that the items of the FIM are scored at the time of the assessment, while ratings on the MDS are based on observed performance over a 7-day period. Furthermore, the FIM items have seven response levels, while the MDS has a range of

five. The FIM scoring criteria are shown in Table 3-2. The MDS scoring criteria are shown in Table 3-3. While the FIM motor items assess the percent of effort that is provided to the patient to accomplish a task, the MDS measures the number of times during a 7-day time period a patient required a certain level of assistance to perform a task.

Table 3-2. FIM scoring criteria

7  Complete Independence. All of the tasks described as making up the activity are typically performed safely, without modification, assistive devices, or aid and within a reasonable amount of time.
6  Modified Independence. One or more of the following may be true: the activity requires an assistive device; the activity takes more than reasonable time, or there are safety (risk) considerations.
5  Supervision (Standby Prompting). Supervision or setup. Subject requires no more help than standby, cuing or coaxing, without physical contact, or, helper sets up needed items or applies orthoses.
4  Minimal Assistance (Minimal Prompting). Subject requires no more help than touching, and expends 75% or more of the effort.
3  Moderate Assistance (Moderate Prompting). Subject requires more help than touching, or expends half (50%) or more (up to 75%) of the effort.
2  Maximal Assistance (Maximal Prompting). Subject expends less than 50% of the effort, but at least 25%.
1  Total Assistance. Subject expends less than 25% of the effort.

(Evans, 2002)

Table 3-3. MDS scoring criteria

0  Independence. No help or staff oversight OR staff help/oversight provided only one or two times during the last seven days.
1  Supervision. Oversight, encouragement, or cueing provided three or more times during last 7 days OR supervision (3 or more times) plus physical assistance provided, but only one or two times during the last 7 days.
2  Limited Assistance. Resident highly involved in activity, received physical help in guided maneuvering of limbs or other nonweight-bearing assistance on three or more occasions OR limited assistance (3 or more times), plus one weight-bearing support provided, but for only one or two times during the last 7 days.
3  Extensive Assistance. While the resident performed part of activity over last seven days, help of following type(s) was performed three or more times:
   --Weight-bearing support provided three or more times;
   --Full staff performance of activity (3 or more times) during part (but not all) of last 7 days.
4  Total Dependence. Full staff performance of the activity during the entire 7-day period. There is complete nonparticipation by the resident in all aspects of the ADL definition task. If staff performed the activity for the resident during the entire observation period, but the resident performed part of the activity himself/herself, it would not be coded as a 4 (Total Dependence).

(CMS, 2003)

Procedures Involved in the Creation of the FIM-MDS Conversion Table

For the purposes of creating a FIM-MDS conversion table, Velozo (2004) obtained linked FIM and MDS scores from the records of 254 subjects. The linking of instruments using IRT methodologies is generally dependent on item calibrations, which are the difficulty measures of the items. In essence, item calibrations serve as the markings on the conversion ruler. Rasch analysis of the FIM and MDS converts a patient's responses on the instrument items to a measure of ADL/motoric function.

Prior to performing the Rasch analysis, several steps were taken so that the FIM and MDS rating scales were conceptually consistent. One inconsistency between the FIM and MDS is that the MDS includes a rating for "activity did not occur." Using a procedure adapted by Jette, Haley, and Ni (2003), this MDS rating was recoded as part of the total dependence rating. The rationale underlying this decision was that a likely explanation for an activity not occurring was that the activity could not be performed (Buchanan, Andres, Haley, Paddock & Zaslavsky, 2002; Jette et al., 2003).

Other inconsistencies between the FIM and MDS are that the rating scales progress in different directions and have different ranges (i.e., from 1 to 7 for the FIM and from 4 to 0 for the MDS). In order to adjust for these differences, the MDS scale was rescored and rescaled to match the rating scale used in the FIM. For example, a 4 on the MDS, which represents Total Dependence, was recoded as a 1 to match Total Assistance on the FIM, and a 0 on the MDS, which represents Independence, was recoded to a 7 to match the rating for Complete Independence on the FIM (Table 3-4).

Table 3-4. FIM-MDS score conversion

MDS                     Score Conversion    FIM
Independent             0 to 7              Complete Independence
Supervision             1 to 5              Supervision
Limited Assistance      2 to 4              Minimal Assistance
Extensive Assistance    3 to 2              Maximal Assistance
Total Dependence        4 to 1              Total Assistance

Following the rescoring of the MDS rating scale, the next step in creating the FIM-MDS conversion table was to run a Rasch partial credit model analysis on the linked FIM and MDS scores, using Winsteps (Linacre & Wright, 2000). This combined analysis placed the FIM and MDS items and rating-scale calibrations on the same linear scale with the same local origin. That is, FIM item and rating-scale calibrations became

38 linked to MDS item and rating-scale calibrations. This also provided cocalibrated item and rating-scale values, which were then used as anchors in separate FIM and MDS analyses. Both the anchored FIM and MDS analyses generated output tables that associated total FIM and MDS raw scores with a common logit scale. These analyses resulted in a conversion table whereby total FIM raw scores could be translated into total MDS raw scores and vice versa (Table 3-4). Converting a score from the FIM to the MDS and vice versa is as simple as locating a score under either the FIM or MDS column (Table 3-5), reading across the adjacent logits column to find the equivalent score on the alternate instrument. It is hypothesized that a score of 16 on the MDS, represents the same amount of functional independence as a score of 39 on the FIM. Similarly, a score of 58 on the FIM represents the same amount of functional independence as does an MDS score of 29 (Table 3-4). Nine of the FIM items have corresponding items on the MDS that address similar areas of physical functioning. These items include eating, grooming/personal hygiene, bathing, dressing, toileting, bowel management, bladder management, transferring, and walking. After performing the Rasch analysis on the FIM and MDS total scores to link the two measures, the resulting correlation between the similar FIM and MDS items was .822 at p

Table 3-5. FIM-MDS conversion table

FIM   logit   MDS     FIM   logit   MDS     FIM   logit   MDS
13    -3.80   52      39    -0.53   36      65    0.41    18
14    -2.77   51      40    -0.49   35      66    0.46    17
15    -2.26   51      41    -0.46   34      67    0.50    16
16    -1.99   50      42    -0.42   33      68    0.55    15
17    -1.81   50      43    -0.39   33      69    0.60    15
18    -1.67   49      44    -0.35   32      70    0.64    14
19    -1.56   48      45    -0.32   32      71    0.69    13
20    -1.46   48      46    -0.29   31      72    0.75    12
21    -1.38   47      47    -0.25   30      73    0.80    11
22    -1.31   47      48    -0.22   30      74    0.86    11
23    -1.24   46      49    -0.18   29      75    0.92    10
24    -1.18   45      50    -0.15   28      76    0.98    9
25    -1.12   44      51    -0.12   28      77    1.04    9
26    -1.07   44      52    -0.08   27      78    1.11    8
27    -1.02   43      53    -0.05   27      79    1.19    7
28    -0.97   42      54    -0.01   26      80    1.27    6
29    -0.92   42      55    0.02    25      81    1.35    6
30    -0.88   41      56    0.06    24      82    1.45    5
31    -0.83   40      57    0.10    24      83    1.55    5
32    -0.79   40      58    0.13    23      84    1.67    4
33    -0.75   39      59    0.17    22      85    1.80    3
34    -0.71   39      60    0.21    21      86    1.96    3
35    -0.67   38      61    0.25    21      87    2.15    2
36    -0.64   37      62    0.29    20      88    2.40    2
37    -0.60   37      63    0.33    19      89    2.76    1
38    -0.56   36      64    0.37    18      90    3.38    0
                                            91    4.52    0

the FIM-MDS conversion table, the same adjustments made to the MDS rating scale when creating the conversion table were applied to the current dataset. Using the Statistical Package for Social Sciences (SPSS), version 12.0 for Windows, the MDS rating for "activity did not occur" was recoded as total dependence following the scoring protocol of the FIM. Then, the MDS rating scale was also recoded and rescaled to match the rating scale of the FIM. The FIM-MDS conversion table was then used to convert the second sample of actual FIM scores, designated as FIMa, to converted MDS

scores, designated MDSc. In the same manner, MDSa scores were converted to FIMc scores. Then, the actual scores on the FIM and MDS were compared to their corresponding converted scores to determine how similar the actual and converted scores were.

The goal of equivalence testing is to demonstrate that two or more conditions are statistically the same (Stegner, Bostrom & Greenfield, 1996). In this type of testing, one reverses the roles of the null and alternative hypotheses and then, by testing a set of these reversed hypotheses, demonstrates equivalence at a predetermined significance level, just as when demonstrating a difference between groups (Stegner et al.). The equivalence methodology is a simple application of bioequivalence principles recently proposed in pharmacokinetics and biopharmaceutics. The idea is to "prove" statistically that two drugs or formulations are equivalent (Berger & Hsu, 1996; U.S. FDA, 1997, 1999). This methodology was used in the current study in order to compare the statistical equivalence of FIMa and FIMc scores, as well as MDSa and MDSc scores.

It was hypothesized that a minimum of 75% of the actual and converted scores should be within 5 points of one another in order for the conversion to be considered accurate. If fewer than 75% met this criterion, then the conversion table would not be considered accurate enough for use in a clinical setting. A difference of 5 points was employed, as Forrest, Schwam, and Cohen (2002) found that "each 5-point decrement in the FIM score correlated with the need for about one hour per day of help in mobility, basic activities of daily living, and instrumental activities of daily living" (p. 57). Yet, Granger et al. (1993) indicated that while no recommendations existed for what constituted a clinically significant change on the FIM, a 10-point improvement decreased by almost 50% the time required to care for

a group of stroke patients in the community. For this study, a clinically significant difference in scores was set at the more rigorous 5-point increment.

In order to apply a more precise analysis to the determination of the level of accuracy of the FIM-MDS conversion table, techniques described in Dorans and Lawrence (1990) were utilized. In their study, Dorans and Lawrence tested the score equivalence of nearly identical editions of the Scholastic Aptitude Test (SAT). These two versions comprised the same test questions, but the order in which the sections were administered differed. In one test situation, the 40-item verbal section of the SAT might precede the 45-item verbal section. In the other situation, the 45-item verbal section might precede the 40-item verbal section. The test was administered to what were presumed to be statistically equivalent groups of examinees. Linear equating techniques were used to equate one version of the test to the other version, and the accuracy testing of the resulting scores was accomplished by checking whether the identity transformation fell within a reasonable confidence interval placed around that equating function (Dorans & Lawrence). The difference between the equating function and the identity transformation was calculated and then that difference was divided by the standard error of the equating function. If the resulting ratio fell within a bandwidth of plus or minus two, then the equating function was considered to be within sampling error of the identity function.

Kolen and Brennan (1995) indicated that in order for equating of two measures to be successful, the four moments of the distribution should be statistically equivalent. Therefore, the four moments of the distribution (the mean, standard deviation, skewness, and kurtosis) were also calculated and compared.
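
The two checks described above, the share of actual and converted scores lying within a fixed number of points of one another and the four moments of a score distribution, can be sketched in Python as follows. This is an illustrative sketch only: the function names and the sample scores are hypothetical, and the skewness and kurtosis use simple population formulas, which are an assumption here and will differ slightly from the bias-adjusted estimates reported by SPSS.

```python
from statistics import mean, pstdev


def proportion_within(actual, converted, tolerance=5):
    """Share of score pairs whose absolute difference is at most `tolerance`."""
    pairs = list(zip(actual, converted))
    hits = sum(1 for a, c in pairs if abs(a - c) <= tolerance)
    return hits / len(pairs)


def four_moments(scores):
    """Mean, standard deviation, skewness, and excess kurtosis (population formulas)."""
    n = len(scores)
    m = mean(scores)
    sd = pstdev(scores)
    skewness = sum((x - m) ** 3 for x in scores) / (n * sd ** 3)
    excess_kurtosis = sum((x - m) ** 4 for x in scores) / (n * sd ** 4) - 3.0
    return {"mean": m, "sd": sd, "skewness": skewness, "excess_kurtosis": excess_kurtosis}


# Hypothetical actual and converted scores for four subjects.
actual_scores = [10, 20, 30, 40]
converted_scores = [12, 26, 30, 50]

share = proportion_within(actual_scores, converted_scores, tolerance=5)
print(f"{share:.0%} of pairs within 5 points; 75% criterion met: {share >= 0.75}")
print(four_moments(actual_scores))
```

Passing `tolerance=10` gives the more lenient comparison, and calling `four_moments` on the actual and converted score lists separately yields the values to be compared across distributions.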

Correlations between the actual and converted scores on both the FIM and MDS were also determined, as were effect sizes. Effect sizes give a clear indication of the amount of difference that exists between two score distributions. The standard of effect sizes as noted in Cohen (1988) was used to determine the percent of overlap that existed between the converted and actual scores (Table 3-6).

Table 3-6. Effect size

Cohen's Standard    Effect Size    Percentile Standing    Percent of Overlap
Large               0.8            79                     52.6%
                    0.7            76                     57.0%
                    0.6            73                     61.8%
Medium              0.5            69                     69.0%
                    0.4            66                     72.6%
                    0.3            62                     78.7%
Small               0.2            58                     85.3%
                    0.1            54                     92.3%
                    0.0            50                     100.0%

(Adapted from Cohen, 1988)

Additionally, an analysis of the data was performed to determine how similar the raw scores on similar FIM and MDS items were. A factor that might negatively impact the accuracy of the conversion table would be the presence of large differences in the scores for individuals on similar FIM and MDS items. In order to systematically test for the level of disparate scores present in the current dataset, the ratings on the MDS were converted to their closest corresponding ratings on the FIM, as shown in Table 3-7.

Table 3-7. Rating scale conversion

MDS                     Score Conversion    FIM
Independent             0 to 7              Complete Independence
Supervision             1 to 5              Supervision
Limited Assistance      2 to 4              Minimal Assistance
Extensive Assistance    3 to 2              Maximal Assistance
Total Dependence        4 to 1              Total Assistance

For example, the score for independence on the MDS, 0, was converted to a 7 to correspond with the FIM score for complete independence. Similarly, an MDS score of 3, indicating extensive assistance, was converted to a score of 2 for maximal assistance. This rating conversion was accomplished on the nine FIM and nine MDS items that most closely matched (Table 3-8).

Table 3-8. Similar FIM and MDS items

FIM items                            MDS items
Eating                               Eating
Grooming                             Personal hygiene
Bathing                              Bathing
Dressing-lower body                  Dressing
Toileting                            Toilet use
Bladder management                   Bladder continence
Bowel management                     Bowel continence
Bed, chair, wheelchair (transfer)    Transfer
Walk/wheelchair                      Locomotion off unit

It was hypothesized that a difference of four points on the rating scale represented an important difference in a person's functional ability. For instance, a score of 1 on the FIM indicates Total Assistance, where the patient exerts less than 25% of the effort required in performing a task. A distance of four points away from the rating of 1 would be a score of 5, which represents Supervision, indicating that the subject requires no more help than standby assistance to complete a task. The criterion of examining only matched scores with a difference of four or more points is rigorous in that this situation only occurs between the following three score categories: 1-5; 1-7; and 2-7. Selecting only those subjects who had a difference of four or more points between scores on similar items isolates ratings that are clinically different on those items.
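
The rating conversion and the four-point discrepancy check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the mapping follows Table 3-7, while the function name, the item keys, and the example ratings are hypothetical. The recoding of the MDS "activity did not occur" rating is omitted because its numeric code is not given in the text.

```python
# MDS rating -> closest corresponding FIM rating, following Table 3-7.
MDS_TO_FIM_RATING = {0: 7, 1: 5, 2: 4, 3: 2, 4: 1}


def count_large_discrepancies(fim_ratings, mds_ratings, threshold=4):
    """Count matched items whose FIM and recoded MDS ratings differ by >= threshold.

    Both arguments are dicts keyed by a shared (hypothetical) item name.
    """
    count = 0
    for item, fim_rating in fim_ratings.items():
        recoded_mds = MDS_TO_FIM_RATING[mds_ratings[item]]
        if abs(fim_rating - recoded_mds) >= threshold:
            count += 1
    return count


# Hypothetical ratings for one subject on three of the nine matched items.
fim_example = {"eating": 7, "bathing": 1, "transfer": 4}
mds_example = {"eating": 4, "bathing": 0, "transfer": 2}

print(count_large_discrepancies(fim_example, mds_example))
```

Applying this count to every subject would identify those with at least one large discrepancy, or with large discrepancies on several of the nine matched items.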

CHAPTER 4
RESULTS

Statistical Analyses

The FIM-MDS conversion table was used to transform FIMa scores, obtained from the records of 2,297 subjects, to MDSc scores and MDSa scores to FIMc scores. The converted scores were then analyzed statistically, first at the individual level and then at the group level, as a means of determining the level of accuracy of the FIM-MDS conversion table. In order for the conversion table to be considered accurate for use in clinical settings, it was hypothesized that 75% of the actual and converted scores on the FIM and the MDS should be no more than five points apart.

Next, techniques used by Dorans and Lawrence (1990) to test the accuracy of an attempt to equate nearly identical editions of the Scholastic Aptitude Test (SAT) were applied to this dataset. These procedures were used to determine whether the converted scores on the FIM and MDS fell within a reasonable confidence interval placed around the actual FIM and MDS scores. The difference between the actual and converted scores was calculated and then that difference was divided by the standard error of the actual scores. If the resulting ratio fell within a bandwidth of plus or minus two, then the converted scores were considered to be within sampling error of the actual scores.

When testing the accuracy of the FIM-MDS conversion table from the perspective of group scores, the four moments of the distributions (i.e., the means, standard deviations, skewness, and kurtosis) were compared for equivalence. Next, the

amount of overlap that existed between the actual and converted scores on the FIM and MDS was determined by calculating effect sizes. Correlations between the actual and converted scores on the FIM and MDS were also calculated, and those results are also displayed graphically in scatter plots.

Statistical Results at the Level of the Individual

Of the 2,297 subjects analyzed, 25% (574) of the sample had FIMa and FIMc scores that fell within five points of one another, with the difference in scores ranging from 0 to 71 points (Figure 4-1). For the MDS, 37% (850) of the sample had actual and converted scores within five points of one another, with the difference in scores ranging from 0 to 48 (Figure 4-2). These percentages fell well short of the 75% standard for both the FIM and the MDS.

For comparison purposes, the percentage of subjects with actual and converted FIM and MDS scores within 10 points of one another was also calculated. On the FIM, 45.5% (1,045) of the subjects had actual and converted scores within 10 points, and 64% (1,470) of the subjects had MDS scores within 10 points. Even when this more lenient criterion was used, the results continued to fall short of the 75% standard.

The equivalence of the actual and converted FIM scores and the actual and converted MDS scores was also evaluated using the test of equivalence employed by Dorans and Lawrence (1990). For the FIMa vs. FIMc scores, 8.4% of the conversions met the criterion for equivalence, and for the MDSa and MDSc scores, 6.4% met this criterion.

Statistical Results at the Group Level

Cooper (1989) concluded that in order for the equating of scores to be successful, all four moments of the distribution should be similar. The four moments of the

distribution, the mean, standard deviation, skewness, and kurtosis of the actual and converted scores on the FIM and MDS are displayed in Table 4-1.

Table 4-1. Four moments of the distributions

                          FIMa     FIMc     MDSa     MDSc
N Valid                   2297     2297     2297     2297
N Missing                 0        0        0        0
Mean                      52.37    60.35    31.86    25.86
Std. Error of Mean        .432     .409     .280     .295
Median                    54.00    62.00    33.00    26.00
Std. Deviation            20.69    19.60    13.43    14.12
Skewness                  -.143    -.388    -.406    -.041
Std. Error of Skewness    .051     .051     .051     .051
Kurtosis                  -.855    -.573    -.630    -.901
Std. Error of Kurtosis    .102     .102     .102     .102

The mean of the FIMc was eight points greater than the mean of the FIMa, while the mean of the MDSa exceeded the mean of the MDSc by six points. The standard deviations of the actual and converted FIM scores (20.69 and 19.60) differed by approximately one point, and those of the two MDS scores differed by less than one point (13.43 for the MDSa and 14.12 for the MDSc). Since the means of the FIMa and FIMc differed by only eight points, these scores fell well within one standard deviation of each other. Similarly, the means of the actual and converted MDS differed by six points and also fell within one standard deviation of each other. Therefore, the first two moments of the distributions, the mean and standard deviation, were equivalent.

Since the mean score can be highly influenced by outliers, the median scores for the distributions were also reported. The difference in the medians of the FIMa and FIMc score distributions was eight points, just as it had been with the difference in the means. The medians of the MDSa and MDSc differed by seven points. The medians for all four of the score distributions exceeded their respective means, indicating the presence of a negative skew to the score distributions. Normal distributions produce a

PAGE 56

48 skewness statistic of about zero. A skewness or kurtosis value of two standard errors or greater, regardless of sign, likely deviates from a normal score distribution to a significant degree (Brown, 1997). Two times th e standard error of skewness for all four distributions was .051 and two times the standard error of kurtosis was .102, again for all four score distributions. Therefore, th e distributions of the actual FIM and MDS instruments had a significant negative skew, indicating that the subjects measured on these two inventories were generally more able than the tests were difficult. The distribution of the FIMc was also negatively skewed to a significant degree, yet the MDSc did not have a significant skew value and the distribution would be considered symmetrical in this regard. The distributions of both the actual and converted scores on the FIM and MDS all had negative kurtosis values, indicating that these distributions were flatter than what one would expect a nd differed from normal to a significant degree. The results of this conversion revealed a substantial overlap between the distributions of the actual and converted FIM scores (Figures 4-3 and 4-4), as well as between the actual and converted MDS scores (F igure 4-5 and 4-6). An effect size of .2 demonstrated an 85.3% overlap of the distributions for the actual and converted scores in each case. Correlations between the actual and converted scores were calculated and revealed a .724 correlation at p
PAGE 57

Discrepancies in the Dataset

Of the 2,297 subjects in the dataset used in this study, 51% (1,163) had at least one similar item on which the FIM and MDS scores differed by four or more rating points. Three percent (76) of the subjects had four or more of the nine similar items with score differences of four or more rating points. Two percent (2) of the 109 subjects with FIM and MDS scores recorded on the same day had more than four similar items with score differences of four or more rating points.
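Two of the checks reported in this chapter lend themselves to a short illustration: the share of subjects whose actual and converted scores fall within a fixed number of points of one another, and the two-standard-error rule for skewness and kurtosis (Brown, 1997). The following Python sketch is offered only as an illustration under stated assumptions. The function names `share_within` and `deviates_from_normal` are hypothetical and are not taken from the study, and the toy score lists are invented for demonstration. The skewness, kurtosis, and standard error values are those reported in Table 4-1.

```python
from typing import Sequence


def share_within(actual: Sequence[float], converted: Sequence[float],
                 tolerance: float) -> float:
    """Return the proportion of subjects whose actual and converted
    scores differ by no more than `tolerance` points."""
    if len(actual) != len(converted) or not actual:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(1 for a, c in zip(actual, converted) if abs(a - c) <= tolerance)
    return hits / len(actual)


def deviates_from_normal(statistic: float, std_error: float) -> bool:
    """Two-standard-error rule: flag a skewness or kurtosis value whose
    magnitude is at least twice its standard error, regardless of sign."""
    return abs(statistic) >= 2 * std_error


# Skewness and kurtosis values as reported in Table 4-1.
SKEWNESS = {"FIMa": -0.143, "FIMc": -0.388, "MDSa": -0.406, "MDSc": -0.041}
KURTOSIS = {"FIMa": -0.855, "FIMc": -0.573, "MDSa": -0.630, "MDSc": -0.901}
SE_SKEWNESS = 0.051
SE_KURTOSIS = 0.102

skew_flags = {name: deviates_from_normal(value, SE_SKEWNESS)
              for name, value in SKEWNESS.items()}
kurt_flags = {name: deviates_from_normal(value, SE_KURTOSIS)
              for name, value in KURTOSIS.items()}

# Invented scores for illustration only (NOT study data).
toy_actual = [40, 55, 62, 30, 71]
toy_converted = [44, 66, 60, 52, 73]
toy_share = share_within(toy_actual, toy_converted, tolerance=5)

print(skew_flags)
print(kurt_flags)
print(toy_share)
```

Applied to the Table 4-1 values, the rule flags the skewness of the FIMa, FIMc, and MDSa but not the MDSc, and flags the kurtosis of all four distributions, which agrees with the interpretation given in the text. In the toy data, three of the five pairs differ by five points or fewer, so the share is 0.6; the same function applied with a tolerance of 10 corresponds to the more lenient criterion.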


CHAPTER 5
DISCUSSION

Summary of Results

The results of this study testing the accuracy of the FIM-MDS conversion table were mixed: statistics at the group level tended to support the accuracy of the conversion, while those at the individual level did not. The findings that support a conclusion of equivalency between the actual and converted scores on both the FIM and MDS included an 85.3% overlap between the respective score distributions, as well as a correlation of .724 for the FIM and .745 for the MDS. The means of the actual and converted FIM scores were well within one standard deviation of each other, as were the means of the actual and converted MDS scores. The results of the two one-sided test procedures supported this conclusion of equivalency between the respective FIM and MDS means, as well.

On the other hand, only 25% (574) of the sample had FIMa and FIMc scores within 5 points of one another, and 37% (850) had MDSa and MDSc scores within that range. Those percentages fell well short of the hypothesized 75% of the sample having actual and converted scores that were no more than 5 points apart. If the standard for an accurate conversion system were lowered to allow for up to a 10-point difference between actual and converted scores, then 45% (1,045) of the sample would have FIM scores and 64% (1,470) would have MDS scores within that range. While these percentages were closer to the 75% criterion, they still fell short.

The presence of a negative skew for all four of the score distributions was an indication that the subjects' ability levels were higher than the difficulty levels of the inventories. All of the score distributions except for the MDSc demonstrated a skewness value that resulted in a significant departure from normal. The level of kurtosis for all four distributions also deviated from normal to a significant degree. When using the very rigorous procedures described by Dorans and Lawrence (1990) to determine equivalency, only 8.4% of the scores on the FIMa and FIMc and 6.4% of the MDSa and MDSc scores met this criterion. A ceiling effect in the distribution of the FIMc and the MDSa scores was present. Submitting the FIM and MDS scores to the FIM-MDS conversion table resulted in an inflation of FIM scores and a deflation of MDS scores.

Taking all of the above information into consideration, the FIM-MDS conversion table passed less stringent standards for equivalency, generally at the group level, but failed when focusing on statistics at the individual level. Since other attempts at creating conversion tables between instruments used in rehabilitation have not gone further to test the accuracy of those conversions, no direct comparisons to other research findings can be made.

An understanding of the psychometrics of the FIM-MDS conversion table is important when interpreting the results of this study. The conversion table was developed from a dataset of 253 subjects with FIM and MDS scores occurring within 7 days of one another. Similar FIM and MDS items used in the development of the conversion table had a correlation of .82 at p

table for similar item and similar person measures may be explained by differences in the design between this and previous studies. Fisher et al. used a convenience sample of 285 patients who were waiting for appointments in a public hospital general medicine clinic. These patients were asked to complete the PF10 and the PFS inventories while they waited to see a doctor and, therefore, no opportunity existed for a change in physical condition to take place between the completion of the two surveys. There was also no possibility for different raters to score the two instruments on the same subject since the rater in both cases was the patient. Fisher et al. also removed the least consistent cases from the analysis, meaning cases with the highest outfit statistics were not included in creating the conversion table.

The correlations found in the current study were also not as strong as those reported by Fisher, Harvey, Taylor, et al. (1995), who obtained a correlation of similar person measures of .91 between the instruments used in the development of the Rehabits translation scale. The stronger correlation obtained by Fisher, Harvey, Taylor, et al. may be the result of differences in the design of the study, as compared to the current one. Fisher and colleagues used the results of 54 consecutive patients admitted to a freestanding rehabilitation hospital, who were rated on both the FIM and the PECS at admission and discharge. Thus, rater variability was controlled, and there was no possibility for physical changes to take place in the patient's condition between the administrations of the two inventories upon admission and then again upon discharge because the inventories in each case were completed at the same time.

Studies by both Fisher, Harvey, Taylor, et al. (1995) and Fisher et al. (1997) that used cocalibration equating measures to link healthcare instruments did not test the correlations between actual and converted scores.
Therefore, it is difficult to clearly define the significance of a correlation of .72 for the FIMa and FIMc and a correlation of .75 for the MDSa and MDSc. If this were a classical test-retest reliability study, these correlations would be considered low. The question left for the current study is whether better results could be obtained by creating a conversion system based on more accurate data. Yet, an attempt to use more controlled data collection methodologies would limit the applicability of the study in clinical settings.

The limited accuracy of the FIM-MDS conversion table may be a result of problems with the sample upon which it was based. This conversion system was developed from a dataset of 253 subjects with FIM and MDS scores occurring within seven days of one another. It may be that a larger dataset is necessary to create a highly accurate conversion system. Furthermore, a significant limitation to the dataset used in this study is the presence of scores on similar FIM and MDS items for the same subject that differ markedly.

It can be argued that differences in the definitions of similar items between the FIM and MDS led to variability in scores. For example, the definition of eating on the MDS focuses on the intake of nourishment by any means, regardless of skill, and includes the use of alternate forms of obtaining nourishment, such as tube feeding (Rogers et al., 2001, pp. 7-8). Yet, eating on the FIM is limited to the use of suitable utensils to bring the food to the mouth (Rogers et al.). The MDS item for bathing includes bathtub or shower transfers, while the FIM bathing item does not. Likewise, the MDS toileting item includes the ability to transfer to and from the toilet, while the FIM item for toileting does not. Furthermore, the FIM incorporates safety into the definition of many of its items, such as grooming, bathing, dressing, transfers, toileting skills, walking and wheelchair mobility, while the MDS does not (Rogers et al.).
Thus, one person could conceivably obtain very different scores on similar FIM and MDS items.

Alternatively, the fact that a difference of up to 7 days between the FIM and MDS assessments was allowed may have also had an impact on the discrepancies that were seen between similar items for the same subject. While the decision to restrict the number of days between the completion of the FIM and MDS was based on the need to minimize the impact any possible change in the patient's condition would have on their scores, it is likely that even this restriction did not go far enough to eliminate this source of error in the study. Thus, some of the discrepancies in scores that were seen between similar items may be due to a change in the subject's physical condition.

A related explanation for the problems encountered in developing a highly accurate conversion table is the existence of discrepancies between similar categories in the rating scales of the FIM and the MDS. A FIM score of "1" refers to "Total Assistance," in which the patient puts forth less than 25% of the effort necessary to perform a task. The corresponding score on the MDS is a 4 for "Total Dependence." This score is defined as full staff performance of the activity during the entire 7-day period, with complete nonparticipation by the resident in all aspects of the ADL definition task. A noticeable difference exists between these two rating definitions. On the FIM, the patient may exert up to a quarter of the effort needed to perform the task, while on the MDS the patient does not participate at all in the activity. Instead, the staff is required to perform the full activity. It may reasonably be argued that there is a clinically significant difference between a patient who can exert up to 25% of the effort required to complete a task and one who cannot exert any effort at all.
Similarly, a score of "2" on the FIM refers to "Maximal Assistance," in which the patient puts forth less than 50% of the effort necessary to do a task, but at least 25% of the effort. The corresponding score on the MDS of 1, or "Supervision," is defined as oversight, encouragement, or cuing provided three or more times during the last 7 days, OR supervision (three or more times) plus physical assistance provided only one or two times during the last 7 days. Once again, a significant difference in meaning is evident between these two categories.

A further limitation of this study is that there may be a selection bias, in that patients who have scores on both the FIM and MDS may not be typical of the patient population in general. Additionally, it is likely that characteristics of this veteran population may not be representative of the population for whom the FIM and MDS are intended. Most notably, 98% of the study population is male. Yet, the great majority of people in nursing homes, where the MDS is used, are female. Similarly, the ethnic makeup of this study population is also likely not an accurate reflection of the makeup of the population who most frequently use the FIM and MDS. The unique aspects of the subjects in this study may limit the generalizability of the findings.

Implications for Future Research

The results of this study lead one to consider other possible situations in which the linking of instruments may be effective. There is evidence to suggest that the creation of a conversion table based on two self-report instruments would have a higher degree of accuracy (Fisher et al., 1997). One possible reason for the higher degree of accuracy is that rater bias is not an issue, as the same rater (i.e., the subject or their proxy) would complete both instruments. Additionally, the research study could be set up so that the subject completed both instruments at the same time. In this manner, it would not be possible for a change in the subject's physical condition to take place, and it is hypothesized that it would be unlikely for a subject to interpret similar items between instruments differently.
Thus, these two sources of error, rater bias and the possibility that there has been a change in the subject's physical condition between the administrations of the two instruments, would be eliminated and the conditions required for the creation of a conversion table would be optimized. The use of self-reports in a clinical setting and/or in a research study would also have an economic advantage, as it would reduce the amount of time a trained therapist would need to be involved in the task.

Conclusion

Maximizing outcomes in rehabilitation, while streamlining the process of providing highly effective and coordinated services, will continue to be a goal of rehabilitation for years to come. Efforts to increase the continuity of care between PAC settings and to improve the effectiveness of rehabilitation services will be pursued on all levels. This research focused on determining the accuracy of one such effort, namely a means of creating an easily implemented and highly effective tool for converting the score from the physical component items on the FIM to those on the MDS and vice versa. The results of this study suggest that scores derived from the FIM-MDS conversion table should, at best, only be considered as rough estimates of similar scores on the two instruments.

At the conclusion of this study, the question still remains as to whether the FIM and MDS instruments can measure physical functioning on a common unit of measurement and whether a highly accurate conversion table can be developed so that a patient's gains in physical functioning can be tracked from inpatient rehabilitation settings to skilled nursing facilities. It may be that pursuing research in alternative directions, such as using these linking techniques to create a conversion table between self-report instruments of functional ability, will provide a solution.

BIOGRAPHICAL SKETCH

Katherine L. Byers, MHS, CRC, CVE, is a doctoral candidate in the rehabilitation science doctoral (RSD) program at the University of Florida, College of Public Health and Health Professions. Ms. Byers received a Bachelor of Arts degree in behavioral sciences from Rice University in Houston, Texas in 1989. She then completed a Master of Health Science (MHS) degree in rehabilitation counseling at the University of Florida in 1991 and subsequently obtained certifications as both a rehabilitation counselor and as a vocational evaluator. Over a period of 9 years, Ms. Byers worked in positions of increasing responsibility in the field of rehabilitation before entering the University of Florida's rehabilitation science doctoral program in January of 2000. While completing the requirements of the degree, Ms. Byers was employed as a research assistant and then as a program coordinator for Dr. Craig Velozo, an assistant professor in the Department of Occupational Therapy at the University of Florida. Accomplishments during Ms. Byers' doctoral career include winning the 2002 John Muthard Research Award from the University of Florida's College of Health Professions, Department of Rehabilitation Counseling. She also was selected to make a poster presentation at the Third National Rehabilitation Research and Development Meeting in Washington, DC, in 2002, and at the 2004 ACRM-ASNR Joint Conference.