The Feasibility of Using Electronic Health Data for Research on Small Populations.
Part II: The Potential Use of Electronic Health Records and Other Electronic Health Data to Improve Research on the Health and Health Care of Small Populations

Patients’ health records and other electronic health information are an essential part of care, documenting critical issues such as their history, preventive care, diagnostic tests, and diagnoses and treatments over time. Health records also facilitate information sharing among physicians, other health professionals, and provider organizations that may be involved in a patient’s care. Containing key information regardless of where and from whom the patient receives care, health records can also be fairly comprehensive as well as longitudinal. Comprehensive integrated health records support the continuity and timeliness of care, which can in turn represent higher quality and less costly care.

Given the rich information contained in health records, much medical and health services research has been based on them, solely or in combination with other types of data (e.g., survey, claims). However, the traditional medium (i.e., paper and pen) in which health records have been created as well as organized and managed (i.e., paper file folders in a filing cabinet) has limited their usefulness for research. The manual process of identifying and obtaining the relevant records from one or more providers, abstracting the information contained in them, and creating a database for analysis is time-consuming, expensive, and fraught with potential errors and problems.206

The increased adoption and use of electronic health records (EHRs) and other forms of electronic health information have the potential to revolutionize research, overcoming many historical constraints. The new medium (electronic) in which health records are created, organized, and managed (computer hardware and software) results in “big data” (a lot of detailed data on a large number of people) and potentially faster and cheaper means of using medical records for research. For example, EHRs and other information technology can facilitate identifying patients with a particular diagnosis or receiving certain services, obtaining their records, extracting information, and creating the database needed for analysis. Additionally, recent developments like EHR certification standards, “Meaningful Use” (MU) criteria, tools like natural language processing (NLP) software, and electronic health information exchange (HIE) infrastructure (e.g., email, Internet, cloud) and standards (e.g., HL7) have the potential to improve the reliability and validity of EHR data as well as their comprehensiveness and longitudinality. As the Institute of Medicine (IOM) notes, EHRs and other electronic health data provide the information infrastructure to support a “learning health care system” that continuously and relatively quickly turns data into information to guide ongoing improvement efforts and research.207

Research on “small n” populations is an important area where EHR and other electronic data have the potential to complement existing data sources and methods, perhaps revolutionizing the research process. By “small n” populations, we mean subpopulations that are much less common than the “average,” “typical,” or “majority” population and may differ from it in important ways (e.g., disease prevalence, treatment). For a variety of reasons, small n populations have been difficult to study with traditional methods and data sources, such as federal surveys and claims data sets.

As described in Part I of this report, there are important limitations to the use of federal surveys for research on the health and health care needs of small n populations. These surveys may include too few people in important demographic or clinical subpopulations (e.g., race/ethnicity, sexual orientation/gender identity, location, or clinical condition) to produce valid and reliable findings. Additionally, the surveys may not contain items or questions specific to the population of interest or on covariates needed as controls (e.g., education, income, years in country, primary language). Finally, surveys may have a lot of missing or inaccurate data about sensitive topics that raise privacy concerns (e.g., sexual behavior).
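The sample-size problem can be made concrete with a rough sketch. The Python below computes the half-width of a normal-approximation 95 percent confidence interval for a proportion, first for a whole survey and then for a small subgroup within it. The survey size, subgroup share, and prevalence are hypothetical values chosen only for illustration, and the formula assumes simple random sampling; federal surveys use complex designs, so actual precision would differ.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes simple random sampling; design effects are ignored.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical inputs, not taken from any actual survey.
survey_size = 30_000        # assumed number of respondents
subgroup_share = 0.005      # assumed share of respondents in the subgroup
assumed_prevalence = 0.10   # assumed prevalence of some condition

subgroup_n = round(survey_size * subgroup_share)
full_moe = margin_of_error(assumed_prevalence, survey_size)
sub_moe = margin_of_error(assumed_prevalence, subgroup_n)

print(f"full sample: n={survey_size}, +/- {full_moe:.3f}")
print(f"subgroup:    n={subgroup_n}, +/- {sub_moe:.3f}")
```

Under these assumed inputs, the subgroup estimate is far less precise than the full-sample estimate, and splitting the subgroup further (for example, by age or gender) would widen the interval again.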

Claims data from public or private health insurers or research agencies (e.g., AHRQ HCUP data) provide sources of data for research on some small n populations. However, these data have a number of limitations as well, primarily because they have been generated to obtain payment. Depending on the payment method, providers may be more or less motivated to submit comprehensive and accurate claims. Additionally, many important clinical details, as well as patient-reported information, do not appear in claims, although efforts are currently under way to try to enhance claims data with EHR and other types of data (e.g., laboratory and pharmacy data, death certificates or other vital records) for research purposes.208 Finally, claims data from particular health plans and providers may not provide comprehensive or longitudinal information because patients may change health plans and providers or see providers that are not part of the same organized delivery system.

The purpose of this report is to explore the potential use of EHRs and other electronic information to improve research about small populations, alone or in combination with other data sources. While “research” can take many forms, we define the term broadly in this report, as our primary purpose is to consider how EHR data can potentially be used to study the health and health care needs of small populations as illustrated by the four examples or subgroups, including making comparisons to the larger population or other subgroups as needed. As described in Part I, the priority research questions of interest about small n populations are highly varied, including topics traditionally addressed through clinical, pharmaceutical, health services, public health, public policy, and evaluation research. In some cases, even basic descriptive information about certain small populations remains unavailable due to current limitations with data and research methods. The Institute of Medicine has described different approaches to collecting evidence that may be more or less appropriate to address different types of research questions.209 In a similar way, EHR data, alone or in combination with other forms of data, may be better suited for some purposes or types of research than others. Additionally, increasing interest in quality improvement provides opportunities to harness EHR data for research on small n populations but may also present some challenges. We discuss the issue of the “fit” between the purpose and nature of the research on small n populations and the potential use of EHR data further throughout this report.

To explore this potential, we focus on four small n populations that have been difficult to study using conventional methods and sources of data: the LGBT population, Asian-American subpopulations, adolescents with autism spectrum disorders, and residents of rural areas. Each of these groupings has distinctive health or health care needs that have been difficult to study for reasons that include small numbers, sensitivity or validity of some reported information (problems in both survey data and data based on medical records or claims), and concerns about confidentiality when separate data elements could be combined to identify particular individuals in a data set.

Using EHR-based information for research on small n populations shares many challenges with all research that would use such information, but, as we will discuss, some special issues arise with small n populations. The four on which we focus illustrate a range of challenges in using EHR and other electronic health information for research. For example, the race/ethnicity information that is increasingly being collected in structured data fields in EHRs may not include smaller ethnic categories, and categories may differ across health systems. Information about sexual orientation, gender identity, and sexual behavior, if collected at all, is frequently located in the clinician’s notes or other unstructured data fields because of the potential discomfort and stigma historically associated with LGBT status or certain types of sexual behavior. However, natural language processing (NLP) of that unstructured data could be used to identify lesbian, gay, and bisexual individuals, or patient surveys could be administered through a patient portal or on an iPad in the waiting room and input or streamed into the EHR. A combination of structured (age, diagnoses, medications) and unstructured EHR information could be used to identify adolescents with autism spectrum disorder (ASD) and could also be combined with claims and/or educational records. Finally, providers located in rural areas could be identified and recruited for research on the health and health care needs of rural residents and other issues, but rural providers are less likely to have an EHR and the ability to exchange health information, and privacy concerns arise because individuals in sparsely populated areas could be identified if rural zip codes are included in the data.

To explore the potential strengths and limits of using EHR data for research on small n populations, alone or in combination with other data, this report covers four general topics. First, we provide a brief description of the methods and data used for the report and briefly discuss the need for research on small n populations. Second, we describe the increasing adoption and use of EHRs among physicians and hospitals, the kinds of data available in them, and the major issues encountered in using them for research within a single health care organization, such as a federally qualified health center, physician group, or large organized delivery system. Third, we describe some additional challenges to conducting research with EHR data from multiple health care organizations and/or in combining EHR and other data sources. Finally, we conclude with a discussion of the implications for HHS, including some potential next steps for exploring and improving the use of EHR and other data for research on these and other small n populations.

We conducted semi-structured telephone interviews with 22 expert informants experienced with the use of electronic health data for research, in some cases specifically with our four target populations. Initial interviewees were identified through research team knowledge and the literature, followed by a snowball sampling technique in which interviewees suggested other relevant experts. Interviewees came from organized delivery systems, universities, private research institutions, and a supplier of health information technology (HIT) (see Table II.1) and were leaders of or participants in a number of well-established research networks that use EHRs for research (see the Appendix to Part II). Topics in the interview guide were based on the literature as well as on the specific experience represented by each interviewee. They included the advantages and challenges of using EHR data for research, the types of research for which EHR data have the most potential, issues related to sharing data between organizations, and consent, privacy, data security, and confidentiality.

We also conducted a targeted literature review that explored technical, legal, and organizational issues related to EHR-based research. Our informants identified additional published and unpublished materials for us to read and review, including websites, materials from major projects using EHRs, and presentations at conferences or other meetings. Using these materials as a starting point, we identified search terms and used PubMed and other databases to find other relevant literature. This search resulted in 118 articles in the peer-reviewed and gray literatures. See the References section in Part II for a list of citations.

Research has found differences among segments of the population on nearly all aspects of health and health care. The ability to identify and document such differences is an essential starting point for improving people’s health. The four small populations that we selected illustrate a range of unanswered health and health care questions as well as the challenges in conducting research to answer these questions, both with existing federal data sources and potentially with EHR data. While small relative to the U.S. population, these populations have each reached a size where research on their health and health care needs has become both increasingly important and increasingly possible, particularly as new data sources are becoming available. Members of these groups are eager to be recognized and to better understand the particular characteristics and needs of their populations.

These populations were identified based on discussions with government officials at the Office of the Assistant Secretary for Planning and Evaluation (ASPE), the Agency for Healthcare Research & Quality (AHRQ), the Centers for Disease Control and Prevention’s National Center for Health Statistics (NCHS), and the Health Resources and Services Administration (HRSA), who have all received requests for better information about populations that have been difficult to study in existing federal surveys. Here we provide a brief overview of the distinct characteristics and the health and health care needs of our four example populations. More detail can be found in Part I of this report.

Asian subpopulations such as Filipinos and Vietnamese

Asian Americans are the fastest growing racial group,210 making up about 4.4 percent of the American population but including more than 50 different ethnicities and 100 languages.211 Language and cultural barriers to accessing health care are important concerns generally among immigrant populations, but their health and health care needs are poorly understood due to a lack of disaggregated data about ethnic subgroups.212 However, there is evidence that various ethnic subpopulations have distinct patterns of disease and health care use. For example, one study found the prevalence of diabetes was three times higher among Filipino men than among Japanese men.213 Other research has shown Vietnamese women to have both high cervical cancer rates (the highest among Asian-American women) and low screening rates.214

Small numbers relative to the total population, uneven geographic distribution, and language barriers combine to make it difficult to obtain adequate samples of Asian-American subgroups in national surveys. In claims data or health records, subpopulations may remain difficult to identify because ethnicity and language are not routinely or accurately collected. These factors, along with the time and cost of manual data abstraction, have been barriers for records-based research.

Lesbian, gay, bisexual, and transgender people

The health and health care needs of lesbian, gay, bisexual, and transgender (LGBT) people are not well documented, and even basic survey-based estimates of the size of these populations are inconsistent. However, there is evidence that experiences of stigma, discrimination, and violence are common among LGBT populations, with significant implications for their health and access to care. For example, elevated rates of suicide attempts, depression, and substance use have been reported among LGBT youth as well as among those in early/middle adulthood compared to their heterosexual counterparts. Elevated rates of HIV/AIDS among men, particularly young black men who have sex with men, have been a concern for many years. There is also evidence that lesbian and bisexual women use fewer preventive services than heterosexual women and have higher rates of obesity and breast cancer. The associated stigma may make LGBT individuals hesitant to seek care, or lead them to withhold information from their provider when they do.215 As a result, the information needed to identify this population is seldom present in medical records. Some experts believe that LGBT people may be more willing to identify themselves in a written or online survey than in a face-to-face encounter. At present, however, there is no well-validated way to reliably collect data on LGBT populations, and numbers vary depending on whether information is collected on behavior, identity, or relationships. In addition, small numbers relative to the whole population make it difficult to obtain adequate samples for basic analyses, much less for analyses split by age or gender, although there is evidence that subgroups of LGBT populations have distinct health care needs.

While transgender people have much in common with LGB populations, they also experience a number of distinct challenges with their health and health care. Although we have included them with LGB populations for illustrative purposes, there are additional issues regarding research for transgender populations that we were unable to fully cover in this report.

Adolescents with autism spectrum disorders

Autism spectrum disorders (ASDs) are a group of developmental disabilities characterized by difficulty communicating and repetitive motions or other unusual behaviors, and range from mild to severe.216 ASDs are lifelong chronic conditions that often require significant medical and psychological care. Over 95 percent of children with autism also have co-occurring conditions such as attention deficit disorder, learning disability, or mental retardation.217 Children with autism are also more likely to experience depression, anxiety, and behavioral problems,218 often as a result of difficulty being understood or bullying.219 As a result, children with ASDs use many more health care services, and more therapy, counseling, and medication, than children without ASDs.220, 221 The prevalence of prescription medication use among children with ASD is high, with the most commonly prescribed drugs being psychotropic medications, antidepressants, stimulants, and antipsychotics.222

Most research on ASDs focuses on children, but the health care transition between adolescence and adulthood is a particularly vulnerable period for this population as they move from pediatric to adult care and from child to adult special services.223 However, transition planning for this population is not common.224 This transition has been difficult to study because most national health-related surveys do not have a longitudinal design, making it impossible to follow youth with ASDs over time. In addition, because the condition is difficult to diagnose and diagnostic criteria have evolved over time, there are concerns about the validity and reliability of cases reported in parental surveys. There may be opportunities to use health records alone or in combination with other records (e.g., education, social service) to study people with ASDs over time, although the lack of biologic markers and shifting definitions of ASDs may continue to pose challenges in identification, even using clinical data.

Residents of rural areas

Rural communities are generally less densely populated and more geographically isolated than urban areas, often limiting economic opportunities. The out-migration of younger residents has left many of these communities with declining and generally older populations. In addition to the higher rates of chronic conditions associated with age, rural populations are more likely than urban residents to report fair to poor health status225 and to have higher rates of mortality, disability, and smoking and lower rates of physical activity.226 The rural residents of some parts of the country also face environmental health risks associated with agriculture, mining, and industrial pollution. Access to health care services is a serious concern as many rural communities lack the economic resources needed to support expensive medical services. Difficulty attracting and retaining clinicians further limits access to care. Telemedicine has the potential to help with some access problems, but Internet connectivity and adoption of HIT lag behind in many rural areas.

Research on rural populations has been hampered by small numbers in some research activities and by a lack of consistency in defining rural populations. More than two dozen definitions are used for different purposes by federal agencies, with criteria ranging from population size/density to land use to commuting distance. In addition, although granular geographic identifiers (such as county and zip code) are needed to examine rural communities, such variables about individuals are not included in public-use data sets because of concerns that those living in sparsely populated areas could be identified.
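One common way of handling this kind of disclosure risk is to mask geographic identifiers that cover too few people. The Python sketch below illustrates the idea under stated assumptions: the threshold of 11 records, the field names, and the placeholder area labels are all hypothetical, and actual suppression rules vary by data steward. Masking of this kind does not by itself guarantee that a data set is de-identified.

```python
from collections import Counter

MIN_CELL_SIZE = 11  # assumed threshold for illustration; real rules vary


def mask_small_areas(records, min_cell_size=MIN_CELL_SIZE):
    """Return copies of records with the zip code masked for sparse areas.

    A zip code is replaced with "SUPPRESSED" when fewer than
    min_cell_size records share it. The input list is not modified.
    """
    counts = Counter(rec["zip"] for rec in records)
    masked = []
    for rec in records:
        out = dict(rec)  # shallow copy so the original record is untouched
        if counts[rec["zip"]] < min_cell_size:
            out["zip"] = "SUPPRESSED"
        masked.append(out)
    return masked


# Made-up records with placeholder area labels (not real zip codes).
records = (
    [{"patient": i, "zip": "AREA_A"} for i in range(12)]
    + [{"patient": 100 + i, "zip": "AREA_B"} for i in range(3)]
)
masked = mask_small_areas(records)
suppressed = sum(1 for rec in masked if rec["zip"] == "SUPPRESSED")
print(f"{suppressed} of {len(masked)} records had the zip code masked")
```

The trade-off is the one described above: the masked records are safer to release, but they can no longer be assigned to a specific rural community.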

For electronic health records to help solve the challenges of conducting research on small n populations, several conditions need to be present. The first is a critical level of adoption of relatively advanced EHRs by a range of providers (e.g., primary care physicians, specialists, hospitals, laboratory, and pharmacy) so that information about sufficient numbers of people in “small n” populations will be included. The second is having EHRs that not only support day-to-day patient care work, but also contain information that is sufficiently valid and reliable to support research. The transformation of information in EHR systems into databases that are of research quality requires extensive validation work. Experience in carrying out the needed quality control work is accumulating, as we will discuss below. The third is the ability to exchange the data within and across organizations, which requires both interoperability and the infrastructure for exchanging data. There are other conditions that must be met, such as systems to ensure the consent, privacy, and security that facilitate the sharing and use of the data while maintaining consumers’ and patients’ participation and trust, which we discuss later in the report. Here, we focus on aspects of these first three conditions and how recent legislation and health reform are facilitating more widespread adoption and use of EHRs and information exchange. While all of these conditions may not yet be fully in place among providers that treat small populations, it is important to begin thinking about research capabilities and infrastructure needs as the availability of these data is growing. In this report, we have reviewed the work of those who are on the cutting edge of using EHR data for research as a guide to understanding what may be more widely feasible in the future, and to provide lessons on how current challenges can be overcome in using this type of data for research on small populations.

The Health Information Technology for Economic and Clinical Health Act (HITECH) became law in 2009 as a part of the American Recovery and Reinvestment Act. HITECH made an estimated $27 billion available to enable eligible health professionals and hospitals to adopt, implement, or upgrade EHRs to achieve the “meaningful use” of HIT, as defined by the Office of the National Coordinator (ONC). The intent of meaningful use standards is to improve quality and efficiency of care through widespread implementation and use of EHRs among providers participating in the Medicare or Medicaid EHR payment incentive programs administered by the Centers for Medicare & Medicaid Services (CMS). Meaningful use is defined through the regulatory rule-making process in three stages, ultimately resulting in a set of criteria for how EHRs must be used. As of August 2013, 56 percent of registered eligible professionals and 77 percent of registered eligible hospitals had received payment for meeting the meaningful use criteria.227

The HITECH legislation also established the Regional Extension Center (REC) and state health information exchange (HIE) programs.228 A total of 62 RECs provide technical assistance to “high priority” providers (e.g., physicians in small practices) to help them implement EHRs and achieve meaningful use. The HIEs work to facilitate data exchange among care providers within a region through a number of mechanisms.

The CDC’s National Ambulatory Medical Care Survey (NAMCS) provides the best information about the extent of physician adoption of EHRs. Based on an expert consensus, NAMCS defines a “basic” EHR system for physicians as having the electronic capability for managing patient demographic information, patient problem lists, patient medication lists, clinical notes, and orders for prescriptions, and for viewing laboratory and imaging results.229 NAMCS estimates show that in 2012, 40 percent of office-based physicians used an electronic medical or health record (EMR/EHR) that met the criteria of a basic system, up from 22 percent in 2009 (an increase of 18 percentage points).230 Earlier multivariate analysis results indicate that primary care physicians are more likely than other physicians to adopt and use EHRs, and that those practicing in large groups, in hospitals or medical centers, and in the Western region of the United States were more likely to adopt and use EHRs relative to their respective counterparts.231

Regarding EHR adoption in hospitals, in 2008, the ONC started funding an annual IT survey by the American Hospital Association (AHA). In 2012, approximately 44 percent of non-federal acute care hospitals reported having EHRs that meet the criteria of a basic system, defined as having a set of eight clinical functions (patient demographic information, patient problem lists, patient medication lists, discharge summaries, lab and radiologic reports, diagnostic test results, and orders for medications) deployed in at least one hospital unit.232, 233 This was an increase from 16 percent in 2009.234 Small, public, and rural hospitals were less likely than larger, private, and urban hospitals to have a basic EHR system. Similar, or slightly better, adoption patterns were found in a recent survey of children’s hospitals.235

Data related to health information exchange among hospitals and physicians are limited. Estimates from the AHA indicate that few hospitals are using EHRs to exchange health information: only 11 percent of hospitals reported in 2010 that they exchange key clinical information with other providers.236 However, a recent study found that hospitals’ exchange of health information with other providers and hospitals outside their organization has increased by 41 percent since 2008.237 A recent survey estimates that approximately 15 percent of children’s hospitals exchanged health information electronically.238 Data are not available about the extent of health information exchange among office-based providers.

Despite the significant progress toward adoption of EHRs by physicians and hospitals, a number of obstacles remain. Barriers identified in a recent review of some 60 publications included design and technical concerns, ease of use, interoperability, privacy and security, costs, productivity, familiarity and ability with EHRs, motivation to use EHRs, patient and health professional interaction, and lack of time and workload.239 Implementation challenges were reported among all types of users (e.g., public, patients, providers, and managers), but particularly among small, public, and rural providers.240

In sum, HITECH has provided focus and a major “spark” for the adoption and use of EHRs and the exchange of health care information, and considerable progress has been made. Additional incentives for the adoption and use of EHRs came from provisions of the Affordable Care Act (ACA) and include value-based purchasing, patient-centered medical homes (PCMHs), and accountable care organizations (ACOs). Some geographic areas and types of provider or organized delivery systems that serve small n populations have reached a tipping point of having sufficient EHR adoption and exchange capacity to support research on some small populations. Below, we discuss in further detail what kinds of information are or are not readily available in current EHRs and the implications for research on small populations.

To be useful for research on small populations, EHRs must include information identifying individuals as belonging to those populations, as well as information about their health and health care. For example, even if members of an Asian subpopulation were identifiable using EHRs, if they rarely seek health care or tend to seek care from places where there is less EHR penetration, or if language is a barrier to communication when they do seek care, limited information may have been recorded on their actual health and health care.

Much relevant information is routinely collected in EHRs in the process of patient care. In 2003, the Institute of Medicine identified eight core functions that EHR systems should be capable of performing in order to promote safety, quality and efficiency in health care. These functions include:241

health information and data

result management

order management

decision support

electronic communication and connectivity

patient support

administrative processes and reporting

reporting and population health

Additional functions common to EHRs include alerts for clinical preventive services, drug-drug interactions and drug allergies. Organizations have taken several approaches to obtaining a system with the needed functionalities. Purchasing a comprehensive system (often referred to as the “single-vendor strategy”) has been the most common approach among U.S. hospitals,242 but some piece together elements from different systems (e.g., scheduling, billing, and EHRs) and there is variation in what information is included in EHRs in different organizations.

EHRs typically include a patient’s demographic information, personal and family medical history, allergies, immunizations, medications, health conditions, contact and insurance information, as well as a record of what has occurred during visits with the provider.243 Information may be collected both at sign in at the registration desk and during the visit with the provider.

Patient-reported data

Basic contact, insurance, and demographic information about patients is collected at the registration desk or in the waiting room. Patients may also be asked for pertinent information about their health. Some providers use iPads or computer kiosks that allow patients to enter information directly into their EHR. Some also have patient portals that allow patients to view their information and to communicate with their health care providers. These can be set up to directly interface with the EHR,244 creating a source of information within the EHR. At this stage of EHR use, not all patients are equally likely to use patient portals; minority patients may be less likely to use them and younger patients more likely.245

One benefit of collecting some information directly from patients through a written or computerized telephone questionnaire or patient portal is that it gets around the difficulty of getting staff to ask patients for information about such topics as race/ethnicity or sexual orientation.246 While challenges remain with how to word questions in order to identify LGBT populations, the bigger challenge remains training providers and other staff to ask the questions when there are common biases that may prevent them from wanting to ask or document this information.247 Both the UC Davis and Vanderbilt health systems are beginning to collect information about patients’ sexual orientation and have opted to use patient portals for doing so.248 Given the opportunity to answer questions from home, patients may be more comfortable reporting certain information. An added benefit of reporting from home is that family members may help if there are language barriers. Geisinger Health System has started using patient portals to collect information about existing medications, and this information is entered into the EHR. Patient reporting may both save clinician time and include information that would not otherwise get entered. Vendors have developed tools such as clinical prediction rules and analytics engines to prompt clinicians based on information a patient enters.249

In recent years, there has been increasing effort to promote standardized collection of race, ethnicity and language data by registration staff in response to policy initiatives as well as accreditation requirements. Efforts often include staff training and patient education. For example, the Hospital Association of Rhode Island received funding for a five-hospital pilot to improve collection of race and ethnicity data. Its pilot included input from stakeholders on which granular ethnicity categories should be collected, standard interview scripts for staff to collect patient information, and materials to educate patients on why they were collecting the data.250

Clinical encounter data

Data collected during office visits and entered by the clinician into patient records during a visit may include reason for the visit, height, weight, vital signs, patient-reported symptoms and characteristics (such as behavior and lifestyle), diagnoses, treatments and tests ordered, and medications prescribed. Information from the pharmacy, laboratory, and radiology is often incorporated into the EHR. This should include test results and imaging from other systems.

Clinical information may be entered in a structured format where the clinician can select from standard, predetermined categories such as diagnosis or procedure codes or a medication list. Clinicians may also enter information in free-text notes in their own words or the patient’s words. For a condition such as autism spectrum disorder, relevant information may be entered as a diagnostic code or in free text about symptoms that suggest the diagnosis or about patient or parental reports of such a diagnosis in the past. Diagnostic information may also be implied by the clinician’s prescription choices.

Although the use of electronic health records creates opportunities for standardizing much patient care information by setting requirements for data fields, many clinicians prefer to record information in the unstructured manner that was used when entering information into paper charts. Many clinicians have traditionally audio-recorded their notes from the visit, and voice recognition software can now transcribe audio recordings into free-text fields in the EHR.251 This preference may disappear over time as younger medical students who grew up using computers enter clinical practice. Whether information in an EHR is structured or unstructured has important implications for research, which will be described later in this report, but today most information contained in EHRs is unstructured.

Claims/billing information

Many providers have electronic practice management systems that handle functions like scheduling, billing, and collections. Such systems are increasingly being integrated with electronic health records. Although this is being done for practice management purposes, it can make the overall data system more useful for research. Billing systems can have more complete diagnostic and procedure information than do EHRs.

Some small populations may be identifiable using information that is now typically recorded in EHRs. Residents of rural areas may be identifiable by the address and zip code information that is collected for billing purposes, although not all providers collect updated address information at each visit, so some of this information may not be up to date.252 In addition, lack of EHRs in rural practices and hospitals limits the availability of electronic health data on rural populations.253 While rural providers are increasingly adopting EHR systems, there will remain the problems of interconnectivity and interoperability. There is also evidence that critical access and small hospitals are at risk of failing to meet Meaningful Use criteria, which suggests there may continue to be limited data available on rural populations,254 even where EHRs are adopted. Therefore, conducting rural health research using EHR data may remain for the time being in the hands of a few integrated health care delivery systems with EHRs and data warehouses that serve large rural populations, which may not be representative of rural populations in general. Some of these organizations have been able to drill down within their rural populations for research or quality improvement purposes. For example, Intermountain Healthcare has looked at rural patients with 3 or more chronic conditions,255 and Kaiser Permanente Northwest (KP-NW) has looked at rural Hispanic patients with Spanish as their primary language, among whom drug-seeking behavior has been a particular problem. This population mostly receives its care through the Oregon Community Health Information Network (OCHIN) of federally qualified health centers (FQHCs), to which the KP Foundation Health Plan gave $1 million to purchase the Epic electronic health record software, so this network and KP are now collaborating on research.
Since OCHIN hosts the EHR for nearly all the FQHCs in Oregon and the FQHCs are attempting to create a single medical record for each unique individual (rather than a separate record for each clinic visited by a patient), it is possible to identify drug-seeking behavior by patients who attempt to obtain opiate-containing drug products from multiple FQHCs at the same time.256

Adolescents with autism spectrum disorders may also be identified using date of birth and diagnostic information in the EHR. However, the autism diagnosis may appear in free text rather than in structured fields in the EHRs.257, 258 Even within structured fields, a number of diagnostic codes can indicate someone has an ASD. Kaiser Permanente in Northern California has developed a list of valid autism diagnoses based on ICD codes and who made the diagnosis.259 There is also variability within or across provider organizations regarding who can authoritatively diagnose ASDs, as well as on the tests and benchmarks that are used. Diagnoses of ASD are often made at psychological testing sites that are separate from the patient’s health care organization, particularly for those with higher incomes, and this may affect whether ASD appears in the organization’s EHR. Regardless of a family’s ability to pay, diagnosis of ASDs is also often made by school psychologists, especially at kindergarten intake. Providers of ASD patients’ medical care are not necessarily skilled at diagnosing conditions such as ASDs.260

An additional challenge when studying any adolescent population is that EHRs have generally been designed for adult populations, and pediatric EHRs thus far are not yet as robust. AHRQ and CMS are currently working to strengthen pediatric EHRs with key data elements. However, this work is still in the early stages. EHR and other electronic health data may be particularly important in moving forward research on pediatric medicine, a field where clinicians and families have typically depended on findings from adult clinical trials. A number of pediatric primary care practice-based research networks have developed that are beginning to explore the use of electronic health data for research.261 For example, Pediatric Research in Office Settings (PROS) is the American Academy of Pediatrics’ practice-based research network and has begun an EHR-based sub-network called ePROS. This sub-network was funded through the American Recovery and Reinvestment Act of 2009 and is being built to develop and test the infrastructure needed to conduct pediatric research using EHR systems. It includes providers from diverse practice settings across different states and using a variety of vendors, with plans to expand the sub-network substantially within the next one to two years.262

Using EHR information to identify patients who are members of specific Asian subpopulations or the LGBT population remains challenging at present. The broad OMB race/ethnicity categories are increasingly collected in health care settings, but information about patients’ membership in subpopulations such as Filipino or Vietnamese is rarely recorded in medical records. There are also variations in how “Asians” get recorded, sometimes along with Pacific Islanders (as per the OMB categories) and sometimes under “Other.” More generally, the race/ethnicity information in medical records is of variable quality because standardization requires a degree of staff training that does not always occur.263

Because the Americans with Disabilities Act requires health care providers to make interpreters available where needed, language information that may identify some Asian subpopulations may be in some organizations’ EHRs. KP-NW collects information about primary language spoken at home as well as need for translation services, and has standardized this variable across health plans so someone could easily look up language sub-groups, such as patients who speak Tagalog.264 At University of Vermont, refugee and immigrant patients have been identified through billing data where interpreters were used.265 Another approach to identifying racial and ethnic minorities may be use of last names as proxies.

Sexual orientation is almost never collected or entered into patient records, although a few organizations have begun to do so. Therefore, for this and other characteristics, it is important not to impute null values where the fields are blank. UC Davis Medical Center has started using a form to collect information for entry into EHRs about patients’ sexual orientation as well as gender now and as assigned at birth.266 Some such information may already be available in provider notes based on what patients may have said about behavior, attraction, or sexual identity. But there has been no standard way to collect this information, so it is difficult to create structured fields for this information. Some EHR vendors such as Epic do have fields to capture information about sexual partners and this can be used to run reports based on the sex of partners. Epic has expressed interest in receiving input from users on how to collect sexual and gender identity in its EHRs.267 The HMO Research Network’s virtual data warehouse has also incorporated sexual orientation as a variable, although they believe there is significant under-reporting of these data across participating health plans. An additional challenge even if this information is being collected is that sexual orientation may change over time, so the information in an EHR may or may not be up to date. This challenge also makes it difficult to identify transgender populations because gender is typically collected only once.
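The caution about blank fields can be illustrated with a minimal sketch. The records and field name below are hypothetical and do not reflect any vendor's schema; the point is only that a blank field is tabulated as its own "not recorded" group rather than being folded into a default answer.

```python
from collections import Counter

# Hypothetical records: None means the field was left blank in the EHR.
records = [
    {"patient_id": 1, "sexual_orientation": "lesbian"},
    {"patient_id": 2, "sexual_orientation": None},
    {"patient_id": 3, "sexual_orientation": "heterosexual"},
    {"patient_id": 4, "sexual_orientation": None},
]

def tabulate(records, field):
    """Count recorded values, keeping blanks in a separate 'not recorded' group."""
    counts = Counter()
    for rec in records:
        value = rec.get(field)
        counts[value if value is not None else "not recorded"] += 1
    return dict(counts)

summary = tabulate(records, "sexual_orientation")
print(summary)
```

Keeping the blank group visible lets an analyst report how much of the population is simply undocumented before drawing any conclusion about prevalence.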

The availability of different types of information in an EHR provides multiple possible approaches that can be used to identify a population, and the potential to improve accuracy when these approaches are used in combination. For example, while there are limitations to using diagnosis to identify patients with ASDs, looking also at the ICD-9 codes and medications may provide information to supplement or validate the diagnostic information. However, some of these types of information may be more accessible and more highly valid in an EHR than others.268

For example, while ICD-9 codes tend to be readily available, how well they reflect the patient’s actual diagnosis is variable. Information on family and social history is generally incomplete and of low quality. However, information such as vital signs (blood pressure, weight, etc.) tends to be collected relatively frequently and recorded accurately. Lab results are not always available in an EHR, but when they are they provide highly reliable information and may also be a better indication of what the clinician was thinking than the diagnostic code. EHRs also keep a fairly accurate record of what was prescribed, which may also serve to validate the diagnosis (for example, if prescribed insulin, the patient likely has diabetes). However, prescriptions may be less useful to study utilization considering up to 40 percent of prescriptions are never filled.269
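As a rough illustration of using prescriptions to supplement or validate diagnostic codes, the sketch below cross-checks the two for the insulin and diabetes example. The function name and input layout are assumptions made for illustration; ICD-9 category 250 is the diabetes mellitus category, and a simple substring match on "insulin" stands in for a real drug vocabulary.

```python
def classify_diabetes_evidence(diagnosis_codes, prescribed_drugs):
    """Report whether coded diagnoses and prescriptions agree (hypothetical helper)."""
    # ICD-9 codes in category 250 (e.g. "250.00") denote diabetes mellitus.
    has_code = any(code.split(".")[0] == "250" for code in diagnosis_codes)
    # Crude name match; a real system would use a coded drug vocabulary.
    has_insulin = any("insulin" in drug.lower() for drug in prescribed_drugs)
    if has_code and has_insulin:
        return "code and prescription agree"
    if has_code:
        return "code only"
    if has_insulin:
        return "prescription only"
    # Absence of evidence is reported as such, not as absence of disease.
    return "no evidence recorded"

print(classify_diabetes_evidence(["250.00"], ["Insulin glargine"]))
print(classify_diabetes_evidence(["401.9"], ["insulin lispro"]))
```

Charts in the "code only" or "prescription only" groups are the ones where a second source, such as lab results, would be most informative.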

EHR and other electronic health data are increasingly utilized for quality measurement and improvement, but until recently, the potential benefit of EHRs for research has not received much attention outside a few innovative, early adopting health care organizations. However, the use of EHRs for quality improvement has provided a foundation for extracting and formatting EHR data so it can be usable for other purposes, including research. In an EHR-based system, all quality improvement activities are implemented using the EHR. The wealth of information being collected has the potential to facilitate great leaps forward in both the scope and efficiency of clinical, health services and policy research.270 But the answer to the fundamental question of whether EHR data are currently good enough for research on small n populations may depend on the definition of research and/or the specific kinds of research of interest. While EHRs may be well-suited for some types of research, they may be poorly suited for other kinds of research, and while the field has recognized this concept of “fit” between purposes and data, it is still working out which kinds of research EHRs and other electronic health data are currently well-suited for and where further work is needed.

Health services research has been defined as “the multidisciplinary field of scientific investigation that studies how social factors, financing systems, organizational structures and processes, health technologies, and personal behaviors affect access to health care, the quality and cost of health care, and ultimately our health and well-being.”271 For example, EHR data has great potential value for comparative effectiveness research (CER) about drugs, medical devices, tests, surgeries, or ways to deliver health care.272 However, CER may require more precise and complete information than is necessarily found in EHRs and so may require additional investment to ensure that the data quality in a given system is adequate to the specific type or aims of the research. Even so, less precise and complete information may be useful to identify patient populations or potential areas for further study.

Today’s medical and pharmaceutical research largely consists of relatively small clinical studies using highly selected patients with only one health condition. Findings based on such study participants may have limited generalizability to patients in the real world who often have multiple conditions. The large volume of information going into an EHR creates the possibility of examining rich clinical information about large numbers of patients over time. While EHR-based research may not replace traditional methods of advancing medical knowledge and faces a number of challenges, there are examples in which innovative health systems and researchers have begun to demonstrate its potential for research. Data analytics engines have been developed to mine warehouses of EHR data, to provide the information about how patients with certain characteristics respond to a given medication or treatment.273

Analyses of data that have been collected in routine patient care have the potential to greatly increase the speed at which research can move forward. For example, researchers at MetroHealth Medical Center in Cleveland, Ohio were able in 11 weeks to study patient characteristics associated with venous thromboembolic events over 13 years among almost one million patients.274 Without EHR data, the resources required to recruit and follow so many patients over time would have been incomparably greater. Research to identify risks missed in clinical trials may be conducted through analysis of EHR data—such as Kaiser Permanente’s review of internal medical records that revealed the connection between Vioxx and cardiac complications.275 A benefit of EHR data is that once you identify a population, there may potentially be years of data already available rather than having to wait many years to collect the information, particularly in organized delivery systems.276

The fact that EHR data are already computerized and are available in real time substantially increases the efficiency of research, eliminating the need for extraction from paper records and data entry. Rather than being spent for data collection, resources can go towards programming and database work to prepare EHR data for analysis.277 The data are also timelier than claims or survey data, where there is often a significant lag involved in collecting and processing the data. Data collection in real time also eliminates the need for patients to recall something that happened in the past such as is often required in survey research.278 EHRs also include much detail about processes of care that isn’t available in claims data, as well as information on the uninsured. HRSA has made a substantial effort to invest in data capabilities of safety net providers for this reason, and research networks such as CHARN provide an opportunity to better understand populations where there might otherwise be very limited information. Use of clinical data from EHRs can also help reduce or mitigate traditional coding problems with claims and other administrative data.279

The availability of medical record data about all patients in a health system also allows for identification of small subpopulations where identifying information is available in the EHR, such as those in uncommon demographics or with rare conditions.280 Information may be present about patients who might not otherwise be included in research because they would not meet the narrow requirements for participation in a clinical trial.281 For example, EHR data has been used for observational comparative effectiveness research among patients with hard to detect co-morbidities, to identify patients for recruitment for interventions, and for population management research.282 The population covered by an EHR system may provide more representative information than comes from traditional research samples.283 As use of EHRs increases and efforts continue to improve interoperability of EHR systems and to create networks for pooling data, future research may be based on actual populations rather than small samples.284

Another important aspect of EHRs is their longitudinal nature, which allows populations of patients to be followed efficiently over time so that, for example, outcomes of treatment can be studied. In contrast, surveys collect information at one point in time, typically asking if someone was ever diagnosed or currently has a condition. However, diagnoses change over time. For example, at KP-NW every diagnosis has a date stamp that begins an episode of care, and an end date is also recorded when the episode is resolved. In the EHR, a health problem list is available in a centralized place that displays a patient’s entire history of diagnoses received, as well as whether each is ongoing or has been resolved (as opposed to needing to review thousands of pages in a thick chart to get this information). In addition, the recent change that allows children to remain on their parent’s insurance coverage through age 26 increases the likelihood that they will remain in a given record system through their transition to adulthood, making it possible to follow those with a condition such as ASD through this transition.285 As the number of years covered by an organization’s EHR system increases, opportunities will grow for research that covers multiple generations of family members.286 With longitudinal data, there is the potential to make causal inferences, while this is not possible with cross sectional data. However, other factors must be carefully considered in interpreting longitudinal EHR data, such as organizational or national changes that may account for the observed change. For example, an increase in smokers in EHR data may result from increased documentation due to incentives for meaningful use rather than an actual increase in smokers.287

A limitation of EHR data, in comparison to survey data, is that the information is not collected or structured for research, which presents a number of challenges for research. While EHRs do include information of great potential value for research on small populations, a number of conditions at the technical, legal, and organizational level must be in place for such research to reach its full potential. These conditions and related challenges in meeting them are described in the following sections of this report, which are organized by these three categories. Technical conditions such as the need to convert EHR data into an analyzable format, legal conditions such as agreement over standards of privacy, and organizational conditions such as the infrastructure needed to share data across multiple institutions will be reviewed. Examples from our interviews and the literature of organizations that have begun to use EHR data for research demonstrate how conditions are coming together to allow the research opportunities to move forward. However, as we discuss in the conclusion, hurdles remain and additional steps are needed in order to take advantage of the opportunities at hand.

In order to use information in EHRs for research, it is first necessary for a number of technical conditions to be in place, such as the ability to extract and format data for research, as well as to address issues with missing data and data quality. As with claims data, the information in EHRs was not collected for research purposes. Whereas claims data are collected and entered in ways that help to maximize revenues, information is entered in EHRs to support patient care and to fit into clinical routines and workflows.288 In addition to assisting clinicians and health care organizations in their day-to-day work, the information that goes into EHRs provides documentation that is required by law, that is used for billing, and that informs patient care decisions. For these purposes, there is not necessarily a need to ensure data are entered in a uniform fashion or to create the capacity for selectively pulling certain information from the system, aggregating data, or identifying certain groups of patients. The cost of converting the information contained in EHRs into databases suitable for research purposes is substantial and requires specific expertise.

Data extraction

Using data from EHRs for research requires extraction from an organization’s EHR system so that the data can be cleaned, reformatted, and analyzed. These steps require a substantial staff of programmers; their numbers depend on the system and vendor used.289 Some organizations create a data warehouse to store extracted data for secondary use; records in such a warehouse have a different architecture than an EHR, which is designed for clinical transactions.290 An organization may even have multiple data warehouses with the same data but in different forms to support various strategic functions, including strategic planning, resource scheduling, and inventory control. Part of the problem is that various user groups often do not agree on the definition of variables, acceptable reliability rates, and the list of variables to be extracted. However, these functions require data in a different format than exists in an EHR.291 For example, to facilitate access to information about any given patient, the design of an EHR may include many tables with a lot of linking, allowing clinicians to retrieve only certain information on a patient quickly, such as the problem list or prescriptions. However, for research it is more useful to have all of this information in one large flat file.
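The difference between linked clinical tables and a flat research file can be sketched as follows. The table layouts, field names, and values are hypothetical and far simpler than any real EHR; the sketch only shows one row per patient being assembled from separate problem-list and prescription tables.

```python
# Hypothetical linked tables, keyed by patient identifier.
patients = {101: {"birth_year": 1999}, 102: {"birth_year": 1985}}
problems = [(101, "299.00"), (102, "250.00"), (101, "314.01")]
prescriptions = [(102, "insulin glargine")]

def flatten(patients, problems, prescriptions):
    """Build one flat row per patient from the linked tables."""
    rows = {
        pid: {"patient_id": pid, **info, "problems": [], "prescriptions": []}
        for pid, info in patients.items()
    }
    for pid, code in problems:
        if pid in rows:  # skip entries that link to no known patient
            rows[pid]["problems"].append(code)
    for pid, drug in prescriptions:
        if pid in rows:
            rows[pid]["prescriptions"].append(drug)
    # Join the lists into single delimited columns, as in a flat file.
    return [
        {**row,
         "problems": ";".join(row["problems"]),
         "prescriptions": ";".join(row["prescriptions"])}
        for row in rows.values()
    ]

flat = flatten(patients, problems, prescriptions)
for row in flat:
    print(row)
```

A delimited column is only one possible design; an analyst might instead create one indicator column per condition of interest, which is often easier to analyze.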

This can be handled in various ways. Intermountain Healthcare has developed a central data warehouse where all information from its EHR, billing system, insurance product, registration system, and laboratory and radiology systems is pooled and linked. Data sets for research are then extracted from this warehouse rather than the EHR so that research does not interrupt the clinical care process or slow down the EHR.292 Rather than pooling to and extracting from a central location, Geisinger extracts data from 13 databases (including one EHR database and 12 databases from other clinical and administrative systems) and puts those into a separate database designed for research and quality improvement.293 New York City’s Health and Hospitals Corporation (HHC) has data warehouses for each of its component hospital and community health systems from which aggregate data can be pulled. HHC has compiled several registries, such as a registry of some 60,000 diabetics that contains information that is used to track patients and improve outcomes.294

Intellectual property issues may be involved. Epic sells a data management product that extracts data from organizations’ internal files. However, because Epic considers these files to be intellectual property, client organizations are not allowed to share the internal variable names without permission from Epic. This restriction has been such an impediment that Kaiser Permanente is changing variable names used for many years that have Epic names.295 There are concerns that as large vendors such as Epic have gained market power, they are able to charge high prices while providing inflexible products and requiring additional costs for each functionality added to the EHR system.

Some research using EHR data has occurred by extracting a subset of data needed for the specific study either by manually identifying the desired records and/or variables, or by querying the system so it automatically retrieves the desired information. For example, a researcher may want to extract the records of adolescent patients with autism spectrum disorders. However, the information needed to select desired records may not be easily available for the computer to identify. While age is likely available to identify adolescents, diagnostic information is often not readily available on ASDs. In addition, not all systems were built to be queried. For example, Montefiore Medical Center in Bronx, New York, found that its system was not structured to be queried, and they needed to develop software to enable them to pull data for analysis from the system.296

Studies comparing the accuracy of automated versus manual extraction of EHR data on quality measures have found that the electronic method resulted in an underestimate of the rate of recommended care. For instance, the number of patients who received a clinical preventive service or who met a recommended treatment goal was undercounted when the automated method was used.297, 298 These findings suggest there are risks along with efficiencies in using automated extraction of EHR data for research purposes.

Part of the challenge is that the information needed to identify selected patient characteristics (e.g., autism spectrum disorder) may be spread across multiple fields but not expressed directly. For example, Kaiser Permanente developed and validated a software algorithm to detect episodes of pregnancy in patients’ EHRs. This algorithm searched for indicators of pregnancy in diagnosis and procedure codes, laboratory tests, pharmacy dispensing, and imaging procedures that are typical of pregnancy. Although using medical records to identify which patients are pregnant seems straightforward, they found that it is not so easy to automate this synthesis of multiple data points from different sections of a patient chart, which is also difficult to do manually.299
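The general idea of synthesizing indicators from several sections of a chart can be sketched as below. This is not Kaiser Permanente's algorithm, whose details are not given here; the indicator terms, chart layout, and the two-source threshold are all assumptions chosen for illustration.

```python
# Hypothetical indicator terms per chart section (not a validated list).
INDICATORS = {
    "diagnosis": {"supervision of normal pregnancy"},
    "laboratory": {"hcg test positive"},
    "pharmacy": {"prenatal vitamins"},
    "imaging": {"obstetric ultrasound"},
}

def sources_with_evidence(chart):
    """Return the chart sections that contain at least one indicator term."""
    return sorted(
        source
        for source, terms in INDICATORS.items()
        if terms & set(chart.get(source, []))
    )

def flag_chart(chart, min_sources=2):
    """Flag a chart only when independent sections corroborate each other."""
    return len(sources_with_evidence(chart)) >= min_sources

chart = {
    "laboratory": ["hcg test positive"],
    "imaging": ["obstetric ultrasound"],
    "pharmacy": ["ibuprofen"],
}
print(sources_with_evidence(chart), flag_chart(chart))
```

Requiring agreement across sections trades some sensitivity for fewer false positives; the right threshold would have to be set by validating against manually reviewed charts.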

Processing free-text data

Data extracted from EHRs must be converted to an analyzable format. The major difficulty for both data extraction and research is that a large portion of the data in EHRs has not been entered in a coded format. Desired information may be in free text that was entered by the clinicians to record their observations and assist with their decision-making. Even diagnoses may be put into free text by physicians because coding it is not needed for their day-to-day work. Some diagnoses (including perhaps ASD) may not be entered because of stigma concerns. Thus, relying on coded fields alone to identify patients with certain diagnoses may result in incomplete and perhaps biased representation.300 As part of an evaluation of its mental health integration program, Intermountain Healthcare looked for use of a depression metric among patients who received care at its organization. Intermountain found that even when mental health services were described in physicians’ notes, the corresponding data elements were often missing from the structured fields in the EHR.301

Free-text data are difficult to use in research because they are highly heterogeneous, describing patients with similar characteristics or conditions in different ways. This variation makes it difficult to identify patients with shared characteristics for data analysis. The text may also not conform to standard grammar, may use acronyms and abbreviations, and may include typing and spelling errors. A clinician’s assessments may also be recorded as tentative, and the information may be context specific from subject to subject. A disease may be mentioned when it has been “ruled out.” Recording the nuances in each case both makes the information valuable for clinicians’ work and difficult to use for analysis.302

Active efforts are under way to find methods to overcome the limitations of unstructured data, and there has been great progress in developing algorithms and software for natural language processing with which to create standard categories from free text inserted into EHRs by clinicians. Researchers have been able to identify some populations by searching for certain words or phrases in the free text of EHRs. For example, Dr. Jesse Ehrenfeld from Vanderbilt University developed and validated tools for natural language processing to identify LGBT individuals from their EHR data in order to determine whether such patient characteristics might be affecting diagnosis, treatment, and health outcomes. This process involves searching records for key terms such as “lesbian” or “bisexual,” but also looking for other indicators such as patients listing a same-gender emergency contact with a different last name. He reports that the initial search algorithm resulted in a false positive rate of 22 percent, but that after refining the algorithm to identify negation words for exclusion, only 3 percent of those identified as LGBT using the algorithm had been incorrectly classified as such.303
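A key-term search with a negation check can be sketched in a few lines. This is an illustrative toy, not Dr. Ehrenfeld's validated tool: the negation word list and the three-word look-back window are assumptions, and real clinical notes would need far more careful handling.

```python
import re

KEY_TERMS = ("lesbian", "bisexual")      # terms named in the text above
NEGATIONS = ("not", "no", "denies")      # assumed negation cues
WINDOW = 3                               # assumed look-back, in words

def mentions_term(note, window=WINDOW):
    """Return True if a key term appears without a nearby preceding negation."""
    words = re.findall(r"[a-z']+", note.lower())
    for i, word in enumerate(words):
        if word in KEY_TERMS:
            preceding = words[max(0, i - window):i]
            if not any(neg in preceding for neg in NEGATIONS):
                return True
    return False

print(mentions_term("Patient identifies as bisexual."))
print(mentions_term("Patient states she is not bisexual."))
```

Without the negation check, the second note would be counted as a match, which is the kind of false positive the refinement described above was meant to remove.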

One systematic literature review of clinical coding and classification processes to transform natural language into standardized data found these processes had varying degrees of success.304 In general, the reliability of natural language processing programs appears to be better where variables are narrowly and consistently defined.305 Types of coding were found to fall into two primary groups: those that map text to existing classification systems such as international classification of disease (ICD) or current procedural terminology (CPT) codes, and those such as Dr. Ehrenfeld’s that used a coding scheme developed for a specific study to look for the presence or absence of certain terms or phrases.306

Despite the success of some efforts to convert free text into coded data, some experts caution that natural language processing should not be considered a magic bullet. Natural language processing requires computers that are very large and fast in order to process free text in a reasonable amount of time. In many cases, it may be more efficient and accurate to ask patients for the desired information rather than searching for it in the free text.307 Also, billing, lab, pharmacy or radiology databases may be better sources of diagnostic information than free text and may be worth exploring before turning to natural language processing of the free text in EHRs. These utilization databases tend to be more structured than the problem notes recorded in the EHR.308

Other unstructured data include scanned images, such as radiology images, as well as PDFs of letters or records from other providers that have been scanned or faxed and then uploaded to the EHR. While these files are useful for a clinician to open and view, converting them into something codable takes great effort and computing power. This issue is a whole sub-field of informatics by itself.309

Missing data and data quality

In addition to lack of standardization, the accuracy and completeness of data entered into EHRs are major concerns for research, since high quality and complete data are needed for drawing valid conclusions. Data quality has often been called into question when EHR data have been used for quality assessments. Compared to paper charts, electronic health records have been found to hold significant errors—in part because during this transitional period, many clinicians have not been accustomed to using a computer as part of their daily workflow. In addition to typos and spelling errors, errors of omission and commission have been found in medication lists and in problem lists where chronic and acute conditions are documented.310 Information entered in an EHR may also be affected by billing considerations. For example, some clinicians may not see the need to add secondary diagnoses for complex patients, if doing so would not affect the DRG payments. Such omissions may result in researchers’ underreporting certain diagnostic complexities.311

Because EHRs today may not reliably provide a complete picture of a patient’s health, researchers should guard against drawing conclusions as though they were complete, such as assuming that the absence of mention means that a particular characteristic, condition, or treatment is not present. For clinical purposes, a physician may be more likely to record problems than improvement, particularly if there is no need for follow-up, but a researcher would need that information.312 In addition, some research that relies on EHR data may be skewed because the data do not include people who are unable to obtain care because of access barriers resulting from lack of insurance or differences in language or culture.313 This is a particular issue for the transgender population, which is often uninsured or seeks services that insurance does not cover, such as hormonal therapies, which have often been obtained outside the health care system.314 There is also the issue of patients moving in and out of EHR systems, either because they have stopped receiving care or have gone to another health care provider. Members of Asian subpopulations may even be moving between countries, receiving care and taking medications they have obtained abroad. The mobility of populations can make it difficult to create cohorts and to make reliable inferences about them.315

The need for certain types of patients, such as those with ASDs, to see multiple providers (including mental health and medical providers) also makes it challenging to get a complete picture of someone’s health care through an EHR. Children may also receive testing for ASDs through the educational system that may not be shared with the child’s pediatrician. Although this challenge is related to the bigger issue of how the health system is organized, further development of the ability to share information among providers will be important in studying small populations. However, there remains the challenge that a patient may go to facilities that do not have electronic data (such as some long term care facilities), making it more difficult to integrate the information into the patient’s electronic record with his or her primary care provider.316

However, increasingly integrated models of health care delivery should present opportunities to gain more complete pictures of patients’ care for study. In an integrated delivery system, a single organization provides most or all of a patient’s care across multiple settings. Integrated systems tend to be particularly advanced in the functionality and use of the EHR systems as a mechanism by which they can coordinate care across multiple settings. Therefore, a number of those interviewed for this report work in such organizations, and many examples we mention in this report come from integrated delivery systems. Shared EHR systems have permitted an increasing number of health care organizations to operate as virtual systems even though they are not a single organizational entity. This creates new opportunities to study patient care across multiple settings.

With the recent growth of accountable care organizations (ACOs) and the accompanying needed data sharing, researchers may increasingly be able to capture information about patients regardless of where they receive care. For example, because Essentia Health in the upper Midwest is an ACO, it has electronic access to patient information no matter where among the collaborating organizations they receive care, and Essentia can successfully request this information from other providers as a condition of getting paid for services for patients covered by the ACO contract.317

The growth of ambulatory networks connected with hospitals also facilitates this type of data sharing. For example, the Pediatric Research Consortium (PeRC) at Children’s Hospital of Philadelphia (CHOP) is able to match outpatient data from CHOP’s primary care network with hospital data for patients who have received care in both. However, information is not available about care received in other settings, so the EHR system is most useful for the subset of patients who receive sub-specialty care within CHOP as opposed to the whole network.318

Restricted data

At times a portion of the medical record is restricted or separated from the rest of the patient’s information if it is viewed as sensitive in order to protect the patient’s privacy. This may be of particular concern for small populations where there may be an associated stigma, such as ASDs or LGBT populations. Patients with ASDs often receive care from mental health providers, and it is common for some or all of this information to be restricted. Even if it is included in the medical record, researchers may need special permission to be able to use it for a study—particularly as mentally disabled or cognitively impaired persons are considered vulnerable populations and therefore are a protected class of human subjects when research is considered by institutional review boards. This is an issue not only for EHR data, but for claims data as well—where any substance abuse claims must be removed when the data are used for research.319

Legacy systems

Because most EHR systems are relatively new, the number of years of available patient data varies by organization; information needed to look at a patient over time may be in paper charts or legacy electronic systems and not available for EHR-based research. Physicians in organizations that have upgraded their EHR systems may be able to log in to the old system to access critical patient information stored there, but the information might not be readily available for research. The alternative ways to link legacy data into new systems all require time and resources.320

Needed expertise

The skills required to conduct research using EHR data are highly technical and specialized. A team of information systems staff is needed to support an EHR data warehouse to support care delivery, and translation to a research database requires another set of technical experts. This research informatics team must include programmers and analysts who build and maintain a research-focused warehouse.321 Higher education has yet to catch up with programs designed to provide training around these skills, which would require links between business and medical schools.322 The leader of this team must possess both IT skills and clinical expertise, and these individuals are in short supply as well, particularly as both the fields of medicine and technology have been quickly evolving.

It is also crucial that individuals conducting EHR research have knowledge of research methods specific to EHR data because a unique longitudinal data set is being repurposed. The expertise needed includes statistical expertise to format and analyze the data, and the ability to interpret findings while considering how the data were collected and formatted, as well as any limitations connected to the patient population and the context. These considerations require individuals with expertise in the organizational and policy history that may affect how data were recorded. For example, an organization’s decision to train staff on the collection of race/ethnicity data, whether for internal purposes or to comply with policy or accreditation requirements, may explain a perceived growth in the number of patients they serve from a certain Asian subpopulation over time. Changes in the system, personnel, and social history need to be documented and considered when interpreting data. Therefore, it is important that data warehouses and networks collaborate with their participating organizations and providers.323

In addition to technical requirements for data extraction and analysis, there are legal requirements that complicate the repurposing of EHR data for research. Privacy and security may be of particular concern for small populations, where individuals may be easily identified with just a few variables. In addition, particularly where there may be issues with stigma, individuals from small populations may not want to be identifiable by their employer, school, or others who may access the data. Institutional review boards are accustomed to requiring that data be used for only the one project for which patients consent, and that identifiable data be destroyed at the end of the study. Such requirements create barriers for the use of EHR-based data for research. Usual practices for protecting privacy and security may need to be reconsidered when EHR-based data are to be used for research. This data source will have increasing potential to answer additional research questions as more information is collected over time. Alternatives to study-by-study review and consent requirements will need to be found if the potential of EHR-based data is to be realized.

Legal landscape

Presently, the two federal laws most relevant to the use of electronic health data for research are the Health Insurance Portability and Accountability Act (HIPAA) and the Common Rule.324 In addition, state laws that govern the use of health data tend to go beyond the protections provided by HIPAA. While HIPAA allows covered entities (including most health care providers) to access, use and disclose identifiable personal health information for treatment, payment, and health operations (including quality improvement), the HIPAA Privacy Rule requires that informed consent be obtained from individuals before this information is used for research. The Common Rule covers research conducted using federal funding from certain agencies, and defines research as “systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge.” Application of these two laws broadly defines what is legally considered research today.

The original HIPAA legislation was passed in 1996, before the use of EHR-based data for research was foreseen. Concern is growing about how much the HIPAA rules and their local application may deter important research based on secondary use of patient records.325 The HIPAA omnibus rule was changed earlier this year with the intention of increasing protection and control of personal health information, particularly in light of the growth of electronic data. Individual rights are expanded so patients can ask for a copy of their electronic medical record, as well as instruct their provider not to share their information with their insurance company if they pay in cash. In addition, the new rule aims to reduce individual burden by allowing individuals to consent to the use of their health information for future research purposes.326 This, however, does not address the need for consent for secondary uses of already collected data for research.

There is ongoing legal/ethical debate about the role of restrictions based on HIPAA and human subjects’ protection in governing the use of EHRs for research, as well as on the blurring line between the use of the information for quality improvement and for research. The IOM has suggested that in a learning health care system, the distinction between research and quality improvement or other internal uses is artificial, and the laws remain unclear on this difference as well. Out of caution, IRBs tend to treat all secondary uses of data as research—a practice supported by publication policies of many academic journals that require IRB approval for results to be published.327 Other countries such as the UK and Canada are in the midst of similar debates around balancing the need to protect privacy with secondary uses of data for research. Some countries such as Denmark have concluded that database-driven research should be allowed without the consent typically needed to protect research subjects because of its contribution to the common good without disrupting people’s everyday lives. Because studies entirely based on national registries or clinical databases can be done without patient consent, a growing number of population-based studies using EHR data are being done in Denmark.328, 329

In addition, where there is a lack of clarity or knowledge about the details of the laws, researchers tend to err on the more conservative side where they perceive there may be a potential issue for their IRB. At times, studies go through the IRB even when review is not legally necessary. This is done out of caution, but it creates unnecessary expense and burden for patients and providers.

Opportunities for patients to make meaningful choices

While the intent of informed consent is to respect patient autonomy, it has been argued that the public benefit of health research is greater, particularly if adequate provisions for protecting data confidentiality are present.330, 331, 332 The burden would be intolerable if patients had to be re-contacted for consent for each new research use of a database that contained their records. The ability of patients to now give consent for future research, given the update to HIPAA, may help relieve this burden. However, patients may want their information to be used for certain purposes and not others, or may change their minds over time. Interestingly, there is some evidence that patients view the use of medical records to be part of the health care routine and a necessary part of receiving good treatment rather than considering it in terms of the costs and benefits of participation in research.333 There is a full range of practices that can help patients make a meaningful choice, such as transparency around how their information will be used, who will use it, and allowing patients access to their own data.334

The benefits of seeking individual informed consent before using patients’ EHR-based data for research are increasingly seen as coming at too high an administrative burden on research.335, 336 Of even greater concern is the potential for bias when records of patients who have not consented are excluded. One national survey found strong support and willingness to share one’s electronic health information for research,337 and evidence is accumulating that patients who refuse to agree to the use of their records in research differ in various ways from those who agree. A recent review of 17 such studies from around the world (including 5 from the United States) found differences by age, sex, race, education, income, and health status between patients who did and did not consent to the use of their medical records for research.338 Such differences could bias research results or limit generalizability of findings. This could be particularly problematic in research on small populations. In addition, there are specific issues with including child populations (such as adolescents with ASDs) in research because they are not legally able to provide informed consent, which implies understanding of the potential risks of participating in research. Parents must provide consent on their behalf, but may be uncomfortable with their children being included in research studies. Until recently, children were rarely included in medical studies. Agencies such as the FDA are making an effort to educate parents on the importance of including children in research.339

Both HIPAA and the Common Rule have been criticized for over-emphasizing patient consent rather than providing more comprehensive opportunities for patients to make meaningful choices.340 Organizations that conduct a lot of research using EHR data have taken a number of approaches to issues of meaningful choice and protecting patient privacy. These approaches include obtaining general consent from patients at the time care is being provided for the use of their records for research, standardizing IRB documents, classifying studies as quality improvement rather than research, and using de-identified data. For example, Essentia Health asks patients to sign a general consent form each year to use their data for research purposes. Only 1–2 percent of Essentia’s patients have been opting out, and those who opt out do not appear to differ demographically from those who do not. This general consent applies only to research conducted within the health system and its research institute, and IRB approval is needed for use of the data for research.341 Geisinger Health System requires IRB approval for each research project, but has standardized the needed documentation to streamline the process. They also take additional steps to protect patient information, such as altering dates in the copy of the data used for research to protect confidentiality.342

For Kaiser Permanente, when someone signs up to be a member, they are informed that their data will be used for “approved research purposes.” Members may request to be excluded from all future research projects or from all genetic research. IRB approval is not needed when identifying information in EHR-based studies is used only to make linkages and then removed.343 Vanderbilt has also granted a waiver of consent under the IRB Common Rule to allow research on LGBT patients without consent since the data are de-identified after extraction. However, patients do have the opportunity to opt out of studies.344 New York’s Health and Hospital Corporation makes only de-identified data available to researchers.345

Other health systems such as Intermountain Healthcare and UC Davis conduct some studies that are classified as quality improvement rather than research, and these do not require IRB approval or informed consent.346 While classifying studies as serving operational purposes may avoid the privacy protections needed for research (defined as intended to generate generalizable knowledge or new findings for publication), there are tradeoffs. If the activity is conducted for quality improvement or other in-house purposes, the investigator may lose the ability to set priorities, be unable to invest the time needed for a rigorous study, or be unable to candidly share findings externally. This disincentive to share knowledge externally prevents much of this type of work from contributing to a learning health care system.347 On the other hand, analytics performed for internal uses such as quality improvement may have the benefit of leveraging available data to facilitate studies that are quicker and less costly than traditional research.348, 349

De-identified data

HIPAA’s Privacy Rule does not regulate de-identified data, and it specifies that data can be de-identified using safe harbor criteria (the removal of 18 specified data fields that could be used to identify an individual) or statistical methods (demonstrating extremely small statistical risk that an individual could be identified). Statistical methods are less commonly used because the description is vague and there remains lack of a standard approach.350 In addition, individuals with the knowledge needed to make an expert determination that the statistical risk is sufficiently small are in short supply. However, some organizations such as Vanderbilt’s Multicenter Perioperative Outcomes Group, a consortium of 30 medical centers aggregating EHR data, patient reported outcomes and administrative outcomes,351 have opted to seek this expert determination instead after finding use of the safe harbor criteria to be more challenging, particularly when pooling data from multiple centers. The Privacy Rule does allow the alternative of using a limited data set that includes certain geographic and date information considered important for patient-centered outcomes research, but then requires a data use agreement between the data holder and the recipient. Researchers at Kaiser Permanente have found limited data sets to be useful for research when the length of time between events can be included where full dates are not allowed.
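The idea of retaining the length of time between events where full dates are not allowed can be sketched as follows. This is only an illustration under stated assumptions: the field names, the short identifier list, and the choice of an index date are hypothetical, and the code does not by itself satisfy the Safe Harbor criteria or any expert determination.

```python
from datetime import date

# Hypothetical field names. The Safe Harbor method lists 18 identifier
# categories; only a few illustrative ones are represented here.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "mrn"}


def to_intervals(record, index_field="first_visit"):
    """Drop direct identifiers and replace dates with days since an index date.

    The relative timing of events is preserved, but calendar dates are not.
    """
    index_date = record[index_field]
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # identifier fields are never copied to the output
        if isinstance(value, date):
            out[field + "_day"] = (value - index_date).days
        else:
            out[field] = value
    return out
```

For instance, a record with a first visit on January 10, 2012 and a lab draw on February 9, 2012 would keep only the offsets 0 and 30, so the interval between the two events remains available for analysis.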

While eliminating the need for informed consent, de-identifying data may remove the information needed to identify small populations. For instance, removal of geographic identifiers makes it impossible to identify residents of rural communities. In addition, de-identified data complicates linkage of patient records from multiple sources, such as with lab or pharmacy data if not integrated into the EHR or across multiple institutions where the patient may receive care.

Governance

Governance processes specifying who owns, controls, and regulates the data must also be in place in order to use EHR data for research. Data governance is generally understood to include legal and regulatory concerns, the structure and role of governance bodies, IRB issues, properties of data, data sharing considerations, business issues, stakeholder engagement and participation, and sustainability.352 Institutions may designate committees or have designated employees responsible for these issues. Data governance has also been described as the process designated for the data steward (such as a health care organization) to carry out its responsibilities. A data steward has fiduciary responsibilities toward the data, or has been trusted with information that patients consider private. The role of a data steward continues to evolve both conceptually and legally, particularly as health care data have potential not only for research, but are already used for many purposes in the public interest such as for quality monitoring and improvement.353 There remains a lack of coherent policies and standards to help govern the secondary use of health data.354

In the absence of specific governance structures for research processes, some organizations such as New York’s Health and Hospital Corporation have developed a data warehouse and use the data for quality improvement; their data are used less frequently for research.355 However, building this infrastructure is resource intensive and obtaining funding for this type of development may be difficult for health systems. One of the reasons Essentia developed a separate research institute was that grants are often unwilling to pay for programming at the site of day to day operations.356 Geisinger has also developed a separate Research Center which is based on an honest broker system where researchers can request to look at a topic (such as diabetes and a specific genome), and then the broker runs the database and shares the results.357 Some health systems are creating new companies that house and mine their electronic health record data and combine them with other sources such as EHRs from other health care organizations. Two examples of health systems with such companies are Montefiore (Emerging Health Information Technology) and MetroHealth (Explorys).

Because of the previously mentioned limitations with using data from a single organization’s EHR for research, the ability to combine EHR data with other electronic data sources is often needed to strengthen study results, particularly for small populations. Combining EHR data across institutions can allow for a larger sample size to increase the likelihood of being able to study small populations, as well as offer a more complete picture of patients that receive care in more than one place. While providing additional information, using data from multiple data sources for research does come with an additional set of challenges and requires a number of organizational conditions be in place, as described in this section. Examples of multi-organizational efforts such as research networks are described below where organizations are already working together to overcome these challenges. In addition, a number of other data sources that may be combined with EHR data to further facilitate research on small populations are described at the end of this section.

Using EHR and other electronic health data from multiple organizations

In order to conduct research with data from multiple organizations, a rationale and a mechanism are needed for organizations to share the data. The technical and legal issues associated with data sharing have received considerable attention throughout the implementation of provisions in the HITECH Act to promote health information exchange to improve the quality of care. There are two major ways that data can be shared across multiple institutions: through a consolidated warehouse where a copy of the data from each institution is stored, or through some form of distributed network where the data remain stored with each organization but can be queried to retrieve standardized results from multiple databases. An additional criticism of the current legal framework surrounding human subjects research is the lack of guidance around the technical architecture of databases, although they may involve creating multiple copies of a patient’s data.358

While centralizing data in a warehouse may increase efficiency when standardizing and querying the EHR data, it requires resources to build and maintain. In addition, there are privacy and governance issues associated with creating a copy of patient information and storing it outside the organization when these data were collected for the organization’s use in caring for the patient.359 Also, as the data are centrally combined from multiple organizations, they become further removed from the different organizational contexts where they were collected. Those contexts, such as changes in how the data were collected and documented over time, must be considered when interpreting the data. In addition, centralized data warehouses may be less flexible as all required data elements must be contributed by the organization in advance and then remain in the warehouse, giving organizations less control over which data they want to contribute for what purposes.360

As an alternative to creating a central warehouse or database, a virtual data warehouse may be created where data remain in separate home locations. This alternative may be more viable as it bypasses the need for investment outside the organization in building a separate infrastructure, and also simplifies the issues of data ownership. Virtual warehouses are easier to implement and more private because data remain at the collaborating organizations (referred to as a distributed network). Secure, remote analysis of these separate databases occurs through a central portal that queries and distributes results. Organizations may decide which data they are interested in contributing and what studies they want to participate in. One common type of distributed network is a federated research network, where separate, heterogeneous databases from multiple organizations make up the distributed network and each organization retains control of its own data.361, 362 For example, ePROS is creating a federated database that links data from multiple organizations in order to allow for queries of de-identified patient data.363 Often the databases include standardized content areas, data dictionaries, and methods to define individuals.364 While more efficient than a centralized model, investment is still needed in the administrative and governance infrastructure to maintain security and ensure appropriate use of the query function.365 A number of distributed research networks are being piloted to support clinical effectiveness research (CER).366, 367

Source: Hornbrook et al. Building a Virtual Cancer Research Organization. Journal of the National Cancer Institute Monographs. 2005 (35), 12-25.
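The query pattern of a distributed network, in which a central portal sends a question to each organization and receives only results, can be sketched roughly as below. The function names, the record layout, and the minimum cell size of 11 are assumptions made for illustration and do not describe any particular network's software or policy.

```python
def site_count(records, predicate):
    """Run locally at one organization: count patients matching a query."""
    return sum(1 for r in records if predicate(r))


def federated_count(sites, predicate, min_cell=11):
    """Central portal: collect per-site counts and return the pooled total.

    Only counts leave each site; patient-level records stay where they
    were collected. Totals below `min_cell` (an assumed threshold) are
    suppressed and returned as None to lower the risk of identifying
    individuals in very small groups.
    """
    total = sum(site_count(records, predicate) for records in sites.values())
    return total if total >= min_cell else None
```

In practice each site would run its own query against its own database rather than a list held in memory, and governance rules would determine which queries a site agrees to answer.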

However, there are some reasons an organization may select a centralized warehouse instead of a virtual one. For example, the Community Health Applied Research Network (CHARN) chose a centralized data network because housing the data where they originate, as in a virtual network, requires each participating organization to have its own infrastructure. Because CHARN’s participants are community health centers with limited resources, they lacked the capacity to make a virtual network an option. Cost would also be a significant barrier for each community health center to maintain its data locally. Finally, data quality was a consideration when CHARN selected a centralized database. Because of the variability among community health centers, if CHARN were to request data from each center separately, it would be difficult to know what types of problems there might be in terms of outliers, omissions and commissions in the data. Therefore, they decided it would be simpler to look at the data all together. The issues faced by community health centers may be common among other under-resourced organizations that provide care for certain small populations, such as health care organizations in rural areas.368

A further alternative to a distributed warehouse, in which data are still contributed for central analysis, is distributed analytics. This approach is being used by the Massachusetts eHealth Institute, where participating organizations contribute just the minimum information that is needed. While this approach addresses many privacy-related concerns, it does require participating organizations to conduct some of their own analytics before contributing their results.369

No matter which method is chosen for sharing data, each strategy requires significant infrastructure development, both technically and organizationally. One study of research teams that have developed such infrastructure to support CER identified a number of challenges, including the substantial effort required to establish and sustain partnerships for data sharing, understanding the strengths and limitations of their clinical information platforms, and the need for rigorous methods to ensure data quality across multiple sites.370 Another study involving interviews with multi-site research initiatives found a number of challenges related to data governance, but also found these initiatives are using strategies to address these barriers such as capitalizing on pre-existing relationships, beginning with smaller studies and then expanding, developing legal and policy documents with broad input, exchanging de-identified data only, and structuring governance bodies with broad representation.371 It is important that each organization contributing data is represented in the analysis as well in order to provide context on how the organization has changed, which affects how the data are interpreted. Organizations that care for certain small populations are likely to be unique themselves and need to be able to provide that context. The uniqueness of each organization may result in quality issues once their data are combined, even if data from the individual organizations are of high quality on their own.372

Funding for research infrastructure development is rare, as currently most grants and contracts pay for specific, discrete studies. However, in recent years the availability of this funding has increased. For example, the American Recovery and Reinvestment Act of 2009 allocated $100 million to building infrastructure to use electronic clinical data for CER, patient-centered outcomes research, and quality improvement.373 In addition, in 2013 the Patient-Centered Outcomes Research Institute is investing $68 million to support the initial development of a National Patient-Centered Clinical Research Network to build the capacity needed to support CER. There are currently three funding opportunities related to building this national network.374

In addition, for studies that include data from multiple organizations, approval may have to be obtained from multiple Institutional Review Boards, adding to the time and resources needed to conduct the research. Where organizations are from different states, there may also be different state laws governing health information with which each organization must comply. Some approaches to minimizing this burden have included careful distinctions between quality improvement and research-driven interventions, particularly where projects are low-risk. Another solution may be to negotiate an arrangement in which a central or lead IRB with particular expertise in the area first reviews the study and other IRBs then accept its review.375 In addition, where research is conducted across distributed databases using methods such as distributed regression, the only information exchanged is statistical results rather than the underlying data. This technical strategy is one solution to protecting patient privacy. However, an issue with small populations is that unique individuals relative to their surrounding population can potentially be identified. In fact, some researchers are finding that people may re-identify themselves, even when given privacy protection.376
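To make the idea of exchanging only statistical results concrete, the following is a minimal sketch of one simple form of distributed regression. It assumes ordinary least squares with a single predictor, which is an illustrative choice and not necessarily the method used by the projects cited; the function names are hypothetical. Each site shares five summed quantities, and the pooled fit computed from them is identical to the fit that would be obtained from the combined patient-level data.

```python
def site_statistics(xs, ys):
    """Computed locally: the sums needed for simple linear regression.

    Only these five numbers are shared, not the individual (x, y) pairs.
    """
    return {
        "n": len(xs),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_xx": sum(x * x for x in xs),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }


def pooled_fit(stats_list):
    """Computed centrally: least-squares slope and intercept from site sums."""
    n = sum(s["n"] for s in stats_list)
    sx = sum(s["sum_x"] for s in stats_list)
    sy = sum(s["sum_y"] for s in stats_list)
    sxx = sum(s["sum_xx"] for s in stats_list)
    sxy = sum(s["sum_xy"] for s in stats_list)
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Made-up data held separately at two sites.
stats_1 = site_statistics([1, 2], [3, 5])
stats_2 = site_statistics([3, 4], [7, 9])
slope, intercept = pooled_fit([stats_1, stats_2])
print(slope, intercept)
```

As the paragraph above cautions, sharing only summaries does not remove all risk for small populations: when a site contributes very few patients, its sums may still reveal information about individuals.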

Finally, a process is needed to ensure the quality of multisite data for research, including prioritization of variables and dimensions of quality for assessment, development and use of standardized approaches to assessment, iterative cycles of assessment within and between sites, targeted assessment of data known to be vulnerable to quality problems, and detailed documentation of quality to inform data users—particularly in determining whether the data are fit for use in CER studies.377 Ideally, these efforts should be shared among the collaborating organizations on a continuous basis to keep pace with new versions of existing software and the introduction of new software to manage health care processes.

Interoperability of EHR systems

Research among multiple institutions is facilitated by interoperability of their EHR systems. In its absence, a large amount of effort is needed to integrate data. One of the reasons that building the infrastructure to share data is so challenging from a technical standpoint is the lack of interoperability among different EHR systems. Just among providers who have been able to demonstrate they are meaningfully using their EHRs based on the criteria specified under the Medicare EHR incentive payment program, 333 different EHR vendors have been used, although consolidation is occurring in the EHR industry with the top 5 vendors increasingly being used by a larger share of providers.378 While the industry continues to consolidate, the wide variety of systems currently in use has led to two major challenges: 1) Syntactic interoperability, or the ability for systems to communicate with one another to exchange data; and 2) Semantic interoperability, or the ability for systems to understand the data exchanged. The ability to exchange data is more easily solved. However, differences in vocabulary and classifications are a more difficult problem, particularly when trying to identify members of small populations across multiple institutions.379 Even within a single organization’s EHR, standardizing the data is a challenge. This challenge is amplified across multiple organizations. Even for seemingly well-defined concepts there is variation. For example, what one system may call “high blood pressure” another system may call “elevated blood pressure.”380 Or, systems may use different race/ethnicity categories.
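The vocabulary problem can be sketched with a small crosswalk from local terms to a shared concept label. The table and the label "hypertension" are hypothetical and for illustration only; a real mapping would be built against an agreed terminology and reviewed clinically before use.

```python
# Hypothetical crosswalk from site-specific terms to one shared concept label.
LOCAL_TO_COMMON = {
    "high blood pressure": "hypertension",
    "elevated blood pressure": "hypertension",
}


def harmonize(term):
    """Return the shared concept for a local term, or None if it is unmapped."""
    return LOCAL_TO_COMMON.get(term.strip().lower())


def unmapped_terms(terms):
    """List the terms that still need review before data can be pooled."""
    return sorted({t for t in terms if harmonize(t) is None})


print(harmonize("High Blood Pressure"))
print(unmapped_terms(["elevated blood pressure", "HTN", "hbp"]))
```

Returning None for unknown terms, instead of guessing, keeps unmapped vocabulary visible so that it can be resolved by people who know each source system.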

There are a number of efforts to create standards for EHR data, including the Health Level Seven International’s (HL7) Continuity of Care Document. HL7 is the global authority on standards for interoperability of health information technology. In partnership with ASTM International, another developer of voluntary consensus standards, HL7 developed the Continuity of Care Document to foster interoperability by promoting standardization across systems through the use of templates representing typical sections of a patient’s EHR.381 While progress is being made in moving toward interoperability standards, the current set of standards is not at a level that solves many of the problems of the researchers we talked to. Many of those we interviewed have been working with their vendors and with other health care organizations to develop strategies for sharing data despite the lack of a single standard, universal approach to interoperability.

In addition, five major health systems (Intermountain Healthcare, Geisinger Health System, Group Health Cooperative, Kaiser Permanente, and Mayo Clinic) have created the Care Connectivity Consortium as a pioneering effort and have achieved interoperability across multiple vendors to enable the sharing of patient information.382 While primarily motivated by wanting to provide a model by which EHR data can be shared across institutions to improve patient care, the ability of health systems to overcome interoperability challenges will also have significant benefits for research.

Those we interviewed felt that major vendors and federal incentives can both play important roles in promoting standardized data fields and formats across different EHR systems. For example, if Epic includes sexual orientation and gender identity in its system, that could lead to these fields becoming an industry standard. However, some smaller vendors may not invest in including these fields in their products unless they are added to Meaningful Use criteria.383 Meaningful Use requirements, as well as quality reporting requirements for accreditation and recognition programs, all have the potential to help lead to greater standardization and interoperability across systems.384 While Meaningful Use presents only minimum requirements for standardization, physicians have the added incentive to do more because it enhances the value of their practices to potential purchasers.385

Research agencies also have the opportunity to promote standardization through what they fund. Although Meaningful Use itself may only do so much, in combination with other levers and incentives, the availability of standardized EHR data for research will likely continue to increase.386 In addition to interoperability across EHRs, there is the need to integrate supply chain, financial, and clinical data to provide a fuller picture. For an organization like the Health and Hospitals Corporation, which includes hundreds of systems, many decisions and definitions used by each individual component of the system do not align once information is brought together. For example, in terms of defining a visit or encounter, a clinician may only consider a patient to be discharged if they are alive, but from a financial standpoint, a discharge is someone who is alive or dead. Or, the name of the same doctor may be entered differently in different systems (for example, whether the last name is listed first or second, whether the title Dr. is included, etc.). Going back and standardizing the data across systems is a lot of additional work. In the long run, it will be important to align these different types of systems as well.387
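The provider-name example can be sketched as a small normalization routine. This is an illustrative sketch under stated assumptions: the function name is hypothetical, a comma is assumed to mean the name was entered as "Last, First", and only the title "Dr." is handled. Real systems would face many more variations (middle names, suffixes, credentials).

```python
import re

# Matches a leading "Dr" or "Dr." followed by whitespace, in any letter case.
_TITLE = re.compile(r"^dr\.?\s+", flags=re.IGNORECASE)


def normalize_provider_name(raw):
    """Return the name as lowercase 'first last' with any 'Dr.' title removed.

    Assumes a comma means the name was entered as 'Last, First'.
    """
    name = _TITLE.sub("", raw.strip())
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        # The title may follow the comma, as in "Smith, Dr. Jane".
        first = _TITLE.sub("", first)
        name = f"{first} {last}"
    # Collapse repeated spaces and lowercase for comparison.
    return " ".join(name.lower().split())


for entry in ["Dr. Jane Smith", "Smith, Jane", "Smith, Dr. Jane"]:
    print(normalize_provider_name(entry))
```

Even a simple rule set like this has to be agreed on and applied across every contributing system, which is part of the additional work the paragraph describes.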

Practice-based research networks

Practice-based research networks (PBRNs) have facilitated much of the research using EHR data from multiple institutions. PBRNs are groups of primary care clinicians and practices that work together to answer community-based health care questions as well as to translate research findings into practice. AHRQ has devoted funding to support PBRNs through targeted grant programs as well as by supporting a resource center, learning groups and conferences. The DARTNet Institute is a growing collaboration of PBRNs (currently including nine of them) that is building a national collection of data from electronic health records, claims, and patient-reported outcomes for the use of quality improvement and research.

Research networks can make a wealth of clinical information available for research through their EHRs. The organizations within a network are often already either sharing a common EHR system or have worked to develop some form of centralized or distributed data warehouse for research purposes. In addition to PBRNs, there are other research networks that expand beyond primary care practices. The Cancer Research Network, a collaboration of integrated delivery settings funded by the National Cancer Institute of the National Institutes of Health, is another example of a network created to facilitate research. Still another example is the Community Health Applied Research Network (CHARN), a network of community health centers and universities established to conduct patient-centered outcomes research among underserved populations. Members of CHARN include Kaiser Permanente Center for Health Research (which serves as the coordinating center), the Association of Asian Pacific Community Health Organizations (AAPCHO), Fenway Health in Boston, OCHIN in Oregon, and the Alliance of Chicago Community Health Services.

Research on small populations is increasingly feasible as networks of EHRs with common structures and formats have developed. There is also the potential to link data across systems to identify a cohort of interest.388 For example, within the Cancer Research Network, any of the individual health plans will likely include the numbers of patients needed for research on any of the five to seven most common cancers. However, for pediatric cancers or rarer cancers, data must be pooled from multiple medium-sized sites or perhaps the two KP California regions to obtain a sufficient number of cases for research. Most rare cancers require use of data from California, where KP has 4 million members in its EHR system.389

One challenge for PBRNs is that securing permission from individual practices and their vendors to access their server can take some time to make sure everyone is comfortable with the arrangement.390 Even after practices agree to participate, data use agreements must be established that are specific enough to provide protection, but flexible enough to accommodate research. Often additional, unanticipated data elements are required for research, requiring the revision of data use agreements, as well as working with IRBs at multiple institutions.391

EHR vendors have not yet played a big role in networks, which have mostly been built by health systems or funded by grants. However, it appears vendors are currently trying to better understand this space since there is a potential business model. While the involvement of vendors may provide additional resources and help move network technology forward, there is the danger that as the data come to be perceived as more valuable, data sharing may become more difficult. This may also pose a threat to the current public/private partnership where the data collection occurs in the private sector without public and private sector researchers paying them to do so.392

Regional health information exchanges

While initially envisioned as another major source of patient data, it is unclear what role regional health information exchanges will play in the future of EHR-based research. One of the original purposes of the Office of the National Coordinator of Health IT was to facilitate the development of regional health information organizations (RHIOs) that would facilitate health information exchange among stakeholders in their region’s health care system. These RHIOs were intended to provide the infrastructure for a national health information exchange. However, their development has faced a number of barriers, including many of the challenges to EHR-based research mentioned in this report, particularly lack of resources for infrastructure.393 Further removal from day-to-day patient care would make data quality and interpretation an additional challenge when using data from these regional exchanges for research. There have been examples, however, where regional health information exchanges have provided data for regional quality improvement efforts.394

Linking EHR and other electronic health data with other data sources

A number of other data sources may be linked with EHR data to provide additional information for research, as well as to validate information in the EHR available to identify and study small populations. Data linkage requires that at least one common identifier be available in both sources that can be used to link records. Unique identifiers that are commonly used to link data at the patient level include social security numbers, health insurance claim numbers, and medical record numbers. Hospital or area level identifiers may also be used for linkage to organizational or geographic level data. Commonly linked administrative databases include disease registries, claims files, survey data, provider files, and area-level data.395 Additional clinical information—such as genetic, care management, and social network information—also has the potential for linkage with EHR data for research. Several examples of additional data sources for EHR-based research are described below.
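The linkage requirement described above can be sketched as a simple exact match on a shared identifier. The field names (`mrn`, `bmi`, `cancer_stage`) and the function name are hypothetical, and the sketch assumes each identifier appears at most once in the second source. Records without a match are kept separately so that they can be examined, not silently dropped.

```python
def link_records(ehr_rows, registry_rows, key="mrn"):
    """Link two sources on a common identifier.

    Returns (linked, unlinked): merged rows for identifiers found in both
    sources, and the EHR rows that had no match in the registry.
    Assumes each identifier appears at most once in registry_rows.
    """
    registry_by_id = {row[key]: row for row in registry_rows}
    linked, unlinked = [], []
    for row in ehr_rows:
        match = registry_by_id.get(row[key])
        if match is None:
            unlinked.append(row)
        else:
            linked.append({**row, **match})
    return linked, unlinked


# Made-up example rows.
ehr = [{"mrn": "001", "bmi": 27.1}, {"mrn": "002", "bmi": 31.4}]
registry = [{"mrn": "002", "cancer_stage": "II"}]

linked, unlinked = link_records(ehr, registry)
print(linked)
print(unlinked)
```

Exact matching is the simplest case; where no reliable common identifier exists, other linkage techniques would be needed, which this sketch does not attempt.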

Patient Registries

Patient registries are another electronic data source that may be useful for research in combination with EHRs. In a registry, uniform data are collected from multiple institutions in a central database for a population defined by a particular disease, condition, or exposure. These data may be pulled directly from EHRs or require manual entry based on information from the patient’s record. Registries are a simpler form of consolidated data: they include only a core set of relevant data elements for a specific purpose. Registries may be local, such as immunization registries or vital statistics departments that collect birth and death data. Death records may be particularly important because death is often difficult to determine from an EHR. There are also national registries, such as the CDC’s National Program of Cancer Registries. The National Cancer Institute also collects information on diagnosed cancer cases and cancer deaths simply to measure incidence and mortality.396 The Institute’s tumor registry adheres to national and accreditation standards and has specialized staff who pore through records in local registries looking for evidence of cancer, including blood cancers. Although labor intensive, it is currently more accurate to use a manual process to determine which records should be included in the registry. In contrast, an automated process to query the registry for records of interest may be used if the records included are already well validated. Local registries are often able to accept EHR data and accept edits from providers. One complication is that at times, data can be corrected in the registry but not in the EHR source data. Registries may collect some patient demographic data in order to determine whether certain populations bear a disproportionate burden of the disease.

Information from registries has been linked to EHR data in order to identify patients with specific conditions. For example, in one study a tumor registry was linked to the Cancer Research Network’s distributed data warehouse to identify cancer cases. Race and ethnicity in this study were extracted from cancer registries as well. This study was able to look across eight years of data to examine whether someone’s health care utilization increases directly prior to diagnosis of a new primary cancer.397 The ability to look back to before patients were diagnosed with a certain condition is another unique benefit of research using EHR data and has the potential to improve our ability to identify patients who are at greatest risk of disease to improve targeting for preventive interventions.

Registries can also be linked to EHR data for data validation, such as was done in one study that linked clinical databases with a cancer registry to confirm cases of cancer. In this particular study, they found that 98.9 percent of cases overlapped. The use of multiple data sources presents opportunities to improve data quality for research. For example, addition of death data from a cancer registry to the clinical database allowed for more accurate stage-specific and overall survival figures.398

While registries and EHRs can combine to provide a fuller picture, like EHRs, patient registry data may be incomplete as well. It remains a challenge both to motivate clinicians to participate in registries and to facilitate easy transfer of information from patient records into the registry.399 Some studies have suggested there may be systematic bias when using only records that can be matched between multiple data sources, such as EHRs and registries. A review of the literature around this topic found a number of patient or population factors such as age, sex, race, geography, socio-economic status and health status that may be associated with incomplete data linkage. This association may result in a systematic bias among clinical outcomes reported from such studies.400
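One way to look for the kind of systematic bias described above is to compare linkage rates across patient subgroups before analyzing only the matched records. The sketch below is illustrative: the field names (`linked`, `age_band`) and the function name are assumptions, and the data are made up.

```python
from collections import defaultdict


def linkage_rate_by_group(records, group_field):
    """Return the share of records successfully linked within each subgroup."""
    totals = defaultdict(int)
    matched = defaultdict(int)
    for rec in records:
        group = rec[group_field]
        totals[group] += 1
        if rec["linked"]:
            matched[group] += 1
    return {group: matched[group] / totals[group] for group in totals}


# Made-up records flagged by whether they could be matched to a registry.
records = [
    {"age_band": "under_65", "linked": True},
    {"age_band": "under_65", "linked": True},
    {"age_band": "under_65", "linked": True},
    {"age_band": "under_65", "linked": False},
    {"age_band": "65_plus", "linked": True},
    {"age_band": "65_plus", "linked": False},
]

rates = linkage_rate_by_group(records, "age_band")
print(rates)
```

A large gap between subgroups would suggest that results based only on linked records may not represent the groups that link poorly, which matters most for small populations where each lost record carries more weight.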

An additional limitation of some registries such as the National Cancer Institute’s Surveillance Epidemiology and End Results (SEER) registries is that they do not identify the recurrence of cancer. Researchers at Kaiser Permanente are trying to address this gap by looking for utilization clusters in claims as well as digital images to identify recurrence. The potential to use pattern recognition to analyze digital images may increase the accuracy of automated approaches to identify cancer incidence for registries and other purposes, potentially finding more than the human eye could have recognized.

In addition to registries, other systems that exist for surveillance purposes may provide useful electronic information. For example, the FDA’s Mini-Sentinel Network is a large multi-system collaboration to track exposure to specific drug products and to conduct case-control studies to identify unexpected adverse events. Participating sites agreed to make their patient medical records available to verify any statistically-identified associations. Because this effort is classified as public health surveillance, no IRB compliance is required.

Genetic Data

As the field of genomics has rapidly evolved in recent years, the routine generation of genetic data for individual patients has received much attention from the general public. Clinical utility is currently limited by the inability to effectively process, store, update and interpret genetic data while protecting patient privacy.401 However, efforts have begun to integrate genetic data into EHRs,402, 403 opening many additional possibilities for research. For example, the mining of EHRs with genetic data may reveal previously unknown disease correlations based on patient genetic make-up.404

The National Health and Nutrition Examination Survey (NHANES) collected DNA specimens from participants from 1999 to 2002, which may be used for secondary analysis and can be linked with the survey data. For permission to use the data, researchers may submit proposals to the Centers for Disease Control’s Research Data Center (RDC) for approval, and analysis must occur at an RDC location.405 In a study funded by the NIH, Kaiser Permanente in California has been able to link genetic information with its EHRs. By collecting saliva from 100,000 members, Kaiser has examined the associations between genetics and smoking and drinking habits as well as body mass index.406 While these saliva samples were expressly collected for research purposes, there have been other instances where blood or other biospecimens collected for medical purposes were reused for research.407 Instances such as these bring to light the need for clearer consensus and guidelines about the appropriate secondary use of information collected for clinical purposes. One example that may serve as a potential model is the open-consent framework used for the Personal Genome Project, where consent implies research participants accept that their data could be included in a public, open-access database with no guarantee of anonymity and confidentiality.408

Other Data Sources

A number of other data sources provide opportunities for linkages with EHRs. For example, claims data in the Healthcare Cost and Utilization Project (HCUP) databases now feature new linkage capabilities, including the ability to link to clinical data from labs, trauma registries, EMS data and nurse staffing data.409 AHRQ has sponsored a number of clinical data pilots to demonstrate the feasibility of linking hospital lab data with HCUP data.410 Claims data may be an important supplemental source when studying insured populations because they can provide information on care provided across health systems. Claims data may also currently identify utilization, such as visits or procedures, better than EHRs do. Although many health care organizations are now using EHRs to bill, EHRs likely only include their own claims, requiring claims for care received elsewhere to be obtained from another source such as the payer.411 The increase of digital data in all health care settings presents numerous opportunities for research.

In addition, the emergence of care management software programs that track weight, exercise, and medication adherence provides additional information that some providers are entering into EHRs. These programs may download data from pedometers to measure aerobic activity,412 and have been used for employee incentive programs run by employers or insurance companies. There remains much potential to develop interfaces whereby these types of programs can directly link to EHR systems. There has also been interest in incorporating personal health data from social networking websites and applications on mobile devices into health records for medical care as well as research and public health surveillance. For example, entries on Twitter about disease outbreaks have been correlated with official public surveillance data (although both reflect public concern rather than actual documentation of disease). Or, tracking consumers’ online behavior could be linked with bioinformatics. However, use of these data for such purposes presents complications in terms of privacy and consent because, online, the lines between public and private are increasingly blurred.413

Linking to state and county data sources has allowed some of the organizations we interviewed to better understand their patient population.414 KP often links its data to the California Department of Developmental Services’ database for its ASD patients. However, they are unable to link to the patient’s educational records due to state laws.415 The ability to link EHR data to public school records would be ideal for research on autism spectrum disorders because individuals are often identified in both places and in theory should be managed jointly between the pediatrician and the school.416 Linking to outside data sets also allows research on the population level, for which Essentia has linked its EHR to publicly available state and county data.417 State employee health plans such as the California Public Employees’ Retirement System (CalPERS), which covers active and retired state and local government employees and their family members, may also be a potential data source of demographic and administrative information, diagnosis as well as information on spending.418

There have been a number of recent federal efforts to increase the availability of social, demographic, and behavioral data using a variety of data sources. AHRQ has recently awarded grants from the American Recovery and Reinvestment Act to enhance race/ethnicity information in statewide hospital encounter databases, another source of patient information. State grantees are taking a number of approaches to enhancing data, from standardizing, educating and auditing hospitals as they report race, ethnicity, and language (R/E/L) data to revising administrative codes to include a mandate.419 Also, CMS has recently commissioned a study to examine the barriers to collecting social and behavioral data from EHRs for Stage 3 of the Meaningful Use program, and how to overcome these obstacles. This study will identify the core social and behavioral domains that should be included in an EHR, possibilities for linking EHRs to public health departments, social service agencies, and other non-health care organizations, as well as case studies where such links have been established and how privacy issues were addressed.420

In addition, as EHR adoption increases, EHR data play an increasingly important role in national health surveys such as the National Ambulatory Medical Care Survey (NAMCS), which collects information on practice characteristics and patient visits by abstracting data from a sample of patient medical records from each participating practice. While the survey was previously limited to national and regional estimates, the Affordable Care Act has funded a sample increase that will allow for state-based estimates of clinical preventive services.421 This survey also collects information on EHR adoption, as previously described.

Despite existing challenges to meeting the conditions needed to use EHR data for research, the experts we interviewed provided examples of innovative ways barriers were being overcome. Additionally, they were cautiously optimistic that some other barriers could be overcome in relatively short time frames, potentially resulting in a “tipping point” or “major paradigm shift” in how clinical and health services and policy research is conducted in the not so distant future. Specifically, the experts we interviewed had a number of suggestions for ways to move forward in the field of EHR-based research in general and/or ways to study specific small or minority populations. These suggestions can be categorized as potential studies aimed at data validation, new tools and methods for mining and extracting data, descriptive studies around specific populations, and outcomes research. There were also a number of recommendations around engaging and encouraging collaboration among key stakeholders (clinicians, small populations, and vendors) to improve the quality of data collected, as well as on improving the legal framework and other policy issues around secondary uses of electronic health data.

Data validation

The most commonly suggested types of studies were those aimed at further examining the strengths and limits of EHR data, as well as identifying potential methods to strengthen the data for research use. Research networks such as the HMO Research Network,422 Community Health Applied Research Network,423 and Practice-Based Research Networks or DARTNet may be good places to conduct this kind of research because of the volume and variety of data they have available and the expertise they have already been developing through other projects and studies. The Health Care Systems Collaboratory was also identified as a good place to start for these types of projects because participants are advanced and can demonstrate the potential of EHR-based research.424

A potential related area for research included the development and testing of various patient surveys and/or completed instruments, including perhaps a catalog of items patients could self-report that would be integrated into the EHR and combined with other data. For example, it has been shown that patients will accurately report their height so it does not need to be measured by the nurse, but patients are less likely to accurately report their weight.425 A study to examine whether Meaningful Use has increased documentation of targeted variables was also suggested.426 In one such study, Kaiser is conducting targeted patient interviews as patients leave a doctor’s office to see if they are smokers (a Meaningful Use measure) and whether the doctor talked to them about it, giving researchers a better sense of how to interpret their EHR data. The use of interviews and other methods of directly hearing from the patient is an important form of validation because although electronic health data can provide a lot of information, the only way to know how a patient feels is to talk to him or her, or the caregiver. The collection of health-related quality of life data and/or patient experience data provides additional information from the patient’s perspective.

One suggestion for research funders from the technical expert panel was to take some studies that have been conducted on small populations using survey methods and to release requests for proposals to see if there is anyone who could look at the same issue and population using EHRs or other electronic health data, allowing for a comparison of results between methods. Similar rapid response requests for proposals could be used when there is a pressing issue for a particular small population that EHR networks could potentially examine. There were also potential studies suggested among those interviewed to examine the validity of data used to identify specific small populations for research, such as:

A large, prospective study to understand how sexual orientation and gender identity data captured in EHRs differs from patient views427

Research to identify how patients are identified as having an ASD and the data elements needed to study ASD patients, both to assess what data are available and how complete these data are428

Examination of the potential of natural language processing to identify ASD patients429 and sexual orientation430

Studies on whether and how physicians are collecting information around sexual practices and sexual orientation431

New tools and/or methods

As several examples briefly described in the report illustrate, the field is developing a variety of new methods and/or tools to identify priority small n populations in EHR databases and transform key EHR data into analytic files for research. For example, researchers described algorithms or natural language processing software that more reliably and validly identified small n populations of interest and ways to use well-validated surveys to collect key information and integrate it into the EHRs. They also described a variety of different kinds of databases and some of their relative strengths and weaknesses. These and other kinds of tools could be further developed, and the significant experience gained from current projects could be capitalized on, to develop a clearer picture of the strengths and weaknesses of different approaches for extracting and using the data from a variety of perspectives and the conditions under which one may be relatively advantageous or likely to succeed.

There is also work being done to explore new methods that can incorporate the use of EHRs and other electronic health data into more traditional methods of research, as well as to better understand what types of studies EHR data may or may not be best suited for. There is a need to further develop research study designs in order to study small populations. While randomized controlled trials have traditionally been the “gold standard,” there is growing agreement that this discipline must evolve, particularly to be able to focus trials on specific subgroups to look for differences. For example, the Health Care Systems Collaboratory has been exploring the use of EHRs for more pragmatic, real world approaches to clinical trials. While these approaches may not produce results that are generalizable, for research on small populations in particular there is a lot to be learned if they can be studied as the unique group that they are when the opportunity is available to use quasi-experimental models. Ease of access to the population may also provide opportunities to study small, unique populations that may be concentrated in certain areas or in a health system or plan where there is good data. For example, Kaiser Hawaii may provide opportunities for research on Asian subpopulations as it serves a large concentration of Asians and has had good ethnicity data for years.

In addition, consideration should be given to what would be a useful control group for studies on small populations. Using controls from within the same electronic health data set may be advantageous because any bias in the data is unlikely to be systematically skewed toward the controls. Although these biases may not be quantifiable, they can at least be described qualitatively in light of knowledge of the limitations of the data.432
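As a rough illustration of drawing controls from within the same data set, the sketch below matches each case to controls in the same sex and age-band stratum. The field names (`id`, `sex`, `age`), the five-year age band, and the two-controls-per-case ratio are assumptions made for this example, not a design described by the interviewees.

```python
from collections import defaultdict

def match_controls(cases, pool, per_case=2, age_band=5):
    """Pick controls from the same data set, matched on sex and age band.

    `cases` and `pool` are lists of dicts with hypothetical keys
    "id", "sex", and "age". Each control is used at most once.
    """
    # Index the candidate pool by (sex, age band) stratum.
    strata = defaultdict(list)
    for person in pool:
        strata[(person["sex"], person["age"] // age_band)].append(person)

    matches = {}
    used = set()
    for case in cases:
        key = (case["sex"], case["age"] // age_band)
        picked = []
        for candidate in strata[key]:
            if len(picked) == per_case:
                break
            if candidate["id"] not in used and candidate["id"] != case["id"]:
                picked.append(candidate["id"])
                used.add(candidate["id"])
        matches[case["id"]] = picked
    return matches

cases = [{"id": 1, "sex": "F", "age": 42}]
pool = [
    {"id": 2, "sex": "F", "age": 44},
    {"id": 3, "sex": "M", "age": 43},
    {"id": 4, "sex": "F", "age": 40},
    {"id": 5, "sex": "F", "age": 47},
]
matched = match_controls(cases, pool)
```

Because cases and controls come from the same source, any recording bias in the data applies to both groups, which is the advantage the paragraph above describes.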

It would also be helpful to identify ideal study components where EHRs and other electronic health data can help supplement other information that is collected, such as to provide utilization information for clinical trials, or to help develop high risk cohorts. EHRs may offer a viable first stage screening for proxies, such as use of a treatment as a proxy for having a rare condition. EHRs may be helpful in identifying these research questions, potentially by examining the distribution of comorbidities, or how delivery of care differs across subpopulations. There may also be ways to combine EHR and other types of data such as survey data. Some examples may include using EHRs to identify a population for a more targeted survey, or conducting a survey and then supplementing that information with what is available in medical records. Using a combination of data sources may also facilitate more effective identification of small populations. In addition, while geospatial approaches have typically been used to study rural populations, they may also be useful to study other small populations because they are often not evenly distributed throughout the country.433
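The two-stage idea above (screen the EHR for a proxy, then target a survey at the flagged group) can be sketched as follows. This is a minimal sketch under stated assumptions: the record layout, the treatment name `drug_x`, and the `contact_ok` consent lookup are all hypothetical placeholders.

```python
def screen_by_proxy(prescriptions, proxy_treatments):
    """Return IDs of patients with at least one proxy treatment on record.

    `prescriptions` is a list of dicts with hypothetical keys
    "patient_id" and "treatment"; `proxy_treatments` is a set of
    lower-case treatment names assumed to signal the rare condition.
    """
    flagged = set()
    for rx in prescriptions:
        if rx["treatment"].lower() in proxy_treatments:
            flagged.add(rx["patient_id"])
    return flagged

def survey_sample(flagged_ids, contact_ok):
    """Keep flagged patients who have agreed to be contacted for research."""
    return sorted(pid for pid in flagged_ids if contact_ok.get(pid, False))

prescriptions = [
    {"patient_id": "p1", "treatment": "Drug_X"},
    {"patient_id": "p2", "treatment": "drug_y"},
    {"patient_id": "p3", "treatment": "drug_x"},
]
flagged = screen_by_proxy(prescriptions, {"drug_x"})
to_survey = survey_sample(flagged, {"p1": True, "p3": False})
```

A proxy screen of this kind only produces candidates; the follow-up survey or chart review is what would confirm membership in the population of interest.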

Descriptive studies

There were also a number of suggested studies using EHR data to better understand the health and health care of specific small populations. For example, Kaiser has used sophisticated sampling with its EHR data to stratify patients into various subgroups according to how likely they are to have COPD—presumably, this could be done with other health outcomes. These studies could serve to examine how various subpopulations fare relative to the majority population and to identify disparities in order to address them. Some examples include:

Health: studies to examine comorbidities of adults with ASDs,434 or common diagnoses among different Asian subpopulations435

Social determinants of health: studies to better understand the patient complexity and risk associated with social determinants of health barriers (e.g., limited English proficiency, poverty level, insurance status) among different Asian subpopulations, many of whom are immigrants436

Health care utilization: studies to examine use of pediatric services by adolescents with ASDs during the transition to adulthood,437 use of psychotropic and ADHD medication among young children with ASDs,438 as well as referrals to mental health services and outside behavioral diagnostic testing439

Quality: research around the receipt of recommended care by Asian subpopulations, LGBT, and other minority or disadvantaged groups441

Patient experience: use of satisfaction surveys linked to encounter data to examine the experience of LGBT patients442

Outcomes research

Finally, a number of interviewees pointed to the potential of EHR data to be used for research examining outcomes, and how these outcomes may differ for different sub-groups of the population. This would include examining the outcomes of medications, types of treatments or care processes,443 interventions such as smoking cessation or medications,444 and new models of care such as telemedicine for rural patients.445

The information in EHRs is well suited for research around clinical topics, health services, delivery system issues, and quality of care. The volume of information makes it useful for high-level, broad utilization benchmarking as well as for more detailed information on small populations.446 The ability to identify small populations also presents an opportunity for comparison studies to identify disparities in health and/or health care that may be experienced by certain groups, such as differences in access or quality of care. These data are also useful for descriptive epidemiology that looks at the prevalence and trends of certain conditions over time by certain demographic or other characteristics,447 as well as quality improvement research to improve care for certain populations.448

EHRs also provide a unique opportunity to look for undiagnosed conditions. For example, CHARN is looking for people with possible undiagnosed hypertension by identifying people in EHRs who have high blood pressure readings but have not been tested for hypertension. These patients are then targeted for testing and therapeutic intervention.449
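A screen of this general kind could be sketched as below. This is not CHARN's actual algorithm, which the report does not describe; the 140/90 mmHg cut-offs, the requirement of two elevated readings, the record layout, and the list of diagnosis-code prefixes are all assumptions chosen for illustration.

```python
# Illustrative ICD-10-CM prefixes for hypertensive diseases (assumed list).
HTN_CODE_PREFIXES = ("I10", "I11", "I12", "I13", "I15")

def possible_undiagnosed_htn(readings, diagnoses,
                             sys_cut=140, dia_cut=90, min_elevated=2):
    """Flag patients with repeated elevated readings and no hypertension code.

    `readings`: dicts with hypothetical keys "patient_id", "systolic", "diastolic".
    `diagnoses`: dicts with hypothetical keys "patient_id", "code".
    """
    # Count elevated blood pressure readings per patient.
    elevated = {}
    for r in readings:
        if r["systolic"] >= sys_cut or r["diastolic"] >= dia_cut:
            elevated[r["patient_id"]] = elevated.get(r["patient_id"], 0) + 1

    # Patients who already carry a hypertension diagnosis code.
    diagnosed = {
        d["patient_id"] for d in diagnoses
        if d["code"].startswith(HTN_CODE_PREFIXES)
    }

    return sorted(
        pid for pid, n in elevated.items()
        if n >= min_elevated and pid not in diagnosed
    )

readings = [
    {"patient_id": "a", "systolic": 150, "diastolic": 95},
    {"patient_id": "a", "systolic": 142, "diastolic": 88},
    {"patient_id": "b", "systolic": 150, "diastolic": 95},
    {"patient_id": "b", "systolic": 145, "diastolic": 92},
    {"patient_id": "c", "systolic": 141, "diastolic": 85},
]
diagnoses = [{"patient_id": "b", "code": "I10"}]
candidates = possible_undiagnosed_htn(readings, diagnoses)
```

Requiring more than one elevated reading is a design choice meant to reduce false positives from a single high measurement; the flagged list is a starting point for outreach, not a diagnosis.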

Stakeholder engagement and collaboration

In addition to potential studies, those interviewed recommended efforts to further engage key stakeholders to improve the quality of data collected, as well as to direct the research agenda for using electronic health data to study small populations. In particular, clinician engagement was recommended in order to improve the quality of data available for EHR research. Providing education about the importance of the data may motivate physicians to enter data into structured fields rather than free text. An additional incentive may be to provide feedback on their data quality along with reports around the quality of care.450 Encouraging clinicians to use their data will lead to improvement as they identify and address errors. Obtaining trust from participants is a big issue. For example, a CHARN representative who was interviewed is aware that the participating community health centers (CHCs) are still watching to make sure the coordinating center is engaging the CHCs in research rather than just writing reports using their data.451 Information could also be provided to help clinicians manage their patient populations more effectively so they can see the usefulness of high quality data. For example, reports could identify complex chronically ill patients for follow-up.452 Engaging clinicians in the development of research may help identify research questions that address the challenges they face in clinical practice. Also, practices that participate in research networks should be supported monetarily and in terms of infrastructure to make sure they are collecting the data that researchers want. Relationship building is required, as well as some benefit to the providers from the data, in order to obtain their buy-in and support.

Some interviewees also suggested being purposive regarding what types of practices contribute data for research: partnering with those who are interested in using their EHRs to generate evidence, and with practices whose patient populations might otherwise be underrepresented in research, such as those serving children or ethnic minorities.453

In addition to engaging providers who treat small populations, engaging the small populations themselves is important to improve the quality of data collected. One recommendation from the technical expert panel was to work with the LGBT community to develop ways to respectfully identify them, as well as to gain consensus around what information to collect and what categories to use. With HHS piloting questions to identify the LGBT population on national surveys, there may be an opportunity to compare these findings with EHR-based methods of identifying LGBT patients. Another suggestion was to convene a task force to identify the data needed to study small populations. Establishing common data elements for each population, such as specific demographic variables, may also be a task for such a task force. Vendors must also be engaged around the need for common data elements, as well as to promote the development of EHRs that support a learning health care system.454

The legal framework and other policy issues

Although the technical expert panel identified a potential role for the federal government in disseminating best practices on how research has been successfully conducted thus far within the legal framework, there was agreement that in the long run, these “work-arounds” would not be sufficient. Elements of the law that have been suggested as ripe for revision include the over-emphasis on informed consent over other fair information practices, preferential treatment of quality improvement and other internal uses over research, and lack of guidance around network architecture, governance, and IRB structure.455 There is also an opportunity for the government to educate the public about the benefits of using their health data for research and the barriers that over-protection of privacy poses to progress in medical and public health research. Privacy concerns that prevent patients from allowing their data to be shared also lead to a number of health risks, such as errors that occur when a patient’s multiple providers do not know what the others are doing. While the younger generation has grown up in the age of social media and may have fewer concerns around privacy, recent events such as the publicity around PRISM (the National Security Agency’s electronic surveillance program mining telecommunications data) have brought to light existing public concerns around privacy.

Implementation of policies aimed at closing the digital divide experienced by rural and safety net providers, such as the HITECH Act, will also improve the availability of electronic health data to study small populations. The need for a business model for EHRs in rural practice remains. The development of subscription-based EHRs operated over secure web portals and requiring only web appliances in the physician’s office may be one solution. Further development of networks like CHARN, and support for such networks to learn from the experiences of better-resourced research enterprises such as Kaiser or the HMO Research Network, is also important for studying these populations. The government may also consider supporting the development of decentralized data warehouses and other IT infrastructure to link health systems in specific geographic areas, such as underserved urban areas or sparsely populated rural areas. Funding the development of “Centers of Research Excellence” to support the development of EHR-based research on small populations may also help build infrastructure.

Finally, closing gaps that occur when children age out of their parents’ insurance will improve the continuity of electronic information available to study small populations over time. While additional opportunities and subsidies to purchase insurance through the Affordable Care Act may help address gaps in coverage, there must also be efforts by delivery systems to close gaps in information. Development of personal health records and more robust information exchanges, as incentivized in the HITECH Act, will help. Simpler solutions exist as well, such as providing patients with a copy of their information that they can share with new providers. This has been done in cancer care and may also be helpful to adolescents with ASDs as they transition to adulthood.

Relative to other federal data sources like surveys and claims databases, as well as paper charts, electronic health records have some major strengths. These include: the potential to reach larger samples of individuals, perhaps in some cases approaching the majority of the population or subpopulations of interest; the inclusion of many types of clinically rich, detailed information; the potential inclusiveness and longitudinality of some data sets; and the ability to link EHR data to other data sources, including patient self-reported information on a variety of issues such as behavior, functioning, or health status and other outcomes. Additionally, the change in medium from paper and pen to computer hardware and software facilitates the identification, extraction, and sharing of data on a scope, scale, and speed heretofore not possible. Finally, ARRA HITECH funding has stimulated more providers to adopt and use EHRs, and ongoing efforts in this area and implementation of health reform are likely to give providers additional incentives to invest in and use EHRs.

While some significant barriers remain, many of the conditions required for harnessing the power of EHRs for research on the health and health care needs of the American people and key small n populations are present or closer to being realized. Our interviews and literature review illustrate that innovative solutions are being developed through a variety of publicly supported and private efforts. Moreover, these innovative solutions provide concrete examples of how thorny governance, privacy and security, technical, and other barriers might be overcome. They also allow for a “cataloging” of lessons learned from various approaches and potential next steps.

Toward that end, the interviews and our own thinking yield a number of possible suggestions for moving the field forward. They can broadly be described as: additional “environmental scanning” to identify promising approaches; convening of HHS agencies and possibly other groups via a public-private partnership framework to identify possible next steps and their prioritization; support for targeted EHR method and data projects or specific research projects using EHR data alone or in combination with other data; and strategic planning and coordination within HHS on ways to proceed in the shorter and longer term.

For example, the research for this report has identified some of the major recent efforts in various HHS departments that have touched on the potential use of EHR data for research, implicitly or explicitly. However, we have not had the opportunity to fully catalogue or mine these programs for “lessons learned.” A more comprehensive and detailed identification and mining of innovative examples would be potentially very valuable to the field.

Similarly, we have identified and spoken with the leaders of some of the major federal and/or private research efforts to date and had some opportunity to obtain their thoughts on key areas for further work. Additional input will be gathered from a sub-set of them serving as TEP members. However, a broader group of researchers with complementary and diverse areas of expertise could be convened to weigh in on priorities and next steps. In addition, other major stakeholders such as provider and professional associations could be convened to discuss the issues raised by the use of EHRs for research as well as for operations and related purposes (i.e., quality and efficiency improvement). EHRs are currently used for ongoing care and operations, and it is not clear whether and to what extent providers and professionals understand how they can help ensure that such data are useful for research, or what might motivate them to become more engaged and invested in improving the data for ongoing research. In other words, what is the business case for providers and professionals to engage or participate in research that uses EHRs, and what conditions would make them more interested and able to do so?

As noted above, interviewees identified specific projects that could be pursued. While some of these projects could be described more as EHR data and methods projects, such as EHR data validation studies or studies related to the strengths and weaknesses of different database approaches, others are more focused on particular priority target populations or small n populations and their health and health care needs. However, right now, many federal funding solicitations do not explicitly call for projects that innovate with respect to EHR data and methods and/or attempt to use them for research on specific priority populations.

Finally, drawing on the first two general steps, HHS could develop a broad plan for moving the field forward and/or specific mechanisms and projects that could be pursued to leverage the investments already made in EHR infrastructure, methods, and research. Given the potential scope and scale of the efforts needed, as well as the need to involve a variety of private organizations (e.g., health plans, organized delivery systems) in these efforts, it can be very difficult to determine where to begin and to identify pathways and mechanisms to facilitate progress. However, it seems clear that a locus of leadership and coordination of effort would be helpful in and of itself. There are pockets of substantial activity but currently no clear organization, department, or mechanism for pulling these pieces together within HHS or between HHS and potential private partners, particularly with respect to the use of EHR data for research. There clearly are loci of leadership for other areas related to EHRs, such as CMS and ONC for the adoption and use of EHRs to improve quality and efficiency, and private organizations (e.g., health plans, organized delivery systems, vendors, professional associations) are highly engaged and involved in that process. Perhaps there could be an equivalent effort around the use of EHR data for research, which pulls together clinical and health services and policy researchers, key federal agencies, and other private organizations.

In sum, EHRs hold great promise to advance research on a number of topics and populations, particularly small n populations. Although there are numerous barriers, the adoption and use of EHRs is increasing fairly rapidly for many reasons, including ARRA HITECH and health reform, and there is tremendous energy and enthusiasm in pockets of the research community about ways to further harness EHRs for research. This report has identified and described some prior federal efforts and related projects, the ways those efforts are working to overcome these barriers, and general next steps. Further work will be done by the TEP to identify more specific priority areas and ways these general approaches could be made more concrete and actionable by HHS alone or, in some cases, in conjunction with private partners such as foundations and/or associations or networks of major health plans, organized delivery systems, and professional associations.

· Richard Wasserman, MD, Pediatric Research in Office Settings (PROS), American Academy of Pediatrics and University of Vermont, and Alex Fiks, MD, Pediatric Research Consortium, Children’s Hospital of Philadelphia

· David West and Lisa Schilling, University of Colorado (DARTNet and SAFTINet)

Network of federally qualified health centers and universities created to conduct patient-centered outcome research among underserved populations. Made up of four research node centers and one data coordinating center. Was originally funded in 2010.

An NCI-funded initiative made up of 9 health care systems [serving close to 9 million members] and 6 affiliate sites to support cancer research based in non-profit integrated health care delivery settings. All participating sites are also members of the HMO Research Network. First funded in 1999.

Collaboratory aiming to provide a framework of implementation methods and best practices for clinical research done by health care systems. Collaboratory aims to support high impact demonstration projects and provide leadership and technical research expertise.

This project aims to engage stakeholders in the design of a database system that can search existing patient registries in the U.S.; facilitate the use of common data fields; provide searchable summary results; be able to search existing data for research purposes; serve as a recruitment mechanism for new registries. The project was launched in 2012.

A network intended to promote innovation through field-based research in health care delivery by accelerating the diffusion of research into practice. Includes 17 partnerships and more than 350 participating organizations that provide health care to an estimated 50 percent of the U.S. population. ACTION II was initially funded in 2011. Its predecessor, ACTION,[9] was funded from 2006 to 2010. Prior to ACTION, the Integrated Delivery System Research Network (IDSRN)[10] was funded from 2000 to 2005 and awarded nearly $26 million for 93 projects.

19. Brown J, Syat B, Lane K, and Platt R. “Blueprint for a Distributed Research Network to Conduct Population Studies and Safety Surveillance.” Effective Health Care Program Research Reports 27. Agency for Healthcare Research and Quality, June 2010. http://effectivehealthcare.ahrq.gov/reports/final.cfm.

21. Chan KS, Fowles JB, and Weiner JP. “Review: Electronic Health Records and the Reliability and Validity of Quality Measures: A Review of the Literature.” Medical Care Research and Review, 2010; 67(5): 503–27.

22. Charles D, Furukawa M, and Hufstader M. “Electronic Health Record Systems and Intent to Attest to Meaningful Use among Non-Federal Acute Care Hospitals in the United States: 2008–2011.” ONC Data Brief 1. Office of the National Coordinator for Health IT, 2012.

23. Clark S, and Weale A. “Information Governance in Health: An Analysis of the Social Values Involved in Data Linkage Studies.” Economic and Social Research Council, 2011.

27. Decker SL, Jamoom EW, and Sisk JE. “Physicians in Non-Primary Care and Small Practices and Those age 55 and Older Lag in Adopting Electronic Health Record Systems.” Health Affairs, April 2012. doi: 10.1377/hlthaff.2011.1121.

48. Gurney JG, McPheeters ML, and Davis MM. “Parental Report of Health Conditions and Health Care Use among Children with and without Autism: National Survey of Children’s Health.” Archives of Pediatrics & Adolescent Medicine, 2006; 160(8): 825–30.

51. Hendricks DR and Wehman P. “Transition from School to Adulthood for Youth with Autism Spectrum Disorders: Review and Recommendations.” Focus on Autism and Other Developmental Disabilities, 2009; 24(2): 77–88.

64. Institute of Medicine. “Knowing What Works in Health Care: A Roadmap for the Nation.” Consensus Report, January 24, 2008. http://www.iom.edu/Reports/2008/Knowing-What-Works-in-Health-Care-A-Roadmap-for-the-Nation.aspx.

78. Langworthy-Lam KS, Aman MG, and Van Bourgondien ME. “Prevalence and Patterns of Use of Psychoactive Medicines in Individuals with Autism in the Autism Society of North Carolina.” Journal of Child and Adolescent Psychopharmacology, 2002; 12(4): 311–21.

98. NORC at the University of Chicago. “Howard University Hospital Diabetes Treatment Center: Using Multi-modal Health IT Tools to Improve Quality and Delivery of Care in an Urban Setting.” June 2012. http://www.healthit.gov/system/files/pdf/HowardCaseStudyReport.pdf.

99. NORC at the University of Chicago. “Patient Care Management and Rewards Program: Promoting and Tracking Wellness Behaviors within the Context of an Existing Case-management Program.” June 2012. http://www.healthit.gov/system/files/pdf/AEH_CaseStudyReport.pdf.
