Large Data Sets in Nursing Research- RUL

An introduction to existing data set sources that might be used for secondary analysis by nurse researchers

Administrative databases

American Hospital Association Annual Survey

            The most comprehensive and authoritative source on US hospitals and their characteristics, the AHA Annual Survey is sent to 6,500 hospitals and has a response rate of 85%.  Data is collected on a variety of topics including hospital organizational structure, facilities and services, utilization data, physician arrangements, staffing, and community orientation.  Researchers frequently use the Annual Survey with other datasets such as the National Inpatient Sample, Medicare, and Medicaid data, to analyze patterns of practice and healthcare outcomes.  Data includes nearly 900 variables to categorize hospitals based on size, ownership, teaching status, and the extent of facilities and services.  Data from earlier surveys is available. 
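
            As a rough illustration of what using the Annual Survey with another dataset involves, the sketch below attaches hospital characteristics to discharge records through a shared hospital identifier. Everything in it is assumed for illustration: the field names (`hospital_id`, `bed_size`, `teaching`, `los_days`), the values, and the functions `link_discharges_to_hospitals` and `mean_los_by` are hypothetical and do not reflect the actual AHA or NIS file layouts, which are documented by their distributors.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hospital-level records standing in for survey data.
hospitals = [
    {"hospital_id": "H1", "bed_size": "large", "teaching": True},
    {"hospital_id": "H2", "bed_size": "small", "teaching": False},
]

# Hypothetical discharge-level records standing in for inpatient data.
discharges = [
    {"hospital_id": "H1", "los_days": 4},
    {"hospital_id": "H1", "los_days": 6},
    {"hospital_id": "H2", "los_days": 2},
    {"hospital_id": "H9", "los_days": 3},  # no matching hospital record
]


def link_discharges_to_hospitals(discharges, hospitals):
    """Attach hospital characteristics to each discharge (inner join)."""
    by_id = {h["hospital_id"]: h for h in hospitals}
    linked = []
    for d in discharges:
        h = by_id.get(d["hospital_id"])
        if h is not None:  # drop discharges without a hospital match
            linked.append({**d, **h})
    return linked


def mean_los_by(linked, key):
    """Average length of stay grouped by one hospital characteristic."""
    groups = defaultdict(list)
    for row in linked:
        groups[row[key]].append(row["los_days"])
    return {k: mean(v) for k, v in groups.items()}


linked = link_discharges_to_hospitals(discharges, hospitals)
los_by_teaching = mean_los_by(linked, "teaching")
```

            In this sketch, discharges with no matching hospital record are dropped. A real analysis would need to decide how to handle unmatched records and would also need to apply the sampling weights supplied with the data.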

 Healthcare Cost and Utilization Project (HCUP)

            The Healthcare Cost and Utilization Project (HCUP, pronounced “H-Cup”) is a family of health care databases and related software tools and products developed through a federal, state, and industry partnership and sponsored by the Agency for Healthcare Research and Quality.  HCUP databases bring together the data collection efforts of state data organizations, hospital associations, private data organizations, and the federal government to create a national information resource of patient-level health care data.  HCUP includes the largest collection of longitudinal hospital care data in the United States, with all-payer, encounter-level information beginning in 1988.  These databases enable research on a broad range of health policy issues, including cost and quality of health services, medical practice patterns, access to health care programs, and outcomes of treatments at the national, state, and local market levels.

            HCUP offers six databases, which provide data beginning in 1988.  The information is collected at the encounter level and is available in a uniform format with privacy protections in place.  The databases include:

            Nationwide Inpatient Sample (NIS) with inpatient data from a national sample of over 1,000 hospitals;

            Kids’ Inpatient Database (KID), which is a nationwide sample of pediatric inpatient discharges;

            Nationwide Emergency Department Sample (NEDS) which contains national estimates of emergency department visits;

            State Inpatient Databases (SID) which contain the universe of inpatient discharge abstracts from participating states;

            State Ambulatory Surgery Databases (SASD) containing data from ambulatory care encounters from hospital-affiliated and sometimes freestanding ambulatory surgery sites;

            State Emergency Department Databases (SEDD) offering data from hospital-affiliated emergency departments for visits that do not result in hospitalizations.

The databases are available for purchase from HCUP’s Central Distributor.

            AHRQ provides HCUPnet, a query system which retrieves health statistics and information on hospital inpatient and emergency department utilization.  Users may build their own tables with data drawn from all of the HCUP databases.

 Medicare and Medicaid Data

            The Centers for Medicare & Medicaid Services (CMS) offers researchers claims (billing) data, which has been used to investigate health care utilization and health outcomes.  The site provides a Data Navigator to help users find appropriate data.  Users can search by selecting a specific CMS program, health care topic, care delivery setting, location, and document type.

            Through its sheer volume, Medicare data offers generalizability and the potential for statistical power.  However, it is very complex, and its use frequently requires extensive training and support.  Data is available in three file types.  The first, Research Identifiable Files (RIF), are available from 1991 and include identifiable information about the patient and physician.  The second, available as of 2000, is the Limited Dataset File (LDS), which has the same information as the RIF data with personally identifiable information removed.  LDS files are offered with a 100% national sample, a 5% national sample, or state-specific data.  Non-identifiable data files are the third type, and they contain aggregate data.  Researchers may obtain the datasets through an application process.  Fees are graduated depending on the type of data requested.

            Medicaid data is available from 1999 to the present in MAX files.  There are five components: the Personal Summary File; the Inpatient File, which includes diagnoses, procedures, discharge status, length of stay, and payments; the Long Term Care File, containing services provided by skilled nursing facilities, intermediate care facilities, custodial care facilities, and independent psychiatric facilities; the Drug File; and the Other Therapy File, which has data for non-institutional services such as physician and other professional claims for outpatient clinic visits, labs, and x-rays.  Medicaid data is also available through an application process.

           Because CMS data is intricate to use, CMS contracted with the University of Minnesota’s School of Public Health to offer free assistance to researchers who want to use Medicare and Medicaid data.  This service, the Research Data Assistance Center (ResDAC), will help with:

  • Understanding and interpreting Medicare and Medicaid program policies and coverage
  • Learning about the strengths, weaknesses, and applications of Medicare and Medicaid data
  • Understanding the creation of CMS's administrative data files and claims processing
  • Reviewing the methods of cohort identification and file specifications
  • Generating cost estimates and invoices for CMS data
  • Preparing a request for CMS data 

The ResDAC information desk and email reference service are staffed by experienced, master’s-prepared technicians.

 Medical Expenditure Panel Survey (MEPS)

            Originating in 1996, the Medical Expenditure Panel Survey is a set of large-scale surveys of families and individuals, their medical providers (doctors, hospitals, pharmacies, etc.), and employers across the nation.  MEPS collects data on the specific health services that Americans use, how frequently they use them, the cost of these services, and how they are paid for, as well as data on the cost, scope, and breadth of health insurance held by and available to U.S. workers.  The Survey is the most complete source of data on the cost and use of health care and health insurance coverage.

            MEPS has two major components, household and insurance.  The Household Component provides data from individual households and their members with additional data drawn from their medical providers.  The Insurance Component is a separate survey of employers to obtain data on employer-based health insurance.  Two additional components, Medical Provider and Nursing Home, also collect data and make it available.

            Data in the Household Component is collected through a series of interviews conducted over a period of two full calendar years.  The sample of families and individuals is drawn from a nationally representative subsample of households that participated in the prior year’s National Health Interview Survey.  Public-use data is available on the MEPS website in downloadable data files or on CD-ROM.  Restricted data files may be available to researchers through a visit to the AHRQ Data Center in Rockville, MD, or at one of the Census Bureau’s Research Data Centers.  The researcher’s project must be approved prior to the visit.


© , Rutgers, The State University of New Jersey
