
Large Data Sets in Nursing Research - RUL: Welcome

An introduction to existing data set sources that nurse researchers might use for secondary analysis


          With growing competition for grant funding for prospective studies, and the time such studies require, an increasing number of nurse scholars are turning to secondary data analysis to investigate their research questions. Secondary data analysis is defined as the use of existing data to address new research questions or to apply new methods. Existing large data sets permit more complex questions, a greater number of variables, and a sample that reflects the population. Nurse researchers have used secondary data analysis in epidemiological studies, risk assessments, technology assessments, comparisons of practice across geographic areas, and outcomes research.

          Benefits of large data sets include:

· Less time
· Lower cost
· A large number of subjects observed over a period of time
· Easier IRB approval
· Generalizability of findings

          Using large data sets has inherent challenges, such as reliability and validity issues, missing data, potential biases, and possible restrictions on data availability under HIPAA. The age of the data can also be a liability: if societal changes have altered how variables are defined, older data may not measure what a new research question requires.

         Large data sets share several characteristics. They result from observational research designs. They include data from a large number of people in actual health care delivery settings. The data are in computer-readable form and require specialized statistical analysis techniques and software.

         The data sets fall into two categories: administrative and clinical. Clinical data sets are drawn from patient records of routine care or from research protocols; they are collected by hospitals and health care agencies. Administrative data sets result from indirect patient care and include insurance claims, vital events, quality assurance records, and registry records. The groups collecting these data are usually federal or state agencies, professional associations, and international agencies such as the World Health Organization.

Comparison of Processes for Secondary Analysis and Prospective Studies

In their 2009 article, "Using existing data to answer new research questions," published in Research and Theory for Nursing Practice (v. 23, no. 3, pp. 203-215), Daniel M. Doolan and Erika S. Froelicher included a chart comparing the process of conducting a study with existing data to the process of conducting a prospective study.


Iterative Process During Secondary Analysis

1. Perform a literature review
2. Find gaps in the research and find research opportunities
3. Identify and obtain permission from the original PI to analyze a data set
4. Refine research questions
5. Evaluate the appropriateness of the original sample, design, and measures
6. Establish appropriate safeguards to protect data and consider legal and ethical implications of the analysis
7. Obtain approval from the organization's IRB or equivalent committee
8. Perform secondary analysis of the data
9. Disseminate findings to the research community

Iterative Process During a Prospective Study

1. Perform a literature review
2. Find gaps in the research and find research opportunities
3. Pose research questions that could be answered given a prudent sample, measures, and, if applicable, follow-up
4. Write a research proposal including how subjects will be recruited, what data will be collected, and what safeguards will be in place to protect subjects' safety
5. Obtain approval from the organization's IRB or equivalent committee
6. Recruit consenting subjects
7. Obtain predetermined measures from subjects
8. Perform the analysis of the data
9. Disseminate findings to the research community



Life Sciences Librarian

Ann Vreeland Watkins
John Cotton Dana Library
185 University Ave
Newark, NJ 07102-1814
(973) 353-3809