Large Data Sets in Nursing Research

An introduction to existing data set sources that nurse researchers might use for secondary analysis

Study design and data collection methods

Points to consider:

  • Definition of the target population and adequacy of the sample’s representation of the population
  • Criteria that were applied for subject inclusion/exclusion
  • Strategies used to minimize selection bias
  • Methods used to prevent attrition of the subjects and, if appropriate, rates of subject mortality
  • Characteristics of respondents, non-respondents, and dropouts
  • Validity and reliability of the research instruments in the population from whom the data were collected
  • Qualifications and training of the research team members
  • Personal and demographic characteristics of the data collectors and whether these characteristics were matched to those of the participants
  • Controls that were used to minimize threats to internal validity
  • Procedures that were used to handle missing data

(Jacobson, A., Hamilton, P., & Galloway, J. (1993). Obtaining and evaluating data sets for secondary analysis in nursing research. Western Journal of Nursing Research, 15(4), 483-494.)

Data set documentation

        A thorough understanding of the data is critical to your success.  Documentation is your key.  It may include codebooks or dictionaries, manuals, and any reports resulting from the use of the data set.  If such documentation is not available, you should consider developing your own codebook.

        Documentation should include information about the variables: their names, labels, and definitions. Without the definitions, a variable name may not mean what you assume it means. The codebook should also indicate how the fields are organized.
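If you do end up developing your own codebook, one possible way to keep it alongside the data is as a small structured list. The sketch below is only an illustration: the variable names (AGE, PAIN, SITE), their definitions, and the helper function undocumented are hypothetical and do not come from any particular data set.

```python
from dataclasses import dataclass


@dataclass
class CodebookEntry:
    name: str                  # variable name as it appears in the data file
    label: str                 # short human-readable label
    definition: str            # how the original study defined the variable
    column: int                # position of the field in each record (0-based)
    missing_codes: tuple = ()  # values that stand for "missing"


# Hypothetical entries; replace with the variables in your own data set.
codebook = [
    CodebookEntry("AGE", "Age in years", "Self-reported age at enrollment", 0, ("999",)),
    CodebookEntry("PAIN", "Pain score", "0-10 numeric rating at first visit", 1, ("99",)),
]


def undocumented(columns, codebook):
    """Return the column names that have no codebook entry."""
    documented = {entry.name for entry in codebook}
    return [c for c in columns if c not in documented]


# Columns found in the (hypothetical) data file.
missing_entries = undocumented(["AGE", "PAIN", "SITE"], codebook)
print(missing_entries)
```

A check like this can flag variables that appear in the data file but still lack a label and definition.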

        The codebook should describe how missing data were handled. Researchers follow different practices, so cells for missing data may have been left blank or may be marked with a standard designation such as 9, 99, or 999. The researcher may also have substituted an estimated value for the missing data, in which case it is important for you to know what procedure was used to determine that value. There may be additional information on how much data is missing for each variable and how much is missing overall.
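As a rough illustration of why these conventions matter, the sketch below recodes blank cells and designated codes to a single missing marker and then reports the share of missing values per variable and overall. The records, the variable names, and the choice of 999 and 99 as missing codes are all assumptions made for the example; the real codes must come from the data set's own documentation.

```python
# Hypothetical missing-value codes, as they might be listed in a codebook.
MISSING_CODES = {"AGE": {"999"}, "PAIN": {"99"}}

# Hypothetical raw records, with every value read in as text.
records = [
    {"AGE": "34", "PAIN": "7"},
    {"AGE": "999", "PAIN": "3"},
    {"AGE": "51", "PAIN": ""},
    {"AGE": "47", "PAIN": "99"},
]


def recode_missing(records, missing_codes):
    """Replace blanks and designated codes with None."""
    cleaned = []
    for row in records:
        new_row = {}
        for var, value in row.items():
            is_missing = value.strip() == "" or value in missing_codes.get(var, set())
            new_row[var] = None if is_missing else value
        cleaned.append(new_row)
    return cleaned


def missing_summary(cleaned):
    """Return the proportion of missing values per variable and overall."""
    variables = list(cleaned[0].keys())
    per_variable = {
        var: sum(row[var] is None for row in cleaned) / len(cleaned)
        for var in variables
    }
    total_cells = len(cleaned) * len(variables)
    overall = sum(v is None for row in cleaned for v in row.values()) / total_cells
    return per_variable, overall


cleaned = recode_missing(records, MISSING_CODES)
per_variable, overall = missing_summary(cleaned)
print(per_variable, overall)
```

If a code such as 999 were left in place and treated as a real age, it would distort any summary of that variable, which is why the recoding step comes first.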

        Additional components of the codebook include copies of the research instruments, a detailed description of the methodologies used, procedures for data editing and coding as well as information about error rates. 

        If you anticipate having questions about using the data set for your own research questions, consider arranging for a contact person from the original study to be available.


© Rutgers, The State University of New Jersey