Rutgers University Libraries

Large Data Sets in Nursing Research- RUL: Evaluating Data Sets

Introduction to existing data set sources that nurse researchers might use for secondary analysis

Evaluating large data sets

       Contrary to the common view that secondary data analysis is easy or quick, the process makes use of the same methodologies as primary research and depends on existing data that may not be perfect for your research question.  To make an informed decision about using the data, you might consider:

  • Study design
  • Data collection methods
  • Data file format
  • Data set documentation including variable names and descriptions
  • Data quality including reliability and validity
  • Extent of missing data
  • Availability of a contact person

Study design and data collection methods

Points to consider:

  • Definition of the target population and adequacy of the sample’s representation of the population
  • Criteria that have been applied for subject inclusion/exclusion
  • Strategies used to minimize selection bias
  • Methods used to prevent attrition of the subjects and, if appropriate, rates of subject mortality
  • Characteristics of respondents, non-respondents, and dropouts
  • Validity and reliability of the research instruments in the population from whom the data were collected
  • Qualifications and training of the research team members
  • Personal and demographic characteristics of the data collectors and whether these characteristics were matched to those of the participants
  • Controls that were used to minimize threats to internal validity
  • Procedures that were used to handle missing data

(Jacobson, A., Hamilton, P., & Galloway, J. (1993). Obtaining and evaluating data sets for secondary analysis in nursing research. Western Journal of Nursing Research, 15(4), 483-494.)

Data set documentation

        A thorough understanding of the data is critical to your success.  Documentation is your key.  It may include codebooks or dictionaries, manuals, and any reports resulting from the use of the data set.  If such documentation is not available, you should consider developing your own codebook.

        Documentation should include information about the variables, their names, labels and definitions.  Without the definitions as clarification, the variable names may not match your interpretation of the term.  The codebook should indicate the organization of the fields. 

        Handling of missing data should be part of the codebook.  Researchers follow different practices, so cells for missing data may have been left blank or may be marked with a standard designation such as 9, 99, or 999.  The researcher may have added an estimated value for the missing data, and it is important for you to know what procedure was followed to determine the value.  There may be additional information on how much data is missing in each of the variables and how much data is missing overall.
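
        As an illustration of the point above, the following minimal Python sketch recodes blank cells and missing-data codes to a single missing marker (None) and then counts how much data is missing per variable and overall.  The variable names (age, pain_score), the sample rows, and the MISSING_CODES mapping are hypothetical; in practice the codes would come from the data set's own codebook.  Codes are listed per variable because a value such as 99 can be a legitimate answer for one variable and a missing-data code for another.

```python
# Hypothetical codebook excerpt: variable name -> values that mean "missing".
# Real codes must be taken from the data set's documentation.
MISSING_CODES = {
    "age": {"", "999"},
    "pain_score": {"", "99"},
}


def recode_missing(rows, missing_codes):
    """Return copies of the rows with blanks and missing-data codes set to None."""
    cleaned = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            # Variables not listed in the codebook treat only blanks as missing.
            codes = missing_codes.get(name, {""})
            new_row[name] = None if value.strip() in codes else value
        cleaned.append(new_row)
    return cleaned


def missing_summary(rows):
    """Count missing values per variable, plus total missing and total cells."""
    per_variable = {}
    for row in rows:
        for name, value in row.items():
            per_variable[name] = per_variable.get(name, 0) + (value is None)
    total_cells = sum(len(row) for row in rows)
    total_missing = sum(per_variable.values())
    return per_variable, total_missing, total_cells


# Hypothetical raw records, as they might be read from a text file.
rows = [
    {"age": "34", "pain_score": "7"},
    {"age": "999", "pain_score": "5"},
    {"age": "51", "pain_score": ""},
    {"age": "47", "pain_score": "99"},
]

cleaned = recode_missing(rows, MISSING_CODES)
per_variable, total_missing, total_cells = missing_summary(cleaned)
print(per_variable, total_missing, total_cells)
```

        Running the summary before choosing a strategy for missing data shows which variables are most affected.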

        Additional components of the codebook include copies of the research instruments, a detailed description of the methodologies used, procedures for data editing and coding as well as information about error rates. 

        If you anticipate having questions about using the data set for your research, consider whether a contact person from the original study is available.

Data file format

         Data files may be available in many formats, for example Access, Excel, SPSS, or R.  Do you have the necessary software and computing power to read the files?  Is any conversion necessary?

Extent of missing data

        Before analysis, determine how complete the data set is.  If any data is missing due to the lack of a response or to incorrect coding, your findings may not be accurate.  Your decisions on handling missing data will be influenced by the effects on the sample size and distribution, the amount of data remaining for analysis and, ultimately, the impact on your research question.

        There are several methods that can be used to deal with missing data.  The following list briefly introduces them although you will need more information to apply the methods.

  • List-wise deletion – Cases where any of the data is missing are deleted.  Only those with values for all of the variables are retained, which keeps the same set of cases in every analysis but reduces the sample size.  Consider: is there enough data without the deleted cases?  Will the power of the calculation or the sample size be adversely affected?
  • Pair-wise deletion – When one variable in a case is missing a value, the other variables in the case that have data are still included in the analysis.  Consider: is it important to know how much data is missing for each of the variables under study?
  • Data imputation – Missing values are replaced with estimated values.  Imputation may be used when fewer than 5% of the cases are missing values for variables.  There are three common procedures to impute data:
    • Mean substitution – The mean of the existing values of a variable is used to replace each missing value of that variable.  The method may distort correlation values if much data is missing.
    • Regression imputation – Each missing value is replaced with one resulting from a regression analysis on the existing values.  While this method uses more of the values in the data set, it may result in correlations that are not valid.
    • Multiple imputation – A researcher, by following several steps in an imputation model, can arrive at estimated values for each of the missing data points.
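
        To make the differences concrete, the following minimal Python sketch applies list-wise deletion, pair-wise deletion, and mean substitution to a small, made-up set of cases.  The variable names (age, pain, sleep), the values, and the function names are hypothetical, and None marks a missing value.  Regression imputation and multiple imputation need a statistical model and are not shown.

```python
from statistics import mean

# Hypothetical cases; None marks a missing value.
cases = [
    {"age": 34, "pain": 7, "sleep": 6},
    {"age": None, "pain": 5, "sleep": 8},
    {"age": 51, "pain": None, "sleep": 5},
    {"age": 47, "pain": 3, "sleep": 7},
]


def listwise_delete(cases):
    """List-wise deletion: keep only cases with a value for every variable."""
    return [c for c in cases if all(v is not None for v in c.values())]


def pairwise_pairs(cases, x, y):
    """Pair-wise deletion: keep every case where both x and y are present."""
    return [(c[x], c[y]) for c in cases if c[x] is not None and c[y] is not None]


def mean_substitute(cases, variable):
    """Mean substitution: fill missing values of one variable with its observed mean."""
    observed = [c[variable] for c in cases if c[variable] is not None]
    fill = mean(observed)
    return [
        {**c, variable: fill if c[variable] is None else c[variable]}
        for c in cases
    ]


complete = listwise_delete(cases)                     # complete cases only
age_sleep = pairwise_pairs(cases, "age", "sleep")     # cases usable for age vs. sleep
pain_sleep = pairwise_pairs(cases, "pain", "sleep")   # cases usable for pain vs. sleep
filled = mean_substitute(cases, "age")                # missing age replaced by the mean

print(len(complete), len(age_sleep), len(pain_sleep))
print(filled[1]["age"])
```

        In this made-up example, list-wise deletion leaves two of the four cases, while pair-wise deletion keeps three cases for each pair of variables, although not the same three.  This is why it helps to know how much data is missing for each variable under study.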

Aponte, J. (2010). Key elements of large survey data sets. Nursing Economic$, 28(1), 27-36.

Vance, D. E. (2012). Troubles and triumphs of secondary data analyses: general guidelines. Research Practitioner, 13(4), 128-135.
