
Large Data Sets in Nursing Research (Rutgers University Libraries)

An introduction to existing data set sources that nurse researchers might use for secondary analysis.

Clinical databases

Add Health

            Add Health, the largest, most comprehensive survey of adolescents, began in 1994 with an in-school questionnaire given to a nationally representative sample of 7th through 12th graders.  The questionnaire was designed to discover how social environments and behaviors in adolescence are linked to health and achievement outcomes in young adulthood.  A series of in-home interviews followed in 1994-95, 1996, 2001-02, and 2007-08; by the 2007-08 wave, the participants were 24 to 32 years of age.  Data were also collected from parents, siblings, fellow students, and school administrators through questionnaires.  Existing databases supplied contextual information about the teenagers’ neighborhoods and communities.  Multiple data sets are available.

Behavioral Risk Factor Surveillance System (BRFSS)

            The BRFSS grew out of scientific findings in the early 1980s that showed a connection between personal health behavior and premature morbidity and mortality.  The National Center for Health Statistics responded by conducting national studies of risk behaviors.  While these studies were important, NCHS staff realized that national data might not accurately reflect the health behaviors of an individual state’s population.  Because state health departments bear the responsibility for reducing risky behaviors and the illnesses that follow from them, risk-behavior surveillance was moved to the state level, with support from CDC, in the form of the BRFSS.

            In 1984, the BRFSS program began in 15 states with a monthly telephone survey administered by state health departments.  The CDC developed standard questions for the survey so that the resulting data could be compared across participating states.  The questions addressed actual behaviors related to smoking, alcohol use, physical inactivity, diet, hypertension, and seat belt use.

            The BRFSS became a nationwide surveillance system in 1993.  The questionnaire was modified to include fixed core and rotating core questions, plus up to five emerging core questions.  As growing numbers of people acquired cell phones, the survey was expanded in 2008 to include cell phone users as well as landline telephone users, and the BRFSS has become the world’s largest telephone survey.  Data from cell phone respondents have been included since the 2011 public release data set.

            BRFSS offers a Web Enabled Analysis Tool (WEAT) that lets users analyze the data online, producing cross-tabulations and logistic regression models.
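            To illustrate what a weighted cross-tabulation of survey responses involves, here is a minimal Python sketch; it is not WEAT's own method. It assumes each respondent record carries a single survey weight, and the field names (state, smoker, weight) and values are hypothetical. It computes only weighted row percentages and ignores the variance estimation that a complex sample design such as the BRFSS requires.

```python
from collections import defaultdict


def weighted_crosstab(records, row_key, col_key, weight_key):
    """Return weighted row percentages as {row_value: {col_value: percent}}."""
    row_totals = defaultdict(float)                      # sum of weights per row category
    cells = defaultdict(lambda: defaultdict(float))      # sum of weights per (row, column) cell
    for rec in records:
        w = rec[weight_key]
        r, c = rec[row_key], rec[col_key]
        row_totals[r] += w
        cells[r][c] += w
    # Convert each cell's weighted count into a percentage of its row total.
    return {
        r: {c: 100.0 * w / row_totals[r] for c, w in cols.items()}
        for r, cols in cells.items()
    }


# Hypothetical respondent records; field names and weights are made up.
sample = [
    {"state": "NJ", "smoker": "yes", "weight": 1.5},
    {"state": "NJ", "smoker": "no", "weight": 2.5},
    {"state": "NY", "smoker": "yes", "weight": 1.0},
    {"state": "NY", "smoker": "no", "weight": 3.0},
]

table = weighted_crosstab(sample, "state", "smoker", "weight")
for state, row in sorted(table.items()):
    print(state, {k: round(v, 1) for k, v in sorted(row.items())})
# NJ {'no': 62.5, 'yes': 37.5}
# NY {'no': 75.0, 'yes': 25.0}
```

            Weighting matters because each respondent stands for a different number of people in the population; an unweighted count of the same records would give 50 percent smokers in both states.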

            Annual survey data and documentation are available from 1984 to 2012.  City and county data from the SMART (Selected Metropolitan/Micropolitan Area Risk Trends) program may also be accessed from the BRFSS website.

National Health and Nutrition Examination Survey (NHANES)

            NHANES was developed by the National Center for Health Statistics in the early 1960s to assess the health and nutritional status of adults and children in different U. S. population groups.  In 1999 it became a continuous program, collecting a variety of health and nutritional measurements.  Data are currently obtained through both interviews and physical examinations, a combination that makes the survey unique.  The interview includes demographic, socioeconomic, dietary, and health-related questions, while the examination covers medical, dental, and physiological measurements.  A nationally representative sample of about 5,000 people is assessed every year, and the data are used to determine the prevalence of seventeen major diseases and risk factors.  The risk factors studied encompass lifestyle, constitution, heredity, environment, and sexual practices.

            Data sets and questionnaires are available.

National Long Term Care Survey (NLTCS)

            Resulting from a cooperative agreement between the National Institute on Aging and Duke University, the National Long Term Care Survey (NLTCS) is a longitudinal survey designed to study changes in the health and functional status of older Americans (aged 65 and over).  It also tracks health expenditures, Medicare service use, and the availability of personal, family, and community resources for caregiving.  The survey was first administered in 1982 and repeated in five follow-up waves, the last in 2004.

            The 1982 NLTCS sample of 35,789 people was drawn from national Medicare enrollment files.  Subsequent waves, conducted roughly every five years, have used samples of approximately 20,000 Medicare enrollees, refreshed by adding about 5,000 people who passed age 65 between successive surveys.  This design keeps the sample large and nationally representative at each point in time.  Older adults living in the community and those residing in institutions are both represented in the samples.  Response rates have exceeded 95 percent in every wave.

            Both documentation and data are available for download from the National Archive of Computerized Data on Aging (NACDA), which is housed at ICPSR.

National Cancer Data Base (NCDB)

            The National Cancer Data Base (NCDB), a joint program of the Commission on Cancer of the American College of Surgeons and the American Cancer Society, is a nationwide oncology outcomes database for more than 1,500 Commission-accredited cancer programs in the United States and Puerto Rico.  Some 70 percent of all newly diagnosed cases of cancer in the U. S. are captured at the institutional level and reported to the NCDB.  Begun in 1989, the NCDB now contains approximately 29 million records from hospital cancer registries across the nation.  Data on all types of cancer are tracked and analyzed.  These data are used to explore trends in cancer care, to create regional and state benchmarks for participating hospitals, and to serve as the basis for quality improvement.

            Registries at Commission on Cancer-accredited cancer programs collect data elements and submit them to the NCDB using nationally standardized data item and coding definitions.  The elements include patient characteristics, cancer staging and tumor histological characteristics, type of first-course treatment administered, and outcomes information.  Because the data come from hospitals, the NCDB is especially useful for comparing and evaluating cancer care across institutions.

National Program of Cancer Registries (NPCR)

            Established in 1992 and administered by the CDC, the National Program of Cancer Registries (NPCR) collects data on cancer occurrence, including the type, extent, and location of the cancer and the type of initial treatment.  The data are supplied by central cancer registries in 45 states, the District of Columbia, Puerto Rico, and the U. S. Pacific Islands, which together represent 96 percent of the U. S. population.  Although the registries accept data from doctors’ offices and pathology laboratories, most of the information comes from hospitals, where highly trained cancer registrars extract information from patients’ medical records and transmit it to the cancer registry using standardized codes.  The data are intended to help monitor cancer trends over time, detect cancer patterns in various populations, prioritize the allocation of health resources, and promote research.  Data offered to health researchers are updated annually through the most recent available year.

Surveillance, Epidemiology, and End Results (SEER)

            A program of the National Cancer Institute, SEER provides information on cancer incidence and survival in the U. S.  The data come from population-based cancer registries covering approximately 28 percent of the U. S. population.  Racial and ethnic groups represented include African Americans, Hispanics, Native Americans, Asians, and Hawaiian/Pacific Islanders.  SEER registries collect data on patient characteristics, primary tumor site, tumor morphology and stage at diagnosis, first course of treatment, and follow-up for vital status.  The program is the only comprehensive source of population-based information in the U. S. that includes stage of cancer at the time of diagnosis and patient survival data.  The data are updated annually and provided in both print and electronic formats.

Please note:  While the three programs above collect very similar data about cancer, their sources differ: the NCDB reports from the hospital registries of the more than 1,500 cancer programs accredited by the Commission on Cancer, the CDC’s National Program of Cancer Registries data are state based, and SEER data are population based.


© Rutgers, The State University of New Jersey
