
Data Management: Publishing Your Data

a guide to best practices for curating your research data

Citing Your Data

The obverse of publishing is citation.  Data citation standards are emerging in many disciplines, and following them is highly recommended, both to provide reliable access to specific datasets and to credit the producers of useful data.  In the absence of a specific disciplinary model, a data citation should include the following:

  • Author or Responsible Party (examples: study PI, sample collector, government agency)
  • Name of the Data Element used (e.g., a specific Table/Map/dataset with any applicable unique IDs)
  • Name of the Database (or Publication or Repository) 
  • Version identifier (Study number or edition or year or version, etc.)
  • Date accessed
  • URL used

If specific steps were required to subset, analyze, or access the data, the citation should also include:

  • parameters selected 
  • software used
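The elements listed above can be collected in a small record and joined into a citation string. The following is only a minimal sketch, not a disciplinary standard: the class name `DataCitation`, its field names, the ordering and punctuation of the output, and every value in the example (including the `example.org` URL) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the citation elements listed above."""
    author: str                       # author or responsible party
    data_element: str                 # specific table, map, or dataset
    database: str                     # database, publication, or repository
    version: str                      # study number, edition, year, or version
    accessed: date                    # date accessed
    url: str                          # URL used
    parameters: Optional[str] = None  # parameters selected, if any
    software: Optional[str] = None    # software used, if any

    def format(self) -> str:
        # Assumed layout; adapt the order and punctuation to the
        # citation style your discipline or publisher requires.
        parts = [
            f"{self.author}.",
            f"{self.data_element}.",
            f"{self.database},",
            f"{self.version}.",
            f"Accessed {self.accessed.isoformat()}.",
            self.url,
        ]
        # Optional elements are appended only when they were supplied.
        if self.parameters:
            parts.append(f"Parameters: {self.parameters}.")
        if self.software:
            parts.append(f"Software: {self.software}.")
        return " ".join(parts)


# All values below are made up for illustration.
example = DataCitation(
    author="Example Agency",
    data_element="Table 1: Hypothetical Survey Results",
    database="Example Data Repository",
    version="Version 2",
    accessed=date(2024, 1, 15),
    url="https://example.org/data/table1",
    parameters="rows filtered to year 2020",
    software="R",
)
print(example.format())
```

Keeping the elements as separate fields, rather than as one free-text string, makes it easier to reformat the same citation later if a disciplinary style is adopted.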


Publishing Your Data

Your research is valuable and important, and so is the data that it is based on.  By publishing your data, you make it available to the scholarly community, who can study and build upon your work.  Your work will become more visible and typically be cited more frequently.

There are at least five ways of publishing your data, each with different advantages and disadvantages.

  • A disciplinary repository offers high visibility within a particular field, if one exists in your area. ICPSR, the Inter-university Consortium for Political and Social Research, is the leading archive in the social sciences.  In the sciences, there are many repositories, such as the National Space Science Data Center or Dryad. Larger multi-disciplinary projects, such as DataONE, are emerging. Here is a list of bio-related data repositories. Consult re3data.org for pointers to more. Not all repositories are committed to long-term preservation of data, however, and their mission and focus may change over time.  Some, such as ICPSR, are only available to subscribers.
  • An institutional repository's mission is to permanently preserve the scholarly output of the institution.  At Rutgers, RUresearch, the data portal of RUcore, the Rutgers Community Repository, serves this function, and preserves text, audio, and video.  RUcore is developing the capability to archive complex datasets and hopes to make this more widely available soon.  Institutional repositories are designed to meet the needs of scholars in all disciplines, and operate according to widely accepted standards for preservation and access.
  • Journals often publish data associated with their published articles.  See this list of economics journals for examples. This option is good for visibility, but is often tied to a journal subscription, limiting access.  Compliance with documentation standards and long-term preservation may vary considerably from journal to journal.
  • In some fields, the "data journal" is emerging as an alternative.  Here, the data serves as the main course, and the article is descriptive of the data set.  This enables the data to be cited in a very familiar form.  Examples are Scientific Data (from Nature), GigaScience (which includes cloud analysis functionality), and F1000Research.
  • Self-publishing can occur through individual, institutional, or third-party websites.  In these cases, the researcher is responsible for vetting their own data for quality and documentation, and for preserving an accessible version of the data as file formats change in the future.  Tools such as the IQSS Dataverse focus on the broad sharing of data, and allow individual researchers or research centers to manage their own data on a remote server while the tool handles some of the technical issues, although the long-term implications are uncertain at this point. openICPSR is a freely accessible version of ICPSR.
  • Consult Scientific Data's list of recommended repositories for more suggestions.

It is not necessary to choose only one of these methods.  Most of these options do not require an exclusive granting of rights, so it is possible to deposit data in multiple locations, which is perhaps the best method for maximizing current visibility and long-term preservation simultaneously.

Data Librarian

Ryan Womack
Contact:
Alexander Library

169 College Avenue

New Brunswick, NJ 08901 USA

848-932-6107
Subjects: Data, Economics