
Science Data Management for Grad Students

How to organize, store, share and preserve your scientific research data.

Public Access to Federally Funded Research

In 2013, the Office of Science and Technology Policy (OSTP) published a memorandum stating that research funded by federal agencies with R&D budgets of over $100 million must be made publicly accessible. See the memorandum here.

The memorandum was later updated, directing the affected agencies to publish plans for how researchers would comply with this requirement. In November 2014, federal agencies began sharing these plans. A list of all of the affected agencies and their requirements is available in this table from the University of Arizona Libraries.

In March 2015, NSF published "NSF's Action Plan: Today's Data, Tomorrow's Discoveries", which outlines the new expectations for data sharing. Centralized data repositories that take into account the accepted practices of specific disciplines are still being developed.

 

Data sharing literally saved lives! A researcher at Stanford shared geospatial data for Zambia through Stanford University Libraries' Data Portal Earthworks, and Doctors Without Borders was able to use the data to map and contain a cholera outbreak. See the article here: http://library.stanford.edu/blogs/digital-library-blog/2016/05/sdr-deposit-week-earthworks-action-against-cholera

Repositories for Data Sharing

The following are searchable collections of data repositories for a variety of subject disciplines:

re3data.org Registry of Research Data Repositories - Provided by the German Research Foundation, it now includes DataBib. "re3data.org is a global registry of research data repositories that covers research data repositories from different academic disciplines. It presents repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions."

OpenDOAR Directory of Open Access Repositories (http://v2.sherpa.ac.uk/opendoar/) - Created by the University of Nottingham in the UK, "OpenDOAR provides a quality-assured listing of open access repositories around the world. OpenDOAR staff harvest and assign metadata to allow categorisation and analysis to assist the wider use and exploitation of repositories. Each of the repositories has been visited by OpenDOAR staff to ensure a high degree of quality and consistency in the information provided."

Department of Energy Data Explorer (http://www.osti.gov/dataexplorer/) - Discover science, technology, and engineering research and data collections from the US Department of Energy. View datasets by subject and find the repositories hosting the data.

Scientific Data is a journal published by Nature for publishing research data. It also provides a list of recommended data repositories for scientific research data by discipline.

Trusted Repositories

Links
[1] http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/core-re
[2] http://www.dcc.ac.uk/
[3] http://www.digitalpreservationeurope.eu/
[4] https://www.langzeitarchivierung.de/Webs/nestor/EN/Home/home_node.html
[5] http://www.crl.edu/

Preservation

Sharing your data via a website is not the same as preserving your data. Funders will require that you keep your data for various time periods. Data is typically preserved in a repository that has made a commitment to maintaining the data it accepts over time. In order to do this, data is usually kept in preferred file formats.

Preferred file formats are:

  •     Non-proprietary
  •     Open, with documented standards
  •     Commonly used by a research community
  •     Standard representations (ASCII, Unicode)
  •     Unencrypted
  •     Uncompressed (If you need to compress files to conserve space, limit compression to your 3rd backup copy.)

(From Georgia Tech Libraries http://d7.library.gatech.edu/research-data/archiving)
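Two of the criteria above (uncompressed, and a standard text representation such as ASCII or Unicode) can be spot-checked automatically before you deposit files. The following is a minimal Python sketch and is not part of the Georgia Tech guidance; the function name check_preservation_readiness and the small signature table are illustrative assumptions. It reads the whole file into memory, so it is only suited to modestly sized text files, and it cannot judge whether a format is non-proprietary or commonly used in your research community.

```python
from pathlib import Path

# Illustrative (not exhaustive) leading-byte signatures of common compressed formats.
COMPRESSED_SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"BZh": "bzip2",
}


def check_preservation_readiness(path):
    """Return a list of warnings for one file; an empty list means no issue was found."""
    data = Path(path).read_bytes()
    warnings = []

    # Criterion: uncompressed. Compare the first bytes against known signatures.
    for signature, name in COMPRESSED_SIGNATURES.items():
        if data.startswith(signature):
            warnings.append(f"compressed ({name})")

    # Criterion: standard representation. ASCII is a subset of UTF-8,
    # so a successful UTF-8 decode covers both cases.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        warnings.append("not valid UTF-8 text")

    return warnings
```

Calling check_preservation_readiness on a plain UTF-8 CSV file should return an empty list, while a gzip-compressed copy of the same file should be flagged. Binary formats that are nonetheless open and well documented will also be flagged as non-text, so treat the warnings as prompts for review rather than as a verdict.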

How Long?

"Once the minimum storage period has been met, the PI must decide whether to continue storing the data. Although data can be kept indefinitely, a PI must evaluate the benefits and risks of extended storage. On the one hand, one never knows when data might be needed. On the other hand, continued storage of confidential data increases the risk of possible violation. The monetary cost of retention and security are additional concerns." (From the Office of Research Integrity, US Dept of Health and Human Services)

For the NSF, different Directorates will have different requirements. Many of these specify no amount of time that the data must be retained; it will be up to the PI to determine the length and declare it in a proposed Data Management Plan. Links to the NSF Directorates' advice on Data Management Plans are provided below.

For NIH, protection of personal data will be a consideration. See National Institutes of Health Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research, February 2015.

Data Sharing and Management Snafu

What happens to data sharing when data management is bad? From NYU Health Sciences Library.