
Science Data Management for Grad Students

How to organize, store, share and preserve your scientific research data.

What is Metadata?

Metadata is simply data about data. Metadata describes your data so that others can find it, understand it, and possibly use it. The answers to the 20 questions below are also the kinds of metadata you may need to describe your data.

Metadata standards exist for different disciplines. A collection of some of these is available from the UK's Digital Curation Centre: http://www.dcc.ac.uk/resources/metadata-standards

20 Questions for Research Data Management

From the blog post 20 Questions for Research Data Management – see https://datamanagementplanning.wordpress.com/2012/03/07/twenty-questions-for-research-data-management/

These twenty questions are designed to prompt and assist your thinking, as a research student, a postdoc or an academic researcher at the beginning of a research project, and to form the basis of a workable research data management plan that can both guide your on-going data management activities and inform others about the nature and availability of your research data.

They will help you determine how best to safeguard your data from loss, how to describe your datasets in ways that assist both yourself when returning to them in the future and others in their subsequent interpretation, and how to publish your data in ways that maximize their usefulness to others and bring you maximum scholarly credit, rewarding your efforts in acquiring, analyzing, describing, interpreting and publishing them in the first place.

You may not have immediate answers to all these questions.  But, by seeking advice from your research supervisor, colleagues and others in your institution with responsibilities for data management, you should endeavor to discover them.  Then, once in a while, you should revisit these questions and see whether your data management practices can be improved, updating your answers.

 

The nature of your data

1       What is the general subject discipline (domain, field) to which your research data relates?

2       What is the exact nature (range, scope) of your research data?

3       Who will own the data arising from your research, and the intellectual property rights relating
         to them?

4       If you know at this stage, in what format(s) will you store your data in the short term
         after acquisition?

Data descriptions, so that someone else can understand what the data are about (i.e. metadata, “data about data”)

5       When and where will you describe each of your research datasets, so that someone else can
         understand them?

6       How will descriptive metadata be created or captured?

Data sharing and publication

7       With whom will you share your research data in the short term, before publication of any papers
         arising from their interpretation?

8      For how long will you embargo your research data before it is published for others to see
        and use?

9      Why is public access to your research data to be restricted (if indeed it is)?

10     Under what data-sharing license will you publish your research data?

11     What persistent identifier will be used to permit correct citation of your datasets?

12     What metadata will be published with the data to make them interpretable and reusable?

Data storage, backup and archiving

13     Where will you store your data in the short term, after acquisition?

14     Who is responsible for the immediate day-to-day management, storage and backup of the data
          arising from your research?

15     How frequently will your research data be backed up for short-term data security?

16     Where will your research data be archived for long-term preservation?

17     When will your research data be moved to a secure archive for long-term preservation
         and publication?

18     Who will decide which of your research data are worth preserving?

19     How (i.e. by what physical or electronic method) will you transfer your research datasets to their
         long-term archive, under the curatorial care of a separate third party, e.g. a data repository?

20     Who will be responsible for your data, once you have left your present research group?

 

 

Organize

Start off your research with good filenaming practices. These include:

  • Short but descriptive filenames: the name should tell you what the file contains
  • Three-letter file extensions: .jpg, not .jpeg
  • Don't use special characters, except for dashes (-) and underscores (_)
  • Use all lowercase
  • No spaces
  • Use leading zeroes: myfile001.tif, not myfile1.tif
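The naming rules above can be applied mechanically. The following is a minimal sketch, not a definitive tool: the function name `clean_filename`, the `SHORT_EXTENSIONS` table, and the default padding width of three digits are all assumptions made for illustration. Note that it also strips dots inside the name, so check the result before renaming real files.

```python
import re
from pathlib import PurePath

# Hypothetical table mapping long extensions to three-letter forms.
SHORT_EXTENSIONS = {".jpeg": ".jpg", ".tiff": ".tif"}


def clean_filename(name: str, digits: int = 3) -> str:
    """Return a lowercase name with no spaces or special characters."""
    path = PurePath(name.strip())
    stem = path.stem.lower()                   # use all lowercase
    ext = path.suffix.lower()
    ext = SHORT_EXTENSIONS.get(ext, ext)       # three-letter extension

    stem = re.sub(r"\s+", "_", stem)           # no spaces
    stem = re.sub(r"[^a-z0-9_-]", "", stem)    # only letters, digits, - and _

    # Pad a trailing number with leading zeroes (file1 -> file001).
    match = re.search(r"(\d+)$", stem)
    if match:
        stem = stem[: match.start()] + match.group(1).zfill(digits)
    return stem + ext


print(clean_filename("My File1.TIF"))
print(clean_filename("Site Photo (final).jpeg"))
```

A function like this is best used to propose names for review rather than to rename files automatically, since a short name still has to be descriptive, and only you can judge that.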

All versions of data must be clearly identified.

Be consistent. Documentation is key!

From http://ucblibraries.colorado.edu/systems/digitalinitiatives/docs/filenameguidelines.pdf

More about file formats

Some file formats are better for the eventual preservation of your data. (See Share/Preserve for more about preservation.) Below are some preferred file formats for preservation, from the Georgia Tech Libraries website.

Examples of preferred format choices:

  •     PDF/A, not Word
  •     ASCII or CSV, not Excel
  •     MPEG-4, not Quicktime
  •     TIFF or JPEG2000, not GIF or JPG
  •     XML or RDF, not RDBMS
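As a rough illustration of the list above, the sketch below looks up a file's extension and suggests the preferred preservation format. The extension table and the function name `suggest_preservation_format` are assumptions made for this example; the list above names formats, not extensions, and the database case (XML or RDF, not RDBMS) is left out because it is not tied to a single extension.

```python
from pathlib import PurePath

# Assumed mapping from common extensions to the preferred formats listed above.
PREFERRED_ALTERNATIVE = {
    ".doc": "PDF/A",
    ".docx": "PDF/A",
    ".xls": "ASCII or CSV",
    ".xlsx": "ASCII or CSV",
    ".mov": "MPEG-4",
    ".gif": "TIFF or JPEG2000",
    ".jpg": "TIFF or JPEG2000",
}


def suggest_preservation_format(filename: str):
    """Return a preferred format for the file, or None if no suggestion applies."""
    ext = PurePath(filename).suffix.lower()
    return PREFERRED_ALTERNATIVE.get(ext)


print(suggest_preservation_format("results.XLSX"))
print(suggest_preservation_format("notes.csv"))
```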

Store

Where will you keep your data?

  • Department server - is it secure?
  • Hard drive of your computer - what happens if it's compromised?
  • Flash drive - what if you lose it?
  • Paper notebooks - a fire? flood? theft?

How will you back it up?

  • In the cloud - secure?
  • External hard drive - damaged, lost or stolen?
  • On a flash drive in your desk - really?

None of these is terrible, unless it holds your only copy!

Use the 3-2-1 Rule:

  • At least three copies,
  • In two different formats (for example, on two different types of storage media),
  • with one of those copies off-site

From http://blog.trendmicro.com/trendlabs-security-intelligence/world-backup-day-the-3-2-1-rule/
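The 3-2-1 Rule can be checked against a simple inventory of your copies. This is a sketch under stated assumptions: the `BackupCopy` record, its field names, and the function `satisfies_3_2_1` are hypothetical, and "two different formats" is read here as two different kinds of storage media.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    label: str     # where the copy lives, e.g. "office laptop"
    medium: str    # kind of storage (assumed reading of "format")
    offsite: bool  # True if the copy is kept away from your workplace


def satisfies_3_2_1(copies) -> bool:
    """Check for three copies, two kinds of media, and one off-site copy."""
    enough_copies = len(copies) >= 3
    two_media = len({c.medium for c in copies}) >= 2
    one_offsite = any(c.offsite for c in copies)
    return enough_copies and two_media and one_offsite


inventory = [
    BackupCopy("office laptop", "internal_drive", offsite=False),
    BackupCopy("desk drawer", "external_drive", offsite=False),
    BackupCopy("cloud storage", "cloud", offsite=True),
]
print(satisfies_3_2_1(inventory))
```

Dropping the cloud entry from `inventory` makes the check fail twice over: only two copies remain, and none of them is off-site.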

Citing Data

Why Cite Data? From DataCite:

"Why is it so important to cite data? Books and journal articles have long benefited from an infrastructure that makes them easy to cite, a key element in the process of research and academic discourse. We believe that you should cite data in just the same way that you can cite other sources of information, such as articles and books. Data citation can help by:

  • enabling easy reuse and verification of data
  • allowing the impact of data to be tracked
  • creating a scholarly structure that recognises and rewards data producers"

Good data citation includes a persistent identifier, such as a DOI (Digital Object Identifier), URN (Uniform Resource Name), or Handle (ICPSR).

How do you get a DOI? Often when you deposit your work in a repository, a DOI is assigned to each item for you. Repositories use a DOI Registration Agency, such as DataCite or CrossRef.

Data Citation Formats

Basic recommended format from DataCite:

Creator (PublicationYear): Title. Publisher. Identifier.

Or, in a slightly expanded format:

Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier.

Example:

Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127‐797. Geological Institute, University of Tokyo. http://dx.doi.org/10.1594/PANGAEA.726855
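If you keep these fields in a spreadsheet or script, the DataCite pattern can be assembled automatically. The sketch below is one possible way to do so: the function name `format_datacite_citation` and its parameters are assumptions, not part of any DataCite tool. It leaves the identifier without a trailing period so that the link stays usable, as in the example above.

```python
def format_datacite_citation(creator, year, title, publisher, identifier,
                             version=None, resource_type=None):
    """Build a citation in the basic or expanded DataCite pattern."""
    parts = [f"{creator} ({year}): {title}."]
    if version:
        parts.append(f"{version}.")        # expanded format only
    parts.append(f"{publisher}.")
    if resource_type:
        parts.append(f"{resource_type}.")  # expanded format only
    parts.append(identifier)               # no trailing period after the link
    return " ".join(parts)


print(format_datacite_citation(
    creator="Irino, T; Tada, R",
    year=2009,
    title="Chemical and mineral compositions of sediments from ODP Site 127‐797",
    publisher="Geological Institute, University of Tokyo",
    identifier="http://dx.doi.org/10.1594/PANGAEA.726855",
))
```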

Recommended format from ICPSR:

Author. (Date). Title. Version. Persistent identifier (such as a Digital Object Identifier (DOI), Uniform Resource Name (URN), or Handle)

Example:

Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20