
Learn about data documentation and metadata

The Basics of Data Documentation and Metadata

Basic documentation answers the who, what, when, where, and why of a project:

  • Who contributed to the work (authors, research assistants, etc.)?
  • What kind(s) of data and analysis were used?
  • When was the data collected? When was the analysis performed? Are there any other pertinent dates?
  • Does the project involve a particular geographic area?
  • What is the impetus for the project? What questions are you trying to answer?
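One way to keep these answers together is a small structured record stored alongside the data. The sketch below is only an illustration: the `ProjectRecord` class, its field names, and the placeholder values are assumptions made for this example, not part of any metadata standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ProjectRecord:
    """Hypothetical record holding the basic who/what/when/where/why answers."""
    contributors: list      # who contributed to the work
    data_and_analysis: str  # what kinds of data and analysis were used
    collection_dates: str   # when the data was collected
    analysis_dates: str     # when the analysis was performed
    geographic_area: str    # where, if a particular area is involved
    purpose: str            # why: the questions the project tries to answer


def to_json(record):
    """Serialize the record as indented JSON text."""
    return json.dumps(asdict(record), indent=2)


# Placeholder values for illustration only.
example = ProjectRecord(
    contributors=["Author name", "Research assistant name"],
    data_and_analysis="Description of the data and analysis",
    collection_dates="Collection date range",
    analysis_dates="Analysis date range",
    geographic_area="Area covered, if any",
    purpose="Research questions",
)
print(to_json(example))
```

A plain-text file with the same headings would serve equally well; the structured form simply makes it easier to convert the answers into a formal standard later.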

More Documentation

Imagine that you have to leave the project as it is for a couple of months and then come back to it. What are the most important aspects of the project that you would need help remembering? Some examples:

  • File handling (how files are named and how they are divided)
  • Processing steps (how to get from point A to B)
  • Field abbreviation/name glossary (now what does POV360 stand for again?)
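A field glossary like the one in the last item can be kept as a small table next to the data and checked against the actual column names. The following is a minimal sketch under stated assumptions: the glossary entries (`site_id`, `temp_c`) and the helper names are hypothetical, and the meaning of POV360 is deliberately left undocumented so that the check flags it.

```python
import csv
import io

# Hypothetical glossary entries: names, definitions, and units are
# placeholders for illustration, not taken from a real dataset.
GLOSSARY = [
    {"field": "site_id", "definition": "Identifier of the sampling site", "units": ""},
    {"field": "temp_c", "definition": "Air temperature at time of sampling", "units": "degrees Celsius"},
]


def glossary_to_csv(entries):
    """Return the glossary as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["field", "definition", "units"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_fields(data_columns, entries):
    """Return the data columns that have no glossary entry, in original order."""
    documented = {entry["field"] for entry in entries}
    return [column for column in data_columns if column not in documented]


# Columns found in a data file (assumed for this example).
columns_in_data = ["site_id", "temp_c", "POV360"]
print(undocumented_fields(columns_in_data, GLOSSARY))
```

Running a check like this before setting a project aside highlights abbreviations that still need a definition.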

Now imagine that you had to leave the project and come back after six months or a year. What else would you add to the list?

Standardizing your documentation

The next step is to standardize the formatting. The standard to use depends on the discipline and/or format of your data. A few standards are listed below. This list is illustrative rather than exhaustive.

Type of Data | Discipline/s                   | Standard
----         | Social and Behavioral Sciences | Data Documentation Initiative (DDI)
----         | Ecology                        | Ecological Metadata Language (EML)
Spatial      | ----                           | Content Standard for Digital Geospatial Metadata (CSDGM)/FGDC/ISO 19115
Biodiversity | Life Sciences                  | Darwin Core

A more comprehensive list of disciplinary metadata standards is available from the Digital Curation Centre.

ReadMe Files

ReadMe files help ensure that your data can be correctly interpreted and analyzed by other researchers.

There are two ways to include a ReadMe with your dataset:

A ReadMe should be a plain text file containing the following:

  • for each filename, a short description of what data it includes, optionally describing the relationship to the tables, figures, or sections within the accompanying publication
  • for tabular data: definitions of column headings and row labels; data codes (including missing data); and measurement units
  • any data processing steps, especially if not described in the publication, that may affect interpretation of results
  • a description of what associated datasets are stored elsewhere, if applicable
  • whom to contact with questions
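The items above can be assembled into a plain-text ReadMe with a short script. This is a sketch, not a Dryad-prescribed layout: the function name `build_readme`, the section labels, and all example values are assumptions chosen for illustration.

```python
def build_readme(files, columns, missing_code, processing_steps, related_datasets, contact):
    """Assemble plain-text ReadMe content from the checklist items.

    files: dict mapping filename -> short description
    columns: dict mapping column heading -> (definition, units)
    missing_code: code used for missing data
    processing_steps: list of processing step descriptions
    related_datasets: list of descriptions of datasets stored elsewhere
    contact: whom to contact with questions
    """
    lines = ["FILES"]
    for name, description in files.items():
        lines.append(f"  {name}: {description}")

    lines += ["", "COLUMNS"]
    for heading, (definition, units) in columns.items():
        lines.append(f"  {heading}: {definition} [{units}]")
    lines.append(f"  Missing data code: {missing_code}")

    lines += ["", "PROCESSING STEPS"]
    for number, step in enumerate(processing_steps, start=1):
        lines.append(f"  {number}. {step}")

    lines += ["", "RELATED DATASETS"]
    if related_datasets:
        lines += [f"  {item}" for item in related_datasets]
    else:
        lines.append("  (none)")

    lines += ["", f"CONTACT: {contact}"]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
readme_text = build_readme(
    files={"samples.csv": "One row per sample; corresponds to Table 1 of the publication"},
    columns={"temp_c": ("Air temperature at time of sampling", "degrees Celsius")},
    missing_code="NA",
    processing_steps=["Removed duplicate rows"],
    related_datasets=[],
    contact="name@example.org",
)
print(readme_text)
```

The returned text can be written to a file such as `README.txt` and deposited with the data files.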

If text formatting is important for your ReadMe, PDF format is also acceptable.

Adapted from "ReadMe Guidance" by Dryad.