The Basics of Data Documentation and Metadata
- Who contributed to the work (authors, research assistants, etc.)?
- What kind(s) of data and analysis were used?
- When was the data collected? When was analysis performed? Any other pertinent dates?
- Does the project involve a particular geographic area?
- What is the impetus for the project? What questions are you trying to answer?
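One way to keep the answers to these questions alongside the data is a small machine-readable record. The sketch below is only an illustration: the `ProjectRecord` class, its field names, and every sample value are hypothetical and are not part of any metadata standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ProjectRecord:
    """Hypothetical container for the who/what/when/where/why questions."""
    contributors: List[str]   # who contributed to the work
    data_and_analysis: str    # what kinds of data and analysis were used
    dates: Dict[str, str]     # when data were collected, analyzed, etc.
    geographic_area: str      # where, if the project has a geographic focus
    purpose: str              # why: the questions the project tries to answer


def to_json(record: ProjectRecord) -> str:
    """Serialize the record so it can be saved next to the data files."""
    return json.dumps(asdict(record), indent=2)


# All values below are placeholders for illustration only.
record = ProjectRecord(
    contributors=["A. Author (lead)", "R. Assistant (data entry)"],
    data_and_analysis="Survey responses; descriptive statistics",
    dates={"collected": "2020-01 to 2020-03", "analyzed": "2020-06"},
    geographic_area="Example County",
    purpose="Placeholder research question",
)
print(to_json(record))
```

A plain text file with the same five headings would serve equally well; the point is that the answers are written down in one predictable place.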
Imagine that you have to leave the project as is for a couple of months and then come back to it. What are the most important aspects of the project you'd need help remembering? Some examples:
- File handling (how files are named and how they are divided)
- Processing steps (how to get from point A to B)
- Field abbreviation/name glossary (now what does POV360 stand for again?)
Now imagine that you had to leave the project and come back after six months or a year. What else would you add to the list?
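A field glossary is easier to trust when it can be checked against the data. The following minimal sketch, under assumed names, reports column headings that have no glossary entry. The `GLOSSARY` contents, the `undocumented_fields` helper, and the sample CSV text are all hypothetical; `POV360` is deliberately left undefined to show how a forgotten abbreviation would surface.

```python
import csv
import io

# Hypothetical glossary: field abbreviation -> plain-language definition.
GLOSSARY = {
    "HH_ID": "Household identifier assigned at intake",
    "INC_MO": "Monthly household income, in US dollars",
}


def undocumented_fields(csv_text, glossary):
    """Return the column headings in csv_text that have no glossary entry."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # first row holds the column headings
    return [name for name in header if name not in glossary]


# Sample data for illustration; POV360 has no entry in GLOSSARY.
sample = "HH_ID,INC_MO,POV360\n1,2500,0\n"
print(undocumented_fields(sample, GLOSSARY))  # ['POV360']
```

Running a check like this before setting a project aside flags abbreviations that still need a definition.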
Standardizing your documentation
The next step is to standardize the formatting. The standard to use depends on the discipline and/or format of your data. A few standards are listed below. This list is intended to be illustrative rather than exhaustive.
|Type of Data |Metadata Standard |
|Social and Behavioral Sciences |Data Documentation Initiative (DDI) |
|Ecology |Ecological Metadata Language (EML) |
|Geospatial |Content Standard for Digital Geospatial Metadata (CSDGM)/FGDC/ISO 19115 |
A more comprehensive list of disciplinary metadata standards is available from the Digital Curation Centre.
ReadMe files help ensure that your data can be correctly interpreted and analyzed by other researchers.
There are two ways to include a ReadMe with your dataset:
- Provide a separate ReadMe for each individual data file (example of ReadMe file with individual data file).
- Provide one ReadMe for the data package as a whole (example of ReadMe file with a data package).
A ReadMe should be a plain text file containing the following:
- for each filename, a short description of what data it includes, optionally describing the relationship to the tables, figures, or sections within the accompanying publication
- for tabular data: definitions of column headings and row labels; data codes (including missing data); and measurement units
- any data processing steps, especially if not described in the publication, that may affect interpretation of results
- a description of what associated datasets are stored elsewhere, if applicable
- whom to contact with questions
If text formatting is important for your ReadMe, PDF format is also acceptable.
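The ReadMe contents listed above can be assembled from a few simple structures. This is a sketch under stated assumptions, not a required layout: the `build_readme` function, the section headings it emits, and all sample filenames, columns, and contact details are hypothetical.

```python
def build_readme(files, columns, processing_steps, related_datasets, contact):
    """Assemble plain-text ReadMe content from the items listed above."""
    lines = ["FILES"]
    for filename, description in files.items():
        lines.append(f"- {filename}: {description}")

    lines += ["", "COLUMN DEFINITIONS"]
    for col in columns:
        lines.append(
            f"- {col['name']}: {col['definition']} "
            f"(units: {col['units']}; missing data code: {col['missing']})"
        )

    lines += ["", "PROCESSING STEPS"]
    for number, step in enumerate(processing_steps, start=1):
        lines.append(f"{number}. {step}")

    lines += ["", "ASSOCIATED DATASETS STORED ELSEWHERE"]
    lines += [f"- {item}" for item in related_datasets] or ["- none"]

    lines += ["", "CONTACT", contact]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
readme_text = build_readme(
    files={"survey_2020.csv": "Cleaned survey responses (Table 1 of the paper)"},
    columns=[{
        "name": "INC_MO",
        "definition": "Monthly household income",
        "units": "US dollars",
        "missing": "-99",
    }],
    processing_steps=["Removed duplicate household records"],
    related_datasets=[],
    contact="name@example.org",
)
print(readme_text)
```

The resulting string can be saved as a plain text file, for example with `pathlib.Path("README.txt").write_text(readme_text, encoding="utf-8")`.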