What is data documentation?
Data documentation ensures that your data can be understood and interpreted by any user. Documentation should start at the beginning of a project and continue throughout. Recording details as you go is easier than reconstructing them afterward, and makes it less likely that you will forget them.
What's important to document?
- Context of data collection
- Data collection methodology
- Structure and organization of data files
- Data validation and quality assurance
- The manipulation of raw data through analysis
- Data confidentiality, access, and use conditions
Good documentation explains how your data was created, the context of the data, the structure of the data and its contents, and any manipulations that have been applied to the data.
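One way to start documenting early is to create an empty README with a section for each item in the checklist above, then fill it in as the project proceeds. The following is a minimal sketch only; the function name `readme_skeleton`, the "TODO" placeholder text, and the example project title are assumptions made for illustration, not part of any standard.

```python
# Section headings taken from the checklist above.
SECTIONS = [
    "Context of data collection",
    "Data collection methodology",
    "Structure and organization of data files",
    "Data validation and quality assurance",
    "Manipulation of raw data through analysis",
    "Data confidentiality, access, and use conditions",
]


def readme_skeleton(project_title: str) -> str:
    """Return a plain-text README skeleton with one empty section per heading."""
    lines = [project_title, "=" * len(project_title), ""]
    for heading in SECTIONS:
        # Each section starts with a placeholder to be replaced by real notes.
        lines += [heading, "-" * len(heading), "TODO: describe.", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical project title, used only for the demonstration.
    print(readme_skeleton("Example survey data"))
```

Saving the output as README.txt next to the data files keeps the documentation and the data together.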
Data Level Documentation
- Variable names and descriptions
- Definition of codes and classification schemes
- Reasons for missing values
- Definitions of specialized terminology and acronyms
- Algorithms used to transform data
- File format and software used
From University of Illinois
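Several of the data-level items above (variable names and descriptions, code definitions, reasons for missing values) are often recorded together in a data dictionary. The sketch below shows one possible layout using only the Python standard library. The variable names (`resp_id`, `employ`), the code values, and the helper names `dictionary_to_csv` and `undocumented` are hypothetical examples, not a prescribed format.

```python
import csv
import io

# Hypothetical entries for an imaginary survey file; replace with your own.
DATA_DICTIONARY = [
    {
        "variable": "resp_id",
        "description": "Unique respondent identifier",
        "codes": "",
        "missing": "never missing",
    },
    {
        "variable": "employ",
        "description": "Employment status",
        "codes": "1=employed; 2=unemployed; 3=not in labor force",
        "missing": "-9=refused; -8=not asked",
    },
]

FIELDS = ["variable", "description", "codes", "missing"]


def dictionary_to_csv(rows):
    """Serialize the data dictionary to CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undocumented(columns, rows):
    """Return dataset column names that have no entry in the dictionary."""
    documented = {row["variable"] for row in rows}
    return [name for name in columns if name not in documented]


if __name__ == "__main__":
    print(dictionary_to_csv(DATA_DICTIONARY))
    # "income" is a hypothetical column that has not been documented yet.
    print(undocumented(["resp_id", "employ", "income"], DATA_DICTIONARY))
```

A check like `undocumented` can be run whenever a column is added, so that gaps in the dictionary are caught while the details are still fresh.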
Best Practices for Data Documentation
From the Data Documentation Initiative (DDI)
Data Documentation Initiative
International standard for describing statistical and social science data. Documenting data with DDI facilitates interpretation and understanding -- both by humans and computers. The freely available international DDI standard describes data that result from observational methods in the social, behavioral, economic, and health sciences. Use DDI to Document, Discover, and Interoperate!
Version Control Software
GitHub is a site built for coders to upload, share, and collaborate on code. Metrics tracked by GitHub include watchers, collaborators, and forks (copies of a repository that someone makes to develop the code for their own purposes). GitHub is one of the few ways programmers can track the impact of their code.
Free cloud-based electronic lab notebook for use by researchers, instructors, and students for input and organization of laboratory data, information sharing, and collaboration, and for saving historical versions of files.