Documenting Your Data
Some tips that will make your life easier.
University of Cologne
Why record metadata?
It is important, just ask the NSA.
“I will do that later.”
“Ah, I don’t need to write THAT down.”
“I will remember that when I’m back in my office.”
You don’t have to love it
... but please do it.
Worst case scenario
- You need information to understand data
- Other people don’t know what you know
- Future-you doesn’t know what you know
- To find the right bit of data you need information about your data
Metadata – Data about data
Types of metadata
Administrative (incl. rights)
Make your life easier
- Record metadata early
- Create metadata for batches of files
- Document raw data and processed (transcribed) data
Deal with your raw data
File-related metadata
- File Type: WAV, MPEG, ...
- Rename your files (include date, language, etc. in the name)
- Create metadata for batches of files
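One way to keep file names consistent is to generate them from the metadata itself. The following is only a sketch: the field order, the separators, and the function name `build_filename` are assumptions made for illustration, not a naming standard required by any archive.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lower-case the text and replace runs of other characters with '-'.

    Note: this simple version drops non-ASCII letters, so adapt it if
    your language or speaker names need them.
    """
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(recorded_on: date, language: str, speaker: str,
                   seq: int, ext: str) -> str:
    """Build a name of the form DATE_LANGUAGE_SPEAKER_NNN.EXT (hypothetical scheme)."""
    extension = ext.lower().lstrip(".")
    return (f"{recorded_on.isoformat()}_{slug(language)}_"
            f"{slug(speaker)}_{seq:03d}.{extension}")


# Example with made-up values:
name = build_filename(date(2014, 3, 21), "Example Language", "Speaker 01", 3, ".WAV")
print(name)
```

Putting the ISO date first means that an alphabetical listing of the folder is also a chronological one.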
Make an inventory
e.g. with exiftool
- File name
- File type (e.g. WAV)
- File size
- Encoding, sample rate, bit rate
- Dates of creation/modification ...
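If exiftool is not at hand, a basic inventory can also be produced with a short script. The sketch below is not exiftool and covers far fewer formats: it uses only the Python standard library, reads audio parameters from uncompressed WAV files only, and records the modification date (a true creation date is not available on every operating system). The column names and the function names `describe_file` and `write_inventory` are assumptions chosen for this example.

```python
import csv
import wave
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["file_name", "file_type", "size_bytes", "modified",
          "sample_rate_hz", "channels", "bits_per_sample"]


def describe_file(path: Path) -> dict:
    """Collect basic file metadata; audio fields are filled for WAV files only."""
    stat = path.stat()
    row = {
        "file_name": path.name,
        "file_type": path.suffix.lstrip(".").upper(),
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc).isoformat(timespec="seconds"),
        "sample_rate_hz": "",
        "channels": "",
        "bits_per_sample": "",
    }
    if path.suffix.lower() == ".wav":
        try:
            with wave.open(str(path), "rb") as wav:
                row["sample_rate_hz"] = wav.getframerate()
                row["channels"] = wav.getnchannels()
                row["bits_per_sample"] = wav.getsampwidth() * 8
        except (wave.Error, EOFError):
            pass  # not a WAV the standard library can parse; leave fields empty
    return row


def write_inventory(folder: Path, out_csv: Path) -> int:
    """Write one CSV row per file under `folder`; return the number of rows."""
    rows = [describe_file(p) for p in sorted(folder.rglob("*"))
            if p.is_file() and p.resolve() != out_csv.resolve()]
    with out_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `write_inventory(Path("recordings"), Path("inventory.csv"))` (with a hypothetical `recordings` folder) produces a spreadsheet-readable file that can be extended by hand with further columns.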
Metadata for your processed data
- IMDI – used by the LAT-Archives such as Nijmegen, Cologne, and Mexico D.F. (CIESAS)
- CMDI – a modular format, with variants such as IMDI-CMDI and ELDP-CMDI
- Forms or spreadsheets of different archives, e.g. AILLA
- Resources (files): audio and video files, annotations, ...
- Sessions (set of files): e.g. an audio file with an ELAN file and a Toolbox file
- Corpus (set of sessions): sessions joined as a group
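The three levels can be pictured as a small nested data structure. This is only an illustration of the hierarchy, not the IMDI or CMDI schema; the class names, fields, and file names below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """A single file, e.g. a recording or an annotation."""
    path: str
    kind: str  # e.g. "audio", "video", "annotation"


@dataclass
class Session:
    """A set of files that belong together."""
    name: str
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Corpus:
    """A set of sessions joined as a group."""
    name: str
    sessions: List[Session] = field(default_factory=list)

    def all_resources(self) -> List[Resource]:
        """Flatten the corpus into a list of every file it contains."""
        return [res for session in self.sessions for res in session.resources]


# Example: one session with an audio file, an ELAN file and a Toolbox file.
session = Session("story_001", [
    Resource("story_001.wav", "audio"),
    Resource("story_001.eaf", "annotation"),  # ELAN
    Resource("story_001.txt", "annotation"),  # Toolbox
])
corpus = Corpus("example_corpus", [session])
print(len(corpus.all_resources()))
```

Metadata can then be attached at the level where it applies: recording details to a resource, participants and date to a session, project information to the corpus.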
CMDI Maker: an offline-enabled web app
- works in modern browsers (best in Firefox, Chrome)
- no installation, just go to http://cmdi-maker.uni-koeln.de/
- data is stored persistently (even if you close the browser)
- works offline
- Create your metadata early
- Use some documented format
- Use the available tools
“Metadata is a love note to the future”