The elves were bringing back data chests to the wizards like crazy, leaving both the elves and the wizards in an almost frantic state of mind. The wizards were struggling to figure out the contents of the chests. One of the elves who came back with a chest was Roscoe. Proud, as most elves are, he brought the wizard a nicely wrapped chest, and one he had paid for, too.
– “Hmm”, the data wizard muttered. “What’s this? Hopefully better than the last one?”
– “Well …” said Roscoe. “It was hard for me to decide whether it was right to bring it back here. I know we were supposed to bring back only the ones that succeeded in turning water into gold. But this one doesn’t say what it has been used for.”
– “Geez”, said the wizard, a little distressed. “I have to look into it to figure out if it’s relevant. But this will take time, and I do not have time.”
– “Sorry”, apologized Roscoe, looking down.
– “We don’t have room for more irrelevant chests in here”, exclaimed the wizard. “You’ll have to focus on the ones that we know will transform water. And don’t pay for anything that we are not sure of.”
– “What …” said Roscoe doubtfully. “Leave … chests … behind …?”
– “Yes”, said the wizard in a firm voice. “I know it is hard, but we have to make sure that we only bring in relevant chests.”
– “Alright”, said Roscoe. “I think I’ll grab a fishing rod on my way out and see if I can find a trustworthy data lake.”
Labelling your data with relevant attributes – most often in the form of metadata – does more than help others discover your data. It also helps humans and machines understand the context of your data. This context can take the form of purpose and processing statements, equipment used, software versions, and so on. Imagine searching for your own data. Now think of the contextual information that would help you determine whether the data are relevant to your specific needs – and whether you would be able to understand how the data were created. Be generous when adding attributes to your data. What might not be relevant to you might be exactly what other people – or machines – filter and query on.
Speaking of machines, your best choice is to use controlled vocabularies, persistent identifiers or similar mechanisms to make the contextual description unambiguous. Repositories aimed at specific disciplines, communities, or data types often have the best support for assigning, maintaining and querying domain-specific metadata.
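To make this a little more concrete, here is a minimal sketch in Python of what the wizard might have wished for: chests that carry a few contextual attributes, with the purpose restricted to a controlled vocabulary so that machines can filter on it without ambiguity. Everything in it is an assumption made for illustration: the `Chest` class, the `PURPOSE_VOCABULARY` terms, the `find_relevant` helper and the `chest:0001`-style identifiers are hypothetical and do not come from any real metadata standard or repository.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary for the "purpose" attribute.
# A real repository would publish and maintain such a list.
PURPOSE_VOCABULARY = {"water-to-gold", "calibration", "unknown"}


@dataclass
class Chest:
    """A dataset ("chest") described by a few illustrative metadata attributes."""

    identifier: str        # placeholder standing in for a persistent identifier
    purpose: str           # must be a term from PURPOSE_VOCABULARY
    equipment: str         # equipment used to create the data
    software_version: str  # software version used for processing

    def __post_init__(self):
        # Reject free-text purposes so the description stays unambiguous.
        if self.purpose not in PURPOSE_VOCABULARY:
            raise ValueError(
                f"purpose {self.purpose!r} is not in the controlled vocabulary"
            )


def find_relevant(chests, purpose):
    """Return only the chests whose purpose matches the requested term."""
    return [chest for chest in chests if chest.purpose == purpose]


chests = [
    Chest("chest:0001", "water-to-gold", "copper cauldron", "1.2.0"),
    Chest("chest:0002", "calibration", "silver scale", "0.9.1"),
]

relevant = find_relevant(chests, "water-to-gold")
print([chest.identifier for chest in relevant])  # prints ['chest:0001']
```

With labels like these, Roscoe could have checked a chest's purpose before paying for it, and the wizard would not have had to open it to find out whether it was relevant. A free-text purpose such as "turning water into gold" is rejected by the sketch, which is the point of a controlled vocabulary: everyone, human or machine, uses the same term for the same thing.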