Findable: Data are described with rich metadata

The tale

When the first elves returned to the castle, their findings looked far from promising. They gathered around the senior data wizard in the castle hallway.

He stepped onto a podium to say a few words:

– “You are on the most important mission of all. However, there are a few rules and words of advice that you must keep in mind. When you go looking for chests, you must stick to the paths that you already know from your studies – we cannot find you in the forests if you walk your own way and get lost. When you find a chest, make sure that you read all about it on the label to see if its contents are relevant to us. Read it carefully, as some labels may not be all that clear in stating what is in the chest. And finally, you must never ever open the chest yourself! Leave it to the data wizards.”

Even though the senior wizard knew that the elves would do their best, he kept all six fingers crossed.

The elves were bright indeed; still, he doubted they would locate all the relevant chests out there.

The truth

When humans and machines look for data, metadata are often the first point of contact, as they are usually what search engines and similar services index. It is often the metadata that determine whether the data set they describe is perceived as relevant for a given usage scenario.

The query words a human being would use to look for a data set should be available in its metadata. This includes metadata about the context and/or prerequisites for the data set and any quality issues, as well as a number of discipline-specific details, e.g. sample size or equipment. It also includes details about the data set that may not be important to you, but could make your data findable outside your own discipline. So, try to think outside the box when adding metadata to your data.
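
To make this concrete, here is a minimal sketch in Python of what a rich metadata record might look like and how it affects findability. Everything in it is an assumption made for illustration: the field names do not follow any particular metadata standard, the data set described is invented, and the `matches` function is a deliberately naive stand-in for what a real search engine does.

```python
# Hypothetical metadata record for an invented data set.
# Field names are illustrative only, not taken from a metadata standard.
record = {
    "title": "Soil moisture measurements, field site A",
    "description": "Hourly soil moisture readings collected for a drought study.",
    # Keywords from neighbouring disciplines help people outside your field.
    "keywords": ["soil moisture", "drought", "agriculture", "hydrology"],
    "context": "Collected as part of a three-year field campaign.",
    "quality_notes": "Sensor 3 failed for two weeks; gaps are flagged.",
    "sample_size": 26280,
    "equipment": "Capacitance probes",
}


def matches(record, query):
    """Return True if every query word occurs somewhere in the record.

    This is naive substring matching over all field values, used here
    only to show why missing metadata means missed searches.
    """
    text = " ".join(
        " ".join(value) if isinstance(value, list) else str(value)
        for value in record.values()
    ).lower()
    return all(word in text for word in query.lower().split())


# A query from a neighbouring discipline finds the record only because
# "hydrology" was added as a keyword.
found_by_hydrologist = matches(record, "drought hydrology")

# A query using words that appear nowhere in the metadata finds nothing.
found_by_geneticist = matches(record, "genomics")
```

In this sketch the hydrologist's query succeeds and the geneticist's does not, simply because of which words the record contains. Removing the "hydrology" keyword would make the first query fail as well, even though the data themselves would be unchanged.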