Findable: (meta)data are assigned with globally unique and persistent identifiers

The tale

The elves returned one by one to the castle, and some of them were really frustrated. They had been following paths to data chests that had been meticulously described, but somehow the data chests had been removed, just leaving holes in the ground. Fimble was one of these elves, who came back quite puzzled about some strange codes he had found. He could not decipher them and therefore did not know where to go.

– “Look” said Fimble to the data wizard, “I have this strange code 10.1234/abbb, and I don’t know what it means?”

– “Oh, these are very useful indeed” said the data wizard.

– “We can look up the codes in these huge books. Now let me see. 10 is the great country of Datavalley, and we should look in the house number 1234.” He showed a map to Fimble in the book. “This is where you should go”.

– “Are you sure it’s still there?” said Fimble, not wanting to waste a single more step on hunting down data chests he could not find.

– “Absolutely. These books are magic. If someone moves the data chest to a new location, the book will know.”

– “Great” said Fimble, and took off in a sprint. He soon returned happy carrying a data chest.

But not everything was bliss. Another elf named Faelar never came back. His only clue was to go and talk to somebody named Zhang Wei asking for his data chest. As far as we know, Faelar is still walking from door to door talking to people with that name.

The truth

One of the big problems with data concerns the ability to cite your own and other people’s data, and keep pointing to its exact location in cyberspace, because the location might change. This can be solved by using persistent identifiers. They work like a big index or registry where you assign a unique key (the identifier) to each data set. If someone tries to follow the identifier (often referred to as “resolving a persistent identifier”), the resolver will point to the correct web address (URL). If the URL changes – i.e. if data is moved –the one who made the key is responsible for providing the new location to the resolver. In this way, you do not end up in blind alleys of “page not found”. That is why it is called persistent.

A DOI (Document Object Identifier) is an example of such an identifier and looks like this; 10.1234/abba (prefix/suffix). This can be resolved and points you to the URL. Most data repositories can issue and maintain DOIs or other persistent identifiers that you can use. Persistent identifiers usually contain some basic descriptive metadata such as title and author.

Other persistent identifiers are used to prevent ambiguity, e.g. giving a person a number instead of a name. This solves the problem of distinguishing Sam Smith from Sam Smith – yes, there is more than one person with that name! ORCID is an example of a service, where a person is assigned a unique code for further reference, e.g. ‘0000-0002-1825-0097’ that will point to one – and only one – person.