Metadata is information about information. All items of information have metadata. For example:
- Emails have a sender, a subject, a received date, a cc list, a return path, content type, etc.
- Source code modules have a name and an evolution history, which itself contains fields like author, date, version number, branching identifiers, etc.
- Files in a Windows file system have a name, size, created date, and modified date; various title, subject, author, and keyword attributes that I never use; an access control list; and an extensible file metadata repository that seems really cool, but I moved over to the Java world before I could see any use for it.
- Product documentation snippets have an original author, an evolution history, links to design documents, links to other documentation snippets, project information, product information, etc.
- Certain database tables record what would be recognizable as “primary information” along with metadata. For example, a Notes table might include a Note field (the primary content) plus a foreign key reference to an author table, a foreign key reference to a category table, a date field, etc.
- A newspaper clipping has an author, a byline, a title, and (for online articles) information about the publication date and perhaps the page number of the print edition.
- A folder has a list of contents, plus perhaps information about how the folder was created and what it is for.
It is difficult to envision a piece of information that has no metadata. Experimental results from a science experiment may be organized into NTuples for which there appears to be no clear “primary” and therefore no “secondary” information, but frankly I can’t think of any case where I have actually encountered this situation: there is always a primary field. An Excel spreadsheet may appear to consist of simple numeric data, but deprived of any contextual metadata I doubt you would find much use for it, other than perhaps to plot a graph, which when you think about it is really a process of reapplying metadata that had previously been purged from the data.
There are three main categories of metadata:
- Simple atoms of information such as “author” and “creation date”
- Links to other atoms of information
- Evolution histories
Each category has a number of sub-categories. Most people are familiar with the first main category, but as far as I am concerned the other two are much more interesting from an information architecture perspective. The way that artifacts are linked to each other determines how they evolve. And managing data evolution is a rich and interesting topic that I and many of my customers believe is key to managing information in the future.
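To make the three categories concrete, here is a minimal sketch in Python of how an artifact might carry all three kinds of metadata at once. Everything in it is an assumption made for illustration: the class names `Artifact` and `Revision`, the field names, and the sample values are hypothetical and do not come from any particular product or library.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Revision:
    """One entry in an evolution history (hypothetical fields)."""
    author: str
    changed_on: date
    version: str


@dataclass
class Artifact:
    """A piece of primary information plus its three kinds of metadata."""
    content: str                                               # the primary information
    atoms: Dict[str, str] = field(default_factory=dict)        # category 1: simple atoms
    links: List["Artifact"] = field(default_factory=list)      # category 2: links to other artifacts
    history: List[Revision] = field(default_factory=list)      # category 3: evolution history


# Illustrative sample data only: a documentation snippet linked to a design document.
design = Artifact(content="Design document", atoms={"author": "alice"})
snippet = Artifact(
    content="How to configure the widget",
    atoms={"author": "bob", "creation date": "2024-01-15"},
    links=[design],
    history=[Revision(author="bob", changed_on=date(2024, 1, 15), version="1.0")],
)
```

In this sketch the atoms are a flat dictionary, while links and histories are structured objects in their own right, which hints at why the second and third categories need more architectural attention than the first.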
