Digitalia: An Intro to Metadata and Taxonomies

Tuesday, April 3, 2012

by Christine Benson on March 29th, 2012

As the structured/adaptive content conversation progresses, metadata and taxonomies will also become more and more important. To participate in the conversation, you don’t need to know everything—but you do need to understand the essential foundations so you can ask the right questions.

For the record, I’m no expert. At the end of the post, I’ve listed a series of resources from some super-smart people who are.

The goal of this post is to provide an introduction to the concepts, so you can get a general understanding and feel comfortable digging into more information.

Now, this conversation gets big in a hurry—but don’t be intimidated. These terms have been around much longer than the Web, and can be applied in a wide variety of contexts. In the hopes of making this post a bit more approachable, I’m going to fast-forward through the structured content conversation with the diagram below.

Simple enough, right? I’ll be skipping past why and how to break your content up into components, and instead focus on how metadata and taxonomies get applied to content components.

Metadata first

The information provided in metadata makes the content findable and understandable to either a human or a computer. There are lots of definitions out there, but when it comes to metadata, I look to Rachel Lovinger, the metadata guru. She defines metadata as “information about the content that provides structure, context, and meaning.”

There are three main types of metadata:

Structural: Defines the metadata elements that need to be collected; labels like title, author, date created, subject, purpose, etc. Defining these structural elements is typically based on a mix of organizational and system needs, along with standard schemas like Dublin Core.
Administrative: Often created automatically when content is entered into the CMS, these values are used to manage the content. Administrative metadata includes things like date created or author. It can sometimes include sub-elements about rights management or preservation.
Descriptive: These values describe aspects specific to each content component, like title, subject, audience, and/or purpose.

A typical piece of content is likely to use some of each of the three types of metadata, but how and when each type gets defined is very different. The structural metadata gets identified as part of your system requirements. Administrative and descriptive metadata are identified during the creation or curation of specific content. If you think of it like a form, the structural metadata defines which information needs to be collected (the fields on the page), and the descriptive and administrative metadata provide the values for those form fields.
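The form analogy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real CMS API: the `STRUCTURAL_SCHEMA` dictionary, the field names, and the `create_component` helper are all hypothetical.

```python
from datetime import date

# Structural metadata: which fields exist and which type fills them in.
# (Hypothetical schema; a real one might borrow from Dublin Core.)
STRUCTURAL_SCHEMA = {
    "date_created": "administrative",
    "author": "administrative",
    "title": "descriptive",
    "subject": "descriptive",
    "audience": "descriptive",
}


def create_component(descriptive_values, author, today=None):
    """Return a metadata record with a value for every field in the schema."""
    # Administrative values are filled in by the system, not typed by hand.
    record = {
        "date_created": (today or date.today()).isoformat(),
        "author": author,
    }
    # Descriptive values come from whoever creates or curates the content.
    for field, kind in STRUCTURAL_SCHEMA.items():
        if kind == "descriptive":
            record[field] = descriptive_values.get(field)
    return record


record = create_component(
    {"title": "An Intro to Metadata and Taxonomies", "subject": "metadata"},
    author="Christine Benson",
    today=date(2012, 3, 29),
)
print(record)
```

The schema is decided once, up front; the values change with every component. Here `audience` comes back as `None` because the field exists in the schema but nobody supplied a value for it.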

Here’s how they work together:

Taxonomy, shmaxonomy

The term taxonomy gets applied across a range of contexts. In the biology world, it means grouping organisms into hierarchical groups (e.g., kingdom, phylum, class, order, family, genus, and species).
The web/digital world typically applies it to any kind of structure that organizes information. Information science people sometimes say “controlled vocabularies” instead of taxonomies.

Regardless of the term, the underlying goals are to create some level of consistency and control over the information used to describe content components, and to clarify the relationships between those components.
Common types include:

Term list: A standardized list of terms created to ensure consistent tagging and indexing. Think of it as a list of “preferred language.” Term lists typically provide a series of metadata values to pick from for elements like format or content type.
Hierarchies: Often called a “taxonomy,” a hierarchy defines the structural framework used to classify terms into parent/child or broad-to-narrow relationships. A hierarchy exists to support layered groups of information, not simply to make grouping convenient, although each level of a hierarchy is commonly referred to as a “category.”
Thesauri: A thesaurus translates the conceptual relationships that humans naturally make between pieces of content into something a computer can understand. Thesauri typically address three types of relationships: equivalent (synonyms), hierarchical (broad-to-narrow terms), and associative (related terms).
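To make the three types a little more concrete, here is a rough sketch of each as a plain Python data structure. The terms and the helper names (`preferred_term`, `broader_terms`) are invented for illustration; real taxonomy tools store far more than this.

```python
# Term list: the preferred values allowed for one metadata element.
CONTENT_TYPES = {"article", "video", "podcast"}

# Hierarchy: each term points to its parent (broad-to-narrow).
PARENT = {
    "mammals": "animals",
    "dogs": "mammals",
    "cats": "mammals",
}

# Thesaurus: equivalent (synonym) and associative (related) relationships.
# The hierarchical relationship is covered by PARENT above.
SYNONYMS = {"canines": "dogs", "felines": "cats"}
RELATED = {"dogs": {"pet care"}, "cats": {"pet care"}}


def preferred_term(term):
    """Map a synonym to its preferred term; other terms pass through."""
    term = term.strip().lower()
    return SYNONYMS.get(term, term)


def broader_terms(term):
    """Walk up the hierarchy from a term to its top-level ancestor."""
    ancestors = []
    while term in PARENT:
        term = PARENT[term]
        ancestors.append(term)
    return ancestors


print(preferred_term("Canines"))      # dogs
print(broader_terms("dogs"))          # ['mammals', 'animals']
print("blog post" in CONTENT_TYPES)   # False
```

The term list answers “is this an allowed value?”, the hierarchy answers “what is this a kind of?”, and the thesaurus answers “what else means this, or goes with this?”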

Let’s get together, yeah yeah yeah

At its simplest, a taxonomy organizes information, and metadata describes it. For the taxonomy to be able to organize the information, terms need to be stored as metadata. It all works together to make the content findable, recognizable, and useful.
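Here is one way that relationship might look in code: a minimal sketch in which each content component stores a taxonomy term as a metadata value, and a lookup on a broad term also finds content tagged with narrower terms. The hierarchy, the components, and the function names are all made up for the example.

```python
# Hypothetical hierarchy: narrower term -> broader term.
PARENT = {"metadata": "content strategy", "taxonomy": "content strategy"}

# Each content component stores a taxonomy term as a metadata value.
COMPONENTS = [
    {"title": "Metadata first", "subject": "metadata"},
    {"title": "Taxonomy, shmaxonomy", "subject": "taxonomy"},
    {"title": "Office news", "subject": "company"},
]


def falls_under(term, broad_term):
    """True if term equals broad_term or sits anywhere beneath it."""
    while term is not None:
        if term == broad_term:
            return True
        term = PARENT.get(term)
    return False


def find_by_subject(broad_term):
    """Return titles whose subject is the broad term or a narrower one."""
    return [
        component["title"]
        for component in COMPONENTS
        if falls_under(component["subject"], broad_term)
    ]


print(find_by_subject("content strategy"))  # ['Metadata first', 'Taxonomy, shmaxonomy']
print(find_by_subject("taxonomy"))          # ['Taxonomy, shmaxonomy']
```

Without the `subject` value stored on each component, the hierarchy would have nothing to organize; without the hierarchy, a search for the broad term would miss both posts.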

An example:

Not every site needs every one of these things, but this diagram illustrates how these elements can feed into each other and how they help display content to the user.

What’s next?

Admittedly, I’ve over-simplified these concepts to make them easier to understand. If you’re interested in learning more about metadata, taxonomies, and structured content, there’s no end to the list of resources out there.

Here are a few to get you started:

Rachel Lovinger’s Metadata Workshop from Content Strategy Applied 2012
“Taxonomies and controlled vocabularies best practices for metadata” by Heather Hedden, Journal of Digital Asset Management
The National Information Standards Organization (NISO) provides a wealth of information related to both traditional and new technologies in the publications section of their site
Rachel Lovinger’s blog, Meaningful Data
The Rockley Group’s blog
Joe Gollner’s blog, The Content Philosopher
Karen McGrane’s blog
Earley & Associates knowledge center
