Tip:
Highlight text to annotate it
X
[MUSIC PLAYING]
Metadata, or documentation, is critical to
analyzing the data.
The data that I'm talking about are just gigabytes and
gigabytes of raw numbers, ones, two, threes, fours.
If you brought it up on a screen, or printed it out, it
would take up bookshelves worth of
paper of just numbers.
Absolutely meaningless without documentation.
The documentation, therefore, indicates what is meant by all
those ones and zeroes.
Everyone can take eggs, flour, milk, butter, and sugar, but
it takes a recipe to put those together to make a cake, or an
omelette, or whatever.
The raw ingredients are the data.
The recipe would be the metadata.
You can do a lot of things with the data
depending on the metadata.
In my field of the geosciences, we work with a
lot of map data.
And 20 years ago, when you looked at a map, you had not
only the data, which is the points and lines on a map, but
you also had documentation about the data.
And that was all the text around the edges of the map
that told you who made the map, when they made the map,
and some of the processes they used, what was their source
data, how many observations they made.
Now, as maps become more digital, you don't have that
maps surround to give you information about the data.
That's why we use metadata to document what work went into
that map, and who did it, and when it was done.
[MUSIC PLAYING]
In my field, metadata is something
generally external to data.
The data itself is an electronic file, and the
metadata is a separate electronic file, or in some
cases, even a hard copy file.
We're working to get the metadata embedded into the
data so that as you copy the digital files from one place
to another, the metadata always travels with it.
I know of data sets that don't have any documentation, and
once those are copied three or four times to three or four
different people, you lose all lineage back to the original
source material.
That's why it's important to document the data.
If possible, have that documentation, or metadata,
embedded in the data itself.
[MUSIC PLAYING]
We have a checklist for things that need to be included in
that documentation file, a minimum set of things, such
as, who was the principal investigator of this study?
What's the universe of respondents?
Are we interviewing just children, citizens,
immigrants, et cetera?
We make sure that when we receive a data file and
documentation, that it is clean, that it is very clear
what question is being asked and what the range of
responses are.
We have to codify those responses, and that
codification process is the meat of our documentation.
And that's why, in social science, the documentation is
literally called a code book.
As a geographic information specialist, I am usually
working with the data as the data are being developed.
I am working with the geologist who went out in the
field, made observations, and is putting together a map.
So I, along with the geologist, work on compiling
metadata, or documentation, for that map data.
We record when the map was started, who worked on it.
And as you mention a code book, there are many codes
that go into a map to describe the nature of the materials in
the subsurface, the types of information that were
collected, observations that were made.
All of those have codes in the database, and all those codes
need to be documented in the metadata.
[MUSIC PLAYING]
The reason it's important to standardize metadata is so
that it can be used, shared, and reused by anyone.
The problem I have with standards is that they are
always changing.
Standards are good while they are being used, but they
inherently have obsolescence built into them.
In the geospatial data field, the metadata standards have
evolved over time.
They've gone from a standard developed for the United
States, to an international standard, to an add-on to the
US standard to bring it up to the international standard.
It's always changing, but at the core of it, there are half
a dozen fields that transfer from one to the other.
And those are some of the ones that we always try to capture
first. These are the basic elements of a metadata
document that always need to be completed.
And I would say that, in my field, with social science
quantitative data, the pure vanilla, the most basic
elements of documentation, would be the parts that
describe what is being collected.
[MUSIC PLAYING]