Andrea Lemieux
Professor Coleman
IRLS 501
December 10, 2004

 

Content Links

Introduction
Literature Review
Evaluation
Sample Checklist
Summary Tables

References
Appendix

 

Metadata Quality and Digital Resources:
An Evaluation of Selected Resources from the Digital Library for Earth System Education (DLESE)

 

Evaluating metadata quality may not be as easy as it sounds, since not only the meaning of “quality” but also exactly what is being evaluated must be defined. At first glance, it is easy to assume that quality equates with how consistently and accurately the metadata represents the resource. However, a closer look at the literature reveals that this broad definition of quality is twofold: (1) the consistency and accuracy of the data entered into the surrogate record and (2) how well the record itself supports retrieval and discovery of information (Moen et al., 1997; Rothenberg, 1996). Since examining both of these issues is beyond the scope of this project, particularly given the goal of developing a checklist to evaluate metadata quality (a tool that itself calls for a degree of brevity), the former will be dealt with in more depth. A brief review of the literature and the following evaluation of metadata from selected resources in the Digital Library for Earth System Education (DLESE) will illustrate that the quality of individual data contributes as much to information retrieval and discovery as do the elements chosen to represent the resource.

 


 

Metadata Quality In the Literature

 

Focusing on individual data makes sense in terms of evaluating Dublin Core metadata since all fifteen elements are optional. It is difficult to evaluate a missing element that was, in one way or another, judged unnecessary, and the reason for that decision would be equally difficult to determine. More important, though, is the extent to which the literature reflects the importance of consistency and accuracy. Graham (1990) argues the importance of accuracy by reframing Cutter’s emphasis on known author, title, and subject searching. The accuracy of the first two of course contributes to information retrieval, which misspelled or incorrectly formatted data will not facilitate, while accurate subject application contributes to information discovery. Even in what Graham designates as the lowest-level metadata, all of these elements are present, making their accuracy immensely important for basic use in retrieval and discovery.

Similarly, Barton et al. (2003) readdress the continual concern over accuracy of spelling, contributor, title, subject, and date fields. Although Barton et al. do so in the context of consistency and harvesting records of digital resources, it is important at least to consider their point in terms of consistency within a single digital library. Greenberg et al. (2001) pursue this idea further in their study of author-generated metadata. The criteria they used to analyze records were based on those established in past work by Moen and Rothenberg, among others, who indicate the importance of completeness and correctness, which are also the top two criteria for Bruce (2004). In “Aspects of Quality,” Bruce goes on to name provenance and logical consistency and coherence as criteria as well, all of which are incorporated into the following metadata quality checklist. Rothenberg more specifically identifies consistency with “verification”: ensuring that data is consistent with similar data provided in other surrogate records and that it conforms to policies dictating its format.

There is also the issue of controlled vocabulary, another difficult aspect of DC metadata to evaluate since, again, use of such standards is only recommended, not required. Milstead and Feldman (1999) argue that any schema should have at least some level of homogeneity within its construction: at the least, a standardized format of data entry within individual fields. This concern is reiterated in Attig’s (1998) discussion of the user task of finding information, a concern that contributes to the “usability” factor identified as part of the assessment criteria compiled in Moen’s study. Usability of a surrogate record is the degree to which retrieval and discovery are possible, and it can only be said to decrease if individual elements do not perform the function they are designed to achieve.

 


 

Quality Concerns of DLESE Resources

 

The creation of the following checklist was driven by the concept of usability. The nature of DLESE, and consequently its most obvious strength, is its collocation of educational earth science resources, which contributes greatly to both retrieval and discovery of information in this particular field of study. However, this is only true to the extent to which resource metadata supports these two activities as described in the above literature review, namely through consistency and accuracy. It seemed only appropriate that they comprise much of the criteria for establishing metadata quality.

Unfortunately, a lack of both was evident simply on comparing the metadata provided in the short record with that in the full record. When searching by subject, and possibly by any search method, users are likely to scan the short record to determine whether the resource meets their need, continuing to the long record to obtain further information. Only 40 percent of the selected resources could be said to be consistent, though it should be noted that the data in the “Subject” field in these resources was shortened and so not identical to the full record. For the other 60 percent, the metadata was simply not the same in both records. The discrepancies ranged from entirely different grade levels and resource types to missing information. For the user, this would mean going directly to the resource to find out which record is actually correct or simply going on to the next resource. Inconsistency between short and full records, a mere technical matter, was largely responsible for the poor outcome on the checklist’s fifth question, regarding ambiguous and/or unintelligible data.
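The comparison of short and full records described above could, in principle, be partly automated. The following is only a minimal sketch in Python, under the assumption that each record is available as a simple mapping of element names to values; the function name, the field names, and the sample values are hypothetical and do not reflect how DLESE actually stores or exposes its records.

```python
def inconsistent_fields(short_record, full_record):
    """Return the names of shared elements whose values differ between records."""
    def normalize(value):
        # Ignore differences in letter case and in extra whitespace.
        return " ".join(str(value).split()).lower()

    # Only elements repeated in both records can be compared.
    shared = short_record.keys() & full_record.keys()
    return sorted(
        name for name in shared
        if normalize(short_record[name]) != normalize(full_record[name])
    )


# Hypothetical records, for illustration only.
short = {
    "Title": "Example Resource",
    "Grade level": "High school",
    "Resource type": "Lab activity",
}
full = {
    "Title": "Example Resource",
    "Grade level": "College",
    "Resource type": "Lab activity",
    "Subject": "Hydrology",
}

print(inconsistent_fields(short, full))  # ['Grade level']
```

This sketch uses strict equality after normalization, so a “Subject” value that was merely shortened in the short record would be reported as a difference; an evaluator would still need to judge such cases by hand.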

Incomplete data, question three, also contributed to ambiguity. The “Resource contact/ creator/ publisher” element was usually incomplete, which was questionable since many resources had distinctly different contacts and creators, often clearly listed on the web page reached from the URL provided on the DLESE record. Provenance also suffered from an incomplete “Resource contact” element, again unnecessarily, since the information was easily located without having to look far into the resource itself. Moreover, the format was never consistent within the “Resource contact” field either. Sometimes the contact was clearly labeled; other times the contact email link was provided underneath the publisher, though from looking at the resource it was apparent the two were not the same. The user, however, would assume that data located next to one another are related unless otherwise designated. This could also lead the user to go on to the resource, perhaps because a particular organization or university appeared to have created the information, only to find that this is not the case. Such ambiguous metadata could cause various searching errors.

Another major concern that developed from the general criteria was a lack of consistency between the metadata of different resources, mainly in formatting. As mentioned above, there is the concern over formatting of the “Resource contact” element: some resources designated contact, creator, and publisher, while others did not. On question two of the checklist, all resources fared poorly. It was difficult to discern a formatting pattern for any of the elements, in particular for “Resource type” and “Subject.” From DLESE’s homepage, one can search by either, and both are separated into subcategories. While some resources designated the main resource type or subject along with the more specific subcategory, many did not. Although this would probably not be additionally helpful for subject, in many instances it would give users another point of discernment when searching by resource type. For instance, lecture is actually listed under the “Audio” resource type, but without knowing the main resource type a user would have to return to the homepage or look at the resource itself to learn that it was not in fact a text lecture.
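One way to make such a formatting pattern visible is to tally how each record writes a given element. The sketch below is illustrative only: it assumes, hypothetically, that a fully qualified “Resource type” value is written as “Main category: Subcategory”, and the sample values are invented rather than taken from DLESE records.

```python
from collections import Counter


def type_format(resource_type):
    """Classify a 'Resource type' value by whether it names the main category."""
    # Assumption: a qualified value is written "Main category: Subcategory".
    return "qualified" if ":" in resource_type else "unqualified"


# Hypothetical values, for illustration only.
values = ["Audio: Lecture", "Lecture", "Text: Report", "Lesson plan"]

counts = Counter(type_format(value) for value in values)
print(counts["qualified"], counts["unqualified"])  # 2 2
```

A collection whose records follow one entry policy would fall almost entirely into a single group; an even split like the one above is the kind of inconsistency question two is meant to catch.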

Whether to include criteria for subject headings in the checklist was given much thought. It was decided against mainly because Dublin Core was developed intentionally to facilitate author-created metadata for digital resources, which of course does not lend itself easily to Library of Congress Subject Headings. With this in mind, criteria were established that distinguished quality metadata by its conformance to the standards established by the particular digital library. The main obstacle in evaluating DLESE resources was determining what those standards and policies were, something that should be relatively obvious from a close examination of the metadata. That is why criteria for consistency were a large part of the checklist, regarding both consistency within an individual resource and consistency between different resources. Such consistency would ensure at least a degree of usability for a user familiar with a particular digital library, something that is almost impossible, or at the least frustrating, if resources vary as much as those in DLESE.

There were of course also many criteria with good results. There were no apparent spelling mistakes in either the short or long records. All but one URL linked directly to the web page described. Curiously, the one resource that did not was extremely difficult to locate at the URL listed. Also, granularity, question eight, was not an issue. Descriptions matched the resources being described very well; the resources were specific to a topic or activity and were not clearinghouses for large, general subjects. Of course, the summative “Subject” element does not do much to help users discover this unless they search by more than one criterion, such as subject, resource type, and perhaps grade level, provided these elements contain accurate information.

 


 

Sample Checklist

 

Metadata Quality Checklist for Educational Digital Resources

Evaluator's Name:
Date:

Resource Name:
URL:

General Criteria

1. Are repeated elements in both the short and long record consistent? Yes No
Note:
2. Has the data for each element been entered consistently with other records? (e.g. creator, date, etc.) Yes No
Note:
3. Is the data in each element as complete as possible? Yes No
Note:
4. Are there any obvious spelling errors? Yes No
Note:
5. Are there significant ambiguities or unintelligible data? (i.e. regarding description, keywords, etc.) Yes No
Note:

Specific Criteria

6. Does the record title accurately reflect the resource title? (Complete resource derived title is preferred over partial or author created) Yes No
Note:
7. Does the URL correspond directly to the resource? Yes No
If no, is the resource easily located at the URL listed? Yes No
Note:
8. Does the resource’s description accurately reflect the resource’s content? Yes No
Note:
9. Is the description brief enough to scan but long enough to adequately inform the user of its content? Yes No
Note:
10. Does the record indicate the intended audience of the resource/type of resource? (e.g. lesson plan for teachers, interactive resource for students, etc.) Yes No
Note:
11. Is the grade level indicated an accurate description of the resource? Yes No
Note:
12. Is a date indicated on the record? Yes No
Note:
13. Do keywords provide thorough coverage of terms not identified elsewhere in the record? Yes No
Are keywords unnecessarily repeated? (Keywords should identify terms not already used in other elements but that are found in the resource.) Yes No
Note:
14. Is subject indexing in-depth? (Depth indexing is preferred and will aid information retrieval as well as document retrieval) Yes No
Note:
15. Is the resource’s provenance easily determined? (May be apparent from description or publisher or creator element, etc.) Yes No
Note:

 


 

Summary Table--General Criteria

Columns are numbered by checklist question (Y = yes, N = no):
1. Are repeated elements in both the short and long record consistent?
2. Has the data for each element been entered consistently with other records?
3. Is the data in each element as complete as possible?
4. Are there any obvious spelling errors?
5. Are there significant ambiguities or unintelligible data?

Resource                            1    2    3    4    5
Down the Drain                      N    N    Y    N    Y
Earth's Water                       N    N    N    N    Y
Ground Water                        N    N    N    N    Y
The Nature of Water                 N    N    N    N    Y
Project WET                         N    N    N    N    Y
Understanding the Clean Water Act   Y    N    N    N    N
Virtual River                       N    N    N    N    Y
Water Management                    Y    N    N    N    N
Water Science for Schools           Y    N    N    N    N
Watershed Game                      Y    N    N    N    Y
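The percentages cited in the evaluation can be checked directly against the general criteria table. The short Python sketch below is one possible way to tally the results; the answers are transcribed from the table above, while the variable and function names are simply illustrative choices.

```python
# Answers to general criteria questions 1-5, transcribed from the summary table.
general = {
    "Down the Drain": "NNYNY",
    "Earth's Water": "NNNNY",
    "Ground Water": "NNNNY",
    "The Nature of Water": "NNNNY",
    "Project WET": "NNNNY",
    "Understanding the Clean Water Act": "YNNNN",
    "Virtual River": "NNNNY",
    "Water Management": "YNNNN",
    "Water Science for Schools": "YNNNN",
    "Watershed Game": "YNNNY",
}


def percent_yes(answers, question):
    """Percentage of resources answering Y to the given 1-based question."""
    column = [row[question - 1] for row in answers.values()]
    return 100 * column.count("Y") / len(column)


for q in range(1, 6):
    print(f"Question {q}: {percent_yes(general, q):.0f}% yes")
```

Question 1 comes out at 40 percent, matching the share of resources with consistent short and long records, and question 5 at 70 percent, the share with significant ambiguities or unintelligible data.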

 

Summary Table--Specific Criteria

Columns are numbered by checklist question (Y = yes, N = no, NA = not applicable):
6. Does the record title accurately reflect the resource title?
7. Does the URL correspond directly to the resource?
7a. If no, is the resource easily located at the URL listed?
8. Does the resource’s description accurately reflect the resource’s content?
9. Is the description brief enough to scan but long enough to adequately inform the user of its content?
10. Does the record indicate the intended audience of the resource/type of resource?
11. Is the grade level indicated an accurate description of the resource?
12. Is a date indicated on the record?
13. Do keywords provide thorough coverage of terms not identified elsewhere in the record?
13a. Are keywords unnecessarily repeated?
14. Is subject indexing in-depth?
15. Is the resource’s provenance easily determined?

Resource                            6    7    7a   8    9    10   11   12   13   13a  14   15
Down the Drain                      Y    Y    NA   Y    Y    Y    Y    Y    NA   NA   N    Y
Earth's Water                       Y    Y    NA   Y    Y    Y    Y    N    NA   NA   N    Y
Ground Water                        Y    Y    NA   N    N    Y    N    N    NA   NA   N    N
The Nature of Water                 Y    Y    NA   Y    Y    Y    Y    N    NA   NA   Y    Y
Project WET                         Y    Y    NA   N    N    N    Y    N    NA   NA   N    Y
Understanding the Clean Water Act   Y    N    N    Y    Y    Y    Y    N    Y    N    N    Y
Virtual River                       Y    Y    NA   Y    Y    Y    N    N    Y    Y    N    Y
Water Management                    Y    Y    NA   Y    Y    Y    Y    N    NA   NA   N    Y
Water Science for Schools           N    Y    NA   N    N    Y    Y    N    Y    Y    Y    N
Watershed Game                      Y    Y    NA   Y    Y    Y    Y    Y    NA   NA   N    N

 


 

References

 

Attig, J. (1998). Dublin Core Metadata and the Cataloging Rules, Taskforce On Cataloging and Metadata Rules: Final Report. Retrieved November 26, 2004 from http://archive.ala.org/alcts/organization/ccs/ccda/tf-tei3.html.

Barton, J., Currier, S., & Hey, J. M. N. (2003). Building Quality Assurance Into Metadata Creation: An Analysis Based On Learning Objects and e-Prints Communities of Practice. Paper from 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice--Metadata Research and Applications, Seattle, Washington: Bell Harbor International Conference Center. Retrieved November 26, 2004 from http://www.siderean.com/dc2003/201_paper60.pdf.

Bruce, T. (2004). The Continuum of Quality: Defining, Expressing, and Exploiting Metadata. Retrieved November 28, 2004 from http://metadata-wg.mannlib.cornell.edu/forum/index.php?date=2004-05-21.

Graham, P. S. (1990). Quality in Cataloging: Making Distinctions. Journal of Academic Librarianship, 16, 213-218.

Greenberg, J., Pattuelli, M. C., Parsia, B., & Robertson, W. D. (2001). Author-generated Dublin Core Metadata for Web Resources: A Baseline Study In an Organization. [Electronic Version] Journal of Digital Information, 2 (2). Retrieved November 28, 2004 from http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Greenberg/.

Milstead, J., & Feldman, S. (1999). Metadata: Cataloging by Any Other Name . . . . Retrieved November 26, 2004 from http://www.cbuc.es/5digital/1.pdf.

Moen, W. E., Stewart, E. L., & McClure, C. R. (1997). The Role of Content Analysis In Evaluating Metadata for the US Government Information Locator Service (GILS): Results from an Exploratory Study. Retrieved November 28, 2004 from http://www.unt.edu/wmoen/publications/GILSMDContentAnalysis.htm.

Rothenberg, J. (1996). Metadata to Support Data Quality and Longevity. Paper from 1st IEEE Metadata Conference, Silver Spring, Maryland. Retrieved November 28, 2004 from http://www.computer.org/conferences/meta96/rothenberg_Paper/ieee.data-quality.html.

 


 

Appendix 1: Digital Library for Earth System Education (DLESE) Selected Resources

 

Down the Drain: How Much Water Do You Use? Retrieved November 16, 2004 from http://www.k12science.org/curriculum/drainproj/

Earth's Water: Ground Water. Retrieved November 16, 2004 from http://ga.water.usgs.gov/edu/mearthgw.html

Ground Water. Retrieved November 16, 2004 from http://capp.water.usgs.gov/GIP/gw_gip/index.html

The Nature of Water. Retrieved November 16, 2004 from http://www.ec.gc.ca/water/en/nature/e_nature.htm

Project WET (Water Education for Teachers). Retrieved November 16, 2004 from http://www.projectwet.org/

Understanding the Clean Water Act. Retrieved November 16, 2004 from http://www.rivernetwork.org/

Virtual River. Retrieved November 16, 2004 from http://vcourseware5.calstatela.edu/VirtualRiver/

Water Management: towards 2030. Retrieved November 16, 2004 from http://www.fao.org/ag/magazine/0303sp1.htm

Water Science for Schools: Water Use In the United States. Retrieved November 16, 2004 from http://ga.water.usgs.gov/edu/wateruse.html

The Watershed Game. Retrieved November 16, 2004 from http://www.bellmuseum.org/distancelearning/watershed/watershed2.html

 
