I give permission for my work to be published in the SIRLS LIS Learning Showcase.
Carla Shults
Professor Anita Coleman
IRLS501
Library Cataloging and its Organizational Principles:
A Peek at the Past - A Glimpse at the Future
Introduction
There is a driving force behind the creation of libraries and their catalogs. It is called civilization. Will Durant, the noted historian of civilization, says "[Civilization] begins where chaos and insecurity end." Lionel Casson says, "And it is [in …] that we find the earliest examples of that key feature of civilization, writing."
The basic premise here is that as man becomes more civilized, he attempts to bring order to the written word. Library catalogs were formed as organizational tools and have evolved over time into different formats, depending on collection size, the resources available, and the needs of the patrons.
Shults 2
Charles A. Cutter formalized the purpose of the catalog. "The most often quoted statement of the 'objects and means' of library catalogs was made by the renowned Charles Ami Cutter (1904) in his setting forth of cataloging rules in a systematic manner" (Younger).
Charles Cutter developed the Rules for a Dictionary Catalog in 1876, which stated that the purpose of a library catalog is:
"1. To enable a person to find a book of which either
A. the author
B. the title
C. the subject
is known
2. To show what the library has
A. by a given author
B. on a given subject
C. in a given kind of literature [poetry, drama, fiction]
3. To assist in the choice of a work
A. as to its edition (bibliographically)
B. as to its character (literary or topical)" (Buckland).
There are several reasons why cataloging is important. One reason is identification. It is important to know what items are in the collection in order to know if something is missing or needs to be acquired. A second reason is to let the searcher know that an item is available. For example, an art museum needs to know that it has a Monet it acquired three years ago so that it can display the painting when it mounts an Impressionist exhibit.
A third reason for cataloging is to be able to physically locate the item. To continue with the art example, it is important to know the Monet exists, but the museum also needs to know where it is located so it can actually retrieve it.
Finally, there is collocation. It is important for the user to see a listing of like materials, both within the catalog and in the physical collection. While an art museum might not need collocation, since its items tend to be one-of-a-kind, a library does. For example, a patron looking for Gone with the Wind might want to know that the book exists in hardcover, in paperback, and on tape, and that the movie is available on VHS and DVD.
Book Catalogs
Book catalogs were one of the first methods of organizing library materials. They started out as handwritten lists and later became printed lists, eventually arranged in alphabetical order by author. However, a book catalog does not show where a book is physically located. Book catalogs are still in use today in organizations with a limited number of entries.
When libraries became large, the upkeep of the book catalog became cumbersome and expensive. If the catalog had supplements, the user might have to look in several volumes to find what he was looking for. The alternative was to reprint the entire book catalog, which was expensive.
The computer makes book catalogs viable once again. It is simpler to update an online book catalog than a printed one, and it can be updated immediately upon acquisition of a new document. However, if there is a large number of books to scan through, the process becomes tedious. Booksellers who specialize in specific topics often use online book catalogs to display the titles available for sale. Since the number of books available in a specialized topic tends to be small, this can work very well and can be less expensive to maintain than a website with an extensive search capability.
Card Catalogs
The development of the card catalog was a huge benefit to information retrieval. The book or journal information and its physical location were typed on a card and filed in a drawer, alphabetically by author. These cards were surrogate records and contained descriptive information about the publication. Any number of additional cards could be printed under different access points, all referring back to the original author card. This type of retrieval can be awkward because it may mean searching in a number of places before reaching the original card showing the actual location of the book.
Over time and with practice, this type of search gets faster. The advantage of the card catalog over printed book catalogs is the ability to add new documents. The alphabetical order is easy to maintain: each new card is simply inserted in its correct alphabetical place. Online book catalogs have the same flexibility.
Egan says that another advantage is that "the cards are a tactile and visual tool." For some people this is more comfortable, and it is easier for them to grasp the information when there is something in their hands. There is also the issue of spelling. Egan also says, "The computer doesn't do you any favors by demanding absolute accuracy. In some ways, the old card catalog does allow for error. The kids with the cards were much more likely than the kids on the computers to stumble across something by accident." Technology available today can help with this problem. A 'Soundex' search matches on phonetic attributes rather than on exact character strings, and it is available in many database systems.
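To show what a phonetic search does, here is a minimal Python sketch of the classic American Soundex code. It is my own illustration under stated assumptions, not the implementation used by any particular database; real systems differ in small details, such as how H and W are handled. Names that sound alike receive the same four-character code, so a misspelled author name can still lead to the right entries.

```python
# Digit groups of the classic American Soundex; vowels (and y) get no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a name, e.g. 'T425'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()            # the first letter is kept as-is
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":               # h and w do not separate equal digits
            continue
        digit = CODES.get(letter, "")
        if digit and digit != previous:  # skip runs of the same digit
            code += digit
        previous = digit                 # a vowel resets the run
        if len(code) == 4:
            break
    return code.ljust(4, "0")            # pad short codes with zeros


print(soundex("Tolkien"), soundex("Tolkein"))  # T425 T425
```

A catalog search would compare the code of the patron's search term with the codes stored for each heading, rather than comparing the raw spellings.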
Sometimes the patron only wants to see what the library has at that particular location. Perhaps he cannot come back later to pick up an interlibrary loan. Union catalogs show the holdings at all locations, but not necessarily with the local holdings listed first. As Crawford has observed, "'But I want to know what's here,' users say, and they're right."
Consistency in information format is important for quick retrieval. Printed cards came mainly from two sources, the Library of Congress and H.W. Wilson, so the format of the cards was consistent. A patron could go to just about any library and be completely comfortable and familiar with the cataloging system. This is an advantage over online catalogs, which run on a variety of software programs that each display their screens differently.
One of the disadvantages of the card catalog lies in making sure that all of the cards are removed when a book is withdrawn from the library. This can be a time-consuming process, depending on the number of alternate access cards the library uses. "I remember consulting the tracings that were found either at the bottom or on the back of the shelflist cards to find the subject headings and other entries that had been used with the title. We didn't want to leave any 'blind' entries in the catalog. We took special care with cross-references in the catalog so that patrons were never referred to an alternate subject or author heading only to find there were no titles under that heading" (Balas).
Well-designed online catalogs can eliminate the possibility of ‘blind’ secondary references by automatically ensuring that deleting a primary record also removes all of its secondary access records.
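The idea can be sketched in a few lines of Python, assuming a toy in-memory design of my own (the `Catalog` class and its methods are hypothetical, not taken from any real library system). Each primary record stores its own tracings, that is, the headings it is filed under, so deleting the record removes every secondary entry and drops any heading that is left with no titles.

```python
from collections import defaultdict


class Catalog:
    """Toy catalog: primary records plus a heading index of secondary entries."""

    def __init__(self):
        self.records = {}                 # record id -> record data
        self.index = defaultdict(set)     # access heading -> record ids

    def add(self, rec_id, record, headings):
        # Store the tracings (the headings used) on the record itself.
        self.records[rec_id] = {**record, "tracings": list(headings)}
        for heading in headings:
            self.index[heading].add(rec_id)

    def delete(self, rec_id):
        record = self.records.pop(rec_id)
        # Follow the tracings so that no secondary entry is left behind.
        for heading in record["tracings"]:
            self.index[heading].discard(rec_id)
            if not self.index[heading]:
                del self.index[heading]   # no titles left: drop the heading


catalog = Catalog()
catalog.add("b1", {"title": "Gone with the Wind"},
            ["Mitchell, Margaret", "Gone with the Wind", "Historical fiction"])
catalog.delete("b1")
print(dict(catalog.index))  # {}
```

This is the computer doing what Balas describes doing by hand with the shelflist tracings.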
Another disadvantage of the card catalog is the time-consuming and expensive process of filing the cards. "Accuracy in filing was of paramount importance, so no one (and I do mean no one) in any library where I was employed ever filed into the card catalog without having someone check his or her work" (Balas). Essentially, the library pays two people to file each card, and those two people are not available for other work while they are filing. Online catalogs can perform this function far more efficiently, since automated auditing controls can be put in place to highlight potential inaccuracies in the filing process.
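As a rough illustration of such an audit, the Python sketch below checks a sequence of headings and flags any heading that files before the one ahead of it. The filing rule is a deliberate simplification of my own (ignore case, punctuation, and a leading "the", "a", or "an"); real filing rules are considerably more detailed, and the function names are hypothetical.

```python
import re

LEADING_ARTICLES = ("the ", "a ", "an ")   # simplified filing rule (assumption)


def filing_key(heading: str) -> str:
    """Lowercase, strip punctuation, and ignore a leading article."""
    key = re.sub(r"[^\w\s]", "", heading.lower()).strip()
    for article in LEADING_ARTICLES:
        if key.startswith(article):
            return key[len(article):]
    return key


def audit_filing(headings):
    """Return the positions of cards that file before the card ahead of them."""
    return [i for i in range(1, len(headings))
            if filing_key(headings[i]) < filing_key(headings[i - 1])]


drawer = ["Adams, John", "The Bell Jar", "Gone with the Wind", "Cather, Willa"]
print(audit_filing(drawer))  # [3]
```

Instead of a second person rechecking every card, only the flagged positions need human attention.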
Also, if a card is lost, it must be replaced. But how does the librarian know a card has been lost? Unless the individual cards are periodically compared to a master list, the librarian depends on patrons to report it. Comparing individual cards to a master list can be extremely time-consuming, depending on the number of books in the library. Finally, the space required to house the cards cannot be used for more resources, so it becomes an economic liability, and sometimes a physical impossibility.
I know the card catalog is out of vogue and is widely considered not worth maintaining, but it still has some very strong advantages. If a library has a card catalog in its building, it should continue to allow patrons to use it. Over time and with practice, patrons will see the advantages of the computer, become comfortable using it, and turn to it for their searches.
If a library chooses to maintain its card catalog, it would be a straightforward process to produce printed cards from the online catalog representing all primary and secondary entries for the collection. This would allow periodic bulk refreshes of the printed card catalog, conducted with confidence that, for a time at least, the printed catalog exactly matches the online version.
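A minimal Python sketch of that process, assuming each online record carries an author, title, call number, and list of subjects (these field names and the function name are hypothetical): one card is produced per access point, every card shares the same description, and the whole set comes out in a single alphabetical filing sequence.

```python
from dataclasses import dataclass


@dataclass
class Card:
    heading: str  # the access point the card is filed under
    body: str     # description shared by every card for the same item


def cards_from_catalog(records):
    """Produce one card per primary and secondary entry, in filing order."""
    cards = []
    for rec in records:
        body = f"{rec['author']}. {rec['title']}. {rec['call_number']}"
        for heading in [rec["author"], rec["title"], *rec["subjects"]]:
            cards.append(Card(heading, body))
    return sorted(cards, key=lambda card: card.heading.lower())


record = {"author": "Mitchell, Margaret", "title": "Gone with the Wind",
          "call_number": "813 MIT", "subjects": ["Historical fiction"]}
for card in cards_from_catalog([record]):
    print(card.heading, "|", card.body)
```

Because every card is regenerated from the same records, a full refresh cannot leave blind entries behind.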
OPACs
Online Public Access Catalogs (OPACs) are replacing the card catalog. The surrogate records for an OPAC are called metadata; they are created by a cataloger and loaded into the database. These records display information about the item and show where it is physically located, not unlike the card in a card catalog. The difference is that if the library is a member of a union catalog, the item may be physically located at another site but can still be obtained by requesting a transfer.
This is a powerful advantage, as it gives the patron access to much more information than he could reach with the card catalog alone. "The old card catalog has been transformed into an online database that not only lists items in the local library but can retrieve citations from other catalogs or serve as a gateway to full-text articles in remote databases" (Murphy).
A disadvantage of the OPAC is the number of hits a search might produce. "A number of researchers have found that users, faced with many retrieved items, often do not even begin browsing through the screens, and often when they do, they stop after the first screen or two. The problem with this is that in most online catalogs, retrieved items are arranged in 'main entry' alphabetical order; so the best of the items may be one that is many screens into the listed retrievals, and may even be the last one listed!"(
The computer pulls records from different catalog systems after the search request has been made and displays them in the order retrieved. This can result in the same surrogate record being displayed multiple times. "It appeared that two or three base records had been embellished or altered in various, mostly trivial ways. One misspelled the place of publication and added 'maps' to the physical description" (Tennant). The document was the same one, but the computer had no way of knowing that.
Database engines and general-purpose Internet search engines have developed advanced search tools and relevance-matching techniques that could be implemented in the next-generation OPAC. These techniques can reduce, or in some cases eliminate, the display of duplicate records.
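One simple technique of this kind, sketched here in Python as my own illustration rather than any particular OPAC's method, is to build a match key from fields that rarely differ between copies of the same base record (here normalized title, author, and year) and to display each group of matching records once. The field names are assumptions; fields like place of publication are deliberately left out of the key, so a misspelled place or an added 'maps' does not split the group.

```python
import re
from collections import defaultdict


def _normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse spaces."""
    return " ".join(re.sub(r"[^a-z0-9\s]", "", text.lower()).split())


def match_key(record):
    # Fields that rarely differ between copies of the same base record.
    return (_normalize(record["title"]), _normalize(record["author"]), record["year"])


def merge_duplicates(records):
    """Group records that share a match key so each work is displayed once."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())


records = [  # illustrative records, echoing the kind of variation Tennant describes
    {"title": "Gone with the Wind", "author": "Mitchell, Margaret",
     "year": 1936, "place": "New York"},
    {"title": "Gone with the wind.", "author": "Mitchell, Margaret",
     "year": 1936, "place": "New Yrok", "extent": "maps"},
    {"title": "Rebecca", "author": "du Maurier, Daphne", "year": 1938,
     "place": "London"},
]
print(len(merge_duplicates(records)))  # 2
```

A key this simple can also merge records that should stay separate (two different editions from the same year, for example), which is why a human cataloger still has the final word.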
Another disadvantage is the difficulty of maintaining an accurate catalog. "No library catalog has ever been perfect, but with the recently developed capability of loading bibliographic records by the thousands via computer tape, the ease with which a catalog's credibility can be destroyed has taken a quantum leap forward" (Cook).
No database is ever perfect, either, but well-designed auditing techniques that use ‘expert system’ approaches to data verification can greatly improve the quality of the content in an online database. Errors in the source material, such as a tape, can be detected and flagged for examination with a high degree of accuracy. Human catalogers can then determine whether the records represent the same physical item.
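A very small rule-based checker gives the flavor of this approach. The Python sketch below is my own illustration: the rules (a title must be present, the year must fall between 1450 and the current year, an ISBN-13 must pass its check digit) are assumptions chosen for the example. It corrects nothing; it only builds a review list for a human cataloger.

```python
import datetime


def isbn13_is_valid(isbn: str) -> bool:
    """ISBN-13 check: digits weighted 1,3,1,3,... must sum to a multiple of 10."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    return sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits)) % 10 == 0


# Each rule: (message for the cataloger, test that returns True when a record fails).
RULES = [
    ("missing title", lambda r: not r.get("title")),
    ("implausible year",
     lambda r: not 1450 <= r.get("year", 0) <= datetime.date.today().year),
    ("bad ISBN check digit",
     lambda r: "isbn" in r and not isbn13_is_valid(r["isbn"])),
]


def review_queue(records):
    """List (record id, problem) pairs for a human to examine; nothing is changed."""
    return [(r.get("id"), message)
            for r in records
            for message, fails in RULES if fails(r)]


batch = [
    {"id": 1, "title": "A", "year": 1999, "isbn": "978-0-306-40615-7"},
    {"id": 3, "title": "B", "year": 1990, "isbn": "978-0-306-40615-8"},
]
print(review_queue(batch))  # [(3, 'bad ISBN check digit')]
```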
The hardware needed to run an OPAC takes up space that could be used to store more books. "At […] Encyclopaedia in hard copy needs 41,580 cm³. The equivalent cdrom needs 175 cm³. However, 585,000 cm³ of hardware is needed to read a cdrom, and more if you want to print" (Grosvenor). This is the same problem the card catalog has, but it is becoming less of a problem as electronic devices grow smaller every year.
The major difference between the World Book Encyclopaedia and the CD-ROM reader, however, is that the encyclopedia must be physically accessible to the user, while the CD-ROM reader can be located out of reach, or even off-site, in less expensive space.
In spite of the disadvantages of OPACs, there is ongoing research into providing an online catalog that returns more accurate search results, among other things. Later in this paper, I will talk about the Cheshire II project and how it is attempting to transform the online catalog.
The records in the catalogs and the information on the records follow organizational principles. If the records were typed up in no particular order and carried varying types of information, confusion would soon prevail and the catalog would be useless.
Organizational Principles
When an item is acquired by a library, it needs to be described. A book has its access points determined, its physical description specified, a subject heading chosen, and a call number assigned so it can be located on a shelf. An artifact is described based on its provenance and condition, with a description and subject that depend on the cataloger's subjective interpretation. If an electronic resource has been identified, the cataloger may need to evaluate its reliability before determining the subject, description, and additional access points.
The cataloger will want to confirm that the resource is reliable, for example by checking that an email address has been provided and that the author responds to email. The author's credentials and expertise should also be available. These are just a few of the additional steps a cataloger needs to take before deciding whether an electronic resource should be included in the catalog.
With the changes in the cataloging system and the sharing of information among databases, standardization becomes essential. "Standardized practices for creating records, describing changes in a resource, and the specific rules for description are becoming more important as libraries work with catalog records on globally shared databases" (Hawkins). The computer system used by the library must be able to read the records coming in from different sources and display them in the same format in order to be useful to the user.
Standardization began long before computers existed. Codes were needed to retrieve and shelve books in the library, since it wasn't practical to run up and down row after row of shelving to find the book you wanted.
Melvil Dewey developed the Dewey Decimal Classification in the 1870's to identify where a book is located on a shelf. It is a hierarchical system that uses three digits before a decimal point, followed by further digits (often printed in groups of three) as needed. The first three digits represent the broad subject of the book, and each digit after the decimal point makes the subject more specific. The decimal is essentially limitless in the number of levels that could be used. However, because the notation is decimal, each level of the hierarchy can be divided into at most ten subclasses, and the three digits before the point allow only 1,000 classes (000 through 999), which becomes limiting as collections grow. Also, if the number gets too long, locating and reshelving the book can be time-consuming.
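In Dewey notation each added digit narrows the subject: 600 is Technology, 630 Agriculture, 636 Animal husbandry, and 636.7 Dogs. The Python sketch below, my own illustration, lists the broader classes that contain a number and shelves call numbers in Dewey order. The shelf marks after the space ("SMI", "ALD", and so on) are hypothetical author marks.

```python
from decimal import Decimal


def dewey_sort_key(call_number: str):
    """Split '636.7 SMI' into a numeric class and a shelf mark for ordering."""
    number, _, mark = call_number.partition(" ")
    return (Decimal(number), mark)


def hierarchy(number: str):
    """Broader classes containing a Dewey number, one level per digit."""
    whole, _, fraction = number.partition(".")
    levels = [whole[0] + "00", whole[:2] + "0", whole]
    levels += [f"{whole}.{fraction[:i]}" for i in range(1, len(fraction) + 1)]
    return list(dict.fromkeys(levels))   # e.g. '600' has no finer levels


shelf = ["641.5 ROM", "636.8 JON", "636.72 ALD", "636.7 SMI"]
print(sorted(shelf, key=dewey_sort_key))
# ['636.7 SMI', '636.72 ALD', '636.8 JON', '641.5 ROM']
print(hierarchy("636.72"))  # ['600', '630', '636', '636.7', '636.72']
```

Comparing the numbers as decimals is what makes 636.72 file between 636.7 and 636.8; Decimal is used instead of float to avoid rounding surprises.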
Charles Cutter also developed a classification scheme of his own, the Expansive Classification, in the 1890's. It used the letters A-Z to denote subjects and numbers to denote locality, with additional lines and codes added to make each code unique. These codes are also used to locate a book on a shelf. This approach is also limiting, but it was the basis for the Library of Congress Classification.
International Standard Bibliographic Description (ISBD) established standards for the form and content of descriptions of monographic publications. The descriptions were put onto the surrogate records to identify that the publication existed. These standards eventually expanded to include electronic resources. The elements specified are the title and statement of responsibility, edition, type and extent of resource, publication, physical description, series, notes, and standard number and terms of availability. These standards were incorporated into the Anglo-American Cataloging Rules (AACR) in their second edition (AACR2), which continues to undergo revisions to include electronic resources.
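To show how these areas appear on a surrogate record, here is a simplified Python sketch of my own. It joins whichever areas a record has with the ISBD area separator (full stop, space, dash, space, typed here as ". -- "). The record fields are assumptions, place, publisher, and date are assumed to be present, and the notes area and most of the finer ISBD punctuation are left out.

```python
def isbd(record: dict) -> str:
    """Join the areas present in a record with the ISBD area separator '. -- '."""
    title_area = record["title"]
    if record.get("responsibility"):
        title_area += f" / {record['responsibility']}"
    areas = [
        title_area,
        record.get("edition"),
        f"{record['place']} : {record['publisher']}, {record['date']}",
        record.get("extent"),
        f"({record['series']})" if record.get("series") else None,
        f"ISBN {record['isbn']}" if record.get("isbn") else None,
    ]
    # Drop empty areas and avoid doubling a period before the separator.
    return ". -- ".join(area.rstrip(".") for area in areas if area)


print(isbd({"title": "Gone with the Wind", "responsibility": "Margaret Mitchell",
            "place": "New York", "publisher": "Macmillan", "date": "1936"}))
# Gone with the Wind / Margaret Mitchell. -- New York : Macmillan, 1936
```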
AACR2
"Anglo-American Cataloging Rules 2nd Revision (AACR2) is the set of rules used for collecting bibliographic data relating to library materials and for formulating access points (for authors, titles, subjects, related works, etc.)" (Randall). Initially, these standards were designed for textual material, and revisions are being made to the rules to accommodate a broader range of catalogable units. "The Joint Steering Committee for Revision of the AACR is in the process of revising the code to enhance rules for international use and to facilitate the cataloging of Internet and other types of electronic resources" (Hawkins).
AACR2 has an extensive list of rules for which elements are needed on the surrogate record and what each element means. For computer catalogs, those elements are encoded using MAchine-Readable Cataloging (MARC), a separate standard whose numeric tags identify the individual pieces of information. The MARC tags are read by the computer program to display the records correctly on the screen and to compare search terms.
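A simplified Python sketch of how a program can use the tags for both jobs. The tag meanings in the comments are standard MARC 21 bibliographic tags, but real records also carry indicators and subfield codes, which are omitted here; the record itself, the labels, and the function names are my own illustration.

```python
# Display labels for a few MARC 21 bibliographic tags (subfields omitted for brevity).
LABELS = {
    "100": "Author",        # main entry, personal name
    "245": "Title",         # title statement
    "260": "Published",     # publication, distribution, etc.
    "300": "Description",   # physical description
    "650": "Subject",       # subject added entry, topical term
    "700": "Other author",  # added entry, personal name
}

# Which tags each kind of search compares the patron's term against.
SEARCH_TAGS = {"author": {"100", "700"}, "title": {"245"}, "subject": {"650"}}


def display(record):
    """Turn (tag, value) pairs into labelled lines for the screen."""
    return [f"{LABELS[tag]}: {value}" for tag, value in record if tag in LABELS]


def matches(record, index, term):
    """True if the term appears in any field searched by the given index."""
    tags = SEARCH_TAGS[index]
    return any(term.lower() in value.lower() for tag, value in record if tag in tags)


record = [  # illustrative record, not copied from a real catalog
    ("100", "Mitchell, Margaret."),
    ("245", "Gone with the wind /"),
    ("260", "New York : Macmillan, 1936."),
    ("650", "Historical fiction."),
]
print("\n".join(display(record)))
```

An author search for "mitchell" matches this record, while a title search for the same word does not, because only the 100 and 700 fields are consulted for authors.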
The Dublin Core, by contrast, was developed specifically for Internet information. "The Dublin Core metadata element set is a standard for cross-domain information resource description. Here an information resource is defined to be 'anything that has identity'. There are no fundamental restrictions to the types of resources to which Dublin Core metadata can be assigned" (Dublin Core). Dublin Core metadata records are published on the Internet and are typically stored in HyperText Markup Language (HTML).
Work is currently under way on designing Dublin Core templates for authors of documents to fill in. The HTML is created automatically and the surrogate record is added to the database. This process will free up the librarian's time and will make the author's document available to potential readers much more quickly. "One of the major reasons for moving towards author-described resources with metadata is to try and provide more effective indexing services for the public" (Iannella & Waugh).
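A minimal sketch of the "template in, HTML out" step, written in Python under my own assumptions: the filled-in template is a dictionary, and the output follows the common convention (described in RFC 2731) of a `schema.DC` link plus one `<meta name="DC.element">` tag per value. The element names are the 15 official Dublin Core 1.1 names (for example `creator` rather than "Author/Creator"); the function name is hypothetical.

```python
from html import escape

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {"title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights"}


def dc_meta_tags(form: dict) -> str:
    """Turn a filled-in author template into HTML <meta> tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in form.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        if isinstance(values, str):
            values = [values]             # a single value or a list of values
        for value in values:
            lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)


form = {"title": "Library Cataloging and its Organizational Principles",
        "creator": "Shults, Carla",
        "subject": ["Cataloging", "Metadata"]}
print(dc_meta_tags(form))
```

Rejecting unknown field names at this stage keeps an author's typo from turning into metadata that no indexer will recognize.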
There are 15 Dublin Core elements which can be used and 8 extra elements for more detailed description. All of the elements are optional, and any of them may be repeated.
The 15 Dublin Core elements are:
Title
Author/Creator
Subject and keywords
Description
Publisher
Other Contributors
Date
Resource Type
Format
Resource Identifier
Source
Language
Relation
Coverage
Rights Management
Less well-known and more specialized metadata standards identified by Iannella and Waugh are "The Australia New Zealand Land Information Council (ANZLIC) and The Platform for Internet Content Selection (PICS)." Some of these standards can be used in conjunction with each other, but a decision may have to be made as to which standard takes precedence. "There are already many metadata standards and more will undoubtedly be created, which will lead to the situation where a resource will be described by two (or more) sets of metadata attributes. What happens if the two sets have contradictory information?" (Iannella & Waugh).
Using HTML as the representational mechanism for the content that results from the cataloging process has limitations. HTML, unlike XML, does not include a generalized mechanism for associating metadata tags with variable content. In addition, as the discussion of conflicting metadata sets above illustrates, HTML lacks any capability for representing ‘inheritance’ relationships between data items. If all of these cataloging standards proposals used XML in place of HTML, it would be possible to create a universal access method that automatically converts catalog entries to a standard format. Without XML, such an access method is far more difficult to create.
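To illustrate the XML side of this argument, here is a sketch using Python's standard `xml.etree.ElementTree` module. The same kind of Dublin Core fields are written as namespaced XML elements, and any program can read them back by element name without knowing anything else about the document's layout. The `record` wrapper element and the helper names are my own assumptions; only the namespace URI comes from Dublin Core.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)   # serialize elements with the dc: prefix


def to_dc_xml(fields):
    """Write (element, value) pairs as namespaced Dublin Core XML."""
    root = ET.Element("record")   # wrapper element name is an assumption
    for element, value in fields:
        ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


def read_dc(xml_text, element):
    """Pull out every value of one Dublin Core element from a record."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(f"{{{DC}}}{element}")]


xml_text = to_dc_xml([("title", "Library Cataloging"),
                      ("subject", "Cataloging"), ("subject", "Metadata")])
print(read_dc(xml_text, "subject"))  # ['Cataloging', 'Metadata']
```

Because the namespace travels with each element, a record mixing Dublin Core with another metadata set could still be read field by field, which is the kind of universal access method the paragraph above has in mind.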
Control
Libraries use bibliographic, authority, and access control to help maintain order and some degree of quality in their catalogs. Bibliographic control is used to maintain order. "The central mechanism of control in the modern library is the bibliographic catalog. The central function of the catalog and the bibliographic records contained in it is to systematically and predictably describe, control, and provide access to identifiable units of information" (Pitti).
This means not only maintaining the metadata but also keeping the status of the physical item up to date. The catalog should know whether the book has been acquired, whether it is available to view, when it is due back, and so on.
Conventional mechanisms for maintaining this element of the catalog rely on check-in/checkout procedures. They cannot address the issue of a physical item being moved about within the shelves by library users. Very inexpensive radio-frequency identification (RFID) technology is now becoming available that would allow every item in the collection to be located directly, regardless of where it has been physically placed. RFID tags may soon become so inexpensive that it will be economically viable to place one in every book, magazine, or even newspaper in a collection. There is some controversy surrounding the use of RFID, but from what I have read, the patron's privacy is not compromised.
Another type of control is authority control, which maintains the consistency of the names and headings in the catalog. "Archivists and librarians use authority control files to identify real world entities such as people, institutions, corporations, and societies and the name or names by which they are known" (Pitti). For subject control, the library might use the Library of Congress Subject Headings, in which case the Library of Congress would be the subject authority. If an author began writing under one name and then changed to another, a search on either name would bring up works under both. "Authority control thus operates over and above the catalog, bridging bibliographic records by gathering works by and about an author under that author's name, and works about a subject under the name of the subject, and each with references from other forms of the name if such exist and are discovered" (Pitti).
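A small Python sketch of the idea, using Mark Twain, the pen name of Samuel Langhorne Clemens. The authority file, the catalog entries, and the function names are my own illustration, not an extract from any real authority file: every known form of the name maps to one authorized heading, so a search on either form gathers the works entered under both.

```python
# Illustrative authority file: every known form of a name -> the authorized heading.
AUTHORITY = {
    "twain, mark": "Twain, Mark",
    "clemens, samuel langhorne": "Twain, Mark",   # a "see" reference
}


def authorized(name: str) -> str:
    """Map any known variant to its authorized form; unknown names pass through."""
    return AUTHORITY.get(name.strip().lower(), name.strip())


def search_by_author(records, name):
    """Find titles by an author, whichever form of the name was searched or entered."""
    heading = authorized(name)
    return [r["title"] for r in records if authorized(r["author"]) == heading]


records = [  # illustrative catalog entries, one under each form of the name
    {"title": "Adventures of Huckleberry Finn", "author": "Twain, Mark"},
    {"title": "Life on the Mississippi", "author": "Clemens, Samuel Langhorne"},
]
print(search_by_author(records, "Clemens, Samuel Langhorne"))
# ['Adventures of Huckleberry Finn', 'Life on the Mississippi']
```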
It is possible for a library to be its own authority for author names and subject headings. If it is a small library and is not part of a union catalog, it might make sense to avoid the seemingly endless selection of subjects in the Library of Congress Subject Headings. Otherwise, for consistency's sake, all catalogers using the same database should use the same subject headings and the same name authority file. After all, why reinvent the wheel?
"Access control is the process of exchanging data and information in a secure and authoritative manner once authentication has taken place" (Morgan). Access control maintains some order over who can use the library's materials: first the user is authenticated, and then he can access information. Library cards are one way of maintaining access control. Once a person proves that he lives within a particular county, he can get a library card for that county and check out books from that county's libraries only.
Computers demand user names and passwords for access to documents, systems, and the like online. An administrator determines that the user is valid and gives him the codes he needs to access the online systems or documents. "A good authentication system will prevent outsiders from violating confidentiality and data integrity policies" (Steinke).
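The two steps, authentication first and the access rule second, can be sketched in Python with the standard library. Storing salted PBKDF2 password hashes is my own choice for the example, not something the cited sources prescribe; the iteration count, the patron record layout, and the county names are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # work factor; an assumption, tune for the hardware used


def hash_password(password, salt=None):
    """Return (salt, digest); a new random salt is made when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def authenticate(patron, password):
    """Step 1: is this really the card holder?"""
    _, digest = hash_password(password, patron["salt"])
    return hmac.compare_digest(digest, patron["digest"])


def may_borrow(patron, password, library_county):
    """Step 2 (access control): only after authentication, apply the county rule."""
    return authenticate(patron, password) and patron["county"] == library_county


salt, digest = hash_password("s3cret")
patron = {"salt": salt, "digest": digest, "county": "Pima"}  # illustrative patron
print(may_borrow(patron, "s3cret", "Pima"), may_borrow(patron, "s3cret", "Maricopa"))
# True False
```

Keeping only a salted hash means that even someone who copies the patron file does not learn the passwords themselves.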
Classification
"Every system of grouping books is based upon resemblances or likeness" (Merrill). This is known as classifying, another principle of organization. The reason for classifying is to bring like books together. A user looking through books on a shelf on a particular topic does not have far to go to find more books that are related. The retrieval system in an OPAC needs to do the same thing for electronic resources.
According to William Stetson Merrill, the choices for classifying are: size, date, language, binding, literary form, subject, local treatment, and persons for whom written.
Size. "The simplest mode of classifying books is by size. As a general arrangement for a public library, such an arrangement is never used nowadays." Arrangement by size alone would make for a time-consuming search; I can't imagine a scenario where it would be efficient.
Date. "A second and usually a simple mode of arranging books is by date. This, again, is not a usual classification of books." This type of classification could make sense in a very specialized library where all the books are essentially on the same subject.
Foreign Language. "In popular libraries it is not unusual to arrange books in foreign languages in classes by themselves, calling them French books, German books, and the like." I have seen foreign language books grouped together by audience. For example, picture books in Spanish are shelved together in the children's area, while non-fiction Spanish books are shelved together in the non-fiction area.
Binding. "Fine specimens of binding may properly be arranged together in a bibliographic museum or in an exhibition of library treasures."
Literary Form. "Classification by literary form is common enough in every system of classification. Encyclopedias are usually placed with other works of general reference at the beginning of the classification."
Subject. "A classification based upon this feature of a book is indeed