Danielle Carlock
The catalog and the information organizing principles surrounding it:
present trends and challenges
The
catalog, as a way to provide access to the holdings of a library, has
been an
integral part of the profession of librarianship from its earliest
days.
Although the format of the catalog has changed from print-based, card
format to
computer format (OPACs), and continues to evolve, its functions remain
the
same. However, the sweeping changes in the format of catalogs, and in
the information
organizing principles surrounding them, are changing the tasks of the
librarian.
Most of these changes are driven by the technological innovations the
internet
has wrought in the areas of electronic communication and publishing.
This paper will explore not only the changing nature of the catalog
itself, but the information organizing principles that go into creating
catalogs. The new internet technologies have also made possible new
forms of the library. Digital libraries, online repositories of
information, now exist and continue to expand. The
challenges of these newly emerging entities will be discussed.
When
attempting to describe the functions of the catalog, most librarians
still call
on Cutter’s enduring work. Cutter’s “objects of the catalog” (1904)
include: to
ensure the user can access any work of which he/she knows the author,
title or
subject, to show the library’s holdings by author, by subject, or by
type of
literature, and to assist the user in the choice of a book by edition
or
character. Today’s catalogs, the OPACs, are designed around Cutter’s
principles. According to Taylor (1999), the catalog also serves as an
inventory
of the library’s collection, which librarians access to determine their
own
holdings when completing collection development tasks.
From
surveying the library literature, it appears most librarians agree that
the
purposes of the catalog have remained the same since Cutter’s time; however,
the specifics of how to meet users’ information needs through it are
subject to controversy (Osborn, 1941; Theimer, 2002; Borgman, 1996;
Cochrane, 2000; Buckland, 1997).
One
controversy surrounding the development of catalogs is which “theory of
cataloging” to pursue. According to Osborn (1941), there are four theories:
legalism, pragmatism, bibliographic cataloging, and perfectionism.
Legalists believe that rules must be devised for every possible
contingency; however, Osborn argues that this leads to a never-ending process
of revising rules, thus diminishing productivity and increasing costs.
Perfectionists would like to catalog a work so that the record remains good for
all time; however, Osborn argues that this is never the case, since history shows that
revisions always occur. Bibliographic cataloging, in which a catalog is a type
of bibliography, leads to too much detail in the catalog descriptions, says
Osborn. Pragmatism, in which each library conducts cataloging based on its
own needs, is the theory Osborn prefers. He proposes three levels
of cataloging: standard, simplified, and detailed. Each library should choose
which one to use based on its needs. This resembles the concept of
exhaustivity and the controversy over whether to conduct summarization
or depth indexing. Although Osborn was writing more than sixty years ago, his
ideas seem relevant today. It can be argued that the theory of cataloging
followed by a library will determine in many respects the nature of its
catalog.
In
order to create an inventory or catalog one must carry out several
processes.
The item or unit to be cataloged (monograph, journal, video, etc.) must
first be
described through the process of descriptive cataloging, following
certain
rules and procedures. The item must also be analyzed as to its
subject(s)
through determining “aboutness.” Depending on the classification scheme
to be
used (DDC, LCC, etc.), the cataloger assigns call numbers to the item.
Next,
the data generated from these processes is turned into a surrogate
record,
which serves as a representation of the item itself. Finally, the
surrogate
record must be encoded in a format in which it will be displayed
(MARC,
etc.).
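The sequence just described (describe, analyze, classify, then encode) can be pictured with a small sketch. It is only an illustration under stated assumptions: the SurrogateRecord class, its field names, the sample subject heading, the call number, and the "label: value" encoding are all invented here and do not correspond to MARC or to any real cataloging system.

```python
from dataclasses import dataclass, field


@dataclass
class SurrogateRecord:
    """Stand-in for a cataloged item; the field names are illustrative only."""
    title: str
    creator: str
    subject_headings: list = field(default_factory=list)
    call_number: str = ""


def encode(record):
    """Serialize the record as 'label: value' lines (a toy encoding, not MARC)."""
    lines = [
        f"title: {record.title}",
        f"creator: {record.creator}",
    ]
    lines += [f"subject: {heading}" for heading in record.subject_headings]
    lines.append(f"call_number: {record.call_number}")
    return "\n".join(lines)


record = SurrogateRecord(
    title="The organization of information",
    creator="Taylor, Arlene",
    subject_headings=["Information organization"],  # sample heading, assumed
    call_number="025 TAY",  # made-up call number for illustration
)
encoded = encode(record)
print(encoded)
```

The point of the sketch is the separation of concerns: the record holds the descriptive and subject data, while the encoding step only decides how that data is laid out for display or exchange.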
Most
if not all of the processes described above are in a state of flux.
This is due
to many factors: the sheer increase in information and therefore items
to be
cataloged, an increase in the kinds of formats available, and ever
changing
technology. The changes and controversies surrounding descriptive and
subject
cataloging, encoding and metadata standards, and authority control will
be
discussed next.
Before
an item can be cataloged, it must first be decided what the item
actually is.
This seems an easy and obvious task, but on close examination it is not.
Should
a two-volume work be cataloged as two separate units or as one? Should
a series
be cataloged as one unit or as separate ones? Electronic resources
are even
more difficult to work with in this regard. Should a website be
classified as
one unit, or should each individual page be cataloged separately? (Taylor, 1999)
Once
the unit of analysis is determined, the process of describing it can
commence.
The AACR2r and the ISBD standards are the most widely accepted rules
for
describing items to be cataloged (Taylor, 1999). However, new standards
are
being developed to deal with the new formats that have arisen, primarily
because of the
surge in electronic materials and the World Wide Web.
The
Dublin Core is one of the most prominent standards that have arisen to
describe
WWW documents, especially in Europe (El Sherbini, 2001). However,
there are many
other standards under development, leading to what Milstead and Feldman
(1999)
call the “metadata dilemma.” Most of the new standards have arisen
because of
the need to describe specialized information packages. For example, the
FGDC
standard was developed for geospatial data, the VRA standard for visual
resources,
GILS for government publications, information, and services, and MPEG-7
for multimedia files (Taylor, 1999; El Sherbini, 2001; Milstead
&
Feldman, 1999). Some authors have expressed concern that the
proliferation of
metadata standards may be creating chaos and confusion. El Sherbini
(2001) believes it possible that multiple records will be created for
the same
resource under different metadata standards, which would confuse users.
In
addition, the need to create extensions to the standards would further
confound crosswalks. Milstead and Feldman (1999), while acknowledging
the need
to create metadata standards that are customized for specialized
resources,
call for a metadata registry to keep track of all current proposals. It
is
unclear at this time whether multiple standards will improve access or
lead to chaos and a lack of
interoperability.
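The concern that extensions confound crosswalks can be made concrete with a minimal sketch. The mapping table below is an assumed, illustrative subset of a Dublin Core to MARC crosswalk, not an official one, and the "x-local-collection" element is a hypothetical local extension invented for this example.

```python
# Illustrative subset of a Dublin Core -> MARC crosswalk (assumed pairings).
DC_TO_MARC = {
    "title": ("245", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def crosswalk(dc_record):
    """Return (marc_fields, unmapped) for a flat Dublin Core dictionary."""
    marc_fields = []
    unmapped = []
    for element, value in dc_record.items():
        target = DC_TO_MARC.get(element)
        if target is None:
            # Elements the table does not know about fall through the crosswalk.
            unmapped.append(element)
        else:
            tag, code = target
            marc_fields.append((tag, code, value))
    return marc_fields, unmapped


dc_record = {
    "title": "Digital libraries and their challenges",
    "date": "2000",
    "x-local-collection": "Special projects",  # hypothetical local extension
}
fields, unmapped = crosswalk(dc_record)
print(fields)
print(unmapped)
```

Every extension a community adds to a standard has to be added to every crosswalk table that touches that standard, or its content is silently lost in conversion, which is one way of reading the interoperability worry raised above.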
Metadata
standards that describe the content of resources, as discussed above,
are not
the only standards in flux. Encoding standards, those which create the
“containers” in which content is placed, are also undergoing
considerable
change (Taylor, 1999). The MARC format, which is perhaps the first
encoding
standard, has reigned supreme for decades. It has served the needs of
the
library well by acting as a standard format in which to create surrogate
records
for all kinds of resources. However, some argue that the time for MARC’s
reign
has come to an end, while others continue to argue its merits.
According to
Mayes (2003), MARC is limited because it is not interoperable with HTML
or SGML. Instead, she proposes a switch to an XML schema developed by
OCLC that
preserves many of the MARC elements. Kokabi (1996), while supporting
the MARC
format, acknowledges some weaknesses of the standard, including the
fact that some OPACs cannot handle the
MARC format and that MARC inherited the flaws of the card catalog.
However, Kokabi
believes that MARC will not go away anytime soon, since it has proven
to be
stable and effective, and allows copy cataloging. In addition, many
developing
countries are just beginning to adopt MARC.
Pitti
(1995b) favors a changeover from MARC to SGML because it is used by a
wider
community. While only librarians use MARC, SGML is a standard for
markup
languages that is used by people in many communities including
government and
industry. Tennant (2002a) suggests that MARC exit strategies be adopted
immediately. The reasons for his view are many: MARC is inflexible,
difficult
to read, and used only by libraries. As a result, the choice of software
that
libraries can use is limited to vendors that make MARC-compatible
programs
(Tennant, 2002b). Given these serious criticisms of MARC and the
increased
interest in markup languages, it remains to be seen how
many
library systems will convert to new formats.
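To give a sense of what moving tagged records into a markup language involves, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element names (record, datafield, subfield) are assumptions chosen to echo the tag and subfield layout of a MARC record; the sketch does not implement the OCLC schema mentioned by Mayes or any official XML schema for MARC.

```python
import xml.etree.ElementTree as ET


def record_to_xml(fields):
    """Build an XML element from (tag, subfield_code, value) triples."""
    record = ET.Element("record")
    for tag, code, value in fields:
        datafield = ET.SubElement(record, "datafield", tag=tag)
        subfield = ET.SubElement(datafield, "subfield", code=code)
        subfield.text = value
    return record


# Sample triples for illustration; tags follow the familiar author/title pattern.
fields = [
    ("100", "a", "Kokabi, Mortaza"),
    ("245", "a", "Is the future of MARC assured?"),
]
xml_text = ET.tostring(record_to_xml(fields), encoding="unicode")
print(xml_text)
```

Because the result is ordinary XML, any general-purpose XML tool can read it, which is the practical advantage claimed by those who favor markup languages over a library-only format.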
After
an item is described using a metadata standard and an encoding
standard, it
must next undergo subject cataloging. During subject cataloging, the
aboutness of
an item is decided and subject headings and classification numbers are
assigned (Taylor, 1999). Determining aboutness continues to be a tricky
task.
Merrill (1912), writing almost a hundred years ago, addressed concerns
that are
still with us today. For example, he considered how a
book about
two or more subjects should be treated, as well as other matters in
subject
analysis.
While
determining aboutness the cataloger must assign subject headings based
on the
subject heading list or thesaurus (for example LCSH, Sears, MeSH) in
use by the
library. Then the work is assigned a place in the library’s
classification
scheme based on its assigned subject(s). Like many other aspects of
cataloging,
this process has also been fraught with controversy. For example, how
deep in
the classification scheme should the cataloger go? Should he/she
classify the
work based only on where it falls within the main classes of the
scheme? Or
should a deeper level of classification be pursued, perhaps down to the
finest
subdivisions that are available (Taylor, 1999)? One thing to consider
in
choosing between broad and close classification is the degree of
collocation
desired. If the library is very large and only broad classification
is
carried out, collocation will be minimal; works will be classed near
other
works with which they share only a broad relationship. However, in a
small
library broad classification may be sufficient to accomplish
collocation
(Taylor, 1999).
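The trade-off between broad and close classification can be illustrated with a short sketch that groups Dewey-style numbers at two depths. The numbers are made up for the example, and treating "the first few digits" as the depth of classification is a simplifying assumption rather than a description of how any scheme defines its levels.

```python
from collections import defaultdict


def shelf_groups(call_numbers, digits):
    """Group Dewey-style numbers by their first `digits` digits, ignoring the decimal point."""
    groups = defaultdict(list)
    for number in call_numbers:
        key = number.replace(".", "")[:digits]
        groups[key].append(number)
    return dict(groups)


# Made-up Dewey-style numbers for illustration.
numbers = ["025.3", "025.4", "025.04", "027.7"]
broad = shelf_groups(numbers, 3)  # keep only the three-digit base number
close = shelf_groups(numbers, 5)  # keep the finer subdivisions as well
print(broad)
print(close)
```

At the broad level three of the four sample items share one group, so in a large collection that group would hold many loosely related works; at the close level each item falls into its own subdivision.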
Another
process that goes into producing a library catalog is authority
control.
Authority control is a process in which authorized forms of names and
titles
and authorized terms for subjects are chosen and maintained for use in
resource
descriptions (Taylor, 1999). This allows for standardization; in theory,
a
library user should be able to type an unauthorized version of an
author name
and be directed to the authorized version, and hence all the materials
written
by said author. The same should be true for subject or title
searches.
Some
authors affirm the merits of authority control and the need to continue it.
Others take the opposite view, declaring that authority control does not do
what it claims to do. Ayres (2001), while supporting the idea of authority
control, claims that libraries are not living up to it. Instead, he argues,
users are missing materials held by the library when searching the OPAC
because
of a lack of consistency in authority control. Using the Library of
Congress
OPAC to support this claim, he shows that some searches do not yield
all the
information contained in the library. Searching Dostoevski (an
unauthorized
variant of Dostoevsky) yields only 5 out of the 329 holdings in the LC.
Other
unauthorized versions of the name also yield incomplete results. He
also
supplies seven examples of subject searches which yield results
inconsistent
with proper authority control. Ayres' main concern is that libraries
are touting
authority control as a feature that sets them apart from Internet
search
engines, yet consistent authority control is not being delivered.
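The redirect that authority control is supposed to provide can be sketched as a simple lookup from variant forms to the authorized heading. The authority file below is hypothetical and holds a single entry, taken from the Dostoevski example above; a real authority file is far larger and carries much more than a list of variants.

```python
# Hypothetical authority file: authorized heading -> unauthorized variants.
AUTHORITY_FILE = {
    "Dostoevsky": ["Dostoevski"],
}

# Invert the file so that a variant, or the heading itself, resolves to the heading.
SEE_REFERENCES = {}
for heading, variants in AUTHORITY_FILE.items():
    SEE_REFERENCES[heading.lower()] = heading
    for variant in variants:
        SEE_REFERENCES[variant.lower()] = heading


def authorized_form(query):
    """Return the authorized heading for a query, or None if it is not in the file."""
    return SEE_REFERENCES.get(query.strip().lower())


print(authorized_form("Dostoevski"))
```

In this sketch a variant that was never recorded simply returns nothing, which is one plausible way that searches under unauthorized forms come to retrieve only part of a library's holdings.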
Some
other authority control difficulties were discovered by this author
when
completing practice exercises in OCLC. When searching for the author
Juan
Diego, no results came up under J. Diego. The same was true when
searching for
Ruth Underhill as R. Underhill. However, searching under just Diego or
Underhill
did yield the correct results. Vocabulary control for titles was also
problematic. When searching for the title Navajo Peyote Ceremonial
Songs,
I found that a search under Ceremonial Songs Dine did not yield the
title. Since the Navajo call themselves the Dine, the two names should
be cross referenced. A Dine person searching OCLC would probably search
under Dine,
since that is the culturally appropriate term. However, the search will
not
yield results that are classed only under Navajo.
Jeng
(2002) argues that authority control is a costly, unnecessary measure.
Users
do not mind sorting through search results that include many
unrelated
materials, Jeng claims. In addition, says Jeng, the more controlled the
vocabulary is, the greater the knowledge base the user must have about
the
system in order to complete a successful search. In contrast, argues
Jeng, indexing
and abstracting services are serving the user well with little
authority
control, but very flexible interfaces that give users a lot of control
in
designing their searches.
Jackson
(2003) counters these arguments against authority control. He says it is
needed
more than ever as the amount of information continues to grow. He
claims
automated authority control can be done efficiently. As one example, he
cites
the conversion of 600 subject headings beginning with Afro-American to
African
American completed overnight in his library.
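A batch change of this kind is easy to picture in code. The sketch below is an assumption about how such a conversion could be done in principle; it is not the procedure used at Jackson's library, and the sample headings are invented for illustration.

```python
def convert_headings(headings, old_prefix, new_prefix):
    """Replace a leading term in each heading; return the new list and the change count."""
    converted = []
    changed = 0
    for heading in headings:
        if heading.startswith(old_prefix):
            heading = new_prefix + heading[len(old_prefix):]
            changed += 1
        converted.append(heading)
    return converted, changed


# Sample headings made up for illustration.
headings = ["Afro-American authors", "Afro-Americans--History", "American poetry"]
updated, count = convert_headings(headings, "Afro-American", "African American")
print(updated, count)
```

Matching only at the start of the heading mirrors the case described above, in which the headings to be changed all began with the old term; headings that merely contain similar words elsewhere are left untouched.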
Another authority control
controversy concerns whether the process should be outsourced. Jackson
(2003) and others claim that vendor supplied authority control can be
done
successfully. Aschmann (2003) discusses the outcomes of vendor supplied
authority control at Virginia Tech. Authority control was indeed
improved;
however, the expected decrease in staff time did not follow. Instead, a
permanent
team had to be formed to carry out quality assurance.
It is
not clear how the controversies surrounding authority control will
play out.
It seems difficult to imagine that libraries will do away with the idea
of
authority control completely, as suggested by Jeng. However, universal
authority control has certainly not been accomplished, as illustrated
by Ayres.
Once
an item is described using a metadata standard, encoded with an
encoding
standard, analyzed as to subject, classified by subject headings,
assigned a
call number, and had authority work carried out on its access points,
it needs
to become part of the library’s inventory, that is, part of its catalog.
The
card catalog was the predominant form of the catalog for the greater
part of
the 20th century. Not until the 1960s did a computer-generated
catalog come into play, and then only in a limited fashion. The computer
generated
a microform catalog; however, this version of the catalog was
unpopular
because of the difficulty of working with microform. OPACs, or online
public
access catalogs, first introduced in the late 1970s, have now become the
dominant form of catalog. At first they were readable only from computers
at
the library, through CD-ROM. Then, as the internet grew, OPACs became
accessible
through gophers and telnet, and finally through a WWW homepage, allowing
patrons
to access the catalog from their homes (Taylor, 1999).
Today’s
OPACs, despite their convenience and greater
accessibility, have their critics. According to Theimer (2002), OPACs
are not
designed for 21st century users, who are versed in
internet
search engines and usually expect the OPAC to respond in the same way.
Many
internet search engines forgive misspellings, while most OPACs do not.
Borgman
(1996) posits that the design of OPACs does not
match the information seeking behavior of users at all. The catalog is
based
upon Cutter’s principles, yet most library users do not come to the
library
actually knowing one of the three access points (even Jackson writing
in 1958
noted that card catalog users rarely came with complete bibliographic
information).
In addition, the user must know library terminology to some degree in
order to
successfully execute a search. This includes, for example, some
knowledge of
subject headings. In today's culturally diverse world, word choice is
increasingly varied.
Library of Congress subject headings (and other controlled vocabularies)
may not adequately take this into account.
Cochrane (2000) suggests that OPAC keyword searching
should be improved. As it stands now, the burden of the search is
placed
completely on the user, with no assistance provided in finding synonyms
or
related terms. Keyword searching can be integrated with controlled
vocabulary
searching to overcome this difficulty.
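One way to picture the integration of keyword and controlled vocabulary searching is query expansion through an entry vocabulary. The table below is hypothetical; it reuses the Dine and Navajo example from the earlier discussion of OCLC searching, and the function is an illustrative sketch rather than a description of any existing OPAC.

```python
# Hypothetical entry vocabulary: a user's term -> the controlled term used in the catalog.
ENTRY_VOCABULARY = {
    "dine": "Navajo",
}


def expand_query(keywords, entry_vocabulary):
    """Return the user's keywords plus any controlled terms they map to."""
    expanded = []
    for word in keywords:
        for term in (word, entry_vocabulary.get(word.lower())):
            if term is not None and term not in expanded:
                expanded.append(term)
    return expanded


terms = expand_query(["Dine", "songs"], ENTRY_VOCABULARY)
print(terms)
```

The user's own word is kept and the controlled term is added alongside it, so the system rather than the user carries the burden of knowing which term the catalog prefers.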
Other authority control issues impacting the OPAC came to light
during Bowman’s study (2000). Thirty-eight OPACs were searched by author
using
five variants of the name Leonardo da Vinci. Cross-referencing to the
authorized version of the name was not found in all cases. The use of
corporate
names was also found to be inconsistent across the OPACs studied.
This was particularly true
with subordinate names, such as Atomic Energy Commission. The same
results were
results were
not obtained when searching under Atomic Energy Commission and U.S.
Atomic
Energy Commission.
Many
of these findings support Spanhoff’s (2002)
contention that OPACs find, but do not gather well. Some solutions
that have
been put forward to improve OPACs include: requiring users to view a
definition
of terms (for subject headings), so that they can be sure it agrees
with what
they are looking for; automatic spelling correction; and the inclusion
of
an author browse, which allows users to be directed to the authorized name
(Cochrane, 2000; Bowman, 2000; Borgman, 1996; Theimer, 2002).
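Of the proposed solutions, automatic spelling correction is the simplest to sketch. The example below uses get_close_matches from Python's standard difflib module to suggest the nearest heading for a misspelled query. The heading list and the 0.8 similarity cutoff are assumptions made for illustration, not values drawn from the studies cited.

```python
import difflib


def suggest_heading(query, headings, cutoff=0.8):
    """Return the heading closest to a possibly misspelled query, or None."""
    lowered = {heading.lower(): heading for heading in headings}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


# Sample headings for illustration only.
HEADINGS = ["Leonardo, da Vinci", "Cataloging", "Classification"]
suggestion = suggest_heading("Catalogging", HEADINGS)
print(suggestion)
```

A stricter cutoff yields fewer but safer suggestions, while a looser one forgives more misspellings at the risk of pointing the user to an unrelated heading.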
The
OPAC is also being influenced by the new technologies of the internet.
Because
of the explosion of online materials, Medeiros (1999) recommends that
libraries
develop OPAC representation policies which lay out exactly what online
materials will be included in the OPAC.
Concern
that the internet is replacing the library as the place that people
turn to for
information has prompted several authors to make suggestions for how to
deal
with this situation. Medeiros (1999) suggests MARC records be converted
into
XML which would allow OPAC content to be searched by internet search
engines.
Casciato (1999) defends the design of OPACs vis-à-vis the
Amazon.com model.
The OPAC is objective and not commercially driven whereas the Amazon
catalog is
highly subjective, allowing anyone to upload a book review. He believes
that
the OPAC (and the service of cataloging as performed by librarians)
will remain
viable because of its demonstrated authenticity and authority.
Another
future direction the OPAC could take is conversion to an online
bibliography.
Compared to catalogs, bibliographies are known for their greater
variety of
records and more detailed indexing. An online bibliography could
include all
the materials considered important in a particular field, with links to
holdings information (of libraries chosen by the user). Such a
bibliography or
bibliographies could be a collaboration between many institutions.
Another
aspect of this concept is access to detailed holdings information. As the
user
finds an appropriate title, he/she could click on local holdings and
discover
not only whether the title can be found at the location in question, but
also
the call number and circulation status of the item (Buckland, 1997).
New
computer technologies have not only opened new futures for
OPACs but have also spawned a new form of library, the digital
library. A
digital library can be defined as a “computer-based system for
acquiring,
storing, organizing, searching, and distributing digital materials for
end user
access” (Sharma and Vishwanathan, 2001). Digital libraries can be open
access
(freely available to all) or they can follow the restricted access
model. Some
digital libraries are sponsored by libraries (for example, the
California
Digital Library, sponsored by UC, at http://www.cdlib.org/), while others are run by other
organizations (such as netLibrary, a commercial venture, at http://www.netlibrary.com). Despite these differences,
digital libraries face similar organizational challenges.
Various
protection issues arise in the online world. The digital library’s
materials
are vulnerable to viruses, and as yet no comprehensive solution has
been found
to this problem (Sharma and Vishwanathan, 2001). If the library is to
be for
authorized users only, seamless, cost effective, controlled access
systems must
be incorporated into its design (Pope, 1998). The intellectual
property rights
of contributors must be protected. The system must be set up so that
resources
in the library cannot be tampered with, thus ensuring their
authenticity
(Pope, 1998; Sharma and Vishwanathan, 2001).
The
future requirements of the library must be considered in its design.
The system
must allow resources to continue to be added in a variety of formats
(even in
new types of format that cannot be anticipated yet). Long term
preservation of
the materials must also be considered (Greenstein, 2000; Pope, 1998).
This
includes a consideration of how the system should be designed to
accommodate
archival functions, how the costs will be handled, and how
decisions about
which materials to keep will be made.
Sharma
and Vishwanathan (2001) raise some interesting questions regarding
equity,
social justice, and digital libraries. Because so many developing
countries do
not yet have the technology (or enough of the technology) to support
digital
libraries, the development of the digital library may increase the gap
between
the haves and have-nots. In addition, digital library collections are
primarily
written in the five major world languages: English, Chinese, Hindi,
Russian,
and Spanish. Those who speak other languages will be cut off from
accessing digital library collections. Another aspect of equity that
must be
considered when designing digital libraries is that of access for
persons with
disabilities. This is a complex issue because different types of
disabilities
require different types of modifications (Pope, 1998).
Another challenge
facing digital libraries is that of maintenance. Traditional libraries
have had
centuries to work out the issues involved in maintaining their
collections.
Digital libraries are too new to have done this (Sharma and
Vishwanathan, 2001).
In addition, the maintenance needs of digital libraries may exceed those
of
traditional libraries (Ackerman and Fielding, 1995).
Greenstein
(2000) argues that standards and best
practices should be developed. He especially calls for the development
of
benchmarks that would allow users to evaluate the digital library.
Collection
development policies that take into consideration the costs, benefits,
and
values of different types of resources should also be developed.
A
further challenge of digital libraries is that of
deciding how to handle peer review. Pitti’s vision of a digital
community
(1995a) proposes a solution to this dilemma. Researchers can publish
their
works online, and once endorsed by the relevant scholarly organization,
a link
can be created to that work from an online bibliography. Then anyone
using the
bibliography can access the peer reviewed and approved work. The
digital
community proposal also includes the idea of allowing scholars private
spaces
in which to add annotations to collections, and then, if desired, to
move these
items into public spaces. This would create an informal learning and
research
community. Although digital libraries face many
organizational challenges, they possess substantial potential for
reaching a wide audience and for transforming the world of information.
The process of
cataloging has been discussed from the viewpoint of the trends
and challenges surrounding it. The nature of the catalog is in
question, with recommendations on how to transform it to better conform
to the digital age. The new internet technologies are also affecting
how surrogate records are created, in terms of both encoding and
metadata standards. Despite changes due to technology, many age-old
cataloging controversies remain, such as which cataloging theory to
embrace and whether to practice broad or close classification.
Enshrined practices, such as authority control, have come under fire,
while at the same time garnering support. The digital library made
possible by the internet faces its own unique challenges. The field
of cataloging, while preserving many of its traditions, is thus also
in the process of embracing many exciting changes.
References
Ackerman, MS & Fielding, RT (1995) Collection maintenance in the digital library. In Proceedings of Digital Libraries, Austin, TX, pp. 38-48. Retrieved 11/25/2003 from <http://csdl.tamu.edu/DL95/papers/ackerman/ackerman.html>
Aschmann, Althea (2003) The lowdown on vendor supplied authority control at Virginia Tech. Technical Services Quarterly 20(3): 33-44.
Ayres, F.H. (2001) Authority control simply does not work. Cataloging and Classification Quarterly 32(2): 49-59.
Borgman, C (1996) Why are online catalogs still hard to use? Journal of the American Society for Information Science 47(7): 493-503.
Bowman, J.H. (2000) The catalog as barrier to retrieval Part II: forms of name. Cataloging and Classification Quarterly 30(4): 51-73.
Buckland, M. (1997) Bibliographic access reconsidered. In Redesigning library services: a manifesto. Accessed 12/1/2003 at <http://sunsite.berkeley.edu/Literature/Library/Redesigning/bibaccess.html>
Casciato, DC (1999) Authority and objectivity in a time of transformative growth: the future of the library catalog. Library Computing 18(4):295-300.
Cochrane, Pauline. (2000) Improving LCSH for use in online catalogs: what progress has been made? What issues remain? Cataloging and Classification Quarterly 29(1-2):73-89.
Cutter, Charles A. (1904) Rules for a dictionary catalog, 4th Edition. Washington: Government Printing Office.
Greenstein, Daniel (2000) Digital libraries and their challenges. Library Trends 49(2): 290-303.
Jackson, Richard V. (2003) Authority control is alive and.... well? OLA Quarterly 9(1):9-12.
Jackson, Sidney L. (1958) Catalog Use Study. Chicago: ALA.
Jeng, Ling Hwey (2002) What authority? What control? Cataloging and Classification Quarterly 34(4): 91-7.
Kokabi, Mortaza (1996) Is the future of MARC assured? Library Review 45(2): 68-73.
Mayes, Bessie (2003) Beyond MARC: New trends for the library of the future. OLA Quarterly 9(1): 2-4.
Medeiros, Norm (1999) Driving with eyes closed: the perils of traditional catalogs and cataloging in the internet age. Library Computing 18(4):300-306.
Merrill, William S. (1912) A code for classifiers: its scope and problems. Library Journal, May 1912: 245-310.
Pitti, Daniel (1995a) Settling the digital frontier: the future of scholarly communication in the humanities. Accessed 12/1/2003 at <http://sunsites.berkeley.edu/FindingAids/EAD/dpitti.html>
Pitti, Daniel (1995b) SGML and the transformation of cataloging. The Serials Librarian 25 (3-4): 243-253.
Osborn, Andrew (1941) The crisis in cataloging. Library Quarterly 11(4): 393-411.
Sharma, R.K. & K.R. Vishwanathan (2001) Digital libraries: development and challenges. Library Review 50(1): 10-15.
Spanhoff, Elizabeth de Rijk (2002) Cataloging paradigms: old and new. Cataloging and Classification Quarterly 35(1-2): 37-59.
Taylor, Arlene (1999) The organization of information. Englewood, CO: Libraries Unlimited.
Tennant, R (2002a) MARC exit strategies. Library Journal 127(19):27-8.
Tennant, R (2002b) MARC must die. Library Journal 127(17):26-28.
Theimer, Sarah (2002) When a 21st century user meets a 20th century OPAC: how word choice impacts search success. PNLA Quarterly 66(3): 11-25.