I give permission for my work to be published in the SIRLS LIS Learning Showcase.

 

Carla Shults

 

Professor Anita Coleman

 

IRLS501

 

December 10, 2003

 

                   Library Cataloging and its Organizational Principles:

 

                        A Peek at the Past - A Glimpse at the Future

         

Introduction

 

          There is a driving force behind the creation of libraries and their

 

catalogs.  It is called Civilization.  A noted historian of



Civilization, Will Durant, says "[Civilization] begins where chaos and

 

insecurity end."  Lionel Casson says, "And it is [in Egypt and Mesopotamia]

 

that we find the earliest examples of that key feature of civilization, writing."

 

          The basic premise here is that it makes sense that as man becomes

 

more civilized, he attempts to bring order to the written word.  Library

 

catalogs were formed as organizational tools and have evolved over time

 

into different formats, depending on collection size, the resources available,

 

and the needs of the patrons.

 

 

 

                                                                                                Shults  2

 

          Charles A. Cutter formalized the purpose of the catalog.  "The most

 

often quoted statement of the 'objects and means' of library catalogs was

 

made by the renowned Charles Ami Cutter (1904) in his setting forth of

 

cataloging rules in a systematic manner"(Younger).  Charles Cutter

 

developed the Rules for a Dictionary Catalog in 1876, which stated that the

 

purpose of a library catalog is:

 

          "1.  To enable a person to find a book of which either

                   A.  the author)

                   B.  the title)      is known

                   C.  the subject)

          2.  To show what the library has

                   A.  by a given author

                   B.  on a given subject

                   C.  in a given kind of literature[poetry, drama, fiction]

          3.  To assist in the choice of a work

                   A.  as to its edition (bibliographically)

                   B.  as to its character (literary or topical)"(Buckland).

 

          There are several reasons why cataloging is important.  One reason is

 

for identification purposes.  It is important to know what items are in the

 

collection in order to know if something is missing or needs to be acquired. 

 

A second reason is for letting the searcher know that an item is available. 

 

For example, an art museum needs to know that they have a Monet they

 

acquired three years ago so they can display it when they do an

 

Impressionist exhibit.

 

 A third reason for cataloging is to be able to physically locate the item.  To  

                                                                                               

continue with the art example, it is important to know the Monet exists, but

 

the museum also needs to know where it is located so they can actually

 

retrieve it.  

 

          A final reason is collocation.  It is important for the user to see a listing of like

 

materials, both within the catalog and in the physical collection.  While an

 

art museum might not have a need for collocation since its items tend to be

 

one-of-a-kind, a library would.  For example, if the patron is looking for a

 

title such as Gone with the Wind, he might want to know that the book exists in

 

hardcover, paperback, and on tape and that it's a movie in VHS and DVD

 

format.

 

Book Catalogs

 

          Book catalogs were one of the first methods of organizing library

 

materials.  They started out as handwritten lists, then they were printed lists. 

 

Eventually, they were printed in alphabetical order by author.  However, the

 

book catalog does not show where a book is physically located.  Book catalogs are

 

still in use today in organizations with a limited number of entries. 

 

          When libraries became large, the upkeep of the book catalog became

 

cumbersome and expensive.  If the catalog had addendums, the user might

                                                                                               

have to look in several books to find what he was looking for, which would

 

become tiresome.  Possibly, the entire book catalog would have to be

 

reprinted, which was expensive.

 

          The computer makes the use of book catalogs viable once again.  It is

 

simpler to update an online book catalog than to update a printed book

 

catalog, and it can be updated immediately upon acquisition of a new

 

document.  However, if there are a large number of books to scan through,

 

the process becomes tedious.  You will find that booksellers who specialize

 

in specific topics will use online book catalogs to display the titles available

 

for sale.  Since the number of books available in a specialized topic tends to

 

be small, this can work very well and can be less expensive to maintain than

 

a website with an extensive search capability. 

 

Card Catalogs

 

          The development of the card catalog was a huge benefit to

 

information retrieval.  The book or journal information and its physical

 

location were typed on a card and filed in a drawer, alphabetically, by

 

author.  These cards were surrogate records and contained descriptive

 

information about the publication.  There could be any number of cards

                                                                                               

printed with different access indices all referring back to the original author

 

access card.  This type of retrieval can be awkward because it may mean

 

searching in a number of places before you get to the original card showing

 

the actual location of the book.  

 

          Over time and with practice, this type of search gets faster.  The

 

advantage of the card catalog over printed book catalogs is in the ability to

 

add new documents.  You can keep your alphabetical sort very easily with

 

cards by simply inserting the new card into its correct alphabetical place. 

 

Online book catalogs have the same flexibility.
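The filing step described above can be sketched in a few lines.  This is only an illustration, assuming the drawer is held as a sorted list of author headings; the function name file_card and the sample headings are hypothetical.

```python
import bisect

def file_card(drawer, heading):
    """Insert a heading into an alphabetically sorted drawer, keeping it sorted."""
    bisect.insort(drawer, heading)
    return drawer

# Hypothetical drawer of author headings, already in alphabetical order.
drawer = ["Austen, Jane", "Dickens, Charles", "Twain, Mark"]
file_card(drawer, "Cutter, Charles A.")
print(drawer)
```

The new heading lands in its alphabetical place without re-sorting or reprinting the rest of the list, which is the flexibility that cards and online catalogs share.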

 

          Egan says that another advantage is "the cards are a tactile and visual

 

tool."  For some people, this is more comfortable, and it's easier for them to

 

grasp the information when there is something in their hand.  There's also

 

the issue of spelling.  Egan also says "The computer doesn't do you any

 

favors by demanding absolute accuracy. In some ways, the old card catalog

 

does allow for error.  The kids with the cards were much more likely than

 

the kids on the computers to stumble across something by accident."  There

 

is technology available today which can help with this problem.  It is known

 

as a 'Soundex' search, which matches names by how they sound rather than by



exact character strings, and it is available in many database



systems.
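A minimal sketch of the standard American Soundex coding may make the idea concrete.  It is written from the published rules rather than taken from any particular database product, and the function name soundex is simply a convenient label.

```python
# Letter groups that share a Soundex digit; vowels, H, W, and Y carry no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return the four-character Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]

print(soundex("Smith"), soundex("Smyth"))  # S530 S530
```

Because both spellings reduce to the same code, a patron who types "Smyth" can still be shown the entries filed under "Smith".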

 

          Sometimes, the patron only wants to see what the library has at that

 

particular location.  Maybe they can't get back to pick up an Interlibrary

 

Loan.  Union catalogs show the holdings at all locations;  however, not

 

necessarily with any particular location showing first.  Crawford has

 

observed, "But, I want to know what's here, users say, and

 

they're right."  

 

          Consistency in information format is important for quick retrieval. 

 

The cards were printed by one of only two suppliers, the Library of

 

Congress and H.W. Wilson, so the format of the cards was consistent.  A

 

patron could go to just about any library and be completely comfortable and

 

familiar with the cataloging system.  This is an advantage over online

 

catalogs which have several different programs to choose from and will

 

display the screens differently depending on the program chosen.

 

          One of the disadvantages of the card catalog lies in making sure that

 

all of the cards are removed if a book is removed from the library.  This can

 

be a time-consuming process depending on the number of alternate access

 

cards the library uses.  "I remember consulting the tracings that were found

 

either at the bottom or on the back of the shelflist cards to find the subject

 

headings and other entries that had been used with the title. We didn't want

 

to leave any 'blind' entries in the catalog. We took special care with cross-

 

references in the catalog so that patrons were never referred to an alternate

 

subject or author heading only to find there were no titles under that

 

heading"(Balas).

 

          Well-designed online catalogs can fundamentally eliminate the

 

possibility of ‘dead’ secondary references by automatically ensuring that the

 

deletion of a primary record results in the removal of all secondary access

 

records.
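One way a database can enforce this is a cascading delete.  The sketch below uses SQLite through Python's sqlite3 module; the table names item and access_point and the sample headings are assumptions made for illustration, not the schema of any actual catalog system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE access_point (
        id INTEGER PRIMARY KEY,
        item_id INTEGER NOT NULL REFERENCES item(id) ON DELETE CASCADE,
        kind TEXT NOT NULL,
        heading TEXT NOT NULL
    )""")

# One primary record with two secondary access records (hypothetical data).
conn.execute("INSERT INTO item (id, title) VALUES (1, 'Gone with the Wind')")
conn.executemany(
    "INSERT INTO access_point (item_id, kind, heading) VALUES (?, ?, ?)",
    [(1, "author", "Mitchell, Margaret"), (1, "subject", "Georgia -- Fiction")],
)

# Withdrawing the item removes its access records in the same step.
conn.execute("DELETE FROM item WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM access_point").fetchone()[0]
print(remaining)
```

No one has to consult tracings to find the secondary entries; the database removes them as part of the deletion.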

 

          Another disadvantage of the card catalog derives from the time-

 

consuming and expensive process of filing the cards.  "Accuracy in filing

 

was of paramount importance, so no one (and I do mean no one) in any

 

library where I was employed ever filed into the card catalog without having

 

someone check his or her work"(Balas).  Essentially, the library is paying

 

two people to file each card.  And, those two people are not available for

 

other work while they are filing the cards.  Online catalogs can perform this

 

function far more efficiently, since automated auditing controls can be put in

 

place to highlight any potential inaccuracies in the filing process.         

 

          Also, if a card is lost, it must be replaced.  How does the librarian

 

know a card has been lost?  Unless the individual cards are periodically

 

compared to a master list, then the librarian is dependent on the patron to let

 

them know.  Comparing individual cards to a master list can be extremely

 

time-consuming, depending on the number of books in the library.  Finally,

 

the space required to house cards is space that cannot be used for more

 

resources, so it becomes an economic liability, and sometimes a physical

 

impossibility.   
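For comparison, once both lists exist in electronic form, the check against a master list is a single set operation.  This is a sketch only; missing_cards and the sample headings are hypothetical.

```python
def missing_cards(master_list, drawer):
    """Return headings on the master list that have no card in the drawer."""
    return sorted(set(master_list) - set(drawer))

# Hypothetical master list and drawer contents.
master = ["Austen, Jane", "Cutter, Charles A.", "Twain, Mark"]
drawer = ["Austen, Jane", "Twain, Mark"]
print(missing_cards(master, drawer))
```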

 

          I know the card catalog is out of vogue and is often seen as not worth maintaining,

 

but it still has some very strong advantages.  If a library has a card catalog in

 

its building, it should continue to allow patrons to use it.  Over time and

 

with practice, the patrons will see the advantage of the computer and will

 

become comfortable using the computer, and that is where they will go for

 

their searches.  If a library chooses to maintain its card catalog, it would be a

 

straightforward process to produce printed cards from the online catalog

 

representing all primary and secondary entries for the collection.  This

 

would allow periodic bulk refreshes of the printed card catalog to be

 

conducted with confidence that for a time, at least, the printed catalog

 

exactly matches the online version.

                                                         

 

 

 

 

OPACs

 

          Online Public Access Catalogs (OPACs) are replacing the card

 

catalog.  The surrogate records for an OPAC are called metadata and are

 

created by a cataloger and loaded into the database.  These records display

 

information about the item and show where the item is physically located,

 

not unlike the card in a card catalog.  The difference is that if the library is a

 

member of a union catalog, the item may be physically located at another

 

site but can still be accessed by requesting a transfer.

 

          This is a powerful advantage as it gives the patron access to much

 

more information than he was previously exposed to using just the card

 

catalog.  "The old card catalog has been transformed into an online database

 

that not only lists items in the local library but can retrieve citations from

 

other catalogs or serve as a gateway to full-text articles in remote

 

databases"(Murphy).    

 

          A disadvantage of the OPAC is in the number of hits a search might

 

produce.  "A number of researchers have found that users, faced with many

 

retrieved items, often do not even begin browsing through the screens, and

                                                                                               

often when they do, they stop after the first screen or two.  The problem with

 

this is that in most online catalogs, retrieved items are arranged in 'main

 

entry' alphabetical order; so the best of the items may be one that is many

 

screens into the listed retrievals, and may even be the last one

 

listed!"(Taylor).

 

          The computer pulls records from different catalog systems after the

 

search request has been made and displays them in the order retrieved.  This

 

can result in the same surrogate record being displayed multiple times.  "It

 

appeared that two or three base records had been embellished or altered in

 

various, mostly trivial ways.  One misspelled the place of publication and

 

added 'maps' to the physical description"(Tennant).  The document was the

 

same one but the computer couldn’t know that.

 

          Advanced search tools and relevance matching techniques have been

 

developed by both database engines and general-purpose Internet search

 

engines which could be implemented in the next-generation OPAC.  These

 

techniques can reduce or completely eliminate the above issue.
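As a rough illustration of relevance ordering, the sketch below scores each record by how often the search terms appear in its title and subjects and lists the best match first instead of in main-entry order.  The scoring rule is a deliberately simple stand-in for the techniques real search engines use, and the records, field names, and functions are hypothetical.

```python
def score(record, query):
    """Count how many times the query terms occur in the title and subjects."""
    terms = query.lower().split()
    words = " ".join([record["title"], *record["subjects"]]).lower().split()
    return sum(words.count(term) for term in terms)

def rank(records, query):
    """Return records with the highest-scoring matches first."""
    return sorted(records, key=lambda r: score(r, query), reverse=True)

# Hypothetical records, listed here in main-entry (author) order.
records = [
    {"author": "Adams", "title": "A history of Georgia",
     "subjects": ["Georgia History"]},
    {"author": "Mitchell", "title": "Gone with the Wind",
     "subjects": ["Georgia Fiction", "Civil War Fiction"]},
    {"author": "Zimmer", "title": "Civil War fiction a guide",
     "subjects": ["Civil War Fiction Bibliography"]},
]
ranked = rank(records, "civil war fiction")
print([r["author"] for r in ranked])
```

With alphabetical ordering the most relevant record here would appear last; with even this crude score it appears first.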

 

          Another disadvantage is in maintaining an accurate catalog.  "No

 

library catalog has ever been perfect, but with the recently developed

 

capability of loading bibliographic records by the thousands via computer

 

tape, the ease with which a catalog's credibility can be destroyed has taken a

 

quantum leap forward"(Cook).

 

          No database is ever perfect, either, but well-designed auditing

 

techniques that make use of ‘expert system’ approaches to data verification

 

can greatly improve the quality of the content found in any online database. 

 

Errors in the source material, such as a tape, can be detected and identified

 

for examination with a high degree of accuracy.  Human catalogers can then

 

make the determination if the records are representing the same physical

 

item or not.  
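The division of labor described here, where the software flags and the cataloger decides, might look something like the following sketch.  It uses Python's standard difflib module to compare normalized records; the 0.9 threshold, the field names, and the sample records are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(record):
    """Lower-case the key fields and strip punctuation so trivial differences vanish."""
    text = " ".join(str(record.get(f, "")) for f in ("title", "author", "place"))
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return " ".join(cleaned.split())

def likely_duplicates(records, threshold=0.9):
    """Return pairs of record ids similar enough to deserve a cataloger's review."""
    pairs = []
    for a, b in combinations(records, 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            pairs.append((a["id"], b["id"]))
    return pairs

# Hypothetical records; the second misspells the place of publication.
records = [
    {"id": 1, "title": "Gone with the Wind", "author": "Mitchell, Margaret", "place": "New York"},
    {"id": 2, "title": "Gone with the Wind", "author": "Mitchell, Margaret", "place": "New Yrok"},
    {"id": 3, "title": "A History of Georgia", "author": "Adams, John", "place": "Atlanta"},
]
print(likely_duplicates(records))
```

The program only nominates candidates; deciding whether two records describe the same physical item remains a human judgment.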

 

          The hardware needed to run an OPAC takes up space

 

that could be used to store more books.  "At Blacktown, the World Book

 

Encyclopaedia in hard copy needs 41,580 cm3.  The equivalent cdrom needs

 

175 cm3.  However, 585,000 cm3 of hardware is needed to read a cdrom,

 

and more if you want to print"(Grosvenor).  This actually becomes the same

 

problem the card catalog has but is becoming less of a problem since

 

electronic devices grow smaller every year. 

 

          The major difference between the World Book Encyclopaedia and the

 

CD-ROM reader, however, is that the encyclopedia must be physically

 

accessible to the user, and the CD-ROM reader can be physically located out

                                                                                               

of reach, or even off-site, in less expensive space. 

 

                                                                                               

 

          In spite of the disadvantages of the OPACs, there is ongoing research

 

into providing an online catalog that returns more accurate search results,

 

among other things.  Later in this paper, I will talk about the Cheshire II

 

project and what it is attempting to do to change the online catalog forever.

 

          The records in the catalogs and the information on the records follow

 

organizational principles.  If the records were typed up in any old order and

 

had varying types of information on them, confusion would soon prevail and

 

the catalog would be useless.  

 

Organizational Principles

 

          When an item is acquired by a library, it needs to be described. Books

 

have their access points determined, a description of the physical item

 

specified, a subject heading chosen, and a call number assigned so the book

 

can be located on a shelf.  An artifact is described based on provenance, its

 

condition, and both a description and subject based on the subjective

 

interpretation of the cataloger.

 

          If an electronic resource has been identified, the cataloger may need to

 

evaluate the reliability of the resource before determining the subject,

 

description, and additional access points.

 

         

 

          The cataloger will want to make sure the resource is reliable by

 

making sure an email address has been provided and that the author

 

responds to email.  The author's credentials and expertise should also be

 

available.  These are just a few additional steps a cataloger needs to take

 

before determining whether or not an electronic resource should be included

 

in the catalog.    

 

          With the changes in the cataloging system and the sharing of

 

information among databases, standardization becomes essential.

 

"Standardized practices for creating records, describing changes in a

 

resource, and the specific rules for description are becoming more important

 

as libraries work with catalog records on globally shared

 

databases"(Hawkins).  The computer system used by the library must be able

 

to read the records coming in from the different sources and display them in

 

the same format in order to be useful to the user. 

 

          Standardization began long before computers existed.  Codes were

 

needed to retrieve and shelve books in the library.  It wasn't practical to run

                                                                                               

up and down row after row of shelving to find the book you wanted.  

 

          Melvil Dewey developed the Dewey Decimal Classification in the

 

1870s to identify where a book is located on a shelf.  It is a hierarchical

 

system using three digits before a decimal point, followed by as many



additional digits as are needed to specify the subject.  The first three digits



give the general subject of the book, and the remaining digits denote



more specificity about the subject.



          The decimal is essentially limitless in the number of levels that could



be used.  However, the use of a single decimal digit for each level in



the hierarchy is limiting as subjects multiply, since only ten divisions



can be represented at each hierarchic level.  Also, if the number gets too

 

long, then locating and reshelving the book can be time-consuming.
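The hierarchy can be made visible by listing the broader classes that contain a given number.  The sketch assumes the usual reading of Dewey notation, in which each digit narrows the class before it; dewey_levels is a hypothetical helper, and no subject captions are attached because they would have to come from the schedules themselves.

```python
def dewey_levels(number):
    """List the classes that contain a Dewey number, from broadest to narrowest."""
    whole, _, fraction = number.partition(".")
    whole = whole.zfill(3)
    # The three digits before the decimal point give main class, division, section.
    levels = [whole[0] + "00", whole[:2] + "0", whole]
    # Each digit after the decimal point adds one more level of specificity.
    for i in range(1, len(fraction) + 1):
        levels.append(whole + "." + fraction[:i])
    unique = []
    for level in levels:
        if level not in unique:  # e.g. 500 is its own main class and division
            unique.append(level)
    return unique

print(dewey_levels("025.3"))  # ['000', '020', '025', '025.3']
```

The longer the number, the longer this chain becomes, which is the shelving burden the paragraph above describes.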

 

          Charles Cutter developed the Expansive Classification in the 1890s.



He used the letters A-Z to denote the subject and then used numbers to



denote locality.  Additional lines and codes were used to make the code



unique.  These codes are also used to locate a book on a shelf.  This



approach is also limiting but was the basis for the Library of Congress



Classification.

 

          International Standard Bibliographic Description (ISBD) established

                                                                                               

standards for form and content for monographic publications.  The

 

descriptions were put onto the surrogate records to identify that the

 

publication existed.  These standards eventually expanded to include

 

electronic resources.  The areas specified are title and statement of



responsibility, edition, type and extent of resource, publication,



physical description, series, notes, and standard number and terms

 

of availability.   These standards were the basis for establishing the Anglo-

 

American Cataloging Rules (AACR), now in its second revision (AACR2). 

 

The AACR2 is continuing to undergo revisions to include electronic

 

resources. 

 

AACR2

 

          "Anglo-American Cataloging Rules 2nd Revision (AACR2) is the set

 

of rules used for collecting bibliographic data relating to library materials

 

and for formulating access points (for authors, titles, subjects, related works,

 

etc.)" (Randall).  Initially, these standards were designed for textual material. 

 

There are revisions being made to the rules to accommodate a broader range

 

of catalogable units.  "The Joint Steering Committee for Revision of the

 

AACR is in the process of revising the code to enhance rules for

 

international use and to facilitate the cataloging of Internet and other types

 

of electronic resources"(Hawkins). 

 

          The AACR2 has an extensive list of rules for what elements are

 

needed on the surrogate record and what each element means.  It is used with

 

the MAchine-Readable Cataloging (MARC) format, which should be used to identify the

 

pieces of information.  The MARC tags are read by the computer program to

 

display the records correctly on the screen and to compare search terms. 
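A simplified sketch shows how a program might use the tags for both jobs.  The tag numbers 100, 245, 260, and 650 are standard MARC fields for main entry, title, publication, and topical subject, but real MARC records also carry indicators and subfield codes that are left out here; the display labels and function names are assumptions for illustration.

```python
# A record reduced to (tag, value) pairs; real MARC fields also have
# indicators and subfields, which this sketch omits.
record = [
    ("100", "Mitchell, Margaret"),
    ("245", "Gone with the Wind"),
    ("260", "New York : Macmillan, 1936"),
    ("650", "Georgia -- Fiction"),
]

# Hypothetical screen labels chosen by the display program.
LABELS = {"100": "Author", "245": "Title", "260": "Published", "650": "Subject"}

def display(record):
    """Format a record for the screen using the label that matches each tag."""
    return "\n".join(f"{LABELS.get(tag, tag)}: {value}" for tag, value in record)

def matches(record, tag, term):
    """Check whether a search term occurs in any field carrying the given tag."""
    return any(t == tag and term.lower() in v.lower() for t, v in record)

print(display(record))
print(matches(record, "650", "fiction"))
```

Because every system reads the same tags, two different catalog programs can label the screen differently and still search the same fields.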

Dublin Core

          This set of standards was developed specifically for Internet

information.  "The Dublin Core metadata element set is a standard for cross-

domain information resource description. Here an information resource is

defined to be 'anything that has identity'.  There are no fundamental

restrictions to the types of resources to which Dublin Core metadata can be

assigned"(Dublin Core).  The Dublin Core metadata records are displayed

on the Internet and typically stored using Hyper-Text Markup Language

(HTML). 

          There is work, currently, on designing Dublin Core templates for

authors of documents to fill in.  The HTML is automatically created and the

surrogate record is added to the database.  This process will free up the

librarian's time and will get the author's document available to potential

readers much more quickly.  "One of the major reasons for moving towards

author-described resources with metadata is to try and provide more

effective indexing services for the public"(Iannella & Waugh).
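The step from a filled-in template to HTML can be sketched briefly.  The example assumes the common convention of writing Dublin Core elements as HTML meta tags with a "DC." prefix; the function dublin_core_meta is hypothetical, and the sample values are taken from this paper's own title page.

```python
from html import escape

def dublin_core_meta(fields):
    """Turn a template of Dublin Core elements into HTML <meta> tags."""
    lines = []
    for element, values in fields.items():
        if isinstance(values, str):
            values = [values]  # a single value is treated like a one-item list
        for value in values:
            lines.append(
                f'<meta name="DC.{escape(element, quote=True)}" '
                f'content="{escape(value, quote=True)}">'
            )
    return "\n".join(lines)

# What an author might type into the template.
template = {
    "title": "Library Cataloging and its Organizational Principles",
    "creator": "Shults, Carla",
    "date": "2003-12-10",
    "subject": ["Library catalogs", "Cataloging"],
}
print(dublin_core_meta(template))
```

The author supplies only the values; the markup is produced the same way every time, which is what keeps author-created records consistent.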

 

          There are 15 Dublin Core elements which can be used and 8 extra

elements for more detailed description.  Some of the elements are

repeatable; some can have only one value.  All of the elements are optional. 

The 15 Dublin Core elements are:

          Title

          Author/Creator

          Subject and keywords

          Description

          Publisher

          Other Contributors

          Date

          Resource Type

          Format

          Resource Identifier

          Source

          Language

          Relation

          Coverage

          Rights Management

 

          Less well known and more specialized metadata standards identified

 

by Iannella and Waugh are "The Australia New Zealand Land Information

 

Council (ANZLIC) and The Platform for Internet Content Selection (PICS)." 

 

Some of these standards can be used in conjunction with each other, but it is

 

possible that a decision has to be made as to which standard takes

 

precedence.  "There are already many metadata standards and more will

 

undoubtedly be created, which will lead to the situation where a resource

 

will be described by two (or more) sets of metadata attributes.  What happens

 

if the two sets have contradictory information?"(Iannella & Waugh).

 

          Using HTML as the representational mechanism for the content that

 

results from the cataloging process has limitations.  This is due to the fact

 

that HTML, unlike XML, does not include a generalized mechanism for

 

associating metadata tags to variable content.  In addition, as the above

 

discussion illustrates, HTML lacks any capability for representing

 

‘inheritance’ relationships between data items.  If XML were used in place

 

of HTML by all of these cataloging standards proposals, then it would be

 

possible to create a universal access method that can automatically convert

 

catalog entries to a standard format.  Without the use of XML, such an

 

access method is far more difficult to create.
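To suggest what such an access method could look like, the sketch below reads a small XML record and converts its Dublin Core elements into one plain structure.  The namespace URI is the published one for the Dublin Core element set; the wrapping record element, the sample data, and the function to_standard are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

xml_record = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Gone with the Wind</dc:title>
  <dc:creator>Mitchell, Margaret</dc:creator>
  <dc:subject>Georgia -- Fiction</dc:subject>
  <dc:subject>Civil War -- Fiction</dc:subject>
</record>"""

def to_standard(xml_text):
    """Collect every Dublin Core element into a dictionary of lists."""
    root = ET.fromstring(xml_text)
    entry = {}
    for child in root:
        if child.tag.startswith("{" + DC + "}"):
            name = child.tag.split("}", 1)[1]
            entry.setdefault(name, []).append((child.text or "").strip())
    return entry

print(to_standard(xml_record))
```

Because the tags themselves name the elements, the same few lines would work on a record from any source that uses the namespace, with no knowledge of how that source lays out its pages.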

 

Control

 

          Libraries use bibliographic, authority and access control to help

 

maintain order and maintain some degree of quality in their catalogs. 

                                                                                               

Bibliographic control is used to maintain order.  "The central mechanism of

 

control in the modern library is the bibliographic catalog.  The central

 

function of the catalog and the bibliographic records contained in it is to

 

 

 

systematically and predictably describe, control, and provide access to

 

identifiable units of information"(Pitti).

 

          This means not only maintaining the metadata but also keeping up-to-

 

date on the status of the physical document.  The catalog should know if the

 

book has been acquired, if it is available to view, when it is due back, etc.

 

          Conventional mechanisms for maintaining this element of the catalog

 

rely on check-in/checkout procedures.  They cannot address the issue of a

 

physical item being moved about within the shelves by users of the library. 

 

Inexpensive radio frequency identification (RFID) technology is now

 

becoming available that would allow every item in the collection to be

 

directly locatable regardless of where it has been physically placed.  RFID

 

devices will soon become so inexpensive that it will be economically viable

 

to place one in every book, magazine, or even newspaper that a collection

 

contains.  There is some controversy surrounding the use of RFID, but from

 

what I have read, privacy of the patron is not compromised. 

                                                                                               

          Another type of control is authority control which maintains the

 

quality of the document.  "Archivists and librarians use authority control

 

files to identify real world entities such as people, institutions, corporations,

 

and societies and the name or names by which they are known"(Pitti).  For

 

subject control, the library might use the Library of Congress subject

 

headings. 

 

          The authority control for subjects would be the Library of Congress. 

 

If an author began his writing under one name and then changed to another,

 

a search on either author name would bring up both.  "Authority control thus

 

operates over and above the catalog, bridging bibliographic records by

 

gathering works by and about an author under that author's name, and works

 

about a subject under the name of the subject, and each with references from

 

other forms of the name if such exist and are discovered"(Pitti).
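The effect of a name authority file on searching can be shown with a small sketch.  The authority entries, catalog records, and the function search_author are hypothetical and greatly simplified; an actual authority file holds far more than a lookup table.

```python
# Hypothetical authority file: every known form of a name points to one heading.
AUTHORITY = {
    "twain, mark": "Twain, Mark",
    "clemens, samuel langhorne": "Twain, Mark",
}

# Hypothetical catalog in which each work is filed under the authorized heading.
CATALOG = [
    ("Twain, Mark", "Adventures of Huckleberry Finn"),
    ("Twain, Mark", "The Innocents Abroad"),
    ("Austen, Jane", "Emma"),
]

def search_author(name):
    """Look up the authorized heading for a name, then gather its works."""
    heading = AUTHORITY.get(name.strip().lower(), name.strip())
    return [title for author, title in CATALOG if author == heading]

print(search_author("Clemens, Samuel Langhorne"))
print(search_author("Twain, Mark"))
```

Both searches return the same list, which is the gathering function Pitti describes.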

 

          It is possible for a library to be its own authority control for author

 

names and subject headings.  If it's a small library and is not a part of a

 

union catalog, it might make sense to avoid the seemingly endless selection

 

of subjects in the Library of Congress subject headings.  Otherwise, it is

 

important for consistency’s sake for all catalogers using the same database to

 

use the same subject headings and the same name authority file.  After all,

 

why reinvent the wheel?  

 

          "Access control is the process of exchanging data and information in a

 

secure and authoritative manner once authentication has taken

 

place"(Morgan).   Access control maintains some semblance of order over

 

who can access the library's materials.  First, the user needs to be

 

authenticated, and then they can access information.  Library cards are one

 

way of maintaining access control.  Once a person can prove they live within

 

a particular county, they can get a library card for that county and check out

 

books from that county's libraries only. 

 

          Computers demand user names and passwords to access documents,

 

systems, and the like, online.  An authority figure determines that the user is

 

valid, and they give him the codes he needs to access the online systems or

 

documents.  "A good authentication system will prevent outsiders from

 

violating confidentiality and data integrity policies"(Steinke).
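A minimal sketch of the password check follows, assuming the system stores a salted hash rather than the password itself.  It uses only Python's standard library; the function names, the iteration count, and the sample password are illustrative choices, not a recommendation for any particular library system.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor

def make_record(password, salt=None):
    """Create the salt and hash that the system stores in place of the password."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify(password, salt, digest):
    """Authenticate a user by re-hashing the supplied password and comparing."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

# The authority issues credentials once; the patron proves identity at each login.
salt, digest = make_record("library-card-1234")
print(verify("library-card-1234", salt, digest))
print(verify("wrong-guess", salt, digest))
```

Only after verify succeeds would the system go on to decide what the authenticated user is allowed to see.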

 

Classification

 

          "Every system of grouping books is based upon resemblances or

 

likeness"(Merrill).  This is known as classifying, another principle of

 

organization.  The reason for classifying is to bring like books together.  If a

 

user is looking through books on a shelf on a particular topic, he wouldn't

                                                                                               

have far to go to find more books that are related.  The retrieval system in an

 

OPAC needs to do the same thing for electronic resources.  According to

 

William Stetson Merrill, the choices for classifying are: size, date, language,

 

 

 

binding, literary form, subject, local treatment, and persons for whom

 

written.      

 

          Size.  "The simplest mode of classifying books is by size.  As a

 

                   general arrangement for a public library, such an arrangement is           

                   never used nowadays."  Arrangement by size alone                   

                  

                   would make for a time-consuming search.  I can't imagine a

 

                   scenario where this would be efficient.

 

          Date.  "A second and usually a simple mode of arranging books is by

 

                   date.  This, again, is not a usual classification of

 

                   books."  This type of classification could make sense in

 

                   a very specialized library where all the books are essentially of

 

                   the same subject.

 

          Foreign Language.  "In popular libraries it is not unusual to arrange

 

                   books in foreign languages in classes by themselves, calling

 

                   them French books, German books, and the like."

                   I have seen the foreign language books classified together by



                   audience.  For example, picture books in Spanish are classified



                   together in the children's area while the non-fiction Spanish

 

                   books are classified together in the non-fiction area.

 

          Binding.  "Fine specimens of binding may properly be arranged

 

                   together in a bibliographic museum or in an exhibition of library

 

                   treasures." 

 

          Literary Form.  "Classification by literary form is common enough in

 

                   every system of classification.  Encyclopedias are usually

 

                   placed with other works of general reference at the beginning of

 

                   the classification."

 

          Subject.  "A classification based upon this feature of a book is indeed