Danielle Carlock

I give permission for my work to be published in the SIRLS LIS Learning Showcase


The catalog and the information organizing principles surrounding it: present trends and challenges

        The catalog, as a way to provide access to the holdings of a library, has been an integral part of the profession of librarianship from its earliest days. Although the format of the catalog has changed from the print-based card format to the computer format (OPACs), and continues to evolve, its functions remain the same. However, the sweeping changes in the format of catalogs, and in the information organizing principles surrounding them, are changing the tasks of the librarian. Most of these changes are driven by the technological innovations the internet has wrought in the areas of electronic communication and publishing. This paper will explore not only the changing nature of the catalog itself, but also the information organizing principles that go into creating catalogs. The new internet technologies have also made possible new forms of the library. Digital libraries, online repositories of information, now exist and continue to expand. The challenges of these newly emerging entities will be discussed.
        When attempting to describe the functions of the catalog, most librarians still call on Cutter’s enduring work. Cutter’s “objects of the catalog” (1904) are: to ensure the user can access any work of which he/she knows the author, title, or subject; to show the library’s holdings by author, by subject, or by type of literature; and to assist the user in the choice of a book by edition or character. Today’s catalogs, the OPACs, are designed on Cutter’s principles. According to Taylor (1999), the catalog also serves as an inventory of the library’s collection, which librarians consult to determine their own holdings when completing collection development tasks.
        A survey of the library literature suggests that most librarians agree the purposes of the catalog have remained the same since Cutter’s time; however, the specifics of how to meet users’ information needs through the catalog are subject to controversy (Osborn, 1941; Theimer, 2002; Borgman, 1996; Cochrane, 2000; Buckland, 1997).
        One controversy surrounding the development of catalogs is which “theory of cataloging” to pursue. According to Osborn (1941) there are four theories: legalism, pragmatism, bibliographic cataloging, and perfectionism. Those of the legalist vein believe that rules must be devised for every possible contingency; however, Osborn argues that this leads to a never-ending process of revising rules, thus diminishing productivity and increasing costs. Perfectionists would like to catalog a work so that the record remains good for all time; however, Osborn argues that this is never the case, since history shows that revisions always occur. Bibliographic cataloging, in which a catalog is a type of bibliography, leads to too much detail in the catalog descriptions, says Osborn. Pragmatism, in which each library conducts cataloging based on its own needs, is the theory Osborn prefers. He proposes three levels of cataloging: standard, simplified, and detailed. Each library should choose which one to use based on its needs. This sounds a lot like the concept of exhaustivity, and the controversy surrounding whether to conduct summarization or depth indexing. Although Osborn was writing more than sixty years ago, his ideas seem relevant today. It can be argued that the theory of cataloging followed by a library will determine in many respects the nature of its catalog.
        In order to create an inventory or catalog one must carry out several processes. The item or unit to be cataloged (monograph, journal, video, etc.) must first be described through the process of descriptive cataloging, following certain rules and procedures. The item must also be analyzed as to its subject(s) by determining its “aboutness.” Depending on the classification scheme to be used (DDC, LCC, etc.), the cataloger assigns a call number to the item. The data generated from these processes is then turned into a surrogate record, which serves as a representation of the item itself. Finally, the surrogate record must be encoded in a format in which it can be stored and displayed (MARC, etc.).
        Most if not all of the processes described above are in a state of flux. This is due to many factors: the sheer increase in information and therefore in items to be cataloged, an increase in the kinds of formats available, and ever-changing technology. The changes and controversies surrounding descriptive and subject cataloging, encoding and metadata standards, and authority control will be discussed next.
        Before an item can be cataloged, it must first be decided what the item actually is. This seems an easy and obvious task, but it is not so upon close examination. Should a two-volume work be cataloged as two separate units or as one? Should a series be cataloged as one unit or as separate ones? Electronic resources are even more difficult to work with in this regard. Should a website be cataloged as one unit, or should each individual page be cataloged separately? (Taylor, 1999)
        Once the unit of analysis is determined, the process of describing it can commence. The AACR2r and the ISBD standards are the most widely accepted rules for describing items to be cataloged (Taylor, 1999). However, new standards are being developed to deal with the new formats that have arisen, primarily due to the surge in electronic materials and the World Wide Web.
        The Dublin Core is one of the most prominent standards to have arisen for describing WWW documents, especially in Europe (El Sherbini, 2001). However, there are many other standards under development, leading to what Milstead and Feldman (1999) call the “metadata dilemma.” Most of the new standards have arisen because of the need to describe specialized information packages. For example, the FGDC standard has been developed for geospatial data, the VRA for visual resources, GILS for government publications, information, and services, and MPEG-7 for multimedia files (Taylor, 1999; El Sherbini, 2001; Milstead & Feldman, 1999). Some authors have expressed concern that the proliferation of metadata standards may be creating chaos and confusion. El Sherbini (2001) believes it possible that multiple records will be created for the same resource under different metadata standards, which would confuse users. In addition, the need to create extensions to the standards would further confound crosswalks, the mappings that translate elements of one standard into another. Milstead and Feldman (1999), while acknowledging the need to create metadata standards that are customized for specialized resources, call for a metadata registry to keep track of all current proposals. It is unclear at this time what the outcome of multiple standards will be: whether they will improve access or lead to chaos and a lack of interoperability.
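        A crosswalk of the kind mentioned above can be pictured as a lookup table from the elements of one standard to the fields of another. The Python sketch below is illustrative only: the element-to-field pairs are a small, simplified selection assumed for the example rather than a complete or official mapping, and the sample record and the "spatialResolution" element are invented.

```python
# Illustrative crosswalk from a few Dublin Core elements to MARC tags.
# The pairs below are a simplified assumption for demonstration, not an
# official or complete mapping.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def crosswalk(dc_record):
    """Convert a dict of Dublin Core elements into (tag, subfield, value) triples.

    Elements with no entry in the table are returned separately so the
    cataloger can see what the crosswalk failed to carry over.
    """
    mapped, unmapped = [], []
    for element, value in dc_record.items():
        if element in DC_TO_MARC:
            tag, code = DC_TO_MARC[element]
            mapped.append((tag, code, value))
        else:
            unmapped.append(element)
    return mapped, unmapped


# Hypothetical record; "spatialResolution" stands in for a local extension.
record = {
    "title": "An Example Title",
    "subject": "Example subject",
    "spatialResolution": "30m",
}
fields, lost = crosswalk(record)
print(fields)
print(lost)
```

        The second list shows how an element added as an extension to one standard can drop out of the translation, which is one way extensions might confound crosswalks.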
        Metadata standards that describe the content of resources, as discussed above, are not the only standards in flux. Encoding standards, those which create the “containers” in which content is placed, are also undergoing considerable change (Taylor, 1999). The MARC format, which is perhaps the first encoding standard, has reigned supreme for decades. It has served the needs of the library well by acting as a standard format in which to create surrogate records for all kinds of resources. However, some argue that the time for MARC’s reign has come to an end, while others continue to argue its merits. According to Mayes (2003), MARC is limited because it is not interoperable with HTML, SGML, or XML. Instead she proposes a switch to an XML schema developed by OCLC that preserves many of the MARC elements. Kokabi (1996), while supporting the MARC format, acknowledges some weaknesses of the standard, including the fact that some OPACs can’t handle the MARC format and that MARC inherited the flaws of the card catalog. However, Kokabi believes that MARC will not go away anytime soon, since it has proven to be stable and effective and allows copy cataloging. In addition, many developing countries are just beginning to adopt MARC.
        Pitti (1995b) favors a changeover from MARC to SGML because SGML is used by a wider community. While only librarians use MARC, SGML is a standard for markup languages that is used by people in many communities, including government and industry. Tennant (2002a) suggests that MARC exit strategies be adopted immediately. The reasons for his view are many: MARC is inflexible, difficult to read, and used only by libraries. As a result, the choice of software that libraries can use is limited to vendors that make MARC-compatible programs (Tennant, 2002b). With these serious criticisms of MARC and the increased interest in markup languages, it will be interesting to see how many library systems convert to new formats.
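        To make the idea of moving MARC content into a markup language more concrete, the following Python sketch serializes a toy record as XML using the standard library. It is a minimal illustration under stated assumptions: the element names (record, datafield, subfield) loosely follow common MARC-in-XML conventions but omit the leader, indicators, and namespace, the sample values are invented, and this is not the OCLC schema that Mayes describes.

```python
import xml.etree.ElementTree as ET

# A toy record: each entry is (tag, {subfield code: value}).
# Tags 100, 245 and 650 are the usual MARC fields for main author,
# title and topical subject; the values are invented for the example.
marc_fields = [
    ("100", {"a": "Doe, Jane"}),
    ("245", {"a": "An example title"}),
    ("650", {"a": "Cataloging"}),
]


def to_xml(fields):
    """Serialize MARC-style fields as a small XML tree."""
    record = ET.Element("record")
    for tag, subfields in fields:
        # Attributes are passed as a dict to avoid clashing with the
        # positional "tag" parameter of SubElement.
        datafield = ET.SubElement(record, "datafield", {"tag": tag})
        for code, value in subfields.items():
            subfield = ET.SubElement(datafield, "subfield", {"code": code})
            subfield.text = value
    return record


xml_text = ET.tostring(to_xml(marc_fields), encoding="unicode")
print(xml_text)
```

        Once the record is plain XML, any general-purpose XML tool can read it, which is the kind of wider compatibility the critics of MARC have in mind.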
        After an item is described using a metadata standard and an encoding standard, it must next undergo subject cataloging. During subject cataloging, the aboutness of an item is decided and subject headings and classification numbers are assigned (Taylor, 1999). Determining aboutness continues to be a tricky task. Merrill (1912), writing almost a hundred years ago, addressed concerns that are still with us today. For example, he addressed the question of how a book about two or more subjects should be treated, as well as other matters in subject analysis.
        Having determined aboutness, the cataloger must assign subject headings from the subject heading list or thesaurus (for example LCSH, Sears, or MeSH) in use by the library. Then the work is assigned a place in the library’s classification scheme based on its assigned subject(s). Like many other aspects of cataloging, this process has been fraught with controversy. For example, how deep in the classification scheme should the cataloger go? Should he/she classify the work based only on where it falls within the main classes of the scheme? Or should a deeper level of classification be pursued, perhaps down to the finest subdivisions that are available? (Taylor, 1999) One thing to consider in choosing between broad and close classification is the degree of collocation desired. If the library is very large and only broad classification is carried out, collocation will be minimal; works will be classed near other works with which they share only a broad relationship. However, in a small library broad classification may be sufficient to accomplish collocation (Taylor, 1999).
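        The trade-off between broad and close classification can be shown with a small Python sketch that groups works by a truncated class number. The titles and Dewey-style numbers are invented for illustration, and truncating digits is only a rough stand-in, assumed here, for choosing a shallower level of a real classification scheme.

```python
from collections import defaultdict

# Invented titles with Dewey-style numbers, for illustration only.
books = [
    ("Title A", "025.3"),
    ("Title B", "025.431"),
    ("Title C", "025.04"),
    ("Title D", "027.4"),
]


def shelve(items, digits):
    """Group items by class number truncated to the first `digits` digits."""
    shelves = defaultdict(list)
    for title, number in items:
        key = number.replace(".", "")[:digits]
        shelves[key].append(title)
    return dict(shelves)


broad = shelve(books, 3)  # broad classification: three digits only
close = shelve(books, 4)  # closer classification: one more digit

print(broad)
print(close)
```

        Under the broad grouping, Titles A, B, and C share a shelf even though their full numbers differ; under the closer grouping each work stands apart. In a small collection the broad grouping may be all that is needed.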
        Another process that goes into producing a library catalog is authority control. Authority control is a process in which authorized forms of names and titles and authorized terms for subjects are chosen and maintained for use in resource descriptions. (Taylor, 1999) This allows for standardization; in theory a library user should be able to type an unauthorized version of an author name and be directed to the authorized version, and hence all the materials written by said author. The same should be true for subject or title searches. 
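        The redirection described above can be sketched in Python as a lookup from every known form of a name to its authorized heading. This is a minimal illustration, not a description of how any particular library system works: the authority file holds a single sample entry, the "Twain, M." variant is invented, and the normalization rule is an assumption made for the example.

```python
# Sample authority file: authorized heading -> known variant forms.
# The entry is illustrative; "Twain, M." is an invented variant.
AUTHORITY_FILE = {
    "Twain, Mark, 1835-1910": [
        "Clemens, Samuel Langhorne",
        "Twain, M.",
    ],
}


def normalize(name):
    """Lowercase a name and drop commas, periods and extra spaces."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def build_index(authority_file):
    """Map every normalized form, authorized or variant, to the authorized heading."""
    index = {}
    for authorized, variants in authority_file.items():
        for form in [authorized, *variants]:
            index[normalize(form)] = authorized
    return index


def resolve(query, index):
    """Return the authorized heading for a query, or None if no form matches."""
    return index.get(normalize(query))


INDEX = build_index(AUTHORITY_FILE)
print(resolve("clemens, samuel langhorne", INDEX))
print(resolve("Unknown, Author", INDEX))
```

        A query that matches no recorded form returns nothing, which illustrates why the approach is only as good as the variants that catalogers have actually entered.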
        Some authors affirm the merits of authority control and the need for its continuation. Others are of the opposite view, declaring that authority control does not do what it claims to do. Ayres (2001), while supporting the idea of authority control, claims that libraries are not living up to it. Instead, he claims, users are missing materials held by the library when searching the OPAC because of a lack of consistency in authority control. Using the Library of Congress OPAC to support this claim, he shows that some searches do not yield all the information contained in the library. Searching Dostoevski (an unauthorized variant of Dostoevsky) yields only 5 of the 329 holdings in the LC. Other unauthorized versions of the name also yield incomplete results. He also supplies seven examples of subject searches which yield results inconsistent with proper authority control. Ayres’ main concern is that libraries are touting authority control as a feature that sets them apart from internet search engines, yet consistent authority control is not being delivered.
        Some other authority control difficulties were discovered by this author when completing practice exercises in OCLC. When searching for the author Juan Diego, no results came up under J. Diego. The same was true when searching for Ruth Underhill as R. Underhill. However, searching under just Diego or Underhill did yield the correct results. Vocabulary control for titles was also problematic. When searching for the title Navajo Peyote Ceremonial Songs, I found that a search under Ceremonial Songs Dine did not yield the title. Since the Navajo call themselves the Dine, the two names should be cross-referenced. A Dine person searching OCLC would probably search under Dine, since that is the culturally appropriate term. However, that search will not yield results that are entered only under Navajo.

        Jeng (2002) argues that authority control is a costly, unnecessary measure. According to Jeng, users don’t mind sorting through search results that include a lot of unrelated materials. In addition, the more controlled the vocabulary is, the greater the knowledge base the user must have about the system in order to complete a successful search. In contrast, argues Jeng, indexing and abstracting services are serving the user well with little authority control but very flexible interfaces that give users a lot of control in designing their searches.
        Jackson (2003) counters these arguments against authority control. He says it is needed more than ever as the amount of information continues to grow. He claims automated authority control can be done efficiently. As one example, he cites the conversion of 600 subject headings beginning with Afro-American to African American, completed overnight in his library.
        Another authority control controversy concerns whether the process should be outsourced. Jackson (2003) and others claim that vendor-supplied authority control can be done successfully. Aschmann (2003) discusses the outcomes of vendor-supplied authority control at Virginia Tech. Authority control was indeed improved; however, the expected decrease in staff time did not follow. Instead a permanent team had to be formed to carry out quality assurance.
        It is not clear how the controversies surrounding authority control will play out. It seems difficult to imagine that libraries will do away with the idea of authority control completely, as suggested by Jeng. However, universal authority control has certainly not been accomplished, as illustrated by Ayres.
        Once an item has been described using a metadata standard, encoded with an encoding standard, analyzed as to subject, assigned subject headings and a call number, and had authority work carried out on its access points, it needs to become part of the library’s inventory, that is, part of its catalog.
        The card catalog was the predominant form of the catalog for the greater part of the 20th century. Not until the 1960s did a computer-generated catalog come into play, and then only in a limited fashion. The computer generated a catalog on microform; however, this version of the catalog was unpopular because of the difficulty of working with microform. OPACs, or online public access catalogs, first introduced in the late 1970s, have now become the dominant form of catalog. At first they were readable from computers only at the library through CD-ROM. Then as the internet grew, OPACs became accessible through gophers and telnet, and finally through a WWW homepage, allowing patrons to access the catalog from their homes (Taylor, 1999).
        Today’s OPACs, despite their convenience and greater accessibility, have their critics. According to Theimer (2002), OPACs are not designed for 21st century users, who are versed in internet search engines and usually expect the OPAC to respond in the same way. Many internet search engines forgive misspellings, while most OPACs do not.
        Borgman (1996) posits that the design of OPACs does not match the information seeking behavior of users at all. The catalog is based upon Cutter’s principles, yet most library users do not come to the library actually knowing one of the three access points (even Jackson, writing in 1958, noted that card catalog users rarely came with complete bibliographic information). In addition, the user must know library terminology to some degree in order to successfully execute a search. This includes, for example, some knowledge of subject headings. In today’s culturally diverse world, word choice is increasingly diverse. Library of Congress subject headings (and other controlled vocabularies) may not adequately take this into account.
        Cochrane (2000) suggests that OPAC keyword searching should be improved. As it stands now, the burden of the search is placed completely on the user, with no assistance provided in finding synonyms or related terms. Keyword searching can be integrated with controlled vocabulary searching to overcome this difficulty.
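        One way to picture the integration of keyword and controlled vocabulary searching is a search routine that looks up the user's word in a table of lead-in terms and then also searches under the heading it points to. The Python sketch below is a minimal illustration under stated assumptions: the lead-in table, the headings, and the two catalog records are invented samples, and the matching logic is far simpler than anything a real OPAC would use.

```python
# Hypothetical lead-in terms: user's word -> subject heading used in the catalog.
LEAD_IN_TERMS = {
    "dine": "Navajo Indians",
    "navajo": "Navajo Indians",
    "movies": "Motion pictures",
    "films": "Motion pictures",
}

# Invented catalog records for the example.
CATALOG = [
    {"title": "Example songs", "subjects": ["Navajo Indians"]},
    {"title": "Example history of cinema", "subjects": ["Motion pictures"]},
]


def search(keyword, catalog, lead_in_terms):
    """Match a keyword against titles and against the heading it maps to."""
    term = keyword.lower()
    heading = lead_in_terms.get(term)  # None when the word is not a lead-in term
    hits = []
    for record in catalog:
        in_title = term in record["title"].lower()
        in_subjects = heading in record["subjects"]
        if in_title or in_subjects:
            hits.append(record["title"])
    return hits


print(search("Dine", CATALOG, LEAD_IN_TERMS))
print(search("movies", CATALOG, LEAD_IN_TERMS))
```

        Neither sample title contains the word the user typed, yet both searches succeed because the system, rather than the user, supplies the link to the heading.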
        Other authority control issues impacting the OPAC came to light during Bowman’s study (2000). Thirty-eight OPACs were searched by author using five variants of the name Leonardo da Vinci. Cross-referencing to the authorized version of the name was not found in all cases. The use of corporate names was also found to be inconsistent across the OPACs studied. This was particularly true of subordinate bodies, such as the Atomic Energy Commission. The same results were not obtained when searching under Atomic Energy Commission and U.S. Atomic Energy Commission.
        Many of these findings support Spanhoff’s (2002) contention that OPACs find, but do not gather, well. Some solutions that have been put forward to improve OPACs include: forcing users to view a definition of terms (for subject headings), so that they can be sure the heading agrees with what they are looking for; automatic spelling correction; and the inclusion of an author browse, which directs users to the authorized name (Cochrane, 2000; Bowman, 2000; Borgman, 1996; Theimer, 2002).
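        As an illustration of the spelling correction idea, the Python sketch below uses the standard library's difflib module to suggest the closest heading for a mistyped query. It is a minimal sketch, not a description of any existing OPAC: the list of headings is a small invented sample, and the similarity cutoff of 0.8 is an arbitrary assumption.

```python
import difflib

# Small sample of headings the catalog is assumed to know about.
HEADINGS = ["Dostoevsky", "Leonardo da Vinci", "Underhill", "Cataloging"]


def suggest(query, vocabulary, cutoff=0.8):
    """Return the closest known heading for a query, or None if nothing is close."""
    by_lower = {term.lower(): term for term in vocabulary}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


print(suggest("Dostoevski", HEADINGS))
print(suggest("xyzzy", HEADINGS))
```

        A suggestion of this kind could be offered to the user as a "did you mean" prompt rather than applied silently, so that the user stays in control of the search.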
        The OPAC is also being influenced by the new technologies of the internet. Because of the explosion of online materials, Medeiros (1999) recommends that libraries develop OPAC representation policies which lay out exactly what online materials will be included in the OPAC.
        Concern that the internet is replacing the library as the place that people turn to for information has prompted several authors to make suggestions for how to deal with this situation. Medeiros (1999) suggests MARC records be converted into XML, which would allow OPAC content to be searched by internet search engines. Casciato (1999) defends the design of OPACs vis-à-vis the Amazon.com model. The OPAC is objective and not commercially driven, whereas the Amazon catalog is highly subjective, allowing anyone to upload a book review. He believes that the OPAC (and the service of cataloging as performed by librarians) will remain viable because of its demonstrated authenticity and authority.
        Another future direction the OPAC could take is conversion to an online bibliography. Compared to catalogs, bibliographies are known for their greater variety of records and more detailed indexing. An online bibliography could include all the materials considered important in a particular field, with links to holdings information (of libraries chosen by the user). Such a bibliography or bibliographies could be a collaboration between many institutions. Another aspect of this concept is access to detailed holdings information. As the user finds an appropriate title, he/she could click on local holdings and discover not only whether the title can be found at the location in question, but also the call number and circulation status of the item (Buckland, 1997).
        New computer technologies have not only opened new futures for OPACs but have also spawned a new form of library, the digital library. A digital library can be defined as a “computer-based system for acquiring, storing, organizing, searching, and distributing digital materials for end user access” (Sharma and Vishwanathan, 2001). Digital libraries can be open access (freely available to all) or they can follow the restricted access model. Some digital libraries are sponsored by libraries (for example the California Digital Library, sponsored by UC, at http://www.cdlib.org/), while others are run by other organizations (such as netLibrary, a commercial venture, at http://www.netlibrary.com). Despite these differences, digital libraries face similar organizational challenges.
        Various protection issues arise in the online world. The digital library’s materials are vulnerable to viruses, and as yet no comprehensive solution has been found to this problem (Sharma and Vishwanathan, 2001). If the library is to be for authorized users only, seamless, cost-effective, controlled access systems must be incorporated into its design (Pope, 1998). The intellectual property rights of contributors must be protected. The system must be set up so that resources in the library cannot be tampered with, thus ensuring their authenticity (Pope, 1998; Sharma and Vishwanathan, 2001).
        The future requirements of the library must be considered in its design. The system must allow resources to continue to be added in a variety of formats (even in new types of format that cannot yet be anticipated). Long-term preservation of the materials must also be considered (Greenstein, 2000; Pope, 1998). This includes a consideration of how the system should be designed to accommodate archival functions, how the costs will be handled, and how decisions about which materials to keep will be made.
        Sharma and Vishwanathan (2001) raise some interesting questions regarding equity, social justice, and digital libraries. Because so many developing countries do not yet have the technology (or enough of the technology) to support digital libraries, the development of the digital library may increase the gap between the haves and have-nots. In addition, digital library collections are primarily written in the five major world languages: English, Chinese, Hindi, Russian, and Spanish. Those who speak other languages will be cut off from accessing digital library collections. Another aspect of equity that must be considered when designing digital libraries is that of access for persons with disabilities. This is a complex issue because different types of disabilities require different types of modifications (Pope, 1998).
        Another challenge facing digital libraries is that of maintenance. Traditional libraries have had centuries to work out the issues involved in maintaining their collections. Digital libraries are too new to have done this (Sharma and Vishwanathan, 2001). In addition, the maintenance needs of digital libraries may exceed those of traditional libraries (Ackerman and Fielding, 1995).
        Greenstein (2000) argues that standards and best practices should be developed. He especially calls for the development of benchmarks that would allow users to evaluate the digital library. Collection development policies that take into consideration the costs, benefits, and values of different types of resources should also be developed.
        A further challenge of digital libraries is that of deciding how to handle peer review. Pitti’s vision of a digital community (1995a) proposes a solution to this dilemma. Researchers can publish their works online, and once a work is endorsed by the relevant scholarly organization, a link can be created to that work from an online bibliography. Then anyone using the bibliography can access the peer-reviewed and approved work. The digital community proposal also includes the idea of allowing scholars private spaces in which to add annotations to collections, and then, if desired, to move these items into public spaces. This would create an informal learning and research community. Although digital libraries face many organizational challenges, they possess substantial potential for reaching a wide audience and for transforming the world of information.
        The process of cataloging has been discussed from the viewpoint of the trends and challenges surrounding it. The nature of the catalog is under question, with recommendations on how to transform it to better conform to the digital age. The new internet technologies are also affecting how surrogate records are created, in terms of both encoding and metadata standards. Despite changes due to technology, many age-old cataloging controversies remain, such as which cataloging theory to embrace and whether to practice broad or close classification. Enshrined practices, such as authority control, have come under fire while at the same time garnering support. The digital library made possible by the internet faces its own unique challenges. As can be seen, then, the field of cataloging, while preserving many of its traditions, is also in the process of embracing many exciting changes.

  

     

References

Ackerman, MS & Fielding, RT (1995) Collection maintenance in the digital library. In Proceedings of Digital Libraries, Austin, TX pg 38-48. Retrieved 11/25/2003 from <http://csdl.tamu.edu/DL95/papers/ackerman/ackerman.html>

Aschmann, Althea. (2003) The lowdown on vendor supplied authority control at Virginia Tech. Technical Services Quarterly 20(3):33-44.

Ayres, F.H. (2001) Authority control simply does not work. Cataloging and Classification Quarterly 32(2):49-59.

Borgman, C (1996) Why are online catalogs still hard to use? Journal of the American Society for Information Science 47(7):493-503.

Bowman, J.H. (2000) The catalog as barrier to retrieval Part II: forms of name. Cataloging and Classification Quarterly 30(4): 51-73.

Buckland, M. (1997) Bibliographic access reconsidered. In Redesigning library services: a manifesto. Accessed 12/1/2003 at  <http://sunsite.berkeley.edu/Literature/Library/Redesigning/bibaccess.html>

Casciato, DC (1999) Authority and objectivity in a time of transformative growth: the future of the library catalog. Library Computing 18(4):295-300.

Cochrane, Pauline. (2000) Improving LCSH for use in online catalogs: what progress has been made? What issues remain? Cataloging and Classification Quarterly 29(1-2):73-89. 

Cutter, Charles A. (1904) Rules for a dictionary catalog, 4th Edition.  Washington: Government Printing Office. 

Greenstein, Daniel (2000) Digital libraries and their challenges. Library Trends 49(2): 290-303. 

Jackson, Richard V. (2003) Authority control is alive and.... well?  OLA Quarterly 9(1):9-12. 

Jackson, Sidney L. (1958) Catalog Use Study. Chicago: ALA.

Jeng, Ling Hwey. (2002) What authority? What control? Cataloging and Classification Quarterly. 34(4):91-7

Kokabi, Mortaza (1996) Is the future of MARC assured? Library Review 45(2): 68-73.

Mayes, Bessie (2003) Beyond MARC: New trends for the library of the future. OLA Quarterly 9(1): 2-4.

Medeiros, Norm (1999) Driving with eyes closed: the perils of traditional catalogs and cataloging in the internet age. Library Computing 18(4):300-306.

Merrill, William S. (1912) A code for classifiers: its scope and problems. Library Journal, May 1912: 245-310.

Pitti, Daniel (1995a) Settling the digital frontier: the future of scholarly communication in the humanities. Accessed 12/1/2003 at <http://sunsites.berkeley.edu/FindingAids/EAD/dpitti.html>

Pitti, Daniel (1995b) SGML and the transformation of cataloging. The Serials Librarian 25 (3-4): 243-253.

Osborn, Andrew (1941) The crisis in cataloging. Library Quarterly 11(4): 393-411.

Pope, Nolan (1998) Digital libraries: future potentials and challenges. Library Hi Tech 16:3-4:147-155.

Sharma, R.K. & K.R. Vishwanathan (2001) Digital libraries: development and challenges. Library Review 50(1): 10-15.

Spanhoff, Elizabeth de Rijk (2002) Cataloging paradigms: old and new. Cataloging and Classification Quarterly 35(1-2): 37-59.

Taylor, Arlene (1999) The organization of information. Englewood, CO: Libraries Unlimited.

Tennant, R (2002a) MARC exit strategies. Library Journal 127(19):27-8.

Tennant, R (2002b) MARC must die. Library Journal 127(17):26-28.

Theimer, Sarah (2002) When a 21st century user meets a 20th century OPAC: how word choice impacts search success. PNLA Quarterly 66(3): 11-25.