“Digitizing and organizing information in Libraries, an overview.”
By: Samer Alshawwa
IRLS 501 Final Project
I give permission for my work to be published in the SIRLS LIS Learning Showcase.
Introduction:
In this project I intend to present an overview of information organization in libraries through the digitization of catalogs and metadata. The information I include in this project is influenced by the reading I have been doing. This project is more an overview of the need for digital libraries and information organization than a critique. Given my humble and short exposure to library science, I will attempt to cover the different elements of the process of organizing information from an information professional's point of view. Furthermore, I will include other suggested methods of preserving catalogs and information in libraries. I hope this paper will be of benefit to others.
_________________________________________________________________
In many
respects the invention of the MARC record and related standards has
been the most important event in librarianship and bibliography
since the Library of Congress began its catalog card distribution
service early in this century. It has enabled the creation of
immense multinational bibliographic databases for scholars and
researchers; it has allowed libraries to make use of automated
support for most basic library functions, such as cataloging,
acquisitions, and online public access catalogs. It proved the
value of standard protocols and content guidelines in promoting the
sharing and processing of information. And it put libraries,
archives, and others at the forefront of the electronic
information revolution.
But, in most respects,
libraries are no longer on the forefront of that revolution.
The electronic information environment has exploded outside of
libraries in ways that we are all too familiar with, yet whose dimensions
we are incapable of really understanding. New information
technology is now happening, for the most part, in the burgeoning
private information industry, in computer science departments, and
in scientific research centers.
That this has happened is
of course overwhelmingly positive. Looked at one way, librarians
will no longer need to continue to invent all our own standards and
protocols and database systems from scratch.
Better-capitalized and far more innovative groups are now taking
care of that for us. How true this is can be illustrated
by considering the database search and retrieval systems that
libraries and the undercapitalized library automation industry have
created for us and our patrons. We have been working on our
library public access catalogs for over twenty years now, and what
search and retrieval techniques have we implemented?
Implemented search and retrieval techniques:
Simple author searching
Simple title searching
Combined partial author-title searching
Simple subject searching
Keyword-Boolean searching
Yes, after twenty years,
the major new retrieval technique we've made available to users of
library catalogs, at least, is keyword-Boolean searching. One
might say that, since much library cataloging data has the great
advantage of controlled name and subject vocabularies created at great
expense, little more in the way of retrieval technology was needed.
Unfortunately, there has
been ample evidence in the literature and in practice
that we have not made it nearly easy enough in our online systems to
use our own subject thesauri and classification schemes; nor have
we created the functionality that would truly allow data to be used
interactively to excavate all the "intelligence" we have
built into our databases.
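To make the idea of keyword-Boolean searching concrete, the following is a minimal sketch in Python, not a description of any actual catalog system. The sample records, the field names, and the functions build_index and search are all invented for illustration. The sketch builds a keyword index over author, title, and subject fields and then combines terms with AND, OR, and NOT.

```python
from collections import defaultdict

# Hypothetical sample records; authors, titles and subjects are invented.
RECORDS = {
    1: {"author": "Smith, Jane", "title": "Digital libraries and metadata",
        "subject": "Digital libraries"},
    2: {"author": "Jones, Alan", "title": "Cataloging rules in practice",
        "subject": "Cataloging"},
    3: {"author": "Smith, Jane", "title": "Metadata for cataloging",
        "subject": "Metadata"},
}


def build_index(records):
    """Map each lowercase keyword to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for field in ("author", "title", "subject"):
            for word in rec[field].lower().replace(",", " ").split():
                index[word].add(rec_id)
    return index


def search(index, all_ids, must=(), any_of=(), must_not=()):
    """AND the `must` terms, OR the `any_of` terms, then remove `must_not`."""
    result = set(all_ids)
    for term in must:
        result &= index.get(term.lower(), set())
    if any_of:
        result &= set().union(*(index.get(t.lower(), set()) for t in any_of))
    for term in must_not:
        result -= index.get(term.lower(), set())
    return sorted(result)


INDEX = build_index(RECORDS)
# "metadata AND NOT digital"
print(search(INDEX, RECORDS, must=["metadata"], must_not=["digital"]))  # → [3]
```

Note that the sketch treats the subject field as just another bag of words. That is exactly the limitation described above: the controlled vocabulary is present in the data, but the search does nothing special with it.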
The purpose of a digital library is to
provide coherent organization and convenient access to typically large amounts
of digital information. The following principles provide working definitions of
a digital library from both a conceptual and a practical standpoint:
A digital library is an integrated set of
services for capturing, cataloging, storing, searching, protecting, and
retrieving information.
Digital library services bring order where
data floods and information mismanagement have caused much critical information
to be incoherent, unavailable, or lost. Digital library architecture emphasizes
organization, acquisition, preservation, and utilization of information.
Digital library systems are realizations of an architecture in a specific
hardware, networking, and software situation.
Digital library systems compose a family of
automated systems that together provide a comprehensive capability to manage
the digital content of an enterprise. It is useful to divide the capabilities
of digital library systems into the following areas:
capture or creation of content, indexing and
cataloging (metadata), storage, search and query, asset and property rights
protection, and retrieval and distribution.
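The capability areas listed above can be pictured as the operations of a single service. The sketch below is only an illustration under simplifying assumptions: everything is held in memory, rights protection is reduced to one flag, and the names Item and DigitalLibrary and their methods are hypothetical rather than taken from any real system.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    item_id: str
    content: bytes
    metadata: dict = field(default_factory=dict)
    restricted: bool = False


class DigitalLibrary:
    """Toy in-memory model of the six capability areas."""

    def __init__(self):
        self._store = {}  # storage: item_id -> Item

    def capture(self, item_id, content):
        """Capture content and place it in storage."""
        item = Item(item_id, content)
        self._store[item_id] = item
        return item

    def catalog(self, item_id, **metadata):
        """Attach descriptive metadata to a stored item."""
        self._store[item_id].metadata.update(metadata)

    def protect(self, item_id, restricted=True):
        """Mark an item as rights-restricted (a stand-in for real policy)."""
        self._store[item_id].restricted = restricted

    def search(self, **criteria):
        """Return ids of items whose metadata matches every criterion."""
        return sorted(
            item.item_id
            for item in self._store.values()
            if all(item.metadata.get(k) == v for k, v in criteria.items())
        )

    def retrieve(self, item_id, authorized=False):
        """Deliver content, refusing restricted items to unauthorized users."""
        item = self._store[item_id]
        if item.restricted and not authorized:
            raise PermissionError(f"{item_id} is restricted")
        return item.content


lib = DigitalLibrary()
lib.capture("map-001", b"scanned map bytes")
lib.catalog("map-001", title="Campus map", format="image")
lib.protect("map-001")
print(lib.search(format="image"))  # → ['map-001']
```

Retrieval of "map-001" succeeds only when the caller passes authorized=True, which shows in miniature how protection sits between search and delivery.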
Content exists in multiple sizes, formats,
and media, each with accompanying technical challenges. Content may be
structured or unstructured. It may have exact, precise meaning; or it may be
fundamentally ambiguous. Content may directly or indirectly support a business
process or function.
A digital library architecture shows how
capabilities are realized and related, and does this at several levels. Digital
library architectures show how business processes or functions are enhanced;
they show how technology components fit together and how, in detail, components
interoperate with each other.
Such functions and relationships, when
reduced to a particular software and hardware implementation, lead to
operational digital library systems.
Digital library functions, insofar as they
purport to organize information, may be compared with traditional library
functions. Consider digitization, which technically is the conversion of analog
to digital formats. A common human artifact, such as a bound book, loses value
when simply scanned into bits. In a library context, where organization,
access, protection, and preservation are important business functions, digitization
technologies are starting points for a complicated set of computational
processes that in the first instance reconstruct the cultural, conventional,
and intuitive significance, structure, and external relationships that defined
the original artifact. Additionally, digitization and other processes may be
able to add value and support certain fiduciary responsibilities that resemble
functions of traditional libraries.
In a similar way, other core capabilities of
traditional libraries can be transposed to the digital domain. Cataloging is
transposed to the generation of metadata, and is an area where much work needs
to be done to develop automated, multidimensional indexing and cataloging
procedures. Just as the public card catalog is a gateway to the holdings of a
conventional library, search of content and metadata is the gateway to a
digital library. Circulation in a conventional library transposes to network
access, retrieval and delivery.
The fiduciary responsibilities of traditional
libraries are related to issues of copyright protection and intellectual
property rights. The table below relates digital library capabilities to
well-known capabilities of traditional libraries. The point is that traditional
libraries have established uniform business processes and highly interoperable
data formats which support especially bibliographic catalogs, item ordering,
and interlibrary loan. Although many of these procedures pre-date
"digital" libraries, digital library design can benefit from the
comparisons.
Comparison of Digital and Traditional Library Capabilities

| Digital Library Capability | Traditional Library Capability |
| --- | --- |
| Capture | Acquisitions and collection development |
| Catalog and Index | Cataloging rules and bibliographic control |
| Store | Stacks, inventory management and shelf lists |
| Search | Public card catalog |
| Protect | Patron privileges and circulation rules consistent with public law and policy |
| Retrieve | Loan management and interlibrary loans |

Having made these comparisons, it must be
emphasized that in the United States the digital library is not regarded as a
technology related to library automation or the provision of integrated library
systems for operating traditional libraries.
Library functions
in the online catalog are now integrated. This linking of library software for
different processes, such as circulation, cataloging, the catalog, and other
databases, is in turn dependent on the connectivity of hardware: hard drives,
CD-ROM drives, local area networks, and the Internet.
Integration of Technical Services.
In earlier systems,
the integration of the circulation function within the catalog was considered an
innovation. School library patrons in the '90s were quite familiar with the
automated catalogs they used in both school and public libraries where they may
handle their own check-ins, check-outs, and reserves. Cataloging used to be
handled off-line, but it has become more common to create a bibliographic
record within the catalog system, or to search for it on a CD-ROM database or a
Web site and download it to the local catalog. The Library of Congress, state
networks (Texas Library Connection, Florida SUNLINK, etc.) and bibliographic
networks such as OCLC are some of the MARC (Machine Readable Cataloging)
databases that are available on the Internet. The online state networks will
also offer interlibrary loan (ILL) services. Acquisitions (orders) and serials
may also be integrated into the school's management system. The streamlining of
these technical services translates into improved access to information for the
library's clients.
Integration of Public Services.
More visible to the
catalog user than the integration of enhanced technical services is the
integration of reference databases and the OPAC. The OPAC serves as an index, a
gateway to full-text information. The citation in the catalog leads to a
book-in-hand available in the local library or from another library via ILL.
More and more high school libraries are adding reference databases to the
catalog menu, accessed from a CD-ROM database or from the Internet. The
references may be periodical indexes or full-text databases, encyclopedias, or special
references. In the most recent developments, school libraries are putting their
catalogs on the Web so that users can access them from any location. School
media specialists are also cataloging Web sites and creating direct links from the
bibliographic record (MARC tag 856) to the Web site address or URL.
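The link from a bibliographic record to a Web site can be sketched as follows. This is a simplified illustration, not a real MARC parser: the record is represented as a plain list of (tag, subfields) pairs, the sample title and URL are invented, and the function electronic_locations is hypothetical. In MARC, field 856 holds the electronic location, with the address in subfield u.

```python
# Simplified, hypothetical stand-in for a MARC record: a list of
# (tag, subfields) pairs. Real records also carry a leader and indicators.
record = [
    ("245", {"a": "Example Web resource"}),           # title statement
    ("856", {"u": "http://www.example.org/resource",  # address of the resource
             "z": "Connect to Web site"}),            # public note
]


def electronic_locations(fields):
    """Return the address in subfield u of every 856 field."""
    return [
        subfields["u"]
        for tag, subfields in fields
        if tag == "856" and "u" in subfields
    ]


print(electronic_locations(record))  # → ['http://www.example.org/resource']
```

A catalog display can turn each returned address into a clickable link, which is how the user moves from the citation to the Web site itself.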
Networked Hardware and Software.
Early microcomputer
systems were limited to stand-alone circulation modules, and while it is true
that some school libraries have not progressed beyond that stage, or are not
automated at all, the majority has several kinds of systems. A biennial survey
of School Library Journal subscribers reports the following figures for 1995-96: online
catalog, 60 percent; circulation, 77 percent; LAN, 66 percent;
telecommunications, Internet, and e-mail, 62 percent; and CD-ROM, 84 percent.
These technologies support the integrated functions just described.
Windows and UNIX
programs are becoming more evident in the school library environment. Mac
systems are still favored by a minority, but these different platforms can
co-exist in a network. As more schools have realized that the future is in
telecommunications, libraries have been included (or sometimes led the race) in
making plans for network development and Internet access.
User Interface.
Client-server
architecture and interface standards like Z39.50 make it possible for
microcomputer systems to access remote databases in the ways previously
described. Workstations in the local library will now have a GUI (Graphical
User Interface) that replaces most text commands. The elements of this
interface will include colorful screens with scroll bars, pop-up windows,
point-and-click menus, hot buttons for special reading lists or resources, and
maps of the library. Different command languages are offered; searching options
include browsing as well as keyword and Boolean. These bells and whistles are
being put to the test of facilitating the search process.
Searching the Catalog.
The first
generation of OPAC development was characterized by a card catalog model with
some information system features. The second generation, which describes
current technology, improved the user interface in the ways just described.
Nonetheless, users are still running into some of the same searching problems
encountered in previous systems. These problems include difficulties in
spelling or keying in search terms, understanding commands, finding or
modifying search terms, using suitable search strategies, getting feedback, and
interpreting displays.
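One of the problems named above, difficulty in spelling or keying in search terms, can be eased by offering the user close matches from the catalog's own vocabulary. The sketch below is one possible approach, not a feature of any particular OPAC. It uses Python's standard difflib module; the list of subject headings and the function suggest are invented for illustration.

```python
import difflib

# Hypothetical list of subject headings known to the catalog.
SUBJECT_HEADINGS = ["dinosaurs", "volcanoes", "astronomy",
                    "photography", "mythology"]


def suggest(term, vocabulary, limit=3):
    """Return the term itself if it is known, else close spelling matches."""
    term = term.strip().lower()
    if term in vocabulary:
        return [term]
    # cutoff=0.6 is difflib's default similarity threshold (0 to 1).
    return difflib.get_close_matches(term, vocabulary, n=limit, cutoff=0.6)


print(suggest("dinasaurs", SUBJECT_HEADINGS))  # → ['dinosaurs']
```

This gives the searcher feedback such as "Did you mean dinosaurs?" instead of an empty result screen, although it does nothing for the harder problem of choosing a suitable search strategy.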
The Research.
Christine Borgman,
one of the major researchers in children's use of online catalogs, states that
catalogs are still hard to use because they operate as if the user has a fixed
information goal represented by an appropriate query. What really happens,
Borgman says, is that users formulate questions in stages and only gradually
come to the point where they can articulate a query. It was found that children
break down in searching the catalog according to their previous experiences,
and that developing search strategies is more difficult for them than learning
keying and commands. Borgman and others discovered that browsing is easier than
keyword searching.
There seems to be
general consensus among researchers that there needs to be more study of
user-information interaction. While most observers expect systems to become
more intuitive as the technology advances, it will be some time before such
systems are widely available. This suggests an important role for all
librarians to train users to search the catalog. It is not enough to teach
technical skills such as keying or semantic skills such as understanding
commands. The emphasis must be on teaching the information-seeking process.
Why are Libraries Digitizing?
One reason is space. It may take less space to store collections
electronically, but the costs are high. Unlike off-site storage, you can't walk
away and come back in thirty years and expect to be able to read your converted
books. The infrastructure to migrate electronic documents reliably is not in
place and the costs and risks are high. Another reason is that everyone
else is doing it. In an attempt to be able to say they are creating digital
collections some libraries are undertaking conversion projects without understanding
the resources it takes and without careful analysis in their choice of
collections. Developing internal expertise by carrying out exploratory
conversion projects can bring definite benefits to a library, but if this is
done without fairly broad-based institutional consideration and buy-in on the
decision of what collections to digitize, the drain of money and professional
time in such projects could easily derail other important programs.
There are better and more logical reasons for going digital. One is that electronic
access is a big part of our future. The Internet is remaking higher
education, as well as scholarly culture and communication. Libraries are
uniquely placed to participate in shaping that future so that it serves the
best interests of research and instruction. Another reason is access.
Electronic access is in many ways an improvement. Virtual collections can pull
together disparate and large collections that couldn't be physically viewed at
one time and place. The ability to tap image databases and to integrate text
and images will enrich scholarship. Electronic journals with links to citations
offer efficiency. Conventional scholarly research will be enhanced by
electronic access to media collections. These materials, which have always been
difficult to access, can now be incorporated in research publications and
easily exchanged between scholars. Yet another reason is information
organization. Digital surrogates minimize handling of fragile materials,
but the imaging process is demanding and must be done with oversight by library
staff and with a high enough level of quality to ensure the reusability of the
archival electronic file for as long as possible. Another good reason for going
digital is new scholarly tools. While full text databases are not new,
image databases are an exciting application of electronic access. They draw
together images of different formats: objects, models, plans, in addition to
conventional images such as photographs and drawings, allowing scholars to
reference a broad spectrum of visual materials. Furthermore, the ability to
combine multimedia sources with print creates a different aesthetic and
intellectual experience. We are still in the infancy of electronic delivery,
but as the quantity and quality of electronic resources grow, we can expect to
see innovative applications and new ways of utilizing research materials.
There are currently many library departments
with an interest in managing digital conversion projects: systems departments,
academic computing units, and special collections. Each brings a different and
relevant form of expertise. We often hear that librarians shouldn't be doing
imaging because electronic files are not sufficiently archival to warrant
inclusion in the arsenal of preservation techniques. And this is currently the
case. In limited instances, however, it may be legitimate to think of digital
conversion as preservation. One such instance might be a black and white photo
collection of a non-unique nature, which is rapidly deteriorating and for which
sufficient funds for traditional film duplication do not exist. In this case,
the choice is between some loss of information, plus the risk of uncertain
future maintenance, vs. certain loss. Far more frequently offered as a reason
for information organization staff's involvement is the belief that creating a
digital surrogate will reduce use of the original. Yet there is reason to
believe that the increased awareness of the items from their presence on the
web will lead to increased serious scholarly interest and a need to handle the
original.
Preserving Information
A more promising basis for information
organization's immediate involvement is "Preservation," the
intertwining of traditional microfilming with digitizing. There is still active
debate in the professional community concerning whether to scan or film first
but the technique allows for the best of both worlds. We can continue to use
microfilm as a long-lasting, low-maintenance archival format that can be converted
to digital format as needed, by either the institution or a scholar onsite.
Digitizing is systematically related to
microfilming, involving similar skills and workflow structures. Preservation professionals
have done an excellent job of developing the field of microfilm to a high
standard. They have imported and developed standards and guidelines to produce
a well documented process, and they are beginning to do the same with
digitizing.
Digitizing obviously involves many legitimate
digital information organization and Preservation issues: decision-making for
repair and the actual repair prior to scanning, handling and transport to and
through the scanning operation, the environmental concerns of the digital
capture location and process, and the specifications and handling of the
electronic surrogates to minimize the need for future scanning.
Digitizing creates both managerial and
ethical choices. If it is not balanced with needs introduced by use, brittle
collections, and exhibition, it may consume resources intended for conservation
treatment. How much treatment should you give something before imaging? How
much effort should you expend in editing an image as opposed to treating the
original?
So far, the focus is primarily on conversion
of paper-based collections to electronic forms. Soon the archiving of
electronic documents and collections will become an information organization
concern. Running digital conversion programs is an excellent way to become
familiar with the technology and issues.
A final, and not the least significant,
reason to involve information organization in digital conversion is the
changing nature of research libraries and their priorities. If, as a field, we
are not actively involved in the central issues of libraries, we risk becoming
irrelevant. We are trained to evaluate the ways in which scholars use materials
and to ensure that the necessities of collecting, arranging, and describing
those materials do not damage or destroy the qualities that scholars find
critical to their work. Increasingly those research materials will be
electronic.
Libraries today also face several new realities. One is Downsizing. Changes in government funding of
universities dictate that in many institutions all new initiatives (for
everyone, not just libraries) must be funded from current budgets. It's not
that many libraries are doing things which are unrelated to their mission; it's
that we have to say which of these perfectly legitimate things we are not going
to do anymore, or will do less of, so that we can add new services and programs
to meet the needs of researchers.
Another reality is Outsourcing. This is not completely new to libraries,
but all traditional services are being systematically considered for
contracting out. Another library reality today is Operational efficiency.
Library processes are increasingly being reevaluated and traditional work flows
are being altered to streamline activity and reduce the number of people who
need to "touch" an item. As a result, staff must become familiar with
related areas of technical processing outside their own department or
specialty. Enterprise is another reality. As universities look to replace
the funding no longer available from governmental sources and since endowments
and tuition revenues are not adequate to close the gap, university officials
are looking at programs to produce income. Library directors will be subject to
this pressure as well. Money secured by a unit will not necessarily benefit the
unit or even the libraries, but may simply fill a gap or deficit in a lowered
operating budget.
Another reality is changing priorities. In the last thirty years each decade has
seen a different area of librarianship capture interest and available funding:
cataloging, then information organization, then digitization. It is very likely
that information organization digitization will gain the support needed. Change
is also a given. The rapid pace of change means the most important professional
skill to acquire is learning how to learn.
Lack of resources is a reality in today’s libraries. Along with
decreased funding, libraries must cope with increased serial costs, digital
conversion costs, and acquisition of both traditional and new electronic
collections. All library functions will be continuously evaluated for cost
savings and relevance to service. Shedding stereotypes is another: libraries
are working very hard to shed their traditional image of conservatism. While it
is important to maintain our reputation of reliability, there is pressure to be
seen as innovative by the university community. Increasingly, willingness to
meet the needs of the community and a "can-do" attitude are seen as
more important than the traditional concerns of the library profession.
Traditional cataloging is one area that has been in conflict with the need to
increase efficiencies in cataloging, giving rise to simplified catalog records.
Bibliography:
1. Given, Lisa M.; Olson, Hope A. Knowledge organization in research: A conceptual model for organizing data. Library & Information Science Research, v. 25, no. 2 (2003), p. 157-76. ISSN: 0740-8188.
2. Miller, Dick R. Putting XML to work in the library: tools for improving access and management.
3. Stockwell, Foster. A history of information storage and retrieval.
4. Hsieh-Yee, I. Cataloging and metadata education: asserting a central role in information organization. Cataloging and Classification Quarterly, 34 (1/2) 2002, p. 203-22.
5. Carter, R. C. Managing cataloging and the organization of information: philosophies, practices and challenges at the onset of the 21st century. Catalogue and Index, (144) Spring 2002, p. 15-16.
6. Rekha, G.; Parameswaran, M. 'Knowledge Organization', 1988-1999: a bibliometric analysis. SRELS Journal of Information Management, 39 (4) Dec 2002, p. 355-62.
7. Carter, R. C. Managing cataloging and the organization of information: philosophies, practices and challenges at the onset of the 21st century. Journal of Internet Cataloging, 5 (2) 2002, p. 63-6.
8. Carter, R. C. Managing cataloging and the organization of information: philosophies, practices and challenges at the onset of the 21st century. Technicalities, 22 (1) Jan/Feb 2002, p. 12-13.
9. Schwarzwalder, R. The implementation of information technology in the corporate engineering library. Science and Technology Libraries, 19 (3/4) 2001, p. 189-205.
10. Chen, K-h. The design for authority-control systems in digital libraries. [In Chinese] Bulletin of Library and Information Science, (34) Aug 2000, p. 51-71.
11. Merrill. A Code for Classifiers: Its scope and problems.
13. Pitti, D. Settling the Digital Frontier: The future of scholarly communication in the Humanities. URL: <http://sunsite.berkeley.edu/FindingAids/EAD.dpitti.html>
14. Buckland, M. Bibliographic Access Reconsidered. In Redesigning Library Services: A Manifesto. URL: <http://sunsite.berkeley.edu/Literature/Library/Redesigning/bibaccess.html>