I give permission for my final project to be made available through the LIS Learning Showcase web server.

Bibliometric Analysis:
Computers & Geosciences, Karl Stattegger, Keywords in GIS, and Bibliometrics/Webmetrics


Trevor Smith
May 7, 2004


Final Project
IRLS 589 -- Scholarly Communication
Dr. A. Coleman
School of Information Resources and Library Science
University of Arizona, Tucson, AZ



Introduction

    Citation analysis can provide valuable insights about the social and informational structure of science. Eugene Garfield was not the first to undertake citation indexing, but his vision and perseverance created ISI’s vast “Web of Science”, enabling relatively easy interdisciplinary bibliometrics. (Cronin & Atkins, 2000) This work details a brief bibliometric analysis of Computers & Geosciences, a journal in the field of GIS.  From the list of most frequently cited C&G authors, Karl Stattegger was selected to be the subject of a personal bibliometric profile.  Finally, two essays have been included: the first on keywords in C&G and their relationship to either Geographic Information Science or Geographic Information Systems, the second on the usefulness of bibliometrics and the future of webmetrics.

Computers & Geosciences: the journal

    The journal Computers & Geosciences (C&G) is a print publication with several online components.  At the journal's web site, http://www.elsevier.com/locate/cageo, tables of contents and abstracts are available for all issues, and a sample issue is available without a subscription.  Full text articles are available through Science Direct for all issues.  Additionally, programs and data sets featured in the articles can be downloaded from http://www.iamg.org.

    During the first year of C&G publication in 1975, it was produced quarterly.  Now it is published ten times per year.  The changes in frequency are described in the table below:

Year(s)         Frequency
1975-1984       Quarterly
1985            6 issues/year
1986-1990       8 issues/year
1991-Present    10 issues/year
Table 1
Changes in Publication Frequency


     There have been two statements of the journal's aims since its inception.  The initial one described the publication in 1975, "An international journal devoted to the rapid publication of computer programs in widely used languages and their applications."  Then, in 1996, the aim was changed to read, "An international journal devoted to the publication of papers on all aspects of geocomputation and to the distribution of computer programs and data sets."  1996 was also the year the founding Editor, D. F. Merriam of the University of Kansas, stepped down to allow G. F. Bonham-Carter of the Geological Survey of Canada to be the sole Editor-in-Chief.  The international aspect of the journal is further emphasized by noting that the publishers, originally Pergamon Press then later Elsevier, are based in the United Kingdom.

     From 1975-1995 entire computer program listings and data sets were published in C&G, with the readers presumably left to do a lot of typing (or even key-punching).  As of 1996, only "short programs, subroutines and pseudocode are printed", with the rest being available via anonymous ftp from the sponsoring organization, The International Association of Mathematical Geology (IAMG).  While papers on computer methods and theory still occupy a large portion of the journal, it also publishes full length research papers on many topics under the broad banner of "geoscience", short notes, book and software reviews, letters to the editor and a regular column dealing with Internet issues.

    C&G absorbed the journal, COGS computer contributions, the technical journal of the Computer Oriented Geological Society, in 1990.

General Statistics

    The overview that follows is culled from statistics about C&G from January, 1998, through December of 2002.  Citations were retrieved from ISI's Web of Science database for that period as well as ISI's Journal Citation Reports.  The following figure describes the distribution of types of documents published in C&G:

Type of Document            Frequency 1998-2002
Article                     550
Bibliography                1
Biographical-Item           4
Correction                  2
Correction, Addition        0
Database Review             0
Editorial Material          34
Hardware Review             0
Item About an Individual    0
Letter                      4
Note                        0
Review                      1
Software Review             6
TOTAL                       602
Table 2
Frequency Distribution of Documents

In keeping with the primary mission of the journal, to "publish papers on all aspects of geocomputation," over 91.3% of the documents published in C&G during the sampled time period are scholarly articles.  Document distribution is skewed enough toward articles that generalizable conclusions will likely be focused on this category.

    Citivity is another important measure of journal productivity.  The following table lists, by year, the number of documents and the total, mean, and median number of references per document:

Year    Total Documents    Total References    Mean References    Median References
1998    119                1678                14.10              12
1999    114                1813                15.90              14
2000    121                1905                15.74              11
2001    126                2410                19.13              16
2002    122                2325                19.07              17
Table 3
Citivity

In 1999 the total number of documents was at its lowest level of the studied time frame, but the mean and median number of references were in line with other years.  There is no clear increase or decrease in the number of references per document.  Were we to observe a significant trend in one direction, it might provide an indication of the authors' integration into their scientific community with emphasis on collegial "trust" as described in "The Citation Network as a Prototype for Representing Trust in Virtual Environments." (Davenport & Blaise, 2000)

     The following information is taken from ISI-JCR for Computers & Geosciences.  The Impact Factor is the frequency with which the 'average' article in the journal has been cited in the given year; the Immediacy Index describes how quickly the 'average' article is cited.

Figure 1
Impact and Immediacy (ISI-JCR Impact Factor and Immediacy Index)

Something appears to be amiss in 1999.  The Immediacy Index remains relatively constant throughout all five years, as does the Impact Factor, except in that one year.  The graph above shows a significant discontinuity.  To calculate the Impact Factor for a given year, the number of citations received in that year by articles published in the previous two years is divided by the number of articles published in those two years.  Does this mean that for one year out of five, the impact of C&G was reduced to a quarter of the norm?

    Sociological and statistical factors can affect the measurement of impact.  Sociological effects could be in the form of the subject area of the journal, the type of documents published, and the average number of authors per paper, none of which changed between 1998 and 2000.  Statistical influences include the number of issues and articles being measured as well as the duration of the measurement. (Amin & Mabe, 2000)  C&G published 10 issues with 114 documents in 1999, as mentioned above, fewer than the five-year average of about 120.  1999 had nearly 10% fewer documents than the most prolific year in the study, 2001 (114 versus 126).  In the small statistical universe of 1999 C&G articles, this may help explain the drop in Impact Factor.  By definition, the Impact Factor always counts a single year of citations; for many journals this may be a small enough number to cause some statistical fluctuations.
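
    The two measures can be written out as a minimal sketch.  The function names are illustrative, the document counts come from Table 3 (which includes all document types, whereas ISI counts only citable items), and the citation counts are hypothetical values chosen only to show how sensitive the ratio is when the denominator is small.

```python
def impact_factor(citations_this_year, items_prev_two_years):
    """Citations received in year Y by items published in years Y-1 and Y-2,
    divided by the number of items published in Y-1 and Y-2."""
    if items_prev_two_years <= 0:
        raise ValueError("need at least one published item")
    return citations_this_year / items_prev_two_years


def immediacy_index(citations_same_year, items_same_year):
    """Citations received in year Y by items published in year Y,
    divided by the number of items published in year Y."""
    if items_same_year <= 0:
        raise ValueError("need at least one published item")
    return citations_same_year / items_same_year


# Document counts for 1998 and 1999 from Table 3 (an approximation of
# "citable items", since the table includes every document type).
items_1998_1999 = 119 + 114

# Hypothetical citation counts, not taken from ISI-JCR.
for citations in (100, 130, 160):
    print(citations, round(impact_factor(citations, items_1998_1999), 3))
```

    With a denominator of only a few hundred items, a swing of a few dozen citations moves the ratio noticeably, which is consistent with the statistical explanation offered above.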
 
       Based on the data from ISI-WOS, the following table lists the top 5 most cited authors in C&G during 1998-2002:

Author                       Times Cited    # Docs    # Articles
Schulz, M                    84             2         2
Stattegger, K                77             1         1
Wesseling, CG; Pebsma, EJ    49             1         1
Schweiger, AJ; Key, JR       36             1         1
Yavuz, F                     35             8         8
Table 4
Top Five Most Cited Authors

This data was collected from ISI-WOS and downloaded in groups of two and three year sets. Individual downloaded files were concatenated to make a master 5 year file.  Two concerns regarding the information were, first, finding an automated way to add up the citations from multiple papers, and second, determining how to credit individually the co-authors of a paper.
   
       The solution was to write a Perl script, cit_counter, that reads through the data file, breaks apart multiple authors, and credits each of them with the citations; it creates a cumulative citation total by author from both individual and group contributions and tracks the number of documents and research articles.  The output is sorted alphabetically to bring attention to duplicate names and other possible problems.  Normalization of the data was a necessary step: many older articles in WOS listed the author's name in capital letters (compensated for in cit_counter).  Also, inconsistent spelling and use of initials created potential issues (the editor, G. F. Bonham-Carter, had four different representations).  After the data was as clean as possible, the Unix sort command was used to bring the most frequently cited authors to the top of the list.
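
    The counting logic described above can be sketched as follows.  This is not the original cit_counter script (which was written in Perl and is not reproduced here) but a Python illustration of the same idea.  The record layout (an author field separated by semicolons, a document type, and a times-cited count), the function names, and the sample authors are all assumptions made for the example.

```python
from collections import defaultdict


def normalize_author(name):
    """Reduce case and punctuation differences: 'DOE, J.' -> 'Doe, J'."""
    surname, _, initials = name.strip().partition(",")
    surname = " ".join(surname.split()).title()
    initials = "".join(ch for ch in initials if ch.isalpha()).upper()
    return f"{surname}, {initials}" if initials else surname


def count_citations(records):
    """Credit every co-author of a document with its full citation count."""
    totals = defaultdict(lambda: {"cited": 0, "docs": 0, "articles": 0})
    for authors_field, doc_type, times_cited in records:
        # A set avoids double-crediting a name repeated within one record.
        authors = {normalize_author(a) for a in authors_field.split(";") if a.strip()}
        for author in authors:
            entry = totals[author]
            entry["cited"] += times_cited
            entry["docs"] += 1
            if doc_type == "Article":
                entry["articles"] += 1
    return dict(totals)


def most_cited(totals, n=5):
    """Sort by citation total (descending), breaking ties alphabetically."""
    return sorted(totals.items(), key=lambda kv: (-kv[1]["cited"], kv[0]))[:n]


# Hypothetical records; the names and counts are invented for illustration.
records = [
    ("DOE, J; ROE, R", "Article", 40),
    ("Doe, J.", "Article", 5),
    ("Poe, A", "Software Review", 3),
]

totals = count_citations(records)
for author in sorted(totals):  # alphabetical pass to spot duplicate names
    print(author, totals[author])
print(most_cited(totals, 2))
```

    Normalization of this kind only handles case and punctuation; variant spellings or differing initials for the same person would still need to be reconciled by hand, as they were in the study.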

Collaborative Analysis

    The number of authors per document is described by Figure 2:

Figure 2
Documents by Number of Authors

Scientific collaboration can be examined by the number of authors per paper as described in "The Web of Scientometrics". (Schubert, 2002)  In contrast with the journal Scientometrics, the pattern of collaboration for C&G falls closer to the physical sciences than the social sciences.  Only 41.4% of the papers published in C&G were from single authors compared to 55.1% in Scientometrics, and multi-authored papers (more than three authors) made up 10.5% of the C&G total compared with 5.4% in Scientometrics.  The average number of authors is 2.04 in C&G and 1.61 in Scientometrics.  From this small sample, GIS seems to be a more collaborative field than scientometrics.  The amount of collaboration in C&G has not grown in a statistically significant way during the past five years.

    C&G is truly an international journal. During the 5 year period, article authors were from 50 different countries; the most significant twenty are displayed in Figure 3, the rest constitute less than 1% of productivity.

Figure 3
Internationality (Authors by Country)

     International collaboration can be difficult to define and measure.  Individual contributions can range from valuable to negligible, and can even change during the course of a project. (Bordons & Gomez, 2000)  For the purposes of this paper, the data will reflect only that an individual was listed as an author.

Category                                  Documents
Total Documents                           602
Single Author                             249
Multiple Author / Single Institution      140
Multiple Author / Multiple Institution    213
Multiple Author / Multiple Country        75
Table 5
Collaboration

GIS appears to be a reasonably collaborative field across institutions as indicated by this limited data.  Over 35% of the total documents and 60% of the multiple author documents indicated more than one institutional affiliation as reported by the author's address field from ISI's WOS.  12.5% of the total papers and 21.2% of the multiple author documents in C&G have authors from more than one country.  This is much higher than the 7% reported for the journal Scientometrics (Schubert, 2002), and may reflect either the international intentions of C&G or a greater internationality of GIS, or some combination.
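
    The percentages quoted above follow directly from Table 5, as the short sketch below shows.  It assumes that the first three categories partition the total and that the multiple-country documents are a subset of the multiple-institution documents, which is consistent with the counts but is not stated explicitly in the data; the variable names are illustrative.

```python
# Counts from Table 5.
total_docs = 602
single_author = 249
multi_single_institution = 140
multi_multi_institution = 213
multi_country = 75

# Assumption: the three authorship categories partition the total.
assert single_author + multi_single_institution + multi_multi_institution == total_docs
multi_author = multi_single_institution + multi_multi_institution


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


shares = {
    "single author, of all documents": pct(single_author, total_docs),
    "multiple institution, of all documents": pct(multi_multi_institution, total_docs),
    "multiple institution, of multi-author documents": pct(multi_multi_institution, multi_author),
    "multiple country, of all documents": pct(multi_country, total_docs),
    "multiple country, of multi-author documents": pct(multi_country, multi_author),
    "multiple country, of multi-institution documents": pct(multi_country, multi_multi_institution),
}

for label, value in shares.items():
    print(f"{label}: {value}%")
```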

Summary of Computers & Geosciences Citation Analysis

    The increasing frequency of publication since 1975 has been accompanied by an increasing number of articles, so the volume of productivity has grown since the inception of Computers & Geosciences.  The primary document type published in C&G in the five years examined (1998-2002) was the scholarly article; other types of documents make up only a small fraction of the total.  Significant articles and authors are published in C&G, as determined by the number of citations to these papers in other documents (5 year maximum, 84).  Subjects of these documents vary over the broad continuum of geoscience but tend to have aspects related to computers, software and technology.

    Both the Impact Factor and the Immediacy Index have remained relatively constant over the 5 year period with the exception of the Impact Factor in 1999.  It is not clear what caused this anomaly, but it could be a statistical aberration because of the number of documents in the sample.  Another possible factor was the lower number of documents published in 1999, nearly 10% fewer than in the peak year of 2001.  Compared to other geoscience journals in ISI-JCR for the year 2002, C&G was on the low end in impact (91 out of 122) and in the middle of the pack in immediacy (69 out of 122).

    Authors of papers in C&G tended to be more collaborative than authors of articles in the social sciences or mathematics.  Fewer papers (as a percentage) were single author in C&G than in the journal Scientometrics, and roughly twice the percentage were from more than three authors.  C&G is a strongly international journal; from 1998-2002, papers were published by authors from 50 nations.  In 35% of the papers where the authors were from different institutions, they were also from different countries.

Bibliometric Profile:  Karl Stattegger

Introduction

   
Number two on the list of most cited C&G authors provided above is Karl Stattegger.  His co-authored paper in C&G, "Spectrum: spectral analysis of unevenly spaced paleoclimatic time series," had been cited 77 times when the original study was performed.  (In the past several months, it has received an additional 2 citations, bringing the total to 79.)  As a current and frequently cited author in the field of GIS, he was selected to be the subject of this bibliometric profile.  The bibliometric profile will consist of the development and analysis of his citation identity (authors he cited), citation set (authors who cited him), and citation image (co-cited authors).

Methodology

    The methodology for developing Stattegger's bibliometric profile closely parallels the example provided by Howard D. White. (White, 2001b)  To gather the raw data, a connection was opened through telnet to the Dialog database at dialog.com.  A list of articles on which Stattegger was listed as one of the authors was retrieved with the following command: SELECT AU=STATTEGGER K.  This list was sorted with the Dialog RANK command, downloaded, and filtered with the Unix tools sed and awk to become the data for Stattegger's citation identity.  His citation set and citation image were generated in much the same way, except that the data was selected with the following command:  SELECT CA= STATTEGGER K.

    A prime concern is keeping the data retrieved from ISI's databases, including Dialog, "clean".  Because authors are identified by last name and initial(s), multiple authors can be identified by the same character string and one author is often identified in multiple ways.  Selecting Karl Stattegger was beneficial in that he is identified consistently as an author, STATTEGGER K.  The citation data was visually inspected and edited by hand, but no automated steps were taken to ensure every author homonym and allonym was weeded out.

    The Dialog database has information indexed since 1990, so the following profile is limited to that time range.
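
    The three rankings can be illustrated with a small sketch.  It does not reproduce the Dialog session or the sed and awk filtering; it assumes instead that each retrieved record has already been reduced to a list of its authors and a list of the authors it cites.  The record layout, the function name, and the sample names are hypothetical.

```python
from collections import Counter


def bibliometric_profile(papers, target):
    """Return (citation identity, citation set, citation image) for target.

    Each paper counts an author at most once per list, so the totals are
    numbers of papers rather than numbers of individual references.
    """
    identity, citation_set, image = Counter(), Counter(), Counter()
    for paper in papers:
        authors = set(paper["authors"])
        cited = set(paper["cited_authors"])
        if target in authors:
            identity.update(cited)        # whom the target cites
        if target in cited:
            citation_set.update(authors)  # who cites the target
            image.update(cited)           # who is cited alongside the target
    return identity, citation_set, image


# Hypothetical records, invented for illustration only.
papers = [
    {"authors": ["DOE J", "ROE R"], "cited_authors": ["POE A", "DOE J"]},
    {"authors": ["LEE K"], "cited_authors": ["DOE J", "POE A"]},
    {"authors": ["ROE R"], "cited_authors": ["DOE J", "ROE R"]},
]

identity, citation_set, image = bibliometric_profile(papers, "DOE J")
print(identity.most_common())
print(citation_set.most_common())
print(image.most_common())
```

    As in the rankings that follow, the target author appears in all three lists: in the identity through self-citation, in the set by citing his own earlier work, and at the top of the image because every paper in that group cites him.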

Results

    Stattegger has published 25 articles since 1990; White refers to this body of work as the author's "oeuvre".  (White, 2001b)  Table 6 is Stattegger's empirical profile:

Citation Identity         Citation Set              Citation Image
 9  STATTEGGER K           8  STATTEGGER K          28  STATTEGGER K
 6  DUPLESSY JC            5  HANEBUTH T            11  HANEBUTH T
 6  HANEBUTH T             5  KIENAST M              9  MOLENGRAAFF GAF
 6  LEVITUS S              3  STEINKE S              8  PELEJERO C
 6  SARNTHEIN M            2  GARBESCHOENBERG CD     8  WANG LJ
 6  VITAL H                2  GRIMALT JO             7  BARD E
 5  BOND G                 2  LOPEZAGUAYO F          7  PETTIJOHN FJ
 5  IRION G                2  PELEJERO C             7  STUIVER M
 5  STUIVER M              2  VITAL H                6  GROOTES PM
 4  BARD E                 2  WAGREICH M             6  MORTON AC
 4  BROECKER WS            2  WANG LJ                6  NADEAU MJ
 4  BRYAN K                1  ALAM MM                5  KIENAST M
 4  CARROLL AR             1  BASU A                 5  SARNTHEIN M
 4  CHAPPELL J             1  BELLON AS              5  SCHLEICHER M
 4  COX MD                 1  BLOM M                 5  STEINKE S
 4  DIETRICH G             1  CABALLERO MA           5  WANG PX
 4  FAIRBANKS RG           1  CALVERT SE             4  BHATIA MR
 4  GIBBS RJ               1  CATANE SG              4  BROECKER WS
 4  HU JY                  1  CRIMES TP              4  DAVIS JC
 4  MCLENNAN SM            1  DOMINGUEZBELLA S       4  DICKINSON WR
 4  MILLIMAN JD            1  ERLENKEUSER H          4  FAIRBANKS RG
 4  MOLENGRAAFF GAF        1  FAUPL P                4  GUPTA A
 4  NADEAU MJ              1  GROOTES PM             4  IMBRIE J
 4  NITTROUER CA           1  GROVES JR              4  INGERSOLL RV
 4  PACANOWSKI R           1  GUTIERREZMAS JM        4  MCLENNAN SM
 4  ROLLINSON HR           1  HALLSWORTH CR          4  POTTER PE
 4  RUDDIMAN WF            1  HINRICHS J             4  VILLANUEVA J
 4  SEIDOV D               1  KOTOPOULI CN           4  WONG HK
 4  TAYLOR SR              1  LARGHI C               3  AITCHISON J
 4  TJIA HD                1  LISTANCO EL            3  ALLEY RB
 4  TORRES AM              1  LUDMANN T              3  BASU A
Table 6
Stattegger Bibliometric Profile

Each column above has been truncated at the bottom; among names with the same count at the cutoff, those which occur later alphabetically have been arbitrarily disregarded.  White believes this data is the "most robust depiction of an author's intellectual world".  (White, 2001b)

    The first column describes Stattegger's citation behavior.  Over his twenty-six indexed papers, he has drawn references from a variety of authors.  Indeed, there are a huge number of authors that he has cited four or fewer times.  This would seem to indicate that his scientific-social network is broad but not necessarily deep.  His published work may not parallel other active projects.  An exception is the close citational relationship between Stattegger and Hanebuth.  They cite each other frequently and are also regularly co-cited.  Stattegger cites Molengraaff, Sarnthein, and Bard, and they are frequently co-cited with him, but none of these authors cite Stattegger with any consistency.  Perhaps there is enough of a difference in their specific foci in the discipline that they do not reciprocate with citations.  My belief is that many of Stattegger's papers deal with applied aspects of GIS, often the application or results of using a tool, while the authors that he cites may emphasize the theoretical aspects of GIS.  He would have reason to cite them, but they would not necessarily have reason to cite him as often.

    Another possibility, suggested by White, is the idea of intellectual cohorts.  Scientists tend to group authors into senior, peer, and junior categories.  They tend to cite those they believe are senior in the field, or at the least, peers. (White, 2001a)  Stattegger may not be looked upon as a senior scientific member of his community by the other three authors.

    Any of the above possibilities could apply in reverse regarding Kienast and Steinke, who are high in the citation set and the image but are not cited by Stattegger.  Something may be at work with the details of the discipline, practical vs. theoretical emphasis, or aspects of the perceived cohort.  Overall, Stattegger does not receive a disproportionate number of citations from any other author; nearly all refer to him only once or twice.

    Pelejero, LJ Wang, Pettijohn and Stuiver all appear near the top of the co-citation list without frequently citing or being cited by Stattegger.  Their papers may not overlap, but be complementary in some aspect.  It would be interesting to create a bibliometric profile on them to determine where the interconnections might lie.

Conclusion

    Researchers in most branches of science have a long tradition of being personally accountable for the quality of work, including the sources they cite. Reasons for one author citing another's work can relate to relevance, social ties, and egoism--but there is invariably a motivation for the citation.  Scientometrics provides frameworks to examine, usually through the analysis of citations, the relationship of scholars and ideas to each other.

    Howard White suggests that there is much to learn from using an author-centered view of citation analysis.  Unique as DNA, no two authors use citations in identical ways, and a deconstruction can yield insight about how the individual works, thinks, and socializes.  Three measurements go into a personal bibliometric profile: citation identity, citation set, and citation image.

    Even a cursory examination of the bibliometric profile created for Karl Stattegger provided some interesting observations.  He has a strong professional tie to Hanebuth on a more-or-less equal level.  There is a disproportionate relationship to Molengraaff, Sarnthein, and Bard, as well as Kienast and Steinke.  The first three are cited by, but rarely cite Stattegger; the last two cite but are rarely cited by him.  The idea of cohort groupings in science could account for some of the one-sidedness.

    This personal bibliometric profile is just the first step in what could be a much more detailed analysis of author-centered bibliometrics in GIS.  A possible course of action would be to select a researcher and create profiles on the authors that appeared to have interesting connections.  In this way the "web" could be unraveled a bit at a time.

Subject Keywords in Computers & Geosciences

    Does GIS really mean "Geographic Information Science" or "Geographic Information Systems"?  Certainly there are journals in the field that emphasize one over the other.  The keywords of the articles in Computers & Geosciences offer some support for either or both, although the preponderance of evidence seems to lie in one direction.  The GIS debate is suggestive of one raging in our own field: Library Science or Information Science?  Information Science evokes the vision of a broad-based discipline with formal theoretical underpinnings and applicability in many domains.  Library Science has a rich and specific tradition and a professional focus on a much narrower set of circumstances (i.e. those found in a library).  Discipline or application, good question!

    Some of the common keywords found for articles in C&G included sedimentation, cretaceous, paleostress, and fault, all words that have strong ties to specific geosciences like geology and geophysics.  If these were the most common, I would argue for "Geographic Information Science."  However, another category of keyword vastly outnumbered the geoscience related terms: computer and information systems related expressions.  These include words like software, Internet, sorting, graphics, and interactive, and phrases like object-oriented programming, image processing, pattern recognition, and process simulation.  It is clear that Computers & Geosciences is focused more on the use and application of computer tools than on the broader geoscience discipline.  In the mission statement of the journal, the editors refer to their niche as "geocomputation" rather than geoscience.

    Areas like GIS and LIS have much in common; they are struggling to adopt some kind of unified vision, an understanding of the boundaries of their field.  A journal like C&G fits in very nearly at one end of the spectrum, providing articles for a sub-group of GIS researchers.  Even as the larger GIS community continues to try to define itself, C&G should continue to be able to support a core following of geo-computer scientists.

Bibliometrics and Webmetrics: An Essay

    “I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be.” (Kelvin, 1891)

    Bibliometrics is the application of numerical and statistical techniques to texts and documents.  Publication counts, co-term analysis, and citation analysis are examples of specific bibliometric practices.  When bibliometrics is used to examine aspects of a specific scientific discipline, it is often referred to as Scientometrics.  A fundamental idea behind bibliometrics is that, by measuring and analyzing the empirical bibliographic data contained in books and journals, new knowledge can be found in the trends and patterns. 

    Sociology has two distinct schools of research, statistical and experiential.  Those that use the statistical approach create surveys, pore over population data, and present papers with summary statistics and confidence intervals.  Experiential sociologists tend to get up-close and personal with the subjects of their research, for example by riding with a motorcycle gang to study aspects of competitive group interaction.  Of the two, bibliometrics clearly has its roots in the former.  To study a field like GIS, we analyze the literature of the discipline; we do not attend a GIS conference or take notes at a happy hour with the geography department.

    One form of research is not necessarily “better” than another, but they do produce different types of results.  Anecdotal experience has been instrumental in giving birth to great theories in many disciplines, but statistical research can be a more reliable contributor when the numbers produced are meaningful.  In this context, meaningful means accurate, representative, and generalizable—the criteria upon which bibliographic methods will be evaluated for their usefulness in gaining an understanding of patterns of scholarly communication.

    Eugene Garfield contributed an enormous amount toward making accurate bibliographic information available to scholars.  Data of this type was simply not available in nearly any scientific field until he produced it.  His seminal paper on citation indexing (Garfield, 1955) advocated its creation and use to prevent uncritical citation of data in scientific literature, but as a side-effect it provided an essential tool for bibliometrics.  Garfield’s company, ISI, has become the leader in citation indexing with products such as the Web of Science (WOS).

    Not to disparage such an important contribution, but there remain two accuracy problems with the information provided by ISI’s products that can create barriers to bibliometric analysis: completeness and dirty data.  The WOS currently indexes “approximately 8,500 of the most prestigious, high impact research journals in the world,” (Thompson-ISI, 2004) but it is still far from complete.  If the author or the discipline being studied is obscure, significant data might not be available in the citation index.  Dirty data can be a problem in analysis, generally in the form of homonyms and allonyms of authors’ names.  (White, 2001)

    Citations can be made for many reasons; bibliometric analysis makes no contextual evaluation.  So, are they representative of the connection for which we are giving them credit?  Cole describes the skepticism and early resistance to the use of citations as a qualitative measurement tool. (Cole, 2000)  Papers were written about the problems associated with citation counts and what they really represented, for instance, how to take negative citations into account.  The consensus seems to be that they are still a useful indicator, but research continues on exactly what they represent in a formal bibliometric sense.  It can be asserted that to study the patterns of scholarly communication, the type of citation is not as important as the abstract connection that a citation indicates.

    “The Complementarity of Scientometrics and Economics” (Diamond, 2000) is an excellent commentary on the generalizability of bibliometric and scientometric techniques.  Diamond argues that not only are the principles behind scientometrics similar to those of a particular branch of economics, but bibliometrics has also been used to validate economic theories.  Patterns of communication, central to bibliometrics, seem to be pervasive and applicable in many arenas.

    In summary, there are still problems with the accuracy and completeness of citation data, as well as unresolved issues regarding how to take into account the variable meanings of a citation.  Patterns of scholarly communication exist, and the scientometric techniques for exploring them can even be generalized to other areas.  Bibliometrics is a useful way, and certainly the best empirical method we have, for understanding the connections in the literature of a discipline.

    Webmetrics is potentially a much broader field than bibliometrics.  It is generally understood to be the collection and analysis of descriptive data gathered during online interactions.  The whole new dimension of user behavior is added to the list of characteristics that can be empirically measured.  In contrast to the bibliometrician who may have difficulty finding enough information on a narrow topic, there is so much webmetric data that the problem is assigning meaning and categorizing it—determining what data is important and why.

    Webmetrics is closely tied to automation.  Data gathering must be set up manually, but it can then continue to run indefinitely without consuming human resources.  The information is usually kept in a machine-readable form, making it easy to search, analyze, and manipulate (functions that can also be automated to various degrees).  Inexpensive storage allows large quantities of collected data to be saved for later scrutiny.  There is tremendous future potential for webmetrics in both academic and commercial realms.

    The challenges that must be overcome to allow webmetrics to reach its potential fall into two categories: significance and privacy.

    Significance is a label for the broad class of problems related to determining what data to collect and what meaning is associated with it.  Similar to representativeness, a criterion used to evaluate bibliometrics, significance asks how strongly the information being collected correlates to the behavior or structure being explained.  If an individual clicks on a web page, are we to interpret that to mean they read the whole page?  Half the page?  Just the banner ads?  Can the data be analyzed in such a way as to weed out visits by web-crawler and webbot software?  How does the distributed nature of the Internet impact data collection?  Compared to scholarly journals and books, online resources are both more numerous and less standardized.  Is a hypertext link analogous to a citation, and if so, is citation-style analysis appropriate?  Many open questions remain.

     Google, the successful private-sector search engine company, relies on the webmetric concept that links are analogous to citations in its PageRank algorithm.   “It is based on the premise, prevalent in the world of academia that the importance of a research paper can be judged by the number of citations the paper has from other research papers.  Brin and Page have simply transferred this premise to its web equivalent: the importance of a web page can be judged by the number of hyperlinks pointing to it from other web pages.” (Calishain & Dornfest, 2003)
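
    The analogy between links and citations can be made concrete with a small sketch.  The first function simply counts in-links, the web equivalent of a raw citation count; the second is a simplified power-iteration version of the PageRank idea, in which a link from a highly ranked page counts for more.  It is an illustration only, not Google's production algorithm, and the link graph, the damping value of 0.85, and the function names are assumptions made for the example.

```python
def inlink_counts(links):
    """Count hyperlinks pointing at each page (a raw 'citation count')."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

print(inlink_counts(links))
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

    In this toy graph, page a has only one in-link, but because that link comes from the most heavily linked page it ends up ranked well above b and d, which is the sense in which the approach goes beyond simple counting.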

    The ALA's document, Privacy: An Interpretation of the Library Bill of Rights, states, "The American Library Association affirms that the rights of privacy are necessary for intellectual freedom and are fundamental to the ethics and practice of librarianship." (American Library Association, 2003)  Free and open inquiry is only possible where individuals have a reasonable assurance that they can investigate the topics of their choice without the scrutiny of others, be it by their peers, employer or government.

    Bibliometrics does not attempt to collect data that might be considered private.  Sources are published works, written for public consumption.  In the course of collecting webmetric data, an individual’s right to privacy could be compromised, either intentionally or unintentionally.  Webmetrics must not discourage open inquiry by infringing on personal intellectual freedom.

    There is an enormous amount of knowledge to be discovered through the use and analysis of webmetrics.  Scholars and business people alike have a vested interest in being able to extract meaningful content from the vast amount of data.  Many aspects of significance must be resolved; further research will be required in online behavior, resource evaluation, and metadata.  But the overriding concern, from the standpoint of information seeking behavior, is to ensure that, even as data is collected, privacy and intellectual freedom will be preserved.


References

American Library Association, Office for Intellectual Freedom. (2003). Privacy: An Interpretation of the Library Bill of Rights. Retrieved 15 Apr, 2004, from http://www.ala.org/ala/oif/statementspols/statementsif/librarybillrights.htm

Amin, M., & Mabe, M. (2000, October). Impact Factors: Use and Abuse. Retrieved 20 April, 2004, from Elsevier Science, Perspectives in Publishing 1(1),  http://www.ceraj.com/Downloads/Impact_factors.pdf

Bordons, M., & Gomez, I. (2000). Collaboration Networks in Science. In B. Cronin & H. B. Atkins (Eds.), The Web of Knowledge, A Festschrift in Honor of Eugene Garfield (pp. 197-213). Medford, NJ: Information Today, Inc.

Calishain, T., & Dornfest, R. (2003). Google hacks : [100 industrial-strength tips & tools]. Sebastopol, CA: O'Reilly.

Cole, J. R. (2000). A Short History of the Use of Citations as a Measure of the Impact of Scientific and Scholarly Work. In B. Cronin & H. B. Atkins (Eds.), The Web of Knowledge, A Festschrift in Honor of Eugene Garfield (pp. 281-300). Medford, NJ: Information Today, Inc.

Cronin, B., & Atkins, H. B. (2000). The Scholar's Spoor. In B. Cronin & H. B. Atkins (Eds.), The Web of Knowledge, A Festschrift in Honor of Eugene Garfield (pp. 1-7). Medford, NJ: Information Today, Inc.

Davenport, E., & Blaise, C. (2000). The Citation Network as a Prototype for Representing Trust in Virtual Environments. In B. Cronin & H. B. Atkins (Eds.), The Web of Knowledge, A Festschrift in Honor of Eugene Garfield (pp. 517-534). Medford, NJ: Information Today, Inc.

Diamond, A. M., Jr. (2000). The Complementarity of Scientometrics and Economics. In B. Cronin & H. B. Atkins (Eds.), The Web of Knowledge, A Festschrift in Honor of Eugene Garfield (pp. 321-336). Medford, NJ: Information Today, Inc.

Garfield, E. (1955). Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas. Science, 122, 108-111.

Kelvin, W. T. (1891). Popular lectures and addresses. London, New York: Macmillan and Co.

Schubert, A. (2002). The Web of Scientometrics: A statistical overview of the first 50 volumes of the journal. Scientometrics, 53(1), 3-20.

Thompson-ISI. (2004). Web of Science. Retrieved 1 May, 2004, from http://www.isinet.com/products/citation/wos/

White, H. D. (2001a). Author-centered bibliometrics through CAMEOs:  characterizations automatically made and edited online. Scientometrics, 51(3), 608-637.

White, H. D. (2001b). Authors as Citers over Time. Journal of the American Society for Information Science and Technology, 52(2), 87-108.