Dr. A. Coleman
Citation analysis can provide valuable insights into the social and informational structure of science. Eugene Garfield was not the first to undertake citation indexing, but his vision and perseverance created ISI’s vast “Web of Science,” which makes interdisciplinary bibliometrics relatively easy (Cronin & Atkins, 2000). This paper presents a brief bibliometric analysis of Computers & Geosciences, a journal in the field of GIS. From the list of the most frequently cited C&G authors, Karl Stattegger was selected as the subject of a personal bibliometric profile. Finally, two essays are included: the first on keywords in C&G and their relationship to either Geographic Information Science or Geographic Information Systems, the second on the usefulness of bibliometrics and the future of webmetrics.
Computers & Geosciences: The Journal
The journal Computers & Geosciences (C&G) is a print publication with several online components. At the journal's web site,
http://www.elsevier.com/locate/cageo,
tables of contents and abstracts for all issues, as well as a sample issue, are available without a subscription. Full-text articles for all issues are available through ScienceDirect. Additionally, programs and data sets featured in the articles can be downloaded from
http://www.iamg.org.
When C&G began publication in 1975, it appeared quarterly; it is now published ten times per year. The changes in frequency are summarized in the table below:
Year(s) | Frequency
1975-1984 | Quarterly
1985 | 6 issues/year
1986-1990 | 8 issues/year
1991-Present | 10 issues/year
Table 1
Changes in Publication Frequency
The journal has had two statements of aims since its inception. The original 1975 statement described it as "An international journal devoted to the rapid publication of computer programs in widely used languages and their applications." In 1996, the aim was changed to read, "An international journal devoted to the publication of papers on all aspects of geocomputation and to the distribution of computer programs and data sets." That was also the year the founding Editor, D. F. Merriam of the University of Kansas, stepped down, leaving G. F. Bonham-Carter of the Geological Survey of Canada as sole Editor-in-Chief. The journal's international character is further underscored by the fact that its publishers, originally Pergamon Press and later Elsevier, are based in the United Kingdom.
From 1975 to 1995, complete computer program listings and data sets were published in C&G, presumably leaving readers to do a lot of typing (or even key-punching). As of 1996, only "short programs, subroutines and pseudocode are printed", with the rest available via anonymous ftp from the sponsoring organization, the International Association for Mathematical Geology (IAMG). While papers on computer methods and theory still occupy a large portion of the journal, it also publishes full-length research papers on many topics under the broad banner of "geoscience", as well as short notes, book and software reviews, letters to the editor, and a regular column dealing with Internet issues.
In 1990, C&G absorbed COGS Computer Contributions, the technical journal of the Computer Oriented Geological Society.
General Statistics
The overview that follows is drawn from statistics about C&G from January 1998 through December 2002. Citations for that period were retrieved from ISI's Web of Science database, and journal indicators from ISI's Journal Citation Reports. The following table describes the distribution of the types of documents published in C&G:
Type of Document | Frequency, 1998-2002
Article | 550
Bibliography | 1
Biographical-Item | 4
Correction | 2
Correction, Addition | 0
Database Review | 0
Editorial Material | 34
Hardware Review | 0
Item About an Individual | 0
Letter | 4
Note | 0
Review | 1
Software Review | 6
TOTAL | 602
Table 2
Frequency Distribution of Documents
In keeping with the primary mission of the journal, to "publish papers on all aspects of geocomputation," about 91.4% of the documents published in C&G during the sample period (550 of 602) are scholarly articles. The distribution is skewed so heavily toward articles that any generalizable conclusions will likely concern this category.
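Citivity, measured here as the number of references each document makes, is another important measure of journal productivity. The following table lists, by year, the number of documents, the total number of references, and the mean and median number of references per document: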
Year | Total Documents | Total References | Mean References per Document | Median References per Document
1998 | 119 | 1678 | 14.10 | 12
1999 | 114 | 1813 | 15.90 | 14
2000 | 121 | 1905 | 15.74 | 11
2001 | 126 | 2410 | 19.13 | 16
2002 | 122 | 2325 | 19.07 | 17
Table 3
Citivity
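Citivity as used in Table 3 is ordinary descriptive statistics over reference counts. The short Python sketch below assumes that the number of references in each document has already been extracted from the downloaded records; that extraction step, and the function and variable names, are assumptions for illustration. It also recomputes the mean column from the table's own totals: this reproduces the published means for 1998 through 2001, gives 19.06 rather than 19.07 for 2002 (a small difference that may come from rounding in the source data), and confirms that the yearly counts sum to the 602 documents of Table 2. Medians cannot be recovered from totals alone, which is why the per-document counts are needed.

```python
from statistics import mean, median

# Table 3 totals: year -> (documents, total references)
TABLE3 = {
    1998: (119, 1678),
    1999: (114, 1813),
    2000: (121, 1905),
    2001: (126, 2410),
    2002: (122, 2325),
}


def citivity(ref_counts):
    """Return (mean, median) references per document for one year.

    ref_counts holds one integer per document: the number of references
    that document makes (assumed to be extracted from the records first).
    """
    return round(mean(ref_counts), 2), median(ref_counts)


def mean_from_totals(year):
    """Recompute the mean column of Table 3 from the two total columns."""
    documents, references = TABLE3[year]
    return round(references / documents, 2)


if __name__ == "__main__":
    for year in sorted(TABLE3):
        print(year, mean_from_totals(year))
    # The yearly document counts should add up to the 602 documents of Table 2.
    print("documents:", sum(docs for docs, _ in TABLE3.values()))
```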
In 1999 the total number of documents was at its lowest level in the period studied, but the mean and median numbers of references were in line with other years. The mean rose from about 14 references per document in 1998 to about 19 in 2001 and 2002, but five years are too few to establish a clear trend in the number of references per document. Were we to observe a significant trend in one direction, it might indicate the authors' integration into their scientific community, with emphasis on collegial "trust" as described in "The Citation Network as a Prototype for Representing Trust in Virtual Environments" (Davenport & Blaise, 2000).
The following information is taken from ISI's Journal Citation Reports (ISI-JCR) for Computers & Geosciences. The Impact Factor is the frequency with which the "average" recent article in the journal has been cited in a given year; the Immediacy Index describes how quickly the "average" article is cited, counting only citations received in the year the article was published.
Figure 1
Impact and Immediacy
Something appears to be amiss in 1999. The Immediacy Index remains relatively constant throughout all five years, as does the Impact Factor, except in that one year, where the graph above shows a significant discontinuity. The Impact Factor for a given year is calculated by dividing the number of citations received in that year by items the journal published in the previous two years by the number of citable items it published in those two years. Does this mean that for one year out of five, the impact of C&G was reduced to a quarter of the norm?
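Both JCR indicators reduce to simple ratios. The Python sketch below states them explicitly, following the standard JCR definitions rather than any ISI code; the function names and the dictionary layout are assumptions made for illustration, and JCR's own decisions about which items count as "citable" are not modeled.

```python
def impact_factor(year, cites_received, citable_items):
    """JCR-style Impact Factor for `year`.

    cites_received maps each publication year to the citations received
    *during `year`* by items published in that publication year;
    citable_items maps each publication year to its number of citable items.
    Only the two publication years before `year` enter the ratio.
    """
    window = (year - 1, year - 2)
    cites = sum(cites_received.get(y, 0) for y in window)
    items = sum(citable_items[y] for y in window)
    return cites / items


def immediacy_index(year, cites_received, citable_items):
    """Citations received during `year` by items published in `year`, per item."""
    return cites_received.get(year, 0) / citable_items[year]


if __name__ == "__main__":
    # Hypothetical figures for illustration only; these are not C&G data.
    items = {1997: 100, 1998: 120, 1999: 110}
    cites_during_1999 = {1997: 90, 1998: 130, 1999: 22}
    print(impact_factor(1999, cites_during_1999, items))    # (90 + 130) / (100 + 120) = 1.0
    print(immediacy_index(1999, cites_during_1999, items))  # 22 / 110 = 0.2
```

Because the 1999 value draws only on items published in 1997 and 1998, the number of documents published in 1999 does not enter the 1999 Impact Factor at all; it affects the denominators for 2000 and 2001.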
Sociological and statistical factors can affect the measurement of impact. Sociological factors include the subject area of the journal, the type of documents published, and the average number of authors per paper, none of which changed between 1998 and 2000. Statistical influences include the number of issues and articles being measured as well as the duration of the measurement (Amin & Mabe, 2000).
As mentioned above, C&G published 114 documents in 10 issues in 1999, fewer than the five-year average of about 120 and nearly 10% fewer than in 2001, the most prolific year in the study. In so small a statistical universe, random variation may be enough to explain the drop in the Impact Factor. Strictly, though, the 1999 Impact Factor is based on items published in 1997 and 1998, so the low 1999 document count itself would enter the calculation only in the 2000 and 2001 Impact Factors. By definition, the Impact Factor is always a one-year grouping of citations; for many journals this may be a small enough number to cause statistical fluctuations.
Based on data from ISI-WOS, the following table lists the five most cited authors in C&G during 1998-2002:
Author | Times Cited | # Docs | # Articles
Schulz, M | 84 | 2 | 2
Stattegger, K | 77 | 1 | 1
Wesseling, CG; Pebesma, EJ | 49 | 1 | 1
Schweiger, AJ; Key, JR | 36 | 1 | 1
Yavuz, F | 35 | 8 | 8
Table 4
Top Five Most Cited Authors
These data were collected from ISI-WOS and downloaded in sets covering two and three years; the individual downloaded files were then concatenated into a master five-year file. Two concerns arose: first, finding an automated way to add up the citations from multiple papers, and second, determining how to credit each co-author of a paper individually.
The solution was a Perl script, cit_counter, that reads through the data file, breaks apart multiple authors, and credits each of them with the paper's citations. It creates a cumulative citation total for each author from both individual and group contributions, and also tracks the number of documents and research articles. The output is sorted alphabetically to draw attention to duplicate names and other possible problems. Normalizing the data was a necessary step: many older articles in WOS list the author's name in capital letters (cit_counter compensates for this). Inconsistent spelling and use of initials also created potential problems; the editor, G. F. Bonham-Carter, appeared in four different forms. After the data were as clean as possible, the Unix sort command was used to bring the most frequently cited authors to the top of the list.
Collaborative Analysis
The number of authors per document is described by
Figure 2:
Figure 2
Documents by Number of Authors
Scientific collaboration can be examined through the number of authors per paper, as in "The Web of Scientometrics" (Schubert, 2002). In contrast with the journal Scientometrics, the pattern of collaboration in C&G falls closer to that of the physical sciences than the social sciences. Only 41.4% of the papers published in C&G had a single author, compared to 55.1% in Scientometrics, and papers with more than three authors made up 10.5% of the C&G total, compared to 5.4% of Scientometrics. The average number of authors per paper is 2.04 in C&G and 1.61 in Scientometrics. From this small sample, GIS seems to be more collaborative than scientometrics. The amount of collaboration in C&G does not appear to have grown appreciably during the past five years.
C&G is truly an international journal. During the five-year period, article authors came from 50 different countries; the twenty most significant are displayed in Figure 3, and the rest constitute less than 1% of productivity.
Figure 3
Internationality
International collaboration can be difficult to define and measure. Individual contributions can range from valuable to negligible, and can even change during the course of a project (Bordons & Gomez, 2000). For the purposes of this paper, the data reflect only whether an individual was listed as an author.
Total Documents | Single Author | Multiple Author / Single Institution | Multiple Author / Multiple Institution | Multiple Author / Multiple Country
602 | 249 | 140 | 213 | 75
Table 5
Collaboration
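The paper does not spell out how each document was assigned to the columns of Table 5. One plausible scheme, sketched below in Python as an assumption rather than the procedure actually used, reads the institution and country of each address in the WOS record and treats multiple-country papers as a subset of multiple-institution papers. That reading is consistent with the table's counts (249 + 140 + 213 = 602, with the 75 multiple-country papers falling inside the 213). The (institution, country) tuple representation and the category labels are hypothetical.

```python
from collections import Counter


def classify(n_authors, addresses):
    """Return the set of Table 5 categories a document falls into.

    n_authors: number of listed authors.
    addresses: list of (institution, country) pairs parsed from the
               record's address field (hypothetical pre-parsed form).
    """
    if n_authors <= 1:
        return {"single author"}
    # An institution is identified by name *and* country, so a paper that
    # spans countries always spans institutions as well.
    institutions = {
        (name.strip().upper(), country.strip().upper())
        for name, country in addresses
    }
    countries = {country for _, country in institutions}
    if len(institutions) <= 1:
        # One institution (also the fallback when no address is recorded).
        return {"multiple author / single institution"}
    categories = {"multiple author / multiple institution"}
    if len(countries) > 1:
        categories.add("multiple author / multiple country")
    return categories


def tally(documents):
    """Count categories over an iterable of (n_authors, addresses) pairs."""
    counts = Counter()
    for n_authors, addresses in documents:
        counts.update(classify(n_authors, addresses))
    return counts
```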
As these limited data indicate, GIS appears to be a reasonably collaborative field across institutions. Based on the author address field in ISI's WOS, over 35% of all documents and 60% of multiple-author documents list more than one institutional affiliation. In C&G, 12.5% of all papers and 21.2% of multiple-author papers have authors from more than one country. This is much higher than the 7% reported for the journal Scientometrics (Schubert, 2002) and may reflect the international intentions of C&G, a greater internationality of GIS, or some combination of the two.
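The percentages quoted for collaboration follow directly from the counts in Table 5. The arithmetic is shown below in Python; it assumes, consistently with the table, that the multiple-author documents are the 140 single-institution plus 213 multiple-institution papers, and that the 75 multiple-country papers are a subset of the latter. Run as written, it yields 41.4%, 35.4%, 60.3%, 12.5%, 21.2%, and 35.2%, the last being the share of multiple-institution papers that also span countries.

```python
# Counts from Table 5
TOTAL = 602
SINGLE_AUTHOR = 249
SINGLE_INSTITUTION = 140
MULTI_INSTITUTION = 213
MULTI_COUNTRY = 75  # assumed to be a subset of the multiple-institution papers

MULTI_AUTHOR = SINGLE_INSTITUTION + MULTI_INSTITUTION  # 353


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


shares = {
    "single author, of all documents": pct(SINGLE_AUTHOR, TOTAL),
    "multi-institution, of all documents": pct(MULTI_INSTITUTION, TOTAL),
    "multi-institution, of multi-author": pct(MULTI_INSTITUTION, MULTI_AUTHOR),
    "multi-country, of all documents": pct(MULTI_COUNTRY, TOTAL),
    "multi-country, of multi-author": pct(MULTI_COUNTRY, MULTI_AUTHOR),
    "multi-country, of multi-institution": pct(MULTI_COUNTRY, MULTI_INSTITUTION),
}

if __name__ == "__main__":
    for label, value in shares.items():
        print(f"{label}: {value}%")
```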
Summary of Computers & Geosciences Citation Analysis
Publication frequency has increased since 1975, accompanied by an increasing number of articles, so the volume of productivity has grown since the inception of Computers & Geosciences. The primary document type published in C&G in the five years examined (1998-2002) was the scholarly article; other types of documents make up only a small fraction of the total. C&G publishes significant articles and authors, as indicated by the number of citations these papers receive in other documents (the highest five-year total for an author was 84). Subjects of these documents vary over the broad continuum of geoscience but tend to have aspects related to computers, software, and technology.
Both the Impact Factor and the Immediacy Index remained relatively constant over the five-year period, with the exception of the Impact Factor in 1999. It is not clear what caused this anomaly, but it could be a statistical aberration arising from the small number of documents in the sample; the 1999 document count, nearly 10% below the 2001 peak, illustrates how small the journal's yearly output is. Compared to other geoscience journals in ISI-JCR for 2002, C&G was on the low end in impact (91st of 122) and in the middle of the pack in immediacy (69th of 122).
Authors of papers in C&G tended to be more collaborative than authors of articles in the social sciences or mathematics. A smaller percentage of papers had a single author in C&G than in the journal Scientometrics, and twice the percentage had more than three authors. C&G is a strongly international journal: from 1998 to 2002 it published papers by authors from 50 nations. In 35% of the papers whose authors were from different institutions, the authors were also from different countries.
Bibliometric Profile: Karl Stattegger
Introduction
Second on the list of most cited C&G authors above is Karl Stattegger. His co-authored C&G paper, "Spectrum: spectral analysis of unevenly spaced paleoclimatic time series," had been cited 77 times when the original study was performed; in the months since, it has received two additional citations, bringing the total to 79. As a current and frequently cited author in the field of GIS, he was selected as the subject of this bibliometric profile. The profile consists of the development and analysis of his citation identity (the authors he cites), citation set (the authors who cite him), and citation image (the authors co-cited with him).
Methodology
The methodology for developing Stattegger's bibliometric profile closely parallels the example provided by Howard D. White (White, 2001b). To gather the raw data, a telnet connection was opened to the Dialog database at dialog.com. A list of articles on which Stattegger was listed as an author was retrieved with the command SELECT AU=STATTEGGER K. This list was sorted with the Dialog RANK command, downloaded, and filtered with the Unix tools sed and awk to produce the data for Stattegger's citation identity. His citation set and citation image were generated in much the same way, except that the data were selected with the command SELECT CA=STATTEGGER K.
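The three components can be illustrated with a small Python sketch in which each bibliographic record is an in-memory dictionary holding its authors (AU) and cited authors (CA), borrowing the field names from the commands above. The counting rule, in which a name is counted at most once per record, is assumed here to approximate a ranked listing of one field; the helper functions are hypothetical and are not Dialog features.

```python
from collections import Counter


def rank(records, field):
    """Count how many records contain each value of `field`.

    A value is counted at most once per record, which is assumed here to
    approximate a ranked field listing of the retrieved set.
    """
    counts = Counter()
    for record in records:
        counts.update(set(record[field]))
    return counts


def citation_identity(records, author):
    """Authors cited in the papers written by `author` (cf. SELECT AU=...)."""
    return rank([r for r in records if author in r["AU"]], "CA")


def citation_set(records, author):
    """Authors of the papers that cite `author` (cf. SELECT CA=...)."""
    return rank([r for r in records if author in r["CA"]], "AU")


def citation_image(records, author):
    """Authors cited alongside `author` in the papers that cite `author`."""
    return rank([r for r in records if author in r["CA"]], "CA")


if __name__ == "__main__":
    # Invented records: AU = authors, CA = first authors of cited works.
    records = [
        {"AU": ["AUTHOR A"], "CA": ["AUTHOR A", "AUTHOR B", "AUTHOR B"]},
        {"AU": ["AUTHOR C"], "CA": ["AUTHOR A", "AUTHOR B"]},
        {"AU": ["AUTHOR D"], "CA": ["AUTHOR B"]},
    ]
    for label, profile in [
        ("identity", citation_identity(records, "AUTHOR A")),
        ("set", citation_set(records, "AUTHOR A")),
        ("image", citation_image(records, "AUTHOR A")),
    ]:
        print(label, profile.most_common())
```

Because every paper in the citation image cites the focal author, that author always heads his own image with a count equal to the number of citing papers, which is consistent with STATTEGGER K leading the third column of Table 6.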
A prime concern is keeping the data retrieved from ISI's databases, including those accessed through Dialog, "clean". Because authors are identified only by last name and initial(s), several different authors can share the same string (homonyms), and one author is often recorded in several different ways (allonyms). Selecting Karl Stattegger was beneficial in that he is consistently identified as STATTEGGER K. The citation data were visually inspected and edited by hand, but no automated steps were taken to ensure that every author homonym and allonym was weeded out.
The Dialog database has information indexed only from 1990 onward, so the following profile is limited to that period.
Results
Stattegger has published 25 articles since 1990; White refers to such a body of work as the author's "oeuvre" (White, 2001b). Table 6 presents Stattegger's empirical profile:
Citation Identity (authors cited by Stattegger)
9 STATTEGGER K
6 DUPLESSY JC
6 HANEBUTH T
6 LEVITUS S
6 SARNTHEIN M
6 VITAL H
5 BOND G
5 IRION G
5 STUIVER M
4 BARD E
4 BROECKER WS
4 BRYAN K
4 CARROLL AR
4 CHAPPELL J
4 COX MD
4 DIETRICH G
4 FAIRBANKS RG
4 GIBBS RJ
4 HU JY
4 MCLENNAN SM
4 MILLIMAN JD
4 MOLENGRAAFF GAF
4 NADEAU MJ
4 NITTROUER CA
4 PACANOWSKI R
4 ROLLINSON HR
4 RUDDIMAN WF
4 SEIDOV D
4 TAYLOR SR
4 TJIA HD
4 TORRES AM

Citation Set (authors citing Stattegger)
8 STATTEGGER K
5 HANEBUTH T
5 KIENAST M
3 STEINKE S
2 GARBESCHOENBERG CD
2 GRIMALT JO
2 LOPEZAGUAYO F
2 PELEJERO C
2 VITAL H
2 WAGREICH M
2 WANG LJ
1 ALAM MM
1 BASU A
1 BELLON AS
1 BLOM M
1 CABALLERO MA
1 CALVERT SE
1 CATANE SG
1 CRIMES TP
1 DOMINGUEZBELLA S
1 ERLENKEUSER H
1 FAUPL P
1 GROOTES PM
1 GROVES JR
1 GUTIERREZMAS JM
1 HALLSWORTH CR
1 HINRICHS J
1 KOTOPOULI CN
1 LARGHI C
1 LISTANCO EL
1 LUDMANN T

Citation Image (authors co-cited with Stattegger)
28 STATTEGGER K
11 HANEBUTH T
9 MOLENGRAAFF GAF
8 PELEJERO C
8 WANG LJ
7 BARD E
7 PETTIJOHN FJ
7 STUIVER M
6 GROOTES PM
6 MORTON AC
6 NADEAU MJ
5 KIENAST M
5 SARNTHEIN M
5 SCHLEICHER M
5 STEINKE S
5 WANG PX
4 BHATIA MR
4 BROECKER WS
4 DAVIS JC
4 DICKINSON WR
4 FAIRBANKS RG
4 GUPTA A
4 IMBRIE J
4 INGERSOLL RV
4 MCLENNAN SM
4 POTTER PE
4 VILLANUEVA J
4 WONG HK
3 AITCHISON J
3 ALLEY RB
3 BASU A
Table 6
Stattegger Bibliometric Profile
The data above have been truncated at the bottom of each column: among names tied at the lowest count shown, those occurring later alphabetically were arbitrarily omitted. White believes these data are the "most robust depiction of an author's intellectual world" (White, 2000b).
The first column describes Stattegger's citation behavior. Across the papers in his oeuvre, he has drawn references from a wide variety of authors; indeed, he has cited a large number of authors four or fewer times. This would seem to indicate that his scientific-social network is broad but not necessarily deep, and that his published work may not run parallel to other active projects. One exception is the close citational relationship between Stattegger and Hanebuth: they cite each other frequently and are also regularly co-cited. Stattegger cites Molengraaff, Sarnthein, and Bard, and they are frequently co-cited with him, but none of these authors cites Stattegger with any consistency. Perhaps their specific foci within the discipline differ enough that they do not reciprocate with citations. My belief is that many of Stattegger's papers deal with applied aspects of GIS, often the application or results of using a tool, while the authors he cites may emphasize the theoretical aspects of GIS. He would have reason to cite them, but they would not necessarily have reason to cite him as often.
Another possibility, suggested by White, is the idea of intellectual cohorts: scientists tend to group authors into senior, peer, and junior categories, and they tend to cite those they believe are senior in the field or, at the least, peers (White, 2001a). The other three authors may not regard Stattegger as a senior member of their scientific community.
Any of these explanations could apply in reverse to Kienast and Steinke, who rank high in both the citation set and the citation image but are not cited by Stattegger. The cause may lie in the details of the discipline, a practical versus theoretical emphasis, or aspects of the perceived cohort. Overall, Stattegger does not receive a disproportionate number of citations from any other author; nearly all citing authors refer to him only once or twice.
Pelejero, L. J. Wang, and Pettijohn all appear near the top of the co-citation list without frequently citing or being cited by Stattegger; Stuiver, also high on that list, appears with a count of 5 in Stattegger's citation identity. Their work may not overlap with his but may be complementary in some respect. It would be interesting to create bibliometric profiles of them to determine where the interconnections might lie.
Conclusion
Researchers in most branches of science have a long tradition of being personally accountable for the quality of their work, including the sources they cite. An author may cite another's work for reasons of relevance, social ties, or egoism, but there is invariably a motivation for the citation. Scientometrics provides frameworks for examining, usually through the analysis of citations, the relationships of scholars and ideas to each other.
Howard White suggests that there is much to learn from an author-centered view of citation analysis. Like DNA, each author's use of citations is unique, and deconstructing it can yield insight into how the individual works, thinks, and socializes. Three measurements go into a personal bibliometric profile: citation identity, citation set, and citation image.
Even a cursory examination of the bibliometric profile created for Karl Stattegger yields some interesting observations. He has a strong professional tie to Hanebuth on a more-or-less equal footing. His relationships to Molengraaff, Sarnthein, and Bard, and to Kienast and Steinke, are disproportionate: he cites the first three, but they rarely cite him; the last two cite him, but he rarely cites them. The idea of cohort groupings in science could account for some of this one-sidedness.
This personal bibliometric profile is just the first step in what could be a much more detailed author-centered bibliometric analysis of GIS. One possible course of action would be to select a researcher and create profiles of the authors who appear to have interesting connections to that researcher. In this way the "web" could be unraveled a bit at a time.
Subject Keywords in Computers & Geosciences
Does GIS really mean "Geographic Information Science" or "Geographic Information Systems"? Certainly there are journals in the field that emphasize one over the other. The keywords of articles in Computers & Geosciences offer some support for either reading, or both, although the preponderance of evidence seems to lie in one direction. The GIS debate recalls one raging in our own field: Library Science or Information Science? Information Science evokes the vision of a broad-based discipline with formal theoretical underpinnings and applicability in many domains. Library Science has a rich and specific tradition and a professional focus on a much narrower set of circumstances (i.e., those found in a library). Discipline or application? Good question!
Some of the common keywords found for articles in C&G included sedimentation, Cretaceous, paleostress, and fault, all words with strong ties to specific geosciences such as geology and geophysics. If these were the most common, I would argue for "Geographic Information Science." However, another category of keyword vastly outnumbered the geoscience-related terms: expressions related to computers and information systems. These included words like software, Internet, sorting, graphics, and interactive, and phrases like object-oriented programming, image processing, pattern recognition, and process simulation. It is clear that Computers & Geosciences is focused on the use and application of computer tools rather more than on the broader geoscience discipline. Indeed, the journal's own statement of aims describes its niche as "geocomputation" rather than geoscience.
Areas like GIS and LIS have much in common; both are struggling to adopt some kind of unified vision, an understanding of the boundaries of their field. A journal like C&G sits very near one end of the spectrum, providing articles for a sub-group of GIS researchers. Even as the larger GIS community continues to try to define itself, C&G should continue to support a core following of geo-computer scientists.
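A tally of the kind behind the keyword comparison above can be sketched in Python. The two term lists contain only the examples named in the text and are illustrative assumptions; a real analysis would need a much fuller, hand-checked vocabulary, and keywords matching neither list are counted separately rather than forced into a category.

```python
from collections import Counter

# Illustrative vocabularies built only from the examples named in the text;
# a real study would need fuller, hand-checked lists (assumption).
GEOSCIENCE_TERMS = {"sedimentation", "cretaceous", "paleostress", "fault"}
COMPUTING_TERMS = {
    "software", "internet", "sorting", "graphics", "interactive",
    "object-oriented programming", "image processing",
    "pattern recognition", "process simulation",
}


def categorize_keywords(keyword_lists):
    """Tally keyword occurrences by broad category across articles.

    keyword_lists holds one list of keywords per article.
    """
    tally = Counter()
    for keywords in keyword_lists:
        for keyword in keywords:
            term = keyword.strip().lower()
            if term in GEOSCIENCE_TERMS:
                tally["geoscience"] += 1
            elif term in COMPUTING_TERMS:
                tally["computing"] += 1
            else:
                tally["other"] += 1
    return tally


if __name__ == "__main__":
    sample = [["Fault", "Software"], ["Image processing", "Kriging"]]
    print(categorize_keywords(sample))
```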
Bibliometrics and Webmetrics: An Essay
“I often say that when you can measure what you are
speaking about, and express it in numbers, you know something about it;
but when you cannot measure it, when you cannot express it in numbers,
your knowledge is of a meagre and unsatisfactory kind; it may be the
beginning of knowledge, but you have scarcely in your thoughts advanced
to the state of Science, whatever the matter may be.” (Kelvin, 1891)
Bibliometrics is the application of numerical and
statistical techniques to texts and documents. Publication
counts, co-term analysis, and citation analysis are examples of
specific bibliometric practices. When bibliometrics is used to
examine aspects of a specific scientific discipline, it is often
referred to as Scientometrics. A fundamental idea behind
bibliometrics is that, by measuring and analyzing the empirical
bibliographic data contained in books and journals, new knowledge can
be found in the trends and patterns.
Sociology has two distinct schools of research, statistical and experiential. Sociologists who take the statistical approach create surveys, pore over population data, and present papers with summary statistics and confidence intervals. Experiential sociologists tend to get up close and personal with the subjects of their research, for example by riding with a motorcycle gang to study aspects of competitive group interaction. Of the two, bibliometrics clearly has its roots in the former. To study a field like GIS, we analyze the literature of the discipline; we do not attend a GIS conference or take notes at a happy hour with the geography department.
One form of research is not necessarily “better” than another, but they do produce different types of results. Anecdotal experience has been instrumental in giving birth to great theories in many disciplines, but statistical research can be a more reliable contributor when the numbers produced are meaningful. In this context, meaningful means accurate, representative, and generalizable; these are the criteria by which bibliometric methods will be evaluated here for their usefulness in understanding patterns of scholarly communication.
Eugene Garfield contributed an enormous amount toward making accurate bibliographic information available to scholars. Data of this type were simply not available in nearly any scientific field until he produced them. His seminal paper on citation indexing (Garfield, 1955) advocated its creation and use to prevent the uncritical citation of data in the scientific literature, but as a side effect it provided an essential tool for bibliometrics. Garfield’s company, ISI, has become the leader in citation indexing with products such as the Web of Science (WOS).
Without disparaging such an important contribution, two accuracy problems remain with the information provided by ISI’s products, and both can create barriers to bibliometric analysis: completeness and dirty data. The WOS currently indexes “approximately 8,500 of the most prestigious, high impact research journals in the world” (Thomson ISI, 2004), but it is still far from complete. If the author or the discipline being studied is obscure, significant data might not be available in the citation index. Dirty data can also be a problem in analysis, generally in the form of homonyms and allonyms of authors’ names (White, 2001).
Citations can be made for many reasons, and bibliometric analysis makes no contextual evaluation. So, are citations representative of the connection for which we are giving them credit? Cole describes the skepticism and early resistance to the use of citations as a measure of scholarly impact (Cole, 2000). Papers were written about the problems associated with citation counts and what they really represent, for instance, how to take negative citations into account. The consensus seems to be that citations are still a useful indicator, but research continues on exactly what they represent in a formal bibliometric sense. It can be argued that, for studying patterns of scholarly communication, the type of citation is less important than the abstract connection that a citation indicates.
“The Complementarity of Scientometrics and Economics” (Diamond, 2000) is an excellent commentary on the generalizability of bibliometric and scientometric techniques. Diamond argues not only that the principles behind scientometrics are similar to those of a particular branch of economics, but also that bibliometrics has been used to validate economic theories. Patterns of communication, central to bibliometrics, seem to be pervasive and applicable in many arenas.
In summary, there are still problems with the accuracy and completeness of citation data, as well as unresolved issues regarding how to take into account the variable meanings of a citation. Patterns of scholarly communication exist, and the scientometric techniques for exploring them can even be generalized to other areas. Bibliometrics is a useful tool, and certainly the best empirical method we have, for understanding the connections in the literature of a discipline.
Webmetrics is potentially a much broader field than bibliometrics. It is generally understood to be the collection and analysis of descriptive data gathered during online interactions, which adds a whole new dimension, user behavior, to the list of characteristics that can be empirically measured. In contrast to the bibliometrician, who may have difficulty finding enough information on a narrow topic, the webmetrician faces so much data that the problem becomes assigning meaning and categorizing it: determining what data are important and why.
Webmetrics is closely tied to automation. Data gathering must be set up manually, but it can then continue to run indefinitely without consuming human resources. The information is usually kept in a machine-readable form, making it easy to search, analyze, and manipulate (functions that can also be automated to various degrees). Inexpensive storage allows large quantities of collected data to be saved for later scrutiny. There is tremendous future potential for webmetrics in both academic and commercial realms.
The challenges that must be overcome to allow webmetrics to reach its potential fall into two categories: significance and privacy.
Significance is a label for the broad class of
problems related to determining what data to collect and what meaning
is associated with it. Similar to representation, a criterion used
to evaluate bibliometrics, significance asks how strongly the
information being collected correlates to the behavior or structure
being explained. If an individual clicks on a web page, are we to
interpret that to mean they read the whole page? Half the
page? Just the banner ads? Can the data be analyzed in such
a way as to weed out visits by web-crawler and webbot software?
How does the distributed nature of the Internet impact data
collection? Compared to scholarly journals and books, online
resources are both more numerous and less standardized. Is a
hypertext link analogous to a citation, and if so, is citation-style
analysis appropriate? Many open questions remain.
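Much webmetric data starts life as web server log files, which is where both the automation described earlier and several of these questions become concrete. The Python sketch below counts successful page requests in Apache-style "combined" log lines and skips requests whose user-agent string contains common crawler markers. The marker list is a crude, hypothetical heuristic (many crawlers do not identify themselves), and a served page (status 200) says nothing about whether anyone actually read it.

```python
import re
from collections import Counter

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately crude list of self-identifying crawler markers.
BOT_MARKERS = ("bot", "crawler", "spider")


def page_hits(lines, bot_markers=BOT_MARKERS):
    """Count successful requests per path, skipping apparent crawlers.

    Returns (hits per path, number of requests skipped as bots).
    """
    hits = Counter()
    bot_requests = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or match["status"] != "200":
            continue  # malformed line or unsuccessful request
        agent = match["agent"].lower()
        if any(marker in agent for marker in bot_markers):
            bot_requests += 1
            continue
        hits[match["path"]] += 1
    return hits, bot_requests


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" '
        '200 2326 "-" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
        '5.6.7.8 - - [10/Oct/2000:13:56:00 -0700] "GET /index.html HTTP/1.0" '
        '200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    ]
    print(page_hits(sample))
```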
Google, the successful private-sector search
engine company, relies on the webmetric concept that links are
analogous to citations in its PageRank algorithm. “It is
based on the premise, prevalent in the world of academia that the
importance of a research paper can be judged by the number of citations
the paper has from other research papers. Brin and Page have
simply transferred this premise to its web equivalent: the importance
of a web page can be judged by the number of hyperlinks pointing to it
from other web pages.” (Calishain & Dornfest, 2003)
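The link-as-citation idea can be made concrete with a textbook simplification of PageRank, sketched in Python below. It is not Google's production algorithm: it uses the commonly cited damping factor of 0.85, a fixed number of power-iteration rounds, and spreads the rank of pages with no outgoing links evenly across all pages, all of which are assumptions of this sketch. Unlike a raw citation count, each link passes on a share of the linking page's own rank, so a link from an important page counts for more.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to. Pages with no
    outgoing links share their rank equally among all pages.
    Returns {page: score}; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share of this page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: A and B both link to C; C links back to A.
    web = {"A": ["C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy web, C ends up with the highest score because two pages link to it, and A comes second because its one incoming link comes from the highly ranked C; B, which nothing links to, keeps only the baseline (1 - 0.85)/3 = 0.05.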
The ALA's document Privacy: An Interpretation of the Library Bill of Rights states, "The American Library Association affirms that the rights of privacy are necessary for intellectual freedom and are fundamental to the ethics and practice of librarianship" (American Library Association, 2003). Free and
open inquiry is only possible where individuals have a reasonable
assurance that they can investigate the topics of their choice without
the scrutiny of others, be it by their peers, employer or government.
Bibliometrics does not attempt to collect data that
might be considered private. Sources are published works, written
for public consumption. In the course of collecting webmetric
data, an individual’s right to privacy could be compromised, either
intentionally or unintentionally. Webmetrics must not discourage
open inquiry by infringing on personal intellectual freedom.
There is an enormous amount of knowledge to be
discovered through the use and analysis of webmetrics. Scholars
and business people alike have a vested interest in being able to
extract meaningful content from the vast amount of data. Many aspects of significance must be resolved, and further research will be required in online behavior, resource evaluation, and metadata. But the overriding concern for information-seeking behavior is to ensure that, even as data are collected, privacy and intellectual freedom are preserved.
References
American Library Association, Office for Intellectual Freedom. (2003). Privacy: An Interpretation of the Library Bill of Rights. Retrieved 15 April, 2004, from
http://www.ala.org/ala/oif/statementspols/statementsif/librarybillrights.htm
Amin, M., & Mabe, M. (2000, October). Impact Factors: Use and Abuse. Perspectives in Publishing, 1(1). Elsevier Science. Retrieved 20 April, 2004, from http://www.ceraj.com/Downloads/Impact_factors.pdf
Bordons, M., & Gomez, I. (2000). Collaboration Networks in Science.
In B. Cronin & H. B. Atkins (Eds.), The Web of Knowledge, A Festschrift in
Honor of Eugene Garfield (pp. 197-213). Medford, NJ: Information
Today, Inc.
Calishain, T., & Dornfest, R. (2003). Google Hacks: 100 Industrial-Strength Tips & Tools. Sebastopol, CA: O'Reilly.
Cole, J. R. (2000). A Short History of the Use of Citations as a
Measure of the Impact of Scientific and Scholarly Work. In B. Cronin
& H. B. Atkins (Eds.), The Web
of Knowledge, A Festschrift in Honor of Eugene Garfield (pp.
281-300). Medford, NJ: Information Today, Inc.
Cronin, B., & Atkins, H. B. (2000). The Scholar's Spoor. In B.
Cronin & H. B. Atkins (Eds.), The
Web of Knowledge, A Festschrift in Honor of Eugene Garfield (pp.
1-7). Medford, NJ: Information Today, Inc.
Davenport, E., & Blaise, C. (2000). The Citation Network as a
Prototype for Representing Trust in Virtual Environments. In B. Cronin
& H. B. Atkins (Eds.), The Web
of Knowledge, A Festschrift in Honor of Eugene Garfield (pp.
517-534). Medford, NJ: Information Today, Inc.
Diamond, A. M., Jr. (2000). The Complementarity of Scientometrics and Economics. In B. Cronin & H. B. Atkins (Eds.), The Web of Knowledge, A Festschrift in Honor of Eugene Garfield (pp. 321-336). Medford, NJ: Information Today, Inc.
Garfield, E. (1955). Citation Indexes for Science: A New Dimension in
Documentation through Association of Ideas. Science, 122, 108-111.
Kelvin, W. T. (1891). Popular Lectures and Addresses. London; New York: Macmillan and Co.
Schubert, A. (2002). The Web of Scientometrics: A statistical overview
of the first 50 volumes of the journal. Scientometrics, 53(1), 3-20.
Thomson ISI. (2004). Web of Science. Retrieved 1 May, 2004, from http://www.isinet.com/products/citation/wos/
White, H. D. (2001a). Author-centered bibliometrics through
CAMEOs: characterizations automatically made and edited online. Scientometrics, 51(3), 608-637.
White, H. D. (2001b). Authors as Citers over Time. Journal of the American Society for
Information Science and Technology, 52(2), 87-108.