Introduction
Information Extraction (IE) is the extraction of pertinent
information from large volumes of text. It descends from the fields of
Computational Linguistics, Artificial Intelligence and Natural Language
Processing, and is considered an advanced language technology.
Scope
The Information Extraction thesaurus currently contains 140 terms
directly related to aspects of Information Extraction. The complete
hierarchy of terms gives the user an overview of the Information
Extraction field. Many terms within Information Extraction are taken
directly from broader disciplines, including Computer Science and
Linguistics, and from the narrower fields of Computational Linguistics,
Artificial Intelligence and Natural Language Processing. Terms used in
current Information Extraction literature were taken from these fields
and structured according to how they relate to Information Extraction.
Purpose
The purpose of this thesaurus is to identify and collect terms that
are specifically related to Information Extraction and to distinguish
the relationships between these terms. Through an analysis of current
literature on Information Extraction, terms were selected that are
heavily used to describe extraction systems. Terms that name individual
extraction software are excluded from this thesaurus.
Structure
This thesaurus includes a 4-level hierarchy that begins with a general
level of terms accepted within the fields of Computational Linguistics,
Artificial Intelligence and Natural Language Processing. The second,
third and fourth level terms relate specifically to Information Extraction
and its characteristics. Polyhierarchy was not used for this structure
because of the specificity of this thesaurus. Instead, each term was
placed in the single hierarchy most appropriate to its context within
Information Extraction.
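The structure described above can be illustrated with a minimal sketch,
assuming a simple in-memory model. The class names (Term, Thesaurus) and
the sample terms below the first level are hypothetical placeholders and
are not taken from the thesaurus itself. The sketch enforces the two
stated constraints: at most four levels, and exactly one broader term
per entry (no polyhierarchy).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_DEPTH = 4  # the thesaurus is described as a 4-level hierarchy


@dataclass(eq=False)
class Term:
    """One thesaurus entry with at most one broader term."""
    name: str
    broader: Optional["Term"] = None
    narrower: List["Term"] = field(default_factory=list)

    @property
    def level(self) -> int:
        # Top-level terms are level 1; each step down adds one.
        return 1 if self.broader is None else self.broader.level + 1


class Thesaurus:
    """Hypothetical container that rejects polyhierarchy and extra depth."""

    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}

    def add(self, name: str, broader: Optional[str] = None) -> Term:
        # A term may appear only once, so it can never gain a second parent.
        if name in self.terms:
            raise ValueError(f"{name!r} already has a place in the hierarchy")
        parent = self.terms[broader] if broader is not None else None
        term = Term(name, parent)
        if term.level > MAX_DEPTH:
            raise ValueError(f"{name!r} would exceed {MAX_DEPTH} levels")
        if parent is not None:
            parent.narrower.append(term)
        self.terms[name] = term
        return term

    def path(self, name: str) -> List[str]:
        # Walk the single chain of broader terms up to the top level.
        term: Optional[Term] = self.terms[name]
        chain: List[str] = []
        while term is not None:
            chain.append(term.name)
            term = term.broader
        return list(reversed(chain))


# Placeholder entries for illustration only.
thesaurus = Thesaurus()
thesaurus.add("Natural Language Processing")
thesaurus.add("Information Extraction", broader="Natural Language Processing")
thesaurus.add("Extraction Task", broader="Information Extraction")
thesaurus.add("Named Entity Recognition", broader="Extraction Task")
```

Because every term has a single broader term, path() always returns one
unambiguous chain from the general level down to the term.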
Audience
This thesaurus was constructed specifically for the purpose of
indexing Information Extraction abstracts. Terms were chosen that recur
in contemporary literature about Information Extraction. Information
Extraction is a constantly developing field that requires new terms and
coordination. This thesaurus uses simple terms that can be
post-coordinated as new terms are created in the field. It also uses
pre-coordinated terms that are already established in the literature.
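
The difference between post-coordinated and pre-coordinated terms can be
shown with a small sketch. It assumes an index that maps each term to the
set of abstracts it was assigned to. The abstract identifiers, the sample
terms and the post_coordinate function are all hypothetical and serve
only as an illustration.

```python
from typing import Dict, Set

# Hypothetical index: term -> identifiers of abstracts indexed with it.
index: Dict[str, Set[str]] = {
    "template": {"abs-01", "abs-02"},          # simple term
    "evaluation": {"abs-02", "abs-03"},        # simple term
    "named entity recognition": {"abs-03"},    # pre-coordinated term
}


def post_coordinate(index: Dict[str, Set[str]], *terms: str) -> Set[str]:
    """Combine simple terms at search time by intersecting their postings."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Simple terms are combined by the searcher at retrieval time.
both = post_coordinate(index, "template", "evaluation")

# A pre-coordinated term is looked up directly as a single entry.
compound = post_coordinate(index, "named entity recognition")
```

Under these assumptions, a new concept in the field can be searched by
combining existing simple terms before it receives an entry of its own.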