Jenna Johnson
IRLS 601
Spring, 2002

Thesaurus


Introduction

Information Extraction (IE) is the extraction of pertinent information from large volumes of text. It descends from the fields of Computational Linguistics, Artificial Intelligence and Natural Language Processing, and is considered an advanced language technology.

Scope
The Information Extraction thesaurus currently contains 140 terms relating directly to aspects of Information Extraction. The complete hierarchy of terms gives the user an overview of the Information Extraction field. Many terms within Information Extraction are taken directly from broader disciplines, including Computer Science and Linguistics, and from the narrower fields of Computational Linguistics, Artificial Intelligence and Natural Language Processing. Terms used in current Information Extraction literature were taken from these fields and structured as they relate to Information Extraction.

Total Terms: 140
Preferred Terms: 115
Non-Preferred Terms: 19
Top Terms: 6

Purpose
The purpose of this thesaurus is to identify and collect terms that are specifically related to Information Extraction and to distinguish the relationships between these terms. Through an analysis of current literature on Information Extraction, terms were selected that are heavily used to describe extraction systems. Terms that name individual extraction software are excluded from this thesaurus.

Structure
This thesaurus uses a four-level hierarchy that begins with a general level of terms accepted within the fields of Computational Linguistics, Artificial Intelligence and Natural Language Processing. The second-, third- and fourth-level terms relate specifically to Information Extraction and its characteristics. Polyhierarchy was not used because of the specificity of this thesaurus. Instead, each term was placed in the single hierarchy most appropriate to its context within Information Extraction.
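
The structural rules described above (at most four levels, and exactly one broader term per entry because polyhierarchy is not used) can be illustrated with a minimal sketch. The class names, the `MAX_DEPTH` constant and the sample terms are assumptions made for illustration only; they are not taken from the thesaurus or from any software used to build it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_DEPTH = 4  # assumption: mirrors the four-level hierarchy described in the text


@dataclass
class Term:
    name: str
    broader: Optional[str] = None  # a single broader term, so no polyhierarchy
    narrower: List[str] = field(default_factory=list)


class Hierarchy:
    """Hypothetical container that enforces one parent per term and a depth limit."""

    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}

    def depth(self, name: str) -> int:
        # Walk up the broader-term chain; a top term has depth 1.
        level = 1
        term = self.terms[name]
        while term.broader is not None:
            term = self.terms[term.broader]
            level += 1
        return level

    def add(self, name: str, broader: Optional[str] = None) -> None:
        if name in self.terms:
            # A term already placed cannot be given a second location.
            raise ValueError(f"{name!r} already has a place in the hierarchy")
        if broader is not None:
            if broader not in self.terms:
                raise KeyError(f"unknown broader term {broader!r}")
            if self.depth(broader) + 1 > MAX_DEPTH:
                raise ValueError(f"{name!r} would exceed {MAX_DEPTH} levels")
            self.terms[broader].narrower.append(name)
        self.terms[name] = Term(name, broader)


# Illustrative terms only; these are not the thesaurus's actual entries.
h = Hierarchy()
h.add("Natural language processing")
h.add("Information extraction", broader="Natural language processing")
h.add("Extraction tasks", broader="Information extraction")
h.add("Named entity recognition", broader="Extraction tasks")
```

Storing a single `broader` reference per term makes a second parent impossible to represent, which is one simple way to model the decision not to use polyhierarchy.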

Audience
This thesaurus was constructed specifically for indexing Information Extraction abstracts. Terms were chosen that recur in contemporary literature about Information Extraction. Information Extraction is a constantly developing field that requires new terms and coordination. This thesaurus uses simple terms that can be post-coordinated as new terms are created in the field. It also uses pre-coordinated terms that are already established in the literature.
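
As a rough illustration of post-coordination, the sketch below indexes abstracts under simple terms and combines those terms only at search time by intersecting their posting sets. The function names, abstract identifiers and index terms are hypothetical and are not drawn from the thesaurus itself.

```python
from typing import Dict, Iterable, Set


def build_index(assignments: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert {abstract_id: terms} into {term: set of abstract_ids}."""
    index: Dict[str, Set[str]] = {}
    for abstract_id, terms in assignments.items():
        for term in terms:
            index.setdefault(term, set()).add(abstract_id)
    return index


def post_coordinate(index: Dict[str, Set[str]], *terms: str) -> Set[str]:
    """Return abstracts indexed under every one of the given simple terms."""
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical abstracts and index terms, for illustration only.
assignments = {
    "abstract-1": ["Templates", "Evaluation"],
    "abstract-2": ["Templates", "Pattern matching"],
    "abstract-3": ["Evaluation"],
}
index = build_index(assignments)
matches = post_coordinate(index, "Templates", "Evaluation")
```

A pre-coordinated term, by contrast, would be stored as a single compound entry in the index and looked up directly, with no intersection at search time.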