"Information extraction in text mining" by Matt Mulins

Computer Science Graduate and Undergraduate Student Scholarship

Title

Information extraction in text mining

Authors

Matt Mulins, Western Washington University

College Affiliation

College of Science and Engineering

Document Type

Research Paper

Publication Date

2008

Department or Program Affiliation

Computer Science Department

Keywords

text mining, data mining, information extraction, natural language processing, knowledge discovery

Abstract

Text mining’s goal, simply put, is to derive information from text. Using multitudes of technologies from overlapping fields like Data Mining and Natural Language Processing, we can yield knowledge from our text and facilitate other processing. Information Extraction (IE) plays a large part in text mining when we need to extract this data. In this survey we concern ourselves with general methods borrowed from other fields, with lower-level NLP techniques, IE methods, text representation models, and categorization techniques, and with specific implementations of some of these methods. Finally, with our new understanding of the field, we can discuss a proposal for a system that combines WordNet, Wikipedia, and extracted definitions and concepts from web pages into a user-friendly search engine designed for topic-specific knowledge.

Subject – LCSH

Data mining, Text processing (Computer science), Natural language processing (Computer science)

Publisher

Western Washington University

Recommended Citation

Mulins, Matt, "Information extraction in text mining" (2008). Computer Science Graduate and Undergraduate Student Scholarship. 4.
https://cedar.wwu.edu/computerscience_stupubs/4

Genre/Form

term papers

Type

Text

Rights

Copying of this document in whole or in part is allowable only for scholarly purposes. It is understood, however, that any copying or publication of this document for commercial purposes, or for financial gain, shall not be allowed without the author’s written permission.

Language

English

Format

application/pdf

Included in

Computer Sciences Commons