MF-Ontology: an Ontology for the Text Mining Domain
Authors
Daniel de Oliveira
Information:
PESC Publications
Text mining (TM) has emerged as a definitive technique for knowledge acquisition from text. The TM process consists of several phases that prepare the text for mining, process the text, and analyze the results. Combining TM algorithms and techniques effectively and efficiently is a challenge. Most research focuses on developing new data structures, algorithms, and methods to achieve that. However, the TM process still lacks modeling support. The TM analyst faces many options when modeling a TM process. For instance, the analyst needs to choose the most effective solution to extract the desired knowledge. This is a complex decision involving choices for each of the TM process phases, where many algorithms and implementations are available for composition and several parameters must be tuned. This scenario tends to be chaotic, and each time a new modeling effort starts, the whole ad hoc process is repeated. A first step towards modeling support is to add semantics to the TM process and record modeling results. Using ontologies to describe the TM domain can help structure the systematic composition of the algorithms and techniques of the text mining process. By adopting the same structure, similar models can be identified and the reuse of TM software components (web services, local applications) is facilitated. In this paper we describe MF-Ontology, an ontology for modeling activity flows tailored to the TM domain. MF-Ontology can be used to simplify the development of knowledge discovery applications based on texts. It provides a reference model for the different phases of text mining tasks and for the methodologies and software available to solve a problem. Thus, MF-Ontology offers semantic help to the TM analyst in finding the most appropriate solution. We describe the design of MF-Ontology and analyze its different levels of abstraction for semantically representing the TM process. We also present an evaluation of MF-Ontology and show techniques for revising the ontology concepts based on interviews with specialists.