Semantic Relation Extraction in Patent Claims (Extração de Relações Semânticas em Reivindicações de Patentes)
Authors
Danilo Silva de Carvalho
Information:
PESC Publications
In recent years, the focus of industrial economies around the world has been shifting from the production of tangible assets toward Intellectual Property, whose protection is regulated by the patent system in many countries.
As the number of granted patents grows, managing innovation-related information has become very difficult, which has led to the development of several approaches for automating it. Natural Language Processing techniques predominate in these approaches, but the characteristics of patent documents make such techniques considerably difficult to apply without external resources, such as patent ontologies, which limits their application. This dissertation presents a method for extracting information from patent claims by identifying the units of meaning relevant to the documents, in the form of text fragments called "semantic segments". The method uses only examples of already segmented claims as its starting point for extraction, so it is independent of external resources and can be applied to any type of patent. The hypothesis adopted in this work was that form (syntax) and meaning are strongly correlated in factual texts, where the absence of ambiguity is an important requirement. The experiments confirmed this hypothesis, showing that it is possible to distinguish and relate a significant part of the relevant information in the analyzed documents. They also showed that a small number of examples is enough to identify the information with the most regular forms, and that the recall of the extracted information is positively related to the number of examples presented.