2010-03-23

Matic Korun - Phrase anchoring: outlining an approach

The presentation will outline a concept for developing a text mining model. The core idea was derived from Koestler’s bisociation theory, which is also the framework of the Bison project. In general, bisociation is concerned with identifying links among otherwise unconnected domains and with developing models for retrieving such links.

Phrase anchoring, by contrast, assumes that the same principles may be used to uncover relations within a single domain by probing the context that defines them. Hence, the taxonomy that the Bison project uses to describe its models is re-defined in a complementary fashion in order to elaborate an alternative view of bisociation.

At this point phrase anchoring adopts a knowledge engineering approach that aims at developing a native method labeled ‘pool boundary estimation’. The method departs from the bag-of-words model, which is replaced by a concept labeled ‘pools-of-words’. ‘Pools-of-words’ assumes that phrases (i.e. n-grams), e.g. ‘on the other hand’, can be used for splitting documents into meaningful portions linked by a concept. Thus, phrase anchoring refers to splitting a collection of vectors (i.e. documents) into at least two ‘pools-of-words’ suitable for analysis.
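
To make the splitting idea concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions and not the method presented in the talk: the function name `split_into_pools`, the regex tokenizer and the exact-match anchor lookup are all hypothetical, and the sketch does no boundary estimation at all. It simply cuts a document wherever a given anchor phrase occurs.

```python
import re


def split_into_pools(text, anchor="on the other hand"):
    """Split a document into pools of words at each occurrence of an anchor phrase.

    Hypothetical helper for illustration only; the anchor is matched exactly
    and is not included in any pool.
    """
    # Assumption: lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    anchor_tokens = anchor.lower().split()
    n = len(anchor_tokens)

    pools, current = [], []
    i = 0
    while i < len(tokens):
        if tokens[i:i + n] == anchor_tokens:
            # The anchor closes the current pool and opens the next one.
            pools.append(current)
            current = []
            i += n
        else:
            current.append(tokens[i])
            i += 1
    pools.append(current)
    return pools


pools = split_into_pools("Cats are quiet. On the other hand, dogs bark.")
```

With a single anchor occurrence the document yields two pools, one on each side of the phrase; a document without the anchor stays in one pool.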

The objective is the development of a machine learning model. For now, however, the concept rests on observation (i.e. experience) and is hence still intuitive. Nonetheless, the accumulated knowledge base should permit testing established models for unstructured data analysis (i.e. text mining) under alternate conditions, in order to determine feasibility.

Wednesday, March 23rd, 14:00
