2010-05-31

Claudiu Mihaila: Making the Invisible Visible: Finding Romanian Zero Pronouns

Anaphora resolution remains a challenging research field in NLP, which still lacks an algorithm that resolves anaphoric pronouns correctly. Anaphoric zero pronouns pose an even greater challenge, since this category is not lexically realised; they must therefore be identified before they can be resolved. We present a new study on the distribution and identification of zero pronouns in Romanian. A Romanian corpus that includes legal, encyclopaedic, literary, and news texts has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments have been performed on this corpus to identify verbs which have a zero pronoun in the subject position. The evaluation results show that zero pronouns appear frequently in Romanian and that their distribution depends largely on the genre. Additionally, a search scope for the antecedent has been determined, increasing the chances of correct resolution. Furthermore, more than 70% of the zero pronouns have been accurately identified by various machine learning algorithms. The strong similarity between our results and those obtained for other Romance languages supports our conclusions.

Thursday, June 3rd, 13:00, Orange room.
