Home page

Cross-Lingual Textual Entailment Task @ SemEval 3

Cross-lingual Textual Entailment for Content Synchronization

The CLTE task is an application-oriented variant of the task of Recognizing Textual Entailment (RTE), targeting the content synchronization scenario. Given two documents about the same topic written in different languages (e.g. Wikipedia articles), the content synchronization problem consists of automatically detecting and resolving differences in the information they provide, in order to produce aligned, mutually enriched versions. Towards this objective, a crucial requirement is to identify the information in one page that is novel (or more specific) with respect to the content of the other. The task can be naturally cast as an entailment-related problem, with some variations with respect to the evaluation framework proposed by the RTE initiative. In particular, given a pair of topically related text fragments in different languages (e.g. T1 in English and T2 in Spanish), the CLTE task consists of automatically assigning to the pair one of the following judgements:

  • Bidirectional (T1->T2 & T1<-T2): the two fragments entail each other (i.e. they are semantically equivalent);
  • Forward (T1->T2 & T1!<-T2): strict unidirectional entailment from T1 to T2 (i.e. T1 is more specific and informative than T2);
  • Backward (T1!->T2 & T1<-T2): strict unidirectional entailment from T2 to T1 (i.e. T2 is more specific and informative than T1);
  • Unknown (T1!->T2 & T1!<-T2): there is no entailment between T1 and T2 in either direction (i.e. T1 and T2 provide different information about the topic).
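
The four judgements above are the four possible combinations of two yes/no entailment decisions, one per direction. A minimal Python sketch of that mapping follows. The function name `clte_label`, its two boolean parameters, and the lowercase label strings are illustrative assumptions, not part of the task specification; how each directional decision is actually obtained is left open.

```python
def clte_label(t1_entails_t2: bool, t2_entails_t1: bool) -> str:
    """Combine two directional entailment decisions into one CLTE judgement.

    t1_entails_t2: True if T1 -> T2 holds.
    t2_entails_t1: True if T1 <- T2 holds (T2 entails T1).
    Names and label strings are hypothetical, chosen for illustration.
    """
    if t1_entails_t2 and t2_entails_t1:
        return "bidirectional"  # the fragments entail each other
    if t1_entails_t2:
        return "forward"        # only T1 -> T2 holds
    if t2_entails_t1:
        return "backward"       # only T1 <- T2 holds
    return "unknown"            # no entailment in either direction


# Enumerate all four combinations of the two directional decisions.
for forward in (True, False):
    for backward in (True, False):
        print(forward, backward, clte_label(forward, backward))
```

Under this reading, a system that can judge entailment in one direction at a time could produce the four-way judgement by running that check twice, once per direction.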

Spanish-English will be considered as a first language combination, but other combinations can be covered upon expressions of interest by participants. The dataset provided will consist of around 800 cross-lingual pairs (400 for development, 400 for test), balanced with respect to the 4 valid entailment judgements (bidirectional, forward, backward, and unknown). Systems’ performance will be measured by accuracy.
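
Accuracy here is the fraction of pairs whose predicted judgement matches the gold judgement. The sketch below shows one way to compute it in Python; the function name `accuracy` and the label strings are assumptions made for illustration and do not reflect an official scoring script or file format.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of pairs for which the predicted judgement equals the gold one."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set of pairs")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Toy example with four pairs (made-up labels, not task data).
gold_labels = ["bidirectional", "forward", "backward", "unknown"]
system_labels = ["bidirectional", "forward", "forward", "unknown"]
print(accuracy(gold_labels, system_labels))
```

Because the dataset is balanced across the four judgements, a system that always outputs the same judgement would score about 0.25 under this measure.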

Task organizers

Matteo Negri (FBK)

Yashar Mehdad (FBK)

Luisa Bentivogli (FBK - CELCT)

Danilo Giampiccolo (CELCT)


For further information please see http://www.cs.york.ac.uk/semeval/task26/

© 2012 - Celct