The TempEval challenge: identifying temporal relations in text

Language Resources and Evaluation

Abstract

TempEval is a framework for evaluating systems that automatically annotate texts with temporal relations. It was created in the context of the SemEval 2007 workshop and uses the TimeML annotation language. The evaluation consists of three subtasks of temporal annotation: anchoring an event to a time expression in the same sentence, anchoring an event to the document creation time, and ordering main events in consecutive sentences. In this paper we describe the TempEval task and the systems that participated in the evaluation. In addition, we describe how further task decomposition can bring even more structure to the evaluation of temporal relations.

Notes

  1. Event annotation is not as simple as annotating all expressions of the sort italicized in these examples, however. Negation and modal operators introduce another layer of complexity in the annotation process. For a full treatment of event annotation see Pustejovsky et al. (2003a).

  2. See http://fofoca.mitre.org/tern.html.

  3. See http://www.nist.gov/speech/tests/ace/.

  4. See http://www.nist.gov/speech/tests/ace/2007/doc/ace-evalplan.v1.3a.pdf.

  5. The training set consisted of 162 documents and the evaluation set of 20 documents. TimeBank 1.2 is available for free from the Linguistic Data Consortium at http://www.ldc.upenn.edu. The TempEval corpus is available at http://www.timeml.org.

  6. The scores were computed as micro-averages (i.e., averaged over all annotations rather than over documents). P-values were below 0.0001 for all scores. See Cohen (1960) for details on the Kappa score. Note that since all annotators were presented with identical instances to annotate, precision and recall are the same, and both equal simple accuracy.

  7. The entry for USFD in the table is starred, as its developers were co-organizers of the TempEval task, although a strict separation was maintained at the site between people doing annotation work and those involved in system development.

  8. The lack of a significant difference for task B between XRCE-T and the baseline may appear puzzling, given the 10-point difference in f-measure. This is due to treating those test instances to which XRCE-T did not assign a temporal relation as incorrect for purposes of the McNemar test (which requires a system response for each test instance). A similar move in calculating precision for the task would of course produce a lower f-measure score.

  9. The McNemar measure makes it possible for classifier 1 to differ significantly from classifier 2 and not from classifier 3 even if 2 and 3 have the same accuracy, as CU-TMP and NAIST do here, for instance.

  10. Bethard et al. (2007) also suggest using syntactic patterns.

  11. An extreme version of task decomposition would be to annotate relations based on lemmas or pairs of lemmas. For example, we could annotate the orderings of all instances of hear. We have decided not to follow this approach for two reasons: (i) data sparseness makes it unlikely that many verbs have enough occurrences to be treated as a task of their own, and (ii) we expect that many verbs exhibit similar ordering characteristics. We have considered splitting on classes of verbs, and it is clear that further research is needed to establish which classes we can employ.

  12. This works for both manually annotated data and results of automatic taggers. For manually annotated data we will take the results of adjudications, but assume that the inter-annotator agreement from the dual annotation phase is indicative of the precision. For automatic taggers we take the performance of the tagger on the task evaluation data.

  13. See Allen (1983) and Verhagen (2005) for details on the algorithm.
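
The McNemar comparisons mentioned in notes 8 and 9 can be sketched as follows. This is only a minimal illustration, assuming the continuity-corrected form of the test; whether the TempEval evaluation used exactly this variant is not stated in the notes. The function names `mcnemar` and `predictions`, the relation labels, and the toy data are hypothetical and are not taken from the TempEval results.

```python
import math


def mcnemar(gold, pred_a, pred_b):
    """Return (statistic, p_value) for a continuity-corrected McNemar test.

    A prediction of None (no relation assigned) is counted as incorrect,
    mirroring the treatment described in note 8.
    """
    only_a = only_b = 0  # instances that exactly one system gets right
    for g, a, b in zip(gold, pred_a, pred_b):
        a_ok = a is not None and a == g
        b_ok = b is not None and b == g
        if a_ok and not b_ok:
            only_a += 1
        elif b_ok and not a_ok:
            only_b += 1
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0
    statistic = (abs(only_a - only_b) - 1) ** 2 / discordant
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def predictions(correct_indices, n):
    """Toy helper: answer BEFORE (the gold label) on the given indices, AFTER elsewhere."""
    return ["BEFORE" if i in correct_indices else "AFTER" for i in range(n)]


N = 40
gold = ["BEFORE"] * N
sys1 = predictions(set(range(0, 28)), N)   # 28 correct
sys2 = predictions(set(range(0, 20)), N)   # 20 correct, all shared with sys1
sys3 = predictions(set(range(20, 40)), N)  # 20 correct, mostly different instances

stat_12, p_12 = mcnemar(gold, sys1, sys2)
stat_13, p_13 = mcnemar(gold, sys1, sys3)
print(f"sys1 vs sys2: statistic={stat_12:.3f}, p={p_12:.3f}")
print(f"sys1 vs sys3: statistic={stat_13:.3f}, p={p_13:.3f}")
```

In this toy setup sys2 and sys3 have the same accuracy (20 of 40), yet only the sys1 versus sys2 comparison falls below the conventional 0.05 level, because the test depends on which instances the two systems disagree on rather than on accuracy alone. This is the situation described in note 9.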

References

  • Aït-Mokhtar, S., Chanod, J.-P., & Roux, C. (2002). Robustness beyond shallowness: Incremental deep parsing. Natural Language Engineering, 8, 121–144.

  • Allen, J. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11), 832–843.

  • Allen, J. (1984). Towards a general theory of action and time. Artificial Intelligence, 23, 123–154.

  • Baker, C., Fillmore, C., & Lowe, J. (1998). The Berkeley FrameNet Project. In Joint 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL’98) (pp. 86–90).

  • Bethard, S., & Martin, J. H. (2007). CU-TMP: Temporal relation classification using syntactic and semantic features. In Proceedings of the fourth international workshop on semantic evaluations (SemEval-2007) (pp. 129–132). Prague, Czech Republic: Association for Computational Linguistics.

  • Bethard, S., Martin, J. H., & Klingenstein, S. (2007). Timelines from text: Identification of syntactic temporal relations. In ICSC ’07: Proceedings of the international conference on semantic computing (pp. 11–18). Washington, DC, USA: IEEE Computer Society.

  • Boguraev, B., & Ando, R. K. (2006). Analysis of TimeBank as a resource for TimeML parsing. In Language Resources and Evaluation Conference, LREC 2006. Genoa, Italy.

  • Boguraev, B., Pustejovsky, J., Ando, R., & Verhagen, M. (2007). TimeBank evolution as a community resource for TimeML parsing. Language Resources and Evaluation, 41(1), 91–115.

  • Bramsen, P., Deshpande, P., Lee, Y. K., & Barzilay, R. (2006). Inducing temporal graphs. In Proceedings of the 2006 conference on empirical methods in natural language processing (EMNLP 2006) (pp. 189–198).

  • Chambers, N., Wang, S., & Jurafsky, D. (2007). Classifying temporal relations between events. In Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions (pp. 173–176). Prague, Czech Republic: Association for Computational Linguistics.

  • Cheng, Y., Asahara, M., & Matsumoto, Y. (2007). NAIST.Japan: Temporal relation identification using dependency parsed tree. In Proceedings of the fourth international workshop on semantic evaluations (SemEval-2007) (pp. 245–248). Prague, Czech Republic: Association for Computational Linguistics.

  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.

  • Dietterich, T. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7), 1895–1923.

  • Ferro, L., Mani, I., Sundheim, B., & Wilson, G. (2001). TIDES temporal annotation guidelines, version 1.0.2. Technical report, The MITRE Corporation, McLean, Virginia. Report MTR 01W0000041.

  • Filatova, E., & Hovy, E. (2001). Assigning time-stamps to event-clauses. In Proceedings of the 2001 ACL workshop on temporal and spatial information processing.

  • Freksa, C. (1992). Temporal reasoning based on semi-intervals. Artificial Intelligence, 54(1), 199–227.

  • Hagège, C., & Tannier, X. (2007). XRCE-T: XIP Temporal Module for TempEval campaign. In Proceedings of the fourth international workshop on semantic evaluations (SemEval-2007) (pp. 492–495). Prague, Czech Republic: Association for Computational Linguistics.

  • Hepple, M., Setzer, A., & Gaizauskas, R. (2007). USFD: Preliminary exploration of features and classifiers for the TempEval-2007 task. In Proceedings of the fourth international workshop on semantic evaluations (SemEval-2007) (pp. 438–441). Prague, Czech Republic: Association for Computational Linguistics.

  • Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., & Weischedel, R. (2006). OntoNotes: The 90% solution. In Proceedings of the human language technology conference of the NAACL, companion volume: Short papers (pp. 57–60). New York City, USA: Association for Computational Linguistics.

  • Katz, G., & Arosio, F. (2001). The annotation of temporal information in natural language sentences. In Proceedings of ACL-EACL 2001, workshop for temporal and spatial information processing (pp. 104–111). Toulouse, France.

  • Kim, J.-D., Ohta, T., & Tsujii, J. (2008). Corpus annotation for mining biomedical events from literature. BMC Bioinformatics, 9(10).

  • Li, W., Wong, K.-F., & Yuan, C. (2005). A model for processing temporal references in Chinese. In The language of time. Oxford, UK: Oxford University Press.

  • Mani, I., Wellner, B., Verhagen, M., Lee, C. M., & Pustejovsky, J. (2006). Machine learning of temporal relations. In Proceedings of the 44th annual meeting of the association for computational linguistics. Sydney, Australia.

  • Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., & Grishman, R. (2004). The NomBank Project: An interim report. In Proceedings of HLT-NAACL workshop: Frontiers in corpus annotation.

  • Miltsakaki, E., Prasad, R., Joshi, A., & Webber, B. (2004). The Penn discourse Treebank. In Proceedings of fourth international conference on language resources and evaluation (LREC 2004).

  • Min, C., Srikanth, M., & Fowler, A. (2007). LCC-TE: A hybrid approach to temporal relation identification in news text. In Proceedings of the fourth international workshop on semantic evaluations (SemEval-2007) (pp. 219–222). Prague, Czech Republic: Association for Computational Linguistics.

  • MUC-6. (1995). Proceedings of the sixth message understanding conference (MUC-6). Defense Advanced Research Projects Agency, Morgan Kaufmann.

  • MUC-7. (1998). Proceedings of the seventh message understanding conference (MUC-7). Defense Advanced Research Projects Agency. Available at http://www.itl.nist.gov/iaui/894.02/related_projects/muc.

  • Palmer, M., Gildea, D., & Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1).

  • Puşcaşu, G. (2007). WVALI: Temporal relation identification by syntactico-semantic analysis. In Proceedings of the fourth international workshop on semantic evaluations (SemEval-2007) (pp. 484–487). Prague, Czech Republic: Association for Computational Linguistics.

  • Pustejovsky, J., Castaño, J., Ingria, R., Saurí, R., Gaizauskas, R., Setzer, A., & Katz, G. (2003a). TimeML: Robust specification of event and temporal expressions in text. In Proceedings of the fifth international workshop on computational semantics (IWCS-5). Tilburg.

  • Pustejovsky, J., Hanks, P., Saurí, R., See, A., Gaizauskas, R., Setzer, A., Radev, D., Sundheim, B., Day, D., Ferro, L., & Lazo, M. (2003b). The TIMEBANK Corpus. In Proceedings of Corpus linguistics 2003 (pp. 647–656). Lancaster.

  • Pustejovsky, J., Knippen, R., Littman, J., & Saurí, R. (2005). Temporal and event information in natural language text. Language Resources and Evaluation, 39, 123–164.

  • Schilder, F. (1997). Temporal relations in English and German narrative discourse. Ph.D. thesis. Edinburgh, UK: University of Edinburgh.

  • Schilder, F., & Habel, C. (2001). From temporal expressions to temporal information: Semantic tagging of news messages. In Proceedings of the ACL-2001 workshop on temporal and spatial information processing (pp. 1–8). Toulouse, France: Association for Computational Linguistics.

  • Setzer, A., & Gaizauskas, R. (2000). Annotating events and temporal information in newswire texts. In LREC 2000.

  • Setzer, A., Gaizauskas, R., & Hepple, M. (2006). The role of inference in the temporal annotation and analysis of text. Language Resources and Evaluation, 39(2–3), 243–265.

  • Verhagen, M. (2005). Temporal closure in an annotation environment. Language Resources and Evaluation, 39, 211–241.

  • Verhagen, M., Gaizauskas, R., Schilder, F., Hepple, M., Katz, G., & Pustejovsky, J. (2007). SemEval-2007 Task 15: TempEval temporal relation identification. In Proceedings of the fourth international workshop on semantic evaluations (SemEval-2007) (pp. 75–80). Prague, Czech Republic: Association for Computational Linguistics.

  • Vilain, M., Kautz, H., & van Beek, P. (1990). Constraint propagation algorithms: A revised report. In D. S. Weld & J. de Kleer (Eds.), Qualitative reasoning about physical systems (pp. 373–381). San Mateo, CA: Morgan Kaufmann.

  • Witten, I., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed.). San Francisco: Morgan Kaufmann.

Acknowledgements

We would like to thank the organizers of SemEval 2007: Eneko Agirre, Lluís Màrquez and Richard Wicentowski. TempEval might not have happened without SemEval as a home. Thanks also to the members of the six teams that participated in the TempEval task: Steven Bethard, James Martin, Congmin Min, Munirathnam Srikanth, Abraham Fowler, Yuchang Cheng, Masayuki Asahara, Yuji Matsumoto, Andrea Setzer, Caroline Hagège, Xavier Tannier and Georgiana Puşcaşu. Additional help in preparing the data for the TempEval task came from Emma Barker, Yonit Boussany, Catherine Havasi, Emin Mimaroglu, Hongyuan Qiu, Anna Rumshisky, Roser Saurí and Amber Stubbs. Part of the work in this paper was carried out in the context of the DTO/AQUAINT program and funded under grant number N61339-06-C-0140, and part was performed under the UK MRC-funded CLEF-Services grant ref: GO300607.

Author information

Corresponding author

Correspondence to Marc Verhagen.

About this article

Cite this article

Verhagen, M., Gaizauskas, R., Schilder, F. et al. The TempEval challenge: identifying temporal relations in text. Lang Resources & Evaluation 43, 161–179 (2009). https://doi.org/10.1007/s10579-009-9086-z

  • DOI: https://doi.org/10.1007/s10579-009-9086-z
