Abstract
As more systems become PROV-enabled, there will be a corresponding increase in the need to communicate provenance data directly to users. Whilst there are a number of existing methods for doing this — formally, diagrammatically, and textually — there are currently no application-generic techniques for generating linguistic explanations of provenance. The principal reason for this is that a certain amount of linguistic information is required to transform a provenance graph — such as in PROV — into a textual explanation, and if this information is not available as an annotation, this transformation is presently not possible.
In this paper, we describe how we have adapted the common ‘consensus’ architecture from the field of natural language generation to achieve this graph transformation, resulting in the novel PROVglish architecture. We then present an approach to garnering the necessary linguistic information from a PROV dataset, which involves exploiting the linguistic information informally encoded in the URIs denoting provenance resources. We finish by detailing an evaluation undertaken to assess the effectiveness of this approach to lexicalisation, demonstrating a significant improvement in terms of fluency, comprehensibility, and grammatical correctness.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The Gazette is the official public record of the United Kingdom. For an example of their provenance trail, see https://www.thegazette.co.uk/notice/2184651/provenance.
References
Berners-Lee, T.: Universal Resource Identifiers - Axioms of Web Architecture, Technical note, World Wide Web Consortium (1996). https://www.w3.org/DesignIssues/Axioms.html
Bird, S., Loper, E., Klein, E.: Natural Language Processing with Python. O’Reilly Media Inc., Sebastopol (2009)
Ell, B., Harth, A.: A language-independent method for the extraction of RDF verbalization templates. In: Proceedings of the 8th International Natural Language Generation Conference, Philadelphia, PA, USA (2014)
Gatt, A., Reiter, E.: SimpleNLG: a realisation engine for practical applications. In: Proceedings of the 12th European Workshop on Natural Language Generation, Athens, Greece, pp. 90–93 (2009)
Hoekstra, R., Groth, P.: PROV-O-Viz - understanding the role of activities in provenance. In: Ludaescher, B., Plale, B. (eds.) IPAW 2014. LNCS, vol. 8628, pp. 215–220. Springer, Heidelberg (2015)
Lester, J.C., Porter, B.W.: Developing and empirically evaluating robust explanation generators: the KNIGHT experiments. Comput. Linguist. 23(1), 65–101 (1997)
Mann, H.B., Whitney, D.R.: On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18(1), 50–60 (1947)
Mann, W.C., Thompson, S.A.: Rhetorical structure theory: toward a functional theory of text organization. Text 8(3), 243–281 (1988)
Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a large annotated corpus of English: The Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993)
McCrae, J., Spohr, D., Cimiano, P.: Linking lexical resources and ontologies on the semantic web with lemon. In: Antoniou, G., Grobelnik, M., Simperl, E., Parsia, B., Plexousakis, D., Leenheer, P., Pan, J. (eds.) ESWC 2011, Part I. LNCS, vol. 6643, pp. 245–259. Springer, Heidelberg (2011)
Mellish, C., Dale, R.: Evaluation in the context of natural language generation. Comput. Speech Lang. 12(4), 349–373 (1998)
Mellish, C., Scott, D., Cahill, L., Paiva, D., Evans, R., Reape, M.: A reference architecture for natural language generation systems. Nat. Lang. Eng. 12(1), 1–34 (2006)
Moreau, L., Missier, P.: PROV-DM: The PROV Data Model. Recommendation of the World Wide Web Consortium (2013). http://www.w3.org/TR/prov-dm
Moreau, L., Missier, P.: PROV-N: The Provenance Notation. Recommendation of the World Wide Web Consortium (2013). http://www.w3.org/TR/prov-n
Moreau, L.: Aggregation by provenance types: a technique for summarising provenance graphs. In: Proceedings of Graphs as Models 2015 (An ETAPS 2015 Workshop), in Electronic Proceedings in Theoretical Computer Science, London, UK, pp. 129–144 (2015)
Packer, H.S., Moreau, L.: Sentence templating for explaining provenance. In: Ludaescher, B., Plale, B. (eds.) IPAW 2014. LNCS, vol. 8628, pp. 278–280. Springer, Heidelberg (2015)
PROV Working Group: PROV Graph Layout Conventions, Technical note, World Wide Web Consortium. https://www.w3.org/2011/prov/wiki/Diagrams
Ratnaparkhi, A.: A maximum entropy model for part-of-speech tagging. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, New Brunswick, NJ (1996)
Reiter, E.: Has a consensus NL generation architecture appeared, and is it psycholinguistically plausible? In: Proceedings of the Seventh International Workshop on Natural Language Generation, Kennebunkport, ME, pp. 163–170 (1994)
Reiter, E., Dale, R.: Building Natural Language Generation Systems. Cambridge University Press, Cambridge (2000)
Richardson, D.P., Moreau, L., Mott, D.: Beyond the graph: telling the story with PROV and controlled English. In: Proceedings of the 2014 Annual Fall Meeting of the International Technology Alliance, Cardiff, UK (2014)
Sun, X., Mellish, C.: Domain independent sentence generation from RDF representations for the semantic web. In: Proceedings of the Combined Workshop on Language-Enabled Educational Technology and Development and Evaluation of Robust Spoken Dialogue Systems, Riva del Garda, Italy (2006)
Toniolo, A., Wentao Ouywang, R., Dropps, T., Oren, N., Norman, T.J., Srivastava, M., Allen, J.A., de Mel, G., Sullivan, P., Mastin, S., Pearson, G.: Assessing the credibility of information in collaborative intelligence analysis. In: Proceedings of the Annual Fall Meeting of the International Technology Alliance, Cardiff, UK, p. 2014 (2014)
Acknowledgements
Research was sponsored by US Army Research laboratory and the UK Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the US Army Research Laboratory, the U.S. Government, the UK Ministry of Defence, or the UK Government. The US and UK Governments are authorised to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. The investigations and human experiment were subject to ethics approvals ERGO-FPSE-16722 and ERGO-FPSE-16731, and the source data used to generate the sentence pairs was drawn from the Southampton Provenance Store (https://provenance.ecs.soton.ac.uk/store). The research data can be found at http://dx.doi.org/10.5258/SOTON/393255 and http://dx.doi.org/10.5258/SOTON/393257.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Richardson, D.P., Moreau, L. (2016). Towards the Domain Agnostic Generation of Natural Language Explanations from Provenance Graphs for Casual Users. In: Mattoso, M., Glavic, B. (eds) Provenance and Annotation of Data and Processes. IPAW 2016. Lecture Notes in Computer Science(), vol 9672. Springer, Cham. https://doi.org/10.1007/978-3-319-40593-3_8
Download citation
DOI: https://doi.org/10.1007/978-3-319-40593-3_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40592-6
Online ISBN: 978-3-319-40593-3
eBook Packages: Computer ScienceComputer Science (R0)